Timing with random strings of ASCII printables:
from inspect import getsource
from random import sample
import re
from string import printable
from timeit import timeit

# Precompiled patterns: one match per character vs. one match per run of characters
pattern_single = re.compile(r'[\W]')
pattern_repeat = re.compile(r'[\W]+')
# Table that deletes every non-alphanumeric code point below 256
translation_tb = str.maketrans('', '', ''.join(c for c in map(chr, range(256)) if not c.isalnum()))


def generate_test_string(length):
    return ''.join(sample(printable, length))


def main():
    for i in range(0, 60, 10):
        for test in [
            lambda: ''.join(c for c in generate_test_string(i) if c.isalnum()),
            lambda: ''.join(filter(str.isalnum, generate_test_string(i))),
            lambda: re.sub(r'[\W]', '', generate_test_string(i)),
            lambda: re.sub(r'[\W]+', '', generate_test_string(i)),
            lambda: pattern_single.sub('', generate_test_string(i)),
            lambda: pattern_repeat.sub('', generate_test_string(i)),
            lambda: generate_test_string(i).translate(translation_tb),
        ]:
            # Each timing includes the cost of generating the test string
            print(timeit(test), i, getsource(test).lstrip(' lambda: ').rstrip(',\n'), sep='\t')


if __name__ == '__main__':
    main()
Result (Python 3.7):

Time                Length  Code
6.3716264850008880  00      ''.join(c for c in generate_test_string(i) if c.isalnum())
5.7285426190064750  00      ''.join(filter(str.isalnum, generate_test_string(i)))
8.1875841680011940  00      re.sub(r'[\W]', '', generate_test_string(i))
8.0002205439959650  00      re.sub(r'[\W]+', '', generate_test_string(i))
5.5290945199958510  00      pattern_single.sub('', generate_test_string(i))
5.4417179649972240  00      pattern_repeat.sub('', generate_test_string(i))
4.6772285089973590  00      generate_test_string(i).translate(translation_tb)
23.574712151996210  10      ''.join(c for c in generate_test_string(i) if c.isalnum())
22.829975890002970  10      ''.join(filter(str.isalnum, generate_test_string(i)))
27.210196289997840  10      re.sub(r'[\W]', '', generate_test_string(i))
27.203713296003116  10      re.sub(r'[\W]+', '', generate_test_string(i))
24.008979928999906  10      pattern_single.sub('', generate_test_string(i))
23.945240008994006  10      pattern_repeat.sub('', generate_test_string(i))
21.830899796994345  10      generate_test_string(i).translate(translation_tb)
38.731336012999236  20      ''.join(c for c in generate_test_string(i) if c.isalnum())
37.942474347000825  20      ''.join(filter(str.isalnum, generate_test_string(i)))
42.169366310001350  20      re.sub(r'[\W]', '', generate_test_string(i))
41.933375883003464  20      re.sub(r'[\W]+', '', generate_test_string(i))
38.899814646996674  20      pattern_single.sub('', generate_test_string(i))
38.636144253003295  20      pattern_repeat.sub('', generate_test_string(i))
36.201238164998360  20      generate_test_string(i).translate(translation_tb)
49.377356811004574  30      ''.join(c for c in generate_test_string(i) if c.isalnum())
48.408927293996385  30      ''.join(filter(str.isalnum, generate_test_string(i)))
53.901889764994850  30      re.sub(r'[\W]', '', generate_test_string(i))
52.130339455994545  30      re.sub(r'[\W]+', '', generate_test_string(i))
50.061149017004940  30      pattern_single.sub('', generate_test_string(i))
49.366573111998150  30      pattern_repeat.sub('', generate_test_string(i))
46.649754120997386  30      generate_test_string(i).translate(translation_tb)
63.107938601999194  40      ''.join(c for c in generate_test_string(i) if c.isalnum())
65.116287978999030  40      ''.join(filter(str.isalnum, generate_test_string(i)))
71.477421126997800  40      re.sub(r'[\W]', '', generate_test_string(i))
66.027950693998720  40      re.sub(r'[\W]+', '', generate_test_string(i))
63.315361931003280  40      pattern_single.sub('', generate_test_string(i))
62.342320287003530  40      pattern_repeat.sub('', generate_test_string(i))
58.249303059004890  40      generate_test_string(i).translate(translation_tb)
73.810345625002810  50      ''.join(c for c in generate_test_string(i) if c.isalnum())
72.593953348005020  50      ''.join(filter(str.isalnum, generate_test_string(i)))
76.048324580995540  50      re.sub(r'[\W]', '', generate_test_string(i))
75.106637657001560  50      re.sub(r'[\W]+', '', generate_test_string(i))
74.681338128997600  50      pattern_single.sub('', generate_test_string(i))
72.430461594005460  50      pattern_repeat.sub('', generate_test_string(i))
69.394243567003290  50      generate_test_string(i).translate(translation_tb)
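str.maketrans & str.translate is fastest, but it leaves every character outside the table untouched; the table here only covers code points 0 to 255, so non-alphanumeric characters beyond that range are kept. re.compile & pattern.sub is slower than translate and, in these runs, roughly on par with ''.join & filter.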
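The caveat about str.translate can be checked directly. The following is a minimal sketch and not part of the benchmark: the sample string and the variable names other than translation_tb are made up for illustration.

```python
# Same table as in the benchmark: deletes non-alphanumeric code points 0..255
translation_tb = str.maketrans(
    '', '', ''.join(c for c in map(chr, range(256)) if not c.isalnum())
)

# Illustrative input: '-' is U+002D, the euro sign is U+20AC
sample_text = 'a-b\u20acc'

# '-' is in the table and gets deleted; U+20AC is outside it and survives
translated = sample_text.translate(translation_tb)

# str.isalnum() tests every character, whatever its code point
filtered = ''.join(filter(str.isalnum, sample_text))

print(repr(translated))
print(repr(filtered))
```

If the input may contain characters above U+00FF, the isalnum()-based approaches are the safer choice, at the cost of some speed.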
Created: May-28, 2021
- Use the isalnum() Method to Remove All Non-Alphanumeric Characters in Python String
- Use the filter() Function to Remove All Non-Alphanumeric Characters in Python String
- Use Regular Expressions to Remove All Non-Alphanumeric Characters in Python String
Alphanumeric characters are the 26 letters of the alphabet, in upper and lower case, and the digits 0 to 9. Non-alphanumeric characters are all characters that are neither letters nor digits, such as + and @.
In this tutorial, we will discuss how to remove non-alphanumeric characters from a string in Python.
Use the isalnum() Method to Remove All Non-Alphanumeric Characters in Python String
We can use the isalnum() method to check whether a given character or string is alphanumeric. We check each character of the string individually and, if it is alphanumeric, keep it; the join() method then combines the kept characters into a new string.
For example,
string_value = "alphanumeric@123__"
s = ''.join(ch for ch in string_value if ch.isalnum())
print(s)
Output:
alphanumeric123
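Note that isalnum() is Unicode-aware, so accented letters and characters such as a superscript digit also count as alphanumeric. The sketch below illustrates this and shows one possible way to restrict the result to ASCII; the helper name keep_alnum and its ascii_only parameter are hypothetical, and str.isascii() assumes Python 3.7 or newer.

```python
def keep_alnum(text, ascii_only=False):
    """Return text with every non-alphanumeric character removed."""
    if ascii_only:
        # str.isascii() requires Python 3.7 or newer
        return ''.join(ch for ch in text if ch.isascii() and ch.isalnum())
    return ''.join(ch for ch in text if ch.isalnum())


# Illustrative input: U+00E9 is an accented e, U+00B2 is a superscript two
sample_text = 'caf\u00e9_123\u00b2!'

unicode_result = keep_alnum(sample_text)                 # keeps U+00E9 and U+00B2
ascii_result = keep_alnum(sample_text, ascii_only=True)  # drops both

print(unicode_result)
print(ascii_result)
```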
Use the filter() Function to Remove All Non-Alphanumeric Characters in Python String
The filter() function builds an iterator from the elements of an iterable for which a given function returns true.
For our problem, the string is the iterable, and we pass str.isalnum as the function; filter() calls it on each character and keeps only the alphanumeric ones. The join() method then combines the remaining characters into a string.
For example,
string_value = "alphanumeric@123__"
s = ''.join(filter(str.isalnum, string_value))
print(s)
Output:
alphanumeric123
This method works in Python 3, where filter() returns an iterator that join() consumes directly.
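To see what filter() returns in Python 3, the short sketch below inspects the result before joining it. The variable names kept, is_iterator, and leftover are chosen here for illustration only.

```python
string_value = "alphanumeric@123__"

kept = filter(str.isalnum, string_value)

# In Python 3, filter() returns a lazy iterator rather than a list or string
is_iterator = iter(kept) is kept

s = ''.join(kept)         # join() consumes the iterator
leftover = ''.join(kept)  # the iterator is now exhausted, so this is empty

print(is_iterator)
print(s)
print(repr(leftover))
```

Because the iterator can be consumed only once, call filter() again if the filtered characters are needed a second time.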
Use Regular Expressions to Remove All Non-Alphanumeric Characters in Python String
A regular expression is a special sequence of characters that describes a search pattern, using a specific syntax to match strings or sets of strings. To use regular expressions, we import the re module.
We can use the sub() function from this module to replace every substring that matches a non-alphanumeric character with an empty string.
For example,
import re
string_value = "alphanumeric@123__"
s = re.sub(r'[\W_]+', '', string_value)
print(s)
Output:
alphanumeric123
Alternatively, we can also use the following pattern.
import re
string_value = "alphanumeric@123__"
s = re.sub(r'[^a-zA-Z0-9]', '', string_value)
print(s)
Output:
alphanumeric123
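The two patterns give the same result for ASCII input but can differ on other text, because \W is Unicode-aware for str patterns in Python 3 unless the re.ASCII flag is set. The sketch below compares them on a made-up sample string; the variable names are illustrative.

```python
import re

# Illustrative input: U+00E9 is an accented e
sample_text = 'caf\u00e9_123!'

# \W treats the accented letter as a word character, so it is kept
unicode_aware = re.sub(r'[\W_]+', '', sample_text)

# The explicit ASCII character class removes the accented letter
ascii_class = re.sub(r'[^a-zA-Z0-9]', '', sample_text)

# With re.ASCII, \W matches anything outside [a-zA-Z0-9_]
ascii_flag = re.sub(r'[\W_]+', '', sample_text, flags=re.ASCII)

print(unicode_aware)
print(ascii_class)
print(ascii_flag)
```

Which pattern is appropriate depends on whether non-ASCII letters and digits should count as alphanumeric for the task at hand.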