import re
s = 'abcd2343 abw34324 abc3243-23A'
re.split(r'(\d+)', s)
> ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A']
Or, if you want to split on the first occurrence of a digit:
re.findall(r'\d*\D+', s)
> ['abcd', '2343 abw', '34324 abc', '3243-', '23A']
\d+ matches 1-or-more digits.
\d*\D+ matches 0-or-more digits followed by 1-or-more non-digits.
\d+|\D+ matches 1-or-more digits or 1-or-more non-digits.
Consult the docs for more about Python's regex syntax.
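To see how the three patterns differ, here is a small sketch that runs each one through re.findall on the same input. The sample string is made up for illustration; it deliberately ends with digits, which shows that \d*\D+ drops a trailing run of digits because it needs at least one non-digit to complete a match.

```python
import re

sample = 'ab12 cd345'  # hypothetical input that ends with digits

digits_only = re.findall(r'\d+', sample)
digits_then_text = re.findall(r'\d*\D+', sample)
alternating = re.findall(r'\d+|\D+', sample)

print(digits_only)       # ['12', '345']
print(digits_then_text)  # ['ab', '12 cd']  (the trailing '345' is lost)
print(alternating)       # ['ab', '12', ' cd', '345']
```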
re.split(pat, s) will split the string s using pat as the delimiter. If pat begins and ends with parentheses (so as to be a "capturing group"), then re.split will return the substrings matched by pat as well. For instance, compare:
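re.split(r'\d+', s)
> ['abcd', ' abw', ' abc', '-', 'A']  # only the text between the matches
re.split(r'(\d+)', s)
> ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A']  # the text between the matches and the captured digits
re.findall(r'\d+', s)
> ['2343', '34324', '3243', '23']  # only the digits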
Thus, if s ends with a digit, you could avoid ending with an empty string by using re.findall(r'\d+|\D+', s) instead of re.split(r'(\d+)', s):
s = 'abcd2343 abw34324 abc3243-23A 123'
re.split(r'(\d+)', s)
> ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A ', '123', '']
re.findall(r'\d+|\D+', s)
> ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A ', '123']
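If you need this in more than one place, the findall approach can be wrapped in a small helper. This is only a sketch: the function name split_text_and_numbers is made up here and is not part of the re module.

```python
import re

def split_text_and_numbers(s):
    """Return the alternating runs of digits and non-digits in s."""
    # \d+|\D+ never produces empty strings, so no filtering is needed.
    return re.findall(r'\d+|\D+', s)

print(split_text_and_numbers('abcd2343 abw34324'))  # ['abcd', '2343', ' abw', '34324']
print(split_text_and_numbers('123hello'))           # ['123', 'hello']
print(split_text_and_numbers(''))                   # []
```

Because every match contains at least one character, the result has no empty strings whether the input starts or ends with digits.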
Split a string into text and number in Python
Use the re.split() method to split a string into text and number, e.g. my_list = re.split(r'(\d+)', my_str). The re.split() method will split the string on the digits and will still include them in the list.
import re

my_str = 'hello123'

my_list = re.split(r'(\d+)', my_str)

# 👇️ ['hello', '123', '']
print(my_list)
Notice that we got an empty string at the end because the last character in the string is a digit.
You can use the filter() function to remove any empty strings from the list.
import re

my_str = 'hello123'

my_list = list(filter(None, re.split(r'(\d+)', my_str)))

# 👇️ ['hello', '123']
print(my_list)
The filter function takes a function and an iterable as arguments and constructs an iterator from the elements of the iterable for which the function returns a truthy value.
If you pass None for the function argument, all falsy elements of the iterable are removed.
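Here is a short illustration of what "falsy elements are removed" means in practice. The items list is made up for the example; note that filter(None, ...) drops every falsy value (empty strings, None, 0), not just empty strings.

```python
# Hypothetical list mixing truthy and falsy values
items = ['hello', '', '123', None, 0, 'world']

kept = list(filter(None, items))
print(kept)  # ['hello', '123', 'world']

# An equivalent list comprehension
same = [item for item in items if item]
print(same == kept)  # True
```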
The re.split method takes a pattern and a string and splits the string on each occurrence of the pattern.
The parentheses in the regular expression match whatever is inside and indicate the start and end of a group.
The group's contents can still be retrieved after the match.
Even though we split the string on one or more digits, we still include the digits in the result.
The \d character matches the digits [0-9] (and many other digit characters).
The + matches the preceding regular expression 1 or more times.
In other words, we match one or more digits using a group and still include them in the list of strings.
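To illustrate the point that \d matches more than [0-9]: for str patterns in Python 3, \d also matches decimal digits from other scripts, and the re.ASCII flag restricts it to [0-9]. The sample text below is an assumption chosen for the demonstration, and ascii() is used only so the output prints safely on any console.

```python
import re

# 'abc' followed by the Arabic-Indic digits one, two, three
text = 'abc\u0661\u0662\u0663'

# Default behaviour: \d matches Unicode decimal digits
unicode_parts = re.split(r'(\d+)', text)

# With re.ASCII, \d only matches 0-9, so nothing is split
ascii_parts = re.split(r'(\d+)', text, flags=re.ASCII)

print(ascii(unicode_parts))  # ['abc', '\u0661\u0662\u0663', '']
print(ascii(ascii_parts))    # ['abc\u0661\u0662\u0663']
```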
This approach also works if your string starts with digits and ends with non-digit characters.
import re

my_str = '123hello'

my_list = list(filter(None, re.split(r'(\d+)', my_str)))

# 👇️ ['123', 'hello']
print(my_list)
If we didn't use the filter() function, we'd have an empty string element at the start of the list.
Note that the filter function returns a filter object (not a list). If you need to convert the filter object to a list, pass it to the list() class.
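The difference between the filter object and a list can be seen in a quick sketch. The variable names are chosen for illustration. A filter object is a one-shot iterator, so it is empty after it has been consumed once.

```python
import re

parts = filter(None, re.split(r'(\d+)', '123hello'))
print(type(parts).__name__)  # filter

first_pass = list(parts)
print(first_pass)  # ['123', 'hello']

# The iterator is now exhausted, so a second pass yields nothing
second_pass = list(parts)
print(second_pass)  # []
```

If you need to reuse the result, convert it to a list once and keep the list.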