Python split string by special character

Split a string on all special characters in Python

Use the re.split() method to split a string on all special characters. The re.split() method takes a pattern and a string and splits the string on each occurrence of the pattern.

import re

my_str = "hellothree.four!five'six"

my_list = re.split(r'[`!@#$%^&*()_+\-=\[\]{};\':"\\|,.<>\/?~]', my_str)

# 👇️ ['hellothree', 'four', 'five', 'six']
print(my_list)

We used the re.split method to split a string on all occurrences of a special character.

The square brackets are used to indicate a set of characters.

Make sure that all characters you consider special characters are in the set.

You can add or remove characters according to your use case.
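
If the set should cover every ASCII punctuation character, one option is to build it from string.punctuation rather than typing it by hand. The following is a minimal sketch of that idea, not part of the original example; the names PUNCT_PATTERN and split_on_punctuation are hypothetical.

```python
import re
import string

# Build a character set from every ASCII punctuation character.
# re.escape() backslash-escapes the characters that are special in a pattern,
# so ], \, ^ and - are safe inside the square brackets.
PUNCT_PATTERN = re.compile('[' + re.escape(string.punctuation) + ']')


def split_on_punctuation(text):
    """Split text on each single ASCII punctuation character."""
    return PUNCT_PATTERN.split(text)


print(split_on_punctuation("one,two;three!four"))
# ['one', 'two', 'three', 'four']
```

Note that string.punctuation includes the underscore, so "a_b" is split as well.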

Alternatively, you can use a regular expression that matches any character that is not a letter, a digit or a space.

import re

my_str = "hellothree.four!five'six"

my_list = re.split(r'[^a-zA-Z0-9\s]', my_str)

# 👇️ ['hellothree', 'four', 'five', 'six']
print(my_list)

The caret ^ at the beginning of the set means "NOT". In other words, match all characters that are NOT lowercase letters a-z, uppercase letters A-Z, digits 0-9 or whitespace \s characters.

You can add any characters that you don't want to match between the square brackets of the regular expression.
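
As an illustration of adding characters to the negated set, the sketch below keeps hyphens and underscores inside the resulting parts. The sample string and the variable names are made up for this example, and the trailing + is an optional addition that merges runs of special characters into one delimiter.

```python
import re

# Hyphen and underscore are added to the negated set, so they are NOT
# treated as delimiters. The hyphen goes last so it is read literally.
# The trailing + treats consecutive special characters as one delimiter.
pattern = r'[^a-zA-Z0-9\s_-]+'

parts = re.split(pattern, 'user_name@@example-site.com')
print(parts)
# ['user_name', 'example-site', 'com']
```

Without the +, the two consecutive @ characters would produce an empty string in the result.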

You can tweak the regular expression according to your use case. The regular expression syntax section of the Python re module documentation describes what each special character does.

This article describes how to split strings by delimiters, line breaks, regular expressions, and the number of characters in Python.

  • Split by delimiter: split()
    • Specify the delimiter: sep
    • Specify the maximum number of splits: maxsplit
  • Split from right by delimiter: rsplit()
  • Split by line break: splitlines()
  • Split by regex: re.split()
    • Split by multiple different delimiters
  • Concatenate a list of strings
  • Split based on the number of characters: slice

See the following articles for more information on how to concatenate and extract strings.

  • Concatenate strings in Python (+ operator, join, etc.)
  • Extract a substring from a string in Python (position, regex)

Split by delimiter: split()

Use the split() method to split a string by a delimiter.

  • str.split() — Python 3.7.3 documentation

If the argument is omitted, the string is split on whitespace such as spaces, newlines \n, and tabs \t. Consecutive whitespace characters are treated as a single delimiter.

A list of the words is returned.

s_blank = 'one two     three\nfour\tfive'
print(s_blank)
# one two     three
# four  five

print(s_blank.split())
# ['one', 'two', 'three', 'four', 'five']

print(type(s_blank.split()))
# <class 'list'>

Use join(), described below, to concatenate a list into a string.

Specify the delimiter: sep

Pass the delimiter as the first parameter, sep.

s_comma = 'one,two,three,four,five'

print(s_comma.split(','))
# ['one', 'two', 'three', 'four', 'five']

print(s_comma.split('three'))
# ['one,two,', ',four,five']
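
One difference between omitting the argument and passing sep explicitly is easy to miss: with an explicit sep, consecutive delimiters are not merged, so empty strings can appear in the result. A small sketch (the sample strings and variable names are illustrative only):

```python
text = ' a  b '

# No argument: runs of whitespace count as one delimiter, and leading or
# trailing whitespace produces no empty strings.
default_parts = text.split()
print(default_parts)
# ['a', 'b']

# Explicit sep: every single space is a delimiter, so empty strings appear.
space_parts = text.split(' ')
print(space_parts)
# ['', 'a', '', 'b', '']

# The same applies to any other delimiter.
comma_parts = 'one,,two'.split(',')
print(comma_parts)
# ['one', '', 'two']
```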

If you want to specify multiple delimiters, use regular expressions as described later.

Specify the maximum number of splits: maxsplit

Pass the maximum number of splits as the second parameter, maxsplit.

If maxsplit is given, at most maxsplit splits are done, so the result has at most maxsplit + 1 elements.

print(s_comma.split(',', 2))
# ['one', 'two', 'three,four,five']

For example, maxsplit is useful for deleting the first line from a string.

With sep='\n' and maxsplit=1, you get a list of two strings split at the first newline character \n. The second element [1] is the string without its first line. Because it is also the last element, it can be specified as [-1].

s_lines = 'one\ntwo\nthree\nfour'
print(s_lines)
# one
# two
# three
# four

print(s_lines.split('\n', 1))
# ['one', 'two\nthree\nfour']

print(s_lines.split('\n', 1)[0])
# one

print(s_lines.split('\n', 1)[1])
# two
# three
# four

print(s_lines.split('\n', 1)[-1])
# two
# three
# four

Similarly, to delete the first two lines:

print(s_lines.split('\n', 2)[-1])
# three
# four
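
The [-1] idiom above assumes the string has more lines than the number being removed; otherwise the last element is one of the lines that should have been dropped. A hedged sketch of a helper that guards against this case (drop_first_lines is a hypothetical name, not a built-in):

```python
def drop_first_lines(text, n):
    """Return text without its first n lines, assuming '\\n' line endings."""
    parts = text.split('\n', n)
    # With fewer than n newlines there is nothing left after the cut.
    return parts[n] if len(parts) > n else ''


s_lines = 'one\ntwo\nthree\nfour'

print(drop_first_lines(s_lines, 2))
# three
# four

print(repr(drop_first_lines('one', 2)))
# ''
```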

Split from right by delimiter: rsplit()

rsplit() splits from the right of the string.

  • str.rsplit() — Python 3.7.3 documentation

The result is different from split() only when the second parameter maxsplit is given.

Just as split() can delete the first line, rsplit() can delete the last line.

print(s_lines.rsplit('\n', 1))
# ['one\ntwo\nthree', 'four']

print(s_lines.rsplit('\n', 1)[0])
# one
# two
# three

print(s_lines.rsplit('\n', 1)[1])
# four

To delete the last two lines:

print(s_lines.rsplit('\n', 2)[0])
# one
# two
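
To make the role of maxsplit concrete, the sketch below compares split() and rsplit() on the same string with and without it. The file-name-like sample string is made up for illustration.

```python
s_path = 'archive.2024.tar.gz'

# Without maxsplit, both methods return the same list.
same_without_maxsplit = s_path.split('.') == s_path.rsplit('.')
print(same_without_maxsplit)
# True

# With maxsplit=1, split() cuts at the first dot and rsplit() at the last.
from_left = s_path.split('.', 1)
from_right = s_path.rsplit('.', 1)

print(from_left)
# ['archive', '2024.tar.gz']

print(from_right)
# ['archive.2024.tar', 'gz']
```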

Split by line break: splitlines()

There is also a splitlines() for splitting by line boundaries.

  • str.splitlines() — Python 3.7.3 documentation

As in the previous examples, split() and rsplit() split on whitespace, including line breaks, by default, and you can also pass a line break explicitly with the parameter sep.

However, it is often better to use splitlines().

For example, consider a string that contains both \n (LF, used on Unix-like systems including macOS) and \r\n (CR + LF, used on Windows).

s_lines_multi = '1 one\n2 two\r\n3 three\n'
print(s_lines_multi)
# 1 one
# 2 two
# 3 three

When split() is applied with no arguments, the string is split not only at line breaks but also at spaces.

print(s_lines_multi.split())
# ['1', 'one', '2', 'two', '3', 'three']

Because sep accepts only one delimiter string, split() cannot handle a mix of newline characters. It also splits at the trailing newline, which leaves an empty string as the last element.

print(s_lines_multi.split('\n'))
# ['1 one', '2 two\r', '3 three', '']

splitlines() splits at various newline characters but not at other whitespace.

print(s_lines_multi.splitlines())
# ['1 one', '2 two', '3 three']

If the first argument, keepends, is set to True, the result includes a newline character at the end of the line.

print(s_lines_multi.splitlines(True))
# ['1 one\n', '2 two\r\n', '3 three\n']
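
Two further differences between splitlines() and split('\n') show up at the edges: a lone \r also counts as a line boundary, and an empty string gives an empty list. A short sketch (the sample string and variable names are illustrative):

```python
s_mixed = 'one\rtwo\nthree\r\nfour\n'

# \r, \n and \r\n are all treated as line boundaries,
# and the trailing newline does not add an empty string.
lines = s_mixed.splitlines()
print(lines)
# ['one', 'two', 'three', 'four']

# An empty string yields an empty list with splitlines(),
# but a list holding one empty string with split('\n').
print(''.splitlines())
# []

print(''.split('\n'))
# ['']
```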

See the following article for other operations with line breaks.

  • Handle line breaks (newlines) in Python

Split by regex: re.split()

split() and rsplit() split only where sep matches exactly.

If you want to split on a regular expression (regex) match instead of an exact match, use split() from the re module.

  • re.split() — Regular expression operations — Python 3.7.3 documentation

In re.split(), specify the regex pattern as the first parameter and the target string as the second.

The following example splits on runs of consecutive digits.

import re

s_nums = 'one1two22three333four'

print(re.split(r'\d+', s_nums))
# ['one', 'two', 'three', 'four']

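The maximum number of splits can be specified with the third parameter, maxsplit. Passing it as a keyword argument is recommended, because passing it positionally is deprecated as of Python 3.13.

print(re.split(r'\d+', s_nums, maxsplit=2))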
# ['one', 'two', 'three333four']

Split by multiple different delimiters

The following two patterns are worth remembering even if you are not familiar with regex.

Enclose characters in [] to match any single one of them. This lets you split a string on several different characters.

s_marks = 'one-two+three#four'

print(re.split('[-+#]', s_marks))
# ['one', 'two', 'three', 'four']

Patterns separated by | match if any one of them matches. Each alternative can use regex special characters, but plain strings work as well, so you can split on several different strings.

s_strs = 'oneXXXtwoYYYthreeZZZfour'

print(re.split('XXX|YYY|ZZZ', s_strs))
# ['one', 'two', 'three', 'four']
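
When the delimiter strings themselves contain regex special characters such as + or |, they need to be escaped before being joined with |. One possible way to do this is sketched below; split_on_any is a hypothetical helper name, and the approach assumes the delimiters are meant literally.

```python
import re


def split_on_any(text, delimiters):
    """Split text on any of the given literal delimiter strings."""
    # Try longer delimiters first so that '||' wins over a shorter '|'.
    ordered = sorted(delimiters, key=len, reverse=True)
    # re.escape() makes each delimiter match literally.
    pattern = '|'.join(re.escape(d) for d in ordered)
    return re.split(pattern, text)


print(split_on_any('one+two||three.four', ['+', '||', '.']))
# ['one', 'two', 'three', 'four']
```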

Concatenate a list of strings

The previous examples split a string into a list.

To concatenate a list of strings into one string, use the string method join().

Call join() on the separator string and pass the list of strings to concatenate as the argument.

l = ['one', 'two', 'three']

print(','.join(l))
# one,two,three

print('\n'.join(l))
# one
# two
# three

print(''.join(l))
# onetwothree
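
join() accepts only strings, so a list of other types has to be converted first. A brief sketch, using a made-up list of integers:

```python
numbers = [1, 2, 3]

# ','.join(numbers) would raise TypeError because the items are ints,
# so each item is converted with str() first.
joined = ','.join(str(n) for n in numbers)
print(joined)
# 1,2,3

# Splitting on the same separator gives back a list of strings, not ints.
round_trip = joined.split(',')
print(round_trip)
# ['1', '2', '3']
```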

See the following article for details of string concatenation.

  • Concatenate strings in Python (+ operator, join, etc.)

Split based on the number of characters: slice

Use slicing to split strings based on the number of characters.

  • How to slice a list, string, tuple in Python

s = 'abcdefghij'

print(s[:5])
# abcde

print(s[5:])
# fghij

The two parts can be collected in a tuple or assigned to separate variables.

  • Multiple assignment in Python: Assign multiple values or the same value to multiple variables

s_tuple = s[:5], s[5:]

print(s_tuple)
# ('abcde', 'fghij')

print(type(s_tuple))
# <class 'tuple'>

s_first, s_last = s[:5], s[5:]

print(s_first)
# abcde

print(s_last)
# fghij

Split into three:

s_first, s_second, s_last = s[:3], s[3:6], s[6:]

print(s_first)
# abc

print(s_second)
# def

print(s_last)
# ghij

The number of characters can be obtained with the built-in function len(). You can use it to split a string into halves.

half = len(s) // 2
print(half)
# 5

s_first, s_last = s[:half], s[half:]

print(s_first)
# abcde

print(s_last)
# fghij

If you want to concatenate strings, use the + operator.

print(s_first + s_last)
# abcdefghij
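
The slice examples above use fixed positions. To split a string into pieces of a given size, slicing can be combined with range(). This is a minimal sketch; split_every is a hypothetical helper name.

```python
def split_every(text, size):
    """Return consecutive slices of text, each at most size characters long."""
    if size <= 0:
        raise ValueError('size must be positive')
    # Step through the string size characters at a time.
    return [text[i:i + size] for i in range(0, len(text), size)]


print(split_every('abcdefghij', 3))
# ['abc', 'def', 'ghi', 'j']
```

The last piece is shorter when the length is not a multiple of size.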

How do you split a string by special characters in Python?

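To split the string on non-alphanumeric characters, you can use the special sequence \W, which for ASCII text is equivalent to [^a-zA-Z0-9_]. Note that the underscore counts as a word character, so \W does not split on it.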
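
A short sketch of splitting with \W, using a made-up sample string. The second pattern, [\W_]+, is one way to treat the underscore as a delimiter too.

```python
import re

s_mixed = 'one,two; three_four'

# \W+ matches runs of non-word characters; the underscore is a word character.
by_nonword = re.split(r'\W+', s_mixed)
print(by_nonword)
# ['one', 'two', 'three_four']

# Adding _ to the set splits on underscores as well.
by_nonword_or_underscore = re.split(r'[\W_]+', s_mixed)
print(by_nonword_or_underscore)
# ['one', 'two', 'three', 'four']
```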

Can you split a string by two characters in Python?

Python strings have a built-in method for splitting: split(). It takes two parameters. The first is the separator, which determines the string to split on; it may be longer than one character, so a two-character delimiter can be passed directly.

How do you separate characters numbers and special characters from given string in Python?

Determine the string's length.
Scan each character (ch) in the string individually. If ch is a digit, add it to the res1 string. ...
Print each string. There are three strings: one with the numeric characters, one with the non-numeric characters, and one with the special characters.
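
The steps above can be sketched roughly as follows. The source only names res1, so the function and variable names here are assumptions, as is the choice to treat everything that is neither a digit nor a letter (including spaces) as a special character.

```python
def separate_characters(text):
    """Return (letters, digits, specials) collected from text in order."""
    letters, digits, specials = [], [], []
    for ch in text:
        if ch.isdigit():
            digits.append(ch)
        elif ch.isalpha():
            letters.append(ch)
        else:
            # Assumption: anything else, including whitespace, is "special".
            specials.append(ch)
    return ''.join(letters), ''.join(digits), ''.join(specials)


print(separate_characters('ab12!c#3'))
# ('abc', '123', '!#')
```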

How do you split a string into 3 parts in Python?

The split() method returns a list of all the words in the string, using str as the separator (splitting on all whitespace if left unspecified) and optionally limiting the number of splits to num. With num set to 2, the result has at most three parts.