Python re split keep delimiter

The code below is a simple, efficient, and well-tested answer to this question. Its comments explain every step.

I promise it's not as scary as it looks - it's actually only 13 lines of code! The rest are all comments, docs and assertions

def split_including_delimiters(input: str, delimiter: str):
    """
    Splits an input string, while including the delimiters in the output
    
    Unlike str.split, we can use an empty string as a delimiter
    Unlike str.split, the output will not have any extra empty strings
    Consequently, len(split_including_delimiters('', delimiter)) == 0 for all delimiters,
       whereas len(input.split(delimiter)) > 0 for all inputs and non-empty delimiters
    
    INPUTS:
        input: Can be any string
        delimiter: Can be any string

    EXAMPLES:
        >>> split_including_delimiters('Hello World  ! ',' ')
        ans = ['Hello', ' ', 'World', ' ', ' ', '!', ' ']
        >>> split_including_delimiters("Hello**World**!***", "**")
        ans = ['Hello', '**', 'World', '**', '!', '**', '*']
    TESTS:
        assert split_including_delimiters('-xx-xx-','xx') == ['-', 'xx', '-', 'xx', '-'] # length 5
        assert split_including_delimiters('xx-xx-' ,'xx') == ['xx', '-', 'xx', '-']      # length 4
        assert split_including_delimiters('-xx-xx' ,'xx') == ['-', 'xx', '-', 'xx']      # length 4
        assert split_including_delimiters('xx-xx'  ,'xx') == ['xx', '-', 'xx']           # length 3
        assert split_including_delimiters('xxxx'   ,'xx') == ['xx', 'xx']                # length 2
        assert split_including_delimiters('xxx'    ,'xx') == ['xx', 'x']                 # length 2
        assert split_including_delimiters('x'      ,'xx') == ['x']                       # length 1
        assert split_including_delimiters(''       ,'xx') == []                          # length 0
        assert split_including_delimiters('aaa'    ,'xx') == ['aaa']                     # length 1
        assert split_including_delimiters('aa'     ,'xx') == ['aa']                      # length 1
        assert split_including_delimiters('a'      ,'xx') == ['a']                       # length 1
        assert split_including_delimiters(''       ,''  ) == []                          # length 0
        assert split_including_delimiters('a'      ,''  ) == ['a']                       # length 1
        assert split_including_delimiters('aa'     ,''  ) == ['a', '', 'a']              # length 3
        assert split_including_delimiters('aaa'    ,''  ) == ['a', '', 'a', '', 'a']     # length 5
    """

    # Input assertions
    assert isinstance(input,str), "input must be a string"
    assert isinstance(delimiter,str), "delimiter must be a string"

    if delimiter:
        # These tokens do not include the delimiter, but are computed quickly
        tokens = input.split(delimiter)
    else:
        # Edge case: if the delimiter is the empty string, split between the characters
        tokens = list(input)
        
    # The following assertions are always true for any string input and delimiter
    # For speed's sake, we disable this assertion
    # assert delimiter.join(tokens) == input

    output = tokens[:1]

    for token in tokens[1:]:
        output.append(delimiter)
        if token:
            output.append(token)
    
    # Don't let the first element be an empty string
    if output[:1]==['']:
        del output[0]
        
    # The only case where we should have an empty string in the output is if it is our delimiter
    # For speed's sake, we disable this assertion
    # assert delimiter=='' or '' not in output
        
    # The resulting strings should be combinable back into the original string
    # For speed's sake, we disable this assertion
    # assert ''.join(output) == input

    return output
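
For non-empty delimiters, roughly the same result can be sketched with re.split() and a capturing group. The helper name split_keep_regex is hypothetical and not part of the answer above. It assumes the delimiter should be treated as literal text, which is why it goes through re.escape().

```python
import re

def split_keep_regex(text: str, delimiter: str) -> list:
    """Split text on a literal delimiter and keep the delimiters (sketch)."""
    if not delimiter:
        # An empty pattern matches between every character, so this sketch
        # refuses it instead of imitating the empty-delimiter case.
        raise ValueError("delimiter must be a non-empty string")
    # The capturing group makes re.split() return the delimiters as well.
    pattern = "(" + re.escape(delimiter) + ")"
    # re.split() yields empty strings around adjacent matches; drop them.
    return [piece for piece in re.split(pattern, text) if piece]

print(split_keep_regex("Hello**World**!***", "**"))
# ['Hello', '**', 'World', '**', '!', '**', '*']
```

Unlike the function above, this version does not support an empty delimiter. It trades that edge case for brevity.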

Summary: To split a string and keep the delimiters/separators you can use one of the following methods:

  • Use a regex module and the split() method along with \W special character.
  • Use a regex module and the split() method along with a negative character set [^a-zA-Z0-9].
  • Use a regex module and the split() method along with the either-or metacharacter |.
  • Use a List Comprehension and append the separator.
  • Split using line break: splitlines()


Let’s dive into the problem in a step-by-step manner!

Problem: Given a string in Python, how do you split it and also keep the separators/delimiters?

A sequence of one or more characters used to separate two or more parts of a given string or a data stream is known as a delimiter or a separator.

Example: Suppose you are given the string below and need to split it so that the separators/delimiters are stored in a list along with the words.

text = 'finxter,practice@Python*1%every day'
somemethod(text)

Desired Output:

['finxter', ',', 'practice', '@', 'Python', '*', '1', '%', 'every', ' ', 'day']
Fig.: The blue boxes represent the word characters/strings, while the yellow boxes represent the delimiters/separators.

Now that we have an overview of our problem, let us dive into the solutions without any delay!

  • Using Regular Expressions (RegEx)
  • Method 1: Using ‘(\W)’
  • Method 2: Using [^] Set
  • Method 3: Using Either Or (|) Metacharacter To Specify The Delimiters
  • Method 4: Using a List Comprehension And Appending The Separator
  • Method 5: Split Using Line Break: splitlines()
  • Conclusion
  • Where to Go From Here?

Using Regular Expressions (RegEx)

A concise way of splitting the string and extracting the words along with the separators is to use regular expressions with the re.split() function.

  • re.split() is a function in Python's built-in re module that splits a string at every match of a regular expression. If the pattern contains a capturing group, the matched separators are included in the result.
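
To make the effect of the capturing group concrete, here is a small sketch. The sample string and variable names are made up for illustration.

```python
import re

text = "alpha,beta;gamma"

# Without a capturing group, the separators are discarded.
without_group = re.split(r"[,;]", text)

# With a capturing group, each separator appears between its neighbours.
with_group = re.split(r"([,;])", text)

print(without_group)  # ['alpha', 'beta', 'gamma']
print(with_group)     # ['alpha', ',', 'beta', ';', 'gamma']
```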

Let us have a look at the different regular expressions that can be used to solve our problem:

Method 1: Using ‘(\W)’

One way to split the given string and keep the delimiters is to import the re module and call re.split() with the \W special sequence inside a capturing group.

import re

text = 'finxter,practice@Python*1%every day'
print(re.split(r'(\W)', text))

Output

['finxter', ',', 'practice', '@', 'Python', '*', '1', '%', 'every', ' ', 'day']

Let us examine and discuss the expression used here:

  • () is a capturing group. Because the pattern is wrapped in one, re.split() returns the matched separators/delimiters in the list along with the words.
  • \W is a special sequence that matches any single character that is not a word character, that is, anything other than a letter, a digit, or an underscore. Here it is used to find the delimiters while splitting the string.
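
Two details of \W are easy to miss, and the following sketch illustrates them with made-up sample strings: the underscore is not treated as a delimiter, and two delimiters in a row leave an empty string between them.

```python
import re

# The underscore counts as a word character, so \W does not split on it.
underscore_parts = re.split(r"(\W)", "snake_case-name")
print(underscore_parts)  # ['snake_case', '-', 'name']

# Two delimiters in a row leave an empty string between them.
raw_parts = re.split(r"(\W)", "a, b")
print(raw_parts)  # ['a', ',', '', ' ', 'b']

# Filtering removes the empty strings while keeping every delimiter.
clean_parts = [part for part in raw_parts if part]
print(clean_parts)  # ['a', ',', ' ', 'b']
```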

Method 2: Using [^] Set

Another way of splitting the string using regex is to call re.split() with the negated character set ([^a-zA-Z0-9]).

Let us have a look at the following example to see how this works:

import re

text = 'finxter,practice@Python*1%every day'
print(re.split('([^a-zA-Z0-9])', text))

Output

['finxter', ',', 'practice', '@', 'Python', '*', '1', '%', 'every', ' ', 'day']

Let us examine the expression used here:

  • () is a capturing group that keeps the separators in the result along with the words.
  • [] matches any one character from the set listed inside the brackets.
  • [^a-zA-Z0-9] matches any character except ASCII letters (both upper and lower case) and digits. The leading ^ negates the set. Here it finds the delimiters, and the string is split into words around them.
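
The negated set and \W are not interchangeable. The sketch below, using a made-up sample string, shows where they differ: the explicit set treats the underscore and non-ASCII letters as delimiters, while \W does not.

```python
import re

# "\u00e9" is the single character 'é'.
sample = "user_name caf\u00e9"

# The explicit ASCII set splits on '_', on the space, and on 'é'.
ascii_set = re.split(r"([^a-zA-Z0-9])", sample)
print(ascii_set)  # ['user', '_', 'name', ' ', 'caf', 'é', '']

# \W only splits on the space here.
word_class = re.split(r"(\W)", sample)
print(word_class)  # ['user_name', ' ', 'café']
```

The trailing empty string in the first result appears because the last character of the sample is itself a delimiter.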

Method 3: Using Either Or (|) Metacharacter To Specify The Delimiters

Another approach to solving our problem is to call re.split() with the either-or metacharacter | to specify several delimiters at which we want to split the string. A metacharacter is a character that has a special meaning in a regular expression.

In our case, the delimiters joined with the | character are (,|@|%| |\*). The * is escaped because it is itself a metacharacter. Note that | only means 'or' outside square brackets. Inside a character set such as [,|@|%| |*] it is a literal pipe character, so such a set would also split on |.

Let us have a look at the following program to see how the either-or metacharacter works:

import re

text = 'finxter,practice@Python*1%every day'
print(re.split(r'(,|@|%| |\*)', text))

Output

['finxter', ',', 'practice', '@', 'Python', '*', '1', '%', 'every', ' ', 'day']
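
When the delimiters are only known at run time, the alternation can be built from a list. This is a minimal sketch: the helper name split_on_any is hypothetical, and it assumes a non-empty list of non-empty literal delimiters.

```python
import re

def split_on_any(text, delimiters):
    """Split text on any of the given literal delimiters and keep them (sketch)."""
    # Longer delimiters go first so that '->' wins over '-'.
    ordered = sorted(delimiters, key=len, reverse=True)
    # re.escape() makes characters such as '*' or '|' match literally.
    pattern = "(" + "|".join(re.escape(d) for d in ordered) + ")"
    return re.split(pattern, text)

print(split_on_any("finxter,practice@Python*1%every day", [",", "@", "%", " ", "*"]))
# ['finxter', ',', 'practice', '@', 'Python', '*', '1', '%', 'every', ' ', 'day']
print(split_on_any("a->b-c", ["-", "->"]))
# ['a', '->', 'b', '-', 'c']
```

Sorting by length matters because alternation tries the options from left to right and stops at the first one that matches.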

Now let us try a few methods which do not use regular expressions.

Note

Two other methods deserve a special mention. They are not exact solutions to our problem statement, but they may prove handy in other scenarios, depending on the requirement.

Let us discuss these methods:

Disclaimer: The following methods assume a single type of separator between the words.

Method 4: Using a List Comprehension And Appending The Separator

Consider a string that has a single type of separator, for example:

ip = '192.168.10.32'

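To split this string, we can use a list comprehension as a one-line solution, as given below. Note that it appends the separator after every item, including the last one, so the result ends with an extra '.':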

ip = '192.168.10.32'
print([u for x in ip.split('.') for u in (x, '.')])

Output

['192', '.', '168', '.', '10', '.', '32', '.']
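
If the trailing separator is unwanted, one possible variation is to place the separator before every part except the first. The helper name split_keep_single is hypothetical, and the sketch assumes a non-empty separator.

```python
def split_keep_single(text, sep):
    """Split on one separator and keep it as its own item, without a trailing copy."""
    result = []
    for index, part in enumerate(text.split(sep)):
        if index:  # a separator precedes every part except the first
            result.append(sep)
        result.append(part)
    return result

ip = '192.168.10.32'
print(split_keep_single(ip, '.'))
# ['192', '.', '168', '.', '10', '.', '32']
```

With this version, joining the items reproduces the original string exactly.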

Method 5: Split Using Line Break: splitlines()

If the separator is a line break, we can use the built-in str.splitlines() method, which splits a string at line boundaries.

Let us have a look at the following example to see how the splitlines() function works:

text = """1. This is the first line.
2. This is the second line.
3. This is the third line."""
# If keepends is True, each line in the result keeps its trailing line break.
print(text.splitlines(keepends=True))

Output

['1. This is the first line.\n', '2. This is the second line.\n', '3. This is the third line.']
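
The following sketch, with a made-up sample string, compares both settings of keepends and shows that splitlines() recognises more than one kind of line break.

```python
text = "first\r\nsecond\nthird"

# By default, the line breaks are dropped.
without_ends = text.splitlines()
print(without_ends)  # ['first', 'second', 'third']

# With keepends=True, each line keeps whichever line break ended it.
with_ends = text.splitlines(keepends=True)
print(with_ends)  # ['first\r\n', 'second\n', 'third']
```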

Conclusion

In this article, we discussed various methods to split a string and store the words along with the separators/delimiters. I highly recommend reading our blog tutorial if you want to master Python regular expressions.

I hope you enjoyed this article and it helps you in your Python coding journey. Please subscribe and stay tuned for more interesting articles!

Where to Go From Here?

Enough theory. Let’s get some practice!

Coders get paid six figures and more because they can solve problems more effectively using machine intelligence and automation.

To become more successful in coding, solve more real problems for real people. That’s how you polish the skills you really need in practice. After all, what’s the use of learning theory that nobody ever needs?

You build high-value coding skills by working on practical coding projects!

Do you want to stop learning with toy projects and focus on practical code projects that earn you money and solve real problems for people?

🚀 If your answer is YES!, consider becoming a Python freelance developer! It’s the best way of approaching the task of improving your Python skills—even if you are a complete beginner.

If you just want to learn about the freelancing opportunity, feel free to watch my free webinar “How to Build Your High-Income Skill Python” and learn how I grew my coding business online and how you can, too—from the comfort of your own home.



I am a professional Python Blogger and Content creator. I have published numerous articles and created courses over a period of time. Presently I am working as a full-time freelancer and I have experience in domains like Python, AWS, DevOps, and Networking.

You can contact me @:

UpWork
LinkedIn

How do you split a string without deleting delimiter in Python?

If you need the delimiters as separate items in the list, use the re.split() method. To keep the delimiter attached to each item instead:

  • Use the str.split() method to split the string into a list.
  • Use a list comprehension to iterate over the list.
  • On each iteration, add the delimiter to the item.
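
The steps above can be sketched as follows. The sample string is made up, and leaving the last item without a delimiter is a design choice of this sketch, not something the steps prescribe.

```python
text = 'one,two,three'
delimiter = ','

# Step 1: split the string into a list (the delimiters are dropped here).
items = text.split(delimiter)

# Steps 2 and 3: re-attach the delimiter to every item except the last,
# which had no delimiter after it in the original string.
with_delimiter = [item + delimiter for item in items[:-1]] + items[-1:]
print(with_delimiter)  # ['one,', 'two,', 'three']
```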

How do you split a string and keep the separator in Python?

Python String split() Method Syntax
Syntax: str.split(separator, maxsplit)
Parameters: separator, maxsplit
Returns: a list of strings after breaking the given string by the specified separator. The separator itself is not included in the result.
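
As a brief illustration with a made-up sample string, str.split() drops the separator, while the built-in str.partition() method keeps it for a single split.

```python
line = 'key=value=extra'

# str.split() removes every separator it splits on.
dropped = line.split('=')
print(dropped)  # ['key', 'value', 'extra']

# maxsplit limits the number of splits; the rest of the string stays intact.
limited = line.split('=', 1)
print(limited)  # ['key', 'value=extra']

# str.partition() splits once and returns the separator as the middle item.
kept = line.partition('=')
print(kept)  # ('key', '=', 'value=extra')
```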

How do you split a string and keep the separators?

  • Use a regex module and the split() method along with \W special character.
  • Use a regex module and the split() method along with a negative character set [^a-zA-Z0-9].
  • Use a regex module and the split() method along with the either-or metacharacter |.

What is re split () in Python?

The re.split() function splits the given string at each occurrence of a particular character or pattern. It returns the pieces of the string between the matches as a list.
