Guide: Python replace multiple punctuation
I would like to find multiple occurrences of exclamation marks, question marks and periods (such as …) and replace them, i.e. … Is this possible?
asked Jan 27, 2016 at 15:39
That is, remove any sequence of …
answered Jan 27, 2016 at 15:49
khelwood
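A minimal sketch of the capture-group approach, assuming the goal is to keep only the first mark of each run of !, ? and . (the helper name collapse_punctuation_runs is hypothetical and not taken from any answer):

```python
import re


def collapse_punctuation_runs(text: str) -> str:
    """Replace each run of '!', '?' and '.' with the first mark of the run."""
    # ([!?.]) is a capture group holding the first mark of the run;
    # [!?.]+ consumes the rest of the run. In the replacement string,
    # \1 refers back to whatever the capture group matched.
    return re.sub(r"([!?.])[!?.]+", r"\1", text)


example = collapse_punctuation_runs("Wow!!! Really?!? Ok...")
print(example)  # Wow! Really? Ok.
```

A single mark is left alone, because the pattern needs at least two marks in a row to match.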
The way this works is that the parentheses create a capture group, allowing you to access the matched text via the …
answered Jan 27, 2016 at 15:58
Zachary Sloss
All of these answers seem to be complicating things or not understanding regex very well. I recommend using special sequences to catch any and all punctuation you're trying to replace with spaces.
My answer is a simplification of Jonathan's, leveraging Python regex special sequences rather than a manual list of punctuation and spaces to catch.
Results:
Compact version:
What separates my version from Jonathan's is that symbols like hyphens, tildes, parentheses and brackets are all caught and removed, not just the given list of punctuation, and that any whitespace other than a plain space (tab, newline, etc.) is also caught and converted to a single space. Jonathan's version is good if you want to remove only a specific list of punctuation rather than all punctuation, as my solution does. If you don't want to allow even underscores in your text, you can replace the special sequence …
Special sequence explanation, from Python's documentation on regex: "The special sequences consist of …"
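A minimal sketch of the special-sequence approach, assuming the sequence in question is \W (it matches any character that is not a letter, digit or underscore, so punctuation, symbols and every kind of whitespace are all covered) and that the underscore variant uses the character class [\W_]. The function names and the final .strip() are assumptions added for illustration:

```python
import re


def punctuation_to_spaces(text: str) -> str:
    # \W+ matches a run of non-word characters: punctuation, symbols,
    # spaces, tabs, newlines. Each run becomes a single space.
    # strip() (an added assumption) drops a leading/trailing space.
    return re.sub(r"\W+", " ", text).strip()


def punctuation_and_underscores_to_spaces(text: str) -> str:
    # \w includes the underscore, so \W never matches it;
    # the class [\W_] adds the underscore explicitly.
    return re.sub(r"[\W_]+", " ", text).strip()


sample = "Hello,  world!\tfoo_bar (baz) ~done~"
print(punctuation_to_spaces(sample))                  # Hello world foo_bar baz done
print(punctuation_and_underscores_to_spaces(sample))  # Hello world foo bar baz done
```

In Python 3, \W on str patterns is Unicode-aware, so accented and non-Latin letters count as word characters and are kept.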
Many times while working with Python strings, we need to remove certain characters from them. This has applications in data preprocessing in the data science domain and also in day-to-day programming. Let's discuss several ways to perform this task in Python.
Method 1: Remove Punctuation from a String with translate
The first two arguments to str.maketrans are empty strings, and the third is a string of the punctuation characters that should be removed. Passing the resulting table to str.translate deletes those characters from the string. This is one of the best ways to strip punctuation from a string.
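A minimal sketch of this method, assuming the sample string "Gfg, is best : for ! Geeks ;" that appears in the outputs of the later methods. Because only the punctuation characters are deleted, the spaces around them stay, so the raw result is "Gfg is best  for  Geeks " with doubled and trailing spaces:

```python
import string

# Sample input (assumed; it is the string shown in the later outputs)
test_str = "Gfg, is best : for ! Geeks ;"

# str.maketrans("", "", chars): the two empty strings define no
# character-to-character mappings, and every character in the third
# argument is mapped to None, which tells translate() to delete it.
table = str.maketrans("", "", string.punctuation)
res = test_str.translate(table)

print(res)
```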
Output: Gfg is best for Geeks
Method 2: Remove Punctuation from a String with a Python loop
This is the brute-force way to perform the task. We check each character against a raw string containing the punctuation characters and then construct a new string without them.
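A minimal sketch of the brute-force loop. The hand-written raw string of punctuation below is an assumption (it is not identical to string.punctuation; for example it lacks + and =):

```python
# Hand-written raw string of punctuation characters to strip (assumed set)
punc = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''

test_str = "Gfg, is best : for ! Geeks ;"
print("The original string is : " + test_str)

# Brute force: inspect every character and keep it only if it is not in punc
kept = []
for ch in test_str:
    if ch in punc:
        continue  # skip punctuation
    kept.append(ch)
res = "".join(kept)

print("The string after punctuation filter : " + res)
```

Collecting characters in a list and joining once avoids rebuilding the string on every iteration.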
Output:
The original string is : Gfg, is best : for ! Geeks ;
The string after punctuation filter : Gfg is best for Geeks
Method 3: Remove Punctuation from a String with regex
The punctuation can also be removed using a regex: we replace every punctuation character with an empty string.
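A minimal sketch using re.sub, assuming the common pattern [^\w\s], which matches any character that is neither a word character nor whitespace. Since \w includes the underscore, this pattern leaves _ in place:

```python
import re

test_str = "Gfg, is best : for ! Geeks ;"
print("The original string is : " + test_str)

# [^\w\s] matches one character that is not a letter, digit, underscore
# or whitespace; each match is replaced by the empty string.
res = re.sub(r"[^\w\s]", "", test_str)

print("The string after punctuation filter : " + res)
```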
Output:
The original string is : Gfg, is best : for ! Geeks ;
The string after punctuation filter : Gfg is best for Geeks
Method 4: Using a for loop, a punctuation string and the not in operator
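A minimal sketch of this method, assuming the punctuation string is string.punctuation from the standard library:

```python
import string

test_str = "Gfg, is best : for ! Geeks ;"
print("The original string is : " + test_str)

# Append each character only if it is not one of the ASCII punctuation marks
res = ""
for ch in test_str:
    if ch not in string.punctuation:
        res += ch

print("The string after punctuation filter : " + res)
```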
Output:
The original string is : Gfg, is best : for ! Geeks ;
The string after punctuation filter : Gfg is best for Geeks
For a string of length n, all of the methods have the same complexity:
Time Complexity: O(n)
Auxiliary Space: O(n)
How do I get rid of punctuation in Python?
We can use the replace() method to remove punctuation from a Python string by replacing each punctuation mark with an empty string: iterate over the punctuation marks one by one and replace each of them with an empty string in the text.
How do I get rid of punctuation in pandas?
To remove punctuation with pandas, use the str.replace method of the text column (a Series). Call it with a regex that matches all punctuation characters and replace them with empty strings; since pandas 2.0 the regex parameter defaults to False, so pass regex=True. str.replace returns a new Series, which we assign back to df['text'].
How do you remove punctuation from Python using NLTK?
Use nltk.RegexpTokenizer:
    import nltk
    sentence = "Think and wonder, wonder and think."
    tokenizer = nltk.RegexpTokenizer(r"\w+")
    new_words = tokenizer.tokenize(sentence)
    print(new_words)
Does string.punctuation include space?
No. The string.punctuation values do not include Unicode symbols or whitespace characters.
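The replace() approach from the first question above can be sketched with string.punctuation as the set of marks (an assumed choice). Because string.punctuation contains no whitespace, the spaces in the text are left untouched:

```python
import string

text = "Gfg, is best : for ! Geeks ;"

# Replace each punctuation mark, one at a time, with the empty string.
# Every call scans the whole text, so this makes one pass per mark.
for mark in string.punctuation:
    text = text.replace(mark, "")

print(text)
```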