I'm pulling Twitter data via their API and one of the tweets has a special character [the right apostrophe] and I keep getting an error saying that Python can't map or character map the character. I've looked all over the Internet, but I have yet to find a solution for this issue. I just want to replace that character with either an apostrophe that Python will recognize, or an empty string [essentially removing it]. I'm using Python 3.3. Any input on how to fix this problem? It may seem simple, but I'm a newbie at Python.
Edit: Here is the function I'm using to try to filter out the unicode characters that throw errors.
@staticmethod
def UnicodeFilter[var]:
temp = var
temp = temp.replace[chr[2019], "'"]
temp = Functions.ToSQL[temp]
return temp
Also, when running the program, my error is as follows.
'charmap' codec can't encode character '\u2019' in position 59: character maps to 'undefined'
Edit: Here is a sample of my source code:
import json
import mysql.connector
import unicodedata
from MySQLCL import MySQLCL
class Functions[object]:
"""This is a class for Python functions"""
@staticmethod
def Clean[string]:
temp = str[string]
temp = temp.replace["'", ""].replace["[", ""].replace["]", ""].replace[",", ""].strip[]
return temp
@staticmethod
def ParseTweet[string]:
for x in range[0, len[string]]:
tweetid = string[x]["id_str"]
tweetcreated = string[x]["created_at"]
tweettext = string[x]["text"]
tweetsource = string[x]["source"]
truncated = string[x]["truncated"]
inreplytostatusid = string[x]["in_reply_to_status_id"]
inreplytouserid = string[x]["in_reply_to_user_id"]
inreplytoscreenname = string[x]["in_reply_to_screen_name"]
geo = string[x]["geo"]
coordinates = string[x]["coordinates"]
place = string[x]["place"]
contributors = string[x]["contributors"]
isquotestatus = string[x]["is_quote_status"]
retweetcount = string[x]["retweet_count"]
favoritecount = string[x]["favorite_count"]
favorited = string[x]["favorited"]
retweeted = string[x]["retweeted"]
possiblysensitive = string[x]["possibly_sensitive"]
language = string[x]["lang"]
print[Functions.UnicodeFilter[tweettext]]
#print["INSERT INTO tweet[ExTweetID, TweetText, Truncated, InReplyToStatusID, InReplyToUserID, InReplyToScreenName, IsQuoteStatus, RetweetCount, FavoriteCount, Favorited, Retweeted, Language, TweetDate, TweetSource, PossiblySensitive] VALUES [" + str[tweetid] + ", '" + Functions.UnicodeFilter[tweettext] + "', " + str[truncated] + ", " + Functions.CheckNull[inreplytostatusid] + ", " + Functions.CheckNull[inreplytouserid] + ", '" + Functions.CheckNull[inreplytoscreenname] + "', " + str[isquotestatus] + ", " + str[retweetcount] + ", " + str[favoritecount] + ", " + str[favorited] + ", " + str[retweeted] + ", '" + str[language] + "', '" + Functions.ToSQL[tweetcreated] + "', '" + Functions.ToSQL[tweetsource] + "', " + str[possiblysensitive] + "]"]
#MySQLCL.Set["INSERT INTO tweet[ExTweetID, TweetText, Truncated, InReplyToStatusID, InReplyToUserID, InReplyToScreenName, IsQuoteStatus, RetweetCount, FavoriteCount, Favorited, Retweeted, Language, TweetDate, TweetSource, PossiblySensitive] VALUES [" + str[tweetid] + ", '" + tweettext + "', " + str[truncated] + ", " + Functions.CheckNull[inreplytostatusid] + ", " + Functions.CheckNull[inreplytouserid] + ", '" + Functions.CheckNull[inreplytoscreenname] + "', " + str[isquotestatus] + ", " + str[retweetcount] + ", " + str[favoritecount] + ", " + str[favorited] + ", " + str[retweeted] + ", '" + language + "', '" + tweetcreated + "', '" + str[tweetsource] + "', " + str[possiblysensitive] + "]"]
@staticmethod
def ToBool[variable]:
if variable.lower[] == 'true':
return True
elif variable.lower[] == 'false':
return False
@staticmethod
def CheckNull[var]:
if var == None:
return ""
else:
return var
@staticmethod
def ToSQL[var]:
temp = var
temp = temp.replace["'", "''"]
return str[temp]
@staticmethod
def UnicodeFilter[var]:
temp = var
#temp = temp.replace[chr[2019], "'"]
unicodestr = unicode[temp, 'utf-8']
if unicodestr != temp:
temp = "'"
temp = Functions.ToSQL[temp]
return temp
ekhumoro's response was correct.
Replacing two characters
I timed all the methods in the current answers along with one extra.
Nội dung chính
- Replacing two characters
- Replacing 17 characters
- Can we replace multiple characters in Python?
- How do you replace multiple characters in a column in Python?
- How do you replace multiple elements in a list in Python?
- How do you replace special characters in Python?
With an input string of abc&def#ghi
and replacing & -> \& and # -> \#, the fastest way was to chain together the replacements like this: text.replace['&', '\&'].replace['#', '\#']
.
Timings for each function:
- a] 1000000 loops, best of 3: 1.47 μs per loop
- b] 1000000 loops, best of 3: 1.51 μs per loop
- c] 100000 loops, best of 3: 12.3 μs per loop
- d] 100000 loops, best of 3: 12 μs per loop
- e] 100000 loops, best of 3: 3.27 μs per loop
- f] 1000000 loops, best of 3: 0.817 μs per loop
- g] 100000 loops, best of 3: 3.64 μs per loop
- h] 1000000 loops, best of 3: 0.927 μs per loop
- i] 1000000 loops, best of 3: 0.814 μs per loop
Here are the functions:
def a[text]:
chars = ""
for c in chars:
text = text.replace[c, "\\" + c]
def b[text]:
for ch in ['&','#']:
if ch in text:
text = text.replace[ch,"\\"+ch]
import re
def c[text]:
rx = re.compile['[[]]']
text = rx.sub[r'\\\1', text]
RX = re.compile['[[]]']
def d[text]:
text = RX.sub[r'\\\1', text]
def mk_esc[esc_chars]:
return lambda s: ''.join[['\\' + c if c in esc_chars else c for c in s]]
esc = mk_esc['']
def e[text]:
esc[text]
def f[text]:
text = text.replace['&', '\&'].replace['#', '\#']
def g[text]:
replacements = {"&": "\&", "#": "\#"}
text = "".join[[replacements.get[c, c] for c in text]]
def h[text]:
text = text.replace['&', r'\&']
text = text.replace['#', r'\#']
def i[text]:
text = text.replace['&', r'\&'].replace['#', r'\#']
Timed like this:
python -mtimeit -s"import time_functions" "time_functions.a['abc&def#ghi']"
python -mtimeit -s"import time_functions" "time_functions.b['abc&def#ghi']"
python -mtimeit -s"import time_functions" "time_functions.c['abc&def#ghi']"
python -mtimeit -s"import time_functions" "time_functions.d['abc&def#ghi']"
python -mtimeit -s"import time_functions" "time_functions.e['abc&def#ghi']"
python -mtimeit -s"import time_functions" "time_functions.f['abc&def#ghi']"
python -mtimeit -s"import time_functions" "time_functions.g['abc&def#ghi']"
python -mtimeit -s"import time_functions" "time_functions.h['abc&def#ghi']"
python -mtimeit -s"import time_functions" "time_functions.i['abc&def#ghi']"
Replacing 17 characters
Here's similar code to do the same but with more characters to escape [\`*_{}>#+-.!$]:
def a[text]:
chars = "\\`*_{}[][]>#+-.!$"
for c in chars:
text = text.replace[c, "\\" + c]
def b[text]:
for ch in ['\\','`','*','_','{','}','[',']','[',']','>','#','+','-','.','!','$','\'']:
if ch in text:
text = text.replace[ch,"\\"+ch]
import re
def c[text]:
rx = re.compile['[[]]']
text = rx.sub[r'\\\1', text]
RX = re.compile['[[\\`*_{}[][]>#+-.!$]]']
def d[text]:
text = RX.sub[r'\\\1', text]
def mk_esc[esc_chars]:
return lambda s: ''.join[['\\' + c if c in esc_chars else c for c in s]]
esc = mk_esc['\\`*_{}[][]>#+-.!$']
def e[text]:
esc[text]
def f[text]:
text = text.replace['\\', '\\\\'].replace['`', '\`'].replace['*', '\*'].replace['_', '\_'].replace['{', '\{'].replace['}', '\}'].replace['[', '\['].replace[']', '\]'].replace['[', '\['].replace[']', '\]'].replace['>', '\>'].replace['#', '\#'].replace['+', '\+'].replace['-', '\-'].replace['.', '\.'].replace['!', '\!'].replace['$', '\$']
def g[text]:
replacements = {
"\\": "\\\\",
"`": "\`",
"*": "\*",
"_": "\_",
"{": "\{",
"}": "\}",
"[": "\[",
"]": "\]",
"[": "\[",
"]": "\]",
">": "\>",
"#": "\#",
"+": "\+",
"-": "\-",
".": "\.",
"!": "\!",
"$": "\$",
}
text = "".join[[replacements.get[c, c] for c in text]]
def h[text]:
text = text.replace['\\', r'\\']
text = text.replace['`', r'\`']
text = text.replace['*', r'\*']
text = text.replace['_', r'\_']
text = text.replace['{', r'\{']
text = text.replace['}', r'\}']
text = text.replace['[', r'\[']
text = text.replace[']', r'\]']
text = text.replace['[', r'\[']
text = text.replace[']', r'\]']
text = text.replace['>', r'\>']
text = text.replace['#', r'\#']
text = text.replace['+', r'\+']
text = text.replace['-', r'\-']
text = text.replace['.', r'\.']
text = text.replace['!', r'\!']
text = text.replace['$', r'\$']
def i[text]:
text = text.replace['\\', r'\\'].replace['`', r'\`'].replace['*', r'\*'].replace['_', r'\_'].replace['{', r'\{'].replace['}', r'\}'].replace['[', r'\['].replace[']', r'\]'].replace['[', r'\['].replace[']', r'\]'].replace['>', r'\>'].replace['#', r'\#'].replace['+', r'\+'].replace['-', r'\-'].replace['.', r'\.'].replace['!', r'\!'].replace['$', r'\$']
Here's the results for the same input string abc&def#ghi
:
- a] 100000 loops, best of 3: 6.72 μs per loop
- b] 100000 loops, best of 3: 2.64 μs per loop
- c] 100000 loops, best of 3: 11.9 μs per loop
- d] 100000 loops, best of 3: 4.92 μs per loop
- e] 100000 loops, best of 3: 2.96 μs per loop
- f] 100000 loops, best of 3: 4.29 μs per loop
- g] 100000 loops, best of 3: 4.68 μs per loop
- h] 100000 loops, best of 3: 4.73 μs per loop
- i] 100000 loops, best of 3: 4.24 μs per loop
And with a longer input string [## *Something* and [another] thing in a longer sentence with {more} things to replace$
]:
- a] 100000 loops, best of 3: 7.59 μs per loop
- b] 100000 loops, best of 3: 6.54 μs per loop
- c] 100000 loops, best of 3: 16.9 μs per loop
- d] 100000 loops, best of 3: 7.29 μs per loop
- e] 100000 loops, best of 3: 12.2 μs per loop
- f] 100000 loops, best of 3: 5.38 μs per loop
- g] 10000 loops, best of 3: 21.7 μs per loop
- h] 100000 loops, best of 3: 5.7 μs per loop
- i] 100000 loops, best of 3: 5.13 μs per loop
Adding a couple of variants:
def ab[text]:
for ch in ['\\','`','*','_','{','}','[',']','[',']','>','#','+','-','.','!','$','\'']:
text = text.replace[ch,"\\"+ch]
def ba[text]:
chars = "\\`*_{}[][]>#+-.!$"
for c in chars:
if c in text:
text = text.replace[c, "\\" + c]
With the shorter input:
- ab] 100000 loops, best of 3: 7.05 μs per loop
- ba] 100000 loops, best of 3: 2.4 μs per loop
With the longer input:
- ab] 100000 loops, best of 3: 7.71 μs per loop
- ba] 100000 loops, best of 3: 6.08 μs per loop
So I'm going to use ba
for readability and speed.
Addendum
Prompted by haccks in the comments, one difference between ab
and ba
is the if c in text:
check. Let's test them against two more variants:
def ab_with_check[text]:
for ch in ['\\','`','*','_','{','}','[',']','[',']','>','#','+','-','.','!','$','\'']:
if ch in text:
text = text.replace[ch,"\\"+ch]
def ba_without_check[text]:
chars = "\\`*_{}[][]>#+-.!$"
for c in chars:
text = text.replace[c, "\\" + c]
Times in μs per loop on Python 2.7.14 and 3.6.3, and on a different machine from the earlier set, so cannot be compared directly.
╭────────────╥──────┬───────────────┬──────┬──────────────────╮
│ Py, input ║ ab │ ab_with_check │ ba │ ba_without_check │
╞════════════╬══════╪═══════════════╪══════╪══════════════════╡
│ Py2, short ║ 8.81 │ 4.22 │ 3.45 │ 8.01 │
│ Py3, short ║ 5.54 │ 1.34 │ 1.46 │ 5.34 │
├────────────╫──────┼───────────────┼──────┼──────────────────┤
│ Py2, long ║ 9.3 │ 7.15 │ 6.85 │ 8.55 │
│ Py3, long ║ 7.43 │ 4.38 │ 4.41 │ 7.02 │
└────────────╨──────┴───────────────┴──────┴──────────────────┘
We can conclude that:
Those with the check are up to 4x faster than those without the check
ab_with_check
is slightly in the lead on Python 3, butba
[with check] has a greater lead on Python 2However, the biggest lesson here is Python 3 is up to 3x faster than Python 2! There's not a huge difference between the slowest on Python 3 and fastest on Python 2!
Can we replace multiple characters in Python?
A character in Python is also a string. So, we can use the replace[] method to replace multiple characters in a string. It replaced all the occurrences of, Character 's' with 'X'.
How do you replace multiple characters in a column in Python?
Pandas replace multiple values in column replace. By using DataFrame. replace[] method we will replace multiple values with multiple new strings or text for an individual DataFrame column. This method searches the entire Pandas DataFrame and replaces every specified value.
How do you replace multiple elements in a list in Python?
Replace Multiple Values in a Python List. There may be many times when you want to replace not just a single item, but multiple items. This can be done quite simply using the for loop method shown earlier.
How do you replace special characters in Python?
Using 'str. replace[] , we can replace a specific character. If we want to remove that specific character, replace that character with an empty string. The str. replace[] method will replace all occurrences of the specific character mentioned.