Python requests post utf 8 encoding

I'm trying to post a snippet of text containing fancy unicode symbols to a web service using the requests library. I'm using Python 3.5.

text = "Två dagar kvar🎉🎉"
r = requests.post["//json-tagger.herokuapp.com/tag", data=text]
print[r.json[]

I get an UnicodeEncodeError, but I can't figure out what I'm doing wrong on my side, the docs for requests only talk about unicode in GET requests from what I see.

    UnicodeEncodeError                        Traceback [most recent call last]
 in []
     19         print["cleaned : " + line]
     20 
---> 21         r = requests.post["//json-tagger.herokuapp.com/tag", data=line]
     22         sentences = r.json[]['sentences']
     23         for sentence in sentences:

//anaconda/lib/python3.4/site-packages/requests/api.py in post[url, data, json, **kwargs]
    105     """
    106 
--> 107     return request['post', url, data=data, json=json, **kwargs]
    108 
    109 

//anaconda/lib/python3.4/site-packages/requests/api.py in request[method, url, **kwargs]
     51     # cases, and look like a memory leak in others.
     52     with sessions.Session[] as session:
---> 53         return session.request[method=method, url=url, **kwargs]
     54 
     55 

//anaconda/lib/python3.4/site-packages/requests/sessions.py in request[self, method, url, params, data, headers, cookies, files, auth,     timeout, allow_redirects, proxies, hooks, stream, verify, cert, json]
    466         }
    467         send_kwargs.update[settings]
--> 468         resp = self.send[prep, **send_kwargs]
    469 
    470         return resp

//anaconda/lib/python3.4/site-packages/requests/sessions.py in send[self, request, **kwargs]
    574 
    575         # Send the request
--> 576         r = adapter.send[request, **kwargs]
    577 
    578         # Total elapsed time of the request [approximately]

//anaconda/lib/python3.4/site-packages/requests/adapters.py in send[self, request, stream, timeout, verify, cert, proxies]
    374                     decode_content=False,
    375                     retries=self.max_retries,
--> 376                     timeout=timeout
    377                 ]
    378 

//anaconda/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py in urlopen[self, method, url, body, headers, retries,     redirect, assert_same_host, timeout, pool_timeout, release_conn, **response_kw]
    557             httplib_response = self._make_request[conn, method, url,
    558                                                   timeout=timeout_obj,
--> 559                                                   body=body, headers=headers]
    560 
    561             # If we're going to release the connection in ``finally:``, then

//anaconda/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py in _make_request[self, conn, method, url, timeout,     **httplib_request_kw]
    351         # conn.request[] calls httplib.*.request, not the method in
    352         # urllib3.request. It also calls makefile [recv] on the socket.
--> 353         conn.request[method, url, **httplib_request_kw]
    354 
    355         # Reset the timeout for the recv[] on the socket

//anaconda/lib/python3.4/http/client.py in request[self, method, url, body, headers]
   1086     def request[self, method, url, body=None, headers={}]:
   1087         """Send a complete request to the server."""
-> 1088         self._send_request[method, url, body, headers]
   1089 
   1090     def _set_content_length[self, body]:

//anaconda/lib/python3.4/http/client.py in _send_request[self, method, url, body, headers]
   1123             # RFC 2616 Section 3.7.1 says that text default has a
   1124             # default charset of iso-8859-1.
-> 1125             body = body.encode['iso-8859-1']
   1126         self.endheaders[body]
   1127 

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 14-15: ordinal not in range[256]

WORKAROUND: I remove all unicode characters from the text from the "emoticon" block, U+1F600 - U+1F64F and Symbols And Pictographs" block, U+1F300 - U+1F5FF according to this answer with the following code, since I don't need emoticons and pictures for the analysis:

text = re.sub[r'[^\u1F600-\u1F64F ]|[^\u1F300-\u1F5FF ]',"",text]

UPDATE The creator of the web service has fixed this now and updated the documentation. All you have to do is to send an encoded string, in Python 3:

""Två dagar kvar🎉🎉".encode["utf-8"]

Apr-25-2019, 05:25 PM [This post was last modified: Apr-25-2019, 06:00 PM by MaverinCode.]

I'm trying to send a unicode string, which contains chinese characters, from a python program to a web server.

The code for this is:

import json
import requests

    #get json
    data = json.dumps[{"Input": {"Row": [{"BusinessCard": "那是一句话"}]}}, ensure_ascii=False]
    #encode json to utf-8
    encoded_data = data.encode['utf-8']

    r = requests.post["//some-server.com/service", data=encoded_data,
                      headers={'Content-Type': 'application/json; charset=UTF-8'},
                      auth=requests.auth.HTTPBasicAuth["user", "password"]]

The encoded_data variable:

b'{"Input": {"Row": [{"BusinessCard": "\xe9\x82\xa3\xe6\x98\xaf\xe4\xb8\x80\xe5\x8f\xa5\xe8\xaf\x9d"}]}}'

My problem is, if send this to the server, the server doesn't recognize the utf-8 encoded data as chinese characters. The server just sees '"\xe9\x82\xa3...' and can't handle them.

But I'm forced to encode the data to utf-8 or the post method will give me the this unicode error:

Output:

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 37-41: Body ['那是一句话'] is not valid Latin-1. Use body.encode['utf-8'] if you want to send it encoded in UTF-8.

If I send the request from Postman, it works fine. The postman request is similiar, there is only the header for basic authentication and the content type header.

Also, if I send from the python program a test request to a simple spring boot server with a rest controller, the controller/spring recognizes the encoded values right as chinese characters.

How can I send the json data in the body like Postman does, so that the server understands the chinese characters?

I searched a few hours but couldn't find any answer to a similar case like this.

I would be really thankful for some help.

Greetings
MaverinCode

Ok, got some help from Stack Overflow.

I set the ensure_ascii parameter to 'True' and it works.

But I would be thankful for some further explanation if someone knows more.

The documentation states: "If ensure_ascii is true [the default], the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is."

Why have servers no trouble to encode escaped unicode signs which have been encoded to utf-8 but trouble with none escape unicode signs which have been also been encode to utf-8?

JaiM
Unladen Swallow

Posts: 1

Threads: 0

Joined: Nov 2020

Reputation: 0

  Nov-08-2020, 06:45 AM

Very Useful when we try to send data with Chinese Characters via REST API. Thank you for your post.

Chủ Đề