Guide: character encoding in Python

How do I print UTF-8 encoded text to the console in Python?

ASCII only defines character codes up to 127. So you need some way to say "the next few characters mean something special", which is what the sequence "\x" does. It says: the next two hex digits are the code of a single character. "\u" does the same using four hex digits to encode Unicode up to 0xFFFF (65535).
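To make the escape notation concrete, here is a minimal Python 3 sketch. The word "Capit\xe1n" is only an illustrative sample, not something fixed by the explanation above.

```python
# "\x" takes two hex digits and "\u" takes four; both name a single code point.
by_x = "Capit\xe1n"
by_u = "Capit\u00e1n"
by_chr = "Capit" + chr(0xE1) + "n"

# All three spellings produce the same 7-character string.
same_text = by_x == by_u == by_chr
length = len(by_x)

# In a raw string the backslash is not special, so "\xe1" stays
# four separate characters: backslash, x, e, 1.
raw = r"Capit\xe1n"
raw_length = len(raw)

print(same_text, length, raw_length)
```

The raw string is what you get when the escape is read as plain text, for example from a file typed in an editor: the backslash is then just a backslash.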

So you can't directly write Unicode to ASCII (because ASCII simply doesn't contain the same characters). You can write it as string escapes (as in f2); in this case, the file can be represented as ASCII. Or you can write it as UTF-8, in which case you need an 8-bit safe stream.
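The two options can be sketched roughly like this in Python 3. The sample text is an assumption, and `backslashreplace` is just one way to produce escapes; it is not necessarily how f2 was written.

```python
text = "Capit\u00e1n"

# Option 1: string escapes. Every byte stays below 128, so the result
# is plain ASCII that any 7-bit channel can carry.
escaped = text.encode("ascii", errors="backslashreplace")

# Option 2: UTF-8. The non-ASCII character becomes two bytes above 127,
# so the stream has to be 8-bit safe.
utf8 = text.encode("utf-8")

# Encoding straight to ASCII without escapes fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# The escaped form can be turned back into the original text.
restored = escaped.decode("unicode_escape")

print(escaped, utf8, ascii_ok, restored == text)
```

The escaped form is longer (10 bytes against 8 here) but survives anywhere ASCII does; the UTF-8 form is compact but only readable if the reader knows it is UTF-8.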

Your solution using decode('string-escape') does work, but you must be aware of how much memory you use: three times the amount needed when using codecs.open().
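As a rough illustration of the alternative, the sketch below lets the file object do the decoding. It assumes Python 3 and uses the built-in open() with an encoding argument, which there plays the role that codecs.open() plays in the answer; the temporary file and sample text are hypothetical.

```python
import os
import tempfile

text = "Capit\u00e1n\n"

# Hypothetical scratch file, used only for this illustration.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

try:
    # The file object encodes on write ...
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)

    # ... so the bytes on disk are UTF-8 ...
    with open(path, "rb") as f:
        raw = f.read()

    # ... and decodes on read, line by line, without a separate
    # decode step over the whole file contents.
    with open(path, "r", encoding="utf-8", newline="") as f:
        lines = [line for line in f]
finally:
    os.remove(path)

print(raw, lines)
```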

Remember that a file is just a sequence of 8-bit bytes. Neither the bits nor the bytes have a meaning. It's you who says "65 means 'A'". Since \xc3\xa1 should become "á" but the computer has no means of knowing that, you must tell it by specifying the encoding that was used when writing the file.
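A small Python 3 sketch of the same point: the two bytes stay the same and only the encoding you name decides what they mean. Latin-1 is picked here purely as a contrasting example.

```python
data = b"\xc3\xa1"

# To the computer these are just two numbers.
as_numbers = list(data)

# Read as UTF-8, the pair is one character (U+00E1, a with acute accent).
as_utf8 = data.decode("utf-8")

# Read as Latin-1, each byte is its own character, which gives
# the familiar two-character mojibake.
as_latin1 = data.decode("latin-1")

# "65 means 'A'" only because the ASCII table says so.
letter = bytes([65]).decode("ascii")

print(as_numbers, len(as_utf8), len(as_latin1), letter)
```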
