Convert CSV file to UTF-8 in Python

I am trying to create a duplicate CSV without a header. When I attempt this I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 1895: invalid start byte.

I've read the python CSV documentation on Unicode and UTF-8 encoding and have implemented it. However, my output file is being generated with no data in it. Not sure what I am doing wrong here.

import csv

path =  '/Users/johndoe/file.csv'

with open(path, 'r') as infile, open(path + 'final.csv', 'w') as outfile:

    def unicode_csv(infile, outfile):
        inputs = csv.reader(utf_8_encoder(infile))
        output = csv.writer(outfile)

        for index, row in enumerate(inputs):
            yield [unicode(cell, 'utf-8') for cell in row]
            if index == 0:
                 continue
        output.writerow(row)

    def utf_8_encoder(infile):
        for line in infile:
            yield line.encode('utf-8')

unicode_csv(infile, outfile)

asked Sep 4, 2015 at 17:04

user3062459

The solution was to pass two additional parameters to the open() call for the input file:

with open(path, 'r') as infile:

The two parameters are encoding='utf-8' and errors='ignore'. This allowed me to create a duplicate of the original CSV without the header and without the UnicodeDecodeError. Note that errors='ignore' silently drops any bytes that cannot be decoded as UTF-8. Below is the completed code.

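import csv

path = '/Users/johndoe/file.csv'

# errors='ignore' drops bytes that are not valid UTF-8 instead of raising.
# newline='' is what the csv module documentation recommends for file objects.
with open(path, 'r', encoding='utf-8', errors='ignore', newline='') as infile, \
     open(path + 'final.csv', 'w', encoding='utf-8', newline='') as outfile:
    inputs = csv.reader(infile)
    output = csv.writer(outfile)

    for index, row in enumerate(inputs):
        # Skip the header row so the copy has no header
        if index == 0:
            continue
        output.writerow(row)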
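As an alternative to dropping bytes with errors='ignore', the sketch below tries a second encoding when UTF-8 decoding fails. It is an illustration only: the helper name decode_with_fallback and the choice of cp1252 as the fallback are assumptions, not something stated in the question. Byte 0xa0 from the error message is a no-break space in cp1252, which is why that encoding is a plausible guess.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Return raw decoded with the first encoding in the list that succeeds."""
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Last resort: keep the text, marking undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


# Hypothetical sample: 0xe9, 0xe3 and 0xa0 are invalid as UTF-8 here,
# but they are e-acute, a-tilde and a no-break space in cp1252.
sample = b"name,city\nJos\xe9,S\xe3o\xa0Paulo\n"
text = decode_with_fallback(sample)
print(repr(text))
```

Unlike errors='ignore', this keeps every character, at the cost of having to guess the fallback encoding correctly.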

answered Sep 5, 2015 at 2:08

user3062459

Since the line

unicode_csv(infile,outfile)

isn't indented, it is outside the scope of the with statement, so by the time it is called, infile and outfile are both already closed.

The files should be open when they are used, not when the functions are defined, so use:

with open(path, 'r') as infile, open(path + 'final.csv', 'w') as outfile:
    unicode_csv(infile,outfile)
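A possibly related detail, not mentioned in this answer: the question's unicode_csv contains a yield, which makes it a generator function, so calling it only creates a generator object and runs none of the body until something iterates over it. The sketch below shows that behaviour with hypothetical names (skip_header, traced_rows) that are not from the question.

```python
def skip_header(rows):
    """Generator: yields every row except the first one."""
    for index, row in enumerate(rows):
        if index == 0:
            continue
        yield row


calls = []


def traced_rows():
    """Hypothetical row source that records when its body starts running."""
    calls.append("started")
    yield ["id", "name"]
    yield ["1", "Ann"]


gen = skip_header(traced_rows())    # creates generator objects only
ran_before_iteration = list(calls)  # still empty: no body has executed
rows = list(gen)                    # iterating is what drives the generators
print(ran_before_iteration, rows, calls)
```

If this applies to the question's code, a plain function that calls output.writerow(row) inside the loop, with no yield, would avoid the problem.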

answered Sep 4, 2015 at 19:39

James K

If you are able to use pandas, and you know the exact encoding of your file, you could try this:

import pandas as pd

path =  '/Users/johndoe/file.csv'

df = pd.read_csv(path, encoding='ISO-8859-1')
df.to_csv(path, encoding='utf-8', index=False)

answered May 18, 2020 at 10:37

This article concerns the conversion and handling of CSV file formats in combination with the UTF-8 encoding standard.

💡 The Unicode Transformation Format 8-bit (UTF-8) is a variable-width character encoding used for electronic communication. UTF-8 can encode all 1,112,064 valid Unicode code points (including some more or less weird characters) using one to four bytes each. Example characters: ☈,☇,★,☃,☄,☍
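To make the "variable-width" part concrete, here is a small sketch that encodes four sample characters and counts the resulting bytes. The sample characters are chosen for illustration only.

```python
# One character from each UTF-8 length class (escapes avoid editor issues):
samples = [
    "A",           # U+0041, ASCII letter
    "\u00e9",      # U+00E9, e with acute accent
    "\u2603",      # U+2603, snowman
    "\U0001F600",  # U+1F600, grinning face emoji
]

byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in byte_lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s): {ch.encode('utf-8').hex(' ')}")
```

ASCII characters keep their single-byte representation, which is why a pure ASCII file is already a valid UTF-8 file.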

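UTF-8 is the default text encoding on most Linux and macOS systems. Windows often still uses a legacy code page such as cp1252 as its locale encoding.

If you write a CSV file using Python’s standard file handling operations such as open() and file.write() without an encoding argument, Python uses the platform’s locale encoding (unless UTF-8 mode is enabled), which is not UTF-8 on every system. Pass encoding='utf-8' explicitly to be sure you create a UTF-8 file.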
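A quick way to see what open() would use on a given machine, and to sidestep the question by naming the encoding explicitly, might look like the following sketch. The file name and temporary directory are placeholders.

```python
import locale
import os
import tempfile

# The encoding open() falls back to when no encoding argument is given.
default_encoding = locale.getpreferredencoding(False)
print("open() default on this machine:", default_encoding)

# Writing with an explicit encoding gives the same bytes on every platform.
path = os.path.join(tempfile.mkdtemp(), "explicit_utf8.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("name\nJos\u00e9\n")

# Read the file back in binary mode to inspect the bytes actually written.
with open(path, "rb") as f:
    raw = f.read()
print(raw)
```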

So if you came to this website searching for “CSV to UTF-8”, my guess is that you have a CSV file in a different encoding, such as ANSI (a Windows code page) or UTF-16, that contains some “weird” characters.

Say you want to read this ANSI file:

[Image: an ANSI-encoded CSV file]

Now, you can convert this to a UTF-8 CSV file using one of the following approaches:

  • CSV to UTF-8 Conversion in Python
  • CSV Reader/Writer – CSV to UTF-8 Conversion
  • Pandas – CSV to UTF-8 Conversion
  • ANSI to UTF-8

CSV to UTF-8 Conversion in Python

The no-library approach to converting a CSV file to a UTF-8 CSV file is to open the first file with its original, non-UTF-8 encoding and write its contents straight back to a second file opened as UTF-8. Use the open() function’s encoding argument to set the encoding of the file being read and of the file being written.

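# 'cp1252' is assumed here as the "ANSI" code page; Python's 'ansi' codec
# alias exists only on Windows. errors='ignore' drops undecodable bytes.
with open('my_file.csv', 'r', encoding='cp1252', errors='ignore') as infile:
    with open('my_file_utf8.csv', 'w', encoding='utf-8') as outfile:
        outfile.write(infile.read())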
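For larger files, the same idea can be wrapped in a function that copies in chunks instead of reading everything into memory. This is a minimal sketch under stated assumptions: the function name convert_to_utf8 is hypothetical, cp1252 is assumed as the source encoding, and the demo file is created in a temporary directory.

```python
import os
import shutil
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Re-encode a text file as UTF-8, copying it chunk by chunk.

    Decoding is strict, so a wrong src_encoding raises UnicodeDecodeError
    instead of silently losing characters. newline="" leaves the original
    line endings untouched.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as infile, \
         open(dst_path, "w", encoding="utf-8", newline="") as outfile:
        shutil.copyfileobj(infile, outfile)


# Demo with a small cp1252 file written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "my_file.csv")
dst = os.path.join(tmp_dir, "my_file_utf8.csv")
with open(src, "wb") as f:
    f.write("name,city\r\nJos\u00e9,K\u00f6ln\r\n".encode("cp1252"))

convert_to_utf8(src, dst)

with open(dst, "rb") as f:
    converted = f.read()
print(converted)
```

Leaving errors at its default of 'strict' is a deliberate choice in this sketch: a failed conversion is usually easier to notice than missing characters.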

After conversion from ANSI to UTF-8 using the given approach, the new CSV file is UTF-8 encoded.

CSV Reader/Writer – CSV to UTF-8 Conversion

You don’t need a CSV reader to convert a CSV to UTF-8, as shown in the previous example. However, if you wish to use one, make sure to pass the encoding argument when opening the file object that you hand to csv.reader().

import csv


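# 'cp1252' is assumed as the "ANSI" code page ('ansi' is a Windows-only alias).
with open('my_file.csv', 'r', encoding='cp1252', errors='ignore', newline='') as infile:
    with open('my_file_utf8.csv', 'w', encoding='utf-8', newline='') as outfile: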
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        for row in reader:
            print(row)
            writer.writerow(row)

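The newline='' argument stops Python from translating line endings. Without it, the \r\n that csv.writer emits by default is written as \r\r\n on Windows, which shows up as an extra blank line after each row.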
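The effect can be reproduced on any platform with the sketch below. It assumes that passing newline='\r\n' is a fair stand-in for what the default newline handling does on Windows; the helper name write_rows and the file names are placeholders.

```python
import csv
import os
import tempfile


def write_rows(path, newline):
    """Write two rows with csv.writer and return the raw bytes on disk."""
    with open(path, "w", encoding="utf-8", newline=newline) as f:
        csv.writer(f).writerows([["a", "b"], ["1", "2"]])
    with open(path, "rb") as f:
        return f.read()


tmp_dir = tempfile.mkdtemp()

# newline="" : the writer's own "\r\n" row terminator is written unchanged.
good = write_rows(os.path.join(tmp_dir, "good.csv"), newline="")

# newline="\r\n" : every "\n" is expanded to "\r\n", as on Windows defaults,
# so the writer's "\r\n" ends up as "\r\r\n".
bad = write_rows(os.path.join(tmp_dir, "bad.csv"), newline="\r\n")

print(good)
print(bad)
```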

The output is the same UTF-8 encoded CSV.

Pandas – CSV to UTF-8 Conversion

You can use the pandas.read_csv() function and the DataFrame.to_csv() method to read and write a CSV file using various encodings (e.g., UTF-8, ASCII, cp1252, ISO-8859-1), as specified by the encoding argument of each.

Here’s an example:

import pandas as pd


df = pd.read_csv('my_file.csv', encoding='cp1252')  # assumed "ANSI" code page
df.to_csv('my_file_utf8.csv', encoding='utf-8', index=False)

ANSI to UTF-8

The no-library approach to converting an ANSI-encoded CSV file to a UTF-8-encoded CSV file is to open the first file with the ANSI encoding and write its contents back to a file opened as UTF-8. Use the open() function’s encoding argument to set the encoding of the file being read and of the file being written.

Here’s an example:

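# 'cp1252' is assumed here as the "ANSI" code page; Python's 'ansi' codec
# alias exists only on Windows. errors='ignore' drops undecodable bytes.
with open('my_file.csv', 'r', encoding='cp1252', errors='ignore') as infile:
    with open('my_file_utf8.csv', 'w', encoding='utf-8') as outfile:
        outfile.write(infile.read())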

This converts the ANSI file to a UTF-8 file.


While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.

To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He’s author of the popular programming book Python One-Liners (NoStarch 2020), coauthor of the Coffee Break Python series of self-published books, computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.

His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills.

How do I change a CSV file to UTF

UTF-8 Encoding in Microsoft Excel (Wind):

How do I change the encoding of a CSV file in Python?

CSV to UTF-8 Conversion in Python:
# 'cp1252' is assumed as the "ANSI" code page ('ansi' is a Windows-only alias).
with open('my_file.csv', 'r', encoding='cp1252', errors='ignore') as infile:
    with open('my_file_utf8.csv', 'w', encoding='utf-8') as outfile:
        outfile.write(infile.read())

How do I convert data to UTF

How to Convert a String to UTF-8 in Python:
string1 = "apple"
string2 = "Preeti125"
string3 = "12345"
string4 = "pre@12"
string.encode(encoding='UTF-8', errors='strict')
# unicode string
string = 'pythön!'
# default encoding to utf-8
string_utf = string.encode()
print('The encoded version is:', string_utf)
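The errors parameter of str.encode() decides what happens when a character cannot be represented in the target encoding. The following sketch contrasts the three most common settings; the variable names are illustrative.

```python
text = "pyth\u00f6n!"  # 'python!' with o-umlaut (U+00F6)

# UTF-8 can represent every character, so no error handling is involved.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# ASCII cannot represent U+00F6, so the errors argument matters.
ascii_replace = text.encode("ascii", errors="replace")  # substitutes '?'
ascii_ignore = text.encode("ascii", errors="ignore")    # drops the character

try:
    text.encode("ascii")  # errors='strict' is the default
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(utf8_bytes, ascii_replace, ascii_ignore, strict_failed)
```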

How do I check the encoding of a CSV file in Python?

In a text editor such as Notepad++, the detected encoding of the open file is displayed on the bottom bar, at the far right. The supported encodings can be seen by going to Settings -> Preferences -> New Document/Default Directory and looking in the drop-down.
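Within Python itself, the standard library has no function that reliably detects a file's encoding, but a rough guess is possible by checking for a byte order mark and then trying a short list of candidates. The sketch below is a heuristic only: the function name guess_encoding and the candidate list are assumptions, and a successful decode does not prove the guess is right.

```python
import codecs


def guess_encoding(raw: bytes, candidates=("utf-8", "cp1252")):
    """Return a plausible encoding name for raw, or None if nothing fits."""
    # A byte order mark at the start is the strongest available hint.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Otherwise take the first candidate that decodes without errors.
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


print(guess_encoding("caf\u00e9".encode("utf-8")))  # valid UTF-8
print(guess_encoding(b"caf\xe9"))                   # invalid UTF-8, valid cp1252
```

In practice you would read the first few kilobytes of the CSV in binary mode (open(path, 'rb')) and pass them to the function. Order matters: UTF-8 is tried first because single-byte code pages accept almost any byte sequence.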