I'm trying to get data from a zipped CSV file. Is there a way to do this without unzipping the whole file? If not, how can I unzip the file and read it efficiently?
Burhan Ali
asked Nov 15, 2014 at 4:16
I used the zipfile module to read the ZIP directly into a pandas DataFrame. Let's say the file name is "intfile" and it's in a .zip named "THEZIPFILE":
import pandas as pd
import zipfile
zf = zipfile.ZipFile('C:/Users/Desktop/THEZIPFILE.zip')
df = pd.read_csv(zf.open('intfile.csv'))
ZygD
answered May 8, 2016 at 13:25
Yaron
If you aren't using pandas, it can be done entirely with the standard library. Here is Python 3.7 code:
import csv
from io import TextIOWrapper
from zipfile import ZipFile

with ZipFile('yourfile.zip') as zf:
    with zf.open('your_csv_inside_zip.csv', 'r') as infile:
        reader = csv.reader(TextIOWrapper(infile, 'utf-8'))
        for row in reader:
            # process the CSV here
            print(row)
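Because zf.open() returns a file-like stream, the member does not have to be extracted to disk, and reading can stop early. The following is only a sketch of that idea: the helper name `head_of_zipped_csv` and the sample archive are made up for illustration.

```python
import csv
import io
import itertools
import zipfile


def head_of_zipped_csv(zip_source, member, n=5):
    """Return the first n rows of one CSV member, reading it as a stream."""
    with zipfile.ZipFile(zip_source) as zf:
        with zf.open(member) as raw:
            # zf.open() yields bytes; wrap it so csv receives decoded text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # islice stops after n rows, so the remaining rows are never parsed.
            return list(itertools.islice(csv.reader(text), n))


# Hypothetical sample archive, built in memory for the demonstration.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w", zipfile.ZIP_DEFLATED) as zf:
    body = "id,value\n" + "".join(f"{i},{i * i}\n" for i in range(1000))
    zf.writestr("big.csv", body)
sample.seek(0)

first_rows = head_of_zipped_csv(sample, "big.csv", n=3)
print(first_rows)
```

With a real archive, you would pass its path instead of the in-memory buffer.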
answered Jun 25, 2019 at 21:12
volker238
A quick solution is to use the code below:
import pandas as pd
# pandas can read a zipped CSV directly
df = pd.read_csv("/path/to/file.csv.zip")
answered Oct 4, 2019 at 10:58
Hari Prasad
zipfile also supports the with statement. So, adding onto Yaron's answer that uses pandas:
with zipfile.ZipFile('file.zip') as zf:
    with zf.open('file.csv') as csv_file:
        df = pd.read_csv(csv_file)
answered May 22, 2017 at 16:43
I thought Yaron had the best answer, but I would add code that iterates through multiple files inside a zip archive and then concatenates the results:
import os
import pandas as pd
import zipfile

curDir = os.getcwd()
zf = zipfile.ZipFile(curDir + '/targetfolder.zip')
text_files = zf.infolist()
list_ = []

print("Uncompressing and reading data... ")

for text_file in text_files:
    print(text_file.filename)
    df = pd.read_csv(zf.open(text_file.filename))
    # do df manipulations
    list_.append(df)

df = pd.concat(list_)
Xukrao
answered Sep 13, 2017 at 18:14
Yes. You want the zipfile module.
You open the zip file itself with zipfile.ZipFile(file[, mode]).
You can then use ZipFile.infolist() to enumerate each file within the zip, and read it with ZipFile.open(name[, mode[, pwd]]).
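The steps in this answer can be sketched with the standard library alone. This is a minimal illustration rather than a definitive implementation: the helper name `read_csv_members` and the sample archive contents are assumptions.

```python
import csv
import io
import zipfile


def read_csv_members(zip_source):
    """Return {member name: list of rows} for every .csv file in the archive.

    zip_source may be a path or a binary file-like object.
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            # Skip directories and anything that is not a CSV file.
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            with zf.open(info) as raw:
                # zf.open() yields bytes; wrap it to decode text for csv.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[info.filename] = list(csv.reader(text))
    return tables


# Build a small in-memory archive so the sketch runs without files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("a.csv", "x,y\n1,2\n")
    zf.writestr("notes.txt", "not a csv")
buffer.seek(0)

tables = read_csv_members(buffer)
print(tables)
```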
answered Nov 15, 2014 at 4:30
brycem
This is the simplest approach, and the one I always use.
import pandas as pd
df = pd.read_csv("Train.zip", compression='zip')
SHR
answered Nov 4, 2020 at 11:01
Suppose you are downloading a zip file that contains a CSV and you don't want to use temporary storage. Here is what a sample implementation looks like:
#!/usr/bin/env python3
from csv import DictReader
from io import TextIOWrapper, BytesIO
from zipfile import ZipFile

import requests


def all_tickers():
    url = "https://simfin.com/api/bulk/bulk.php?dataset=industries&variant=null"
    r = requests.get(url)
    # Wrap the downloaded bytes so ZipFile can read them from memory.
    zip_ref = ZipFile(BytesIO(r.content))
    for name in zip_ref.namelist():
        print(name)
        with zip_ref.open(name) as file_contents:
            reader = DictReader(TextIOWrapper(file_contents, 'utf-8'), delimiter=';')
            for item in reader:
                print(item)
This takes care of all Python 3 bytes/str issues.
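The same pattern can be tried without a network call. The sketch below substitutes a locally built archive for r.content; the function name `rows_from_zip_bytes` and the sample data are assumptions made for illustration, not part of the original answer.

```python
from csv import DictReader
from io import BytesIO, TextIOWrapper
from zipfile import ZipFile


def rows_from_zip_bytes(payload, delimiter=";"):
    """Yield (member name, row dict) pairs from a zip held in memory as bytes."""
    with ZipFile(BytesIO(payload)) as zip_ref:
        for name in zip_ref.namelist():
            with zip_ref.open(name) as file_contents:
                # Decode the byte stream so DictReader receives str lines.
                text = TextIOWrapper(file_contents, "utf-8", newline="")
                for row in DictReader(text, delimiter=delimiter):
                    yield name, row


# Stand-in for r.content: a zip archive built entirely in memory.
fake_download = BytesIO()
with ZipFile(fake_download, "w") as zf:
    zf.writestr("industries.csv", "IndustryId;Sector\n100001;Industrials\n")

rows = list(rows_from_zip_bytes(fake_download.getvalue()))
print(rows)
```

Nothing is written to disk at any point: the archive bytes, the decompressed stream, and the parsed rows all stay in memory.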
answered Feb 2, 2021 at 3:09
hughdbrown
If you have a file named my_big_file.csv and you zip it under the same base name, my_big_file.zip, you can simply do this:
df = pd.read_csv("my_big_file.zip")
Note: check your pandas version first (not applicable for older versions).
answered Mar 9, 2021 at 16:29
adhg