One option is listing all files in a directory with os.listdir and then finding only those that end in '.json':
import os, json
import pandas as pd
path_to_json = 'somedir/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
print(json_files)  # for me this prints ['foo.json']
Now you can use pandas DataFrame.from_dict to read the json (a Python dictionary at this point) into a pandas DataFrame:
montreal_json = pd.DataFrame.from_dict(many_jsons[0])
print(montreal_json['features'][0]['geometry'])
Prints:
{u'type': u'Point', u'coordinates': [-73.6051013, 45.5115944]}
In this case I had appended some jsons to a list many_jsons. The first json in my list is actually a geojson with some geo data on Montreal. I'm familiar with the content already, so I print out the 'geometry', which gives me the lon/lat of Montreal.
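For completeness, here is a minimal sketch of how a list like many_jsons could be built from the files found above (this loading loop is my own addition, not part of the original code):
import os, json

path_to_json = 'somedir/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]

# load each file into a Python dictionary and collect them in a list
many_jsons = []
for js in json_files:
    with open(os.path.join(path_to_json, js)) as json_file:
        many_jsons.append(json.load(json_file))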
The following code sums up everything above:
import os, json
import pandas as pd
# this finds our json files
path_to_json = 'json/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
# here I define my pandas DataFrame with the columns I want to get from the json
jsons_data = pd.DataFrame(columns=['country', 'city', 'long/lat'])
# we need both the json and an index number so use enumerate()
for index, js in enumerate(json_files):
    with open(os.path.join(path_to_json, js)) as json_file:
        json_text = json.load(json_file)
        # here you need to know the layout of your json and each json has to have
        # the same structure (obviously not the structure I have here)
        country = json_text['features'][0]['properties']['country']
        city = json_text['features'][0]['properties']['name']
        lonlat = json_text['features'][0]['geometry']['coordinates']
        # here I push a list of data into a pandas DataFrame at row given by 'index'
        jsons_data.loc[index] = [country, city, lonlat]
# now that we have the pertinent json data in our DataFrame let's look at it
print(jsons_data)
for me this prints:
country city long/lat
0 Canada Montreal city [-73.6051013, 45.5115944]
1 Canada Toronto [-79.3849008, 43.6529206]
It may be helpful to know that for this code I had two geojsons in a directory named 'json'. Each json had the following structure:
{"features":
[{"properties":
{"osm_key":"boundary","extent":
[-73.9729016,45.7047897,-73.4734865,45.4100756],
"name":"Montreal city","state":"Quebec","osm_id":1634158,
"osm_type":"R","osm_value":"administrative","country":"Canada"},
"type":"Feature","geometry":
{"type":"Point","coordinates":
[-73.6051013,45.5115944]}}],
"type":"FeatureCollection"}
Comparing data from multiple JSON files can get unwieldy unless you leverage Python to give you the data you need.
I often monitor key page speed metrics by testing web pages using WebPagetest or Google Lighthouse using their CLI or Node tools. I save test results as JSON, which is fine for looking at individual snapshots at a later time. But I often end up with folders full of data that cannot really be analyzed manually:
working_directory
└───data
├───export1.json
├───export2.json
├───export3.json
├───...
For example, how do you compare changes in those metrics over time? Or look for a peak in the data?
The following handy little Python 3 script is useful for sifting through a directory full of JSON files and exporting specific values to a CSV for an ad-hoc analysis. It only uses built-in Python modules. I just drop it in my working directory and run it via command line with python3 json-to-csv-exporter.py:
json-to-csv-exporter.py
#!/usr/bin/env python3
# Place this Python script in your working directory when you have JSON files in a subdirectory.
# To run the script via command line: "python3 json-to-csv-exporter.py"
import json
import glob
from datetime import datetime
import csv
# Place your JSON data in a directory named 'data/'
src = "data/"
date = datetime.now()
data = []
# Change the glob if you want to only look through files with specific names,
# e.g. glob.glob(src + '*.json') to match only JSON files
files = glob.glob(src + '*', recursive=True)
# Loop through files
for single_file in files:
    with open(single_file, 'r') as f:
        # Use 'try-except' to skip files that may be missing data
        try:
            json_file = json.load(f)
            data.append([
                json_file['requestedUrl'],
                json_file['fetchTime'],
                json_file['categories']['performance']['score'],
                json_file['audits']['largest-contentful-paint']['numericValue'],
                json_file['audits']['speed-index']['numericValue'],
                json_file['audits']['max-potential-fid']['numericValue'],
                json_file['audits']['cumulative-layout-shift']['numericValue'],
                json_file['audits']['first-cpu-idle']['numericValue'],
                json_file['audits']['total-byte-weight']['numericValue']
            ])
        except KeyError:
            print(f'Skipping {single_file}')
# Sort the data
data.sort()
# Add headers
data.insert(0, ['Requested URL', 'Date', 'Performance Score', 'LCP', 'Speed Index', 'FID', 'CLS', 'CPU Idle', 'Total Byte Weight'])
# Export to CSV.
# Add the date to the file name to avoid overwriting it each time.
csv_filename = f'{str(date)}.csv'
with open(csv_filename, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)
print("Updated CSV")
That gives you a CSV that you can use to create charts or analyze to your heart’s content.
| Requested URL | Date | Performance Score | LCP | Speed Index | FID | CLS | CPU Idle | Total Byte Weight |
| ----------------------- | ------------------------ | ----------------- | ------------------ | ------------------ | --- | ------------------- | ------------------ | ----------------- |
| //www.example.com | 2020-08-26T11:19:42.608Z | 0.96 | 1523.257 | 1311.5760337571400 | 153 | 0.5311671549479170 | 1419.257 | 301319 |
| //www.example.com | 2020-08-26T11:32:16.197Z | 0.99 | 1825.5990000000000 | 2656.8016986395200 | 496 | 0.06589290364583330 | 1993.5990000000000 | 301282 |
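To answer the earlier question about finding a peak in the data: here is a minimal sketch (my own addition, with a hypothetical file name) that reads the exported CSV back in with the built-in csv module and picks out the test run with the highest LCP:
import csv

# hypothetical file name; use whatever the exporter script actually produced
with open('2020-08-26.csv', newline='') as f:
    rows = list(csv.DictReader(f))

# find the run with the highest Largest Contentful Paint (a 'peak' in the data)
peak = max(rows, key=lambda row: float(row['LCP']))
print(peak['Date'], peak['LCP'])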