One option is listing all files in a directory with os.listdir and then finding only those that end in '.json':
import os, json
import pandas as pd
path_to_json = 'somedir/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
print(json_files)  # for me this prints ['foo.json']
Now you can use pandas DataFrame.from_dict to read the json (a Python dictionary at this point) into a pandas DataFrame:
montreal_json = pd.DataFrame.from_dict(many_jsons[0])
print(montreal_json['features'][0]['geometry'])
Prints:
{u'type': u'Point', u'coordinates': [-73.6051013, 45.5115944]}
In this case I had appended some jsons to a list many_jsons. The first json in my list is actually a geojson with some geo data on Montreal. I'm familiar with the content already, so I print out the 'geometry', which gives me the lon/lat of Montreal.
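For completeness, here is a minimal sketch of how a list like many_jsons could be built from the files found above (this loading loop is my own addition, not part of the original code):
import os, json

path_to_json = 'somedir/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]

# load each file into a Python dictionary and collect them in a list
many_jsons = []
for js in json_files:
    with open(os.path.join(path_to_json, js)) as json_file:
        many_jsons.append(json.load(json_file))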
The following code sums up everything above:
import os, json
import pandas as pd
# this finds our json files
path_to_json = 'json/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
# here I define my pandas DataFrame with the columns I want to get from the json
jsons_data = pd.DataFrame(columns=['country', 'city', 'long/lat'])
# we need both the json and an index number so use enumerate()
for index, js in enumerate(json_files):
    with open(os.path.join(path_to_json, js)) as json_file:
        json_text = json.load(json_file)
        # here you need to know the layout of your json and each json has to have
        # the same structure (obviously not the structure I have here)
        country = json_text['features'][0]['properties']['country']
        city = json_text['features'][0]['properties']['name']
        lonlat = json_text['features'][0]['geometry']['coordinates']
        # here I push a list of data into a pandas DataFrame at row given by 'index'
        jsons_data.loc[index] = [country, city, lonlat]
# now that we have the pertinent json data in our DataFrame let's look at it
print(jsons_data)
for me this prints:
country city long/lat
0 Canada Montreal city [-73.6051013, 45.5115944]
1 Canada Toronto [-79.3849008, 43.6529206]
It may be helpful to know that for this code I had two geojsons in a directory named 'json'. Each json had the following structure:
{"features":
[{"properties":
{"osm_key":"boundary","extent":
[-73.9729016,45.7047897,-73.4734865,45.4100756],
"name":"Montreal city","state":"Quebec","osm_id":1634158,
"osm_type":"R","osm_value":"administrative","country":"Canada"},
"type":"Feature","geometry":
{"type":"Point","coordinates":
[-73.6051013,45.5115944]}}],
"type":"FeatureCollection"}
Comparing data from multiple JSON files can get unwieldy unless you leverage Python to give you the data you need.
I often monitor key page speed metrics by testing web pages using WebPagetest or Google Lighthouse using their CLI or Node tools. I save test results as JSON, which is fine for looking at individual snapshots at a later time. But I often end up with folders full of data that cannot really be analyzed manually:
working_directory
└───data
├───export1.json
├───export2.json
├───export3.json
├───...
For example, how do you compare changes in those metrics over time? Or look for a peak in the data?
The following handy little Python 3 script is useful for sifting through a directory full of JSON files and exporting specific values to a CSV for an ad-hoc analysis. It only uses built-in Python modules. I just drop it in my working directory and run it via command line with python3 json-to-csv-exporter.py:
json-to-csv-exporter.py
#!/usr/bin/env python3
# Place this Python script in your working directory when you have JSON files in a subdirectory.
# To run the script via command line: "python3 json-to-csv-exporter.py"
import json
import glob
from datetime import datetime
import csv
# Place your JSON data in a directory named 'data/'
src = "data/"
date = datetime.now()
data = []
# Change the glob if you want to only look through files with specific names,
# e.g. glob.glob(src + '*.json') to match only JSON files
files = glob.glob(src + '*', recursive=True)
# Loop through files
for single_file in files:
    with open(single_file, 'r') as f:
        # Use 'try-except' to skip files that may be missing data
        try:
            json_file = json.load(f)
            data.append([
                json_file['requestedUrl'],
                json_file['fetchTime'],
                json_file['categories']['performance']['score'],
                json_file['audits']['largest-contentful-paint']['numericValue'],
                json_file['audits']['speed-index']['numericValue'],
                json_file['audits']['max-potential-fid']['numericValue'],
                json_file['audits']['cumulative-layout-shift']['numericValue'],
                json_file['audits']['first-cpu-idle']['numericValue'],
                json_file['audits']['total-byte-weight']['numericValue']
            ])
        except KeyError:
            print(f'Skipping {single_file}')
# Sort the data
data.sort()
# Add headers
data.insert(0, ['Requested URL', 'Date', 'Performance Score', 'LCP', 'Speed Index', 'FID', 'CLS', 'CPU Idle', 'Total Byte Weight'])
# Export to CSV.
# Add the date to the file name to avoid overwriting it each time.
csv_filename = f'{str(date)}.csv'
with open(csv_filename, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)
print("Updated CSV")
That gives you a CSV that you can use to create charts or analyze to your heart’s content.
| Requested URL | Date | Performance Score | LCP | Speed Index | FID | CLS | CPU Idle | Total Byte Weight |
| ----------------------- | ------------------------ | ----------------- | ------------------ | ------------------ | --- | ------------------- | ------------------ | ----------------- |
| //www.example.com | 2020-08-26T11:19:42.608Z | 0.96 | 1523.257 | 1311.5760337571400 | 153 | 0.5311671549479170 | 1419.257 | 301319 |
| //www.example.com | 2020-08-26T11:32:16.197Z | 0.99 | 1825.5990000000000 | 2656.8016986395200 | 496 | 0.06589290364583330 | 1993.5990000000000 | 301282 |
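To answer the earlier question about finding a peak in the data: here is a minimal sketch (my own addition, with a hypothetical file name) that reads the exported CSV back in with the built-in csv module and picks out the test run with the highest LCP:
import csv

# hypothetical file name; use whatever the exporter script actually produced
with open('2020-08-26.csv', newline='') as f:
    rows = list(csv.DictReader(f))

# find the run with the highest Largest Contentful Paint (a 'peak' in the data)
peak = max(rows, key=lambda row: float(row['LCP']))
print(peak['Date'], peak['LCP'])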