How to compare two yaml files in python

If we have two YAML files, how would we compare their keys and print the mismatched and/or missing keys? I tried DeepDiff, but it takes dictionaries, iterables, etc. How would I convert the YAML files to dictionaries and use DeepDiff, or is there another method?

martineau

asked Sep 2, 2020 at 9:04

The following worked for me:

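import yaml
from deepdiff import DeepDiff

def yaml_as_dict(my_file):
    """Merge every document in a YAML file into one dictionary."""
    my_dict = {}
    with open(my_file, 'r') as fp:
        # safe_load_all yields one object per '---'-separated document
        for doc in yaml.safe_load_all(fp):
            if doc:  # skip empty documents
                # keys in later documents overwrite earlier ones
                my_dict.update(doc)
    return my_dict

if __name__ == '__main__':
    # placeholder paths; replace with the files to compare
    yaml_file1 = 'file1.yaml'
    yaml_file2 = 'file2.yaml'
    a = yaml_as_dict(yaml_file1)
    b = yaml_as_dict(yaml_file2)
    ddiff = DeepDiff(a, b, ignore_order=True)
    print(ddiff)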

answered Sep 2, 2020 at 10:09

Faisal

Try the deepdiff package. I had a similar use case and found it very helpful.

answered Sep 2, 2020 at 9:11

Origin

Use PyYAML to convert each file to a flattened dict, then compare the two dicts.
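A minimal sketch of the flatten-then-compare idea, assuming the two files have already been loaded into Python dicts (for example with yaml.safe_load). The helper name flatten and the dotted-path key format are illustrative choices, not part of PyYAML:

```python
def flatten(data, prefix=""):
    """Flatten nested dicts/lists into {dotted.path: leaf_value} (hypothetical helper)."""
    if isinstance(data, dict) and data:
        items = data.items()
    elif isinstance(data, list) and data:
        items = enumerate(data)
    else:
        # scalars and empty containers are treated as leaves
        return {prefix: data}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat

# stand-ins for the dicts that loading two YAML files would produce
a = {"name": "Jane", "age": 24, "hobby": ["cricket", "hockey"]}
b = {"name": "Jane", "age": 25, "address": {"city": "Paris"}}

flat_a, flat_b = flatten(a), flatten(b)
only_in_a = sorted(flat_a.keys() - flat_b.keys())
only_in_b = sorted(flat_b.keys() - flat_a.keys())
changed = {k: (flat_a[k], flat_b[k])
           for k in flat_a.keys() & flat_b.keys()
           if flat_a[k] != flat_b[k]}

print("only in a:", only_in_a)
print("only in b:", only_in_b)
print("changed:", changed)
```

Flattening turns the comparison into plain set operations on keys, which makes missing and mismatched keys easy to list separately.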

answered Sep 2, 2020 at 9:10

Abhijit Sarkar

To load a YAML file as a dictionary, you can use PyYAML:

import yaml

with open("example.yaml", 'r') as fp:
    d = yaml.safe_load(fp)

answered Sep 2, 2020 at 9:11

bp7070

Python is a powerful programming language widely used in many applications. One of its many practical applications is in working with YAML files. 

In this article, we will compare two YAML files in Python using difflib, the yaml module (PyYAML), ruamel.yaml, and plain file reads, to see whether the files are equivalent.

What is a YAML file?

A YAML file stores data in the YAML format, a human-readable data serialization format that keeps data in a structured way. YAML files are often used for configuration and for exchanging data between programs.

These files are easy to read and understand; they can be edited with any text editor and are often used in conjunction with other data files, such as JSON files.

They can store several types of data, including:

  • Strings
  • Integers
  • Floats
  • Booleans
  • Arrays
  • Objects

Some of the characteristics of YAML files include:

  • Human-readable
  • Machine-readable
  • Easy to edit
  • Easy to parse
  • Well suited for use in configuration files
  • Well suited for use in localization files

Comparing YAML Files in Python

Let’s compare two YAML files using the different options available in Python. Consider two YAML files, named file1.yaml and file2.yaml, with the following data.

file1.yaml:

---
name: John
age: 25
city: New York

file2.yaml:

---
name: Jane
age: 24
city: Paris
hobby:
 - cricket
 - hockey
 - football

Now let’s look at different approaches to comparing two YAML files in Python.

01.

Using difflib.unified_diff()

To compare the two files line by line, we can use the difflib.unified_diff() function, which reports the differing lines in unified diff format.

import difflib

# read both files as lists of lines (line endings are kept);
# the with block closes the files automatically
with open('file1.yaml', 'r') as file1, open('file2.yaml', 'r') as file2:
    text1 = file1.readlines()
    text2 = file2.readlines()

# compare the two files using unified_diff()
for line in difflib.unified_diff(text1, text2, fromfile='file1.yaml', tofile='file2.yaml'):
    # each diff line already ends with a newline, so suppress print's own
    print(line, end='')
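A line-based diff compares text, not data, so it also flags differences that do not change the meaning of the YAML. The sketch below illustrates this with two in-memory strings (the file names a.yaml and b.yaml are only labels for the diff header) that describe the same mapping but differ in key order and quoting:

```python
import difflib

# same data, but the second text reorders the keys and quotes the name
text_a = "name: Jane\nage: 24\n"
text_b = "age: 24\nname: 'Jane'\n"

diff_lines = list(difflib.unified_diff(
    text_a.splitlines(keepends=True),
    text_b.splitlines(keepends=True),
    fromfile="a.yaml",
    tofile="b.yaml",
))

# a non-empty diff: the texts differ even though the data is equivalent
print("".join(diff_lines), end="")
```

If only the data matters, parsing both files and comparing the resulting objects avoids these false positives.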

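02.

Using difflib.ndiff()

For a finer-grained comparison, we can use the difflib.ndiff() function. It compares the text line by line and, for lines that are similar, adds guide lines starting with "?" that mark the characters that changed. Dumping the parsed data with yaml.dump() first normalizes the formatting and sorts the keys, so only real differences show up.

import difflib
import yaml

data1 = '''
name: John
age: 25
city: New York
'''

data2 = '''
name: Jane
age: 24
city: Paris
'''

yaml1 = yaml.safe_load(data1)
yaml2 = yaml.safe_load(data2)

# ndiff expects sequences of lines, so split the dumped text first
diff = difflib.ndiff(
    yaml.dump(yaml1, default_flow_style=False).splitlines(keepends=True),
    yaml.dump(yaml2, default_flow_style=False).splitlines(keepends=True))
print(''.join(diff), end='')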

03.

Using the PyYAML library

The code below parses the two files and compares the resulting Python objects to determine whether the files hold the same data. It does not report which lines or characters differ.

import yaml

# parse both files; the with block closes them afterwards
with open("file1.yaml") as f1, open("file2.yaml") as f2:
    data1 = yaml.safe_load(f1)
    data2 = yaml.safe_load(f2)

if data1 == data2:
    print("The files are the same.")
else:
    print("The files are different.")
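It helps to know what == means for parsed data: YAML mappings load as Python dicts and sequences load as lists, so key order is ignored but list order is not. The sketch below uses hand-written dicts in place of loaded files, and normalize is a hypothetical helper for the case where list order should not matter:

```python
a = {"name": "Jane", "hobby": ["cricket", "hockey"]}
b = {"hobby": ["cricket", "hockey"], "name": "Jane"}   # keys reordered
c = {"name": "Jane", "hobby": ["hockey", "cricket"]}   # list reordered

same_despite_key_order = (a == b)    # dict equality ignores key order
same_despite_list_order = (a == c)   # list equality is order-sensitive

def normalize(value):
    """Recursively sort lists so that their order no longer matters (hypothetical helper)."""
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    if isinstance(value, list):
        # sort by repr so that mixed or nested item types can still be ordered
        return sorted((normalize(v) for v in value), key=repr)
    return value

same_after_normalizing = (normalize(a) == normalize(c))

print(same_despite_key_order, same_despite_list_order, same_after_normalizing)
```

Whether list order is significant depends on what the file describes, so normalizing is a choice to make per use case rather than a default.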

04.

Without using the yaml library

Here the code reads both files as plain text, line by line, and prints “files are the same” if every line matches; otherwise it prints “files are different.” No YAML library is used, so any difference in formatting, comments, or key order counts as a difference.

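from itertools import zip_longest

with open('file1.yaml') as f1, open('file2.yaml') as f2:
    # zip_longest pads the shorter file with None, so extra lines
    # in either file count as a difference (plain zip would skip them)
    for line1, line2 in zip_longest(f1, f2):
        if line1 != line2:
            print("files are different")
            break
    else:
        print("files are the same")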
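For a plain byte-for-byte check, the standard library's filecmp module is an alternative to a hand-written loop. The sketch below writes two temporary files (the names and contents are made up for the example) and compares them with filecmp.cmp(); shallow=False forces a comparison of the contents rather than only the file metadata:

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.yaml")
    path_b = os.path.join(tmp, "b.yaml")

    # b.yaml starts with the same lines as a.yaml but has one extra line
    with open(path_a, "w") as f:
        f.write("name: Jane\nage: 24\n")
    with open(path_b, "w") as f:
        f.write("name: Jane\nage: 24\ncity: Paris\n")

    identical = filecmp.cmp(path_a, path_b, shallow=False)
    same_as_itself = filecmp.cmp(path_a, path_a, shallow=False)

print("a.yaml == b.yaml:", identical)
print("a.yaml == a.yaml:", same_as_itself)
```

One file here is a prefix of the other, which is the case a pairwise loop built on plain zip() misses because it stops at the end of the shorter file.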

05.

Using the ruamel.yaml library

Here we compare two YAML files using the ruamel.yaml Python library. The code does not report which characters or lines differ; it only tells us whether the two files hold identical data.

from ruamel.yaml import YAML

def compare_yaml_files(file1, file2):
    # The module-level ruamel.yaml.load() is deprecated and unavailable in
    # recent releases; use a YAML instance instead.
    # typ='safe' loads plain Python dicts and lists.
    yaml = YAML(typ='safe')
    with open(file1) as f1, open(file2) as f2:
        data1 = yaml.load(f1)
        data2 = yaml.load(f2)
    if data1 != data2:
        print(file1, 'and', file2, 'are not identical.')
    else:
        print(file1, 'and', file2, 'are identical.')

if __name__ == '__main__':
    file1 = "file1.yaml"
    file2 = "file2.yaml"
    compare_yaml_files(file1, file2)

Now let’s consider two YAML files, file2.yaml and file2_copy.yaml, with the data below.

file2.yaml:

---
name: Jane
age: 24
city: Paris
hobby:
 - cricket
 - hockey
 - football

file2_copy.yaml:

---
name: Jane
age: 25
city: Paris
hobby:
 - cricket
 - hockey
 - football
In file2.yaml the age is 24, but in file2_copy.yaml it is 25. The rest of the data is the same. Now let’s see how we can spot this difference.

import yaml

file1 = 'file2.yaml'
file2 = 'file2_copy.yaml'

with open(file1) as f1, open(file2) as f2:
   data1 = yaml.safe_load(f1)
   data2 = yaml.safe_load(f2)

diff = {}
for key in data1:
   if key not in data2:
       diff[key] = data1[key]
   elif data1[key] != data2[key]:
       diff[key] = (data1[key], data2[key])

for key in data2:
   if key not in data1:
       diff[key] = data2[key]

print(diff)

The above code will print the below output:

{'age': (24, 25)}

As you can see, it has successfully found the difference between the two files.
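The loop above stores missing keys and changed values in the same dictionary and only looks at top-level keys. A possible extension, sketched here under the assumption that both files load as nested dicts, reports missing, added, and changed keys separately and descends into nested mappings. The function name diff_dicts, the dotted-path format, and the sample data are illustrative:

```python
def diff_dicts(old, new, path=""):
    """Return (missing, added, changed) for two nested dicts (hypothetical helper)."""
    missing, added, changed = [], [], {}
    # sort by str so that keys of mixed types can still be ordered
    for key in sorted(old.keys() | new.keys(), key=str):
        where = f"{path}.{key}" if path else str(key)
        if key not in new:
            missing.append(where)
        elif key not in old:
            added.append(where)
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            # recurse into nested mappings and merge the sub-report
            sub_missing, sub_added, sub_changed = diff_dicts(old[key], new[key], where)
            missing += sub_missing
            added += sub_added
            changed.update(sub_changed)
        elif old[key] != new[key]:
            changed[where] = (old[key], new[key])
    return missing, added, changed

# stand-ins for the data that yaml.safe_load() would return
old = {"name": "Jane", "age": 24, "address": {"city": "Paris", "zip": "75001"}}
new = {"name": "Jane", "age": 25, "address": {"city": "Paris"}, "hobby": ["cricket"]}

missing, added, changed = diff_dicts(old, new)
print("missing:", missing)
print("added:", added)
print("changed:", changed)
```

Keeping the three categories apart makes it clear whether a key disappeared, appeared, or only changed its value.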

How do I compare two YAML files?

To check whether two YAML files hold the same data, load both with yaml.safe_load() and compare the resulting objects. To see which lines differ in the text itself, use the difflib.unified_diff() or difflib.ndiff() function.

How do I view a YAML file in Python?

Reading a YAML file in Python requires a library such as PyYAML. To install PyYAML from source, follow the instructions below:
Open the PyYAML GitHub repository.
Click on the Code section and download the ZIP file.
Unpack the ZIP archive.
Open a command prompt or terminal.
Change to the PyYAML directory where the ZIP file was extracted.
Run the python setup.py install command to install PyYAML.

How do you parse and extract data from a YAML file in Python?

Open the file, parse it with yaml.load(), and read values from the resulting dictionary:

import yaml

with open("example.yaml") as a_yaml_file:
    parsed_yaml_file = yaml.load(a_yaml_file, Loader=yaml.FullLoader)

print(parsed_yaml_file["a_dictionary"])
print(parsed_yaml_file.get("a_list"))

Does Python have a built in YAML parser?

No. Python's standard library has no built-in support for the YAML data format, even though YAML is commonly used for configuration and serialization. To work with YAML in Python, use a third-party library such as PyYAML or ruamel.yaml.