Mat file to image python

I managed to convert one image; use a loop to convert all of them.

Please read the comments.

import matplotlib.pyplot as plt
import numpy as np
import h5py
from PIL import Image

#reading v 7.3 mat file in python
#https://stackoverflow.com/questions/17316880/reading-v-7-3-mat-file-in-python

filepath = '1.mat'
f = h5py.File(filepath, 'r') #Open mat file for reading

#In MATLAB the data is arranged as follows:
#cjdata is a MATLAB struct
#cjdata.image is a matrix of type int16

#Before update: read only image data.   
####################################################################
#Read cjdata struct, get image member and convert to a numpy ndarray of type float
#image = np.array(f['cjdata'].get('image')).astype(np.float64) #In MATLAB: image = cjdata.image
#f.close()
####################################################################

#Update: Read all elements of cjdata struct
####################################################################
#Read cjdata struct
cjdata = f['cjdata'] #

# In MATLAB cjdata = 
# struct with fields:
#   label: 1
#   PID: '100360'
#   image: [512×512 int16]
#   tumorBorder: [38×1 double]
#   tumorMask: [512×512 logical]

#Get image member and convert to a numpy ndarray of type float
image = np.array(cjdata.get('image')).astype(np.float64) #In MATLAB: image = cjdata.image

label = cjdata.get('label')[0,0] #Use [0,0] indexing in order to convert label to scalar

PID = np.array(cjdata.get('PID')).flatten() #MATLAB char arrays are stored as integer character codes; flatten to 1D
PID = ''.join(chr(c) for c in PID) #Convert to string https://stackoverflow.com/questions/12036304/loading-hdf5-matlab-strings-into-python

tumorBorder = np.array(cjdata.get('tumorBorder'))[0] #Use [0] indexing - convert from 2D array to 1D array.

tumorMask = np.array(cjdata.get('tumorMask'))

f.close()
####################################################################

#Convert image to uint8 (before saving as jpeg - jpeg doesn't support int16 format).
#Use simple linear conversion: subtract minimum, and divide by range.
#Note: the conversion is not optimal - you should find a better way.
#Multiply by 255 to set values in uint8 range [0, 255], and convert to type uint8.
hi = np.max(image)
lo = np.min(image)
image = (((image - lo)/(hi-lo))*255).astype(np.uint8)

#Save as jpeg
#https://stackoverflow.com/questions/902761/saving-a-numpy-array-as-an-image
im = Image.fromarray(image)
im.save("1.jpg")

#Display image for testing
imgplot = plt.imshow(image)
plt.show()
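To convert all files, the same steps can be wrapped in a loop. The following is only a sketch: the helper names to_uint8 and convert_folder are made up for illustration, and it assumes every .mat file in the folder is a v7.3 file containing a cjdata struct with an image member, as above.

```python
from pathlib import Path

import h5py
import numpy as np
from PIL import Image


def to_uint8(image):
    """Linearly rescale an array to the uint8 range [0, 255]."""
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:  # constant image: avoid dividing by zero
        return np.zeros(image.shape, dtype=np.uint8)
    return ((image - lo) / (hi - lo) * 255).astype(np.uint8)


def convert_folder(src_dir, dst_dir):
    """Convert every v7.3 .mat file in src_dir to a JPEG in dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for mat_path in sorted(Path(src_dir).glob("*.mat")):
        # The context manager closes the file even if reading fails.
        with h5py.File(str(mat_path), "r") as f:
            image = f["cjdata"]["image"][()]  # read the whole dataset
        out_path = dst / (mat_path.stem + ".jpg")
        Image.fromarray(to_uint8(image)).save(out_path)
        written.append(out_path)
    return written
```

Calling convert_folder('mat_files', 'jpg_files') (both folder names are placeholders) would write 1.jpg for 1.mat, and so on. Note that h5py returns MATLAB arrays with the axis order reversed, so if the saved images look transposed compared to MATLAB, apply .T to the image before saving.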

Note:
Each mat file contains a struct named cjdata.
Fields of cjdata struct:

cjdata = 

struct with fields:

      label: 1
        PID: '100360'
      image: [512×512 int16]
tumorBorder: [38×1 double]
  tumorMask: [512×512 logical]

When converting images to jpeg, you are losing information...
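Part of that loss comes from the rescaling to uint8 alone, before any JPEG compression is applied. The sketch below uses a synthetic array as a stand-in for cjdata.image (the real images are not reproduced here) and counts how many distinct intensity levels survive the linear conversion.

```python
import numpy as np

# Synthetic stand-in for cjdata.image: 4096 distinct int16 intensity levels.
image = np.arange(4096, dtype=np.int16).reshape(64, 64)

# Same linear conversion as in the answer: subtract minimum, divide by range.
lo, hi = image.min(), image.max()
scaled = ((image.astype(np.float64) - lo) / (hi - lo) * 255).astype(np.uint8)

levels_before = len(np.unique(image))
levels_after = len(np.unique(scaled))
print(levels_before, levels_after)  # 4096 256
```

If the original values matter for later analysis, keeping a lossless copy (for example with np.save, or in a 16-bit image format) alongside the JPEG is one option.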

A large number of datasets for data science and research use .mat files. In this article, we’ll learn to work with .mat files in Python and explore them in detail.

Why do we use .mat files in Python?

The purpose of a .mat file may not seem obvious right off the bat. But when working with large datasets, the information contained within these files is absolutely crucial for data science/machine learning projects!

This is because the .mat files contain the metadata of every object/record in the dataset.

While the files are not exactly designed for the sole purpose of creating annotations, a lot of researchers use MATLAB for their research and data collection, causing a lot of the annotations that we use in Machine Learning to be present in the form of .mat files.

So, it’s important for a data scientist to understand how to use .mat files in their projects. Knowing the format also lets you work with training and testing data sets that are not available as regular CSV files.

Let’s get started!

Python’s standard library cannot read .mat files, so we need to import a library that knows how to handle the file format.
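Which library to use depends on the MAT-file version: scipy.io.loadmat reads files up to version 7.2, while version 7.3 files are HDF5-based and need an HDF5 reader such as h5py. As a rough sketch, the version can be guessed from the descriptive text at the start of the file. The helper name mat_file_format is made up for this example, and the check assumes the file was written by MATLAB with the usual header text.

```python
def mat_file_format(path):
    """Guess the MAT-file format from the text at the start of the file."""
    with open(path, "rb") as fh:
        header = fh.read(19)
    if header.startswith(b"MATLAB 7.3 MAT-file"):
        return "v7.3 (HDF5-based, read with h5py)"
    if header.startswith(b"MATLAB 5.0 MAT-file"):
        return "v5 (read with scipy.io.loadmat)"
    return "unknown"
```

The annotation file used below reports 'MATLAB 5.0 MAT-file' in its header, so scipy can read it.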

1. Install scipy

Similar to how we use the csv module to work with .csv files, we’ll import the scipy library to work with .mat files in Python.

If you don’t already have scipy, you can install it with pip:

pip install scipy

Now that we have scipy set up and ready to use, the next step is to open up your Python script and get the required data from the file.

2. Import the loadmat function from scipy.io

In this example, I will be using the accordion annotations provided by Caltech, in 101 Object Categories.

from scipy.io import loadmat
annots = loadmat('annotation_0001.mat')
print(annots)

Upon execution, printing out annots would provide us with this as the output.

{'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN, Created on: Tue Dec 14 15:57:03 2004', '__version__': '1.0', '__globals__': [], 'box_coord': array([[  2, 300,   1, 260]], dtype=uint16), 'obj_contour': array([[ 37.16574586,  61.94475138,  89.47697974, 126.92081031,
        169.32044199, 226.03683241, 259.07550645, 258.52486188,
        203.46040516, 177.5801105 , 147.84530387, 117.0092081 ,
          1.37384899,   1.37384899,   7.98158379,   0.82320442,
         16.2412523 ,  31.65930018,  38.81767956,  38.81767956],
       [ 58.59300184,  44.27624309,  23.90239411,   0.77532228,
          2.97790055,  61.34622468, 126.87292818, 214.97605893,
        267.83793738, 270.59116022, 298.67403315, 298.67403315,
        187.99447514,  94.93554328,  90.53038674,  77.31491713,
         62.44751381,  62.99815838,  56.94106814,  56.94106814]])}

Starting off, you can see that the header of this single .mat file records the MAT-file format version, the platform, and the date of its creation.

The parts that we should be focusing on, however, are box_coord and obj_contour.
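One way to separate the actual variables from the file metadata is to skip the keys that start with a double underscore. The sketch below writes a small stand-in file with scipy.io.savemat so that it runs without the Caltech data. The helper name user_variables and the file name example.mat are made up for illustration.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def user_variables(mat_dict):
    """Return the entries of a loadmat() result that are not file metadata."""
    return {k: v for k, v in mat_dict.items() if not k.startswith("__")}


# Stand-in file containing only a box_coord variable.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.mat")
    savemat(path, {"box_coord": np.array([[2, 300, 1, 260]], dtype=np.uint16)})
    annots = loadmat(path)

variables = user_variables(annots)
for name, value in variables.items():
    print(name, value.shape, value.dtype)  # box_coord (1, 4) uint16
```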

3. Parse the .mat file structure

If you’ve gone through the information regarding the Annotations provided by Caltech, you’d know that these numbers are the outlines of the corresponding image in the dataset.

In a little more detail, this means that the object present in image 0001, consists of these outlines. A little further down in the article, we’ll be sorting through the numbers, so, don’t worry about it for now.

Parsing through this file structure, we could assign all the contour values to a new Python list.

con_list = [[element for element in upperElement] for upperElement in annots['obj_contour']]

If we printed out con_list, we would receive a simple nested list with two inner lists, one per row of the array.

[[37.16574585635357, 61.94475138121544, 89.47697974217309, 126.92081031307546, 169.32044198895025, 226.03683241252295, 259.0755064456721, 258.52486187845295, 203.4604051565377, 177.58011049723754, 147.84530386740326, 117.0092081031307, 1.3738489871086301, 1.3738489871086301, 7.98158379373848, 0.8232044198894926, 16.24125230202577, 31.65930018416205, 38.81767955801104, 38.81767955801104], [58.59300184162066, 44.27624309392269, 23.90239410681403, 0.7753222836096256, 2.9779005524862328, 61.34622467771641, 126.87292817679563, 214.97605893186008, 267.83793738489874, 270.59116022099454, 298.6740331491713, 298.6740331491713, 187.9944751381216, 94.93554327808477, 90.53038674033152, 77.31491712707185, 62.44751381215474, 62.998158379373876, 56.94106813996319, 56.94106813996319]]
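As an aside, NumPy arrays offer a tolist() method that produces the same nested list without a comprehension. A minimal sketch, using a shortened stand-in array with rounded values instead of the full obj_contour:

```python
import numpy as np

# Shortened stand-in for annots['obj_contour'] (values rounded for brevity).
obj_contour = np.array([[37.17, 61.94, 89.48],
                        [58.59, 44.28, 23.90]])

# The comprehension from the article and the built-in conversion.
con_list_loop = [[element for element in row] for row in obj_contour]
con_list = obj_contour.tolist()

print(con_list == con_list_loop)  # True
print(type(con_list[0][0]))       # <class 'float'>
```

The difference is that tolist() converts the elements to plain Python floats, while the comprehension keeps them as NumPy scalars.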

4. Use Pandas dataframes to work with the data

Now that you have the information and the data retrieved, how would you work with it? Continue to use lists? Definitely not.

We use DataFrames as the structure to work with, because a DataFrame functions much like a table of data. Neat to look at, and extremely simple to use.

Now, to work with DataFrames, we’ll need to import yet another module, Pandas.

Pandas is an open source data analysis tool, that is used by machine learning enthusiasts and data scientists throughout the world. The operations provided by it are considered vital and fundamental in a lot of data science applications.

We’ll only be working with DataFrames in this article, but, keep in mind that the opportunities provided by Pandas are immense.

Working with the data we’ve received above can be simplified by using pandas to construct a data frame with rows and columns for the data.

import pandas as pd

# zip pairs each x with its y in a tuple.
newData = list(zip(con_list[0], con_list[1]))
columns = ['obj_contour_x', 'obj_contour_y']
df = pd.DataFrame(newData, columns=columns)

Now, we have our data in a neat DataFrame!

    obj_contour_x  obj_contour_y
0       37.165746      58.593002
1       61.944751      44.276243
2       89.476980      23.902394
3      126.920810       0.775322
4      169.320442       2.977901
5      226.036832      61.346225
6      259.075506     126.872928
7      258.524862     214.976059
8      203.460405     267.837937
9      177.580110     270.591160
10     147.845304     298.674033
11     117.009208     298.674033
12       1.373849     187.994475
13       1.373849      94.935543
14       7.981584      90.530387
15       0.823204      77.314917
16      16.241252      62.447514
17      31.659300      62.998158
18      38.817680      56.941068
19      38.817680      56.941068

As you can see, we have the X and Y coordinates for the image’s outline in a simple DataFrame of two columns.
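The same DataFrame can also be built directly from the array by transposing it, which skips the intermediate list. The sketch below uses a shortened stand-in array with rounded values and then reads off the extent of the outline. The variable names x_range and y_range are made up for this example.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for annots['obj_contour']: row 0 holds x, row 1 holds y.
obj_contour = np.array([[37.17, 61.94, 89.48],
                        [58.59, 44.28, 23.90]])

# Transposing turns the 2 x N array into N rows of (x, y) pairs.
df = pd.DataFrame(obj_contour.T, columns=["obj_contour_x", "obj_contour_y"])

# Smallest and largest coordinate of the outline along each axis.
x_range = (df["obj_contour_x"].min(), df["obj_contour_x"].max())
y_range = (df["obj_contour_y"].min(), df["obj_contour_y"].max())
print(df.shape)  # (3, 2)
print(x_range, y_range)
```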

This should provide you with some clarity about the nature of the data in the file.

The process of creating DataFrames for each .mat file is different but, with experience and practice, creating them out of .mat files should come naturally to you.

That’s all for this article!

Conclusion

You now know how to work with .mat files in Python, and how to create DataFrames in pandas from their contents.

The next steps to work with this data would be to create your own models, or employ existing ones for training or testing your copy of the dataset.


How do I convert a mat file to an image?

Convert Mat To Jpg.
Download and install the latest version of Filestar.
Right click on one or more Mat file(s) on your desktop and select Convert with Filestar.
Type convert to jpg in the search box.
Press Convert.

How do you process a .MAT file in Python?

First read the documentation.
Use a hex editor (such as HxD) and look into a reference .mat file you want to parse.
Try to figure out the meaning of each byte by saving the bytes to a . ...
Use classes to save each data element (such as miCOMPRESSED, miMATRIX, mxDOUBLE, or miINT32).
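As a small illustration of this byte-level approach, the fixed 128-byte header of a version 5 MAT-file can be decoded with the standard struct module. The sketch assumes the layout given in the MAT-file format documentation: 116 bytes of descriptive text, an 8-byte subsystem data offset, a 2-byte version, and a 2-byte endian indicator. The helper name parse_mat5_header is made up for this example.

```python
import struct


def parse_mat5_header(raw):
    """Decode the 128-byte header of a version 5 MAT-file."""
    if len(raw) < 128:
        raise ValueError("a MAT-file header is 128 bytes long")
    text = raw[:116].decode("ascii", errors="replace").rstrip(" \x00")
    # The indicator is the 16-bit value 'MI'; if the bytes appear as 'IM',
    # the file was written little-endian.
    indicator = raw[126:128]
    if indicator == b"IM":
        byte_order = "<"
    elif indicator == b"MI":
        byte_order = ">"
    else:
        raise ValueError("unexpected endian indicator")
    (version,) = struct.unpack(byte_order + "H", raw[124:126])
    return {
        "text": text,
        "version": version,
        "little_endian": byte_order == "<",
    }
```

Reading the first 128 bytes of a file with open(path, 'rb').read(128) and passing them to this function returns the descriptive text together with the version number, which is 0x0100 for version 5 files. The data elements (miMATRIX and so on) follow after the header and are not handled by this sketch.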

How do you convert .MAT to .jpg in MATLAB?

% Load the image variable from the MAT file into a structure s.
s = load(yourFileName);
% Extract the image from the structure.
rgbImage = s. ...
% Write to disk in a jpg format, which you should never use for image analysis.
imwrite(rgbImage, 'my image.jpg');