Save sparse matrix to file python

scipy.sparse.save_npz(file, matrix, compressed=True)

Save a sparse matrix to a file using .npz format.

Parameters

file: str or file-like object

Either the file name (string) or an open file (file-like object) where the data will be saved. If file is a string, the .npz extension will be appended to the file name if it is not already there.

matrix: spmatrix (format: ``csc``, ``csr``, ``bsr``, ``dia`` or ``coo``)

The sparse matrix to save.

compressed: bool, optional

Allow compressing the file. Default: True

Examples

Store sparse matrix to disk, and load it again:

>>> import numpy as np
>>> import scipy.sparse
>>> sparse_matrix = scipy.sparse.csc_matrix(np.array([[0, 0, 3], [4, 0, 0]]))
>>> sparse_matrix
<2x3 sparse matrix of type '<class 'numpy.int64'>'
   with 2 stored elements in Compressed Sparse Column format>
>>> sparse_matrix.toarray()
array([[0, 0, 3],
       [4, 0, 0]], dtype=int64)

>>> scipy.sparse.save_npz('/tmp/sparse_matrix.npz', sparse_matrix)
>>> sparse_matrix = scipy.sparse.load_npz('/tmp/sparse_matrix.npz')

>>> sparse_matrix
<2x3 sparse matrix of type '<class 'numpy.int64'>'
   with 2 stored elements in Compressed Sparse Column format>
>>> sparse_matrix.toarray()
array([[0, 0, 3],
       [4, 0, 0]], dtype=int64)
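The parameter descriptions above can also be checked with a small round trip. This is a minimal sketch, assuming a temporary directory and a hypothetical base name `sparse_matrix`; it passes a file name without the .npz extension and sets compressed=False, then loads the file back.

```python
import os
import tempfile

import numpy as np
import scipy.sparse

matrix = scipy.sparse.csr_matrix(np.array([[0, 0, 3], [4, 0, 0]]))

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical base name, deliberately given without an extension
    base = os.path.join(tmpdir, "sparse_matrix")
    scipy.sparse.save_npz(base, matrix, compressed=False)

    # save_npz appends .npz to a string file name that lacks it
    saved_path = base + ".npz"
    extension_added = os.path.exists(saved_path)

    # load_npz needs the full file name, including the extension
    restored = scipy.sparse.load_npz(saved_path)

# The sparse format and the values both survive the round trip
same_format = restored.format == matrix.format
same_values = np.array_equal(restored.toarray(), matrix.toarray())
```

Unlike the Matrix Market route, the sparse format (CSR here) is stored in the file, so no conversion is needed after loading.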

Here is a performance comparison of the three most upvoted answers, run in a Jupyter notebook. The input is a 1M x 100K random sparse matrix with density 0.001, containing 100M non-zero values:

from scipy.sparse import random
matrix = random(1000000, 100000, density=0.001, format='csr')

matrix
<1000000x100000 sparse matrix of type '<type 'numpy.float64'>'
with 100000000 stored elements in Compressed Sparse Row format>

io.mmwrite / io.mmread

from scipy import io

%time io.mmwrite('test_io.mtx', matrix)
CPU times: user 4min 37s, sys: 2.37 s, total: 4min 39s
Wall time: 4min 39s

%time matrix = io.mmread('test_io.mtx')
CPU times: user 2min 41s, sys: 1.63 s, total: 2min 43s
Wall time: 2min 43s    

matrix
<1000000x100000 sparse matrix of type '<type 'numpy.float64'>'
with 100000000 stored elements in COOrdinate format>    

Filesize: 3.0G.

(note that the format has been changed from csr to coo).
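If the CSR format is needed after reading a Matrix Market file, the loaded matrix can be converted back explicitly. The following is a minimal sketch on a tiny matrix, assuming a temporary directory and a hypothetical file name `example.mtx`; it is not part of the benchmark above.

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse

original = scipy.sparse.csr_matrix(
    np.array([[0.0, 0.0, 3.0], [4.0, 0.0, 0.0]])
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.mtx")  # hypothetical file name
    scipy.io.mmwrite(path, original)
    loaded = scipy.io.mmread(path)  # sparse input comes back in COO format

# Convert back to CSR explicitly; the values are unchanged
restored = loaded.tocsr()
```

The conversion adds some time and memory on top of the read itself, which the timings above do not include.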

np.savez / np.load

import numpy as np
from scipy.sparse import csr_matrix

def save_sparse_csr(filename, array):
    # note that .npz extension is added automatically
    np.savez(filename, data=array.data, indices=array.indices,
             indptr=array.indptr, shape=array.shape)

def load_sparse_csr(filename):
    # here we need to add .npz extension manually
    loader = np.load(filename + '.npz')
    return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                      shape=loader['shape'])


%time save_sparse_csr('test_savez', matrix)
CPU times: user 1.26 s, sys: 1.48 s, total: 2.74 s
Wall time: 2.74 s    

%time matrix = load_sparse_csr('test_savez')
CPU times: user 1.18 s, sys: 548 ms, total: 1.73 s
Wall time: 1.73 s

matrix
<1000000x100000 sparse matrix of type '<type 'numpy.float64'>'
with 100000000 stored elements in Compressed Sparse Row format>

Filesize: 1.1G.

cPickle

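try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3: the C implementation is used automatically

def save_pickle(matrix, filename):
    # HIGHEST_PROTOCOL selects the newest binary protocol available
    with open(filename, 'wb') as outfile:
        pickle.dump(matrix, outfile, pickle.HIGHEST_PROTOCOL)

def load_pickle(filename):
    with open(filename, 'rb') as infile:
        matrix = pickle.load(infile)
    return matrix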

%time save_pickle(matrix, 'test_pickle.mtx')
CPU times: user 260 ms, sys: 888 ms, total: 1.15 s
Wall time: 1.15 s    

%time matrix = load_pickle('test_pickle.mtx')
CPU times: user 376 ms, sys: 988 ms, total: 1.36 s
Wall time: 1.37 s    

matrix
<1000000x100000 sparse matrix of type '<type 'numpy.float64'>'
with 100000000 stored elements in Compressed Sparse Row format>

Filesize: 1.1G.

Note: cPickle does not work with very large objects (see this answer). In my experience, it failed for a 2.7M x 50K matrix with 270M non-zero values, while the np.savez solution worked well.
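On Python 3 (an assumption, since the timings above were taken with Python 2's cPickle), the pickle documentation states that protocol 4, added in Python 3.4, supports very large objects. The sketch below only shows how to request that protocol on a tiny matrix with a hypothetical file name; it does not reproduce the 270M-element case, so whether it resolves that particular failure is untested here.

```python
import os
import pickle
import tempfile

import numpy as np
import scipy.sparse

matrix = scipy.sparse.csr_matrix(np.array([[0, 0, 3], [4, 0, 0]]))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "matrix.pkl")  # hypothetical file name

    # Request protocol 4 explicitly (available since Python 3.4)
    with open(path, "wb") as outfile:
        pickle.dump(matrix, outfile, protocol=4)

    with open(path, "rb") as infile:
        restored = pickle.load(infile)

# The unpickled object keeps its sparse format and values
same_values = np.array_equal(restored.toarray(), matrix.toarray())
```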

Conclusion

(Based on this simple test for CSR matrices.) cPickle is the fastest method, but it doesn't work with very large matrices. np.savez is only slightly slower, while io.mmwrite is much slower, produces a bigger file, and restores the matrix in a different format (COO instead of CSR). So np.savez is the winner here.

How do I save a sparse matrix in Python?

Use scipy.sparse.save_npz to save a sparse matrix to a file in .npz format. The file argument is either the file name (string) or an open file (file-like object) where the data will be saved.

How do you read a sparse matrix in Python?

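To read a matrix back from an .npz file, use scipy.sparse.load_npz. To check whether a matrix is sparse in the first place, count the elements that are equal to zero: if this count is more than (m * n)/2, where m and n are the numbers of rows and columns, the matrix is considered sparse.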
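The zero-count check can be sketched in a few lines. The function name `is_sparse` is hypothetical and the input is assumed to be a dense 2D array or nested list.

```python
import numpy as np

def is_sparse(dense):
    """Return True if more than half of the entries are zero."""
    dense = np.asarray(dense)
    m, n = dense.shape
    # Count the entries that are exactly zero
    zero_count = np.count_nonzero(dense == 0)
    return zero_count > (m * n) / 2

mostly_zero = is_sparse([[0, 0, 3], [4, 0, 0]])  # 4 of 6 entries are zero
mostly_full = is_sparse([[1, 2], [3, 0]])        # 1 of 4 entries is zero
```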

How do you convert sparse to matrix in Python?

Approach:
Create an empty list which will represent the sparse matrix list.
Iterate through the 2D matrix to find non-zero elements.
If an element is non-zero, create a temporary empty list.
Append the row value, column value, and the non-zero element itself into the temporary list.
Append the temporary list to the sparse matrix list.
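The approach above can be sketched in plain Python as follows. The function name `to_triplets` is hypothetical, and the sketch assumes that each temporary list is appended to the result list once it is filled.

```python
def to_triplets(dense):
    """Convert a dense 2D list into a list of [row, column, value] entries."""
    sparse_list = []  # the list that represents the sparse matrix
    for row_index, row in enumerate(dense):
        for col_index, value in enumerate(row):
            if value != 0:
                entry = []  # temporary list for one non-zero element
                entry.append(row_index)
                entry.append(col_index)
                entry.append(value)
                sparse_list.append(entry)
    return sparse_list

triplets = to_triplets([[0, 0, 3], [4, 0, 0]])
print(triplets)  # [[0, 2, 3], [1, 0, 4]]
```

This row, column, value layout is the same idea as the COO format mentioned earlier.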

What is lil_matrix?

scipy.sparse.lil_matrix is a row-based list-of-lists sparse matrix. lil_matrix((M, N), [dtype]) constructs an empty matrix with shape (M, N); dtype is optional, defaulting to dtype='d'. Sparse matrices can be used in arithmetic operations: they support addition, subtraction, multiplication, division, and matrix power.
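As a brief illustration, the sketch below builds an empty lil_matrix, fills two entries, and adds the matrix to itself. The shape and values are arbitrary examples. The final conversion to CSR is an extra step, shown because save_npz does not list LIL among its supported formats.

```python
from scipy.sparse import lil_matrix

# Empty 3x4 matrix; dtype defaults to 'd' (float64)
m = lil_matrix((3, 4))

# Assign individual entries
m[0, 1] = 2.0
m[2, 3] = 5.0

# Arithmetic works on sparse matrices
doubled = m + m

# Convert to CSR, e.g. before saving with save_npz
csr = m.tocsr()
```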