Fast and full-featured Matrix Market file I/O package for Python.
Fastest way to read and write any Matrix Market .mtx file into a SciPy sparse matrix, sparse coordinate (triplet) ndarrays, or a dense ndarray.
Implemented as a Python binding of the C++ fast_matrix_market library.
pip install fast_matrix_market
conda install fast_matrix_market
As of version 1.12, scipy.io.mmread
and scipy.io.mmwrite
are based on fast_matrix_market. If those methods suit your needs then there is no need to use this package.
The following are extra features supported by the stand-alone FMM:
- Directly write CSC/CSR matrices with no COO intermediary.
- Vector files
  Read 1D vector files. scipy.io.mmread() throws a ValueError.
- longdouble
  Read and write longdouble/longcomplex values for more floating-point precision on platforms that support it (e.g. 80-bit floats). Just pass the long_type=True argument to any read method to use longdouble arrays. SciPy can write longdouble matrices but reads use double precision.
  Note: Many platforms do not offer any precision greater than double even if the longdouble type exists. On those platforms longdouble == double, so check your NumPy for support. As of writing only Linux tends to have longdouble > double.
  Deprecation Warning: this type is going away in future versions of NumPy and SciPy.
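Whether long_type=True gains anything depends on the platform. A minimal sketch of the NumPy check suggested above; the helper name has_extended_longdouble is illustrative and not part of fast_matrix_market:

```python
import numpy as np


def has_extended_longdouble() -> bool:
    """Return True if np.longdouble is more precise than np.float64 here."""
    # Machine epsilon shrinks as the mantissa grows, so a smaller eps
    # means longdouble really carries more precision than double.
    return bool(np.finfo(np.longdouble).eps < np.finfo(np.float64).eps)


if has_extended_longdouble():
    print("longdouble is wider than double; long_type=True can keep extra digits")
else:
    print("longdouble == double on this platform; long_type=True gains nothing")
```

Where the check succeeds, a call such as fmm.read_array("matrix.mtx", long_type=True) should return a longdouble array ("matrix.mtx" is a placeholder file name).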
FMM also ships wheels for PyPy and for some older Python versions only supported by older versions of SciPy.
The fast_matrix_market.mmread()
and mmwrite()
methods are direct replacements for scipy.io.mmread
and mmwrite
.
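Because the function names match, a script can pick whichever implementation is installed. A small sketch of that idea; load_mm_backend is a hypothetical helper name, not something either package provides:

```python
def load_mm_backend():
    """Return a module exposing mmread/mmwrite, or None if none is installed."""
    try:
        import fast_matrix_market as backend  # stand-alone package
    except ImportError:
        try:
            import scipy.io as backend  # offers the same two function names
        except ImportError:
            return None
    return backend


mmio = load_mm_backend()
if mmio is not None:
    print("Matrix Market backend:", mmio.__name__)
```

Code written against mmio.mmread and mmio.mmwrite then works with either backend, as long as it sticks to arguments both accept.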
Compared to SciPy v1.10.0:
- Significant performance boost
  The bytes in the plot refer to Matrix Market file length. All cores on the system are used by default; use the parallelism argument to override. SciPy's routines are single-threaded.
- 64-bit indices, but only if the matrix dimensions require it.
  scipy.io.mmread() crashes on large matrices (dimensions > 2^31) because it uses 32-bit indices on most platforms.
- See comparison with SciPy 1.12.
  scipy.io.mmwrite() will search the matrix for symmetry if the symmetry argument is not specified. This is a very slow process that significantly impacts writing time for all matrices, including non-symmetric ones. It can be disabled by setting symmetry="general", but that is easily forgotten. fast_matrix_market.mmwrite() only looks for symmetries if the find_symmetry=True argument is passed.
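The opt-in symmetry search can be sketched as follows. This assumes NumPy and fast_matrix_market are installed and skips the demonstration otherwise; write_to_bytes is an illustrative helper, and no particular output is claimed:

```python
import io

try:
    import numpy as np
    import fast_matrix_market as fmm
except ImportError:  # keep the sketch importable without the packages
    fmm = None


def write_to_bytes(matrix, **kwargs) -> bytes:
    """Write `matrix` to an in-memory Matrix Market file and return its bytes."""
    buf = io.BytesIO()
    fmm.mmwrite(buf, matrix, **kwargs)
    return buf.getvalue()


if fmm is not None:
    sym = np.array([[2.0, 1.0], [1.0, 3.0]])  # small symmetric example

    # Default: no symmetry search is performed.
    fast = write_to_bytes(sym)
    # Opt in to the search only when a symmetric file is actually wanted.
    searched = write_to_bytes(sym, find_symmetry=True)

    # The first line of each file is the banner, which names the symmetry.
    print(fast.splitlines()[0])
    print(searched.splitlines()[0])
```

Comparing the two banner lines shows whether the search changed the symmetry field recorded in the file.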
>>> import fast_matrix_market as fmm
>>> a = fmm.mmread("eye3.mtx")
>>> a
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in COOrdinate format>
>>> print(a)
(0, 0) 1.0
(1, 1) 1.0
(2, 2) 1.0
>>> (data, (rows, cols)), shape = fmm.read_coo("eye3.mtx")
>>> rows, cols, data
(array([0, 1, 2], dtype=int32), array([0, 1, 2], dtype=int32), array([1., 1., 1.]))
>>> a = fmm.read_array("eye3.mtx")
>>> a
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
>>> fmm.mmwrite("matrix_out.mtx", a)
>>> import io
>>> bio = io.BytesIO()
>>> fmm.mmwrite(bio, a)
>>> header = fmm.read_header("eye3.mtx")
>>> header
header(shape=(3, 3), nnz=3, comment="3-by-3 identity matrix", object="matrix", format="coordinate", field="real", symmetry="general")
>>> header.shape
(3, 3)
>>> header.to_dict()
{'shape': (3, 3), 'nnz': 3, 'comment': '3-by-3 identity matrix', 'object': 'matrix', 'format': 'coordinate', 'field': 'real', 'symmetry': 'general'}
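The session above reads a file named eye3.mtx. The author's exact file is not shown, so the content below is an assumption: a standard coordinate-format file chosen to match the header printed above. The sketch uses only the standard library and also picks the banner apart by hand to show where the header fields come from:

```python
from pathlib import Path

# Matrix Market coordinate files are plain text: a banner line, optional
# '%' comment lines, a "rows cols nnz" size line, then one 1-based
# "row col value" entry per line.
EYE3 = """\
%%MatrixMarket matrix coordinate real general
%3-by-3 identity matrix
3 3 3
1 1 1.0
2 2 1.0
3 3 1.0
"""

path = Path("eye3.mtx")
path.write_text(EYE3)

# Split the banner and the size line manually.
lines = path.read_text().splitlines()
_, obj, fmt, field, symmetry = lines[0].split()
size_line = next(ln for ln in lines[1:] if not ln.startswith("%"))
n_rows, n_cols, nnz = (int(tok) for tok in size_line.split())
print(obj, fmt, field, symmetry, (n_rows, n_cols), nnz)
# prints: matrix coordinate real general (3, 3) 3
```

Indices in the file are 1-based, while the arrays returned by the read methods above are 0-based.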
All methods other than read_header
and mminfo
accept a parallelism
parameter that controls the number of threads used. Default parallelism is equal to the core count of the system.
mat = fmm.mmread("matrix.mtx", parallelism=2) # will use 2 threads
Alternatively, use threadpoolctl:
import threadpoolctl
with threadpoolctl.threadpool_limits(limits=2):
    mat = fmm.mmread("matrix.mtx") # will use 2 threads
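To judge which thread count suits a given machine and file, a simple timing helper is enough. The sketch below uses only the standard library; best_time is an illustrative name, and the workload is a stand-in so the snippet runs without any Matrix Market file:

```python
import time


def best_time(func, *args, repeat=3, **kwargs):
    """Call func(*args, **kwargs) `repeat` times; return the fastest wall time in seconds."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload so the sketch runs anywhere; swap in a real reader, e.g.
#   best_time(fmm.mmread, "matrix.mtx", parallelism=1)
#   best_time(fmm.mmread, "matrix.mtx", parallelism=4)
elapsed = best_time(sum, range(100_000))
print(f"best of 3: {elapsed:.6f} s")
```

Taking the minimum over a few repeats reduces noise from other processes and from cold file caches.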
Replace scipy.io.mmread
with fast_matrix_market.mmread
to quickly see if your scripts would benefit from a refactor:
import scipy.io
import fast_matrix_market as fmm
scipy.io.mmread = fmm.mmread
scipy.io.mmwrite = fmm.mmwrite
- No dependencies to read/write MatrixMarket headers (i.e. read_header(), mminfo()).
- numpy to read/write arrays (i.e. read_array() and read_coo()). SciPy is not required.
- scipy to read/write scipy.sparse sparse matrices (i.e. read_scipy() and mmread()).
Neither numpy
nor scipy
are listed as package dependencies, and those packages are imported only by the methods that need them.
This means that you may use read_coo()
without having SciPy installed.
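Working with the triplets directly needs nothing beyond NumPy. A minimal sketch that turns COO triplets into a dense array; coo_to_dense is a hypothetical helper, and the triplets are typed in by hand to match the eye3.mtx example rather than read from disk:

```python
import numpy as np


def coo_to_dense(data, rows, cols, shape):
    """Build a dense array from COO triplets, summing duplicate entries."""
    dense = np.zeros(shape, dtype=np.asarray(data).dtype)
    # np.add.at accumulates repeated (row, col) pairs instead of overwriting.
    np.add.at(dense, (rows, cols), data)
    return dense


# With the package installed these would come from:
#   (data, (rows, cols)), shape = fmm.read_coo("eye3.mtx")
data = np.array([1.0, 1.0, 1.0])
rows = np.array([0, 1, 2], dtype=np.int32)
cols = np.array([0, 1, 2], dtype=np.int32)
print(coo_to_dense(data, rows, cols, (3, 3)))
```

For files that are meant to be dense anyway, read_array() returns the ndarray directly.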
This Python binding is implemented using pybind11 and built with scikit-build-core.
All code is in the python/ directory. If you make any changes simply install the package directory to build it:
pip install python/ -v