Heat 1.5 Release: distributed matrix factorization and more

@ClaudiaComito released this 28 Oct 12:31
7e15ad2

Heat 1.5 Release Notes


Overview

With Heat 1.5 we release the first set of features developed within the ESAPCA project funded by the European Space Agency (ESA).

The main focus of this release is on distributed linear algebra operations, such as tall-skinny SVD, batch matrix multiplication, and a solver for triangular systems. We also introduce vectorization across MPI processes via vmap, and make batch-parallel random number generation the default for distributed operations.

This release also includes a new class for distributed Compressed Sparse Column matrices, paving the way for future implementation of distributed sparse matrix multiplication.

On the performance side, our new array redistribution via MPI Custom Datatypes provides significant speed-up in operations that require it, such as FFTs (see Dalcin et al., 2018).

We are grateful to our community of users, students, open-source contributors, the European Space Agency and the Helmholtz Association for their support and feedback.

Highlights

  • [ESAPCA] Distributed tall-skinny SVD: ht.linalg.svd (by @mrfh92)
  • Distributed batch matrix multiplication: ht.linalg.matmul (by @FOsterfeld)
  • Distributed solver for triangular systems: ht.linalg.solve_triangular (by @FOsterfeld)
  • Vectorization across MPI processes: ht.vmap (by @mrfh92)
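To illustrate the idea behind a tall-skinny SVD, here is a minimal single-process sketch in plain NumPy. It is not Heat's implementation: the row blocks stand in for the local chunks held by MPI processes, the function name tall_skinny_svd is hypothetical, and the two-level QR merge is one common way to do this, assumed here for illustration.

```python
import numpy as np


def tall_skinny_svd(blocks):
    """SVD of the matrix obtained by stacking `blocks` row-wise.

    Hypothetical helper: each block plays the role of one process's
    local rows of a tall-skinny matrix (many rows, few columns).
    """
    # Local step: reduced QR of every block, independently of the others.
    local = [np.linalg.qr(b) for b in blocks]
    q_local = [q for q, _ in local]
    r_local = [r for _, r in local]

    # Merge step: QR of the stacked (small) R factors.
    q_merge, r = np.linalg.qr(np.vstack(r_local))

    # SVD of the small square factor only.
    u_small, s, vh = np.linalg.svd(r)

    # Cut the merge factor back into one piece per block.
    offsets = np.cumsum([ri.shape[0] for ri in r_local])[:-1]
    q_merge_blocks = np.split(q_merge, offsets, axis=0)

    # Each block of U is computed from data local to that block.
    u_blocks = [q @ qm @ u_small for q, qm in zip(q_local, q_merge_blocks)]
    return u_blocks, s, vh


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((1000, 8))
    u_blocks, s, vh = tall_skinny_svd(np.array_split(a, 4))
    print(np.allclose(np.vstack(u_blocks) * s @ vh, a))
```

Only the small R factors need to be combined across blocks, which is why this pattern suits matrices with far more rows than columns.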

Other Changes

Performance Improvements

  • #1493 Redistribution speed-up via MPI Custom Datatypes available by default in ht.resplit (by @JuanPedroGHM)

Sparse

  • #1377 New class: Distributed Compressed Sparse Column Matrix ht.sparse.DCSC_matrix() (by @Mystic-Slice)

Signal Processing

RNG

  • #1508 Introduce batch-parallel RNG as default for distributed operations (by @mrfh92)

Statistics

  • #1420 Support sketched percentile/median for large datasets with ht.percentile(sketched=True) (and ht.median) (by @mrfh92)
  • #1510 Support multiple axes for distributed ht.percentile and ht.median (by @ClaudiaComito)
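As a rough illustration of what a sketched percentile trades off, the NumPy snippet below estimates a percentile from a random subsample instead of the full data. This assumes that sketching means uniform subsampling; Heat's actual method may differ, and the function name and the sketch_size and seed parameters are hypothetical, chosen only for this example.

```python
import numpy as np


def sketched_percentile(x, q, sketch_size=0.01, seed=None):
    """Estimate the q-th percentile of x from a random fraction of its entries.

    Hypothetical helper: `sketch_size` is the fraction of entries to sample.
    """
    rng = np.random.default_rng(seed)
    flat = np.asarray(x).ravel()
    # Use at least one sample so tiny inputs still work.
    n_sample = max(1, int(sketch_size * flat.size))
    sample = rng.choice(flat, size=n_sample, replace=False)
    return np.percentile(sample, q)


if __name__ == "__main__":
    data = np.random.default_rng(0).standard_normal(200_000)
    print(sketched_percentile(data, 50, sketch_size=0.05, seed=1))
    print(np.percentile(data, 50))
```

The estimate is approximate by construction; a larger sample fraction reduces the error at the cost of touching more data.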

Manipulations

I/O

  • #1602 Improve load balancing when loading .npy files from path (by @Reisii)
  • #1551 Improve load balancing when loading .csv files from path (by @Reisii)

Machine Learning

  • #1593 Improved batch-parallel clustering ht.cluster.BatchParallelKMeans and ht.cluster.BatchParallelKMedians (by @mrfh92)
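The batch-parallel clustering pattern can be sketched in plain NumPy as follows. This is an assumption-laden illustration rather than Heat's code: it assumes the scheme is "cluster each batch locally, then cluster the collected local centroids", the batches stand in for per-process data, and the helper names kmeans and batch_parallel_kmeans are hypothetical.

```python
import numpy as np


def kmeans(x, k, n_iter=20):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    x = np.asarray(x, dtype=float)
    centers = [x[0]]
    for _ in range(k - 1):
        # Distance of every point to its nearest already chosen center.
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            # Leave a center unchanged if no point is assigned to it.
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers


def batch_parallel_kmeans(batches, k):
    """Cluster each batch independently, then cluster the local centroids."""
    local_centroids = np.vstack([kmeans(b, k) for b in batches])
    return kmeans(local_centroids, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
    points = np.vstack([c + 0.5 * rng.standard_normal((300, 2)) for c in true_centers])
    rng.shuffle(points)
    print(batch_parallel_kmeans(np.array_split(points, 4), k=3))
```

The local step needs no communication between batches; only the small set of local centroids has to be gathered for the final merge.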

Deep Learning

Other Updates

Contributors

@mrfh92, @FOsterfeld, @JuanPedroGHM, @Mystic-Slice, @ClaudiaComito, @Reisii, @mtar and @krajsek