AWS ParallelCluster v2.4.1

@demartinofra demartinofra released this 29 Jul 10:42
8f5359f

We're excited to announce the release of AWS ParallelCluster 2.4.1.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

Docs

New docs are available here: https://docs.aws.amazon.com/parallelcluster/latest/ug/

Enhancements

  • Add support for ap-east-1 region (Hong Kong)
  • Allow specifying the instance type used when building custom AMIs with pcluster createami
  • Speed up cluster creation by launching compute nodes at the same time as the master node
  • Enable ASG CloudWatch metrics for the ASG managing compute nodes
  • Install Intel MPI 2019u4 on Amazon Linux, CentOS 7 and Ubuntu 16.04
  • Upgrade Elastic Fabric Adapter (EFA) to version 1.4.1, which supports Intel MPI
  • Run all node daemons and cookbook recipes in isolated Python virtualenvs. This lets our code always run with the required Python dependencies and resolves the conflicts and runtime failures that were caused by user packages installed in the system Python
  • Torque:
    • Process nodes added to or removed from the cluster in batches in order to speed up cluster scaling
    • Scale up only if required CPU/nodes can be satisfied
    • Scale down if pending jobs have unsatisfiable CPU/nodes requirements
    • Add support for jobs in hold/suspended state (this includes job dependencies)
    • Automatically terminate and replace faulty or unresponsive compute nodes
    • Add retries in case of failures when adding or removing nodes
    • Add support for ncpus reservation and multi-node resource allocation (e.g. -l nodes=2:ppn=3+3:ppn=6)
    • Optimize the Torque global configuration to react faster to dynamic cluster scaling
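
To make the multi-node request syntax above concrete, here is a minimal sketch that totals the CPUs a spec such as nodes=2:ppn=3+3:ppn=6 asks for (2 nodes with 3 processors each plus 3 nodes with 6 processors each). The helper name total_cpus is hypothetical, and the sketch assumes every "+"-separated chunk starts with a numeric node count. It is not the logic the ParallelCluster scaling daemons use, which these notes do not describe.

```shell
#!/bin/sh
# Hypothetical helper: total the CPUs requested by a Torque "nodes" spec.
# Assumption: each "+"-separated chunk is COUNT[:ppn=N] with a numeric COUNT;
# ppn defaults to 1 when it is omitted.
total_cpus() {
    spec=${1#nodes=}              # drop an optional leading "nodes="
    total=0
    old_ifs=$IFS
    IFS='+'                       # split the spec on "+"
    for chunk in $spec; do
        count=${chunk%%:*}        # text before the first ":" is the node count
        case $chunk in
            *ppn=*) ppn=${chunk##*ppn=}; ppn=${ppn%%:*} ;;
            *)      ppn=1 ;;
        esac
        total=$((total + count * ppn))
    done
    IFS=$old_ifs
    echo "$total"
}

total_cpus "nodes=2:ppn=3+3:ppn=6"   # 2*3 + 3*6 = 24
```

A job submitted with that resource list therefore needs 24 processors spread over 5 nodes before it can start.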

Changes

  • Update the EFA installer to a new version. Note that this changes the location of mpicc and mpirun. To avoid breaking existing code, we recommend loading the modulefile with module load openmpi and using which mpicc for anything that requires the full path
  • Eliminate Launch Configuration and use Launch Templates in all the regions
  • Torque: upgrade to version 6.1.2
  • Run all ParallelCluster daemons with Python 3.6 in a virtualenv. The daemon code now supports Python >= 3.5
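
Because the EFA installer update moves mpicc and mpirun, scripts that hardcode their paths may break. The sketch below shows one way to resolve the path at run time, following the recommendation above. The helper name resolve_tool is hypothetical, and command -v is used here as a POSIX stand-in for which. The module command is only called when it exists in the current shell.

```shell
#!/bin/sh
# Hypothetical helper: print the full path of a tool, or fail with a message.
# command -v is used as a portable substitute for "which".
resolve_tool() {
    command -v "$1" 2>/dev/null || {
        echo "error: $1 not found in PATH" >&2
        return 1
    }
}

# Load the Open MPI modulefile when the module command is available
# (it is typically a shell function provided by Environment Modules).
if command -v module >/dev/null 2>&1; then
    module load openmpi || echo "warning: could not load openmpi module" >&2
fi

# Resolve mpicc dynamically instead of relying on a fixed install location.
if MPICC=$(resolve_tool mpicc); then
    echo "mpicc resolved to: $MPICC"
else
    echo "mpicc is not available in this environment" >&2
fi
```

Build scripts can then invoke "$MPICC" rather than a fixed path, so they keep working if a later EFA installer moves the binaries again.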

Bug Fixes

  • Fix issue with sanity check at creation time that was preventing clusters from being created in private subnets
  • Fix pcluster configure when relative config path is used
  • Make the FSx substack depend on ComputeSecurityGroupIngress so that FSx is not created before the security group allows traffic within itself
  • Restore correct value for filehandle_limit that was getting reset when setting memory_limit for EFA
  • Torque: fix the compute node locking mechanism to prevent job scheduling on nodes that are being terminated
  • Restore the logic that automatically added compute node identities to the SSH known_hosts file
  • Slurm: fix issue that was causing the ParallelCluster daemons to fail when the cluster is stopped and an empty compute nodes file is imported in Slurm config

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster issue tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192