Merge Release 2.4.1
demartinofra authored Jul 29, 2019
2 parents f4d9378 + 22f26e0 commit 8f5359f
Showing 66 changed files with 2,110 additions and 1,454 deletions.
3 changes: 2 additions & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -1,6 +1,7 @@
**Please See** [Git Pull Request Instructions](https://github.com/aws/aws-parallelcluster/wiki/Git-Pull-Request-Instructions)

*Issue #, if available:*

*Description of changes:*


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
48 changes: 48 additions & 0 deletions CHANGELOG.rst
@@ -2,6 +2,54 @@
CHANGELOG
=========

2.4.1
=====

**ENHANCEMENTS**

* Add support for ap-east-1 region (Hong Kong)
* Add possibility to specify instance type to use when building custom AMIs with ``pcluster createami``
* Speed up cluster creation by starting compute nodes together with the master node
* Enable ASG CloudWatch metrics for the ASG managing compute nodes
* Install Intel MPI 2019u4 on Amazon Linux, CentOS 7 and Ubuntu 16.04
* Upgrade Elastic Fabric Adapter (EFA) to version 1.4.1 that supports Intel MPI
* Run all node daemons and cookbook recipes in isolated Python virtualenvs. This allows our code to always run with the
required Python dependencies and solves all conflicts and runtime failures that were being caused by user packages
installed in the system Python

* Torque:

* Process nodes added to or removed from the cluster in batches in order to speed up cluster scaling
* Scale up only if required CPU/nodes can be satisfied
* Scale down if pending jobs have unsatisfiable CPU/nodes requirements
* Add support for jobs in hold/suspended state (this includes job dependencies)
* Automatically terminate and replace faulty or unresponsive compute nodes
* Add retries in case of failures when adding or removing nodes
* Add support for ncpus reservation and multi nodes resource allocation (e.g. -l nodes=2:ppn=3+3:ppn=6)
* Optimize Torque global configuration to react faster to dynamic cluster scaling
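
To make the multi-node resource syntax in the Torque entries concrete, here is a minimal sketch of how a request such as ``nodes=2:ppn=3+3:ppn=6`` breaks down into node and processor counts. The ``parse_nodes_spec`` helper is hypothetical and is not part of ParallelCluster or Torque. It only handles ``count:ppn=N`` chunks joined by ``+``, and it assumes one processor per node when ``ppn`` is omitted.

```python
def parse_nodes_spec(spec):
    """Return (total_nodes, total_slots) for a spec like "2:ppn=3+3:ppn=6"."""
    total_nodes = 0
    total_slots = 0
    for chunk in spec.split("+"):
        fields = chunk.split(":")
        count = int(fields[0])
        ppn = 1  # assumption: one processor per node when ppn is not given
        for field in fields[1:]:
            if field.startswith("ppn="):
                ppn = int(field[len("ppn="):])
        total_nodes += count
        total_slots += count * ppn
    return total_nodes, total_slots


# 2 nodes x 3 processors plus 3 nodes x 6 processors
print(parse_nodes_spec("2:ppn=3+3:ppn=6"))  # (5, 24)
```

A scaling decision of the kind described in the list (scale up only if the required CPUs and nodes can be satisfied) would compare totals like these against what the cluster is allowed to launch.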

**CHANGES**

* Update EFA installer to a new version. Note that this changes the location of ``mpicc`` and ``mpirun``.
To avoid breaking existing code, we recommend loading the modulefile with ``module load openmpi`` and using
``which mpicc`` for anything that requires the full path
* Eliminate Launch Configuration and use Launch Templates in all the regions
* Torque: upgrade to version 6.1.2
* Run all ParallelCluster daemons with Python 3.6 in a virtualenv. Daemons code now supports Python >= 3.5
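
Because the EFA installer update moves ``mpicc`` and ``mpirun``, scripts that hardcode their location may break. As a rough sketch, a Python script could resolve the tools from ``PATH`` instead. This assumes ``module load openmpi`` has already been run in the calling shell so that ``PATH`` is set up. ``resolve_tool`` is a hypothetical helper name.

```python
import shutil


def resolve_tool(name):
    """Return the absolute path of `name` found on PATH, or None if it is absent."""
    return shutil.which(name)


for tool in ("mpicc", "mpirun"):
    path = resolve_tool(tool)
    print(tool, "->", path if path else "not found on PATH")
```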

**BUG FIXES**

* Fix issue with sanity check at creation time that was preventing clusters from being created in private subnets
* Fix pcluster configure when relative config path is used
* Make FSx Substack depend on ComputeSecurityGroupIngress to keep FSx from trying to create prior to the SG
allowing traffic within itself
* Restore correct value for ``filehandle_limit`` that was getting reset when setting ``memory_limit`` for EFA
* Torque: fix compute nodes locking mechanism to prevent job scheduling on nodes being terminated
* Restore logic that was automatically adding compute nodes identity to SSH ``known_hosts`` file
* Slurm: fix issue that was causing the ParallelCluster daemons to fail when the cluster is stopped and an empty compute nodes file
is imported in Slurm config


2.4.0
=====

14 changes: 8 additions & 6 deletions README.rst
@@ -83,22 +83,24 @@ You can view the running compute hosts:
 For more information on any of these steps see the `Getting Started Guide`_.

-.. _`Getting Started Guide`: https://aws-parallelcluster.readthedocs.io/en/latest/getting_started.html
+.. _`Getting Started Guide`: https://docs.aws.amazon.com/parallelcluster/latest/ug/getting_started.html

 Documentation
 -------------

-Documentation is part of the project and is published to -
-https://aws-parallelcluster.readthedocs.io/. Of most interest to new users is
-the Getting Started Guide -
-https://aws-parallelcluster.readthedocs.io/en/latest/getting_started.html.
+We've been working hard to greatly improve the `Documentation <https://docs.aws.amazon.com/parallelcluster/latest/ug/>`_, it's now published in 10 languages, one of the many benefits of being hosted on AWS Docs. Of most interest to new users is
+the `Getting Started Guide <https://docs.aws.amazon.com/parallelcluster/latest/ug/getting_started.html>`_.
+
+If you have changes you would like to see in the docs, please either submit feedback using the feedback link at the bottom
+of each page or create an issue or pull request for the project at:
+https://github.com/awsdocs/aws-parallelcluster-user-guide.

 Issues
 ------

 Please open a GitHub issue for any feedback or issues:
 https://github.com/aws/aws-parallelcluster. There is also an active AWS
-HPC forum which may be helpful:https://forums.aws.amazon.com/forum.jspa?forumID=192.
+HPC forum which may be helpful: https://forums.aws.amazon.com/forum.jspa?forumID=192.

 Changes
 -------
194 changes: 97 additions & 97 deletions amis.txt
@@ -1,102 +1,102 @@
# alinux
-ap-northeast-1: ami-0dcc18768374b4441
-ap-northeast-2: ami-022e7c66ccb807c9f
-ap-northeast-3: ami-04402be7b85999df8
-ap-south-1: ami-0a14b1f0e7427a4bb
-ap-southeast-1: ami-02079735c20c1ac4e
-ap-southeast-2: ami-0c65952cdec26ae39
-ca-central-1: ami-01f28f8381746746f
-cn-north-1: ami-0da67c26ce2e8d111
-cn-northwest-1: ami-03dc8f759de9de690
-eu-central-1: ami-0ff6d2a86b9199e82
-eu-north-1: ami-0cb08caa10d113ed7
-eu-west-1: ami-0b5c32b12b9c340d0
-eu-west-2: ami-0c218c2aaa7185f03
-eu-west-3: ami-011e0eee21d52f23e
-sa-east-1: ami-0d154ae55458941fd
-us-east-1: ami-0d130bdfab2037f8a
-us-east-2: ami-00d2a10466c577ac7
-us-gov-east-1: ami-0f5003922daf22962
-us-gov-west-1: ami-ba83fbdb
-us-west-1: ami-0b6f7961ee845966e
-us-west-2: ami-0d611d90619419e93
+ap-east-1: ami-0548157406b20efd7
+ap-northeast-1: ami-0266f3876f58f4c10
+ap-northeast-2: ami-0b83279e099fee532
+ap-south-1: ami-08d877dddd63d4f11
+ap-southeast-1: ami-0797836c2582f62b3
+ap-southeast-2: ami-097287dbf20f32cd2
+ca-central-1: ami-00695df58bfe70532
+cn-north-1: ami-0bf01904468ac34e0
+cn-northwest-1: ami-07ce7fae883830295
+eu-central-1: ami-0ae496b08133b7003
+eu-north-1: ami-006772a1e3158c024
+eu-west-1: ami-03112372c8bd7886e
+eu-west-2: ami-0f2ed960e7413b152
+eu-west-3: ami-0feb90a4cf119551f
+sa-east-1: ami-02ce797ed2bac903a
+us-east-1: ami-0fd18b144da8357b7
+us-east-2: ami-0257f6012767b54c9
+us-gov-east-1: ami-002752c6ec611554d
+us-gov-west-1: ami-527e3e33
+us-west-1: ami-03a203cbdfe6ef914
+us-west-2: ami-0340aea5e9e9e5202
# centos6
-ap-northeast-1: ami-086781b933db101a5
-ap-northeast-2: ami-07d646c87d889d816
-ap-northeast-3: ami-082ece6e5fe8f6fd1
-ap-south-1: ami-02389426198baf430
-ap-southeast-1: ami-02105387481bd0ad0
-ap-southeast-2: ami-0050fad9761b3957c
-ca-central-1: ami-0e70755a47200df23
-eu-central-1: ami-03979ebb9cfee2ccc
-eu-north-1: ami-085a9ecbf9f64f65b
-eu-west-1: ami-070ba56e38a744df5
-eu-west-2: ami-08553013e6e986028
-eu-west-3: ami-0afff5bc147c847e0
-sa-east-1: ami-0635a9bdc378fe67f
-us-east-1: ami-091f37e900368fe1a
-us-east-2: ami-055404b3df678da86
-us-west-1: ami-0e438402399c457d7
-us-west-2: ami-0651b7e7cfde4b3a0
+ap-east-1: ami-08de06c8c25c4e483
+ap-northeast-1: ami-0fb1e620a6e6c7c63
+ap-northeast-2: ami-083dda363f440b5f3
+ap-south-1: ami-0a19d85caae09e69b
+ap-southeast-1: ami-04b147081e72b8141
+ap-southeast-2: ami-0cc13227daec10928
+ca-central-1: ami-0e584b1dc9d90cfe3
+eu-central-1: ami-047fc8e8af243d384
+eu-north-1: ami-07413aa597232ff9e
+eu-west-1: ami-0ef1bf4281c1c4604
+eu-west-2: ami-0097ab9ba306ca46b
+eu-west-3: ami-0b941e4e71f296ca7
+sa-east-1: ami-0505ed5c5ad56d04b
+us-east-1: ami-016392fa0b61bde58
+us-east-2: ami-000a7976d7539e448
+us-west-1: ami-0a25f5e16aafd09d1
+us-west-2: ami-0951110bddb6944b0
# centos7
-ap-northeast-1: ami-09bae677f8f58842d
-ap-northeast-2: ami-0eeb6c96d0e6c2d90
-ap-northeast-3: ami-084c0dbc04f722758
-ap-south-1: ami-031f8f67a53de53fe
-ap-southeast-1: ami-041ca5c2f5b748966
-ap-southeast-2: ami-06c7f5584ecfcac3a
-ca-central-1: ami-0afc2ea67b3963398
-eu-central-1: ami-0205eaef48a9fc97a
-eu-north-1: ami-0420576e18a5fcb7c
-eu-west-1: ami-0f67868de5be7b0b3
-eu-west-2: ami-057fa1a5314e3c414
-eu-west-3: ami-05b2808c2dc4fb82c
-sa-east-1: ami-0da1262e3c5d9af72
-us-east-1: ami-031eb9c5390c0f8f6
-us-east-2: ami-0050bd80a1cecfe37
-us-west-1: ami-09bd008b253048b80
-us-west-2: ami-003da28849bc413f5
+ap-east-1: ami-0b8dbcf754d6b1a15
+ap-northeast-1: ami-075158d05e7ffc090
+ap-northeast-2: ami-03d158fde32bf5c43
+ap-south-1: ami-0256ac397ea3738d9
+ap-southeast-1: ami-0768c2e0ebf2b048b
+ap-southeast-2: ami-0d4bc69b138616534
+ca-central-1: ami-03f172775e62c78ef
+eu-central-1: ami-009cd1bd82fcb612e
+eu-north-1: ami-018a7d07256217bd0
+eu-west-1: ami-0a47a5dc8fee55323
+eu-west-2: ami-0697036b2287f0afc
+eu-west-3: ami-017a92730499673fd
+sa-east-1: ami-00e378be239ed59f4
+us-east-1: ami-0a4d7e08ea5178c02
+us-east-2: ami-0a3b8f19ab7333a80
+us-west-1: ami-0b977098af0dd77e3
+us-west-2: ami-0d6c93513ba5d3734
# ubuntu1404
-ap-northeast-1: ami-0939e3e1030d4f7d2
-ap-northeast-2: ami-0481c6b023e2328b4
-ap-northeast-3: ami-0a535e1d0bb7bc502
-ap-south-1: ami-000e99acc047832ae
-ap-southeast-1: ami-09ca9a6a8fee71ba5
-ap-southeast-2: ami-09646cc49a932a37e
-ca-central-1: ami-06ac5db73837bc364
-cn-north-1: ami-07e16a5709c99f963
-cn-northwest-1: ami-05348579489ba3673
-eu-central-1: ami-0032889c720d364dc
-eu-north-1: ami-0976908358f0bfa01
-eu-west-1: ami-0f5c65a609ad3afb4
-eu-west-2: ami-08c2d96c2805037e7
-eu-west-3: ami-0f6cd6ac9be8f2b32
-sa-east-1: ami-0d0da341da4802af9
-us-east-1: ami-017bfe181606779d8
-us-east-2: ami-043eb896e1bb2b948
-us-gov-east-1: ami-060ced48ab370aadf
-us-gov-west-1: ami-32f98153
-us-west-1: ami-0d48f8a9d5735efde
-us-west-2: ami-0169da6ccb6347f50
+ap-east-1: ami-0059b45a57f781c19
+ap-northeast-1: ami-07531e6831cd3b73e
+ap-northeast-2: ami-0ee26fe2d03734d6c
+ap-south-1: ami-047d504b1c9897a71
+ap-southeast-1: ami-0497ef9ffb2284737
+ap-southeast-2: ami-07c920b78a691f3de
+ca-central-1: ami-0a25d23a0f47e04df
+cn-north-1: ami-0d20db2e290c07266
+cn-northwest-1: ami-091c2a6f3a16fe374
+eu-central-1: ami-0cd4c9af30dce9c71
+eu-north-1: ami-02a5769e4d5f5c4c9
+eu-west-1: ami-00aeb9a13213998a0
+eu-west-2: ami-0be08f7b993cf51e5
+eu-west-3: ami-0dfb94ca6be715d8c
+sa-east-1: ami-008eaaf7ed3b81e00
+us-east-1: ami-006da8413e239334a
+us-east-2: ami-0a1882523d4df3e24
+us-gov-east-1: ami-04d31bc443d9b3c36
+us-gov-west-1: ami-8b7c3cea
+us-west-1: ami-02a1eceaa0c83ee27
+us-west-2: ami-0b1e995e9452b4050
# ubuntu1604
-ap-northeast-1: ami-06b328a6ee03ccdf4
-ap-northeast-2: ami-0179e2707f709f813
-ap-northeast-3: ami-0c9b72bae5efc9f61
-ap-south-1: ami-0f21d1eb3339ebd6a
-ap-southeast-1: ami-01899e9a659eb2267
-ap-southeast-2: ami-049c81a79d55b2c8a
-ca-central-1: ami-0b8928a1f643684eb
-cn-north-1: ami-0ae967dc97d5eb57a
-cn-northwest-1: ami-0ba0b1ed49ce7b1b1
-eu-central-1: ami-002422c65a5bb1af8
-eu-north-1: ami-0d3c7ce730c73ab00
-eu-west-1: ami-00328873639859269
-eu-west-2: ami-0c1de72c6acf4b187
-eu-west-3: ami-090d577bb6d08e95b
-sa-east-1: ami-08df8912b098a3f42
-us-east-1: ami-08e1d33a6a64499de
-us-east-2: ami-0219fdb6f47395d88
-us-gov-east-1: ami-0af2c8e5bf3c334b0
-us-gov-west-1: ami-7b85fd1a
-us-west-1: ami-066818f6a6be06fb5
-us-west-2: ami-07122cb5a96b7fee9
+ap-east-1: ami-0eacbde87adcd79a4
+ap-northeast-1: ami-082d16fe36ad64a5d
+ap-northeast-2: ami-0603f6bfdaf0520b9
+ap-south-1: ami-0e4af994480d249c6
+ap-southeast-1: ami-0e51553a3f083c9d3
+ap-southeast-2: ami-0404f148f2106e206
+ca-central-1: ami-079729e8f44ea4e33
+cn-north-1: ami-0f71072a6c2f049fc
+cn-northwest-1: ami-0625b09f99c971e40
+eu-central-1: ami-054fd0d64e09a12d5
+eu-north-1: ami-08984e346e48bc46f
+eu-west-1: ami-0c5c2481e10335e90
+eu-west-2: ami-047a75cbcc2756dda
+eu-west-3: ami-0b26c8b0857c0722d
+sa-east-1: ami-0e654e24368bf23f5
+us-east-1: ami-0c535eb8c5a80b962
+us-east-2: ami-00fb6e36bb37b662e
+us-gov-east-1: ami-0a4c82eb37facd766
+us-gov-west-1: ami-777b3b16
+us-west-1: ami-0bb311ef404c8a54b
+us-west-2: ami-097b7aae68846a39a
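
The layout of amis.txt (a `# <os>` header followed by `region: ami-id` pairs) lends itself to a simple lookup table. The following is a minimal sketch of a reader for that layout. `parse_amis` is a hypothetical helper, not the project's own loader, and the sample embeds only three entries copied from the new side of the diff.

```python
def parse_amis(text):
    """Parse '# <os>' sections of 'region: ami-id' lines into a nested dict."""
    amis = {}
    current_os = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A comment line opens a new OS section, e.g. "# alinux"
            current_os = line.lstrip("#").strip()
            amis[current_os] = {}
        elif current_os is not None:
            region, ami_id = (part.strip() for part in line.split(":", 1))
            amis[current_os][region] = ami_id
    return amis


sample = """\
# alinux
ap-east-1: ami-0548157406b20efd7
us-east-1: ami-0fd18b144da8357b7
# centos7
us-east-1: ami-0a4d7e08ea5178c02
"""

amis = parse_amis(sample)
print(amis["alinux"]["ap-east-1"])  # ami-0548157406b20efd7
```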
8 changes: 0 additions & 8 deletions cli/pcluster/cfnconfig.py
@@ -360,12 +360,6 @@ def __init_vpc_parameters(self):
                 "VPC section [%s] used in [%s] section is not defined" % (vpc_section, self.__cluster_section)
             )

-        # Check that cidr and public ips are not both set
-        cidr_value = self.__config.get(vpc_section, "compute_subnet_cidr", fallback=None)
-        public_ips = self.__config.getboolean(vpc_section, "use_public_ips", fallback=True)
-        if self.__sanity_check:
-            ResourceValidator.validate_vpc_coherence(cidr_value, public_ips)
-
     def __check_account_capacity(self):
         """Try to launch the requested number of instances to verify Account limits."""
         if self.parameters.get("Scheduler") == "awsbatch" or self.parameters.get("ClusterType", "ondemand") == "spot":
@@ -514,14 +508,12 @@ def __init_cluster_parameters(self):
post_install_args=("PostInstallArgs", None),
s3_read_resource=("S3ReadResource", None),
s3_read_write_resource=("S3ReadWriteResource", None),
tenancy=("Tenancy", None),
master_root_volume_size=("MasterRootVolumeSize", None),
compute_root_volume_size=("ComputeRootVolumeSize", None),
base_os=("BaseOS", None),
ec2_iam_role=("EC2IAMRoleName", "EC2IAMRoleName"),
extra_json=("ExtraJson", None),
custom_chef_cookbook=("CustomChefCookbook", None),
custom_chef_runlist=("CustomChefRunList", None),
additional_cfn_template=("AdditionalCfnTemplate", None),
custom_awsbatch_template_url=("CustomAWSBatchTemplateURL", None),
)
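
The keyword arguments in this hunk pair a config option with a CloudFormation parameter name and a second value. As a rough illustration of how such a table can drive parameter generation, here is a minimal sketch. `CLUSTER_OPTIONS` copies three entries from the hunk, while `build_parameters`, the section name, and the reading of the second tuple element as a validator key are assumptions rather than the project's actual code.

```python
import configparser

# Subset of the option -> (CloudFormation parameter, validator key) pairs shown above.
# Treating the second element as a validator key is an assumption.
CLUSTER_OPTIONS = {
    "base_os": ("BaseOS", None),
    "master_root_volume_size": ("MasterRootVolumeSize", None),
    "ec2_iam_role": ("EC2IAMRoleName", "EC2IAMRoleName"),
}


def build_parameters(config, section, options=CLUSTER_OPTIONS):
    """Collect CloudFormation parameters for every option present in the section."""
    parameters = {}
    for key, (cfn_name, _validator_key) in options.items():
        value = config.get(section, key, fallback=None)
        if value is not None:
            parameters[cfn_name] = value
    return parameters


config = configparser.ConfigParser()
config.read_string("[cluster default]\nbase_os = centos7\nmaster_root_volume_size = 40\n")
print(build_parameters(config, "cluster default"))
```

Options missing from the config file are simply skipped, so the resulting dictionary only carries values the user actually set.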
7 changes: 7 additions & 0 deletions cli/pcluster/cli.py
@@ -313,6 +313,13 @@ def _get_parser():
         help="Specifies the OS of the base AMI. "
         "Valid options are: alinux, ubuntu1404, ubuntu1604, centos6, centos7.",
     )
+    pami.add_argument(
+        "-i",
+        "--instance-type",
+        dest="instance_type",
+        default="t2.xlarge",
+        help="Sets instance type to build the ami on. Defaults to t2.xlarge.",
+    )
     pami.add_argument(
         "-ap",
         "--ami-name-prefix",
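
To see how the new option behaves, here is a self-contained sketch that registers the same `-i/--instance-type` argument on a stand-in `createami` subparser. The `build_parser` function and the `pcluster-sketch` program name are illustrative only, and the real parser defines many more arguments.

```python
import argparse


def build_parser():
    """Build a cut-down parser with only the createami instance-type option."""
    parser = argparse.ArgumentParser(prog="pcluster-sketch")
    subparsers = parser.add_subparsers(dest="command")
    pami = subparsers.add_parser("createami")
    pami.add_argument(
        "-i",
        "--instance-type",
        dest="instance_type",
        default="t2.xlarge",
        help="Sets instance type to build the ami on. Defaults to t2.xlarge.",
    )
    return parser


parser = build_parser()
print(parser.parse_args(["createami"]).instance_type)  # t2.xlarge
print(parser.parse_args(["createami", "-i", "c5.xlarge"]).instance_type)  # c5.xlarge
```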
16 changes: 3 additions & 13 deletions cli/pcluster/config_sanity.py
@@ -46,17 +46,6 @@ def __get_partition(self):
             return "aws-us-gov"
         return "aws"

-    @staticmethod
-    def validate_vpc_coherence(cidr_value, public_ip):
-        """
-        Check that cidr_value and public_ip parameters are not conflicting.
-        :param cidr_value: the value of compute_subnet_cidr set by the user (default should be None)
-        :param public_ip: the value of use_public_ips set by the user (default should be True)
-        """
-        if cidr_value and public_ip is False:
-            ResourceValidator.__fail("VPC COHERENCE", "compute_subnet_cidr needs use_public_ips to be true")
-
     @staticmethod
     def __check_sg_rules_for_port(rule, port_to_check):
         """
@@ -355,7 +344,8 @@ def validate(self, resource_type, resource_value):  # noqa: C901 FIXME
                     ),
                     (
                         ["cloudformation:DescribeStacks"],
-                        "arn:%s:cloudformation:%s:%s:stack/parallelcluster-*" % (partition, self.region, account_id),
+                        ["cloudformation:DescribeStackResource"],
+                        "arn:%s:cloudformation:%s:%s:stack/parallelcluster-*/*" % (partition, self.region, account_id),
                     ),
                     (["s3:GetObject"], "arn:%s:s3:::%s-aws-parallelcluster/*" % (partition, self.region)),
                     (["sqs:ListQueues"], "*"),
@@ -438,7 +428,7 @@ def validate(self, resource_type, resource_value):  # noqa: C901 FIXME
                 self.__fail(resource_type, e.response.get("Error").get("Message"))
         # EC2 Placement Group
         elif resource_type == "EC2PlacementGroup":
-            if resource_value == "DYNAMIC":
+            if resource_value == "DYNAMIC" or resource_value == "NONE":
                 pass
             else:
                 try:
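
The resource ARNs in this file are built from a partition, a region, and an account ID. A minimal sketch of that construction follows. The two return values and the format string come from the diff, but the `startswith("us-gov")` test, both function names, and the sample account ID are assumptions made for illustration.

```python
def get_partition(region):
    """Return the ARN partition for a region (only the two cases visible in the diff)."""
    # Assumption: GovCloud regions are detected by their "us-gov" prefix.
    if region.startswith("us-gov"):
        return "aws-us-gov"
    return "aws"


def stack_resource_pattern(region, account_id):
    """Build the CloudFormation stack ARN pattern used in the updated policy check."""
    partition = get_partition(region)
    return "arn:%s:cloudformation:%s:%s:stack/parallelcluster-*/*" % (partition, region, account_id)


# 123456789012 is a placeholder account ID
print(stack_resource_pattern("us-gov-west-1", "123456789012"))
print(stack_resource_pattern("eu-west-1", "123456789012"))
```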
3 changes: 2 additions & 1 deletion cli/pcluster/easyconfig.py
@@ -228,7 +228,8 @@ def configure(args):  # noqa: C901 FIXME!!!
     # ensure that the directory for the config file exists (because
     # ~/.parallelcluster is likely not to exist on first usage)
     try:
-        os.makedirs(os.path.dirname(config_file))
+        config_folder = os.path.dirname(config_file) or "."
+        os.makedirs(config_folder)
     except OSError as e:
         if e.errno != errno.EEXIST:
             raise  # can safely ignore EEXISTS for this purpose...
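
The reason for the `or "."` fallback is that `os.path.dirname` returns an empty string for a bare relative filename, and `os.makedirs("")` fails with an error other than `EEXIST`. The sketch below isolates that logic in a hypothetical `ensure_config_dir` helper so the behaviour can be tried on its own. It is not the project's function.

```python
import errno
import os


def ensure_config_dir(config_file):
    """Create the folder that will hold config_file and return its path."""
    # dirname("config") is "", which os.makedirs rejects, so fall back to the current directory
    config_folder = os.path.dirname(config_file) or "."
    try:
        os.makedirs(config_folder)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise  # only an already-existing folder is safe to ignore
    return config_folder


print(repr(os.path.dirname("config")))  # ''
print(ensure_config_dir("config"))  # .
```

With the fallback in place, `os.makedirs(".")` raises `EEXIST`, which the existing handler already ignores, so no further change to the error handling is needed.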