RELEASE NOTES FOR SLURM VERSION 20.02
IMPORTANT NOTES:
If using the slurmdbd (Slurm DataBase Daemon), you must update it first.
NOTE: If using a backup DBD, you must start the primary first so that it can
perform any database conversion. The backup will not start until the
conversion has completed.
The 20.02 slurmdbd will work with Slurm daemons of version 18.08 and above.
You do not need to update all clusters at the same time, but it is very
important to update slurmdbd first and have it running before updating
any clusters that make use of it.
Slurm can be upgraded from version 18.08 or 19.05 to version 20.02 without loss
of jobs or other state information. Upgrading directly from an earlier version
of Slurm will result in loss of state information.
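The upgrade rule above can be written as a small pre-flight check. This is a
minimal Python sketch and not part of Slurm: the helper names and the
"major.minor[.micro]" version parsing are assumptions made for illustration.
It encodes only the two source versions these notes name as safe for a direct
upgrade to 20.02.

```python
# Illustrative only: encodes the direct-upgrade rule stated in these notes.
# The helper names and the version-string parsing are assumptions.
SAFE_SOURCES_FOR_20_02 = {(18, 8), (19, 5)}


def major_release(version: str) -> tuple:
    """Return the (year, month) pair of a version string such as '19.05.7'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def can_upgrade_directly(installed: str) -> bool:
    """True if the notes say jobs and state survive a direct move to 20.02."""
    return major_release(installed) in SAFE_SOURCES_FOR_20_02


if __name__ == "__main__":
    for version in ("17.11.13", "18.08.9", "19.05.5"):
        verdict = "ok" if can_upgrade_directly(version) else "state would be lost"
        print(f"{version} -> 20.02: {verdict}")
```

A site running an older release would upgrade to 18.08 or 19.05 first and
only then move to 20.02.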
If using SPANK plugins that use the Slurm APIs, they should be recompiled when
upgrading Slurm to a new major release.
NOTE: slurmctld now exits with a fatal error if a compute node is configured
with CPUs == #Sockets. CPUs must be either the total number of cores or the
total number of threads.
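A node definition can be checked against this rule before restarting
slurmctld. This is a rough Python sketch, not the check slurmctld itself
performs: the function name is hypothetical, and the arguments are assumed to
mirror the CPUs, Sockets, CoresPerSocket and ThreadsPerCore values of a
NodeName line.

```python
def cpus_setting_is_valid(cpus, sockets, cores_per_socket, threads_per_core):
    """Return True if CPUs equals the node's total cores or total threads."""
    total_cores = sockets * cores_per_socket
    total_threads = total_cores * threads_per_core
    return cpus in (total_cores, total_threads)


if __name__ == "__main__":
    # A dual-socket node with 8 cores per socket and 2 threads per core.
    for cpus in (2, 16, 32):
        print(cpus, cpus_setting_is_valid(cpus, 2, 8, 2))
```

On the example node, CPUs=2 (the socket count) is rejected by the sketch,
while CPUs=16 (cores) and CPUs=32 (threads) are accepted.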
NOTE: The FastSchedule option has been removed. The FastSchedule=2 functionality
(used for testing and development) is available as the new
SlurmdParameters=config_overrides option.
HIGHLIGHTS
==========
-- Drop slurm.spec-legacy from distribution.
-- Removed the smap command.
-- Exclusive use of a node now includes all GRES on the node as well
as the CPUs.
-- Use python3 instead of python for internal build/test scripts.
The slurm.spec file has been updated to depend on python3 as well.
-- Added new NodeSet configuration option to help simplify partition
configuration sections for heterogeneous / condo-style clusters.
-- Added slurm.conf option MaxDBDMsgs to control how many messages slurmctld
will store while the slurmdbd is down before it starts discarding them.
-- The checkpoint plugin interface and all associated API calls have been
removed.
-- slurm_init_job_desc_msg() initializes mail_type as uint16_t. This allows
mail_type to be set to NONE with scontrol.
-- Add new slurm_spank_log() function to print messages back to the user from
within a SPANK plugin without prepending "error: " from slurm_error().
-- Enforce having partition name and nodelist=ALL when creating reservations
with flags=PART_NODES.
-- SPANK - removed never-implemented slurm_spank_slurmd_init() interface. This
hook has always been accessible through slurm_spank_init() in the
S_CTX_SLURMD context instead.
-- sbcast - add new BcastAddr option to NodeName lines to allow sbcast traffic
to flow over an alternate network path.
-- Added auth/jwt plugin, and 'scontrol token' subcommand.
-- PMIx - improve performance of proc map generation.
-- Move kill_invalid_depend and max_depend_depth options to the new
DependencyParameters option, and deprecate use in SchedulerParameters.
-- Enable job dependencies for any job on any cluster in the same federation.
-- Allow clusters to be added automatically to the database when slurmctld
starts.
-- Add AccountingStorageExternalHost slurm.conf parameter.
-- The "ConditionPathExists" condition in slurmd.service has been disabled by
default to permit simpler installation of a "configless" Slurm cluster.
-- In SchedulerParameters remove deprecated max_job_bf and replace with
bf_max_job_test.
-- Disable sbatch, salloc, srun --reboot for non-admins.
-- SPANK - added support for S_JOB_GID in the job script context with
spank_get_item().
-- Prolog/Epilog - add SLURM_JOB_GID environment variable.
-- Add new slurmrestd command/daemon which implements the Slurm REST API.
-- The inconsistent terminology and environment variable naming for
Heterogeneous Job ("HetJob") support has been tidied up.
   -- The correct term for these jobs is "HetJobs"; references to "PackJob"
      have been corrected.
   -- The correct term for the separate constituent jobs is "components";
      references to "packs" have been corrected.
   -- Output from 'scontrol show job' and others has been made consistent
      with this.
   -- Relevant environment variables are all of the form SLURM_HET_JOB_*.
      (Old forms are still supported for this release, but may be removed
      in the future.)
-- slurm.spec - override "hardening" linker flags to ensure RHEL8 builds
in a usable manner.
-- When a job requests exclusive access to a node, it is now given all the
GRES on that node whether or not it asked for them.
NOTE: Because of this change, exclusive jobs submitted on previous versions
of Slurm may produce cosmetic errors such as
'error: _job_alloc_whole_node_internal: This should never happen, we
couldn't find the gres 7696487:3160171' and, when the job finishes,
'error: gres/gpu: job 17262 dealloc node snowflake0 GRES count underflow
(0 < 1)'
These errors should only appear while these older jobs are loading or
finishing. They do not indicate an actual problem or affect future jobs,
and can be ignored for pre-20.02 jobs.
-- Both jobcomp and cli_filter now have lua interfaces.
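The SchedulerParameters changes listed above (kill_invalid_depend and
max_depend_depth moving to DependencyParameters, and max_job_bf being
replaced by bf_max_job_test) amount to a mechanical edit of one
comma-separated option string. This is a minimal Python sketch of that edit.
The function name is hypothetical, and the sketch assumes lowercase option
names and does not parse a full slurm.conf.

```python
# Illustrative helper, not shipped with Slurm.
MOVED_TO_DEPENDENCY = ("kill_invalid_depend", "max_depend_depth")
RENAMED = {"max_job_bf": "bf_max_job_test"}


def migrate_scheduler_parameters(value: str):
    """Split an old SchedulerParameters value into the 20.02 layout.

    Returns (new SchedulerParameters value, DependencyParameters value).
    """
    sched, depend = [], []
    for item in filter(None, (part.strip() for part in value.split(","))):
        key, sep, arg = item.partition("=")
        if key in MOVED_TO_DEPENDENCY:
            depend.append(item)                     # moves to the new option
        elif key in RENAMED:
            sched.append(RENAMED[key] + sep + arg)  # same value, new name
        else:
            sched.append(item)                      # unchanged
    return ",".join(sched), ",".join(depend)


if __name__ == "__main__":
    old = "bf_interval=30,kill_invalid_depend,max_depend_depth=5,max_job_bf=200"
    new_sched, new_depend = migrate_scheduler_parameters(old)
    print("SchedulerParameters=" + new_sched)
    print("DependencyParameters=" + new_depend)
```

The two returned strings would be written back as the SchedulerParameters and
DependencyParameters lines of slurm.conf.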
SPECIAL NOTES FOR CRAY SYSTEMS
==============================
-- burst_buffer/datawarp - the OtherTimeout configuration option will now
apply to the stage-in "setup" and stage-out "dws_post_run" calls into
dw_wlm_cli. (The StageInTimeout and StageOutTimeout were previously
used here by mistake.)
-- burst_buffer/datawarp - add a set of % symbols that will be replaced by
job details. E.g., %d will be filled in with the WorkDir for the job.
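The % symbol replacement described above can be pictured as a simple
substitution pass. This is a rough Python sketch, not the plugin's code. Only
the %d (WorkDir) symbol is taken from these notes; the function name, the
sample string and the choice to leave unknown symbols untouched are
assumptions for illustration.

```python
import re


def expand_symbols(text: str, values: dict) -> str:
    """Replace each %<char> in text with values[<char>] when it is known."""
    def replace(match):
        symbol = match.group(1)
        # Unknown symbols are left exactly as written (an assumption).
        return values.get(symbol, match.group(0))

    return re.sub(r"%(.)", replace, text)


if __name__ == "__main__":
    job_values = {"d": "/home/alice/run1"}  # %d -> WorkDir of the job
    print(expand_symbols("path=%d/results", job_values))
```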
CONFIGURATION FILE CHANGES (see the appropriate man page for details)
=====================================================================
-- The mpi/openmpi plugin has been removed as it does nothing.
MpiDefault=openmpi will be translated to the functionally-equivalent
MpiDefault=none.
COMMAND CHANGES (see man pages for details)
===========================================
-- Display StepId=<jobid>.batch instead of StepId=<jobid>.4294967294 in output
of "scontrol show step". (slurm_sprint_job_step_info())
-- MPMD in srun will now defer PATH resolution for the commands to launch to
slurmstepd. Previously it would handle resolution client-side, but with
a non-standard approach that walked PATH in reverse.
-- squeue - added "--me" option, equivalent to --user=$USER.
-- The LicensesUsed line has been removed from 'scontrol show config'.
Please see the 'scontrol show licenses' command as an alternative.
-- sbatch - adjusted backoff times for "--wait" option to reduce load on
slurmctld. This results in a steady-state delay of 32s between queries,
instead of the prior 10s delay.
-- Change 'scancel -t' to filter based only on job base state.
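The StepId display change above is relevant to scripts that parse
"scontrol show step" output. This is a minimal Python sketch of the new
formatting rule, not Slurm's own slurm_sprint_job_step_info() code. The
function and constant names are hypothetical; the numeric value is the one
quoted in these notes.

```python
# Raw step ID that was previously printed for the batch step.
BATCH_STEP_ID = 4294967294


def format_step_id(job_id: int, step_id: int) -> str:
    """Render a step identifier the way 20.02 displays it."""
    suffix = "batch" if step_id == BATCH_STEP_ID else str(step_id)
    return f"StepId={job_id}.{suffix}"


if __name__ == "__main__":
    print(format_step_id(17262, BATCH_STEP_ID))
    print(format_step_id(17262, 0))
```

Parsers that previously matched the numeric form would need to accept the
".batch" suffix as well.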