Distributed deployment script fails to work on Peanut cluster at UChicago.
Steps to reproduce
Run the script with the arguments -d -w WORK_DIR -j JOB_ID -n 2 -e after a successful allocation using salloc.
What is the current bug behavior?
The script exits abnormally while preparing the hosts file because of a mismatched number of ChronoKeepers.
The script exits abnormally while getting remote hostnames.
ChronoKeeper fails to launch because of a wrong IP address or hostname in the configuration file.
Initial diagnosis:
The hostname list cannot be obtained correctly. On Peanut, Slurm sets the environment variable SLURM_JOB_ID automatically after salloc, which differs from the behavior on Ares. As a result, the script takes the if branch in the prepare_hosts function, and that branch does not work.
mpssh enables key checking on ssh by default. Peanut currently has a conflicting key problem, so ssh does not work with key checking enabled.
dig is used to get the IP address of a remote hostname, but it returns nothing useful on Peanut. nslookup works on Peanut but fails on Ares.
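One possible direction for the hostname and IP problems above, offered as a rough sketch rather than a patch to the actual script: ask Slurm for the node list explicitly instead of branching on whether SLURM_JOB_ID is set, and try several resolvers in turn instead of relying on dig alone. The function names job_hosts, first_ipv4 and resolve_ip are hypothetical and do not exist in the deployment script. It also assumes that squeue, scontrol and getent are available on both clusters, which has not been verified here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; none of these names come from the deployment script.

# List one hostname per line for a Slurm job.
# Assumption: squeue and scontrol are on PATH on both clusters.
job_hosts() {
    local job_id="$1" nodelist
    # %N prints the compact node list of the job, e.g. "node[01-02]".
    nodelist="$(squeue -h -j "$job_id" -o '%N')" || return 1
    [ -n "$nodelist" ] || return 1
    # Expand the compact list into individual hostnames.
    scontrol show hostnames "$nodelist"
}

# Print the first line of stdin that looks like a dotted-quad IPv4 address.
first_ipv4() {
    grep -E -m1 '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' || true
}

# Resolve a hostname to an IPv4 address, falling back across resolvers.
# Returns non-zero if every resolver fails.
resolve_ip() {
    local host="$1" ip=""
    # 1. dig (reported to return nothing useful on Peanut).
    if command -v dig >/dev/null 2>&1; then
        ip="$(dig +short "$host" 2>/dev/null | first_ipv4)"
    fi
    # 2. getent goes through the system resolver, including /etc/hosts.
    if [ -z "$ip" ] && command -v getent >/dev/null 2>&1; then
        ip="$(getent ahostsv4 "$host" 2>/dev/null | awk '{print $1}' | first_ipv4)"
    fi
    # 3. nslookup (reported to fail on Ares); take the address after "Name:".
    if [ -z "$ip" ] && command -v nslookup >/dev/null 2>&1; then
        ip="$(nslookup "$host" 2>/dev/null \
            | awk '/^Name:/ {found=1; next} found && /^Address/ {print $NF}' \
            | first_ipv4)"
    fi
    [ -n "$ip" ] && printf '%s\n' "$ip"
}
```

With helpers like these, a loop such as `job_hosts "$JOB_ID" | while read -r h; do resolve_ip "$h"; done` would print one address per allocated node. For the key problem, plain ssh accepts `-o StrictHostKeyChecking=no`; whether mpssh can pass that option through is not covered here.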
What is the expected correct behavior?
ChronoLog is deployed on multiple nodes. Data from clients can be stored in WORK_DIR/output as CSV files.
Relevant logs and/or screenshots
N/A