Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have usermod directory support fail if shell_commands returns an error #4355

Closed
ekluzek opened this issue Jan 20, 2023 · 11 comments
Closed

Have usermod directory support fail if shell_commands returns an error #4355

ekluzek opened this issue Jan 20, 2023 · 11 comments

Comments

@ekluzek
Copy link
Contributor

ekluzek commented Jan 20, 2023

Currently when you setup a case with user-mods it will run shell_commands for you -- but won't fail with an error. I think this should be changed to exist with error if there is an error in the shell_commands.

This would be a simple change on line 121 the last line of the following snippet in CIME/user_mod_support.py

    for shell_command_file in case_shell_command_files:
        if os.path.isfile(shell_command_file):
            os.chmod(shell_command_file, 0o777)
            run_cmd_no_fail(shell_command_file, verbose=True)

That last line should just be changed to

            run_cmd(shell_command_file, verbose=True)

This came up, because we expected it to fail in this CTSM PR...

ESCOMP/CTSM#1932

So it's an easy change, if we agree with it.

@fischer-ncar @jedwards4b @billsacks thoughts on this?

@billsacks
Copy link
Member

This sounds like a good idea in theory, but I suspect there is – or at least used to be – a reason why this is using run_cmd_no_fail. I see that @jgfouca changed it from run_cmd to run_cmd_no_fail in 4fff865 (merged in #267). I wouldn't be surprised if there are – or were – some shell_commands files that are used in a variety of cases, some of which result in errors on some commands. And one could argue that ignoring those errors is a feature, not a bug, so that you can create a generic shell_commands rather than needing to make different versions for different configurations.

@adrifoster
Copy link
Collaborator

Right now we are trying to have NEON cases fail for FATES runs at two crop sites and our current method was to have the shell_commands file tell the user why they can't run those cases for FATES and then exit. But then the job is still submittable, when I think we would like it to not be. Is there another method for achieving this behavior?

@jedwards4b
Copy link
Contributor

can you point me to the shell_commands file that does this?

@ekluzek
Copy link
Contributor Author

ekluzek commented Jan 20, 2023

That is a good point @billsacks , although I'm more of the mind that if you create a shell_commands that fails in some cases, that you've set it up wrong. So you actually want it to fail. But, if we end up with a lot of things that fail because of this that would be a problem. But, in my mind I would want a case to fail if the shell_commands doesn't work, because that means I've done something wrong in setting it up, or I'm using it for a case that I really shouldn't.

@jgfouca do you have some thoughts here?

@adrifoster
Copy link
Collaborator

adrifoster commented Jan 20, 2023

can you point me to the shell_commands file that does this?

Here is the one I was trying to create for our NEON sites.

@jgfouca
Copy link
Contributor

jgfouca commented Jan 20, 2023

@billsacks , that PR is really old and I switched tons of run_cmd calls to run_cmd_no_fail in it so I doubt a lot of thought went into the switch or that I would remember it if I did actually think about it.

Looking at this code now, I am confused by @ekluzek 's comments because switching from run_cmd_no_fail to run_cmd should actually hide errors if the return code from run_cmd is not checked. run_cmd_no_fail should always raise an exception when the command returns non-zero. It seems like you must be talking about stderr output being hidden and not necessarily non-zero return codes?

@jedwards4b
Copy link
Contributor

@adrifoster

How about adding something like
./xmlchange BUILD_COMPLETE=FALSE

@ekluzek
Copy link
Contributor Author

ekluzek commented Jan 21, 2023

@jgfouca you are right as constructed it should fail with the use of run_cmd_no_fail at the expect statement which should happen provided the error code isn't 0. So I'm not getting why it isn't aborting in a clean way. @jedwards4b might have a good idea here.

On Monday we'll see what @adrifoster finds out, and have her show us what it looks like when it's run.

@adrifoster
Copy link
Collaborator

Hello,

So for @ekluzek's suggestion above we get this output after the ./create_newcase call

 Creating Case directory /glade/work/afoster/NEON/all_sites/cases/NEON_KONA
Adding user mods directory /glade/work/afoster/CTSM/cime_config/usermods_dirs/NEON/FATES/KONA
Adding user mods directory /glade/work/afoster/CTSM/cime_config/usermods_dirs/NEON/FATES/defaults
RUN: /glade/work/afoster/NEON/all_sites/cases/NEON_KONA/shell_commands
FROM: /glade/work/afoster/NEON/all_sites/cases/NEON_KONA
  stat: 1

  output: Cannot run FATES at KONA because FATES does not yet have crops.

./case.setup, ./case.build, proceeded normally. And then the case was able to be submitted with no errors.


For @jedwards4b's suggestion, I got this for a ./create_clone:

08:04 $ ./create_clone --case ${case_name} --clone ${CLONE_ROOT} --keepexe --user-mods-dir ${user_mods}
Adding user mods directory /glade/work/afoster/CTSM/cime_config/usermods_dirs/NEON/FATES/KONA
Adding user mods directory /glade/work/afoster/CTSM/cime_config/usermods_dirs/NEON/FATES/defaults
RUN: /glade/work/afoster/NEON/all_sites/cases/NEON_KONA_2/shell_commands
FROM: /glade/work/afoster/NEON/all_sites/cases/NEON_KONA_2
  output: For your changes to take effect, run:
./case.build --clean-all
./case.build
Cannot run FATES at KONA because FATES does not yet have crops.

Inequivalent lines <entry id='BUILD_COMPLETE' value='FALSE'> != <entry id='BUILD_COMPLETE' value='TRUE'>
  NORMALIZED: <entry id='BUILD_COMPLETE' value='FALSE'> != <entry id='BUILD_COMPLETE' value='TRUE'>

ERROR: env_build.xml cannot be changed via usermods if keepexe is an option: 
 Failed to clone case, removed /glade/work/afoster/NEON/all_sites/cases/NEON_KONA_2

So that is good! But unfortunately, for a ./create_newcase:

08:06 $ ./create_newcase --case ${case_name} --res CLM_USRDAT --compset I1PtClm51Fates --project ${PROJECT} --run-unsupported --user-mods-dir ${user_mods}
Compset longname is 2000_DATM%1PT_CLM51%FATES_SICE_SOCN_SROF_SGLC_SWAV
Compset specification file is /glade/work/afoster/CTSM/cime_config/config_compsets.xml
Automatically adding SESP to compset
Compset forcing is 1972-2004
ATM component is  Data driven ATM single point tower site data set 
LND component is clm5.1:FATES (Functionally Assembled Terrestrial Ecosystem Simulator) Ecosystem Demography model: 
ICE component is Stub ice component
OCN component is Stub ocn component
ROF component is Stub river component
GLC component is Stub glacier (land ice) component
WAV component is Stub wave component
ESP component is Stub external system processing (ESP) component
Pes     specification file is /glade/work/afoster/CTSM/cime_config/config_pes.xml
Compset specific settings: name is RUN_STARTDATE and value is 2000-01-01
Machine is cheyenne
Pes setting: grid match    is l%1x1|l%CLM_USRDAT 
Pes setting: grid          is a%CLM_USRDAT_l%CLM_USRDAT_oi%null_r%null_g%null_w%null_z%null_m%null 
Pes setting: compset       is 2000_DATM%1PT_CLM51%FATES_SICE_SOCN_SROF_SGLC_SWAV_SESP 
Pes setting: tasks       is {'NTASKS_ATM': 1, 'NTASKS_LND': 1, 'NTASKS_ROF': 1, 'NTASKS_ICE': 1, 'NTASKS_OCN': 1, 'NTASKS_GLC': 1, 'NTASKS_WAV': 1, 'NTASKS_CPL': 1} 
Pes setting: threads     is {'NTHRDS_ATM': 1, 'NTHRDS_LND': 1, 'NTHRDS_ROF': 1, 'NTHRDS_ICE': 1, 'NTHRDS_OCN': 1, 'NTHRDS_GLC': 1, 'NTHRDS_WAV': 1, 'NTHRDS_CPL': 1} 
Pes setting: rootpe      is {'ROOTPE_ATM': 0, 'ROOTPE_LND': 0, 'ROOTPE_ROF': 0, 'ROOTPE_ICE': 0, 'ROOTPE_OCN': 0, 'ROOTPE_GLC': 0, 'ROOTPE_WAV': 0, 'ROOTPE_CPL': 0} 
Pes setting: pstrid      is {} 
Pes other settings: {}
Pes comments: none
setting additional fields from config_pes: {}
 Compset is: 2000_DATM%1PT_CLM51%FATES_SICE_SOCN_SROF_SGLC_SWAV_SESP 
 Grid is: a%CLM_USRDAT_l%CLM_USRDAT_oi%null_r%null_g%null_w%null_z%null_m%null 
 Components in compset are: ['datm', 'clm', 'sice', 'socn', 'srof', 'sglc', 'swav', 'sesp'] 
No charge_account info available, using value from PROJECT
cesm model version found: ctsm5.1.dev115-36-g2ecd83e98
Batch_system_type is pbs
job is case.run USER_REQUESTED_WALLTIME None USER_REQUESTED_QUEUE None WALLTIME_FORMAT %H:%M:%S
job is case.st_archive USER_REQUESTED_WALLTIME None USER_REQUESTED_QUEUE None WALLTIME_FORMAT %H:%M:%S
 Creating Case directory /glade/work/afoster/NEON/all_sites/cases/NEON_KONA_2
Adding user mods directory /glade/work/afoster/CTSM/cime_config/usermods_dirs/NEON/FATES/KONA
Adding user mods directory /glade/work/afoster/CTSM/cime_config/usermods_dirs/NEON/FATES/defaults
RUN: /glade/work/afoster/NEON/all_sites/cases/NEON_KONA_2/shell_commands
FROM: /glade/work/afoster/NEON/all_sites/cases/NEON_KONA_2
  output: Cannot run FATES at KONA because FATES does not yet have crops.

And then ./case.setup, ./case.build, etc. proceeded normally. I am guessing because the shell_commands file is run before you build the case (in a new case, not from a clone)...

@github-actions
Copy link
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Apr 28, 2023
@github-actions
Copy link
Contributor

github-actions bot commented May 4, 2023

This issue was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants