Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix array allocation problem with new nfix tests #2924

Closed
slevis-lmwg opened this issue Jan 6, 2025 · 5 comments · Fixed by #2928
Closed

Fix array allocation problem with new nfix tests #2924

slevis-lmwg opened this issue Jan 6, 2025 · 5 comments · Fixed by #2928
Assignees
Labels
bug something is working incorrectly

Comments

@slevis-lmwg
Copy link
Contributor

slevis-lmwg commented Jan 6, 2025

@glemieux got segfault errors in a bunch of izumi nag tests on the tmp-241219 branch:

FAIL ERP_D_Ld5_P48x1.f10_f10_mg37.I1850Clm50Bgc.izumi_nag.clm-ciso RUN time=223
FAIL ERP_D_Ld5_P48x1.f10_f10_mg37.I1850Clm60Bgc.izumi_nag.clm-ciso RUN time=299
FAIL ERP_D_Ld5_P48x1.f10_f10_mg37.I1850Clm60Bgc.izumi_nag.clm-ciso--clm-matrixcnOn RUN time=260
FAIL SMS_D.f10_f10_mg37.I2000Clm60BgcCrop.izumi_nag.clm-crop RUN time=38
FAIL SMS_D_Ld1_P48x1.f10_f10_mg37.I2000Clm45BgcCrop.izumi_nag.clm-oldhyd RUN time=48
FAIL SMS_D_Ld5.f10_f10_mg37.I2000Clm50BgcCrop.izumi_nag.clm-irrig_alternate RUN time=50
FAIL SMS_D_Ld65.f10_f10_mg37.I2000Clm60BgcCrop.izumi_nag.clm-FireLi2024GSWP RUN time=49
FAIL SMS_D_P48x1_Ld5.f10_f10_mg37.I2000Clm50BgcCrop.izumi_nag.clm-irrig_spunup RUN time=53
FAIL SMS_Ld5_D_P48x1.f10_f10_mg37.IHistClm50Bgc.izumi_nag.clm-monthly RUN time=128

@glemieux confirmed that the same tests pass in ctsm5.3.016, so we suspect that the bug entered in the first tmp-241219 branch tag, i.e. when @slevis-lmwg merged b4b-dev (ultimately #2917).
@slevis-lmwg ignored the izumi tests when merging b4b-dev because izumi had just been upgraded and was giving problems, including not having room for new baselines.

@slevis-lmwg will work on debugging this starting today. The fix needs to go on b4b-dev as well.

@slevis-lmwg slevis-lmwg self-assigned this Jan 6, 2025
@slevis-lmwg slevis-lmwg converted this from a draft issue Jan 6, 2025
@slevis-lmwg slevis-lmwg added the bug something is working incorrectly label Jan 6, 2025
@slevis-lmwg slevis-lmwg moved this from Todo to In Progress in LMWG: Near Term Priorities Jan 6, 2025
@slevis-lmwg
Copy link
Contributor Author

I started with one of the Clm6 SMS tests from the list:
SMS_D_Ld65.f10_f10_mg37.I2000Clm60BgcCrop.izumi_nag.clm-FireLi2024GSWP.

@glemieux
Copy link
Collaborator

glemieux commented Jan 6, 2025

For future reference, here's the location of the results from the tests I ran:

tmp241219 results: /scratch/cluster/glemieux/ctsm-tests/tests_0103-202929iz
ctsm5.3.016 results: /scratch/cluster/glemieux/ctsm-tests/tests_0103-164806iz

@slevis-lmwg
Copy link
Contributor Author

Promising news:
The SMS test that I started with, gets past the segfault error and appears to run smoothly now.
I have now submitted another SMS and one ERP.

@slevis-lmwg
Copy link
Contributor Author

The second SMS test and the ERP test also passed. It probably makes sense for me to open a PR and run the test-suites next.

@slevis-lmwg slevis-lmwg moved this from In progress - master/b4b-dev to Done (non release/external) in CTSM: Upcoming tags Jan 9, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in LMWG: Near Term Priorities Jan 9, 2025
@ekluzek ekluzek changed the title Fix problem with Izumi tests Fix array allocation problem with new nfix tests Jan 9, 2025
@ekluzek
Copy link
Collaborator

ekluzek commented Jan 9, 2025

We talked about this at the CTSM SE meeting this morning. Our notes from there:

  • Good to do to figure out how to prevent in the future
  • It didn’t come in in the original implementation – good as most work will be fine with it
  • It came in with some “b4b” refactoring
  • The issue means that using the new method would be wrong in some fashion, e.g. in 2-deg I-case spinup and historical baseline for PPE with BNF tag (ctsm53017_f19_BNF_[AD, SASU, pSASU, hist]) NCAR/LMWG_dev#88
  • Note that nag caught this even though test exercising the option didn’t exist (because parameters were being read and arrays allocated even when option was off)
  • tmp-241219.n03.ctsm5.3.016 will be adding one izumi and one derecho test
  • Takeaways:
  • Make sure there is at least one test for new namelist options coming in.
  • When refactoring something make sure it’s exercised in tests and give identical results

@briandobbins this is the issue with the problem we discussed in CSEG. The fix is small and came in with tmp-241219.n03.ctsm5.3.019. The tag WITH the problem is: tmp-241219.n01.ctsm5.3.019 so it would be good to try compiling with more aggressive checking options in that tag.

slevis-lmwg added a commit to slevis-lmwg/ctsm that referenced this issue Jan 9, 2025
Merge tmp-241219 branch to master

Includes three tmp-241219 tags:

tmp-241219.n01.ctsm5.3.016 Merge b4b-dev:
nfix_method options Houlton (default), Bytnerowicz (option)
tmp-241219.n02.ctsm5.3.016 FATES hydro test update
tmp-241219.n03.ctsm5.3.016 Bug fix for izumi nag tests to pass (b4b unless using Bytnerowicz)

Fixes ESCOMP#2924 Fix problem with izumi nag tests
Fixes ESCOMP#2878 Remove fates_allom_smode shell_command update in FatesColdHydro testmod
Fixes ESCOMP#2869 Update temperature cost function for symbiotic nfix in FUN

Changes answers as documented in the ChangeLog.

slevis resolved conflicts:
doc/ChangeLog
doc/ChangeSum
ekluzek added a commit to ekluzek/CTSM that referenced this issue Jan 13, 2025
Merge tmp-241219 branch to master

Includes three tmp-241219 tags:

tmp-241219.n01.ctsm5.3.016 Merge b4b-dev:
nfix_method options Houlton (default), Bytnerowicz (option)
tmp-241219.n02.ctsm5.3.016 FATES hydro test update
tmp-241219.n03.ctsm5.3.016 Bug fix for izumi nag tests to pass (b4b unless using Bytnerowicz)

Fixes ESCOMP#2924 Fix problem with izumi nag tests
Fixes ESCOMP#2878 Remove fates_allom_smode shell_command update in FatesColdHydro testmod
Fixes ESCOMP#2869 Update temperature cost function for symbiotic nfix in FUN

Changes answers as documented in the ChangeLog.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something is working incorrectly
Projects
Status: Done (non release/external)
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants