
Add JEDI QG model example cycled DA workflow #85

Closed
wants to merge 96 commits

Conversation

Collaborator

@christopherwharrop-noaa commented Mar 18, 2024

This PR adds a collection of Parsl wrappers that configure and execute the various applications needed for a simple cycled 3DVar JEDI QG workflow. The wrappers consist of "configure" and "run" tasks that execute separately on different resources. The "configure" tasks will run on either a "service" (when internet access is required) or a "serial" resource while the "run" tasks can execute on compute nodes in parallel. Wrappers are created for: installing JEDI, running a "truth" forecast, creating simulated observations from the "truth", running 3DVar using the simulated obs and a background from a previous cycle forecast, and running a forecast initialized from the analysis produced by 3DVar.

These are not the most efficient or best-organized wrappers. Much more thought should be put into a design that can be applied to future applications and experiment types. However, an effort was made to build configurable components that can be used to run cycled 3DVar experiments with varying configuration options.

An experiment program, workflow.py, was created to illustrate how the wrappers can be used to build a workflow.
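
For orientation, here is a minimal sketch of the configure/run pattern using plain Parsl. The executor labels, app names, and the qg_forecast.x executable are illustrative assumptions for this sketch, not the actual chiltepin API:

    import parsl
    from parsl import bash_app, python_app
    from parsl.config import Config
    from parsl.executors import HighThroughputExecutor

    # Two executors mirroring the "service"/"serial" vs. compute split
    # described above (labels and default local providers are illustrative).
    parsl.load(Config(executors=[
        HighThroughputExecutor(label="service"),
        HighThroughputExecutor(label="compute"),
    ]))

    @python_app(executors=["service"])
    def configure_forecast(workdir):
        # Render the forecast's YAML config (placeholder logic).
        import os
        os.makedirs(workdir, exist_ok=True)
        path = f"{workdir}/forecast.yaml"
        with open(path, "w") as f:
            f.write("forecast: {}\n")
        return path

    @bash_app(executors=["compute"])
    def run_forecast(config_path, stdout=None, stderr=None):
        # Launch the QG forecast executable (executable name is hypothetical).
        return f"qg_forecast.x {config_path}"

    # One step of a cycle: passing the configure future into run_forecast
    # makes Parsl defer the run until configuration completes.
    cfg = configure_forecast("/tmp/qg_expt")
    run_forecast(cfg, stdout="forecast.out", stderr="forecast.err").result()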

Automated tests for the various wrappers have not been added yet because the best approach is not yet clear. All such tests would depend on a JEDI installation, which takes time to build and would take even longer in CI. Testing will be added once a strategy is determined.

Closes #84

* Update LICENSE and README to be consistent with repository contents

* Add basic install scripts for Spack and ExaWorks SDK

* Add basic test for ExaWorks installation
* Add ExaWorks Docker containers and update ExaWorks install scripts

Dockerfiles for both Ubuntu and CentOS containers are added.
Scripts for installing Spack and Exaworks are updated.
Add basic "hello" Parsl test script.
Modify CI workflow to use containers for testing ExaWorks.

* Rename CI workflow

Modify CI workflow names to be consistent with current usage.

* Fix typo in comment and add newline at end of Docker files
* Add CI test for Docker container and update checkout action for test example CI

Also, add py-pylint to the Spack install script to enable better testing.

* Restructure ExaWorks installation into small pieces

Create installation scripts for each main ExaWorks component.
Create Docker files for each main ExaWorks component.
Use Spack build caches to optimize installation times.
Update Github Actions workflow to leverage Spack build caches.
Update Github Actions workflow to split install into different build jobs.

* Add missing colon to workflow CI yml file

* Fix another missing colon in the workflow yaml file

* Rework Docker containers and install scripts

Install scripts and containers were modified to build
and use a "bootstrap" compiler for building the ExaWorks
software.  This is necessary because dependencies shared
between various components of ExaWorks will induce multiple
rebuilds of the gcc compiler, which takes a very long time.
This is avoided by bootstrapping gcc with the native
compiler, and then rebuilding the newer gcc with itself
in a "base" environment.  In this way, gcc is built twice
for the base environment, but subsequent environments
using the "base" environment will not rebuild gcc.

This commit also adds CentOS7 containers for the
multi-stage build of exaworks that breaks it into
smaller pieces with shorter build times. The Github
Actions workflow was updated to add CentOS containers.

* Fix typo in script comment
* Consolidate Dockerfiles for ExaWorks constituents

There were Dockerfiles for Ubuntu20 and CentOS7 builds
of ExaWorks and its constituents.  The Dockerfiles
for the base environment were significantly different
but the ones for the constituent pieces only differed
in the initial FROM statement at the top.  This commit
removes the duplicate Dockerfiles for the constituent
parts and uses build arguments (e.g. ARG) to specify the
OS to use at build time.  This eliminates 5 extraneous
Dockerfiles and increases maintainability.

* Fix typo in GHA workflow for CentOS container

* Use environment variable in CI workflow to control OS used for containers

* Use env context to access container os variable
* Modify CI so that Parsl examples are tested for latest containers

* Remove unused CI workflow

* Fix typo in container bind mount for parsl test CI workflow
* Split install of bootstrap compiler into its own script

Do not remove the compiler bootstrap environment because it's
needed when installing from a build cache.

Add explicit specs for compilers to disambiguate.

* Update Dockerfiles for Spack exaworks base install

* Fix accidental incorrect placement of ./install_bootstrap.sh in Dockerfile

* Fix comment "boostrap" typos
* Add miniconda3 and newer gcc, parsl, flux

* Rework install scripts to use pip for Parsl and RADICAL

This allows the latest versions of Parsl and RADICAL to be
installed via pip with miniconda3.  It works around the
problem that Spack packages are not kept up to date for
Parsl and RADICAL.  It also drastically speeds up the
installation process by eliminating extremely long builds
of Parsl and RADICAL dependencies.

* Checksums are missing for Flux, so don't try to retrieve them
* Create README with a list of links to workflow resources

* Add list markdown for resource items

* Fix typo

* Fix swift/t github link
* Add slurm cluster docker files

* Add docker container with test for Parsl+Flux+MPI

* Clean up unnecessary files and change Docker build context

* Refactor CI workflows for efficiency

* Remove unnecessary docker entrypoint files and attempt to optimize docker layer caching in CI
* Add a status badge to the README

* Don't use caching in CI when pushing container images to registry

* Remove use of docker image cache in push steps
* Consolidate install scripts

* Update CI workflow to reflect install script changes
* Initialize Intel compilers silently
Update Dockerfiles and CI workflow to remove loading of Flux Spack env.
* Reorganize tests to compare output against baselines

* Fix typo in CI yaml config

* Fix typo in CI test yaml

* Fix incorrect python for installing parsl via pip

* Fix parsl_flux_resource_list baseline for CI resources
* Add portable install scripts

Install scripts must account for cases where the default
compiler differs from the one needed in the Flux Spack
environment.

* Fix typos in Dockerfile

* chown Spack install to admin:admin

* Make all three containers use same shell init

* Refresh Spack view after install to get correct PYTHONPATH and CPATH

* Call flux/parsl install script twice to repair pip damage
* Split installation steps into separate docker layers

* Rearrange installation of spack and flux/parsl

* Use docker compose v2

* Print compose version

* Add ghcr docker layer caching back in

* Set up intel oneapi and flux env activation in /etc/profile.d

* Fix spack environment setup

* Fix typos in GHA workflow config

* Fix comments in Dockerfiles
* Install Flux with conda instead of Spack

This removes use of Spack for installing the required
packages.  Miniconda3 is installed instead and used to
install both Parsl and Flux.  The "chiltepin" conda
environment is activated at login time.

* Add initialization to both .bashrc and .bash_profile
* Switch to GNU compilers and JEDI spack-stack environment

* Fix /opt copy mistake and add spack-stack Dockerfile

* Update CI workflow

* Turn off "load" option since container cannot be both pushed and loaded

* Fix typo in CI workflow yaml

* Remove debug "which" from parsl/flux test program

* Remove commented line and add verification for MPI test

* Try changing string quoting in CI yaml

* Escape the quote in CI workflow

* Remove tab from CI YAML file
* Automate creation of custom spack-stack Dockerfile

* Fix incorrect comments, add arguments for stack to use

* Enable selection of stack for tests

* Add test for spack-stack stack

* Update parsl_flux_mpi_hello baseline to work with new tests
* Small fixes to conda environment and creation of machine-specific stacks

* Update chiltepin stack for container use

* Create a rudimentary stack factory

* Add rudimentary factory for config

* Move config to YAML files

* Update CI to use yaml file argument and remove obsolete JEDI stack calls

* Update baseline test data to match change in test code
* Remove obsolete ExaWorks directory

* Update frontend container's /opt to be consistent with spack-stack changes
* Restructure repo to have proper locations for package source and tests
Port tests to pytest framework

* Update chiltepin conda env to add pytest

* Rework container to use user modifiable conda install

Specify version 7.4.0 of pytest to work around bugs

* Add pytest fixture to set up and teardown parsl for flux tests

* Improve tests to check for output correctness

* Fix bug, only remove previous output if it exists

* Update CI workflow to use pytest

* Remove unneeded test baseline output
* Add test to ensure concurrent MPI programs don't use overlapping resources (#45)

* Add test to ensure concurrent MPI programs don't use overlapping resources

Add mpi_pi.f90 test program for tests requiring longer run times.
Update config factory to set cores per node in the provider constructor.

Add test

* Fix line length issue in mpi_pi.f90

Create ci.yaml configuration specifically for CI testing.
* Update documentation to provide the most basic instructions for use

* Update README with docker usage

* Fix grammatical errors in README
The latest Parsl breaks the latest version of Flux that
is available via conda.  So we explicitly set the version
of Parsl to get the latest one compatible with Flux.
@@ -0,0 +1,49 @@
def leadtime_to_seconds(fcst):
Collaborator

Since we renamed this from fcst_to_seconds to leadtime_to_seconds, should we also take advantage of the change and rename the parameter being passed in from fcst to leadtime?

Collaborator Author

That's a great idea! It does look confusing now.

Comment on lines 1 to 23
from chiltepin.jedi.leadtime import leadtime_to_seconds, seconds_to_leadtime

def test_leadtime_to_seconds():
    assert leadtime_to_seconds("PT0S") == 0
    assert leadtime_to_seconds("PT1M") == 0
    assert leadtime_to_seconds("PT1H") == 3600
    assert leadtime_to_seconds("PT1H30M") == 3600

    assert leadtime_to_seconds("P1D") == 86400
    assert leadtime_to_seconds("P1DT1H") == 90000
    assert leadtime_to_seconds("P1DT1H30M") == 90000

    assert leadtime_to_seconds("MT0S") == -0
    assert leadtime_to_seconds("MT1H") == -3600
    assert leadtime_to_seconds("MT1H30M") == -3600

    assert leadtime_to_seconds("M1D") == -86400
    assert leadtime_to_seconds("M1DT1H") == -90000
    assert leadtime_to_seconds("M1DT1H30M") == -90000
    assert leadtime_to_seconds("M1DT1H30M") == -90000


Collaborator
@NaureenBharwaniNOAA Mar 28, 2024

Several of these expected values are off. There are also only 13 entries for leadtime_to_seconds but 14 supplied values for seconds_to_leadtime, so the two lists don't line up. The numbers in the code do pass the tests as written; the incorrect values are called out in the comments below.

leadtime.leadtime_to_seconds

  • "PT0S"
  • "PT1M"
  • "PT1H"
  • "PT1H30M"
  • "P1D"
  • "P1DT1H"
  • "P1DT1H30M"
  • "MT0S"
  • "MT1H"
  • "MT1H30M"
  • "M1D"
  • "M1DT1H"
  • "M1DT1H30M"

leadtime.seconds_to_leadtime

  • 0
  • 60
  • 3600
  • 5400
  • 86400
  • 90000
  • 91800
  • -0
  • -60
  • -3600
  • -5400
  • -86400
  • -90000
  • -91800

Collaborator Author

assert leadtime_to_seconds("PT1M") == 0
should be
assert leadtime_to_seconds("PT1M") == 60

Collaborator Author

assert leadtime_to_seconds("PT1H30M") == 3600
should be
assert leadtime_to_seconds("PT1H30M") == 5400

Collaborator Author

assert leadtime_to_seconds("P1DT1H30M") == 90000
should be
assert leadtime_to_seconds("P1DT1H30M") == 91800

Collaborator Author

assert leadtime_to_seconds("MT1H30M") == -3600
should be
assert leadtime_to_seconds("MT1H30M") == -5400

Collaborator Author

assert leadtime_to_seconds("M1DT1H30M") == -90000
should be
assert leadtime_to_seconds("M1DT1H30M") == -91800

Collaborator Author

It appears I accidentally left out the MT1M case, which should evaluate to -60. Perhaps that missing case caused the leadtime-to-seconds mapping in the issue to get misaligned. Sorry about that. I'll go edit the issue and fix it.

Collaborator

Changed all instances of fcst to leadtime
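
For reference, here is a minimal sketch of a leadtime parser consistent with the corrected values above. It is an illustration only, not the actual chiltepin.jedi.leadtime implementation, and it assumes a leading "M" (in place of "P") marks a negative duration:

    import re

    def leadtime_to_seconds(leadtime):
        # Parse an ISO-8601-style duration such as "P1DT1H30M".
        # Sketch assumption: a leading "M" instead of "P" negates the
        # duration, e.g. "M1DT1H30M" -> -91800.
        sign = -1 if leadtime.startswith("M") else 1
        match = re.fullmatch(
            r"[PM](?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?",
            leadtime,
        )
        if match is None:
            raise ValueError(f"unrecognized leadtime: {leadtime}")
        days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
        return sign * (86400 * days + 3600 * hours + 60 * minutes + seconds)

    assert leadtime_to_seconds("PT1M") == 60           # corrected case from above
    assert leadtime_to_seconds("P1DT1H30M") == 91800   # corrected case from above
    assert leadtime_to_seconds("M1DT1H30M") == -91800  # corrected case from above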
