Add JEDI QG model example cycled DA workflow #85
Conversation
* Update LICENSE and README to be consistent with repository contents
* Add basic install scripts for Spack and ExaWorks SDK
* Add basic test for ExaWorks installation
* Add ExaWorks Docker containers and update ExaWorks install scripts
  Dockerfiles for both Ubuntu and CentOS containers are added. Scripts for installing Spack and ExaWorks are updated. Add basic "hello" Parsl test script. Modify CI workflow to use containers for testing ExaWorks.
* Rename CI workflow
  Modify CI workflow names to be consistent with current usage.
* Fix typo in comment and add newline at end of Docker files
* Add CI test for Docker container and update checkout action for test example CI
  Also, add py-pylint to the Spack install script to enable better testing.
* Restructure ExaWorks installation into small pieces
  Create installation scripts for each main ExaWorks component. Create Docker files for each main ExaWorks component. Use Spack build caches to optimize installation times. Update GitHub Actions workflow to leverage Spack build caches. Update GitHub Actions workflow to split the install into different build jobs.
* Add missing colon to workflow CI yml file
* Fix another missing colon in the workflow yaml file
* Rework Docker containers and install scripts
  Install scripts and containers were modified to build and use a "bootstrap" compiler for building the ExaWorks software. This is necessary because dependencies shared between various components of ExaWorks will induce multiple rebuilds of the gcc compiler, which takes a very long time. This is avoided by bootstrapping gcc with the native compiler and then rebuilding the newer gcc with itself in a "base" environment. In this way, gcc is built twice for the base environment, but subsequent environments using the "base" environment will not rebuild gcc. This commit also adds CentOS7 containers for the multi-stage build of ExaWorks that breaks it into smaller pieces with shorter build times. The GitHub Actions workflow was updated to add CentOS containers.
* Fix typo in script comment
* Consolidate Dockerfiles for ExaWorks constituents
  There were Dockerfiles for Ubuntu20 and CentOS7 builds of ExaWorks and its constituents. The Dockerfiles for the base environment were significantly different, but the ones for the constituent pieces differed only in the initial FROM statement at the top. This commit removes the duplicate Dockerfiles for the constituent parts and uses build arguments (e.g. ARG) to specify the OS at build time. This eliminates 5 extraneous Dockerfiles and increases maintainability.
* Fix typo in GHA workflow for CentOS container
* Use environment variable in CI workflow to control OS used for containers
* Use env context to access container os variable
* Modify CI so that Parsl examples are tested for latest containers
* Remove unused CI workflow
* Fix typo in container bind mount for parsl test CI workflow
* Split install of bootstrap compiler into its own script
  Do not remove the compiler bootstrap environment because it's needed when installing from a build cache. Add explicit specs for compilers to disambiguate.
* Update Dockerfiles for Spack ExaWorks base install
* Fix accidental incorrect placement of ./install_bootstrap.sh in Dockerfile
* Fix comment "boostrap" typos
* Add miniconda3 and newer gcc, parsl, flux
* Rework install scripts to use pip for Parsl and RADICAL
  This allows the latest versions of Parsl and RADICAL to be installed via pip with miniconda3. It works around the problem that Spack packages are not kept up to date for Parsl and RADICAL. It also drastically speeds up the installation process by eliminating extremely long builds of Parsl and RADICAL dependencies.
* Checksums are missing for Flux, so don't try to retrieve them
* Create README with a list of links to workflow resources
* Add list markdown for resource items
* Fix typo
* Fix swift/t github link
* Add slurm cluster docker files
* Add docker container with test for Parsl+Flux+MPI
* Clean up unnecessary files and change Docker build context
* Refactor CI workflows for efficiency
* Remove unnecessary docker entrypoint files and attempt to optimize docker layer caching in CI
* Add a status badge to the README
* Don't use caching in CI when pushing container images to registry
* Remove use of docker image cache in push steps
* Consolidate install scripts
* Update CI workflow to reflect install script changes
* Initialize Intel compilers silently
  Update Dockerfiles and CI workflow to remove loading of the Flux Spack env.
* Reorganize tests to compare output against baselines
* Fix typo in CI yaml config
* Fix typo in CI test yaml
* Fix incorrect python for installing parsl via pip
* Fix parsl_flux_resource_list baseline for CI resources
* Add portable install scripts
  Install scripts must take into account when the default compiler is different from the one needed in the Flux Spack environment.
* Fix typos in Dockerfile
* chown Spack install to admin:admin
* Make all three containers use the same shell init
* Refresh Spack view after install to get correct PYTHONPATH and CPATH
* Call flux/parsl install script twice to repair pip damage
* Split installation steps into separate docker layers
* Rearrange installation of spack and flux/parsl
* Use docker compose v2
* Print compose version
* Add ghcr docker layer caching back in
* Set up intel oneapi and flux env activation in /etc/profile.d
* Fix spack environment setup
* Fix typos in GHA workflow config
* Fix comments in Dockerfiles
* Install Flux with conda instead of Spack
  This removes the use of Spack for installing the required packages. Miniconda3 is installed instead and used to install both Parsl and Flux. The "chiltepin" conda environment is activated at login time.
* Add initialization to both .bashrc and .bash_profile
* Switch to GNU compilers and JEDI spack-stack environment
* Fix /opt copy mistake and add spack-stack Dockerfile
* Update CI workflow
* Turn off "load" option since container cannot be both pushed and loaded
* Fix typo in CI workflow yaml
* Remove debug "which" from parsl/flux test program
* Remove commented line and add verification for MPI test
* Try changing string quoting in CI yaml
* Escape the quote in CI workflow
* Remove tab from CI YAML file
* Automate creation of custom spack-stack Dockerfile
* Fix incorrect comments, add arguments for stack to use
* Enable selection of stack for tests
* Add test for spack-stack stack
* Update parsl_flux_mpi_hello baseline to work with new tests
* Small fixes to conda environment and creation of machine-specific stacks
* Update chiltepin stack for container use
* Create a rudimentary stack factory
* Add rudimentary factory for config
* Move config to YAML files
* Update CI to use yaml file argument and remove obsolete JEDI stack calls
* Update baseline test data to match change in test code
* Remove obsolete ExaWorks directory
* Update frontend container's /opt to be consistent with spack-stack changes
* Restructure repo to have proper locations for package source and tests
  Port tests to the pytest framework.
* Update chiltepin conda env to add pytest
* Rework container to use user-modifiable conda install
  Specify version 7.4.0 of pytest to work around bugs.
* Add pytest fixture to set up and tear down Parsl for Flux tests (a rough sketch of such a fixture follows this list)
* Improve tests to check for output correctness
* Fix bug: only remove previous output if it exists
* Update CI workflow to use pytest
* Remove unneeded test baseline output
* …urces (#45)
* Add test to ensure concurrent MPI programs don't use overlapping resources
  Add mpi_pi.f90 test program for tests requiring longer run times. Update config factory to set cores per node in the provider constructor. Add test
* Fix line length issue in mpi_pi.f90
  Create ci.yaml configuration specifically for CI testing.
* Update documentation to provide the most basic instructions for use
* Update README with docker usage
* Fix grammatical errors in README
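One of the commits above adds a pytest fixture that sets up and tears down Parsl for the Flux tests. A minimal sketch of that idea follows; the fixture name and the bare FluxExecutor configuration are assumptions, not the repository's actual code.

import parsl
import pytest
from parsl.config import Config
from parsl.executors import FluxExecutor

@pytest.fixture
def parsl_flux():
    # Load a Flux-backed Parsl config before each test (config is a guess).
    parsl.load(Config(executors=[FluxExecutor()]))
    yield
    # Tear Parsl down afterwards so tests stay independent of each other.
    parsl.clear()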
The latest Parsl breaks the latest version of Flux that is available via conda, so we explicitly pin Parsl to the latest version that is compatible with Flux.
src/chiltepin/jedi/leadtime.py
Outdated
@@ -0,0 +1,49 @@
def leadtime_to_seconds(fcst):
If we renamed this from fcst_to_seconds to leadtime_to_seconds, should we also take advantage of this change and fix the parameter being passed in from fcst to leadtime?
That's a great idea! It does look confusing now.
tests/test_leadtime.py
Outdated
from chiltepin.jedi.leadtime import leadtime_to_seconds, seconds_to_leadtime


def test_leadtime_to_seconds():
    assert leadtime_to_seconds("PT0S") == 0
    assert leadtime_to_seconds("PT1M") == 0
    assert leadtime_to_seconds("PT1H") == 3600
    assert leadtime_to_seconds("PT1H30M") == 3600

    assert leadtime_to_seconds("P1D") == 86400
    assert leadtime_to_seconds("P1DT1H") == 90000
    assert leadtime_to_seconds("P1DT1H30M") == 90000

    assert leadtime_to_seconds("MT0S") == -0
    assert leadtime_to_seconds("MT1H") == -3600
    assert leadtime_to_seconds("MT1H30M") == -3600

    assert leadtime_to_seconds("M1D") == -86400
    assert leadtime_to_seconds("M1DT1H") == -90000
    assert leadtime_to_seconds("M1DT1H30M") == -90000
These numbers are pretty far off. Also, there are only 13 entries for leadtime_to_seconds but 14 supplied values for seconds_to_leadtime, so the two lists don't line up. The numbers in the code do pass the tests as written, but several of the expected values themselves are wrong. The values bolded and italicized below are the incorrect ones.
leadtime.leadtime_to_seconds
- "PT0S"
- "PT1M"
- "PT1H"
- "PT1H30M"
- "P1D"
- "P1DT1H"
- "P1DT1H30M"
- "MT0S"
- "MT1H"
- "MT1H30M"
- "M1D"
- "M1DT1H"
- "M1DT1H30M"

leadtime.seconds_to_leadtime
- 0
- 60
- 3600
- 5400
- 86400
- 90000
- 91800
- -0
- -60
- -3600
- -5400
- -86400
- -90000
- -91800
assert leadtime_to_seconds("PT1M") == 0
should be
assert leadtime_to_seconds("PT1M") == 60
assert leadtime_to_seconds("PT1H30M") == 3600
should be
assert leadtime_to_seconds("PT1H30M") == 5400
assert leadtime_to_seconds("P1DT1H30M") == 90000
should be
assert leadtime_to_seconds("P1DT1H30M") == 91800
assert leadtime_to_seconds("MT1H30M") == -3600
should be
assert leadtime_to_seconds("MT1H30M") == -5400
assert leadtime_to_seconds("M1DT1H30M") == -90000
should be
assert leadtime_to_seconds("M1DT1H30M") == -91800
It appears I accidentally left out the MT1M case. That should evaluate to -60. Perhaps the missing case caused the mapping from leadtime to seconds to get messed up in the issue. Sorry about that. I'll go edit the issue and fix it.
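Putting the corrections from this thread together, an implementation consistent with them might look like the following minimal sketch. Only the input/output pairs come from this review; the regex-based parsing and both function bodies are assumptions, not the actual leadtime.py code.

import re

def leadtime_to_seconds(leadtime):
    # Parse strings like "P1DT1H30M"; a leading "M" instead of "P" means negative.
    match = re.fullmatch(
        r"([PM])(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?", leadtime
    )
    if match is None:
        raise ValueError(f"unrecognized leadtime: {leadtime}")
    sign = -1 if match.group(1) == "M" else 1
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups()[1:])
    return sign * (days * 86400 + hours * 3600 + minutes * 60 + seconds)

def seconds_to_leadtime(seconds):
    # Inverse sketch: emit forms like "P1D", "PT1H30M", "MT1M".
    sign = "M" if seconds < 0 else "P"
    remainder = abs(seconds)
    days, remainder = divmod(remainder, 86400)
    hours, remainder = divmod(remainder, 3600)
    minutes, secs = divmod(remainder, 60)
    time_part = ""
    if hours:
        time_part += f"{hours}H"
    if minutes:
        time_part += f"{minutes}M"
    if secs or (not days and not time_part):
        time_part += f"{secs}S"
    return sign + (f"{days}D" if days else "") + (f"T{time_part}" if time_part else "")

# Spot checks against the corrected values from this thread:
assert leadtime_to_seconds("PT1M") == 60
assert leadtime_to_seconds("PT1H30M") == 5400
assert leadtime_to_seconds("P1DT1H30M") == 91800
assert leadtime_to_seconds("MT1M") == -60
assert leadtime_to_seconds("M1DT1H30M") == -91800
assert seconds_to_leadtime(91800) == "P1DT1H30M"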
src/chiltepin/jedi/leadtime.py
Outdated
Changed all instances of fcst to leadtime
This PR adds a collection of Parsl wrappers that configure and execute the various applications needed for a simple cycled 3DVar JEDI QG workflow. The wrappers consist of "configure" and "run" tasks that execute separately on different resources. The "configure" tasks run on either a "service" resource (when internet access is required) or a "serial" resource, while the "run" tasks can execute on compute nodes in parallel. Wrappers are created for: installing JEDI, running a "truth" forecast, creating simulated observations from the "truth", running 3DVar using the simulated observations and a background from the previous cycle's forecast, and running a forecast initialized from the analysis produced by 3DVar.
These are not the most efficient or best-organized wrappers. Much more thought should be put into a design that can be applied to future applications and experiment types. However, effort was made to build configurable components that can be used to run cycled 3DVar experiments with varying configuration options.
An experiment program, workflow.py, was created to illustrate how the wrappers can be used to build a workflow. Automated tests for the various wrappers have not been added yet because it isn't clear what the best way to do that is. All such tests would depend on a JEDI install, which takes time to build and would take even longer in CI. Testing will be added once a strategy is determined.
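To make the configure/run split concrete, here is a minimal hypothetical Parsl sketch of the pattern; the executor labels, file contents, and the qg_forecast.x executable name are illustrative assumptions rather than the wrappers in this PR.

import parsl
from parsl import bash_app, python_app
from parsl.config import Config
from parsl.executors import ThreadPoolExecutor

# Two local executors stand in for the "service"/"serial" and compute resources.
parsl.load(Config(executors=[
    ThreadPoolExecutor(label="service"),
    ThreadPoolExecutor(label="compute"),
]))

@python_app(executors=["service"])
def configure_forecast(config_path, leadtime):
    # "Configure" task: render a forecast config on a service/serial resource.
    with open(config_path, "w") as f:
        f.write(f"forecast length: {leadtime}\n")
    return config_path

@bash_app(executors=["compute"])
def run_forecast(config_path, stdout="forecast.out", stderr="forecast.err"):
    # "Run" task: launch the forecast executable on compute resources.
    return f"qg_forecast.x {config_path}"

# Passing the configure task's future chains the dependency automatically.
run_forecast(configure_forecast("forecast.yaml", "PT6H")).result()

The real wrappers are more elaborate, but this future-based chaining is the mechanism Parsl uses to order configure, run, and assimilation steps in a cycled experiment.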
Closes #84