-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple Zarr to Rule them All #54
Conversation
user configs are retrieved either from data_config.yaml or they are set as flags to train_model.py Capabilities of ConfigLoader class extended
lat lon specifications make the code more flexible
Zarr is registered to model buffer Normalization happens on device on_after_batch_transfer
…on/neural-lam into feature_dataset_yaml
This is amazing stuff Simon! 😄 Looking forward to sink my teeth into all of this. I don't mind this being one big PR, as these changes are so interconnected. I'll do a proper review of this later, but here's a short answer to the question about variable weights for now: I think we really want to keep the option to have a fixed variable weighting. There are in many cases good reasons to want to specify such weights manually rather than letting the model handle the balancing. I liked how it was handled before, with everything converted to standard-deviations that were either output by the model or derived from manual weights. That being said, I don't mind if we get rid of it with this PR and just re-introduce it later. As you rightly point out the logic for how to create and use parameter weights has to be reworked anyhow. We can make a new issue about that, but my initial thoughts would be to have such weights specified in the config yaml, either per-variable or using some generic strategy names (e.g. |
the behavior of xr.Dataset.dims will change soon
Yes, amazing work @sadamov! I will read through this over the next days and then we can may schedule a call to discuss in more detail? There is some of the reasoning in the structure I don't quite follow, but that is probably just because I have thought about some things slightly differently :) |
HI @sadamov, It seems that your example Kind regards, |
@mpvginde That is currently not possible. I just discussed it with @leifdenby and this functionality will be added back before the PR is merged. Good point, thanks! |
Thanks for the response. |
This PR is superceded by #66 |
Describe your changes
This PR is massive and was heavily discussed in the dev-team. It introduces a shift towards .zarr based data storage and .yaml based interaction with them using config files. The model itself should now be fully domain agnostic. All user input is either defined in aforementioned yaml files or via the
train_model.py
flags.Please have a look at #24 to learn more about the main reasoning behind these changes. In short:
New dependencies:
xarray, dask, zarr, pandas
BREAKING
While this PR tries to implement all functionalities present in
main
there is one major setup that is not supported anylonger. The training is based on (re)-analysis now and the reforecast based training previously used with e.g. MEPS is not supported. Especially thesubsample_step
logic has been removed from the code. This functionality can be re-introduced in "zarr-workflow" with a later PR. This will also break the tests!MISSING
The whole handling of the boundary data is only implemented in the dataloading part but not actually used in the model. This will be added in a later PR. Parameter Weights are currently not used in the model since @joeloskarsson added probabilistic loss functions.
Issue Links
Solves #24
File by file changes
neural_lam/config.py
This is the main workhorse and since I am not an expert in OOP, please double-check the decorators carefully (e.g. @leifdenby). I tried to set this up in a modular way where each function can be called independently. The main method calledprocess_dataset()
combines most of the individual steps and makes the data ready for the Torch Dataset Class. I tried to capture some errors and print some useful information without going all in on perfect software engineering. Since that seemed to match with the repo in other places. I am certain that other people will have different needs and am more than happy to add/change functionality already in this PR.docs/download_danra.py
This should be executed to run the tests and by anyone using the defaultconfig_data.py
neural_lam/models/ar_model.py
Update the handling of static fields that are now supplied in zarr archives. Also, handling ofbatch_times
mostly for plotting.neural_lam/data_config.yaml
Zarr based dataloading and specifications fully implemented. This is now documented in detail inREADME.md
neural_lam/utils.py
Loader functions not required anylonger as this is now handled by theconfig.Config
class.neural_lam/weather_dataset.py
This is now a classical Torch Dataset and PL DataLoader class based on xarray. All additional features like forcing calculation and windowing were placed in separate scripts.tests
The tests are broken, as mentioned below I will need some help here to fix them. I added my own test based on the zarr-yaml workflow. Currently 2/3 work okay.calculate_statistics.py
is now purely based on xarray and therefore natively fast, lazy and executed in parallel. This replacescreate_parameter_weights.py
. @leifdenby Which makes my midnight-merge last week kind of useless ^^. @joeloskarsson Is my understanding right that we don't require parameter_weights anylonger, given that people are using the right loss functions? I have removed all weighting (even vertical levels) from the code should these be re-introduced somewhere?create_boundary_mask.py
is one of many "helper"-scripts to generate static and forcing features if the user doesn't provide them themselves. As the name states, this one creates the old type of "in-domain" bordermask.create_forcings.py
stores datetime forcings previously calculated in the WeatherDataset in a zarr archive. This and previous helper-scripts replacecreate_grid_features.py
that was previously used to load pre-computed and stored numpy arrays.create_mesh.py
andplot_graph.py
the loading of xy-grid is now streamlined (and hopefully in the right order @bet20ICL) using theconfig.Config.get_xy()
train_model.py
with the new dataloader class everything is neat and tidyRemarks
I did some thorough testing of this code, but given the sheer size of the PR it was not easy. I am open to both in-depth reviews and/or splitting the PR up into multiple smaller ones.
@SimonKamuk the tests don't work anymore and I tried my best to re-implement them for DANRA. But the pooch does not seem to work with Danra. I am always getting access rights issues, unless I access them directly with xr.open_zarr("..."). Could you help me fix them? Currently, your tests are disabled as well as my last test.
That being said, I am pretty proud to say that it is now possible to fully train Neural-LAM on DANRA data (and COSMO btw) and here I want to show some technical results as proof:
This is the graph created by
create_mesh.py
and the new get_xy() logic:This is the validation error_map after very few training steps and a subset of variables defined in
data_config.yaml
This is the resulting map of a 10step * 3hour forecast using the model in
--eval "test"
mode:Please, test this code with your own data over your own domain, this should help confirming that the model is truly domain-agnostic now.
Type of change
Checklist before requesting a review
pull
with--rebase
option if possible).Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
reflecting type of change, for example "bug fixes", add section where
missing)
Checklist for assignee