
Add example notebook #26

Draft: SarahAlidoost wants to merge 8 commits into main
Conversation

SarahAlidoost (Member)

Here, I added an example notebook that shows the complete workflow: model training, data processing and prediction.


@SarahAlidoost (Member Author)

@QianqianHan96 here I added an example notebook, mainly to show how to optimize the emulator workflow using dask, dask-ml, and xarray. There might be small differences in how the data are handled, for example the LAI interpolation that uses data from the previous year. I didn't run the example notebook on Snellius with a large dataset. I created this pull request in case you would like to add the example to the repository.
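
For illustration, here is a minimal sketch of the kind of dask + xarray + dask-ml pattern the notebook demonstrates. The file names, predictor variable names (t2m, ssrd), and the saved estimator are placeholders, not the notebook's actual code:

```python
import joblib
import xarray as xr
from dask.distributed import Client
from dask_ml.wrappers import ParallelPostFit

client = Client()  # local Dask cluster; on Snellius this would be an HPC-scale cluster

# Open the predictor data lazily, with explicit chunks per dimension.
ds = xr.open_zarr(
    "predictors.zarr",  # hypothetical path
    chunks={"latitude": 250, "longitude": 250, "time": 750},
)

# Example preprocessing step: fill LAI gaps along time.
# interpolate_na needs the time dimension in a single chunk on dask-backed arrays.
lai = ds["LAI"].chunk({"time": -1}).interpolate_na(dim="time", method="linear")

# Build a (sample, feature) table from a few predictor variables.
features = xr.merge([lai, ds[["t2m", "ssrd"]]])
stacked = features.to_array("variable").stack(sample=("time", "latitude", "longitude"))
X = stacked.transpose("sample", "variable").data  # a 2D dask array

# Wrap a pre-trained scikit-learn estimator so .predict runs block-wise on the chunks.
model = ParallelPostFit(joblib.load("emulator.joblib"))
prediction = model.predict(X)  # still lazy; computed chunk by chunk when needed
```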

@QianqianHan96 (Collaborator)

Hi Sarah, thanks for adding the example notebook. I will have a look at it, probably at the end of this month when I start producing the fluxes dataset. For now, I am still working on cleaning the training data.

I will let you know after I go through your optimized script.

@QianqianHan96 (Collaborator) commented Nov 20, 2024

Hi Sarah,

I started working on this pull request this week. Your script is way better organized than the script I shared with you in May.
However, I found that for the LAI preprocessing there is no problem for a short period over Europe only, but if I run it for one year at global scale it is super slow (I tried to use the same chunk size as you, 500, but there are too many tasks: about 370,000). So maybe it is better to do the interpolation first and then export LAI to zarr; that way we reduce the data volume 81-fold, going from 1 km to 9 km.
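
For concreteness, a minimal sketch of that suggestion: aggregate (or interpolate) LAI to the 9 km grid first and export it to zarr once, so later steps read the 81× smaller data. The paths, chunk sizes, and the choice between coarsen and interp below are assumptions:

```python
import xarray as xr

# 1 km LAI, opened lazily (chunk sizes here are illustrative).
lai_1km = xr.open_zarr(
    "LAI_1km.zarr", chunks={"latitude": 900, "longitude": 900, "time": 1}
)

# Aggregate 9x9 blocks of 1 km cells into one 9 km cell: 81x fewer grid cells.
lai_9km = lai_1km.coarsen(latitude=9, longitude=9, boundary="trim").mean()

# Or interpolate onto an existing 9 km target grid instead:
# lai_9km = lai_1km.interp(latitude=target.latitude, longitude=target.longitude)

# Export once; the downstream workflow then reads the much smaller 9 km store.
lai_9km.to_zarr("LAI_9km.zarr", mode="w")
```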

@QianqianHan96 (Collaborator) commented Nov 21, 2024

Hi Sarah,


I found that the problem is not the data volume without interpolation; the real reason is the chunk size. If we change "chunks=500" to "chunks={'longitude': 250, 'latitude': 250, 'time': 750}", there is no problem for one global year. For small data "chunks=500" is okay, but for big data we need to specify the chunks for each dimension.
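
For illustration, the two chunking choices side by side; the zarr path and the LAI variable name are hypothetical:

```python
import xarray as xr

# A single integer chunks every dimension with that size; at global scale,
# for a full year, this was the setting reported above to create ~370,000 tasks.
ds_int_chunks = xr.open_zarr("LAI_global.zarr", chunks=500)

# Per-dimension chunks, as suggested above, worked for one global year.
ds = xr.open_zarr(
    "LAI_global.zarr",
    chunks={"longitude": 250, "latitude": 250, "time": 750},
)

# Inspect how many dask blocks each choice produces.
print(ds_int_chunks["LAI"].data.numblocks, ds["LAI"].data.numblocks)
```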
