
Add example notebook #26

Draft: SarahAlidoost wants to merge 8 commits into main
Conversation

SarahAlidoost (Member)

Here, I added an example notebook that shows the complete workflow: model training, data processing and prediction.


@SarahAlidoost (Member Author)

@QianqianHan96 here I added an example notebook, mainly to show how to optimize the emulator workflow using dask, dask-ml, and xarray. There might be small differences in how the data are handled, for example the LAI interpolation that uses data from the previous year. I didn't run the example notebook on Snellius with a large dataset. I created this pull request in case you would like to add the example to the repository.
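
For illustration, here is a minimal sketch of the kind of dask + xarray + dask-ml pattern the notebook demonstrates. The file names, predictor variable names (t2m, ssrd), and the saved estimator are placeholders, not the notebook's actual code:

```python
import joblib
import xarray as xr
from dask.distributed import Client
from dask_ml.wrappers import ParallelPostFit

client = Client()  # local Dask cluster; on Snellius this would be an HPC-scale cluster

# Open the predictor data lazily, with explicit chunks per dimension.
ds = xr.open_zarr(
    "predictors.zarr",  # hypothetical path
    chunks={"latitude": 250, "longitude": 250, "time": 750},
)

# Example preprocessing step: fill LAI gaps along time.
# interpolate_na needs the time dimension in a single chunk on dask-backed arrays.
lai = ds["LAI"].chunk({"time": -1}).interpolate_na(dim="time", method="linear")

# Build a (sample, feature) table from a few predictor variables.
features = xr.merge([lai, ds[["t2m", "ssrd"]]])
stacked = features.to_array("variable").stack(sample=("time", "latitude", "longitude"))
X = stacked.transpose("sample", "variable").data  # a 2D dask array

# Wrap a pre-trained scikit-learn estimator so .predict runs block-wise on the chunks.
model = ParallelPostFit(joblib.load("emulator.joblib"))
prediction = model.predict(X)  # still lazy; computed chunk by chunk when needed
```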

@QianqianHan96 (Collaborator)

Hi Sarah, thanks for adding the example notebook. I will have a look at it, probably at the end of this month when I start producing the fluxes dataset. For now, I am still working on cleaning the training data.

I will let you know after I go through your optimized script.

@QianqianHan96 (Collaborator) commented Nov 20, 2024

Hi Sarah,

I started working on this pull request this week. Your script is way better organized than the script I shared with you in May.
However, I found that for the LAI preprocessing there is no problem for a short period over Europe only, but if I run it for one year at global scale it is super slow (I tried to use the same chunk size as you, 500, but there are too many tasks: about 370,000). So maybe it is better to do the interpolation first and then export LAI to zarr; that way we reduce the data volume 81-fold, going from 1 km to 9 km.
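
For concreteness, a minimal sketch of that suggestion: aggregate (or interpolate) LAI to the 9 km grid first and export it to zarr once, so later steps read the 81× smaller data. The paths, chunk sizes, and the choice between coarsen and interp below are assumptions:

```python
import xarray as xr

# 1 km LAI, opened lazily (chunk sizes here are illustrative).
lai_1km = xr.open_zarr(
    "LAI_1km.zarr", chunks={"latitude": 900, "longitude": 900, "time": 1}
)

# Aggregate 9x9 blocks of 1 km cells into one 9 km cell: 81x fewer grid cells.
lai_9km = lai_1km.coarsen(latitude=9, longitude=9, boundary="trim").mean()

# Or interpolate onto an existing 9 km target grid instead:
# lai_9km = lai_1km.interp(latitude=target.latitude, longitude=target.longitude)

# Export once; the downstream workflow then reads the much smaller 9 km store.
lai_9km.to_zarr("LAI_9km.zarr", mode="w")
```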

@QianqianHan96 (Collaborator) commented Nov 21, 2024

Hi Sarah,


I found that the problem is not the data volume without interpolation; the real reason is the chunk size. If we change "chunks=500" to "chunks={'longitude': 250, 'latitude': 250, 'time': 750}", there is no problem for one global year. For small data "chunks=500" is okay, but for big data we need to specify the chunks for each dimension.
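
For illustration, the two chunking choices side by side; the zarr path and the LAI variable name are hypothetical:

```python
import xarray as xr

# A single integer chunks every dimension with that size; at global scale,
# for a full year, this was the setting reported above to create ~370,000 tasks.
ds_int_chunks = xr.open_zarr("LAI_global.zarr", chunks=500)

# Per-dimension chunks, as suggested above, worked for one global year.
ds = xr.open_zarr(
    "LAI_global.zarr",
    chunks={"longitude": 250, "latitude": 250, "time": 750},
)

# Inspect how many dask blocks each choice produces.
print(ds_int_chunks["LAI"].data.numblocks, ds["LAI"].data.numblocks)
```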
