Add ocean data regridding to Gaussian grid functionality #8

Merged
merged 39 commits into from
Mar 7, 2024

Conversation

danielabdi-noaa
Contributor

This adds capability of re-gridding tripolar ocean dataset into a Gaussian grid.

@timothyas Let me know if this is how you want it or if you need a deeper integration with ufs2arco (same config file,
same input/output paths, etc.). I am assuming regrid comes first, so currently it is reading from netcdf file one.

To use the class, please follow the steps outlined in the docstring of the "Regrid" class.

I will leave this as a draft.

Member

@timothyas timothyas left a comment

Thanks @danielabdi-noaa for this, it looks great! I have a ton of comments, but mostly are just fairly small suggestions. I think the main things that we should come to agreement on are:

  1. Code structure: as mentioned in one of my comments, I think this should be a module within ufs2arco, but I'd like to hear what you think
  2. With that in mind, I was also thinking of the regridding as one step in the ufs2arco process. 1) pull the data 2) regrid 3) chunk and store to zarr. If that's the case, then I think we would want to generalize this a bit more, and I think we would move the regrid options to be in the same config file as everything else, and it would just be another section. I think that would make sense since the regridding involves opening a dataset, processing it, and storing a dataset, we may as well hook it up to everything else. What do you think?
  3. Assumptions about the data: Right now this specifically looks for dimension and variable names that are specific to MOM6. We will eventually want to figure out how to make this general to any of the model components, for instance so it could work with CICE6 data. We could also address that in the future, and just make it known that this is specific to MOM6 for now.
  4. xarray vs numpy: Right now this does all of the regridding with numpy arrays, since the .values accessor is used. I haven't used xesmf in a while, but I think we could do all of this in xarray. This would let us clean up some code regarding some of the is_3d if statements, and we could trim down some of the coordinate variable creation. I'd be curious what you think about trying that out.



@__single_arg_fast_cache
def gaussian_latitudes(n):
Member

This looks really nice @danielabdi-noaa ! I'm curious, is this the code that is used to create the same grid that the FV3 is output on? I had imagined that we would just read the FV3 grid from somewhere and use that, but I think this is more elegant if they are the same

Contributor Author

I didn't realize that FV3 already outputs on a Gaussian grid, and thought that the gaussian grid is something we would want to use for the AI project due to its desirable properties. Anyway, I checked the grid coordinates contained in the atmosphere files (bfg_/sfg_ files) and it seems to be exactly the same. I will add the option of reading the grid from the atmosphere files just to be safe and use that as the default target interpolation grid.
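
For readers unfamiliar with the term: Gaussian latitudes are the arcsines of the roots of the degree-n Legendre polynomial, so the grid never includes the poles. The sketch below is not the gaussian_latitudes function from this PR. It is a minimal illustration that assumes NumPy's numpy.polynomial.legendre.leggauss is an acceptable way to obtain those roots; the name gaussian_latitudes_sketch and the evenly spaced longitudes are assumptions made for illustration only.

```python
import numpy as np


def gaussian_latitudes_sketch(n):
    """Return n Gaussian latitudes (degrees, south to north) and their weights.

    Hypothetical helper: the latitudes are arcsin of the Gauss-Legendre
    quadrature nodes, which are the roots of the degree-n Legendre polynomial.
    """
    nodes, weights = np.polynomial.legendre.leggauss(n)
    return np.degrees(np.arcsin(nodes)), weights


# 180 latitudes and 360 longitudes, matching the sizes used later in the thread.
lats, weights = gaussian_latitudes_sketch(180)
# Assumption: longitudes are evenly spaced starting at 0 degrees.
lons = np.linspace(0.0, 360.0, 360, endpoint=False)
```

Comparing such latitudes against the coordinates stored in the atmosphere files, as described above, is one way to confirm that a generated grid and the model's output grid agree.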

regrid/regrid.py Outdated
from gaussian_grid import gaussian_latitudes


class Regrid:
Member

Just a comment on the structure, I was imagining that this regrid.py module would go in the ufs2arco/ source code directory so that it could be imported with everything else. I do think that the regridding capability is general enough that it could go there, since we're working with grids that are regularly used by the UFS. Did you have something else in mind?

Regarding the class name, what do you think about calling this something like Tripolar2GaussianRegridder? I'm picturing a future scenario where we may want to do a different regridding operation, so we may want to leave room for others like Gaussian2CubedSphereRegridder or Gaussian2LatLonRegridder ... or something like that. I'm totally open to suggestions on the name, but I think something more specific would be helpful later on.

Member

Again regarding structure, I would then suggest moving config-regrid.yaml and regrid_env.yaml into a directory specific to the data you're working with, something like examples/replay-mom6.

@danielabdi-noaa
Contributor Author

danielabdi-noaa commented Dec 18, 2023

@timothyas All good points. Here is the current status

  1. Regridding code is now part of ufs2arco.

  2. Regridding now works entirely with xarray datasets as inputs and outputs, with no writing to file. It is hooked up in example_replay_mom6.ipynb in a three-step process: the dataset ds comes from MOM6Dataset, is regridded, and is then passed on to the store_dataset function.

lats, lons = RegridMOM6.compute_gaussian_grid(180, 360)
rg = RegridMOM6(lats, lons, ds, config_filename = "config-replay.yaml")
ds = rg.regrid(ds)

The regrid config is merged into the existing yaml file there, under its own section RegridMOM6. It could be inserted into MOM6Dataset instead, but I am not sure which is better. The store_dataset function currently assumes the original dimensions before regridding (xh, yh, etc.), which are replaced by lon, lat after regridding. Just to make the notebook work, I overwrote chunks_out with something appropriate for the new dimensions, but will let you decide how best to handle that.

  3. For now this is specific to MOM6, mainly because of the assumptions made in the regridding code regarding variable names and their dimensions. It should not be difficult to make that more generic later.

  4. It is now running entirely using xarray, and in fact can regrid a dataset created from multiple netcdf files in one go with

ds=rg.regrid(ds)
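
The chunking mismatch described in point 2 (xh, yh replaced by lon, lat after regridding) could be handled by renaming the keys of the chunk specification rather than overwriting it by hand. The following is a minimal sketch, not code from this PR: rename_chunk_dims is a hypothetical helper, and the chunk sizes are placeholder values. Only the names xh, yh, lon, lat, and chunks_out come from the discussion.

```python
def rename_chunk_dims(chunks, dim_map):
    """Return a copy of chunks with dimension names replaced per dim_map.

    Dimensions missing from dim_map (for example time) keep their names.
    """
    return {dim_map.get(dim, dim): size for dim, size in chunks.items()}


# Hypothetical chunk sizes keyed by the pre-regridding dimension names.
chunks_out = {"time": 1, "yh": -1, "xh": -1}

# After regridding, the horizontal dimensions are named lat and lon.
regridded_chunks = rename_chunk_dims(chunks_out, {"xh": "lon", "yh": "lat"})
```

This keeps the sizes chosen for the original dimensions; whether those sizes are still sensible on the target grid is a separate decision.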

Member

@timothyas timothyas left a comment

Thank you @danielabdi-noaa! I think this looks great, this is the way I was thinking about it too. I agree with your choices to look for a weight file but create one if one doesn't exist, and I like that we have the Gaussian grid generation code here + the option to read a grid. I have a few more comments now:

  • I don't understand the lines that make sure that there are more than 2 dimensions/coordinates, and that time is one of the variables, do you know why this is? Is this to avoid regridding coordinate (non-dimension) variables? I'm just thinking that it could be the case that we want to regrid a bunch of 2D variables at a single point in time.
  • I'm now unable to run the docs because I don't have a grid rotation file. We could make this file available online, but I'd actually suggest that we make the rotation file an optional argument, and ignore vector fields if they are not present. I've made suggestions throughout the code to enable that, but let me know what you think.
  • There are some issues with some of the xesmf dependencies that make it so we can't import xesmf without activating the conda environment (sometimes requiring deactivating it and reactivating it). Since I think many people will not necessarily want to do regridding, I suggest we make this import optional. What do you think? This would be one way to sidestep the Windows build that is failing on our CI workflow, which I don't understand and don't really feel like debugging :)
  • I added some small updates so that the documentation renders on readthedocs
  • I have some small commits for the documentation. Would you mind if I committed these to your branch?
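
One common way to make an import optional, as suggested for xesmf above, is to attempt the import once and defer the failure until the dependent class is actually used. This is a sketch of that general pattern, not the change that was committed to this PR; optional_import and RegridderSketch are hypothetical names.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# xe is the module when xesmf is installed, otherwise None.
xe = optional_import("xesmf")


class RegridderSketch:
    """Hypothetical stand-in for a class that needs xesmf to do its work."""

    def __init__(self):
        # Fail only when regridding is requested, not at package import time.
        if xe is None:
            raise ImportError(
                "xesmf is required for regridding; install it to use this class"
            )
```

With this arrangement the rest of the package can be imported on platforms where xesmf is unavailable, and only users who request regridding see the error.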

@timothyas
Member

@pjpegion the CICE6 output seems to always come with ANGLE (long_name = angle grid makes with latitude line on U grid). Is this the field needed to regrid vector fields? Can we always assume that this is present?

@pjpegion

No, it may not be present. We should point to a separate grid file.
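
To illustrate why a grid angle matters for vector fields, and how a missing angle could be handled by skipping them, here is a minimal sketch. It is not the rotation code from this PR. It assumes the angle is measured counterclockwise from geographic east to the grid's local x-direction; actual model output may use a different sign convention, so the grid file's metadata should be checked. The function names are hypothetical.

```python
import math


def rotate_to_east_north(u, v, angle):
    """Rotate grid-relative components (u, v) into east/north components.

    Assumption: angle (radians) is measured counterclockwise from
    geographic east to the grid's local x-direction.
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return u * cos_a - v * sin_a, u * sin_a + v * cos_a


def maybe_rotate(u, v, angle=None):
    """Return rotated components, or None when no angle is available.

    Returning None signals the caller to skip vector fields, mirroring
    the suggestion that the rotation file be an optional argument.
    """
    if angle is None:
        return None
    return rotate_to_east_north(u, v, angle)


# A unit vector along the grid x-direction, on a grid rotated by 90 degrees.
u_east, v_north = maybe_rotate(1.0, 0.0, math.pi / 2)
```

The same two expressions apply elementwise when u, v, and the angle are arrays of matching shape.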

@danielabdi-noaa danielabdi-noaa marked this pull request as ready for review December 20, 2023 18:37
@timothyas
Member

This looks great @danielabdi-noaa, thanks for all your work and for iterating back and forth with me. I don't understand why the Windows build is still failing, but I can check it out next week. Otherwise I think this is good to go.

@danielabdi-noaa
Contributor Author

@timothyas It is failing because arguments passed to regrid use xe.Regridder and xesmf is not installed on Windows. The solution would be to put everything in a try...except block, but that is an ugly solution in my opinion. If it is possible to skip the Windows CI run on the regridding files, that may be better until you figure out the issue with the xesmf install on Windows.
Thanks for all the tips and contribution to the PR!

@timothyas
Member

@danielabdi-noaa something I thought about relating to this PR is that it might be regridding more variables than we want. For instance if we only want temperature and salinity from our MOM6Dataset, this current code will still regrid everything it finds, then subset the variables. The thing is that right now we would have Regridder objects referencing items in the Dataset section of the yaml file, so I think it's probably cleanest to merge this PR, then implement that change. Do you agree? If so we should open an issue with that so we don't forget.

@danielabdi-noaa
Contributor Author

@timothyas I have committed your suggestions. Let's see if it solves the CI issue!

@timothyas
Member

Hi @danielabdi-noaa, with that suggestion the CI is passing. Then after #5 was merged there was a minor conflict in ufs2arco/__init__.py, which I hope you don't mind I just resolved (basically the importing of Timer and classes from .regrid were on the same line). So now I think this looks good; let me know if you think it's ready, and @frolovsa it is ready for your eyes whenever you get the chance.

@danielabdi-noaa
Contributor Author

@timothyas Looks good to me!

@timothyas
Member

I just realized this never got merged. Sorry @danielabdi-noaa ! I gave it another look and I think it's good to go. Merging now.. 🚀

@timothyas timothyas merged commit f6c777d into NOAA-PSL:main Mar 7, 2024
3 checks passed
3 participants