-
Notifications
You must be signed in to change notification settings - Fork 51
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
included change requests from leifdenby:
- removed linting dependencies - minor changes to test file - added notebook outlining generation of meps_example_reduced from meps_example
- Loading branch information
1 parent
9352949
commit 4995de0
Showing
3 changed files
with
252 additions
and
15 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,237 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Creating meps_example_reduced\n", | ||
"This notebook outlines how the small-size test dataset meps_example_reduced was created based on the slightly larger dataset meps_example. The zipped up datasets are 263 MB and 2.6 GB, respectively.\n", | ||
"\n", | ||
"The dataset was reduced in size by reducing the number of grid points and variables.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Standard library\n", | ||
"import os\n", | ||
"\n", | ||
"# Third-party\n", | ||
"import numpy as np\n", | ||
"import torch" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\n", | ||
"The number of grid points was reduced to 1/4 by halving the number of coordinates in both the x and y direction. This was done by removing a quarter of the grid points along each outer edge, so the center grid points would stay centered in the new set.\n", | ||
"\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Load existing grid\n", | ||
"grid_xy = np.load('data/meps_example/static/nwp_xy.npy')\n", | ||
"# Get slices in each dimension by cutting off a quarter along each edge\n", | ||
"num_x, num_y = grid_xy.shape[1:]\n", | ||
"x_slice = slice(num_x//4, 3*num_x//4)\n", | ||
"y_slice = slice(num_y//4, 3*num_y//4)\n", | ||
"# Index and save reduced grid\n", | ||
"grid_xy_reduced = grid_xy[:, x_slice, y_slice]\n", | ||
"np.save('data/meps_example_reduced/static/nwp_xy.npy', grid_xy_reduced)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\n", | ||
"This cut out the border, so a new perimeter of 10 grid points was established as border (10 was also the border size in the original \"meps_example\").\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Outer 10 grid points are border\n", | ||
"old_border_mask = np.load('data/meps_example/static/border_mask.npy')\n", | ||
"assert np.all(old_border_mask[10:-10, 10:-10] == False)\n", | ||
"assert np.all(old_border_mask[:10, :] == True)\n", | ||
"assert np.all(old_border_mask[:, :10] == True)\n", | ||
"assert np.all(old_border_mask[-10:,:] == True)\n", | ||
"assert np.all(old_border_mask[:,-10:] == True)\n", | ||
"\n", | ||
"# Create new array with False everywhere but the outer 10 grid points\n", | ||
"border_mask = np.zeros_like(grid_xy_reduced[0,:,:], dtype=bool)\n", | ||
"border_mask[:10] = True\n", | ||
"border_mask[:,:10] = True\n", | ||
"border_mask[-10:] = True\n", | ||
"border_mask[:,-10:] = True\n", | ||
"np.save('data/meps_example_reduced/static/border_mask.npy', border_mask)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"A few other files also needed to be copied using only the new reduced grid" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Load surface_geopotential.npy, index only values from the reduced grid, and save to new file\n", | ||
"surface_geopotential = np.load('data/meps_example/static/surface_geopotential.npy')\n", | ||
"surface_geopotential_reduced = surface_geopotential[x_slice, y_slice]\n", | ||
"np.save('data/meps_example_reduced/static/surface_geopotential.npy', surface_geopotential_reduced)\n", | ||
"\n", | ||
"# Load pytorch file grid_features.pt\n", | ||
"grid_features = torch.load('data/meps_example/static/grid_features.pt')\n", | ||
"# Index only values from the reduced grid. \n", | ||
"# First reshape from (num_grid_points_total, 4) to (num_grid_points_x, num_grid_points_y, 4), \n", | ||
"# then index, then reshape back to new total number of grid points\n", | ||
"print(grid_features.shape)\n", | ||
"grid_features_new = grid_features.reshape(num_x, num_y, 4)[x_slice,y_slice,:].reshape((-1, 4))\n", | ||
"# Save to new file\n", | ||
"torch.save(grid_features_new, 'data/meps_example_reduced/static/grid_features.pt')\n", | ||
"\n", | ||
"# flux_stats.pt is just a vector of length 2, so the grid shape and variable changes does not change this file\n", | ||
"torch.save(torch.load('data/meps_example/static/flux_stats.pt'), 'data/meps_example_reduced/static/flux_stats.pt')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"\n", | ||
"The number of variables was reduced by truncating the variable list to the first 8." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"num_vars = 8\n", | ||
"\n", | ||
"# Load parameter_weights.npy, truncate to first 8 variables, and save to new file\n", | ||
"parameter_weights = np.load('data/meps_example/static/parameter_weights.npy')\n", | ||
"parameter_weights_reduced = parameter_weights[:num_vars]\n", | ||
"np.save('data/meps_example_reduced/static/parameter_weights.npy', parameter_weights_reduced)\n", | ||
"\n", | ||
"# Do the same for following 4 pytorch files\n", | ||
"for file in ['diff_mean', 'diff_std', 'parameter_mean', 'parameter_std']:\n", | ||
" old_file = torch.load(f'data/meps_example/static/{file}.pt')\n", | ||
" new_file = old_file[:num_vars]\n", | ||
" torch.save(new_file, f'data/meps_example_reduced/static/{file}.pt')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Lastly the files in each of the directories train, test, and val have to be reduced. The folders all have the same structure with files of the following types:\n", | ||
"```\n", | ||
"nwp_YYYYMMDDHH_mbrXXX.npy\n", | ||
"wtr_YYYYMMDDHH.npy\n", | ||
"nwp_toa_downwelling_shortwave_flux_YYYYMMDDHH.npy\n", | ||
"```\n", | ||
"with ```YYYYMMDDHH``` being some date with hours, and ```XXX``` being some 3-digit integer.\n", | ||
"\n", | ||
"The first type of file has x and y in dimensions 1 and 2, and variable index in dimension 3. Dimension 0 is unchanged.\n", | ||
"The second type has has x and y in dimensions 1 and 2. Dimension 0 is unchanged.\n", | ||
"The last type has just x and y as the only 2 dimensions.\n", | ||
"\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 12, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"(65, 268, 238, 18)\n", | ||
"(65, 268, 238)\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print(np.load('data/meps_example/samples/train/nwp_2022040100_mbr000.npy').shape)\n", | ||
"print(np.load('data/meps_example/samples/train/nwp_toa_downwelling_shortwave_flux_2022040112.npy').shape)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The following loop goes through each file in each sample folder and indexes them according to the dimensions given by the file name." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"for sample in ['train', 'test', 'val']:\n", | ||
" files = os.listdir(f'data/meps_example/samples/{sample}')\n", | ||
"\n", | ||
" for f in files:\n", | ||
" data = np.load(f'data/meps_example/samples/{sample}/{f}')\n", | ||
" if 'mbr' in f:\n", | ||
" data = data[:,x_slice,y_slice,:num_vars]\n", | ||
" elif 'wtr' in f:\n", | ||
" data = data[x_slice, y_slice]\n", | ||
" else:\n", | ||
" data = data[:,x_slice,y_slice]\n", | ||
" np.save(f'data/meps_example_reduced/samples/{sample}/{f}', data)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Lastly, the file ```data_config.yaml``` is modified manually by truncating the variable units, long and short names, and setting the new grid shape. Also the unit descriptions containing ```^``` was automatically parsed using latex, and to avoid having to install latex in the GitHub CI/CD pipeline, this was changed to ```**```." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.14" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters