[Bug]: CDAT migration: potential performance or memory bottlenecks handling time-series data files #892

Open · chengzhuzhang opened this issue on Nov 8, 2024 · 4 comments · May be fixed by #907

chengzhuzhang (Contributor) commented on Nov 8, 2024

What happened?

This happened again in a test (climo) vs. obs (time-series) run. When time series are used as input, the run either hangs or is killed before finishing. I suspect there is a performance/memory bottleneck in reading the time-series files and computing climatology on the fly (sketched below); more debugging is needed.

Running the same variable on the main branch takes less than 1 minute of wall time.
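For context, "computing climatology on the fly" means averaging the sliced reference time series at run time, roughly as in the sketch below (the path, variable name, and unweighted mean are illustrative assumptions, not the actual e3sm_diags implementation):

import xarray as xr

# Open the reference time-series files lazily and slice to the requested years.
ds = xr.open_mfdataset("/path/to/time-series/ua_*.nc")
ds = ds.sel(time=slice("1996-01-01", "1996-12-31"))

# Annual (ANN) climatology as a simple unweighted time mean, for brevity;
# seasonal/monthly climatologies and time weighting are omitted here.
ann_climo = ds["ua"].mean(dim="time")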

What did you expect to happen? Are there any possible answers you came across?

No response

Minimal Complete Verifiable Example (MVCE)

Command line:
python run_script.py -d U.cfg

# run_script.py

import os
from e3sm_diags.parameter.core_parameter import CoreParameter
from e3sm_diags.run import runner

param = CoreParameter()


param.reference_data_path = '/global/cfs/cdirs/e3sm/diagnostics/observations/Atm/time-series'
param.test_data_path = '/global/cfs/cdirs/e3sm/chengzhu/eamxx/post/data/rgr'
param.test_name = 'eamxx_decadal'
param.seasons = ["ANN"]
#param.save_netcdf = True

param.ref_timeseries_input = True
# Years to slice the ref data, base this off the years in the filenames.
param.ref_start_yr = "1996"
param.ref_end_yr = "1996"

prefix = '/global/cfs/cdirs/e3sm/www/zhang40/tests/eamxx'
param.results_dir = os.path.join(prefix, 'eamxx_decadal_1996_1107_edv3')

runner.sets_to_run = [
    "lat_lon",
    "zonal_mean_xy",
    "zonal_mean_2d",
    "zonal_mean_2d_stratosphere",
    "polar",
    "cosp_histogram",
    "meridional_mean_2d",
    "annual_cycle_zonal_mean",
]

runner.run_diags([param])



# U.cfg
[#]
sets = ["lat_lon"]
case_id = "ERA5"
variables = ["U"]
ref_name = "ERA5"
reference_name = "ERA5 Reanalysis"
seasons = ["ANN", "01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12", "DJF", "MAM", "JJA", "SON"]
plevs = [850.0]
test_colormap = "PiYG_r"
reference_colormap = "PiYG_r"
contour_levels = [-20, -15, -10, -8, -5, -3, -1, 1, 3, 5, 8, 10, 15, 20]
diff_levels = [-8, -6, -5, -4, -3, -2, -1, 1, 2, 3, 4, 5, 6, 8]
regrid_method = "bilinear"

Relevant log output

No response

Anything else we need to know?

No response

Environment

cdat-migration-fy24 branch

chengzhuzhang added the bug (Bug fix, will increment patch version) label on Nov 8, 2024
tomvothecoder (Collaborator) commented on Nov 8, 2024

This time-series variable (U) is derived from other variables, and I noticed that one of the variables used in the derivation is ua. That dataset is 76 GB, which on my end causes a hang and then a crash when calling ds.load().

Did you confirm that the CDAT code handles datasets this large? I don't recall other diags using datasets of this size, but I might be wrong.
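For illustration, a minimal sketch of the pattern that triggers the hang (the path is hypothetical, not the actual e3sm_diags code): the full multi-file ua time series is loaded eagerly before any time subsetting.

import xarray as xr

# Problematic order of operations: load the entire multi-file time series,
# then subset. The full ~76 GB of ua is pulled into memory at once.
ds = xr.open_mfdataset("/path/to/time-series/ua_*.nc")
ds.load()                                                # hangs or runs out of memory
ds_sub = ds.sel(time=slice("1996-01-01", "1996-12-31"))  # subsetting happens too late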

tomvothecoder (Collaborator) commented:

RE: My comment above.

I think the ds.load() call is loading the entire time series into memory rather than just the requested time slice. I'll try switching the order of operations to slice on time before calling ds.load(), which should cut down on the memory requirements.
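A minimal sketch of the reordered logic, assuming an xarray/dask-backed dataset (the path and years are illustrative, not the actual implementation):

import xarray as xr

# Open the multi-file time series lazily (dask-backed); nothing is read yet.
ds = xr.open_mfdataset("/path/to/time-series/ua_*.nc")

# Slice on time first, while the data are still lazy, so only the requested
# 1996 subset is ever read from disk.
ds_sub = ds.sel(time=slice("1996-01-01", "1996-12-31"))

# Load just the subset into memory instead of the full ~76 GB series.
ds_sub.load()

Subsetting before loading keeps the peak memory footprint proportional to the requested years rather than to the full file set.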

tomvothecoder mentioned this issue on Nov 8, 2024
chengzhuzhang (Contributor, Author) commented:

Closed with #892

tomvothecoder (Collaborator) commented:

Reopening because we found this is happening again in #880.

tomvothecoder reopened this on Dec 16, 2024
tomvothecoder linked a pull request on Dec 17, 2024 that will close this issue