GEFS download fixes #3349
base: develop
#for(j in 1:31){
if(ens_index == 1){
  base_filename2 <- paste0("gec00",".t",cycle,"z.pgrb2a.0p50.f")
  curr_hours <- hours_char[hours <= 384]
Most of my changes in this hunk just simplify the logic, but I'm flagging that I can't figure out why this line was here at all. All ensemble members from a given forecast cycle have the same number of hours available, so my tentative best guess is that someone confused cycle IDs with ensemble IDs. But cycle 00 is the one with more hours, so this would have been wrong anyway.
I actually think there are some ensemble members that are the full 35 days, and others that are shorter, even for cycle 00
Can you point to any examples of the shorter members, or documentation that mentions them? I've been poking around and haven't found any yet, but if this is an intermittent thing it'd be easy for me to have missed it.
curr_time <- lubridate::with_tz(Sys.time(), tzone = "UTC")
curr_date <- lubridate::as_date(curr_time)

noaa_page <- readLines('https://nomads.ncep.noaa.gov/pub/data/nccf/com/gens/prod/')
These pages (here and on deleted line 91 below) load fine in a browser, but inside R and via curl they throw "Stream error in the HTTP/2 framing layer". But all this scraping and extracting can be replaced by knowing that the forecasts are retained for four days -- am I missing cases where data posting isn't reliable enough to assume "today and the three days before that"?
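For concreteness, the "today and the three days before that" assumption could be sketched roughly as follows. This is not the code in the PR: `gefs_available_dates`, `is_date_available`, and the `retention_days` argument are hypothetical names, and the four-day window is the assumption under discussion rather than a guarantee from NOAA.

```r
# Hypothetical helper: list the forecast dates assumed to still be on the server.
# ASSUMPTION: forecasts are retained for four days (today plus the three previous
# days), with "today" evaluated in UTC.
gefs_available_dates <- function(now = Sys.time(), retention_days = 4) {
  today_utc <- as.Date(now, tz = "UTC")
  today_utc - seq(0, retention_days - 1)
}

# Hypothetical check: is the requested forecast date inside the assumed window?
is_date_available <- function(date, now = Sys.time()) {
  as.Date(date) %in% gefs_available_dates(now)
}
```

A window check like this only says the date should exist; it says nothing about whether every cycle or file for that date has actually been posted.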
My experience is there can be delays in when files show up and occasionally gaps, which is why we didn't just assume forecasts are always there.
That's good to know, but note that the existing code only checked whether there exists a date folder that matches the requested forecast date, not whether all forecasts (or even all cycles) are present in it. How often do delays/gaps occur at the whole-day level?
Each individual file download (line 50 above) is wrapped in a tryCatch, so for both old and new code the likeliest outcome when files are missing will be a series of "skipping" messages (possibly followed by a failure when downscaling can't bridge the time gap). Is that acceptable for this PR, or do we need to dig further into pre-download availability checks?
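To illustrate the behavior described above, here is a minimal sketch of a tryCatch-wrapped download loop that skips missing files and reports them. It is not the PR's code: `download_or_skip` and its `fetch` argument are hypothetical, and it assumes a failed download raises an error (in practice `utils::download.file` can also signal problems through warnings or a non-zero return value, which this sketch does not handle).

```r
# Hypothetical sketch: attempt each download, skip failures, report what is missing.
# `fetch` defaults to utils::download.file but can be swapped for a stub when testing.
download_or_skip <- function(urls, dest_files, fetch = utils::download.file) {
  stopifnot(length(urls) == length(dest_files))
  ok <- vapply(seq_along(urls), function(i) {
    tryCatch({
      fetch(urls[i], dest_files[i])
      TRUE
    }, error = function(e) {
      # ASSUMPTION: a missing file surfaces as an R error
      message("skipping ", urls[i], ": ", conditionMessage(e))
      FALSE
    })
  }, logical(1))
  list(downloaded = dest_files[ok], missing = urls[!ok])
}
```

Returning the list of missing URLs would let the caller decide whether the remaining gap is small enough for downscaling to bridge, rather than discovering the problem later.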
hours_char = hours_char,
cycle = cycle,
base_filename1 = base_filename1,
vars = vars,
Just for clarity
##' GEFS weather data isn't always posted immediately, and to compensate, this function adjusts requests made in the last two hours
##' back two hours (approximately the amount of time it takes to post the data) to make sure the most current forecast is used.
I do not think this two-hour adjustment was being done in the current code, and am sure it isn't being done after this PR. Deleted it from the docs.
@@ -223,8 +223,16 @@ downscale_ShortWave_to_half_hrly <- function(df,lat, lon, hr = 0.5){

for (k in 1:nrow(data.hrly)) {
  if(is.na(data.hrly$surface_downwelling_shortwave_flux_in_air[k])){
    SWflux <- as.matrix(subset(df, .data$day == data.hrly$day[k] & .data$hour == data.hrly$hour[k], data.hrly$surface_downwelling_shortwave_flux_in_air[k]))
I'm guessing a bit on what this was supposed to be, because it could not have worked as written -- not only is .data invalid inside base::subset (it's specific to tidy evaluation), but I'm not sure what subset would do with a bare number for its third argument.
My replacement below seems reasonable if the goal of this step is "when SWflux is NA in the hourly data, take it from the matching day and hour of the coarser-scale data", but please speak up if you see a better fix.
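As a standalone illustration of that stated goal (not the replacement actually committed in this PR), a base-R version might look like the sketch below. The function name `fill_missing_swflux` is hypothetical, and it assumes both data frames carry `day`, `hour`, and `surface_downwelling_shortwave_flux_in_air` columns.

```r
# Hypothetical sketch: fill NA hourly shortwave flux from the coarser data frame.
# ASSUMPTION: `data.hrly` and `df` both have `day`, `hour`, and the flux column.
fill_missing_swflux <- function(data.hrly, df) {
  sw <- "surface_downwelling_shortwave_flux_in_air"
  for (k in seq_len(nrow(data.hrly))) {
    if (is.na(data.hrly[[sw]][k])) {
      # Rows of the coarse data that share this day and hour
      match_rows <- which(df$day == data.hrly$day[k] & df$hour == data.hrly$hour[k])
      if (length(match_rows) > 0) {
        data.hrly[[sw]][k] <- df[[sw]][match_rows[1]]
      }
      # With no match, the value is left as NA
    }
  }
  data.hrly
}
```

Plain `df$day` indexing avoids both the `.data` pronoun and `subset()`'s non-standard evaluation, which is safer inside package code.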
lats <- round(lat_list/.5)*.5
lons <- round(lon_list/.5)*.5
Not for this PR, but: GEFS is now also available on a 0.25 degree grid. Would it be of interest to add support for that?
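If that were of interest, the rounding step could be parameterized by grid resolution along these lines. This is only a sketch: `snap_to_grid` and its `resolution` argument are hypothetical, and it covers the coordinate rounding alone, not the file names or URLs a 0.25 degree product would need.

```r
# Hypothetical helper: snap coordinates to the nearest grid point.
# resolution = 0.5 reproduces the existing round(x/.5)*.5 behavior.
snap_to_grid <- function(x, resolution = 0.5) {
  round(x / resolution) * resolution
}

lat_list <- c(45.3, -89.1)
lats_half    <- snap_to_grid(lat_list)         # 0.5 degree grid
lats_quarter <- snap_to_grid(lat_list, 0.25)   # 0.25 degree grid
```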
Description
Getting download.NOAA_GEFS working again, most notably by not scraping two different webpages to extract dates that we can predict by knowing the forecasts are retained for four days. As best I can tell, there have been some changes in the NOAA server configuration and the format of the grib files since this function last worked. See comments inline for my reasoning on each change, but I'd like a critical eye from someone more familiar with GEFS.