R Script for Validation Plotting #254
Thank you for the suggestions! Co-authored-by: Zhian N. Kamvar <[email protected]>
src/plot_validation_data.R
Outdated
#######

# Create a PDF file to save the plots
pdf("~/Downloads/plot_validation_by_location.pdf")
This line only works if the home directory has a Downloads folder, which isn't a given. It might be good to let the user choose where the PDF is created.
Generally agree. If this script is intended for use by anyone other than @trobacker, it would be good to declare this as a variable at the top and provide instructions for people to change it depending on their machine.
I've added other comments on static file names that might need to be examined.
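One way to act on this suggestion, sketched with base R only. The names `output_dir` and `pdf_path` are hypothetical and do not appear in the script; `tempdir()` is only a placeholder that should be replaced with a directory that exists on the user's machine.

```r
# Hypothetical setting declared at the top of the script:
# change this to a directory that exists on your machine.
output_dir <- tempdir()

# Create the directory if it is missing so that pdf() does not fail.
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

# Build the PDF path from the setting instead of hard-coding ~/Downloads.
pdf_path <- file.path(output_dir, "plot_validation_by_location.pdf")

pdf(pdf_path)
plot(1:10)  # placeholder standing in for the real plots
dev.off()
```

Keeping the directory in one variable means the other static file names flagged in this review could be built from the same setting.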
Thanks for the comments! Indeed, this script will have a lot of changes to undergo. I shared it so that folk could see how I plotted training data and validated data with CladeTime data for the use case of the UMass-HMLR model. There will absolutely be some work on this file for improvement and general use. Feel free to make commits.
# Note: this file is generated through CladeTime and is not made here; it must be
# created ahead of time and the path changed here
hub_path <- here::here()
df_validation <- read.csv(here::here("auxiliary-data/example-files/summarized_clades_asof_2024-10-28_on_2025-01-07.csv")) |>
Is this file name going to change at all?
Yes, it could be any format at this time. They are pre-made files from CladeTime right now to generate validation data.
# Resolved with tidyr::complete()

# Meta data for getting data available on reference date
reference_date <- "2024-10-28" ## REFERENCE DATE
Is this reference date always going to be the same?
No, it will be changed in the long run. I need to create functions in this script for general use.
select(-abbreviation)

# Model output, just UMASS HMLR for now
df_model_output <- arrow::read_parquet(file.path(hub_path, "model-output/UMass-HMLR/2024-10-30-UMass-HMLR.parquet"))
Is this file name always going to stay the same?
Not necessarily, in general we will apply it to any model_output file (focusing on UMass HMLR for now).
colnames(targets_this_location)[3] <- "clade"

p <- ggplot(df_out_this_location, aes(x = target_date, y = value)) +
  ggtitle(paste0("Daily Observed and Predicted Proportions \nfor model output in ", this_location, " - 2024-10-30-UMass-HMLR")) +
Is this date static?
No, it should generally be considered variable and it will be from the model_output files.
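A minimal sketch of how the title could take its label from the model-output file name instead of a hard-coded string. The variable names `model_output_file`, `model_label`, `model_date`, and `plot_title` are hypothetical, and the sketch assumes the file name starts with a `YYYY-MM-DD` date, as the one file in this PR does.

```r
# Hypothetical input: in the script this would be whichever model-output
# file is being plotted.
model_output_file <- "model-output/UMass-HMLR/2024-10-30-UMass-HMLR.parquet"

# Drop the directory and the extension to get the label used in the title.
model_label <- tools::file_path_sans_ext(basename(model_output_file))

# Assumption: the first 10 characters of the file name are a YYYY-MM-DD date.
model_date <- as.Date(substr(model_label, 1, 10))

this_location <- "MA"  # example value only

# Same title text as in the script, with the label filled in from the file name.
plot_title <- paste0(
  "Daily Observed and Predicted Proportions \nfor model output in ",
  this_location, " - ", model_label
)
```

The resulting `plot_title` could then be passed to `ggtitle()` in place of the literal string.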
A prototype R script and sample CladeTime output for plotting model output (just UMass-HMLR for now) alongside the data available at the reference date (training) and the data validated through CladeTime at a later date. Recall the rule of thumb for validation: 90 days later.
Caution: Don't merge until we've had time to look over this!
This R script file is under significant construction but available for reference now.
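For reference, the 90-day rule of thumb mentioned in the description can be written as simple date arithmetic in base R. This is a sketch only: `validation_lag_days` and `validation_date` are hypothetical names, and the script itself does not compute this value.

```r
# Reference date as used in the script.
reference_date <- "2024-10-28"

# Hypothetical setting for the rule-of-thumb lag, in days.
validation_lag_days <- 90

# Adding an integer to a Date shifts it by that many days.
validation_date <- as.Date(reference_date) + validation_lag_days

format(validation_date)  # "2025-01-26"
```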