R Script for Validation Plotting #254

Open: 4 commits to be merged into base branch main

Collaborator

@trobacker trobacker commented Jan 7, 2025

A prototype R script and sample CladeTime output for plotting model output (just UMass-HMLR for now) alongside the data available at the reference date (training) and the data validated through CladeTime at a later date. Recall the rule of thumb for validation: 90 days later.

Caution: Don't merge until we've had time to look over this!

This R script file is under significant construction but available for reference now.

src/plot_validation_data.R Outdated Show resolved Hide resolved
trobacker and others added 2 commits January 7, 2025 17:53
#######

# Create a PDF file to save the plots
pdf("~/Downloads/plot_validation_by_location.pdf")
Collaborator

This line only works if the home directory has a Downloads folder, which isn't a given. It might be good to let the user choose where the PDF is created.

Member

Generally agree. If this script is intended for use by anyone other than @trobacker, it would be good to declare this as a variable at the top and provide instructions for people to change it depending on their machine.

I've added other comments on static file names that might need to be examined.

Collaborator Author

@trobacker trobacker Jan 10, 2025

Thanks for the comments! Indeed, this script still has a lot of changes to undergo. I shared it so that folks could see how I plotted training data and validated data with CladeTime data for the use case of the UMass-HMLR model. This file will absolutely need more work for improvement and general use. Feel free to make commits.
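
The suggestion in this thread, declaring the output location as a variable at the top of the script, might look roughly like the sketch below. The names `output_dir` and `output_file` are hypothetical and not part of the PR; `tempdir()` is used only so the example runs on any machine, and the `plot()` call is a stand-in for the real plots.

```r
# Hypothetical user setting (not in the PR): where the PDF should be written.
# tempdir() always exists; change this to any directory on your machine.
output_dir <- tempdir()
output_file <- file.path(output_dir, "plot_validation_by_location.pdf")

# Create the directory if it is missing instead of assuming it exists
if (!dir.exists(output_dir)) {
  dir.create(output_dir, recursive = TRUE)
}

# Open the PDF device, draw, then close the device so the file is written
pdf(output_file)
plot(1:10, main = "placeholder plot")  # stand-in for the script's ggplot output
dev.off()
```

Building the path with `file.path()` keeps the directory choice in one place, so a user without a Downloads folder only has to edit a single line.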

# Note: this file is generated through CladeTime and is not made here; it must be
# created ahead of time and the path changed here
hub_path <- here::here()
df_validation <- read.csv(here::here("auxiliary-data/example-files/summarized_clades_asof_2024-10-28_on_2025-01-07.csv")) |>
Member

@zkamvar zkamvar Jan 10, 2025

Is this file name going to change at all?

Collaborator Author

Yes, it could be any format at this time. They are pre-made files from CladeTime right now to generate validation data.
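
Since the validation file is pre-made and its name may change, one option is to pass the path in and fail early with a clear message when the file is missing. The sketch below is only an illustration: `read_validation_data()` is a hypothetical helper, and the temporary CSV with made-up columns stands in for a real CladeTime file.

```r
# Hypothetical helper (not in the PR): read a pre-made CladeTime validation
# file from a user-supplied path, with an explicit error if it is missing.
read_validation_data <- function(validation_path) {
  if (!file.exists(validation_path)) {
    stop("Validation file not found: ", validation_path,
         "\nGenerate it with CladeTime first, then update the path.",
         call. = FALSE)
  }
  read.csv(validation_path)
}

# Stand-in file so the example runs; the columns here are placeholders
example_path <- tempfile(fileext = ".csv")
write.csv(data.frame(location = "MA", count = 1L), example_path,
          row.names = FALSE)

df_validation <- read_validation_data(example_path)
```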

# Resolved with tidyr::complete()

# Meta data for getting data available on reference date
reference_date <- "2024-10-28" ## REFERENCE DATE
Member

Is this reference date always going to be the same?

Collaborator Author

No, it will be changed in the long run. I need to create functions in this script for general use.
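
As a rough sketch of what such a function could look like, the reference date can become an argument, with the validation date derived from the 90-day rule of thumb mentioned in the PR description. `get_validation_date()` and `lag_days` are hypothetical names chosen for illustration.

```r
# Hypothetical helper (not in the PR): derive the validation date from a
# reference date using the 90-day rule of thumb as the default lag.
get_validation_date <- function(reference_date, lag_days = 90) {
  as.Date(reference_date) + lag_days
}

reference_date <- "2024-10-28"
validation_date <- get_validation_date(reference_date)
format(validation_date)  # "2025-01-26"
```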

select(-abbreviation)

# Model output, just UMASS HMLR for now
df_model_output <- arrow::read_parquet(file.path(hub_path, "model-output/UMass-HMLR/2024-10-30-UMass-HMLR.parquet"))
Member

Is this file name always going to stay the same?

Collaborator Author

Not necessarily. In general, we will apply it to any model_output file (focusing on UMass HMLR for now).
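
One way to avoid the hard-coded file name is to build the path from its parts. The sketch below assumes the `model-output/<model>/<date>-<model>.parquet` layout seen in the snippet above; `model_output_path()`, `model_id`, and `file_date` are hypothetical names, and the example only constructs the path without reading the file.

```r
# Hypothetical helper (not in the PR): build the model output path from the
# hub root, the model id, and the date prefix used in the file name.
model_output_path <- function(hub_path, model_id, file_date) {
  file_name <- paste0(file_date, "-", model_id, ".parquet")
  file.path(hub_path, "model-output", model_id, file_name)
}

path <- model_output_path(".", "UMass-HMLR", "2024-10-30")
path  # "./model-output/UMass-HMLR/2024-10-30-UMass-HMLR.parquet"
```

The resulting path could then be passed to `arrow::read_parquet()` in place of the literal string.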

colnames(targets_this_location)[3] <- "clade"

p <- ggplot(df_out_this_location, aes(x = target_date, y = value)) +
ggtitle(paste0("Daily Observed and Predicted Proportions \nfor model output in ", this_location, " - 2024-10-30-UMass-HMLR")) +
Member

Is this date static?

Collaborator Author

No, it should generally be considered variable; it will come from the model_output files.
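
A minimal sketch of taking the title label from the model output file name rather than a literal string follows. The variable names `model_output_file`, `model_label`, and `plot_title` are hypothetical, and the `this_location` value is a placeholder; in the script it comes from the per-location loop.

```r
# Path as used in the script; in general this would be a variable or argument
model_output_file <- "model-output/UMass-HMLR/2024-10-30-UMass-HMLR.parquet"

# Drop the directory and the .parquet extension to get "2024-10-30-UMass-HMLR"
model_label <- tools::file_path_sans_ext(basename(model_output_file))

this_location <- "MA"  # placeholder value for illustration

plot_title <- paste0(
  "Daily Observed and Predicted Proportions \nfor model output in ",
  this_location, " - ", model_label
)
```

`plot_title` can then be passed to `ggtitle()` so the title follows whichever model output file is being plotted.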
