Add plots of bias and variance over time (#302)
* Initial push

* Include Variance over Time

* remove unnecessary values

* Rename data

* Fix filtering

* pre commit

* Remove stats changes

* Add loess

* precommit

* Remove Comparison

* Update _setup.qmd

* Update _model.qmd

* Update _model.qmd

* add names

* Remove $

* Remove absolute

* Update reports/performance/_model.qmd

Co-authored-by: Dan Snow <[email protected]>

* Update reports/performance/_model.qmd

Co-authored-by: Dan Snow <[email protected]>

* Update reports/performance/_model.qmd

Co-authored-by: Dan Snow <[email protected]>

* Update reports/performance/_model.qmd

Co-authored-by: Dan Snow <[email protected]>

* wrapup

* Revert run_id change

* Remove extraneous space

---------

Co-authored-by: Dan Snow <[email protected]>
3 people authored Jan 2, 2025
1 parent 7344ec9 commit cef253f
Showing 1 changed file with 129 additions and 0 deletions.
129 changes: 129 additions & 0 deletions reports/performance/_model.qmd
@@ -1258,3 +1258,132 @@ model_big_misses_assessment %>%
```

:::

## Variance Over Time

These plots show trends over time in the variance of sale prices and of estimated FMV. Ideally, within any given period, the model's estimates should have the same variance as the true values (sales).
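
As a minimal sketch of the comparison described above, the snippet below computes a per-month variance ratio (estimate variance divided by sale variance) in base R. The data frame `toy` and its columns `month`, `sale_price`, and `est_fmv` are hypothetical stand-ins invented for illustration; they are not the report's actual data or column names.

```r
# Hypothetical toy data: four sales per month with made-up model estimates.
toy <- data.frame(
  month = rep(c("2024-01", "2024-02"), each = 4),
  sale_price = c(100, 200, 300, 400, 150, 250, 350, 450),
  est_fmv = c(150, 220, 280, 350, 150, 250, 350, 450)
)

# Sample variance of each series within each month.
var_sale <- tapply(toy$sale_price, toy$month, var)
var_fmv <- tapply(toy$est_fmv, toy$month, var)

# A ratio near 1 means estimates are as dispersed as sales in that month.
# A ratio below 1 means the estimates are compressed relative to sales.
variance_ratio <- var_fmv / var_sale

# 2024-01: 21800 / 50000 = 0.436 (estimates less dispersed than sales)
# 2024-02: estimates equal sales, so the ratio is 1
print(variance_ratio)
```

In this toy case the January estimates sit closer to their mean than the sales do, which is the kind of pattern the variance ratio tab is meant to surface.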

::: {.panel-tabset}

```{r _model_organize_variance_data}
training_data_monthly <- training_data_pred %>%
  filter(!ind_pin_is_multicard, !sv_is_outlier) %>%
  mutate(
    meta_sale_date = as.Date(meta_sale_date),
    year = year(meta_sale_date),
    month = month(meta_sale_date),
    difference = (pred_card_initial_fmv - meta_sale_price),
    squared_difference = difference^2
  ) %>%
  group_by(year, month) %>%
  summarize(
    total_sales = sum(meta_sale_price),
    total_fmv = sum(pred_card_initial_fmv),
    variance_sale = var(meta_sale_price),
    variance_fmv = var(pred_card_initial_fmv),
    mean_difference = mean(difference),
    sse = sum(squared_difference),
    n = n(),
    .groups = "drop"
  ) %>%
  mutate(
    variance_diff = variance_fmv - variance_sale,
    date = make_date(year, month),
    variance_ratio = variance_fmv / variance_sale,
    percent_sales = n / sum(n) * 100,
    percent_sse = sse / sum(sse) * 100
  )

training_data_monthly_long <- training_data_monthly %>%
  pivot_longer(
    cols = c(
      variance_sale, variance_fmv, percent_sales,
      percent_sse, variance_diff
    ),
    names_to = "Metric",
    values_to = "Value"
  )
```

### Variance Ratio (FMV / Sale Price)

```{r _model_variance_ratio_chart}
ggplot(training_data_monthly, aes(x = date, y = variance_ratio)) +
  geom_line() +
  geom_point() +
  labs(
    x = "Date",
    y = "Variance Ratio"
  ) +
  theme_minimal()
```

### Total FMV and Sale Price Variance

```{r _model_overall_variance_chart}
ggplot(
  training_data_monthly_long %>% filter(Metric %in%
    c("variance_sale", "variance_fmv")),
  aes(x = date, y = Value, color = Metric)
) +
  geom_line() +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(
    x = "Month",
    y = "Variance",
    color = "Metric"
  ) +
  scale_color_discrete(
    labels = c(
      "variance_sale" = "Variance of Sale Price",
      "variance_fmv" = "Variance of FMV"
    )
  ) +
  scale_y_continuous(labels = function(x) {
    scales::label_scientific()(x) %>%
      paste0("$", .)
  }) +
  theme_minimal()
```

### Variance Difference (Sale Price - FMV)

```{r _model_variance_diff_chart}
ggplot(training_data_monthly, aes(x = date, y = variance_sale - variance_fmv)) +
  geom_line() +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(
    x = "Date",
    y = "Difference in Variance"
  ) +
  scale_y_continuous(labels = function(x) {
    scales::label_scientific()(x) %>%
      paste0("$", .)
  }) +
  theme_minimal()
```

### Distribution of Sales and SSE

```{r _model_distribution_sales_sse_chart}
ggplot(training_data_monthly, aes(x = date)) +
  geom_bar(aes(y = percent_sales, fill = "Sales"),
    stat = "identity", position = "identity", alpha = 0.5
  ) +
  geom_bar(aes(y = percent_sse, fill = "Sum of Square Errors"),
    stat = "identity", position = "identity", alpha = 0.5
  ) +
  scale_fill_manual(
    values = c("Sales" = "#00BFC4", "Sum of Square Errors" = "#F8766D")
  ) +
  labs(
    x = "Date",
    y = "Normalized Scale",
    fill = ""
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```
:::
