Add residential model parity updates #83

Merged: 7 commits, Jan 10, 2025
Changes from 5 commits
24 changes: 12 additions & 12 deletions dvc.lock
@@ -5,17 +5,17 @@ stages:
deps:
- path: pipeline/00-ingest.R
hash: md5
md5: 29292ee2bef109914c423c9259aa8879
size: 22847
md5: 816b28ff1c68d17a9082d0dc839a85c0
size: 22844
Comment on lines +8 to +9
Contributor

[Nitpick, non-blocking] Same as the res model, it'd be nice if we could update the input hashes for downstream stages too to avoid a weird diff next time someone runs the model.

Member Author

Updated in 6af9b01!
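
The stale-hash concern raised in this thread can be checked mechanically. The following is a minimal sketch, assuming that a `hash: md5` entry in `dvc.lock` records the plain MD5 digest of the file's contents. The helper `is_dep_stale()` and the in-memory `lock_entry` list are hypothetical stand-ins for a parsed `dvc.lock` dependency entry; neither is part of DVC or of this repository.

```r
# Hypothetical helper: TRUE when the file on disk no longer matches
# the md5 recorded for it in a (parsed) dvc.lock dependency entry.
is_dep_stale <- function(lock_entry) {
  current <- unname(tools::md5sum(lock_entry$path))
  !identical(current, lock_entry$md5)
}

# Demo with a temporary file standing in for a pipeline script.
script <- tempfile(fileext = ".R")
writeLines("x <- 1", script)

# Record the hash the way a lock entry would after the last run.
lock_entry <- list(
  path = script,
  hash = "md5",
  md5 = unname(tools::md5sum(script))
)
is_dep_stale(lock_entry) # FALSE: file unchanged since the hash was recorded

# Editing the file without refreshing the lock entry makes it stale.
writeLines("x <- 2", script)
is_dep_stale(lock_entry) # TRUE
```

In day-to-day use `dvc status` reports changed dependencies directly; the sketch only illustrates why a lock file whose downstream hashes were not refreshed produces an unexpected diff on the next run.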

params:
params.yaml:
assessment:
year: '2024'
date: '2024-01-01'
triad: city
triad: north
group: condo
data_year: '2023'
working_year: '2024'
working_year: '2025'
input:
min_sale_year: '2015'
max_sale_year: '2023'
@@ -31,24 +31,24 @@ stages:
outs:
- path: input/assessment_data.parquet
hash: md5
md5: b49601e8a812659026c7358d84f5e16b
size: 85702121
md5: 1acef7f3c22353411bc15a03d7493164
size: 85643154
- path: input/char_data.parquet
hash: md5
md5: d1a30dd51db2985be57548c1498f2533
size: 160972976
md5: 5be564143ebae5a67e8f44eb93d839dd
size: 159013932
- path: input/condo_strata_data.parquet
hash: md5
md5: 8fe86e0af29431ecb021f101f79789ee
size: 40481
md5: b5a85462a7f4de94916b228be45ccd75
size: 40543
- path: input/land_nbhd_rate_data.parquet
hash: md5
md5: f3ec9627322bd271bf2957b7388aaa34
size: 3873
- path: input/training_data.parquet
hash: md5
md5: 9b2510ac885e4fc77928661a012d8821
size: 79812730
md5: e818848026f6dc6e3d6af9b8d6b34641
size: 79923460
train:
cmd: Rscript pipeline/01-train.R
deps:
21 changes: 14 additions & 7 deletions params.yaml
@@ -168,28 +168,32 @@ model:
- "prox_num_pin_in_half_mile"
- "prox_num_bus_stop_in_half_mile"
- "prox_num_foreclosure_per_1000_pin_past_5_years"
- "prox_num_school_in_half_mile"
- "prox_airport_dnl_total"
- "prox_nearest_bike_trail_dist_ft"
- "prox_nearest_cemetery_dist_ft"
- "prox_nearest_cta_route_dist_ft"
- "prox_nearest_cta_stop_dist_ft"
- "prox_nearest_hospital_dist_ft"
- "prox_lake_michigan_dist_ft"
- "prox_nearest_major_road_dist_ft"
- "prox_nearest_metra_route_dist_ft"
- "prox_nearest_metra_stop_dist_ft"
- "prox_nearest_park_dist_ft"
- "prox_nearest_railroad_dist_ft"
- "prox_nearest_secondary_road_dist_ft"
- "prox_nearest_university_dist_ft"
- "prox_nearest_vacant_land_dist_ft"
- "prox_nearest_water_dist_ft"
- "prox_nearest_golf_course_dist_ft"
- "prox_nearest_road_highway_dist_ft"
- "prox_nearest_road_arterial_dist_ft"
- "prox_nearest_road_collector_dist_ft"
- "prox_nearest_road_highway_daily_traffic"
- "prox_nearest_road_arterial_daily_traffic"
- "prox_nearest_road_collector_daily_traffic"
- "prox_nearest_new_construction_dist_ft"
- "prox_nearest_stadium_dist_ft"
- "acs5_percent_age_children"
- "acs5_percent_age_senior"
- "acs5_median_age_total"
- "acs5_percent_mobility_moved_from_other_state"
- "acs5_percent_household_family_married"
- "acs5_percent_household_nonfamily_alone"
- "acs5_percent_education_high_school"
@@ -203,11 +207,8 @@ model:
- "acs5_median_household_total_occupied_year_built"
- "acs5_median_household_renter_occupied_gross_rent"
- "acs5_percent_household_owner_occupied"
- "acs5_percent_household_total_occupied_w_sel_cond"
- "acs5_percent_mobility_moved_in_county"
- "other_tax_bill_rate"
- "ccao_is_active_exe_homeowner"
- "ccao_is_corner_lot"
- "ccao_n_years_exe_homeowner"
- "time_sale_year"
- "time_sale_day"
@@ -217,6 +218,12 @@ model:
- "time_sale_day_of_month"
- "time_sale_day_of_week"
- "time_sale_post_covid"
- "shp_parcel_centroid_dist_ft_sd"
- "shp_parcel_edge_len_ft_sd"
- "shp_parcel_interior_angle_sd"
- "shp_parcel_mrr_area_ratio"
- "shp_parcel_mrr_side_ratio"
- "shp_parcel_num_vertices"
- "meta_strata_1"
- "meta_strata_2"

2 changes: 2 additions & 0 deletions reports/_setup.qmd
@@ -1,4 +1,6 @@
---
execute:
echo: FALSE
params:
run_id: "2024-02-08-dreamy-sam"
year: "2024"
129 changes: 129 additions & 0 deletions reports/performance/_model.qmd
@@ -1036,3 +1036,132 @@ model_big_misses_assessment %>%
```

:::

## Variance Over Time

These plots show trends in the variance of sale price and estimated FMV. Ideally, the variance of the model's estimates should track the variance of the true values (sales) over time.

::: {.panel-tabset}

```{r _model_organize_variance_data}
training_data_monthly <- training_data_pred %>%
  filter(!sv_is_outlier) %>%
  mutate(
    meta_sale_date = as.Date(meta_sale_date),
    year = year(meta_sale_date),
    month = month(meta_sale_date),
    difference = (pred_card_initial_fmv - meta_sale_price),
    squared_difference = difference^2
  ) %>%
  group_by(year, month) %>%
  summarize(
    total_sales = sum(meta_sale_price),
    total_fmv = sum(pred_card_initial_fmv),
    variance_sale = var(meta_sale_price),
    variance_fmv = var(pred_card_initial_fmv),
    mean_difference = mean(difference),
    sse = sum(squared_difference),
    n = n(),
    .groups = "drop"
  ) %>%
  mutate(
    variance_diff = variance_fmv - variance_sale,
    date = make_date(year, month),
    variance_ratio = variance_fmv / variance_sale,
    percent_sales = n / sum(n) * 100,
    percent_sse = sse / sum(sse) * 100
  )
training_data_monthly_long <- training_data_monthly %>%
  pivot_longer(
    cols = c(
      variance_sale, variance_fmv, percent_sales,
      percent_sse, variance_diff
    ),
    names_to = "Metric",
    values_to = "Value"
  )
```

### Variance Ratio (FMV / Sale Price)

```{r _model_variance_ratio_chart}
ggplot(training_data_monthly, aes(x = date, y = variance_ratio)) +
  geom_line() +
  geom_point() +
  labs(
    x = "Date",
    y = "Variance Ratio"
  ) +
  theme_minimal()
```

### Total FMV and Sale Price Variance

```{r _model_overall_variance_chart}
ggplot(
  training_data_monthly_long %>% filter(Metric %in%
    c("variance_sale", "variance_fmv")),
  aes(x = date, y = Value, color = Metric)
) +
  geom_line() +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(
    x = "Month",
    y = "Variance",
    color = "Metric"
  ) +
  scale_color_discrete(
    labels = c(
      "variance_sale" = "Variance of Sale Price",
      "variance_fmv" = "Variance of FMV"
    )
  ) +
  scale_y_continuous(labels = function(x) {
    scales::label_scientific()(x) %>%
      paste0("$", .)
  }) +
  theme_minimal()
```

### Variance Difference (Sale Price - FMV)

```{r _model_variance_diff_chart}
ggplot(training_data_monthly, aes(x = date, y = variance_sale - variance_fmv)) +
  geom_line() +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(
    x = "Date",
    y = "Difference in Variance"
  ) +
  scale_y_continuous(labels = function(x) {
    scales::label_scientific()(x) %>%
      paste0("$", .)
  }) +
  theme_minimal()
```

### Distribution of Sales and SSE

```{r _model_distribution_sales_sse_chart}
ggplot(training_data_monthly, aes(x = date)) +
  geom_bar(aes(y = percent_sales, fill = "Sales"),
    stat = "identity", position = "identity", alpha = 0.5
  ) +
  geom_bar(aes(y = percent_sse, fill = "Sum of Square Errors"),
    stat = "identity", position = "identity", alpha = 0.5
  ) +
  scale_fill_manual(
    values = c("Sales" = "#00BFC4", "Sum of Square Errors" = "#F8766D")
  ) +
  labs(
    x = "Date",
    y = "Normalized Scale",
    fill = ""
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

:::
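
To make the monthly metrics in the "Variance Over Time" section concrete, here is a minimal sketch on toy data. It uses base R only so that it runs without the report's setup chunk. The `toy` data frame and its values are hypothetical; the column names follow the diff above, but the aggregation uses `tapply()` in place of the report's dplyr pipeline.

```r
# Hypothetical toy data: six sales across two months.
toy <- data.frame(
  meta_sale_date = as.Date(c(
    "2023-01-05", "2023-01-20", "2023-01-28",
    "2023-02-03", "2023-02-14", "2023-02-25"
  )),
  meta_sale_price = c(100000, 200000, 300000, 150000, 250000, 350000),
  pred_card_initial_fmv = c(150000, 200000, 250000, 150000, 250000, 350000)
)

# Month key and squared error per sale.
toy$month <- format(toy$meta_sale_date, "%Y-%m")
toy$squared_difference <-
  (toy$pred_card_initial_fmv - toy$meta_sale_price)^2

# Per-month aggregates, analogous to the summarize() step in the report.
monthly <- data.frame(
  month = sort(unique(toy$month)),
  variance_sale = as.numeric(tapply(toy$meta_sale_price, toy$month, var)),
  variance_fmv = as.numeric(tapply(toy$pred_card_initial_fmv, toy$month, var)),
  sse = as.numeric(tapply(toy$squared_difference, toy$month, sum)),
  n = as.numeric(tapply(toy$month, toy$month, length))
)

# Derived metrics, analogous to the final mutate() step.
monthly$variance_ratio <- monthly$variance_fmv / monthly$variance_sale
monthly$percent_sales <- monthly$n / sum(monthly$n) * 100
monthly$percent_sse <- monthly$sse / sum(monthly$sse) * 100

print(monthly)
```

In this toy example the January estimates are compressed toward the middle of the price range, so January's variance ratio falls below 1 and that month accounts for all of the squared error despite holding only half of the sales. February's estimates match sales exactly, giving a ratio of 1.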