Remove price binning from comps and filter targets by tri instead #301

Merged

Conversation

@jeancochrane (Contributor) commented Dec 24, 2024:

This PR removes price binning from the comps algorithm and replaces it with a filter that only generates comps for the tri that the model is targeting. We also reduce the number of comps that we calculate for each property from 20 to 5, since our internal analysis suggests that comps beyond the fifth are less informative.

With these changes, the comps pipeline runs in 48 hours for the City tri. See 2024-12-29-compassionate-kyra for the output of the test run.

We also use this PR to merge some speed improvements that @dfsnow added in #314.
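
To make the new flow concrete, here is a minimal sketch of the idea, not the pipeline's actual code: the `triad_code` column, the `select_comps` helper, and the similarity-score matrix are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

NUM_COMPS = 5  # reduced from 20 in this PR


def select_comps(targets: pd.DataFrame, scores: np.ndarray, tri: str) -> np.ndarray:
    """Keep only targets in the given tri and return the indexes of their
    top NUM_COMPS comparables, highest similarity score first."""
    in_tri = (targets["triad_code"] == tri).to_numpy()
    tri_scores = scores[in_tri]
    # Sort each row descending by score and keep the first NUM_COMPS columns
    return np.argsort(-tri_scores, axis=1)[:, :NUM_COMPS]


# Toy usage: three parcels, two of which are in tri "1", scored against
# eight candidate comparison parcels
targets = pd.DataFrame({"pin": ["a", "b", "c"], "triad_code": ["1", "1", "2"]})
scores = np.random.default_rng(0).random((3, 8))
comp_idx = select_comps(targets, scores, tri="1")  # shape (2, 5)
```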

…eed up comps feedback cycle"

This reverts commit 5c7098a.

# Translate comp indexes to PINs and document numbers
comps[[1]] <- comps[[1]] %>%
  mutate(
    # Correct for the fact that Python is 0-indexed by incrementing the
    # comp indexes by 1, and cast null indicators (-1) to null
    across(everything(), ~ ifelse(. == -1, NA, . + 1)),

@jeancochrane (Contributor Author) commented:

The previous version of this code wasn't properly handling the null indicator for comps (-1). I think we never ran into this problem because we've never had a null comp in production, but if you reduce the number of trees dramatically for testing purposes, you can run into models that can't find a comp for certain parcels.
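
As a hypothetical illustration of the failure mode (a numpy sketch, not the pipeline's code; the actual fix is the R `ifelse` in the diff above): shifting the 0-based indexes by 1 without first masking the -1 null indicator turns "no comp found" into an apparently valid index.

```python
import numpy as np

comp_idx = np.array([3, 0, -1])  # -1 means "no comp found"

# Buggy: -1 + 1 == 0, so a missing comp is no longer recognizable as missing
buggy = comp_idx + 1  # array([4, 1, 0])

# Fixed: mask the null indicator before shifting to 1-based indexes
fixed = np.where(comp_idx == -1, np.nan, comp_idx + 1)  # array([4., 1., nan])
```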

).values
total_num_possible_comps = len(comparison_df)
chunked_ids, chunked_scores = [], []
for chunk_num in set(observation_df["chunk"]):

@jeancochrane (Contributor Author) commented:

We iterate over the set of chunks here rather than something like range(1, num_chunks + 1) because pd.cut won't produce a bin for each number in the range if the number of targets is less than the number of bins. This should never occur in production, but it can happen if you're testing the algorithm on a very small number of targets.
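
A minimal sketch of that edge case, assuming an illustrative `observation_df` (the real pipeline's chunking setup may differ):

```python
import numpy as np
import pandas as pd

observation_df = pd.DataFrame({"pin": ["a", "b", "c"]})
num_chunks = 10  # more chunks requested than there are targets

# pd.cut spreads the 3 rows across 10 equal-width bins, so at most 3 of the
# 10 possible chunk labels actually appear in the column
observation_df["chunk"] = pd.cut(
    np.arange(len(observation_df)), bins=num_chunks, labels=False
)

# Iterating over range(num_chunks) would visit empty chunks; iterating over
# the labels that actually occur avoids that
for chunk_num in set(observation_df["chunk"]):
    chunk = observation_df[observation_df["chunk"] == chunk_num]
```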

@jeancochrane jeancochrane marked this pull request as ready for review January 3, 2025 22:50

@dfsnow (Member) left a comment:

Nice, this looks great @jeancochrane. I like that it simplifies things a lot. Let's merge #314 and this should be good to go.

* Add comps incremental speedups

* Format with ruff
@jeancochrane jeancochrane merged commit aca92c2 into 2025-assessment-year Jan 8, 2025
4 checks passed
@jeancochrane jeancochrane deleted the jeancochrane/remove-binning-for-comps branch January 8, 2025 23:21