Commit
Add comps map to individual PIN report (#181)
* Add intermediate leaf_node output to interpret pipeline stage
* Add python get_comps function for computing comps from leaf node assignments
* Flesh out comp calculation
* Add Python requirements to renv environment
* Make sure assessment data is loaded in interpret stage when comp_enable is TRUE
* Continue with comps debugging
* Refactor and test get_comps logic
* Clean up comments and extraneous debugging code ahead of testing
* Temporarily set comp_enable=TRUE for the purposes of testing comps
* Satisfy pre-commit
* Remove num_iteration arg from predict() in comp calculation
* Make sure requirements.txt is copied into image before installing R dependencies
* Install python3-venv in Dockerfile
* Pass n=20 to get_comps correctly in 04-interpret.R
* Temporarily slim down training set to test comp calculation
* Wrap get_comps() call in tryCatch in interpret pipeline stage for better error logging
* Test raising an error from python/comps.py
* Remove temporary error in python/comps.py
* Swap arg order in _get_similarity_matrix to confirm numba error message
* Revert "Swap arg order in _get_similarity_matrix to confirm numba error message" (reverts commit 5beefd5)
* Raise error in interpret stage if get_comps fails
* Revert "Temporarily slim down training set to test comp calculation" (reverts commit e27581f)
* Try refactoring comps.py for less memory use
* Get comps working locally with less memory-intensive algorithm
* Use sales to generate comps
* Instrument python/comps.py with logging and temporarily remove numba decorator
* Instrument interpret comps stage with more logging and skip feature importance for now
* Bump vcpu and memory in build-and-run-model to take full advantage of 10xlarge instance
* Add some logging to try to determine whether record_evals are being saved properly
* Add extra logging to extract_weights function to debug empty weights vector
* Pin lightsnip to jeancochrane/record-evals branch
* Remove debug logs from comps and tree weights extraction functions
* njit _get_top_n_comps
* Revert "Remove debug logs from comps and tree weights extraction functions" (reverts commit 6d82d5b)
* Print record_evals length in train stage for debugging
* Add some more debug logging to train stage
* Switch to save_tree_error instead of valids arg in lightgbm model definition
* Update lightsnip to latest working version
* More fixes for comps
* Try removing parallelism from _get_top_n_comps
* Enable parallelization for comps algorithm
* Temporarily write comps inputs out to file for testing
* Reduce vcpu/memory in build-and-run-model to see if it provisions a smaller instance
* Transpose weights in get_comps and add debug script
* Remove debugging utilities from comps pipeline ahead of final test
* Appease pre-commit
* Add back empty line in 04-interpret.R that got accidentally deleted
* Try jeancochrane/restrict-instance-types-in-build-and-run-batch-job branch for build-and-run-model workflow
* Switch back to m4.10xlarge instance sizing in build-and-run-model
* Add progress logging to comps.py
* Switch back to main branch of build-and-run-batch-job
* Switch to bare iteration rather than vector operations for producing similarity scores in comps.py
* Run comps against binned data to speed up python/comps.py
* Log price ranges in python/comps.py
* Update comps pipeline to work with sales chunking
* Qualify package for rownames_to_column in interpret pipeline stage
* Skip comps bin when no observations are placed in that bin in python/comps.py
* Small cleanup to python/comps.py
* Fix partitioning for comps pipeline
* Fix typo in comps pipeline
* Add comps to individual PIN report
* Clean up comps map

---------

Co-authored-by: Dan Snow <[email protected]>
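The core of the similarity measure this history describes (counting matching leaf node assignments, weighted by per-tree importance) can be sketched as follows. This is a minimal illustration, not the repository's actual `python/comps.py`: the `_get_top_n_comps` name and the `numba`/`njit` usage come from the commit messages above, but the signature, array shapes, and NumPy logic here are assumptions.

```python
# Hypothetical sketch of weighted leaf-node similarity, not the actual
# python/comps.py. Assumed inputs:
#   obs_leaves:   (n_obs, n_trees) leaf assignments for target parcels,
#                 e.g. from lgb.Booster.predict(X, pred_leaf=True)
#   comp_leaves:  (n_comps, n_trees) leaf assignments for candidate comps
#   tree_weights: (n_trees,) per-tree importance weights; if they sum to 1,
#                 scores fall in [0, 1]
import numpy as np
from numba import njit, prange


@njit(parallel=True)
def _get_top_n_comps(obs_leaves, comp_leaves, tree_weights, n):
    n_obs = obs_leaves.shape[0]
    top_idx = np.zeros((n_obs, n), dtype=np.int64)
    top_score = np.zeros((n_obs, n), dtype=np.float64)
    for i in prange(n_obs):
        # A comp's score is the total weight of the trees in which it
        # lands in the same leaf node as the target parcel
        matches = (comp_leaves == obs_leaves[i]).astype(np.float64)
        scores = (matches * tree_weights).sum(axis=1)
        # Keep the n highest-scoring comps, best first (assumes n <= n_comps)
        order = np.argsort(scores)[::-1][:n]
        top_idx[i] = order
        top_score[i] = scores[order]
    return top_idx, top_score
```

Scoring one target row at a time, rather than materializing a full similarity matrix, mirrors the memory concerns in the history above ("Try refactoring comps.py for less memory use", "Switch to bare iteration rather than vector operations"), as does running comps against binned data.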
1 parent df5b89c · commit aed2d12

Showing 5 changed files with 130 additions and 11 deletions.
```diff
@@ -790,9 +790,9 @@ Our goal in maintaining multiple lockfiles is to keep the list of dependencies r
 
 ### Using Lockfiles for Local Development
 
-When working on the model locally, you'll typically want to install non-core dependencies _on top of_ the core dependencies. To do this, simply run `renv::restore("<path_to_lockfile>")` to install all dependencies from the lockfile.
+When working on the model locally, you'll typically want to install non-core dependencies _on top of_ the core dependencies. To do this, simply run `renv::restore(lockfile = "<path_to_lockfile>")` to install all dependencies from the lockfile.
 
-For example, if you're working on the `ingest` stage and want to install all its dependencies, start with the main profile (run `renv::activate()`), then install the `dev` profile dependencies on top of it (run `renv::restore("renv/profiles/dev/renv.lock")`).
+For example, if you're working on the `ingest` stage and want to install all its dependencies, start with the main profile (run `renv::activate()`), then install the `dev` profile dependencies on top of it (run `renv::restore(lockfile = "renv/profiles/dev/renv.lock")`).
 
 > :warning: WARNING: Installing dependencies from a dev lockfile will **overwrite** any existing version installed by the core one. For example, if `[email protected]` is installed by the core lockfile, and `[email protected]` is installed by the dev lockfile, renv will **overwrite** `[email protected]` with `[email protected]`.
```
````diff
@@ -0,0 +1,96 @@
+{{< include ../_setup.qmd >}}
+
+## Comparables
+
+This map shows the target parcel alongside the `r metadata$comp_num_comps` most
+similar parcels, where similarity is measured by the number of trees in which
+the model assigns both parcels to the same leaf node, weighted by the relative
+importance of each tree. See [this
+vignette](https://ccao-data.github.io/lightsnip/articles/finding-comps.html)
+for more background on the similarity algorithm.
+
+```{r _comp_map}
+comp_df_filtered <- comp_df %>%
+  filter(pin == target_pin) %>%
+  tidyr::pivot_longer(starts_with("comp_pin_"), values_to = "comp_pin") %>%
+  select(-name, -starts_with("comp_score_")) %>%
+  bind_cols(
+    comp_df %>%
+      filter(pin == target_pin) %>%
+      tidyr::pivot_longer(
+        starts_with("comp_score_"),
+        values_to = "comp_score"
+      ) %>%
+      select(-name, -starts_with("comp_pin_"), -pin)
+  ) %>%
+  mutate(type = "Comp.") %>%
+  left_join(
+    training_data,
+    by = c("comp_pin" = "meta_pin"),
+    relationship = "many-to-many"
+  ) %>%
+  select(
+    pin, comp_pin, comp_score, meta_1yr_pri_board_tot,
+    meta_sale_date, meta_sale_price,
+    loc_latitude, loc_longitude, meta_class,
+    char_bldg_sf, char_yrblt, char_ext_wall, type
+  ) %>%
+  group_by(comp_pin) %>%
+  filter(meta_sale_date == max(meta_sale_date)) %>%
+  bind_rows(
+    tibble::tribble(
+      ~pin, ~comp_pin, ~comp_score, ~type,
+      target_pin, target_pin, 1, "target"
+    ) %>%
+      left_join(
+        assessment_data %>%
+          select(
+            meta_pin, meta_class, meta_1yr_pri_board_tot,
+            char_bldg_sf, char_yrblt, char_ext_wall,
+            loc_latitude, loc_longitude
+          ),
+        by = c("pin" = "meta_pin")
+      ) %>%
+      mutate(type = "Target")
+  ) %>%
+  mutate(meta_1yr_pri_board_tot = meta_1yr_pri_board_tot * 10)
+
+comp_palette <-
+  colorFactor(
+    palette = "Set1",
+    domain = comp_df_filtered$type
+  )
+
+leaflet() %>%
+  addProviderTiles(providers$CartoDB.Positron) %>%
+  addCircleMarkers(
+    data = comp_df_filtered,
+    ~loc_longitude,
+    ~loc_latitude,
+    opacity = 1,
+    fillOpacity = 1,
+    radius = 2,
+    color = ~ comp_palette(type),
+    popup = ~ paste0(
+      type, " PIN: ",
+      "<a target='_blank' rel='noopener noreferrer' ",
+      "href='https://www.cookcountyassessor.com/pin/", comp_pin,
+      "'>", comp_pin, "</a>",
+      "<br>Score: ", scales::percent(comp_score, accuracy = 0.01),
+      "<br>Class: ", meta_class,
+      "<br>BoR FMV: ", scales::dollar(meta_1yr_pri_board_tot, accuracy = 1),
+      "<hr>",
+      "Sale Date: ", meta_sale_date,
+      "<br>Sale Price: ", scales::dollar(meta_sale_price, accuracy = 1),
+      "<hr>",
+      "Bldg Sqft: ", scales::comma(char_bldg_sf),
+      "<br>Year Built: ", char_yrblt,
+      "<br>Ext. Wall: ", char_ext_wall
+    )
+  ) %>%
+  setView(
+    lng = mean(comp_df_filtered$loc_longitude),
+    lat = mean(comp_df_filtered$loc_latitude),
+    zoom = 10
+  )
+```
````
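For context, the chunk above expects `comp_df` to be wide, with paired `comp_pin_*` and `comp_score_*` columns that it pivots long before joining on sales and characteristics. A hypothetical sketch of how the Python side might emit that shape (the helper name and the single shared `pins` vector are illustrative simplifications; per the commit history, the real pipeline draws comps from sales):

```python
# Hypothetical helper showing the wide comp_pin_*/comp_score_* layout that
# the R chunk above pivots with tidyr::pivot_longer(). Illustrative only;
# assumes targets and comp candidates share the same pin vector.
import numpy as np
import pandas as pd


def comps_to_wide_frame(pins, top_idx, top_score):
    """Build one row per target parcel with n comp pin/score column pairs."""
    pins = np.asarray(pins)
    out = pd.DataFrame({"pin": pins})
    for j in range(top_idx.shape[1]):
        out[f"comp_pin_{j + 1}"] = pins[top_idx[:, j]]
        out[f"comp_score_{j + 1}"] = top_score[:, j]
    return out
```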