Test dispersion features #282

Closed
wants to merge 16 commits into from
28 changes: 14 additions & 14 deletions dvc.lock
@@ -5,8 +5,8 @@ stages:
 deps:
 - path: pipeline/00-ingest.R
 hash: md5
-md5: c453195da12dd0197e0bdd16f4ef3937
-size: 23004
+md5: 21bea1f0eda90f9b77c215e562c7af69
+size: 23612
 params:
 params.yaml:
 assessment:
@@ -38,28 +38,28 @@ stages:
 outs:
 - path: input/assessment_data.parquet
 hash: md5
-md5: e4b429a0121c6898b972fa20b42544fd
-size: 425747228
+md5: af14e5cc79bd08d679ffaa969df292c7
+size: 427188490
 - path: input/char_data.parquet
 hash: md5
-md5: 827c97f9d3bbd3426e8f6fd9136313f8
-size: 847441146
+md5: 320e9762d7244496215b0c76fd391fe4
+size: 851611635
 - path: input/complex_id_data.parquet
 hash: md5
-md5: 0e2a42a935106a9b6f50d8250012d98c
-size: 703255
+md5: c453537444b319bdd023117a4e2ec3a4
+size: 703686
 - path: input/hie_data.parquet
 hash: md5
-md5: ca86d0e5f29fd252455dc67e2dd40ac1
-size: 1927927
+md5: ca545885f4033cf366f25820f27b6a13
+size: 1926795
 - path: input/land_nbhd_rate_data.parquet
 hash: md5
-md5: f3ec9627322bd271bf2957b7388aaa34
-size: 3873
+md5: 6c1baaf2acbcba9869025bb336f4ad25
+size: 4413
 - path: input/training_data.parquet
 hash: md5
-md5: 76d91858f84f57ad2dce9fd292fe1ae2
-size: 208138341
+md5: d662d199383f18567643185cbc0f64db
+size: 209497727
 train:
 cmd: Rscript pipeline/01-train.R
 deps:
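Each `dvc.lock` entry in this diff pairs an `md5` with a `size`. As a rough illustration of what those two fields record, the sketch below recomputes both for a local file and compares them with a lock entry. It assumes the `md5` field is a plain MD5 digest of the file's bytes, which may not hold for every DVC version or for directory outputs. `md5_and_size` and `matches_lock_entry` are hypothetical helper names, not part of DVC.

```python
import hashlib


def md5_and_size(path, chunk_size=1 << 20):
    """Return (hex MD5 digest, size in bytes) for the file at `path`.

    The file is read in chunks so that large parquet files do not have
    to fit in memory.
    """
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def matches_lock_entry(path, entry):
    """Check a file against a dict shaped like one dvc.lock entry.

    Assumption: `entry` has `md5` and `size` keys, as in the diff above,
    and `md5` is a plain digest of the file contents.
    """
    md5, size = md5_and_size(path)
    return md5 == entry["md5"] and size == entry["size"]
```

For example, `matches_lock_entry("input/hie_data.parquet", {"md5": "ca545885f4033cf366f25820f27b6a13", "size": 1926795})` would compare a local copy of that file with the updated lock entry.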
258 changes: 129 additions & 129 deletions dvc.yaml
@@ -5,17 +5,17 @@ stages:
Ingest training and assessment data from Athena + generate townhome
complex identifiers
deps:
- pipeline/00-ingest.R
- pipeline/00-ingest.R
params:
- assessment
- input
- assessment
- input
outs:
- input/assessment_data.parquet
- input/char_data.parquet
- input/complex_id_data.parquet
- input/hie_data.parquet
- input/land_nbhd_rate_data.parquet
- input/training_data.parquet
- input/assessment_data.parquet
- input/char_data.parquet
- input/complex_id_data.parquet
- input/hie_data.parquet
- input/land_nbhd_rate_data.parquet
- input/training_data.parquet

frozen: true
train:
@@ -24,34 +24,34 @@ stages:
Train a LightGBM model with cross-validation. Generate model objects,
data recipes, and predictions on the test set (most recent 10% of sales)
deps:
- pipeline/01-train.R
- input/training_data.parquet
- pipeline/01-train.R
- input/training_data.parquet
params:
- cv
- model.engine
- model.hyperparameter
- model.objective
- model.parameter
- model.predictor
- model.seed
- model.verbose
- ratio_study
- toggle.cv_enable
- cv
- model.engine
- model.hyperparameter
- model.objective
- model.parameter
- model.predictor
- model.seed
- model.verbose
- ratio_study
- toggle.cv_enable
outs:
- output/intermediate/timing/model_timing_train.parquet:
cache: false
- output/parameter_final/model_parameter_final.parquet:
cache: false
- output/parameter_range/model_parameter_range.parquet:
cache: false
- output/parameter_search/model_parameter_search.parquet:
cache: false
- output/test_card/model_test_card.parquet:
cache: false
- output/workflow/fit/model_workflow_fit.zip:
cache: false
- output/workflow/recipe/model_workflow_recipe.rds:
cache: false
- output/intermediate/timing/model_timing_train.parquet:
cache: false
- output/parameter_final/model_parameter_final.parquet:
cache: false
- output/parameter_range/model_parameter_range.parquet:
cache: false
- output/parameter_search/model_parameter_search.parquet:
cache: false
- output/test_card/model_test_card.parquet:
cache: false
- output/workflow/fit/model_workflow_fit.zip:
cache: false
- output/workflow/recipe/model_workflow_recipe.rds:
cache: false

assess:
cmd: Rscript pipeline/02-assess.R
@@ -60,25 +60,25 @@ stages:
County. Also generate flags, calculate land values, and make any
post-modeling changes
deps:
- pipeline/02-assess.R
- input/training_data.parquet
- input/assessment_data.parquet
- input/complex_id_data.parquet
- input/land_nbhd_rate_data.parquet
- output/workflow/fit/model_workflow_fit.zip
- output/workflow/recipe/model_workflow_recipe.rds
- pipeline/02-assess.R
- input/training_data.parquet
- input/assessment_data.parquet
- input/complex_id_data.parquet
- input/land_nbhd_rate_data.parquet
- output/workflow/fit/model_workflow_fit.zip
- output/workflow/recipe/model_workflow_recipe.rds
params:
- assessment
- pv
- ratio_study
- model.predictor.all
- assessment
- pv
- ratio_study
- model.predictor.all
outs:
- output/assessment_card/model_assessment_card.parquet:
cache: false
- output/assessment_pin/model_assessment_pin.parquet:
cache: false
- output/intermediate/timing/model_timing_assess.parquet:
cache: false
- output/assessment_card/model_assessment_card.parquet:
cache: false
- output/assessment_pin/model_assessment_pin.parquet:
cache: false
- output/intermediate/timing/model_timing_assess.parquet:
cache: false

evaluate:
cmd: Rscript pipeline/03-evaluate.R
@@ -88,78 +88,78 @@ stages:
2. An assessor-specific ratio study comparing estimated assessments to
the previous year's sales
deps:
- pipeline/03-evaluate.R
- output/test_card/model_test_card.parquet
- output/assessment_pin/model_assessment_pin.parquet
- pipeline/03-evaluate.R
- output/test_card/model_test_card.parquet
- output/assessment_pin/model_assessment_pin.parquet
params:
- assessment
- ratio_study
- assessment
- ratio_study
outs:
- output/performance/model_performance_test.parquet:
cache: false
- output/performance_quantile/model_performance_quantile_test.parquet:
cache: false
- output/performance/model_performance_assessment.parquet:
cache: false
- output/performance_quantile/model_performance_quantile_assessment.parquet:
cache: false
- output/intermediate/timing/model_timing_evaluate.parquet:
cache: false
- output/performance/model_performance_test.parquet:
cache: false
- output/performance_quantile/model_performance_quantile_test.parquet:
cache: false
- output/performance/model_performance_assessment.parquet:
cache: false
- output/performance_quantile/model_performance_quantile_assessment.parquet:
cache: false
- output/intermediate/timing/model_timing_evaluate.parquet:
cache: false

interpret:
cmd: Rscript pipeline/04-interpret.R
desc: >
Generate SHAP values for each card and feature as well as feature
importance metrics for each feature
deps:
- pipeline/04-interpret.R
- input/assessment_data.parquet
- input/training_data.parquet
- output/assessment_card/model_assessment_card.parquet
- output/workflow/fit/model_workflow_fit.zip
- output/workflow/recipe/model_workflow_recipe.rds
- pipeline/04-interpret.R
- input/assessment_data.parquet
- input/training_data.parquet
- output/assessment_card/model_assessment_card.parquet
- output/workflow/fit/model_workflow_fit.zip
- output/workflow/recipe/model_workflow_recipe.rds
params:
- toggle.shap_enable
- toggle.comp_enable
- model.predictor.all
- toggle.shap_enable
- toggle.comp_enable
- model.predictor.all
outs:
- output/shap/model_shap.parquet:
cache: false
- output/feature_importance/model_feature_importance.parquet:
cache: false
- output/intermediate/timing/model_timing_interpret.parquet:
cache: false
- output/comp/model_comp.parquet:
cache: false
- output/shap/model_shap.parquet:
cache: false
- output/feature_importance/model_feature_importance.parquet:
cache: false
- output/intermediate/timing/model_timing_interpret.parquet:
cache: false
- output/comp/model_comp.parquet:
cache: false

finalize:
cmd: Rscript pipeline/05-finalize.R
desc: >
Save run timings and run metadata to disk and render a performance report
using Quarto.
deps:
- pipeline/05-finalize.R
- output/intermediate/timing/model_timing_train.parquet
- output/intermediate/timing/model_timing_assess.parquet
- output/intermediate/timing/model_timing_evaluate.parquet
- output/intermediate/timing/model_timing_interpret.parquet
- pipeline/05-finalize.R
- output/intermediate/timing/model_timing_train.parquet
- output/intermediate/timing/model_timing_assess.parquet
- output/intermediate/timing/model_timing_evaluate.parquet
- output/intermediate/timing/model_timing_interpret.parquet
params:
- run_note
- toggle
- input
- cv
- model
- pv
- ratio_study
- run_note
- toggle
- input
- cv
- model
- pv
- ratio_study
outs:
- output/intermediate/timing/model_timing_finalize.parquet:
cache: false
- output/timing/model_timing.parquet:
cache: false
- output/metadata/model_metadata.parquet:
cache: false
- reports/performance/performance.html:
cache: false
- output/intermediate/timing/model_timing_finalize.parquet:
cache: false
- output/timing/model_timing.parquet:
cache: false
- output/metadata/model_metadata.parquet:
cache: false
- reports/performance/performance.html:
cache: false

upload:
cmd: Rscript pipeline/06-upload.R
@@ -169,25 +169,25 @@ stages:
outputs prior to upload and attach a unique run ID. This step requires
access to the CCAO Data AWS account, and so is assumed to be internal-only
deps:
- pipeline/06-upload.R
- output/parameter_final/model_parameter_final.parquet
- output/parameter_range/model_parameter_range.parquet
- output/parameter_search/model_parameter_search.parquet
- output/workflow/fit/model_workflow_fit.zip
- output/workflow/recipe/model_workflow_recipe.rds
- output/test_card/model_test_card.parquet
- output/assessment_card/model_assessment_card.parquet
- output/assessment_pin/model_assessment_pin.parquet
- output/performance/model_performance_test.parquet
- output/performance_quantile/model_performance_quantile_test.parquet
- output/performance/model_performance_assessment.parquet
- output/performance_quantile/model_performance_quantile_assessment.parquet
- output/shap/model_shap.parquet
- output/comp/model_comp.parquet
- output/feature_importance/model_feature_importance.parquet
- output/metadata/model_metadata.parquet
- output/timing/model_timing.parquet
- reports/performance/performance.html
- pipeline/06-upload.R
- output/parameter_final/model_parameter_final.parquet
- output/parameter_range/model_parameter_range.parquet
- output/parameter_search/model_parameter_search.parquet
- output/workflow/fit/model_workflow_fit.zip
- output/workflow/recipe/model_workflow_recipe.rds
- output/test_card/model_test_card.parquet
- output/assessment_card/model_assessment_card.parquet
- output/assessment_pin/model_assessment_pin.parquet
- output/performance/model_performance_test.parquet
- output/performance_quantile/model_performance_quantile_test.parquet
- output/performance/model_performance_assessment.parquet
- output/performance_quantile/model_performance_quantile_assessment.parquet
- output/shap/model_shap.parquet
- output/comp/model_comp.parquet
- output/feature_importance/model_feature_importance.parquet
- output/metadata/model_metadata.parquet
- output/timing/model_timing.parquet
- reports/performance/performance.html

export:
cmd: Rscript pipeline/07-export.R
@@ -196,11 +196,11 @@ stages:
run. NOT automatically run since it is typically only run once. Manually
run once a model is selected
deps:
- pipeline/07-export.R
- pipeline/07-export.R
params:
- assessment.year
- input.min_sale_year
- input.max_sale_year
- ratio_study
- export
- assessment.year
- input.min_sale_year
- input.max_sale_year
- ratio_study
- export
frozen: true
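The `deps` and `outs` lists in `dvc.yaml` are what tie the stages together: a stage has to run after any stage that produces one of its dependencies. The sketch below illustrates that idea with Python's standard `graphlib` module, using an abbreviated copy of the paths from this file (one to four per stage, chosen by hand). It is an illustration only, not how DVC itself resolves the pipeline, and `STAGES` and `stage_order` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Abbreviated, hand-copied subset of the deps/outs declared in dvc.yaml.
STAGES = {
    "ingest": {
        "deps": [],
        "outs": ["input/training_data.parquet", "input/assessment_data.parquet"],
    },
    "train": {
        "deps": ["input/training_data.parquet"],
        "outs": [
            "output/workflow/fit/model_workflow_fit.zip",
            "output/test_card/model_test_card.parquet",
            "output/intermediate/timing/model_timing_train.parquet",
        ],
    },
    "assess": {
        "deps": [
            "input/assessment_data.parquet",
            "output/workflow/fit/model_workflow_fit.zip",
        ],
        "outs": [
            "output/assessment_card/model_assessment_card.parquet",
            "output/assessment_pin/model_assessment_pin.parquet",
            "output/intermediate/timing/model_timing_assess.parquet",
        ],
    },
    "evaluate": {
        "deps": [
            "output/test_card/model_test_card.parquet",
            "output/assessment_pin/model_assessment_pin.parquet",
        ],
        "outs": ["output/intermediate/timing/model_timing_evaluate.parquet"],
    },
    "interpret": {
        "deps": [
            "output/assessment_card/model_assessment_card.parquet",
            "output/workflow/fit/model_workflow_fit.zip",
        ],
        "outs": [
            "output/shap/model_shap.parquet",
            "output/intermediate/timing/model_timing_interpret.parquet",
        ],
    },
    "finalize": {
        "deps": [
            "output/intermediate/timing/model_timing_train.parquet",
            "output/intermediate/timing/model_timing_assess.parquet",
            "output/intermediate/timing/model_timing_evaluate.parquet",
            "output/intermediate/timing/model_timing_interpret.parquet",
        ],
        "outs": ["output/metadata/model_metadata.parquet"],
    },
    "upload": {
        "deps": [
            "output/metadata/model_metadata.parquet",
            "output/shap/model_shap.parquet",
        ],
        "outs": [],
    },
}


def stage_order(stages):
    """Return one valid execution order implied by the deps/outs lists."""
    # Map every output path to the stage that produces it.
    producer = {
        out: name for name, spec in stages.items() for out in spec["outs"]
    }
    # For each stage, collect the stages that produce its dependencies.
    graph = {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = stage_order(STAGES)
print(order)
```

The printed order is one valid ordering rather than the only one: `evaluate` and `interpret` both depend only on earlier stages, so either may come first.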