Test dispersion features #274

Open
dfsnow opened this issue Dec 7, 2024 · 2 comments
Labels
new data/feature Create or edit a column/feature or collect new data

Comments

@dfsnow
Member

dfsnow commented Dec 7, 2024

Predictive models with a lot of heterogeneity in the training data often include features that measure the dispersion/variation of each predictor. For example, in addition to a building's year built, you might also include the standard deviation of the year built, where the aggregation group is all properties within 1000 meters.
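As a rough illustration of the idea above, here is a minimal sketch that computes, for each property, the standard deviation of year built across all properties within 1000 meters. The field names (`x`, `y`, `year_built`), the function `neighborhood_sd`, and the sample data are all hypothetical, and the sketch assumes coordinates are already projected to meters and that the property itself counts as part of its own group. It also uses the population standard deviation; whether sample or population SD is the better choice is left open here.

```python
import math
import statistics


def neighborhood_sd(properties, feature, radius_m=1000.0):
    """For each property, return the population SD of `feature` over all
    properties within `radius_m` (the property itself is included)."""
    result = []
    for p in properties:
        # Brute-force neighbor search: fine for a toy example, O(n^2) overall.
        values = [
            q[feature]
            for q in properties
            if math.hypot(p["x"] - q["x"], p["y"] - q["y"]) <= radius_m
        ]
        result.append(statistics.pstdev(values))
    return result


# Hypothetical data: two properties 500 m apart, one far from both.
properties = [
    {"x": 0.0, "y": 0.0, "year_built": 1920},
    {"x": 300.0, "y": 400.0, "year_built": 1980},
    {"x": 5000.0, "y": 5000.0, "year_built": 2015},
]

sd_year_built = neighborhood_sd(properties, "year_built")
print(sd_year_built)
```

The pairwise distance loop would not scale to a full parcel dataset; a spatial index or a spatial join would be the usual substitute, but that choice is outside what this thread specifies.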

dfsnow added the "new data/feature" label on Dec 7, 2024
@Damonamajor
Contributor

@dfsnow Can you give a quick list of the metrics you think this would be valuable for?

@dfsnow
Member Author

dfsnow commented Dec 9, 2024

Basically any continuous feature where we think the variance of the feature will have some effect on local values. Start with char_bldg_sf and char_yrblt. I'd expect both of those to be impactful here, as you can say things like "The average year built for this area is high and the variance is low, indicating this neighborhood is almost entirely new construction (and therefore likely worth more)."
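To make the "high mean, low variance" reading concrete, here is a small sketch comparing two made-up areas. The area names and the year-built values are invented for illustration only; the point is just that the mean and the standard deviation together distinguish a uniformly new area from a mixed one.

```python
import statistics

# Hypothetical char_yrblt values for two invented areas.
areas = {
    "new_subdivision": [2015, 2016, 2016, 2017, 2018],
    "mixed_older_area": [1905, 1948, 1972, 1999, 2016],
}

# Mean and population SD of year built per area.
summary = {
    name: {"mean": statistics.mean(years), "sd": statistics.pstdev(years)}
    for name, years in areas.items()
}

for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.1f}, sd={stats['sd']:.1f}")
```

The first area has the higher mean and the much lower SD, which matches the "almost entirely new construction" case described above. The same pair of summaries could be computed for char_bldg_sf.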

Damonamajor linked a pull request on Dec 16, 2024 that will close this issue
3 participants