Add column mapping for proposed multifuel table #3988
@jmelot Hello! Thanks for opening this PR, excited to get this in! I'm happy to answer any questions you have as you work on this, and review it when you reach a good pausing point. Just ping me!
Thanks @e-belfer, I appreciate it!
Summary of feedback
Thanks for your additions @jmelot! You did a nice job mimicking existing functions for other tables. I've added a few comments and suggested additions below!
Here are some similar issues to review for context.
wind: #3460
solar: #3461
storage: #3463
equipment: #3462
If you look at the transform issues in the task lists and then the transform PRs linked to those issues you may find some helpful comparisons. For example, #3522.
Regarding your comment about operating date:
One thing I noticed was that the operable and retired multifuel tables have Operating month and Operating year columns while the proposed table has Current month and Current year tables which presumably have a different meaning. I wasn't sure whether it would be better to store the proposed values in the same generator_operating_{month,year} columns or keep them separate. For now, I've stored them in the same column.
In the proposed generators table, this "current" column refers to the current planned operating date and is referred to as current_planned_generator_operating_month/year. The generator table also has an original_planned_generator_operating_month/year column, so it makes sense to distinguish the two with "current" and "original" prefixes. Because this table doesn't have an "original" column, I could see us going either way:
A) Keep the column name generator_operating_month/year and rely on the operational_status column to know whether the date is planned or in effect.
B) Reuse the current_planned_generator_operating_month/year column name for consistency and extra specificity.
@e-belfer or @cmgosnell what do you think?
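To make the two options concrete, here is a minimal sketch in plain Python. The raw header names (current_month, current_year), the sample values, and the rename_columns helper are hypothetical stand-ins and not PUDL's actual column-map machinery; only the target column names come from the discussion above.

```python
# Hypothetical raw row from the proposed multifuel table (names and values assumed).
RAW_ROW = {
    "plant_id_eia": 1,
    "generator_id": "A",
    "current_month": 6,
    "current_year": 2026,
}

# Option A: reuse the names already used by the operable and retired tables.
OPTION_A = {
    "current_month": "generator_operating_month",
    "current_year": "generator_operating_year",
}

# Option B: reuse the names from the proposed generators table.
OPTION_B = {
    "current_month": "current_planned_generator_operating_month",
    "current_year": "current_planned_generator_operating_year",
}


def rename_columns(row: dict, column_map: dict) -> dict:
    """Return a copy of ``row`` with keys renamed according to ``column_map``."""
    return {column_map.get(name, name): value for name, value in row.items()}


row_a = rename_columns(RAW_ROW, OPTION_A)
row_b = rename_columns(RAW_ROW, OPTION_B)
```

With option A, a reader needs operational_status to tell a planned date from an actual one; with option B, the column name itself carries that information.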
Add table to harvested asset factory
Add this table to the harvested_entity_asset_factory and finished_eia_assets in src/pudl/transform/eia.py. See the description of the core layer in our docs for more detail about the harvesting process.
Add table to RESOURCE_METADATA
Add the table schema and information to the RESOURCE_METADATA dictionary in src/pudl/metadata/resources/eia860.py.
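As a rough illustration of what such an entry might look like, here is a sketch with a small sanity check. The table name, field list, description text, and the missing_primary_key_fields helper are all hypothetical, and the key layout ("schema", "fields", "primary_key", and so on) is an assumption that should be checked against the existing entries in eia860.py.

```python
from typing import Any

# Hypothetical entry: the table name, fields, and description are placeholders.
RESOURCE_METADATA: dict[str, dict[str, Any]] = {
    "example_multifuel_proposed_table": {
        "description": "Placeholder description of the proposed multifuel table.",
        "schema": {
            "fields": [
                "plant_id_eia",
                "generator_id",
                "report_date",
                "has_air_permit_limits",
            ],
            "primary_key": ["plant_id_eia", "generator_id", "report_date"],
        },
        "sources": ["eia860"],
        "etl_group": "eia860",
        "field_namespace": "eia",
    },
}


def missing_primary_key_fields(resource: dict[str, Any]) -> list[str]:
    """List primary key columns that are not declared in the schema's fields."""
    schema = resource["schema"]
    return [col for col in schema["primary_key"] if col not in schema["fields"]]


# Map each table name to any primary key columns missing from its field list.
problems = {
    name: missing_primary_key_fields(resource)
    for name, resource in RESOURCE_METADATA.items()
}
```

An empty list for every table means each primary key column is also declared as a field, which is an easy thing to get wrong when copying an entry from a similar table.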
Update release notes
You'll want to add a description of your changes to our docs/release_notes.rst file.
Testing
This is more of a question for @e-belfer and @cmgosnell: should we add this table to any of the testing infrastructure, e.g. test_minmax_rows or test_unique_rows_eia in eia_test.py? It's not clear to me given the exclusion of the other generator subset tables (scd_generators_solar etc.) from both of these functions.
src/pudl/package_data/eia860/column_maps/multifuel_proposed.csv
Regarding your comment about operating date:
Re testing
All of the We are planning on revamping our data validation process and deprecating if you wanted to add new tables into
Also, a general comment: with all of these steps to fully integrate these new tables, please feel free to say "I'd like to pass this off" or "can this be out of scope" or "I'll need more support in order to make ___ happen" or whatever! The extract mappings and the transform function are already really helpful!
Thanks @aesharpe and @cmgosnell (and sorry to be slow getting back to you). On a quick skim, this makes sense. I'll have much more time next week, and I'll take a closer look and try to make the necessary edits before the new year.
@jmelot I left a little comment about boolean column naming and primary keys, otherwise just need the release notes!
I'll try to materialize the assets in a bit to see if it works and whether I have any other feedback. I'll also get back to you with more specific testing specifications in a moment.
src/pudl/metadata/fields.py
"air_permit_limits": {
    "type": "boolean",
    "description": "True if there are air permit limits",
},
We've started adding a verb prefix to boolean columns to highlight their binary nature, e.g. is_{x}, has_{x}, or served_{x}. This column would thus be has_air_permit_limits. The same applies to a few other column names you added.
(this is also a great reminder to add this to our naming conventions docs page...)
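A quick way to spot columns that miss this convention is sketched below. The FIELDS dictionary is a trimmed, illustrative stand-in in the style of src/pudl/metadata/fields.py, and misnamed_boolean_fields is a hypothetical helper, not an existing PUDL function; the prefix list only covers the three examples named in the comment above.

```python
# Prefixes taken from the review comment; the real convention may allow others.
BOOLEAN_PREFIXES = ("is_", "has_", "served_")

# Illustrative field definitions (not the full PUDL field metadata).
FIELDS = {
    "air_permit_limits": {
        "type": "boolean",
        "description": "True if there are air permit limits",
    },
    "has_air_permit_limits": {
        "type": "boolean",
        "description": "True if there are air permit limits",
    },
    "generator_id": {
        "type": "string",
        "description": "Generator identifier",
    },
}


def misnamed_boolean_fields(fields: dict) -> list:
    """Return sorted names of boolean fields lacking an accepted verb prefix."""
    return sorted(
        name
        for name, meta in fields.items()
        if meta.get("type") == "boolean" and not name.startswith(BOOLEAN_PREFIXES)
    )


flagged = misnamed_boolean_fields(FIELDS)
```

Here only air_permit_limits is flagged: has_air_permit_limits already carries a prefix, and generator_id is not a boolean.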
Thanks, on it (and sorry this is dragging out a bit, the holidays were busier than I expected)! I think I was running into some kind of schema mismatch error when I last ran
I updated the boolean column names, added a short release note, and updated the primary key! Two questions:
Overview
Closes #3438