Make association asset names more consistent #3035

Merged: 3 commits merged on Nov 15, 2023.
6 changes: 3 additions & 3 deletions devtools/python-output-table-conversion-debug.ipynb
@@ -48,14 +48,14 @@
"\n",
"You can create an asset by creating a new function and adding the `@asset` decorator. For now, the only attribute you should add to the decorator is `compute_kind=\"Python\"`. All this does is add a cute tag to the asset in the DAG to let people know how the asset is being processed.\n",
"\n",
-"Next you'll want to figure out which tables the output table depends on. Read through the old output function to see which normalized tables or output functions are being used as inputs to the joins and imputations. Once you have the input table names, add them to the asset function parameters. For example, the `utilities_eia860()` function merges the `core_eia__entity_utilities`, `core_eia860__scd_utilities`, and `core_pudl__assn_utilities_eia` tables together, so the asset would look like this:\n",
+"Next you'll want to figure out which tables the output table depends on. Read through the old output function to see which normalized tables or output functions are being used as inputs to the joins and imputations. Once you have the input table names, add them to the asset function parameters. For example, the `utilities_eia860()` function merges the `core_eia__entity_utilities`, `core_eia860__scd_utilities`, and `core_pudl__assn_eia_pudl_utilities` tables together, so the asset would look like this:\n",
"\n",
"```python\n",
"@asset(compute_kind=\"Python\")\n",
"def denorm_utilities_eia860(\n",
" core_eia__entity_utilities: pd.DataFrame,\n",
" core_eia860__scd_utilities: pd.DataFrame,\n",
-" core_pudl__assn_utilities_eia: pd.DataFrame,\n",
+" core_pudl__assn_eia_pudl_utilities: pd.DataFrame,\n",
"):\n",
" ... # joining logic\n",
" return joined_df\n",
@@ -110,7 +110,7 @@
"def denorm_utilities_eia860(\n",
" core_eia__entity_utilities: pd.DataFrame,\n",
" core_eia860__scd_utilities: pd.DataFrame,\n",
-" core_pudl__assn_utilities_eia: pd.DataFrame,\n",
+" core_pudl__assn_eia_pudl_utilities: pd.DataFrame,\n",
"):\n",
" ... # joining logic\n",
" return joined_df\n",
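Dagster infers an asset's upstream dependencies from its function parameter names, which is why the parameter above had to be renamed together with the association asset. The sketch below does not use Dagster; it only mimics that name matching with `inspect.signature`, checked against a hypothetical set `KNOWN_ASSETS` standing in for the project's registered asset keys. The helper names are illustrative assumptions, not part of PUDL.

```python
import inspect
from typing import Any, Callable

# Hypothetical stand-in for the asset keys registered in the project.
KNOWN_ASSETS = {
    "core_eia__entity_utilities",
    "core_eia860__scd_utilities",
    "core_pudl__assn_eia_pudl_utilities",
}


def upstream_asset_names(fn: Callable[..., Any]) -> list[str]:
    """Return fn's parameter names, which play the role of upstream asset names."""
    return list(inspect.signature(fn).parameters)


def missing_upstream_assets(fn: Callable[..., Any], known: set = KNOWN_ASSETS) -> list[str]:
    """List parameters that do not match any known asset name."""
    return [name for name in upstream_asset_names(fn) if name not in known]


def denorm_utilities_eia860(
    core_eia__entity_utilities,
    core_eia860__scd_utilities,
    core_pudl__assn_eia_pudl_utilities,
):
    ...  # joining logic elided, as in the notebook


def stale_denorm_utilities_eia860(
    core_eia__entity_utilities,
    core_eia860__scd_utilities,
    core_pudl__assn_utilities_eia,  # old name: no longer an asset
):
    ...
```

Calling `missing_upstream_assets(stale_denorm_utilities_eia860)` flags `core_pudl__assn_utilities_eia`, the kind of stale parameter this PR updates in the notebook.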
13 changes: 9 additions & 4 deletions docs/dev/naming_conventions.rst
@@ -64,10 +64,15 @@ Naming convention: ``core_{source}__{asset_type}_{asset_name}``
* ``asset_type`` describes how the asset is modeled and its role in PUDL’s
collection of core assets. There are a handful of table types in this layer:

-* ``assn``: Association tables provide connections between entities. This data
-can be manually compiled or extracted from data sources. If the asset associates
-data from two sources, the source names should be included in the ``asset_name``
-in alphabetical order. Examples:
+* ``assn``: Association assets provide connections between entities. They should
+follow this naming convention:
+
+``{layer}_{source of association asset}__assn_{datasets being linked}_{entity
+being linked}``
+
+Association assets can be manually compiled or extracted from data sources. If
+the asset associates data from two sources, the source names should be included
+in the ``asset_name`` in alphabetical order. Examples:

* ``core_pudl__assn_plants_eia`` associates EIA Plant IDs and manually assigned
PUDL Plant IDs.
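The ``assn`` naming rule can be sketched as a small string builder. This is a minimal illustration assuming a hypothetical helper `assn_asset_name` (not part of PUDL): it joins the layer, the source of the association asset, the alphabetically sorted datasets being linked, and the entity being linked, treating either of the last two parts as optional since single-source assets omit the dataset names.

```python
def assn_asset_name(
    layer: str,
    source: str,
    entity: str = "",
    datasets: tuple[str, ...] = (),
) -> str:
    """Build {layer}_{source}__assn_{datasets being linked}_{entity being linked}.

    Hypothetical helper for illustration. Linked dataset names are sorted
    alphabetically, as the convention requires for two-source associations.
    """
    parts = sorted(datasets) + ([entity] if entity else [])
    if not parts:
        raise ValueError("need at least one linked dataset or an entity")
    return f"{layer}_{source}__assn_{'_'.join(parts)}"
```

For example, `assn_asset_name("core", "pudl", "utilities", ("pudl", "eia"))` gives `"core_pudl__assn_eia_pudl_utilities"`, and `assn_asset_name("core", "epa", datasets=("epacamd", "eia"))` gives `"core_epa__assn_eia_epacamd"`, matching the new names in this PR.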
12 changes: 6 additions & 6 deletions docs/release_notes.rst
@@ -93,7 +93,7 @@ Data Coverage
CAMPD API, and to include 2022 data. Due to changes in the ETL, Alaska, Puerto Rico
and Hawaii are now included in CEMS processing. See issue :issue:`1264` & PRs
:pr:`2779`, :pr:`2816`.
-* New :ref:`core_epa__assn_epacamd_eia` crosswalk version v0.3, see issue :issue:`2317`
+* New :ref:`core_epa__assn_eia_epacamd` crosswalk version v0.3, see issue :issue:`2317`
and PR :pr:`2316`. EPA's updates add manual matches and exclusions focusing on
operating units with a generator ID as of 2018.
* New PUDL tables from :doc:`data_sources/ferc1`, integrating older DBF and newer XBRL
@@ -131,7 +131,7 @@ Data Coverage
:issue:`1823` & PR :pr:`2205`.

* Harvested owner utilities from the EIA 860 ownership table which are now included in
-the :ref:`core_eia__entity_utilities` and :ref:`core_pudl__assn_utilities_eia`
+the :ref:`core_eia__entity_utilities` and :ref:`core_pudl__assn_eia_pudl_utilities`
tables. See :pr:`2714`. Renamed columns with owner or operator suffix to differentiate
between owner and operator utility columns in :ref:`core_eia860__scd_ownership` and
:ref:`out_eia860__yearly_ownership`. See :pr:`2903`.
@@ -142,7 +142,7 @@ Data Coverage
:pr:`2561`.
* :ref:`out_eia860__yearly_emissions_control_equipment`, see issue :issue:`2338` & PR
:pr:`2561`.
-* :ref:`core_eia860__yearly_boiler_emissions_control_equipment_assn`, see
+* :ref:`core_eia860__assn_yearly_boiler_emissions_control_equipment`, see
:issue:`2338` & PR :pr:`2561`.
* :ref:`core_eia860__assn_boiler_cooling`, see :issue:`2586` & PR :pr:`2587`
* :ref:`core_eia860__assn_boiler_stack_flue`, see :issue:`2586` & PR :pr:`2587`
@@ -202,8 +202,8 @@ Data Coverage
* :ref:`out_ferc714__respondents_with_fips` (annual respondents with county FIPS IDs)
* :ref:`out_ferc714__summarized_demand` (annual demand for FERC-714 respondents)

-* Added new table :ref:`core_epa__assn_epacamd_eia_subplant_ids`, which augments the
-:ref:`core_epa__assn_epacamd_eia` glue table. This table incorporates all
+* Added new table :ref:`core_epa__assn_eia_epacamd_subplant_ids`, which augments the
+:ref:`core_epa__assn_eia_epacamd` glue table. This table incorporates all
:ref:`core_eia__entity_generators` and all :ref:`core_epacems__hourly_emissions` IDs
and uses these complete IDs to develop a full-coverage ``subplant_id`` column which
granularly connects EPA CAMD with EIA. Thanks to :user:`grgmiller` for his
@@ -240,7 +240,7 @@ Data Cleaning
affected a small number of records in any table referring to boilers, including
:ref:`core_eia__entity_boilers`, :ref:`core_eia860__scd_boilers`,
:ref:`core_eia923__monthly_boiler_fuel`, :ref:`core_eia860__assn_boiler_generator`
-and the :ref:`core_epa__assn_epacamd_eia` crosswalk. It also had some minor downstream
+and the :ref:`core_epa__assn_eia_epacamd` crosswalk. It also had some minor downstream
effects on the MCOE outputs. See :issue:`2366` and :pr:`2367`.
* The :ref:`core_eia923__monthly_boiler_fuel` table now includes the
``prime_mover_code`` column. This column was previously incorrectly being associated

This file was deleted.
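If other references to the renamed assets need updating (for example, docs or notebooks outside this diff), a word-boundary regex keeps a short old name from matching inside a longer one. This is a hypothetical helper, not how this PR was produced; the `RENAMES` mapping is copied from the old and new names that appear in the diff.

```python
import re

# Old -> new association asset names, copied from the renames in this diff.
RENAMES = {
    "core_pudl__assn_utilities_eia": "core_pudl__assn_eia_pudl_utilities",
    "core_epa__assn_epacamd_eia": "core_epa__assn_eia_epacamd",
    "core_epa__assn_epacamd_eia_subplant_ids": "core_epa__assn_eia_epacamd_subplant_ids",
    "core_eia860__yearly_boiler_emissions_control_equipment_assn": (
        "core_eia860__assn_yearly_boiler_emissions_control_equipment"
    ),
}

# "_" is a word character, so \b stops "core_epa__assn_epacamd_eia" from
# matching inside "core_epa__assn_epacamd_eia_subplant_ids"; the regex
# backtracks to the longer alternative and each name is replaced whole.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def apply_renames(text: str) -> str:
    """Replace every old asset name in text with its new name."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(0)], text)
```

Because none of the new names contains an old name as a whole word, running `apply_renames` twice gives the same result as running it once.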
