Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add naming convention change to release notes #3028

Merged

Conversation

bendnorman
Copy link
Member

PR Overview

This PR:

  • Adds information about the name change to the release notes
  • Adds deprecation warnings to PudlTabl
  • Explains why schedule numbers are in the ferc names
  • Adds our policies about ordered source names in association table names and when to use pudl as a source

PR Checklist

  • Merge the most recent version of the branch you are merging into (probably dev).
  • All CI checks are passing. Run tests locally to debug failures
  • Make sure you've included good docstrings.
  • For major data coverage & analysis changes, run data validation tests
  • Include unit tests for new functions and classes.
  • Defensive data quality/sanity checks in analyses & data processing functions.
  • Update the release notes and reference reference the PR and related issues.
  • Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

@bendnorman bendnorman requested a review from cmgosnell November 8, 2023 01:38
:ref:`docs <asset-naming>`.

To help users migrate away from using ``PudlTabl`` and our temporary table names,
we've created a `google sheet <https://docs.google.com/spreadsheets/d/1RBuKl_xKzRSLgRM7GIZbc5zUYieWFE20cXumWuv5njo/edit?usp=sharing>`__
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think the formatting of this sheet is helpful/legible?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it! My only suggestion would be to add a column which indicates whether or not it is a pudl db table. which obviously you could just derive from the name (w/ something like =if(LEFT(R2,1)="_",FALSE,TRUE))

mostly so users who are only gonna interact w/ the published data can ignore the non-db tables

I also think it might be good to copy-> paste values from the Old Asset/Table Name column and delete the hidden columns. (maybe its just me but i definitely got distracted by trying to understand what was going on in there)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could also just remove non db assets so users don't think "what the heck are all of these _core and raw assets? They're not in the db?!"

We plan to remove ``PudlTabl`` from the pudl package once our known users have
succesfully migrated to pulling data directly from ``pudl.sqlite``. We've added
deprecation warnings to the ``PudlTabl`` class. We expect to remove ``PudlTabl``
at the end of February 2024.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

February is my best guess at when we can migrate our main users over to just pulling from the tagged databases. Maybe we don't put a date and just say "We'll remove it once we've migrated our known users."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not dating this is a good idea!

Copy link
Member

@cmgosnell cmgosnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great to me! i add a few non-blocking suggestions.

Comment on lines +60 to +63
* ``source`` is sometimes ``pudl``. This means the asset
is a derived connection the contributors of PUDL created to connect multiple
datasets via manual or machine learning methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels good and true to me! i imagine maybe there could be a future version of pudlassets that included other/different things besides connection/record linkage type tables.

but i think this currently true so i think this is good as is!!

Comment on lines 70 to 71
The source names should appear in the same order for all assets that associate
the two sources. Examples:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make this alphabetical always? and swap core_epa__assn_epacamd_eia to core_epa__assn_eia_epacamd

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or is this different because the og source is epa so epacamdshould come first?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, we should probs make the alphabetical. I'll update the docs in this PR and open a new branch that updates the table names.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found some other inconsistencies with our assn assets. Would love your feedback!

:ref:`docs <asset-naming>`.

To help users migrate away from using ``PudlTabl`` and our temporary table names,
we've created a `google sheet <https://docs.google.com/spreadsheets/d/1RBuKl_xKzRSLgRM7GIZbc5zUYieWFE20cXumWuv5njo/edit?usp=sharing>`__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it! My only suggestion would be to add a column which indicates whether or not it is a pudl db table. which obviously you could just derive from the name (w/ something like =if(LEFT(R2,1)="_",FALSE,TRUE))

mostly so users who are only gonna interact w/ the published data can ignore the non-db tables

I also think it might be good to copy-> paste values from the Old Asset/Table Name column and delete the hidden columns. (maybe its just me but i definitely got distracted by trying to understand what was going on in there)

We plan to remove ``PudlTabl`` from the pudl package once our known users have
succesfully migrated to pulling data directly from ``pudl.sqlite``. We've added
deprecation warnings to the ``PudlTabl`` class. We expect to remove ``PudlTabl``
at the end of February 2024.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not dating this is a good idea!

Comment on lines 92 to 97
logger.warning(
"PudlTabl is deprecated and will be removed from the pudl package"
"at the end of February 2024. To acccess the data returned by"
"this class, pull the desired table directly from the pudl.sqlite"
"database."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great! one nit is to add spaces at the end of each line bc it'll squish the wordstogether packageat (same w/ warning below)

Comment on lines 305 to 309
logger.warning(
"PudlTabl is deprecated and will be removed from the pudl package"
"at the end of February 2024. To access the data returned by this method,"
f"use the {table_name} table in the pudl.sqlite database."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes 100% sense to add this everytime someone grabs a table instead of just in init.. but it will be so noisy. i although i think it is the appropriate behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True it will be noisy but I think it will make the migration easier for folks. The warning message spits out the table they should pull from instead of them having to look up the PudlTabl method in the spreadsheet. IDK, you're more familiar with how folks use PudlTabl so this warning might be super annoying.

@bendnorman bendnorman marked this pull request as ready for review November 9, 2023 19:04
Copy link

codecov bot commented Nov 9, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (create-naming-convention-docs@50e3eef). Click here to learn what that means.

Additional details and impacted files
@@                       Coverage Diff                       @@
##             create-naming-convention-docs   #3028   +/-   ##
===============================================================
  Coverage                                 ?   88.6%           
===============================================================
  Files                                    ?      91           
  Lines                                    ?   11021           
  Branches                                 ?       0           
===============================================================
  Hits                                     ?    9773           
  Misses                                   ?    1248           
  Partials                                 ?       0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bendnorman bendnorman merged commit 53d5618 into create-naming-convention-docs Nov 9, 2023
15 checks passed
@bendnorman bendnorman deleted the create-renaming-release-notes branch November 9, 2023 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants