Reviewing Code Review #2240
Replies: 2 comments 1 reply
-
@jdangerx and I decided to revive this discussion because of the large dagster PR I created. This PR converts the ETL portion of our codebase to use dagster and initializes the IO Managers we'll use for PUDL. The most recent commit put the PR at 35 files. My general plan for the dagster integration is to split it into two phases. Phase 1 is converting the ETL to use dagster: everything from the ferc to sqlite conversion to the normalized tables being loaded into the database. Separating the work into these two phases is convenient because the ETL and the PudlTabl class are two halves of pudl that are loosely coupled. The PudlTabl class receives a SqlAlchemy engine that points to the outputs of the ETL.
Phase 1
Phase 1 felt hard to break into smaller pieces. The entire ETL should be converted to dagster before merging the changes into dev; it wouldn't make sense to have only part of our ETL converted to dagster. The downside of this is that it leads to a massive PR! If I were to recreate the PR, I could have created a primary branch off of dev and then created new branches for each portion of the ETL I converted. I think this would have made the development process a bit smoother. Instead, I had one PR and requested "partial" reviews from folks. The reviews from @zschira and @katie-lamb back in December were helpful, but it's been challenging to rope other people in since then. If I had smaller PRs for each portion of the ETL, people would have been more willing to review the work. How would other folks have structured this work? @jdangerx There is still some work left for Phase 1; see #1570. We could create separate branches for the integration tests and documentation.
Phase 2
Phase 2 feels much easier to structure and parallelize once Phase 1 is merged into dev.
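The loose coupling between the two phases can be sketched in a few lines. This is a hypothetical illustration, not PUDL's actual schema or API: stdlib `sqlite3` stands in for a SQLAlchemy engine, `run_etl` stands in for the dagster-orchestrated ETL, and `OutputTables` stands in for `PudlTabl`. The point is that the output layer only ever sees a connection to the ETL's outputs, never the ETL internals:

```python
import sqlite3

# Phase 1 stand-in: the ETL writes normalized tables into a database.
def run_etl(conn):
    conn.execute("CREATE TABLE plants (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO plants VALUES (1, 'Comanche')")
    conn.commit()

# Phase 2 stand-in: an output class (like PudlTabl) receives only a
# connection/engine pointing at the ETL outputs. It never touches the
# ETL internals, which is what makes the two phases loosely coupled.
class OutputTables:
    def __init__(self, conn):
        self.conn = conn

    def plants(self):
        return self.conn.execute("SELECT id, name FROM plants").fetchall()

conn = sqlite3.connect(":memory:")
run_etl(conn)
out = OutputTables(conn)
print(out.plants())  # -> [(1, 'Comanche')]
```

Because the only contract between the halves is the database itself, Phase 1 can be reworked internally (swapping orchestration to dagster) without any changes on the output side.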
-
I'm going to start writing a "quick comment" and it's about to turn into a novel. I think the overall goal of code review is context transfer: we want at least two people to understand what's going into the codebase, how it interacts with the rest of the system, and why we need this code. Other things matter too, but context transfer is the most important goal, the one we want to keep in mind at all times:
Logical correctness, I think, is implicit in the context transfer piece - if two people both believe they understand the why/what/how, then I assume they wouldn't let the code in unless those were all in good shape... Review speed is super important! I think it's more properly addressed in task breakdown / project management land, though - a lot of what causes reviews to be slow is:
Anyways - some proposed guidelines:
A lot of times this can be split across the main PR comment and a self-review. For reviewers:
As for long lived branches, my official take is "long lived branches are not worth the heartache, so screw 'em." But that's a conversation for elsewhere. As for the guideline about "be nice, not mean" I think that lives in a code of conduct - that should be table stakes for engaging at all. I'm happy to whip up a PR template - our existing one in PUDL is a good start, but I think it's a bit verbose and has a bit of a "did you do your homework" adversarial framing tbh. Some excellent resources that informed a lot of my strident opinions about code review:
-
Originally opened by @TrentonBush in the business repo. I'm moving this issue into a PUDL discussion so it can be more visible and conversational.
Description
Review our existing process for code reviews, identify areas that could be improved, and draft requirements to fix them.
Code review is a means of quality assurance wherein a second person checks each piece of contributed code for various quality requirements (documentation, test coverage, etc.). There is a tradeoff between the increased quality due to review and the costs of doing the review, and we want to make sure we choose the right tradeoff for us. Note that reviewing costs are not limited to the literal hours spent reviewing; they also include problems (extra merge conflicts, late delivery, etc.) caused by any and all delays between opening a PR and the conclusion of the review.
Motivation
We want to minimize the maintenance burden of our code so that we can spend our resources on building new stuff to serve more users. We also want to avoid the long delays we have incurred in the past while waiting for someone to review new PRs.
This is an "important but not urgent" background problem for our organization.
Scope
How do we know when we are done?
This will be an ongoing process, but the first pass will be complete when we draft requirements for both submitting PRs and reviewing PRs. In particular we want to address:
For submitting:
For reviewing:
For both:
What is out of scope?
Anything else?
Some pieces are currently in various google docs: