Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Make [non] nullable struct fields easier to create #646

Merged
merged 6 commits into from
Jan 16, 2025

Conversation

scovich
Copy link
Collaborator

@scovich scovich commented Jan 15, 2025

What changes are proposed in this pull request?

A lot of code (especially in tests) calls StructField::new with literal values of the nullable: bool argument. Booleans are easy to misinterpret (which value means non-null field?) -- and it's hard to read when the third arg is split to its own line in nested expressions such as:

        StructField::new(
            "fileConstantValues",
            StructType::new([StructField::new(
                "partitionValues",
                MapType::new(DataType::STRING, DataType::STRING, true),
                false,
            )]),
            true,
        ),

To improve readability and make the code less error-prone, define two new helper methods/constructors for StructField: nullable and not_null, which create struct fields having the corresponding nullability.

How was this change tested?

No new functionality, and all existing unit tests still pass.

To minimize the risk of unfaithful refactoring, the change was made in four steps:

  1. Use a multi-file regexp search/replace to convert simple code such like this:
    StructField::new("a", DataType::LONG, true)
    to this:
    StructField::nullable("a", DataType::LONG)
    The exact expression used was: StructField::new(\([^()]*\), true) → StructField::nullable(\1), which ignores any constructor call containing parentheses, to avoid ambiguity.
  2. Use the multi-file regexp search/replace StructField::new(\([^()]*\), false) → StructField::not_null(\1), to convert simple use not_null call sites (see above for details).
  3. Use an interactive multi-file search/replace StructField::new → StructField::nullable, relying on IDE parentheses matching to identify calls that pass the literal true (first pass). As a safety measure, the resulting code is compiled; all changed call sites fail to compile because of the (now unrecognized) third arg, which can then be deleted after verifying it is the literal true.
  4. Use the same two-pass process for StructField::new → StructField::not_null with literal false.

Each step is its own commit, for easier verification.

Copy link

codecov bot commented Jan 15, 2025

Codecov Report

Attention: Patch coverage is 98.76543% with 3 lines in your changes missing coverage. Please review.

Project coverage is 83.66%. Comparing base (616e9ac) to head (c5a8450).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
ffi/src/test_ffi.rs 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #646      +/-   ##
==========================================
- Coverage   83.68%   83.66%   -0.02%     
==========================================
  Files          75       75              
  Lines       16955    16909      -46     
  Branches    16955    16909      -46     
==========================================
- Hits        14188    14147      -41     
+ Misses       2102     2099       -3     
+ Partials      665      663       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@zachschuermann zachschuermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whew, (checked each line haha) LGTM!

Comment on lines +125 to +133
/// Creates a new nullable field
pub fn nullable(name: impl Into<String>, data_type: impl Into<DataType>) -> Self {
Self::new(name, data_type, true)
}

/// Creates a new non-nullable field
pub fn not_null(name: impl Into<String>, data_type: impl Into<DataType>) -> Self {
Self::new(name, data_type, false)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe (this is a big maybe - might not be useful) we should call out in doc comments either here or in the new constructor that these exist just as shortcuts basically?

Copy link
Collaborator

@OussamaSaoudi OussamaSaoudi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting up the commits was really clever! Made reviewing a breeze. LGTM

@scovich scovich merged commit 76c65c8 into delta-io:main Jan 16, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants