-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: incremental Snapshot update #651
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Robert Pack <[email protected]>
Signed-off-by: Robert Pack <[email protected]>
Signed-off-by: Robert Pack <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #651 +/- ##
==========================================
+ Coverage 83.66% 83.69% +0.03%
==========================================
Files 75 75
Lines 16949 17087 +138
Branches 16949 17087 +138
==========================================
+ Hits 14180 14301 +121
- Misses 2085 2088 +3
- Partials 684 698 +14 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Robert Pack <[email protected]>
@@ -166,21 +166,59 @@ impl LogSegment { | |||
.filter_ok(|x| x.is_commit()) | |||
.try_collect()?; | |||
|
|||
// Return a FileNotFound error if there are no new commits. | |||
// Callers can then decide how to handle this case. | |||
require!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting it up makes it clearer that we want nonempty 👌
// Callers can then decide how to handle this case. | ||
require!( | ||
!ascending_commit_files.is_empty(), | ||
Error::file_not_found("No new commits.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the error should be more like "No commits in range", or "Failed to find commits in the range". "New commits" makes sense for an update operation, but CDF could operate over old commits.
/// The other LogSegment must be contiguous with this one, i.e. the end | ||
/// version of this LogSegment must be one less than the start version of | ||
/// the other LogSegment and contain only commit files. | ||
pub(crate) fn extend(&mut self, other: &LogSegment) -> DeltaResult<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this passes by reference. I'm not sure when we'd need to keep around such a log segment besides extending an existing one. Should we pass ownership and avoid the clone?
|
||
let (metadata, protocol) = log_segment.read_metadata_opt(engine)?; | ||
if let Some(p) = &protocol { | ||
p.ensure_read_supported()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the column mapping mode also be checked in the case only Protocol is updated? the column_mapping_mode
function takes protocol as a parameter, so I imagine whatever the output column_mapping_mode is should be re-validated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact this is another usecase for my TableConfiguration PR where we do all the validations in one place without having to think about these cases.
assert_eq!(snapshot.protocol().min_reader_version(), 1); | ||
assert_eq!(snapshot.table_properties.enable_type_widening, None); | ||
|
||
snapshot.update(1, &engine).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such a clean n simple api to update! 👌
@@ -137,6 +137,9 @@ pub struct TableProperties { | |||
/// whether to enable row tracking during writes. | |||
pub enable_row_tracking: Option<bool>, | |||
|
|||
/// whether to enable type widening |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see enableTypeWidening
in the protocol. How come it's not part of the protocol doesn't have it when other docs do?
Note
removed the breaking change label, since calling code should not require updates.
What changes are proposed in this pull request?
This PR proposes a new api on
Snapshot
to handle updating a snapshot to a more recent version.This allows engines like duckdb or delta-rs to hold on to a
Snapshot
and avoid potentially expensive operations on the object store by only the new log entries.This PR affects the following public APIs
update
method toSnapshot
.enable_type_widening
onTableProperties
(this was useful to use the type widening table for testing)Snapshot::try_new
How was this change tested?
LogSegment
andSnapshot::update