doc: Clarify JsonHandler semantics on EngineData ordering (#635)
## What changes are proposed in this pull request?
When reading multiple log files during log replay, it is important that
we read commits in order. This ensures the correctness of add/remove
deduplication. Hence, we implicitly rely on the JSON handler reading
the commit files in order.


Moreover, when in-commit timestamps are enabled, the ordering of the
batches of engine data within a commit is important. A correct Delta
table should have the commit info as the _first_ action in a log file.
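To illustrate why read order affects add/remove deduplication, here is a minimal sketch. It assumes that replay visits the newest commit first and that the first action seen for a path wins; that assumption, the `Action` enum, and `live_files` are all hypothetical simplifications for this example and not the kernel's actual types or API.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified action type (not the kernel's real representation).
#[derive(Debug, Clone, PartialEq)]
enum Action {
    Add(String),
    Remove(String),
}

/// Replays batches that are ASSUMED to arrive newest commit first.
/// The first action seen for a path decides whether the file is live;
/// any later (older) action for the same path is ignored.
fn live_files(batches: &[Vec<Action>]) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut live = Vec::new();
    for batch in batches {
        for action in batch {
            let path = match action {
                Action::Add(p) | Action::Remove(p) => p,
            };
            // `insert` returns false when the path was already seen.
            if seen.insert(path.clone()) {
                if let Action::Add(p) = action {
                    live.push(p.clone());
                }
            }
        }
    }
    live
}

fn main() {
    let older = vec![Action::Add("part-0.parquet".to_string())];
    let newer = vec![Action::Remove("part-0.parquet".to_string())];

    // Expected order: the remove is seen first, so the file is not live.
    assert!(live_files(&[newer.clone(), older.clone()]).is_empty());

    // Swapped order: the stale add is seen first and wrongly survives.
    assert_eq!(
        live_files(&[older, newer]),
        vec!["part-0.parquet".to_string()]
    );
}
```

The same two batches give different results depending only on the order in which they are emitted, which is why the handler's ordering guarantee needs to be stated explicitly.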
OussamaSaoudi authored Jan 14, 2025
1 parent b3546f0 commit 9c43bf4
Showing 1 changed file with 14 additions and 2 deletions.
16 changes: 14 additions & 2 deletions kernel/src/lib.rs
@@ -371,8 +371,20 @@ pub trait JsonHandler: AsAny {
output_schema: SchemaRef,
) -> DeltaResult<Box<dyn EngineData>>;

- /// Read and parse the JSON format file at given locations and return
- /// the data as EngineData with the columns requested by physical schema.
+ /// Read and parse the JSON format file at given locations and return the data as EngineData with
+ /// the columns requested by physical schema. Note: The [`FileDataReadResultIterator`] must emit
+ /// data from files in the order that `files` is given. For example if files ["a", "b"] is provided,
+ /// then the engine data iterator must first return all the engine data from file "a", _then_ all
+ /// the engine data from file "b". Moreover, for a given file, all of its [`EngineData`] and
+ /// constituent rows must be in order that they occur in the file. Consider a file with rows
+ /// (1, 2, 3). The following are legal iterator batches:
+ /// iter: [EngineData(1, 2), EngineData(3)]
+ /// iter: [EngineData(1), EngineData(2, 3)]
+ /// iter: [EngineData(1, 2, 3)]
+ /// The following are illegal batches:
+ /// iter: [EngineData(3), EngineData(1, 2)]
+ /// iter: [EngineData(1), EngineData(3, 2)]
+ /// iter: [EngineData(2, 1, 3)]
///
/// # Parameters
///
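The legal and illegal batchings listed in the doc comment can be checked mechanically. The sketch below models each `EngineData` batch as a plain `Vec<i32>` of row values; that modelling and the helper name `batches_preserve_order` are assumptions made for illustration and are not part of the kernel's API.

```rust
/// Returns true when concatenating `batches` in iteration order reproduces
/// `file_rows` exactly. Each inner Vec stands in for one EngineData batch
/// (a hypothetical simplification).
fn batches_preserve_order(file_rows: &[i32], batches: &[Vec<i32>]) -> bool {
    let flattened: Vec<i32> = batches.iter().flatten().copied().collect();
    flattened.as_slice() == file_rows
}

fn main() {
    let rows = [1, 2, 3];

    // Legal: batch boundaries may fall anywhere as long as row order is kept.
    assert!(batches_preserve_order(&rows, &[vec![1, 2], vec![3]]));
    assert!(batches_preserve_order(&rows, &[vec![1], vec![2, 3]]));
    assert!(batches_preserve_order(&rows, &[vec![1, 2, 3]]));

    // Illegal: batches or rows within a batch are out of order.
    assert!(!batches_preserve_order(&rows, &[vec![3], vec![1, 2]]));
    assert!(!batches_preserve_order(&rows, &[vec![1], vec![3, 2]]));
    assert!(!batches_preserve_order(&rows, &[vec![2, 1, 3]]));
}
```

A check of this shape could serve as a test helper for a custom `JsonHandler` implementation. The multi-file case follows the same idea, with the rows of file "a" concatenated before those of file "b".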
