Merge branch 'main' into cdf_delta_spark_tests
OussamaSaoudi authored Jan 14, 2025
2 parents c8a01ac + b3546f0 commit 820a384
Showing 54 changed files with 492 additions and 96 deletions.
13 changes: 13 additions & 0 deletions .github/pull_request_template.md
Expand Up @@ -7,6 +7,19 @@ Thanks for sending a pull request! Here are some tips for you:
5. Be sure to keep the PR description updated to reflect all changes.
-->

<!--
PR title formatting:
This project uses conventional commits: https://www.conventionalcommits.org/
Each PR corresponds to a commit on the `main` branch, with the title of the PR (typically) being
used for the commit message on main. In order to ensure proper formatting in the CHANGELOG please
ensure your PR title adheres to the conventional commit specification.
Examples:
- new feature PR: "feat: new API for snapshot.update()"
- bugfix PR: "fix: correctly apply DV in read-table example"
-->

## What changes are proposed in this pull request?
<!--
Please clarify what changes you are proposing and why the changes are needed.
4 changes: 4 additions & 0 deletions .github/workflows/build.yml
Expand Up @@ -107,6 +107,10 @@ jobs:
run: cargo clippy --benches --tests --all-features -- -D warnings
- name: lint without default features
run: cargo clippy --no-default-features -- -D warnings
- name: check kernel builds with default-engine
run: cargo build -p feature_tests --features default-engine
- name: check kernel builds with default-engine-rustls
run: cargo build -p feature_tests --features default-engine-rustls
test:
runs-on: ${{ matrix.os }}
strategy:
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -6,6 +6,7 @@
.idea/
.vscode/
.vim
.zed

# Rust
.cargo/
34 changes: 34 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,39 @@
# Changelog

## [v0.6.1](https://github.com/delta-io/delta-kernel-rs/tree/v0.6.1/) (2025-01-10)

[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.6.0...v0.6.1)


### 🚀 Features / new APIs

1. New feature flag `default-engine-rustls` ([#572])

### 🐛 Bug Fixes

1. Allow partition value timestamp to be ISO8601 formatted string ([#622])
2. Fix stderr output for handle tests ([#630])

### ⚙️ Chores/CI

1. Expand the arrow version range to allow arrow v54 ([#616])
2. Update to CodeCov @v5 ([#608])

### Other

1. Fix msrv check by pinning `home` dependency ([#605])
2. Add release script ([#636])


[#605]: https://github.com/delta-io/delta-kernel-rs/pull/605
[#608]: https://github.com/delta-io/delta-kernel-rs/pull/608
[#622]: https://github.com/delta-io/delta-kernel-rs/pull/622
[#630]: https://github.com/delta-io/delta-kernel-rs/pull/630
[#572]: https://github.com/delta-io/delta-kernel-rs/pull/572
[#616]: https://github.com/delta-io/delta-kernel-rs/pull/616
[#636]: https://github.com/delta-io/delta-kernel-rs/pull/636


## [v0.6.0](https://github.com/delta-io/delta-kernel-rs/tree/v0.6.0/) (2024-12-17)

[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/v0.5.0...v0.6.0)
29 changes: 17 additions & 12 deletions Cargo.toml
Expand Up @@ -6,6 +6,7 @@ members = [
"kernel",
"kernel/examples/*",
"test-utils",
"feature-tests",
]
# Only check / build main crates by default (check all with `--workspace`)
default-members = ["acceptance", "kernel"]
Expand All @@ -19,20 +20,24 @@ license = "Apache-2.0"
repository = "https://github.com/delta-io/delta-kernel-rs"
readme = "README.md"
rust-version = "1.80"
version = "0.6.0"
version = "0.6.1"

[workspace.dependencies]
arrow = { version = ">=53, <54" }
arrow-arith = { version = ">=53, <54" }
arrow-array = { version = ">=53, <54" }
arrow-buffer = { version = ">=53, <54" }
arrow-cast = { version = ">=53, <54" }
arrow-data = { version = ">=53, <54" }
arrow-ord = { version = ">=53, <54" }
arrow-json = { version = ">=53, <54" }
arrow-select = { version = ">=53, <54" }
arrow-schema = { version = ">=53, <54" }
parquet = { version = ">=53, <54", features = ["object_store"] }
# When changing the arrow version range, also modify ffi/Cargo.toml which has
# its own arrow version ranges with modified features. Failure to do so will
# result in compilation errors as two different sets of arrow dependencies may
# be sourced
arrow = { version = ">=53, <55" }
arrow-arith = { version = ">=53, <55" }
arrow-array = { version = ">=53, <55" }
arrow-buffer = { version = ">=53, <55" }
arrow-cast = { version = ">=53, <55" }
arrow-data = { version = ">=53, <55" }
arrow-ord = { version = ">=53, <55" }
arrow-json = { version = ">=53, <55" }
arrow-select = { version = ">=53, <55" }
arrow-schema = { version = ">=53, <55" }
parquet = { version = ">=53, <55", features = ["object_store"] }
object_store = { version = ">=0.11, <0.12" }
hdfs-native-object-store = "0.12.0"
hdfs-native = "0.10.0"
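
As a rough illustration of what the widened requirement means for consumers: `>=53, <55` admits any arrow 53.x or 54.x release and rejects 55.0.0 and later. The helper below, `in_arrow_range`, is a hypothetical name and inspects only the major component. Cargo's real resolver applies full semver rules (including pre-release handling), which this sketch does not attempt to reproduce.

```rust
/// Hypothetical helper: returns true when `version` (for example "54.0.0")
/// falls inside the workspace requirement `>=53, <55`.
/// Only the major component is inspected; this is a sketch, not Cargo's resolver.
fn in_arrow_range(version: &str) -> bool {
    let major = version
        .split('.')
        .next()
        .and_then(|m| m.parse::<u64>().ok());
    // Majors 53 and 54 satisfy `>=53, <55`; anything else (or unparsable) does not.
    matches!(major, Some(53..=54))
}

fn main() {
    for v in ["52.2.0", "53.0.0", "54.1.0", "55.0.0"] {
        println!("arrow {v}: in range = {}", in_arrow_range(v));
    }
}
```

The same check applies to the ranges in `ffi/Cargo.toml`, which the comment above asks to be kept in step with the workspace ranges.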
14 changes: 7 additions & 7 deletions README.md
Expand Up @@ -15,7 +15,7 @@ Delta-kernel-rs is split into a few different crates:
- kernel: The actual core kernel crate
- acceptance: Acceptance tests that validate correctness via the [Delta Acceptance Tests][dat]
- derive-macros: A crate for our [derive-macros] to live in
- ffi: Functionallity that enables delta-kernel-rs to be used from `C` or `C++` See the [ffi](ffi)
- ffi: Functionality that enables delta-kernel-rs to be used from `C` or `C++`. See the [ffi](ffi)
directory for more information.

## Building
Expand Down Expand Up @@ -43,10 +43,10 @@ consumer's own `Engine` trait, the kernel has a feature flag to enable a default
```toml
# fewer dependencies, requires consumer to implement Engine trait.
# allows consumers to implement their own in-memory format
delta_kernel = "0.6"
delta_kernel = "0.6.1"

# or turn on the default engine, based on arrow
delta_kernel = { version = "0.6", features = ["default-engine"] }
delta_kernel = { version = "0.6.1", features = ["default-engine"] }
```

### Feature flags
Expand All @@ -66,12 +66,12 @@ are still unstable. We therefore may break APIs within minor releases (that is,
we will not break APIs in patch releases (`0.1.0` -> `0.1.1`).

## Arrow versioning
If you enable the `default-engine` or `sync-engine` features, you get an implemenation of the
If you enable the `default-engine` or `sync-engine` features, you get an implementation of the
`Engine` trait that uses [Arrow] as its data format.

The [`arrow crate`](https://docs.rs/arrow/latest/arrow/) tends to release new major versions rather
quickly. To enable engines that already integrate arrow to also integrate kernel and not force them
to track a specific version of arrow that kernel depends on, we take as broad dependecy on arrow
to track a specific version of arrow that kernel depends on, we take as broad a dependency on arrow
versions as we can.

This means you can force kernel to rely on the specific arrow version that your engine already uses,
Expand All @@ -96,7 +96,7 @@ arrow-schema = "53.0"
parquet = "53.0"
```

Note that unfortunatly patching in `cargo` requires that _exactly one_ version matches your
Note that unfortunately patching in `cargo` requires that _exactly one_ version matches your
specification. If only arrow "53.0.0" had been released the above will work, but if "53.0.1" were
to be released, the specification will break and you will need to provide a more restrictive
specification like `"=53.0.0"`.
Expand All @@ -111,7 +111,7 @@ and then checking what version of `object_store` it depends on.
## Documentation

- [API Docs](https://docs.rs/delta_kernel/latest/delta_kernel/)
- [arcitecture.md](doc/architecture.md) document describing the kernel architecture (currently wip)
- [architecture.md](doc/architecture.md) document describing the kernel architecture (currently wip)

## Examples

3 changes: 3 additions & 0 deletions acceptance/Cargo.toml
Expand Up @@ -10,6 +10,9 @@ readme.workspace = true
version.workspace = true
rust-version.workspace = true

[package.metadata.release]
release = false

[dependencies]
arrow-array = { workspace = true }
arrow-cast = { workspace = true }
6 changes: 1 addition & 5 deletions acceptance/src/data.rs
Expand Up @@ -61,7 +61,7 @@ pub fn sort_record_batch(batch: RecordBatch) -> DeltaResult<RecordBatch> {
Ok(RecordBatch::try_new(batch.schema(), columns)?)
}

// Ensure that two schema have the same field names, and dict_id/ordering.
// Ensure that two schemas have the same field names and dict_is_ordered.
// We ignore:
// - data type: This is checked already in `assert_columns_match`
// - nullability: parquet marks many things as nullable that we don't in our schema
Expand All @@ -72,10 +72,6 @@ fn assert_schema_fields_match(schema: &Schema, golden: &Schema) {
schema_field.name() == golden_field.name(),
"Field names don't match"
);
assert!(
schema_field.dict_id() == golden_field.dict_id(),
"Field dict_id doesn't match"
);
assert!(
schema_field.dict_is_ordered() == golden_field.dict_is_ordered(),
"Field dict_is_ordered doesn't match"
70 changes: 70 additions & 0 deletions cliff.toml
@@ -0,0 +1,70 @@
# git-cliff configuration file. see https://git-cliff.org/docs/configuration

[changelog]
header = """
# Changelog\n
"""
# Tera template
body = """
## [v{{ version }}](https://github.com/delta-io/delta-kernel-rs/tree/v{{ version }}/) ({{ timestamp | date(format="%Y-%m-%d") }})
[Full Changelog](https://github.com/delta-io/delta-kernel-rs/compare/{{ previous.version }}...v{{ version }})
{% for group, commits in commits | group_by(attribute="group") %}
### {{ group | striptags | trim | upper_first }}
{% for commit in commits %}
{{ loop.index }}. {% if commit.scope %}*({{ commit.scope }})* {% endif %}\
{{ commit.message | split(pat="\n") | first | upper_first | replace(from="(#", to="([#")\
| replace(from="0)", to="0])")\
| replace(from="1)", to="1])")\
| replace(from="2)", to="2])")\
| replace(from="3)", to="3])")\
| replace(from="4)", to="4])")\
| replace(from="5)", to="5])")\
| replace(from="6)", to="6])")\
| replace(from="7)", to="7])")\
| replace(from="8)", to="8])")\
| replace(from="9)", to="9])") }}\
{% endfor %}
{% endfor %}
{% for commit in commits %}
{% set message = commit.message | split(pat="\n") | first %}\
{% set pr = message | split(pat="(#") | last | split(pat=")") | first %}\
[#{{ pr }}]: https://github.com/delta-io/delta-kernel-rs/pull/{{ pr }}\
{% endfor %}\n\n\n
"""
footer = """
"""
# remove the leading and trailing whitespace
trim = true
postprocessors = []

[git]
# parse the commits based on https://www.conventionalcommits.org
conventional_commits = true
# filter out the commits that are not conventional
filter_unconventional = false
# process each line of a commit as an individual commit
split_commits = false
# regex for preprocessing the commit messages
commit_preprocessors = []
# regex for parsing and grouping commits. note that e.g. both doc and docs are matched since we have
# trim = true above.
commit_parsers = [
{ field = "github.pr_labels", pattern = "breaking-change", group = "<!-- 0 --> 🏗️ Breaking changes" },
{ message = "^feat", group = "<!-- 1 -->🚀 Features / new APIs" },
{ message = "^fix", group = "<!-- 2 -->🐛 Bug Fixes" },
{ message = "^doc", group = "<!-- 3 -->📚 Documentation" },
{ message = "^perf", group = "<!-- 4 -->⚡ Performance" },
{ message = "^refactor", group = "<!-- 5 -->🚜 Refactor" },
{ message = "^test", group = "<!-- 6 -->🧪 Testing" },
{ message = "^chore|^ci", group = "<!-- 7 -->⚙️ Chores/CI" },
{ message = "^revert", group = "<!-- 8 -->◀️ Revert" },
{ message = ".*", group = "<!-- 9 -->Other" },
]
# filter out the commits that are not matched by commit parsers
filter_commits = false
# sort the tags topologically
topo_order = false
# sort the commits inside sections by oldest/newest order
sort_commits = "oldest"
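
The chain of `replace` filters in the template body turns a trailing `(#NNN)` into the reference-style link `([#NNN])`, and the second loop extracts the PR number to build the link definition. A minimal Rust sketch of the same string manipulation follows, for reasoning about edge cases only. The function names `linkify_pr_reference` and `pr_number` are hypothetical, and the `upper_first` filter is left out.

```rust
/// Mirrors `replace(from="(#", to="([#")` followed by the ten
/// `replace(from="N)", to="N])")` filters in the template body.
fn linkify_pr_reference(first_line: &str) -> String {
    let mut out = first_line.replace("(#", "([#");
    for digit in 0..=9 {
        let from = format!("{digit})");
        let to = format!("{digit}])");
        out = out.replace(from.as_str(), to.as_str());
    }
    out
}

/// Mirrors `split(pat="(#") | last | split(pat=")") | first`.
fn pr_number(first_line: &str) -> &str {
    first_line
        .rsplit("(#")
        .next()
        .unwrap_or("")
        .split(')')
        .next()
        .unwrap_or("")
}

fn main() {
    // Entry taken from the v0.6.1 changelog section of this commit.
    let msg = "Fix stderr output for handle tests (#630)";
    println!("{}", linkify_pr_reference(msg));
    println!(
        "[#{0}]: https://github.com/delta-io/delta-kernel-rs/pull/{0}",
        pr_number(msg)
    );
}
```

One consequence visible in the sketch: the digit rules fire on any digit followed by `)`, not only on PR references, so a commit title containing something like `(v2)` would also gain a stray `]`.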
19 changes: 19 additions & 0 deletions feature-tests/Cargo.toml
@@ -0,0 +1,19 @@
[package]
name = "feature_tests"
edition.workspace = true
homepage.workspace = true
keywords.workspace = true
license.workspace = true
repository.workspace = true
readme.workspace = true
version.workspace = true

[package.metadata.release]
release = false

[dependencies]
delta_kernel = { path = "../kernel" }

[features]
default-engine = [ "delta_kernel/default-engine" ]
default-engine-rustls = [ "delta_kernel/default-engine-rustls" ]
12 changes: 12 additions & 0 deletions feature-tests/src/lib.rs
@@ -0,0 +1,12 @@
/// This is a compilation test to ensure that the default-engine feature flags are working
/// correctly. Run (from workspace root) with:
/// 1. `cargo b -p feature_tests --features default-engine-rustls`
/// 2. `cargo b -p feature_tests --features default-engine`
/// These run in our build CI.
pub fn test_default_engine_feature_flags() {
#[cfg(any(feature = "default-engine", feature = "default-engine-rustls"))]
{
#[allow(unused_imports)]
use delta_kernel::engine::default::DefaultEngine;
}
}
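
The test above relies on `#[cfg(...)]` removing the gated block entirely when neither feature is enabled, so the `use` is only resolved when the default engine is actually compiled in. A related sketch, assuming the same two feature names, uses the `cfg!` macro to report at runtime which way the crate was built. The function name `default_engine_enabled` is hypothetical, and unlike `#[cfg]`, `cfg!` keeps both branches in the type-checked code, so it cannot guard an import that may not exist.

```rust
/// Hypothetical helper: true when the crate was compiled with either
/// default-engine feature. `cfg!` expands to a boolean literal at compile time.
#[allow(unexpected_cfgs)] // the features are not declared when built outside Cargo
pub fn default_engine_enabled() -> bool {
    cfg!(any(feature = "default-engine", feature = "default-engine-rustls"))
}

fn main() {
    println!("default engine compiled in: {}", default_engine_enabled());
}
```

Built without either feature, the function returns `false`; building with `--features default-engine` or `--features default-engine-rustls` in a crate that declares them flips it to `true`.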
3 changes: 3 additions & 0 deletions ffi-proc-macros/Cargo.toml
Expand Up @@ -10,6 +10,9 @@ readme.workspace = true
rust-version.workspace = true
version.workspace = true

[package.metadata.release]
release = false

[lib]
proc-macro = true

11 changes: 7 additions & 4 deletions ffi/Cargo.toml
Expand Up @@ -10,6 +10,9 @@ version.workspace = true
rust-version.workspace = true
build = "build.rs"

[package.metadata.release]
release = false

[lib]
crate-type = ["lib", "cdylib", "staticlib"]

Expand All @@ -21,16 +24,16 @@ url = "2"
delta_kernel = { path = "../kernel", default-features = false, features = [
"developer-visibility",
] }
delta_kernel_ffi_macros = { path = "../ffi-proc-macros", version = "0.6.0" }
delta_kernel_ffi_macros = { path = "../ffi-proc-macros", version = "0.6.1" }

# used if we use the default engine to be able to move arrow data into the c-ffi format
arrow-schema = { version = "53.0", default-features = false, features = [
arrow-schema = { version = ">=53, <55", default-features = false, features = [
"ffi",
], optional = true }
arrow-data = { version = "53.0", default-features = false, features = [
arrow-data = { version = ">=53, <55", default-features = false, features = [
"ffi",
], optional = true }
arrow-array = { version = "53.0", default-features = false, optional = true }
arrow-array = { version = ">=53, <55", default-features = false, optional = true }

[build-dependencies]
cbindgen = "0.27.0"
2 changes: 1 addition & 1 deletion ffi/README.md
Expand Up @@ -8,7 +8,7 @@ This crate provides a c foreign function interface (ffi) for delta-kernel-rs.
You can build static and shared-libraries, as well as the include headers by simply running:

```sh
cargo build [--release] [--features default-engine]
cargo build [--release]
```

This will place libraries in the root `target` dir (`../target/[debug,release]` from the directory containing this README), and headers in `../target/ffi-headers`. In that directory there will be a `delta_kernel_ffi.h` file, which is the C header, and a `delta_kernel_ffi.hpp` which is the C++ header.
4 changes: 2 additions & 2 deletions ffi/examples/read-table/README.md
Expand Up @@ -10,9 +10,9 @@ This example is built with [cmake]. Instructions below assume you start in the d
Note that prior to building these examples you must build `delta_kernel_ffi` (see [the FFI readme] for details). TLDR:
```bash
# from repo root
$ cargo build -p delta_kernel_ffi [--release] [--features default-engine, tracing]
$ cargo build -p delta_kernel_ffi [--release] --features tracing
# from ffi/ dir
$ cargo build [--release] [--features default-engine, tracing]
$ cargo build [--release] --features tracing
```

There are two configurations that can currently be configured in cmake:
4 changes: 2 additions & 2 deletions ffi/examples/read-table/arrow.c
Expand Up @@ -97,7 +97,7 @@ static GArrowRecordBatch* add_partition_columns(
}

GArrowArray* partition_col = garrow_array_builder_finish((GArrowArrayBuilder*)builder, &error);
if (report_g_error("Can't build string array for parition column", error)) {
if (report_g_error("Can't build string array for partition column", error)) {
printf("Giving up on column %s\n", col);
g_error_free(error);
g_object_unref(builder);
Expand Down Expand Up @@ -144,7 +144,7 @@ static void add_batch_to_context(
}
record_batch = add_partition_columns(record_batch, partition_cols, partition_values);
if (record_batch == NULL) {
printf("Failed to add parition columns, not adding batch\n");
printf("Failed to add partition columns, not adding batch\n");
return;
}
context->batches = g_list_append(context->batches, record_batch);