From 043975de1e1c1c0109ea15abf0db525764a8186d Mon Sep 17 00:00:00 2001
From: Jacek Laskowski
Date: Tue, 30 Jul 2024 18:25:08 +0200
Subject: [PATCH] Delta Kernel

---
 docs/kernel/.pages              |  4 ++++
 docs/kernel/DefaultEngine.md    |  3 +++
 docs/kernel/Engine.md           |  7 +++++++
 docs/kernel/FileSystemClient.md |  3 +++
 docs/kernel/Table.md            | 29 +++++++++++++++++++++++++++++
 docs/kernel/TableImpl.md        | 32 ++++++++++++++++++++++++++++++++
 docs/kernel/index.md            | 27 +++++++++++++++++++++++++++
 mkdocs.yml                      |  1 +
 8 files changed, 106 insertions(+)
 create mode 100644 docs/kernel/.pages
 create mode 100644 docs/kernel/DefaultEngine.md
 create mode 100644 docs/kernel/Engine.md
 create mode 100644 docs/kernel/FileSystemClient.md
 create mode 100644 docs/kernel/Table.md
 create mode 100644 docs/kernel/TableImpl.md
 create mode 100644 docs/kernel/index.md

diff --git a/docs/kernel/.pages b/docs/kernel/.pages
new file mode 100644
index 000000000..049eb4652
--- /dev/null
+++ b/docs/kernel/.pages
@@ -0,0 +1,4 @@
+title: Delta Kernel
+nav:
+  - index.md
+  - ...
diff --git a/docs/kernel/DefaultEngine.md b/docs/kernel/DefaultEngine.md
new file mode 100644
index 000000000..a87662f78
--- /dev/null
+++ b/docs/kernel/DefaultEngine.md
@@ -0,0 +1,3 @@
+# DefaultEngine
+
+`DefaultEngine` is...FIXME
diff --git a/docs/kernel/Engine.md b/docs/kernel/Engine.md
new file mode 100644
index 000000000..6ebddea0e
--- /dev/null
+++ b/docs/kernel/Engine.md
@@ -0,0 +1,7 @@
+# Engine
+
+`Engine` is an [abstraction](#contract) of [Delta Engines](#implementations) that combine (_encapsulate_) together all the necessary delta clients to work with delta tables.
+
+## Implementations
+
+* [DefaultEngine](DefaultEngine.md)
diff --git a/docs/kernel/FileSystemClient.md b/docs/kernel/FileSystemClient.md
new file mode 100644
index 000000000..1348bd3f4
--- /dev/null
+++ b/docs/kernel/FileSystemClient.md
@@ -0,0 +1,3 @@
+# FileSystemClient
+
+`FileSystemClient` is...FIXME
diff --git a/docs/kernel/Table.md b/docs/kernel/Table.md
new file mode 100644
index 000000000..cb68c86af
--- /dev/null
+++ b/docs/kernel/Table.md
@@ -0,0 +1,29 @@
+# Table
+
+`Table` is an [abstraction](#contract) of [delta tables](#implementations).
+
+## Contract (Subset)
+
+### Checkpoint
+
+```java
+checkpoint(
+  Engine engine,
+  long version)
+```
+
+Checkpoints the delta table (at the given version)
+
+## Implementations
+
+* [TableImpl](TableImpl.md)
+
+## Create Delta Table for Path { #forPath }
+
+```java
+Table forPath(
+  Engine engine,
+  String path)
+```
+
+`forPath` creates a [TableImpl](TableImpl.md#forPath) for the given [Engine](Engine.md) and the `path`.
diff --git a/docs/kernel/TableImpl.md b/docs/kernel/TableImpl.md
new file mode 100644
index 000000000..4ddb26069
--- /dev/null
+++ b/docs/kernel/TableImpl.md
@@ -0,0 +1,32 @@
+# TableImpl
+
+`TableImpl` is a [Table](Table.md) that can be created using [Table.forPath](Table.md#forPath) utility.
+
+!!! note
+    There is no configuration property to change the default implementation of [Table](Table.md) abstraction at the moment.
+
+## Creating Instance
+
+`TableImpl` takes the following to be created:
+
+* Table path
+
+`TableImpl` is created using [TableImpl.forPath](#forPath) utility.
+
+## Create Delta Table for Path { #forPath }
+
+```java
+Table forPath(
+  Engine engine,
+  String path)
+```
+
+`forPath` requests the given [Engine](Engine.md) for the [getFileSystemClient](Engine.md#getFileSystemClient) to [resolvePath](FileSystemClient.md#resolvePath).
+
+In the end, `forPath` creates a [TableImpl](TableImpl.md) with the path resolved.
+
+---
+
+`forPath` is used when:
+
+* [Table.forPath](Table.md#forPath) utility is used
diff --git a/docs/kernel/index.md b/docs/kernel/index.md
new file mode 100644
index 000000000..adbedc2a4
--- /dev/null
+++ b/docs/kernel/index.md
@@ -0,0 +1,27 @@
+# Delta Kernel
+
+**Delta Kernel** is a Java API (abstractions) for working with delta tables, getting their snapshots and creating scan objects to scan a subset of the data in the tables.
+
+``` text
+libraryDependencies += "io.delta" % "delta-kernel-api" % "{{ delta.version }}"
+libraryDependencies += "io.delta" % "delta-kernel-defaults" % "{{ delta.version }}"
+libraryDependencies += "org.apache.hadoop" % "hadoop-client-api" % "{{ hadoop.version }}"
+libraryDependencies += "org.apache.hadoop" % "hadoop-client-runtime" % "{{ hadoop.version }}"
+```
+
+[Table.forPath](Table.md#forPath) utility is used to create a delta table. A required [Engine](Engine.md) can be created using [DefaultEngine.create](DefaultEngine.md#create) utility.
+
+```scala
+import org.apache.hadoop.conf.Configuration
+val hadoopConf = new Configuration(false)
+
+import io.delta.kernel.defaults.engine.DefaultEngine
+val engine = DefaultEngine.create(hadoopConf)
+
+val dataPath = "/tmp/delta-table"
+
+import io.delta.kernel.Table
+val deltaTable = Table.forPath(engine, dataPath)
+
+assert(deltaTable.getPath(engine) == s"file:${dataPath}")
+```
diff --git a/mkdocs.yml b/mkdocs.yml
index 46feff83e..60886ac5e 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -343,5 +343,6 @@ nav:
     - DeltaProgressReporter: DeltaProgressReporter.md
     - DeltaLogging: DeltaLogging.md
     - SQLMetricsReporting: SQLMetricsReporting.md
+  - ... | kernel/**.md
   - Contenders:
     - Contenders: contenders/index.md
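
Review note: the `forPath` flow that `docs/kernel/TableImpl.md` describes (ask the engine for its file system client, resolve the path, then build the table from the resolved path) can be modelled with a small self-contained sketch. Every type below (`PathResolver`, `SketchEngine`, `SketchTable`) is a hypothetical stand-in, not a Delta Kernel class, and the `file:` prefix rule is an assumption chosen only to mirror the assertion in `docs/kernel/index.md`.

```java
// Self-contained model of the documented forPath flow.
// None of these types are Delta Kernel classes; all names are hypothetical.
class ForPathSketch {

    /** Hypothetical stand-in for a file system client that resolves paths. */
    interface PathResolver {
        String resolvePath(String path);
    }

    /** Hypothetical stand-in for an engine that hands out its clients. */
    interface SketchEngine {
        PathResolver getFileSystemClient();
    }

    /** Hypothetical stand-in for a table bound to an already-resolved path. */
    static final class SketchTable {
        private final String path;

        private SketchTable(String path) {
            this.path = path;
        }

        /** Resolves the path through the engine's client, then creates the table. */
        static SketchTable forPath(SketchEngine engine, String path) {
            String resolved = engine.getFileSystemClient().resolvePath(path);
            return new SketchTable(resolved);
        }

        String getPath() {
            return path;
        }
    }

    /** Assumed rule for this sketch only: prefix scheme-less paths with "file:". */
    static SketchEngine localEngine() {
        PathResolver resolver = path -> path.contains(":") ? path : "file:" + path;
        return () -> resolver;
    }

    public static void main(String[] args) {
        SketchTable table = SketchTable.forPath(localEngine(), "/tmp/delta-table");
        System.out.println(table.getPath()); // prints file:/tmp/delta-table
    }
}
```

Keeping the constructor private and routing creation through `forPath` means a table in this model can only ever hold a resolved path, which is the property the documented factory appears to rely on.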