From ef829853957d1644dd42530623423bf3d304e3e8 Mon Sep 17 00:00:00 2001 From: Sreesh Maheshwar Date: Sat, 18 Jan 2025 13:57:03 +0000 Subject: [PATCH 1/6] Documentation for Location Providers --- mkdocs/docs/configuration.md | 59 ++++++++++++++++++++++++++++++++++++ pyiceberg/table/locations.py | 7 ++++- 2 files changed, 65 insertions(+), 1 deletion(-) diff --git a/mkdocs/docs/configuration.md b/mkdocs/docs/configuration.md index 06eaac1bed..46a27f0177 100644 --- a/mkdocs/docs/configuration.md +++ b/mkdocs/docs/configuration.md @@ -54,6 +54,8 @@ Iceberg tables support table properties to configure table behavior. ### Write options +***TODO:*** Add LocationProvider-related properties here. + | Key | Options | Default | Description | | -------------------------------------- | --------------------------------- | ------- | ------------------------------------------------------------------------------------------- | | `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression coddec. | @@ -195,6 +197,63 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya +## Location Providers + +Iceberg works with the concept of a LocationProvider that determines the file paths for a table's data. PyIceberg +introduces a pluggable LocationProvider module; the LocationProvider used may be specified on a per-table basis via +table properties. PyIceberg defaults to the [ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider), +which generates file paths that are optimised for object storage. + +### SimpleLocationProvider + +The SimpleLocationProvider places file names underneath a `data` directory in the table's storage location. 
For example, +a non-partitioned table might have a data file with location: + +```txt +s3://my-bucket/my_table/data/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet +``` + +When data is partitioned, the files under a given partition are grouped into a subdirectory, with that partition key +and value as the directory name. For example, a table partitioned over a string column `category` might have a data file +with location: + +```txt +s3://my-bucket/my_table/data/category=orders/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet +``` + +The SimpleLocationProvider is enabled for a table by explicitly setting its `write.object-storage.enabled` table property to `false`. + +### ObjectStoreLocationProvider + +When several files are stored under the same prefix, cloud object stores such as S3 often [throttling requests on prefixes](https://repost.aws/knowledge-center/http-5xx-errors-s3), +resulting in slowdowns. + +The ObjectStoreLocationProvider counteracts this by injecting deterministic hashes, in the form of binary directories, +into file paths, to distribute files across a larger number of object store prefixes. + +Partitions are included in file paths just before the file name, in a similar manner to the [SimpleLocationProvider](configuration.md#simplelocationprovider). +A table partitioned over a string column `category` might have a data file with location: (note the additional binary directories) + +```txt +s3://my-bucket/my_table/data/0101/0110/1001/10110010/category=orders/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet +``` + +The `write.object-storage.enabled` table property determines whether the ObjectStoreLocationProvider is enabled for a +table. It is used by default. + +When the ObjectStoreLocationProvider is used, the table property `write.object-storage.partitioned-paths`, which +defaults to `true`, can be set to `false` as an additional optimisation. 
This omits partition keys and values from data +file paths *entirely* to further reduce key size. With it disabled, the same data file above would instead be written +to: (note the absence of `category=orders`) + +```txt +s3://my-bucket/my_table/data/1101/0100/1011/00111010-00000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet +``` + +### Loading a Custom LocationProvider + +***TODO***. Maybe link to code reference for LocationProvider? + ## Catalogs PyIceberg currently has native catalog type support for REST, SQL, Hive, Glue and DynamoDB. diff --git a/pyiceberg/table/locations.py b/pyiceberg/table/locations.py index 046ee32527..53b41d1e61 100644 --- a/pyiceberg/table/locations.py +++ b/pyiceberg/table/locations.py @@ -30,7 +30,12 @@ class LocationProvider(ABC): - """A base class for location providers, that provide data file locations for write tasks.""" + """A base class for location providers, that provide data file locations for a table's write tasks. + + Args: + table_location (str): The table's base storage location. + table_properties (Properties): The table's properties. + """ table_location: str table_properties: Properties From 3b9457010d9e2df68af6e87af5213b9c4fe46d09 Mon Sep 17 00:00:00 2001 From: Sreesh Maheshwar Date: Sat, 18 Jan 2025 16:13:40 +0000 Subject: [PATCH 2/6] Finish docs --- mkdocs/docs/configuration.md | 61 +++++++++++++++++++++++++----------- 1 file changed, 42 insertions(+), 19 deletions(-) diff --git a/mkdocs/docs/configuration.md b/mkdocs/docs/configuration.md index 46a27f0177..13d4cd914a 100644 --- a/mkdocs/docs/configuration.md +++ b/mkdocs/docs/configuration.md @@ -54,17 +54,18 @@ Iceberg tables support table properties to configure table behavior. ### Write options -***TODO:*** Add LocationProvider-related properties here. 
- -| Key | Options | Default | Description | -| -------------------------------------- | --------------------------------- | ------- | ------------------------------------------------------------------------------------------- | -| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression coddec. | -| `write.parquet.compression-level` | Integer | null | Parquet compression level for the codec. If not set, it is up to PyIceberg | -| `write.parquet.row-group-limit` | Number of rows | 1048576 | The upper bound of the number of entries within a single row group | -| `write.parquet.page-size-bytes` | Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk | -| `write.parquet.page-row-limit` | Number of rows | 20000 | Set a target threshold for the maximum number of rows within a column chunk | -| `write.parquet.dict-size-bytes` | Size in bytes | 2MB | Set the dictionary page size limit per row group | -| `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. | +| Key | Options | Default | Description | +|------------------------------------------|-----------------------------------|---------|-------------------------------------------------------------------------------------------------------------------------------------| +| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression coddec. | +| `write.parquet.compression-level` | Integer | null | Parquet compression level for the codec. 
If not set, it is up to PyIceberg | +| `write.parquet.row-group-limit` | Number of rows | 1048576 | The upper bound of the number of entries within a single row group | +| `write.parquet.page-size-bytes` | Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk | +| `write.parquet.page-row-limit` | Number of rows | 20000 | Set a target threshold for the maximum number of rows within a column chunk | +| `write.parquet.dict-size-bytes` | Size in bytes | 2MB | Set the dictionary page size limit per row group | +| `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. | +| `write.object-storage.enabled` | Boolean | True | Enables the [ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider) that adds a hash component to file paths | +| `write.object-storage.partitioned-paths` | Boolean | True | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled | +| `write.py-location-provider.impl` | String of form `module.ClassName` | null | Optional, [custom LocationProvider](configuration.md#loading-a-custom-locationprovider) implementation | ### Table behavior options @@ -210,7 +211,7 @@ The SimpleLocationProvider places file names underneath a `data` directory in th a non-partitioned table might have a data file with location: ```txt -s3://my-bucket/my_table/data/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet +s3://bucket/ns/table/data/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet ``` When data is partitioned, the files under a given partition are grouped into a subdirectory, with that partition key @@ -218,7 +219,7 @@ and value as the directory name. 
For example, a table partitioned over a string with location: ```txt -s3://my-bucket/my_table/data/category=orders/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet +s3://bucket/ns/table/data/category=orders/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet ``` The SimpleLocationProvider is enabled for a table by explicitly setting its `write.object-storage.enabled` table property to `false`. @@ -231,28 +232,50 @@ resulting in slowdowns. The ObjectStoreLocationProvider counteracts this by injecting deterministic hashes, in the form of binary directories, into file paths, to distribute files across a larger number of object store prefixes. -Partitions are included in file paths just before the file name, in a similar manner to the [SimpleLocationProvider](configuration.md#simplelocationprovider). -A table partitioned over a string column `category` might have a data file with location: (note the additional binary directories) +Paths contain partitions just before the file name, and a `data` directory beneath the table's location, in a similar +manner to the [SimpleLocationProvider](configuration.md#simplelocationprovider). For example, a table partitioned over a string +column `category` might have a data file with location: (note the additional binary directories) ```txt -s3://my-bucket/my_table/data/0101/0110/1001/10110010/category=orders/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet +s3://bucket/ns/table/data/0101/0110/1001/10110010/category=orders/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet ``` The `write.object-storage.enabled` table property determines whether the ObjectStoreLocationProvider is enabled for a table. It is used by default. +#### Partition Exclusion + When the ObjectStoreLocationProvider is used, the table property `write.object-storage.partitioned-paths`, which -defaults to `true`, can be set to `false` as an additional optimisation. 
This omits partition keys and values from data +defaults to `true`, can be set to `false` as an additional optimisation for object stores. This omits partition keys and values from data file paths *entirely* to further reduce key size. With it disabled, the same data file above would instead be written to: (note the absence of `category=orders`) ```txt -s3://my-bucket/my_table/data/1101/0100/1011/00111010-00000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet +s3://bucket/ns/table/data/1101/0100/1011/00111010-00000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet ``` ### Loading a Custom LocationProvider -***TODO***. Maybe link to code reference for LocationProvider? +Similar to FileIO, a custom LocationProvider may be provided for a table by concretely subclassing the abstract base +class [LocationProvider](../reference/pyiceberg/table/locations/#pyiceberg.table.locations.LocationProvider). The +table property `write.py-location-provider.impl` should be set to the fully-qualified name of the custom +LocationProvider (i.e. `module.CustomLocationProvider`). Recall that a LocationProvider is configured per-table, +permitting different location provision for different tables. + +An example, custom `LocationProvider` implementation is shown below. 
+ +```py +import uuid + +class UUIDLocationProvider(LocationProvider): + def __init__(self, table_location: str, table_properties: Properties): + super().__init__(table_location, table_properties) + + def new_data_location(self, data_file_name: str, partition_key: Optional[PartitionKey] = None) -> str: + # Can use any custom method to generate a file path given the partitioning information and file name + prefix = f"{self.table_location}/{uuid.uuid4()}" + return f"{prefix}/{partition_key.to_path()}/{data_file_name}" if partition_key else f"{prefix}/{data_file_name}" +``` ## Catalogs From 3ee2695ba6ab3ec7cb75d1dce269114a4fb2e82e Mon Sep 17 00:00:00 2001 From: Sreesh Maheshwar Date: Sun, 19 Jan 2025 11:13:36 +0000 Subject: [PATCH 3/6] Minor spelling fixes --- mkdocs/docs/configuration.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/mkdocs/docs/configuration.md b/mkdocs/docs/configuration.md index 13d4cd914a..2e766f9882 100644 --- a/mkdocs/docs/configuration.md +++ b/mkdocs/docs/configuration.md @@ -200,10 +200,10 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya ## Location Providers -Iceberg works with the concept of a LocationProvider that determines the file paths for a table's data. PyIceberg +Iceberg works with the concept of a LocationProvider that determines file paths for a table's data. PyIceberg introduces a pluggable LocationProvider module; the LocationProvider used may be specified on a per-table basis via table properties. PyIceberg defaults to the [ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider), -which generates file paths that are optimised for object storage. +which generates file paths that are optimized for object storage. 
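To make the difference concrete, the path construction performed by the two built-in providers described below can be sketched roughly as follows. This is illustrative only: the hash function (`zlib.crc32`), the 20-bit width, and the function names are stand-ins, not PyIceberg's actual implementation.

```python
import zlib


def simple_data_location(table_location: str, file_name: str, partition_path: str = "") -> str:
    # SimpleLocationProvider-style layout: <table location>/data/[<partition>/]<file name>
    parts = [table_location, "data"]
    if partition_path:
        parts.append(partition_path)
    parts.append(file_name)
    return "/".join(parts)


def object_store_data_location(table_location: str, file_name: str, partition_path: str = "") -> str:
    # ObjectStoreLocationProvider-style layout: a deterministic hash, rendered as
    # binary directories, spreads files across many object store prefixes.
    hash_bits = zlib.crc32(file_name.encode("utf-8")) & 0xFFFFF  # keep 20 bits, as in the examples
    binary = format(hash_bits, "020b")
    hash_dirs = "/".join((binary[0:4], binary[4:8], binary[8:12], binary[12:20]))
    parts = [table_location, "data", hash_dirs]
    if partition_path:
        parts.append(partition_path)
    parts.append(file_name)
    return "/".join(parts)


print(simple_data_location("s3://bucket/ns/table", "file.parquet", "category=orders"))
# s3://bucket/ns/table/data/category=orders/file.parquet
print(object_store_data_location("s3://bucket/ns/table", "file.parquet", "category=orders"))
```

In this sketch the hash is derived from the file name alone, so a given file always lands under the same prefix while distinct files fan out across up to 2^20 prefixes.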
### SimpleLocationProvider @@ -214,7 +214,7 @@ a non-partitioned table might have a data file with location: s3://bucket/ns/table/data/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet ``` -When data is partitioned, the files under a given partition are grouped into a subdirectory, with that partition key +When data is partitioned, files under a given partition are grouped into a subdirectory, with that partition key and value as the directory name. For example, a table partitioned over a string column `category` might have a data file with location: @@ -222,17 +222,18 @@ with location: s3://bucket/ns/table/data/category=orders/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet ``` -The SimpleLocationProvider is enabled for a table by explicitly setting its `write.object-storage.enabled` table property to `false`. +The SimpleLocationProvider is enabled for a table by explicitly setting its `write.object-storage.enabled` table +property to `False`. ### ObjectStoreLocationProvider -When several files are stored under the same prefix, cloud object stores such as S3 often [throttling requests on prefixes](https://repost.aws/knowledge-center/http-5xx-errors-s3), +When several files are stored under the same prefix, cloud object stores such as S3 often [throttle requests on prefixes](https://repost.aws/knowledge-center/http-5xx-errors-s3), resulting in slowdowns. The ObjectStoreLocationProvider counteracts this by injecting deterministic hashes, in the form of binary directories, into file paths, to distribute files across a larger number of object store prefixes. -Paths contain partitions just before the file name, and a `data` directory beneath the table's location, in a similar +Paths contain partitions just before the file name and a `data` directory beneath the table's location, in a similar manner to the [SimpleLocationProvider](configuration.md#simplelocationprovider). 
For example, a table partitioned over a string column `category` might have a data file with location: (note the additional binary directories) @@ -246,9 +247,9 @@ table. It is used by default. #### Partition Exclusion When the ObjectStoreLocationProvider is used, the table property `write.object-storage.partitioned-paths`, which -defaults to `true`, can be set to `false` as an additional optimisation for object stores. This omits partition keys and values from data -file paths *entirely* to further reduce key size. With it disabled, the same data file above would instead be written -to: (note the absence of `category=orders`) +defaults to `True`, can be set to `False` as an additional optimization for object stores. This omits partition keys and +values from data file paths *entirely* to further reduce key size. With it disabled, the same data file above would +instead be written to: (note the absence of `category=orders`) ```txt s3://bucket/ns/table/data/1101/0100/1011/00111010-00000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet From 6be752a2d28be51f451246cd45462403355181d6 Mon Sep 17 00:00:00 2001 From: Sreesh Maheshwar Date: Sun, 19 Jan 2025 22:09:29 +0000 Subject: [PATCH 4/6] Address some comments --- mkdocs/docs/configuration.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/mkdocs/docs/configuration.md b/mkdocs/docs/configuration.md index 2e766f9882..5911cef2be 100644 --- a/mkdocs/docs/configuration.md +++ b/mkdocs/docs/configuration.md @@ -54,18 +54,18 @@ Iceberg tables support table properties to configure table behavior. 
### Write options -| Key | Options | Default | Description | -|------------------------------------------|-----------------------------------|---------|-------------------------------------------------------------------------------------------------------------------------------------| -| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression coddec. | -| `write.parquet.compression-level` | Integer | null | Parquet compression level for the codec. If not set, it is up to PyIceberg | -| `write.parquet.row-group-limit` | Number of rows | 1048576 | The upper bound of the number of entries within a single row group | -| `write.parquet.page-size-bytes` | Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk | -| `write.parquet.page-row-limit` | Number of rows | 20000 | Set a target threshold for the maximum number of rows within a column chunk | -| `write.parquet.dict-size-bytes` | Size in bytes | 2MB | Set the dictionary page size limit per row group | -| `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. 
| -| `write.object-storage.enabled` | Boolean | True | Enables the [ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider) that adds a hash component to file paths | -| `write.object-storage.partitioned-paths` | Boolean | True | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled | -| `write.py-location-provider.impl` | String of form `module.ClassName` | null | Optional, [custom LocationProvider](configuration.md#loading-a-custom-locationprovider) implementation | +| Key | Options | Default | Description | +|------------------------------------------|--------------------------------------------|---------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression coddec. | +| `write.parquet.compression-level` | Integer | null | Parquet compression level for the codec. If not set, it is up to PyIceberg | +| `write.parquet.row-group-limit` | Number of rows | 1048576 | The upper bound of the number of entries within a single row group | +| `write.parquet.page-size-bytes` | Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk | +| `write.parquet.page-row-limit` | Number of rows | 20000 | Set a target threshold for the maximum number of rows within a column chunk | +| `write.parquet.dict-size-bytes` | Size in bytes | 2MB | Set the dictionary page size limit per row group | +| `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. 
| +| `write.object-storage.enabled` | Boolean | True | Enables the [ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider) that adds a hash component to file paths. Note: the default value of `True` differs from the Java implementation | +| `write.object-storage.partitioned-paths` | Boolean | True | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled | +| `write.py-location-provider.impl` | String, e.g. `mymodule.myLocationProvider` | null | Optional, [custom LocationProvider](configuration.md#loading-a-custom-locationprovider) implementation | ### Table behavior options From 76f397b35abaa1555ede59ad5c5a4fce8c5f1374 Mon Sep 17 00:00:00 2001 From: Sreesh Maheshwar Date: Sun, 19 Jan 2025 22:37:28 +0000 Subject: [PATCH 5/6] Address all comments --- mkdocs/docs/configuration.md | 81 ++++++++++++++++++++---------------- 1 file changed, 44 insertions(+), 37 deletions(-) diff --git a/mkdocs/docs/configuration.md b/mkdocs/docs/configuration.md index 5911cef2be..cd6e4a2146 100644 --- a/mkdocs/docs/configuration.md +++ b/mkdocs/docs/configuration.md @@ -54,18 +54,18 @@ Iceberg tables support table properties to configure table behavior. ### Write options -| Key | Options | Default | Description | -|------------------------------------------|--------------------------------------------|---------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression coddec. | -| `write.parquet.compression-level` | Integer | null | Parquet compression level for the codec. 
If not set, it is up to PyIceberg | -| `write.parquet.row-group-limit` | Number of rows | 1048576 | The upper bound of the number of entries within a single row group | -| `write.parquet.page-size-bytes` | Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk | -| `write.parquet.page-row-limit` | Number of rows | 20000 | Set a target threshold for the maximum number of rows within a column chunk | -| `write.parquet.dict-size-bytes` | Size in bytes | 2MB | Set the dictionary page size limit per row group | -| `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. | -| `write.object-storage.enabled` | Boolean | True | Enables the [ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider) that adds a hash component to file paths. Note: the default value of `True` differs from the Java implementation | -| `write.object-storage.partitioned-paths` | Boolean | True | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled | -| `write.py-location-provider.impl` | String, e.g. `mymodule.myLocationProvider` | null | Optional, [custom LocationProvider](configuration.md#loading-a-custom-locationprovider) implementation | +| Key | Options | Default | Description | +|------------------------------------------|-----------------------------------|---------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression coddec. | +| `write.parquet.compression-level` | Integer | null | Parquet compression level for the codec. 
If not set, it is up to PyIceberg | +| `write.parquet.row-group-limit` | Number of rows | 1048576 | The upper bound of the number of entries within a single row group | +| `write.parquet.page-size-bytes` | Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk | +| `write.parquet.page-row-limit` | Number of rows | 20000 | Set a target threshold for the maximum number of rows within a column chunk | +| `write.parquet.dict-size-bytes` | Size in bytes | 2MB | Set the dictionary page size limit per row group | +| `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. | +| `write.object-storage.enabled` | Boolean | True | Enables the [`ObjectStoreLocationProvider`](configuration.md#objectstorelocationprovider) that adds a hash component to file paths. Note: the default value of `True` differs from Iceberg's Java implementation | +| `write.object-storage.partitioned-paths` | Boolean | True | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled | +| `write.py-location-provider.impl` | String of form `module.ClassName` | null | Optional, [custom `LocationProvider`](configuration.md#loading-a-custom-locationprovider) implementation | ### Table behavior options @@ -200,53 +200,58 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya ## Location Providers -Iceberg works with the concept of a LocationProvider that determines file paths for a table's data. PyIceberg -introduces a pluggable LocationProvider module; the LocationProvider used may be specified on a per-table basis via -table properties. PyIceberg defaults to the [ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider), -which generates file paths that are optimized for object storage. 
+Apache Iceberg uses the concept of a `LocationProvider` to manage file paths for a table's data. In PyIceberg, the +`LocationProvider` module is designed to be pluggable, allowing customization for specific use cases. The +`LocationProvider` for a table can be specified through table properties. -### SimpleLocationProvider +PyIceberg defaults to the [`ObjectStoreLocationProvider`](configuration.md#object-store-location-provider), which generates +file paths that are optimized for object storage. -The SimpleLocationProvider places file names underneath a `data` directory in the table's storage location. For example, -a non-partitioned table might have a data file with location: +### Simple Location Provider + +The `SimpleLocationProvider` places a table's file names underneath a `data` directory in the table's base storage +location (this is `table.metadata.location` - see the [Iceberg table specification](https://iceberg.apache.org/spec/#table-metadata)). +For example, a non-partitioned table might have a data file with location: ```txt s3://bucket/ns/table/data/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet ``` -When data is partitioned, files under a given partition are grouped into a subdirectory, with that partition key -and value as the directory name. For example, a table partitioned over a string column `category` might have a data file -with location: +When the table is partitioned, files under a given partition are grouped into a subdirectory, with that partition key +and value as the directory name - this is known as the *Hive-style* partition path format. 
For example, a table +partitioned over a string column `category` might have a data file with location: ```txt s3://bucket/ns/table/data/category=orders/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet ``` -The SimpleLocationProvider is enabled for a table by explicitly setting its `write.object-storage.enabled` table +The `SimpleLocationProvider` is enabled for a table by explicitly setting its `write.object-storage.enabled` table property to `False`. -### ObjectStoreLocationProvider +### Object Store Location Provider -When several files are stored under the same prefix, cloud object stores such as S3 often [throttle requests on prefixes](https://repost.aws/knowledge-center/http-5xx-errors-s3), -resulting in slowdowns. +PyIceberg offers the `ObjectStoreLocationProvider`, and an optional [partition-exclusion](configuration.md#partition-exclusion) +optimization, designed for tables stored in object storage. For additional context and motivation concerning these configurations, +see their [documentation for Iceberg's Java implementation](https://iceberg.apache.org/docs/latest/aws/#object-store-file-layout). -The ObjectStoreLocationProvider counteracts this by injecting deterministic hashes, in the form of binary directories, +When several files are stored under the same prefix, cloud object stores such as S3 often [throttle requests on prefixes](https://repost.aws/knowledge-center/http-5xx-errors-s3), +resulting in slowdowns. The `ObjectStoreLocationProvider` counteracts this by injecting deterministic hashes, in the form of binary directories, into file paths, to distribute files across a larger number of object store prefixes. -Paths contain partitions just before the file name and a `data` directory beneath the table's location, in a similar -manner to the [SimpleLocationProvider](configuration.md#simplelocationprovider). 
For example, a table
partitioned over a string column `category` might have a data file with location: (note the additional binary directories)

```txt
s3://bucket/ns/table/data/0101/0110/1001/10110010/category=orders/0000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet
```

The `write.object-storage.enabled` table property determines whether the `ObjectStoreLocationProvider` is enabled for a
table. It is enabled by default.

#### Partition Exclusion

When the `ObjectStoreLocationProvider` is used, the table property `write.object-storage.partitioned-paths`, which
defaults to `True`, can be set to `False` as an additional optimization for object stores. This omits partition keys and
values from data file paths *entirely* to further reduce key size. With it disabled, the same data file above would
instead be written to: (note the absence of `category=orders`)

```txt
s3://bucket/ns/table/data/1101/0100/1011/00111010-00000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet
```

### Loading a Custom LocationProvider

Similar to FileIO, a custom LocationProvider may be provided for a table by concretely subclassing the abstract base
class [LocationProvider](../reference/pyiceberg/table/locations/#pyiceberg.table.locations.LocationProvider). The
table property `write.py-location-provider.impl` should be set to the fully-qualified name of the custom
LocationProvider (i.e.
`module.CustomLocationProvider`). Recall that a LocationProvider is configured per-table,
permitting different location provision for different tables.

Similar to FileIO, a custom `LocationProvider` may be provided for a table by concretely subclassing the abstract base
class [`LocationProvider`](../reference/pyiceberg/table/locations/#pyiceberg.table.locations.LocationProvider).

The table property `write.py-location-provider.impl` should be set to the fully-qualified name of the custom
`LocationProvider` (e.g. `mymodule.MyLocationProvider`). Recall that a `LocationProvider` is configured per-table,
permitting different location provision for different tables. Note also that Iceberg's Java implementation uses a
different table property, `write.location-provider.impl`, for custom Java implementations.

An example custom `LocationProvider` implementation is shown below.

From 5ee6deca17d248cb5410c136dc03f0c3c889a227 Mon Sep 17 00:00:00 2001
From: Sreesh Maheshwar
Date: Sun, 19 Jan 2025 22:49:56 +0000
Subject: [PATCH 6/6] Fix all hyperlinks

---
 mkdocs/docs/configuration.md | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/mkdocs/docs/configuration.md b/mkdocs/docs/configuration.md
index cd6e4a2146..e076afdb93 100644
--- a/mkdocs/docs/configuration.md
+++ b/mkdocs/docs/configuration.md
@@ -54,18 +54,18 @@ Iceberg tables support table properties to configure table behavior.

 ### Write options

| Key                                      | Options                           | Default | Description                                                                                                                                                                                                           |
|------------------------------------------|-----------------------------------|---------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression coddec.
|
-| `write.parquet.compression-level`        | Integer                           | null    | Parquet compression level for the codec. If not set, it is up to PyIceberg                                         |
-| `write.parquet.row-group-limit`          | Number of rows                    | 1048576 | The upper bound of the number of entries within a single row group                                                 |
-| `write.parquet.page-size-bytes`          | Size in bytes                     | 1MB     | Set a target threshold for the approximate encoded size of data pages within a column chunk                        |
-| `write.parquet.page-row-limit`           | Number of rows                    | 20000   | Set a target threshold for the maximum number of rows within a column chunk                                        |
-| `write.parquet.dict-size-bytes`          | Size in bytes                     | 2MB     | Set the dictionary page size limit per row group                                                                   |
-| `write.metadata.previous-versions-max`   | Integer                           | 100     | The max number of previous version metadata files to keep before deleting after commit.                            |
-| `write.object-storage.enabled`           | Boolean                           | True    | Enables the [`ObjectStoreLocationProvider`](configuration.md#objectstorelocationprovider) that adds a hash component to file paths. Note: the default value of `True` differs from Iceberg's Java implementation |
-| `write.object-storage.partitioned-paths` | Boolean                           | True    | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled |
-| `write.py-location-provider.impl`        | String of form `module.ClassName` | null    | Optional, [custom `LocationProvider`](configuration.md#loading-a-custom-locationprovider) implementation           |
+| Key                                      | Options                           | Default | Description                                                                                                        |
+|------------------------------------------|-----------------------------------|---------|--------------------------------------------------------------------------------------------------------------------|
+| `write.parquet.compression-codec`        | `{uncompressed,zstd,gzip,snappy}` | zstd    | Sets the Parquet compression codec.                                                                                |
+| `write.parquet.compression-level`        | Integer                           | null    | Parquet compression level for the codec. If not set, the codec's default level is used                             |
+| `write.parquet.row-group-limit`          | Number of rows                    | 1048576 | The upper bound of the number of entries within a single row group                                                 |
+| `write.parquet.page-size-bytes`          | Size in bytes                     | 1MB     | Set a target threshold for the approximate encoded size of data pages within a column chunk                        |
+| `write.parquet.page-row-limit`           | Number of rows                    | 20000   | Set a target threshold for the maximum number of rows within a column chunk                                        |
+| `write.parquet.dict-size-bytes`          | Size in bytes                     | 2MB     | Set the dictionary page size limit per row group                                                                   |
+| `write.metadata.previous-versions-max`   | Integer                           | 100     | The max number of previous version metadata files to keep before deleting after commit.                            |
+| `write.object-storage.enabled`           | Boolean                           | True    | Enables the [`ObjectStoreLocationProvider`](configuration.md#object-store-location-provider) that adds a hash component to file paths. Note: the default value of `True` differs from Iceberg's Java implementation |
+| `write.object-storage.partitioned-paths` | Boolean                           | True    | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled |
+| `write.py-location-provider.impl`        | String of form `module.ClassName` | null    | Optional [custom `LocationProvider`](configuration.md#loading-a-custom-location-provider) implementation           |

### Table behavior options
@@ -260,7 +260,7 @@ instead be written to: (note the absence of `category=orders`)
s3://bucket/ns/table/data/1101/0100/1011/00111010-00000-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-00001.parquet
```

-### Loading a Custom LocationProvider
+### Loading a Custom Location Provider

Similar to FileIO, a custom `LocationProvider` may be provided for a table by concretely subclassing the abstract base
class [`LocationProvider`](../reference/pyiceberg/table/locations/#pyiceberg.table.locations.LocationProvider).
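As an illustration of the shape such a subclass might take, here is a minimal sketch in plain Python. To keep it self-contained it defines a stand-in base class rather than importing PyIceberg, so the constructor arguments and the `new_data_location` method name are assumptions modelled on PyIceberg's built-in providers, and the `UUIDLocationProvider` scheme itself is hypothetical:

```python
import uuid
from typing import Optional


class LocationProviderBase:
    """Stand-in for pyiceberg.table.locations.LocationProvider (assumed API)."""

    def __init__(self, table_location: str, table_properties: dict):
        self.table_location = table_location.rstrip("/")
        self.table_properties = table_properties


class UUIDLocationProvider(LocationProviderBase):
    """Hypothetical provider: writes each data file under a fresh UUID
    directory, spreading files across many object store prefixes."""

    def new_data_location(self, data_file_name: str, partition_key: Optional[str] = None) -> str:
        # Any deterministic or random scheme may be used here; a UUID per
        # file maximises prefix spread at the cost of directory locality.
        return f"{self.table_location}/data/{uuid.uuid4()}/{data_file_name}"


provider = UUIDLocationProvider("s3://bucket/ns/table/", {})
print(provider.new_data_location("my-file.parquet"))
```

Registering such a provider would then be a matter of setting `write.py-location-provider.impl` to the class's fully-qualified name, as described above.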