Update 2025-data-engineering-and-ai-trends.md
anna-geller committed Jan 24, 2025
1 parent 14d9512 commit 6dd6b7c
Showing 1 changed file with 3 additions and 5 deletions.
8 changes: 3 additions & 5 deletions content/blogs/2025-data-engineering-and-ai-trends.md
@@ -72,15 +72,13 @@ As more AI and data workloads [enter production](https://cloud.google.com/transf
 
 ## 6. Demand for Data Lakes and Open Table Formats
 
-Cost optimization continues to drive renewed interest in data lakes, with teams combining **open table formats** like Apache Iceberg with object storage to balance governance and flexibility. The architecture often leverages Parquet files for columnar storage, while Iceberg’s (metadata layer](https://iceberg.apache.org/spec/)) adds critical features:
+Cost optimization continues to drive renewed interest in data lakes, with teams combining **open table formats** like Apache Iceberg with object storage to balance governance and flexibility. The architecture often leverages Parquet files for columnar storage, while Iceberg’s [metadata layer](https://iceberg.apache.org/spec/) adds critical features:
 
 - Row-level deletions for GDPR compliance
-- Schema evolution to handle evolving data structures
+- Schema evolution to handle changing data models
 - RBAC integration through catalogs like [AWS Lake Formation](https://aws.amazon.com/blogs/big-data/interact-with-apache-iceberg-tables-using-amazon-athena-and-cross-account-fine-grained-permissions-using-aws-lake-formation/).
 
-This setup allows teams to query data directly in object storage using [engines like DuckDB](https://kestra.io/blogs/2023-08-11-dataframes) (ad hoc analysis), Polars (complex transformations), or [chDB](https://kestra.io/blogs/embedded-databases) (lightweight aggregations). While data warehouses remain common for managing mission-critical data, the trend favors open **hybrid lakehouse architectures** – Iceberg-governed lakes handle raw data storage and governance, while warehouses manage curated data marts.
-
-Notably, major platforms like Databricks and Snowflake now support Iceberg, reducing vendor lock-in risks as teams prioritize interoperability alongside cost control.
+This setup allows teams to query data directly in object storage using engines like [DuckDB](https://kestra.io/blogs/2023-08-11-dataframes) (ad-hoc analysis), [chDB](https://kestra.io/blogs/embedded-databases) (lightweight aggregations), or Polars (complex transformations). While data warehouses remain common for managing mission-critical curated data marts, the trend favors open **hybrid lakehouse architectures** with Iceberg at the core. Notably, major platforms like Databricks and Snowflake now also support Iceberg, reducing vendor lock-in risks as teams prioritize interoperability alongside cost control.
 
 ---
 
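The hunk above lists row-level deletions and schema evolution as features of the metadata layer. The following minimal Python sketch illustrates those two ideas. It is a toy model, not Iceberg's API or on-disk layout: `ToyTable` and all of its methods are hypothetical names introduced for this example. It assumes only the general idea that columns are tracked by stable field IDs and that deletes can be recorded in metadata instead of rewriting data files.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy stand-in for a table format's metadata layer (hypothetical, not Iceberg's real layout)."""

    # Schema maps stable field IDs to the current column names.
    schema: dict[int, str]
    # "Data files": each row is keyed by field ID, not by column name.
    data_files: list[list[dict[int, object]]] = field(default_factory=list)
    # Row-level deletes recorded as (file index, row position) pairs.
    deletes: set[tuple[int, int]] = field(default_factory=set)

    def append(self, rows):
        """Write a new data file, translating column names to field IDs."""
        name_to_id = {name: fid for fid, name in self.schema.items()}
        self.data_files.append(
            [{name_to_id[k]: v for k, v in row.items()} for row in rows]
        )

    def rename_column(self, old, new):
        """Schema evolution: only metadata changes, data files stay untouched."""
        for fid, name in self.schema.items():
            if name == old:
                self.schema[fid] = new
                return
        raise KeyError(old)

    def delete_where(self, column, value):
        """Row-level delete: mark matching positions instead of rewriting files."""
        fid = next(f for f, n in self.schema.items() if n == column)
        for file_idx, rows in enumerate(self.data_files):
            for pos, row in enumerate(rows):
                if row.get(fid) == value:
                    self.deletes.add((file_idx, pos))

    def scan(self):
        """Read rows through the current schema, skipping deleted positions."""
        return [
            {self.schema[fid]: v for fid, v in row.items()}
            for file_idx, rows in enumerate(self.data_files)
            for pos, row in enumerate(rows)
            if (file_idx, pos) not in self.deletes
        ]


table = ToyTable(schema={1: "user_id", 2: "email"})
table.append([
    {"user_id": 1, "email": "a@example.com"},
    {"user_id": 2, "email": "b@example.com"},
])
table.rename_column("email", "contact_email")  # schema evolution
table.delete_where("user_id", 2)               # e.g. an erasure request
rows = table.scan()
```

In this sketch the rename and the delete change only metadata. The original data file still holds both rows, and readers see the updated view because `scan` resolves names and deletes at read time.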