Hello,
I am using Kafka Connect with a SQL source and an Iceberg sink to create my Iceberg tables in AWS S3 and keep them in sync with the tables of my MySQL database.
My desiderata are:
1. Sync the tables once a day instead of in real time (possibly by setting `iceberg.control.commit.interval-ms` to one day in milliseconds; see the config sketch at the end).
2. Keep only the most recent files in the table's data and metadata buckets, instead of accumulating files from every commit.
3. Partition the table into evenly sized Parquet files (for instance, when exporting via Spark I used to split the table into 20 roughly equal partitions based on the `id` column; see the Spark sketch right after this list).
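From reading the Iceberg docs, points 2 and 3 seem to be handled outside the sink, through snapshot expiration and compaction run as maintenance jobs, plus bucket partitioning on the table itself. Below is a rough sketch of what I understand that would look like from Spark; the catalog and table names are placeholders and I have not verified this against my setup:

```python
from pyspark.sql import SparkSession

# Placeholder names throughout; assumes a session already configured with
# an Iceberg catalog named "my_catalog" and the Iceberg SQL extensions.
spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

# Point 3: hash-partition on `id` into 20 buckets so data files come out
# roughly evenly sized (the Iceberg analogue of my old Spark export that
# split the table into ~20 equal partitions on `id`).
spark.sql("""
    ALTER TABLE my_catalog.mydb.mytable
    ADD PARTITION FIELD bucket(20, id)
""")

# Point 2: expire old snapshots so only recent data/metadata files are
# kept; the cutoff timestamp below is a placeholder.
spark.sql("""
    CALL my_catalog.system.expire_snapshots(
        table => 'mydb.mytable',
        older_than => TIMESTAMP '2024-01-01 00:00:00',
        retain_last => 1
    )
""")

# Also points 2/3: compact the many small commit files into ~512 MB files.
spark.sql("""
    CALL my_catalog.system.rewrite_data_files(
        table => 'mydb.mytable',
        options => map('target-file-size-bytes', '536870912')
    )
""")
```

If that is right, the expire/rewrite calls would have to run on a schedule (for example once a day, after the sink commits), since nothing in the sink triggers them automatically.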
For now I only have a tentative idea of how to solve the first problem (and I am not really sure about it), while for the other two I do not see how to do it from the sink configuration file at all, hence the maintenance-job guess above. In this example I was trying to sync a single table, and after 2 hours I already had 80 small files in both the data and metadata folders, since each commit adds 3 more files.
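My tentative fix for point 1 is simply to raise the sink's commit interval. Assuming the Tabular iceberg-kafka-connect connector, I imagine it would look roughly like this, posted to the Connect REST API (topic, table, and catalog settings are placeholders, not my real config):

```python
import requests

# Hypothetical sink config: everything except the interval property is a
# placeholder. 86400000 ms = 24 h, so the sink should commit to the
# Iceberg table once a day instead of every few minutes.
sink_config = {
    "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
    "topics": "mysql.mydb.mytable",                           # placeholder
    "iceberg.tables": "mydb.mytable",                         # placeholder
    "iceberg.catalog.type": "glue",                           # placeholder
    "iceberg.catalog.warehouse": "s3://my-bucket/warehouse",  # placeholder
    # The property from point 1: commit once per day.
    "iceberg.control.commit.interval-ms": "86400000",
}

# Create or update the connector on a local Connect worker (placeholder URL).
resp = requests.put(
    "http://localhost:8083/connectors/iceberg-sink/config",
    json=sink_config,
)
resp.raise_for_status()
```

I have not tested whether the connector tolerates such a long interval, which is part of why I am not sure about this.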