
Internal Error When Querying Empty Tables With List Columns #65

Open · dor-bernstein opened this issue Aug 28, 2024 · 1 comment

@dor-bernstein
Hey,
I have the following empty Iceberg table:

table_name (
1: col1: optional string,
2: col2: optional timestamptz,
3: col3: optional string,
4: col4: optional timestamptz,
5: col5: optional timestamptz,
6: col6: optional list,
7: col7: optional timestamptz,
8: col8: optional list,
9: col9: optional list,
10: col10: optional list,
11: col11: optional string,
12: col12: optional string,
13: col13: optional string,
14: col14: optional list,
15: col15: optional string,
16: col16: optional list,
17: col17: optional list,
18: col18: optional list,
19: col19: optional list,
20: col20: optional list,
21: col21: optional list,
22: col22: optional list,
23: col23: optional boolean,
24: col24: optional list,
25: col25: optional list,
26: col26: optional list,
27: col27: optional timestamptz,
28: col28: optional timestamptz,
29: col29: optional timestamptz
),
partition by: [],
sort order: [],
snapshot: Operation.APPEND: id=3172179944265825688, schema_id=0
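
A much smaller table with a single optional list column might hit the same code path; here is a repro sketch using PyIceberg's SQL catalog (the catalog, namespace, and warehouse path are all illustrative, and I haven't confirmed this exact setup triggers the bug):

from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import ListType, NestedField, StringType

# Throwaway catalog backed by in-memory SQLite; warehouse path is illustrative.
catalog = SqlCatalog("default", uri="sqlite:///:memory:", warehouse="file:///tmp/warehouse")
catalog.create_namespace("db")

schema = Schema(
    NestedField(field_id=1, name="col1", field_type=StringType(), required=False),
    NestedField(
        field_id=2,
        name="col6",
        field_type=ListType(element_id=3, element_type=StringType(), element_required=False),
        required=False,
    ),
)
table = catalog.create_table("db.table_name", schema=schema)
print(table.metadata_location)  # pass this path to iceberg_scan(...)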

When I use PyIceberg to scan it into a DuckDB connection, it works and returns an empty DataFrame, as expected.
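
For reference, that working PyIceberg path looks roughly like this (a sketch; the catalog and table names are illustrative):

from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
table = catalog.load_table("db.table_name")

# to_duckdb registers the scan result on a DuckDB connection.
con = table.scan().to_duckdb(table_name="table_name")
con.sql("SELECT * FROM table_name").fetchdf()  # empty DataFrame, 29 columns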
However, when I try to query it directly with DuckDB like this:

duckdb.sql(f"""
INSTALL iceberg; 
LOAD iceberg; 
SELECT * FROM iceberg_scan('{table.metadata_location}', skip_schema_inference=True)""").fetchdf()

I get the following error:
InternalException: INTERNAL Error: Value::LIST without providing a child-type requires a non-empty list of values. Use Value::LIST(child_type, list) instead.
This error signals an assertion failure within DuckDB. This usually occurs due to unexpected conditions or errors in the program's logic.
For more information, see https://duckdb.org/docs/dev/internal_errors
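
Until the extension handles this, one possible workaround (a sketch, not verified against this exact table) is to materialize through PyIceberg's Arrow reader and register the result with DuckDB, bypassing iceberg_scan entirely:

import duckdb

# table is the PyIceberg table from above; to_arrow() returns an Arrow table
# that carries the full schema even with zero rows, so DuckDB never has to
# guess a child type for an empty LIST value.
arrow_tbl = table.scan().to_arrow()
con = duckdb.connect()
con.register("t", arrow_tbl)
print(con.sql("SELECT * FROM t").fetchdf())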

If I instead allow schema inference (i.e. drop skip_schema_inference=True), I get this error:
IOException: IO Error: Invalid field found while parsing field: type

Thanks!

@rc-GeorgeAllen

D select * from iceberg_scan("s3://a-test-bucket/test_db/metadata/00024-364b99cb-1888-46fa-adb8-5d59db1029b1.metadata.json");
IO Error: Invalid field found while parsing field: type
D select * from iceberg_scan("s3://a-test-bucket/test_db/metadata/00024-364b99cb-1888-46fa-adb8-5d59db1029b1.metadata.json", skip_schema_inference=True);
IO Error: Failed to read file "s3://a-test-bucket/test_db/data/ingest_ts_day=2023-06-21/00000-1495-abf583d7-67bb-4a8a-af8a-f061a00b5963-00009.parquet": schema mismatch in glob: column "source_file_identifier" was read from the original file "s3://a-test-bucket/test_db/data/ingest_ts_day=2024-06-26/00000-57-e3eb283b-c2d2-487a-b9f7-989be9b00edb-00001.parquet", but could not be found in file "s3://a-test-bucket/test_db/data/ingest_ts_day=2023-06-21/00000-1495-abf583d7-67bb-4a8a-af8a-f061a00b5963-00009.parquet".
Candidate names: .... snip(list of column names) ....
If you are trying to read files with different schemas, try setting union_by_name=True
D select * from iceberg_scan("s3://a-test-bucket/test_db/metadata/00024-364b99cb-1888-46fa-adb8-5d59db1029b1.metadata.json", skip_schema_inference=True,  union_by_name=True);
Binder Error: Invalid named parameter "union_by_name" for function iceberg_scan
Candidates:
    version_name_format VARCHAR
    version VARCHAR
    mode VARCHAR
    metadata_compression_codec VARCHAR
    allow_moved_paths BOOLEAN
    skip_schema_inference BOOLEAN

(schema has nested structs/arrays)
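
For what it's worth, the named parameters the function exposes can also be inspected from SQL; union_by_name is a read_parquet option and, per the Binder Error above, is not plumbed through to iceberg_scan:

D SELECT function_name, parameters
  FROM duckdb_functions()
  WHERE function_name = 'iceberg_scan';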

D FROM duckdb_extensions() SELECT extension_name, extension_version;
│ extension_name │ extension_version │
│ iceberg        │ d62d91d           │
