
Internal Error When Querying Empty Tables With List Columns #65

Open · dor-bernstein opened this issue Aug 28, 2024 · 1 comment

@dor-bernstein
Hey,
I have the following empty Iceberg table:

table_name (
1: col1: optional string,
2: col2: optional timestamptz,
3: col3: optional string,
4: col4: optional timestamptz,
5: col5: optional timestamptz,
6: col6: optional list,
7: col7: optional timestamptz,
8: col8: optional list,
9: col9: optional list,
10: col10: optional list,
11: col11: optional string,
12: col12: optional string,
13: col13: optional string,
14: col14: optional list,
15: col15: optional string,
16: col16: optional list,
17: col17: optional list,
18: col18: optional list,
19: col19: optional list,
20: col20: optional list,
21: col21: optional list,
22: col22: optional list,
23: col23: optional boolean,
24: col24: optional list,
25: col25: optional list,
26: col26: optional list,
27: col27: optional timestamptz,
28: col28: optional timestamptz,
29: col29: optional timestamptz
),
partition by: [],
sort order: [],
snapshot: Operation.APPEND: id=3172179944265825688, schema_id=0
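
A much smaller table with a single optional list column might hit the same code path; here is a repro sketch using PyIceberg's SQL catalog (the catalog, namespace, and warehouse path are all illustrative, and I haven't confirmed this exact setup triggers the bug):

from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.schema import Schema
from pyiceberg.types import ListType, NestedField, StringType

# Throwaway catalog backed by in-memory SQLite; warehouse path is illustrative.
catalog = SqlCatalog("default", uri="sqlite:///:memory:", warehouse="file:///tmp/warehouse")
catalog.create_namespace("db")

schema = Schema(
    NestedField(field_id=1, name="col1", field_type=StringType(), required=False),
    NestedField(
        field_id=2,
        name="col6",
        field_type=ListType(element_id=3, element_type=StringType(), element_required=False),
        required=False,
    ),
)
table = catalog.create_table("db.table_name", schema=schema)
print(table.metadata_location)  # pass this path to iceberg_scan(...)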

When I use PyIceberg to scan it into a DuckDB connection, it works and returns an empty DataFrame, as expected.
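
For reference, that working PyIceberg path looks roughly like this (a sketch; the catalog and table names are illustrative):

from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
table = catalog.load_table("db.table_name")

# to_duckdb registers the scan result on a DuckDB connection.
con = table.scan().to_duckdb(table_name="table_name")
con.sql("SELECT * FROM table_name").fetchdf()  # empty DataFrame, 29 columns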
However, when I try to query it directly with DuckDB like this:

duckdb.sql(f"""
INSTALL iceberg; 
LOAD iceberg; 
SELECT * FROM iceberg_scan('{table.metadata_location}', skip_schema_inference=True)""").fetchdf()

I get the following error:
InternalException: INTERNAL Error: Value::LIST without providing a child-type requires a non-empty list of values. Use Value::LIST(child_type, list) instead.
This error signals an assertion failure within DuckDB. This usually occurs due to unexpected conditions or errors in the program's logic.
For more information, see https://duckdb.org/docs/dev/internal_errors
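
Until the extension handles this, one possible workaround (a sketch, not verified against this exact table) is to materialize through PyIceberg's Arrow reader and register the result with DuckDB, bypassing iceberg_scan entirely:

import duckdb

# table is the PyIceberg table from above; to_arrow() returns an Arrow table
# that carries the full schema even with zero rows, so DuckDB never has to
# guess a child type for an empty LIST value.
arrow_tbl = table.scan().to_arrow()
con = duckdb.connect()
con.register("t", arrow_tbl)
print(con.sql("SELECT * FROM t").fetchdf())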

If I instead allow schema inference (i.e. drop skip_schema_inference=True), I get this error:
IOException: IO Error: Invalid field found while parsing field: type

Thanks!

@rc-GeorgeAllen

D select * from iceberg_scan("s3://a-test-bucket/test_db/metadata/00024-364b99cb-1888-46fa-adb8-5d59db1029b1.metadata.json");
IO Error: Invalid field found while parsing field: type
D select * from iceberg_scan("s3://a-test-bucket/test_db/metadata/00024-364b99cb-1888-46fa-adb8-5d59db1029b1.metadata.json", skip_schema_inference=True);
IO Error: Failed to read file "s3://a-test-bucket/test_db/data/ingest_ts_day=2023-06-21/00000-1495-abf583d7-67bb-4a8a-af8a-f061a00b5963-00009.parquet": schema mismatch in glob: column "source_file_identifier" was read from the original file "s3://a-test-bucket/test_db/data/ingest_ts_day=2024-06-26/00000-57-e3eb283b-c2d2-487a-b9f7-989be9b00edb-00001.parquet", but could not be found in file "s3://a-test-bucket/test_db/data/ingest_ts_day=2023-06-21/00000-1495-abf583d7-67bb-4a8a-af8a-f061a00b5963-00009.parquet".
Candidate names: .... snip(list of column names) ....
If you are trying to read files with different schemas, try setting union_by_name=True
D select * from iceberg_scan("s3://a-test-bucket/test_db/metadata/00024-364b99cb-1888-46fa-adb8-5d59db1029b1.metadata.json", skip_schema_inference=True,  union_by_name=True);
Binder Error: Invalid named parameter "union_by_name" for function iceberg_scan
Candidates:
    version_name_format VARCHAR
    version VARCHAR
    mode VARCHAR
    metadata_compression_codec VARCHAR
    allow_moved_paths BOOLEAN
    skip_schema_inference BOOLEAN

(schema has nested structs/arrays)
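
For what it's worth, the named parameters the function exposes can also be inspected from SQL; union_by_name is a read_parquet option and, per the Binder Error above, is not plumbed through to iceberg_scan:

D SELECT function_name, parameters
  FROM duckdb_functions()
  WHERE function_name = 'iceberg_scan';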

D FROM duckdb_extensions() SELECT extension_name, extension_version;
│ extension_name │ extension_version │
│ iceberg        │ d62d91d           │
