[BUG] Orc writes don't fully support Booleans with nulls #11736

Closed
tgravescs opened this issue Nov 19, 2024 · 3 comments · Fixed by #11763
Labels: bug (Something isn't working); cudf_dependency (An issue or PR with this label depends on a new feature in cudf)

Comments

@tgravescs
Collaborator

Describe the bug
Customer reported a job failing with the error:

Caused by: ai.rapids.cudf.CudfException: CUDF failure at: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni-release-39-cuda11/thirdparty/cudf/cpp/src/io/orc/writer_impl.cu:940: There's currently a bug in encoding boolean columns. Suggested workaround is to convert to int8 type. Please see https://github.com/rapidsai/cudf/issues/6763 for more information.

The error points to rapidsai/cudf#6763, which states that cudf doesn't support writing booleans with nulls that don't align on 8-bit boundaries. Triggering it requires writing at least 2 row groups where the first row group doesn't fully align with 8 bits and leaves unused bits.
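
To make the alignment condition concrete, here is a minimal sketch in plain Python. It is not the cudf check and not plugin code. It assumes that only non-null booleans are bit-packed, so a row group "leaves unused bits" when its non-null count is not a multiple of 8. The function name `misaligned_row_groups` and the default `row_group_size` of 10,000 rows are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def misaligned_row_groups(
    values: Sequence[Optional[bool]],
    row_group_size: int = 10_000,  # assumed row group size, for illustration only
) -> List[int]:
    """Return indices of non-final row groups whose non-null count is not a multiple of 8.

    Assumption: nulls are not bit-packed, so only non-null values consume bits.
    The final row group is skipped because nothing is written after it.
    """
    if row_group_size <= 0:
        raise ValueError("row_group_size must be positive")

    # Split the column into consecutive row groups.
    groups = [
        values[start:start + row_group_size]
        for start in range(0, len(values), row_group_size)
    ]

    misaligned = []
    for index, group in enumerate(groups[:-1]):
        non_null = sum(value is not None for value in group)
        if non_null % 8 != 0:
            misaligned.append(index)
    return misaligned
```

For example, with a row group size of 8, a 16-row column with a single null in the first group reports group 0 as misaligned, while the same column without nulls reports nothing. A column that fits in a single row group never reports anything, which matches the "at least 2 row groups" condition described above.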

@tgravescs tgravescs added ? - Needs Triage Need team to review and classify bug Something isn't working labels Nov 19, 2024
@tgravescs
Collaborator Author

We either need to fix the cudf issue or fall back to the CPU if the plugin detects this.

@kuhushukla
Collaborator

Also related on the tests side of things -- #11735

@kuhushukla kuhushukla self-assigned this Nov 19, 2024
@kuhushukla
Collaborator

kuhushukla commented Nov 19, 2024

I will stopgap this with a fallback fix in a PR soon and open a follow-on for adding actual support back.

@mattahrens mattahrens added cudf_dependency An issue or PR with this label depends on a new feature in cudf and removed ? - Needs Triage Need team to review and classify labels Nov 19, 2024
ustcfy added a commit to ustcfy/spark-rapids that referenced this issue Nov 21, 2024