Avoid concatenating multiple host buffers when reading Parquet #11911
Signed-off-by: Jason Lowe <[email protected]>
just some nits
@@ -2123,7 +2123,7 @@ class MultiFileCloudOrcPartitionReader(
 }
 } catch {
 case e: Throwable =>
-hostBuffers.foreach(_.hmb.safeClose())
+hostBuffers.foreach(_.hmbs.foreach(_.safeClose()))
Not technically part of this PR, but couldn't safeClose throw, and then we don't see e? We should probably also use the Seq variant of safeClose(e).
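The concern above is that an exception thrown while closing buffers can replace the original failure. The following is a minimal, stand-alone sketch of one way to avoid that, by attaching close failures to the original exception as suppressed exceptions. `FakeBuffer`, `closeAllSuppressing`, and `readWithCleanup` are hypothetical names used only for illustration; this is not the plugin's `safeClose(e)` implementation, which may behave differently.

```scala
// Hypothetical stand-in for a host buffer; not the plugin's HostMemoryBuffer.
final class FakeBuffer(val name: String, failOnClose: Boolean) extends AutoCloseable {
  var closed = false
  override def close(): Unit = {
    closed = true
    if (failOnClose) throw new IllegalStateException(s"close failed: $name")
  }
}

object SafeCloseSketch {
  // Close every resource. A failure during close is attached to `primary`
  // as a suppressed exception, so it cannot replace the original error
  // and does not stop the remaining resources from being closed.
  def closeAllSuppressing(resources: Seq[AutoCloseable], primary: Throwable): Unit = {
    resources.foreach { r =>
      try {
        if (r != null) r.close()
      } catch {
        case t: Throwable => primary.addSuppressed(t)
      }
    }
  }

  // Mirrors the shape of the catch block in the diff: clean up, then rethrow `e`.
  def readWithCleanup(buffers: Seq[FakeBuffer]): Unit = {
    try {
      throw new RuntimeException("read failed")
    } catch {
      case e: Throwable =>
        closeAllSuppressing(buffers, e)
        throw e
    }
  }
}
```

With this shape, the caller still sees the original "read failed" exception, and any close failures remain available through `getSuppressed`.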
@@ -2737,7 +2713,7 @@ class MultiFileCloudParquetPartitionReader(
 }
 } catch {
 case e: Throwable =>
-hostBuffers.foreach(_.hmb.safeClose())
+hostBuffers.foreach(_.hmbs.safeClose())
Same here, could we use safeClose(e)?
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuParquetScan.scala
I was able to run this on one of our performance clusters. I ran the MULTITHREADED reader and compared results with and without the patch. The benchmark overall is ~2% faster, though that is not flagged as a significant improvement, and one query was 40% faster. q9 is on average 11% better, but that is also deemed not a significant change.
I'll resolve conflicts.
build
build
Seeing a segfault here all around avro.
Fix issue in GpuAvroScan.readBatches
OK, I reproduced an issue locally and fixed it here: 675fe5f. Will re-run CI.
build
Depends upon rapidsai/cudf#17673.
Updates the multithreaded Parquet reader to leverage the new multiple host buffers reader interface. This removes the need to concatenate multiple host buffers into a single buffer before decoding the data on the GPU. It also makes it easier to accept "late arrivals" in the multithreaded combine reader after waking up with the GPU semaphore, since we only need to fabricate a new footer to accommodate the additional buffers in the read.
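To illustrate the copy that this change avoids, here is a small, generic sketch using `java.nio.ByteBuffer` rather than the plugin's or cudf's actual types. `concat` and `checksum` are hypothetical helpers: the first models the old approach of copying every piece into one large buffer, and the second models a consumer that accepts the pieces directly. The real reader interface in rapidsai/cudf#17673 is not shown here and may look quite different.

```scala
import java.nio.ByteBuffer

object MultiBufferSketch {
  // Old approach (illustrative): allocate one buffer large enough for all
  // pieces and copy each piece into it before handing it to the consumer.
  def concat(pieces: Seq[ByteBuffer]): ByteBuffer = {
    val out = ByteBuffer.allocate(pieces.map(_.remaining()).sum)
    // duplicate() shares the bytes but has its own position, so the
    // caller's buffers are left untouched by the copy.
    pieces.foreach(p => out.put(p.duplicate()))
    out.flip()
    out
  }

  // New approach (illustrative): a hypothetical consumer that walks the
  // pieces in order without an intermediate copy. Here it sums the bytes.
  def checksum(pieces: Seq[ByteBuffer]): Long =
    pieces.foldLeft(0L) { (acc, p) =>
      val view = p.duplicate()
      var sum = acc
      while (view.hasRemaining) sum += (view.get() & 0xff)
      sum
    }
}
```

Both paths see the same bytes; the difference is that the multi-buffer path skips the extra allocation and copy, and a late-arriving piece can simply be appended to the sequence.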