[WIP] Add support for HyperLogLogPlusPlus (HLL++) #11638
base: branch-25.02
d42d80a to 1945192
}
}

case class GpuHLL(childExpr: Expression, relativeSD: Double)
Let's use the full name, GpuHyperLogLogPlusPlus, to better reflect the CPU version.
Done
ReductionAggregation.HLL(numRegistersPerSketch), DType.STRUCT)
override lazy val groupByAggregate: GroupByAggregation =
  GroupByAggregation.HLL(numRegistersPerSketch)
override val name: String = "CudfHLL"
Not sure if "PlusPlus" is necessary.
- override val name: String = "CudfHLL"
+ override val name: String = "CudfHyperLogLogPlusPlus"
Done.
0a4939f to eb00c2b
Signed-off-by: Chong Gao <[email protected]>
Ready to review except test cases.
Looks good
expr[HyperLogLogPlusPlus](
  "Aggregation approximate count distinct",
  ExprChecks.reductionAndGroupByAgg(TypeSig.LONG, TypeSig.LONG,
    Seq(ParamCheck("input", TypeSig.cpuAtomics, TypeSig.all))),
nit: Using cpuAtomics for a GPU field gets to be kind of confusing. Could you please create a gpuAtomics instead?
Will update to support map, array and list now that this is merged: NVIDIA/spark-rapids-jni#2575
Explanation for HLLPP:
6 bits are enough to store a register value.
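To illustrate the 6-bit claim, here is a minimal sketch, not the code in this PR. It assumes a 64-bit hash and a precision p of at least 4, so the largest register value is 64 - p + 1 = 61, which fits below the 6-bit maximum of 63. The packing of 10 registers per 64-bit word follows my understanding of the Spark CPU layout; the object and method names (RegisterPacking, getRegister, updateRegister) are hypothetical.

```scala
// Illustrative only: names below are hypothetical, not the plugin's or cuDF's API.
object RegisterPacking {
  val RegisterSize = 6                         // bits per register
  val RegistersPerWord = 64 / RegisterSize     // 10 registers fit in one Long
  val RegisterMask = (1L << RegisterSize) - 1  // 0x3F, i.e. values 0..63

  // Largest value a register can take for precision p with a 64-bit hash:
  // at most (64 - p) leading zeros in the remaining bits, plus one.
  def maxRegisterValue(p: Int): Int = 64 - p + 1

  // Number of Long words needed to hold numRegisters packed registers
  // (assumed to match the CPU layout, which rounds up by adding one word).
  def numWords(numRegisters: Int): Int = numRegisters / RegistersPerWord + 1

  // Read the 6-bit register at position idx.
  def getRegister(words: Array[Long], idx: Int): Int = {
    val shift = RegisterSize * (idx % RegistersPerWord)
    ((words(idx / RegistersPerWord) >>> shift) & RegisterMask).toInt
  }

  // HLL keeps the maximum value seen per register, so only write if larger.
  def updateRegister(words: Array[Long], idx: Int, value: Int): Unit = {
    require(value >= 0 && value <= RegisterMask, "value must fit in 6 bits")
    if (value > getRegister(words, idx)) {
      val w = idx / RegistersPerWord
      val shift = RegisterSize * (idx % RegistersPerWord)
      words(w) = (words(w) & ~(RegisterMask << shift)) | (value.toLong << shift)
    }
  }
}
```

With this layout, 4 of the 64 bits in each word are unused, which is the price of keeping every register inside a single word.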
TODO: @revans2 could you have a look first?
closes #5199
depends on
Description
Spark approx_count_distinct description link
Spark accepts one column (which can be a nested column) and a double literal relativeSD.
Depends on JNI PR:
NVIDIA/spark-rapids-jni#2522
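For context on the relativeSD parameter described above, the sketch below shows how a relative standard deviation can be turned into an HLL++ precision and a register count. The formula, including the 1.106 constant, is what I believe the Spark CPU implementation uses; treat it as an assumption rather than a statement about this PR's code. The object and method names (HllPrecision, precision, numRegisters) are hypothetical.

```scala
// Illustrative only: HllPrecision is a hypothetical helper, not part of the plugin.
object HllPrecision {
  // Number of hash bits used to address a register, derived from relativeSD.
  // Assumed formula: p = ceil(2 * log2(1.106 / relativeSD)).
  def precision(relativeSD: Double): Int = {
    val p = math.ceil(2.0 * math.log(1.106 / relativeSD) / math.log(2.0)).toInt
    require(p >= 4, "HLL++ needs at least 4 bits for addressing; use a smaller relativeSD")
    p
  }

  // Number of registers in one sketch: 2^p.
  def numRegisters(relativeSD: Double): Int = 1 << precision(relativeSD)
}
```

Under this formula, a relativeSD of 0.05 gives a precision of 9 and 512 registers, while 0.01 gives a precision of 14 and 16384 registers, so tighter error bounds grow the sketch quickly.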
TODO
Perf test
correctness
The results are identical between CPU and GPU.
Signed-off-by: Chong Gao [email protected]