simplified hash calculation for varchars in constant time #608

Draft
wants to merge 1 commit into
base: main

markonussbaum
Contributor

A very simple but fast hash implementation for nullable_varchar.

Instead of iterating over every character in O(n), it samples only the first, middle, and last characters of the varchar in O(1), which gives a speedup in many cases.
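
A minimal sketch of the sampling idea, for illustration only. It is not the code in this PR: the function name `sampled_hash`, the seed, the multiplier 31, the 64-bit mask, and mixing in the length are all assumptions made for the example.

```python
MASK64 = (1 << 64) - 1  # keep the hash in an unsigned 64-bit range (assumption)


def sampled_hash(value: bytes, seed: int = 17) -> int:
    """Hash a byte string in O(1) by sampling at most three positions.

    Hypothetical helper: only the length and the first, middle, and last
    bytes feed the hash, so the cost does not grow with len(value).
    """
    n = len(value)
    h = (31 * seed + n) & MASK64  # mixing in the length is an assumption
    if n > 0:
        for byte in (value[0], value[n // 2], value[n - 1]):
            h = (31 * h + byte) & MASK64
    return h


# Strings that agree on length and on all sampled positions collide by design.
print(sampled_hash(b"abcde") == sampled_hash(b"aXcYe"))  # same first/middle/last
print(sampled_hash(b"abcde") == sampled_hash(b"abcdf"))  # last byte differs
```

The trade-off is that values differing only in unsampled positions hash identically, so the speedup depends on how varied the sampled characters are in the data.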

Results with TPC-H at scale 10 on vh001:

Old implementation:
real 64m30.605s
user 0m15.766s
sys 0m2.302s

With this PR:
real 49m39.123s
user 0m14.275s
sys 0m2.093s
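
For reference, the difference between the two `real` times above can be worked out as follows. This is plain arithmetic on the single pair of runs reported here; the helper name `to_seconds` is made up for the example.

```python
def to_seconds(timing: str) -> float:
    """Convert a `time`-style value such as '64m30.605s' to seconds."""
    minutes, seconds = timing.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


old = to_seconds("64m30.605s")  # old implementation, real time
new = to_seconds("49m39.123s")  # with this PR, real time

reduction = 1 - new / old  # fraction of wall-clock time saved
speedup = old / new

print(f"{old - new:.1f} s saved, {reduction:.1%} less wall-clock time, "
      f"{speedup:.2f}x speedup")
```

That comes to roughly 23% less wall-clock time, or about a 1.3x speedup, for this one measurement.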

The test was run with:

  • aggregate-on-ve
  • amplify-batches
  • --num-executors=1 --executor-cores=8

markonussbaum requested review from treo and q10 on July 1, 2022, 10:39