Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rustc-hash to the bench? #85

Open
cksac opened this issue Jun 4, 2024 · 3 comments · May be fixed by #87
Open

Add rustc-hash to the bench? #85

cksac opened this issue Jun 4, 2024 · 3 comments · May be fixed by #87
Assignees

Comments

@cksac
Copy link

cksac commented Jun 4, 2024

https://crates.io/crates/rustc-hash

@cksac cksac changed the title Add ustc-hash to the bench? Add rustc-hash to the bench? Jun 4, 2024
@ogxd ogxd added the bench 📈 label Jun 5, 2024
@ogxd
Copy link
Owner

ogxd commented Jun 7, 2024

One of the reasons why such hash is not included in the benchmarks is because of its very poor quality (very bad avalanche and distribution properties). Like FNV-1a (it's a pretty similar algorithm) it does not pass most of the tests of quality benches, such as smhasher for instance.

That said, FNV-1a has been included in the benchs because of how common it is, so maybe we can do the same with FxHash.

I ran a bench on my MacBook pro and here are the results:

gxhash
  | 4 > 6311.01
  | 8 > 12602.92
  | 16 > 25271.20
  | 32 > 28965.08
  | 64 > 40690.10
  | 128 > 39480.40
  | 256 > 52541.93
  | 512 > 65411.56
  | 1024 > 74398.77
  | 2048 > 78583.25
  | 4096 > 82673.24
  | 8192 > 84440.44
  | 16384 > 82664.38
  | 32768 > 84542.57
  
fxhash
  | 4 > 11444.09
  | 8 > 8084.35
  | 16 > 13245.95
  | 32 > 16276.04
  | 64 > 18167.73
  | 128 > 16188.47
  | 256 > 12849.64
  | 512 > 9234.71
  | 1024 > 7388.30
  | 2048 > 5941.86
  | 4096 > 5375.71
  | 8192 > 5127.14
  | 16384 > 5004.31
  | 32768 > 4942.49
  
fnv-1a (for reference)
  | 4 > 2048.04
  | 8 > 2455.38
  | 16 > 2727.91
  | 32 > 2083.73
  | 64 > 1461.80
  | 128 > 1147.11
  | 256 > 922.75
  | 512 > 839.01
  | 1024 > 799.80
  | 2048 > 779.98
  | 4096 > 772.48
  | 8192 > 763.00
  | 16384 > 760.54
  | 32768 > 765.00

fxhash seems very good for 4 bytes inputs. But beware, its quality is very low and may collide a lot, so I am not sure it is a good default choice for any HashMap. For larger inputs performance quickly degrades.

In this repository there is a bench to test quality over a few criteria, here is the output for gxhash / fxhash:

> /gxhash % cargo bench --bench quality

Bench GxHash
  ✅ avalanche::<B,4>()
  ✅ avalanche::<B,10>()
  ✅ avalanche::<B,32>()
  ✅ avalanche::<B,128>()
  ✅ avalanche::<B,512>()
  ✅ distribution_values::<B,4>(128*128)
  ✅ distribution_values::<B,16>(128*128)
  ✅ distribution_values::<B,128>(128*128)
  ✅ distribution_values::<B,512>(128*128)
  ✅ distribution_bits::<B,4>()
  ✅ distribution_bits::<B,16>()
  ✅ distribution_bits::<B,128>()
  ✅ distribution_bits::<B,512>()
  ✅ collisions_padded_zeroes::<B>(128*128)
  ✅ collisions_flipped_bits::<B,2>(9)
  ✅ collisions_flipped_bits::<B,3>(9)
  ✅ collisions_flipped_bits::<B,4>(7)
  ✅ collisions_flipped_bits::<B,5>(6)
  ✅ collisions_flipped_bits::<B,6>(5)
  ✅ collisions_flipped_bits::<B,7>(5)
  ✅ collisions_flipped_bits::<B,9>(4)
  ✅ collisions_flipped_bits::<B,20>(4)
  ✅ collisions_flipped_bits::<B,32>(3)
  ✅ collisions_flipped_bits::<B,64>(3)
  ✅ collisions_flipped_bits::<B,256>(2)
  ✅ collisions_permute::<B,u8>(4,&Vec::from_iter(0..16))
  ✅ collisions_permute::<B,u8>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u16>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u32>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u64>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u128>(4,&Vec::from_iter(0..16))
  ✅ collisions_permute::<B,u128>(42,&Vec::from_iter(0..64))
  ✅ collisions_powerset_bytes::<B>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ collisions_powerset_bytes::<B>(&[0,1,2,4,8,16,32,64,128])
  ✅ hasher_collisions_permute::<B,u8>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ hasher_collisions_permute::<B,u32>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ hasher_collisions_permute::<B,u32>(&[0,1,2,4,8,16,32,64,128,256])
  ✅ hasher_collisions_powerset::<B,u32>(&[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19])
  ✅ hasher_collisions_powerset::<B,u32>(&[0,1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384])
  
Bench FxHasher (rustc_hash)
  ❌ avalanche::<B,4>()
     | Score: 0.19. Expected is 0.
  ❌ avalanche::<B,10>()
     | Score: 0.01. Expected is 0.
  ❌ avalanche::<B,32>()
     | Score: 0.11. Expected is 0.
  ❌ avalanche::<B,128>()
     | Score: 0.03. Expected is 0.
  ❌ avalanche::<B,512>()
     | Score: 0.01. Expected is 0.
  ✅ distribution_values::<B,4>(128*128)
  ✅ distribution_values::<B,16>(128*128)
  ✅ distribution_values::<B,128>(128*128)
  ✅ distribution_values::<B,512>(128*128)
  ✅ distribution_bits::<B,4>()
  ✅ distribution_bits::<B,16>()
  ✅ distribution_bits::<B,128>()
  ✅ distribution_bits::<B,512>()
  ❌ collisions_padded_zeroes::<B>(128*128)
     | Score: 0.99993896484375. Expected is 0.
  ✅ collisions_flipped_bits::<B,2>(9)
  ✅ collisions_flipped_bits::<B,3>(9)
  ✅ collisions_flipped_bits::<B,4>(7)
  ✅ collisions_flipped_bits::<B,5>(6)
  ✅ collisions_flipped_bits::<B,6>(5)
  ✅ collisions_flipped_bits::<B,7>(5)
  ❌ collisions_flipped_bits::<B,9>(4)
     | Score: 0.11120755156228948. Expected is 0.
  ❌ collisions_flipped_bits::<B,20>(4)
     | Score: 0.09822390132156604. Expected is 0.
  ❌ collisions_flipped_bits::<B,32>(3)
     | Score: 0.06613427110477443. Expected is 0.
  ❌ collisions_flipped_bits::<B,64>(3)
     | Score: 0.0719024799632759. Expected is 0.
  ❌ collisions_flipped_bits::<B,256>(2)
     | Score: 0.0523635517880522. Expected is 0.
  ✅ collisions_permute::<B,u8>(4,&Vec::from_iter(0..16))
  ✅ collisions_permute::<B,u8>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u16>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u32>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u64>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u128>(4,&Vec::from_iter(0..16))
  ✅ collisions_permute::<B,u128>(42,&Vec::from_iter(0..64))
  ❌ collisions_powerset_bytes::<B>(&[0,1,2,3,4,5,6,7,8,9])
     | Score: 0.0009765625. Expected is 0.
  ❌ collisions_powerset_bytes::<B>(&[0,1,2,4,8,16,32,64,128])
     | Score: 0.001953125. Expected is 0.
  ✅ hasher_collisions_permute::<B,u8>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ hasher_collisions_permute::<B,u32>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ hasher_collisions_permute::<B,u32>(&[0,1,2,4,8,16,32,64,128,256])
  ✅ hasher_collisions_powerset::<B,u32>(&[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19])
  ✅ hasher_collisions_powerset::<B,u32>(&[0,1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384])

@ogxd ogxd self-assigned this Jun 7, 2024
@ogxd ogxd linked a pull request Jun 7, 2024 that will close this issue
@DaniPopes
Copy link

DaniPopes commented Jun 27, 2024

FYI rustc-hash version 2 was released which ships a different implementation (rust-lang/rustc-hash#37)

@ogxd
Copy link
Owner

ogxd commented Jun 27, 2024

The test suite is nowhere nearly a good as smhasher's one but this new version rustc_hash is much more robust and passes all quality tests in this repository 👍

Bench FxHasher (rustc_hash)
  ✅ avalanche::<B,4>()
  ✅ avalanche::<B,10>()
  ✅ avalanche::<B,32>()
  ✅ avalanche::<B,128>()
  ✅ avalanche::<B,512>()
  ✅ distribution_values::<B,4>(128*128)
  ✅ distribution_values::<B,16>(128*128)
  ✅ distribution_values::<B,128>(128*128)
  ✅ distribution_values::<B,512>(128*128)
  ✅ distribution_bits::<B,4>()
  ✅ distribution_bits::<B,16>()
  ✅ distribution_bits::<B,128>()
  ✅ distribution_bits::<B,512>()
  ✅ collisions_padded_zeroes::<B>(128*128)
  ✅ collisions_flipped_bits::<B,2>(9)
  ✅ collisions_flipped_bits::<B,3>(9)
  ✅ collisions_flipped_bits::<B,4>(7)
  ✅ collisions_flipped_bits::<B,5>(6)
  ✅ collisions_flipped_bits::<B,6>(5)
  ✅ collisions_flipped_bits::<B,7>(5)
  ✅ collisions_flipped_bits::<B,9>(4)
  ✅ collisions_flipped_bits::<B,20>(4)
  ✅ collisions_flipped_bits::<B,32>(3)
  ✅ collisions_flipped_bits::<B,64>(3)
  ✅ collisions_flipped_bits::<B,256>(2)
  ✅ collisions_permute::<B,u8>(4,&Vec::from_iter(0..16))
  ✅ collisions_permute::<B,u8>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u16>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u32>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u64>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u128>(4,&Vec::from_iter(0..16))
  ✅ collisions_permute::<B,u128>(42,&Vec::from_iter(0..64))
  ✅ collisions_powerset_bytes::<B>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ collisions_powerset_bytes::<B>(&[0,1,2,4,8,16,32,64,128])
  ✅ hasher_collisions_permute::<B,u8>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ hasher_collisions_permute::<B,u32>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ hasher_collisions_permute::<B,u32>(&[0,1,2,4,8,16,32,64,128,256])
  ✅ hasher_collisions_powerset::<B,u32>(&[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19])
  ✅ hasher_collisions_powerset::<B,u32>(&[0,1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384])

Regarding performance, the finalization made it a little slower than it was in its previous version for tiny inputs but it performs much better for larger inputs. On my ARM laptop it performs as good as gxhash for up to 32 bytes inputs (which is already substantial!) with the added benefit of better portability given the current state of simd in rust.

fxhash
  | 4 > 6539.48
  | 8 > 13078.97
  | 16 > 26157.93
  | 32 > 25803.41
  | 64 > 27817.52
  | 128 > 28108.48
  | 256 > 26995.50
  | 512 > 25193.92
  | 1024 > 22180.96
  | 2048 > 19405.99
  | 4096 > 18234.17
  | 8192 > 17716.88
  | 16384 > 17462.95
  | 32768 > 17311.70

gxhash
  | 4 > 6311.01
  | 8 > 12602.92
  | 16 > 25271.20
  | 32 > 28965.08
  | 64 > 40690.10
  | 128 > 39480.40
  | 256 > 52541.93
  | 512 > 65411.56
  | 1024 > 74398.77
  | 2048 > 78583.25
  | 4096 > 82673.24
  | 8192 > 84440.44
  | 16384 > 82664.38
  | 32768 > 84542.57

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants