Add rustc-hash to the bench? #85

cksac · 2024-06-04T16:35:02Z

ogxd · 2024-06-07T14:04:12Z

One of the reasons why such hash is not included in the benchmarks is because of its very poor quality (very bad avalanche and distribution properties). Like FNV-1a (it's a pretty similar algorithm) it does not pass most of the tests of quality benches, such as smhasher for instance.

That said, FNV-1a has been included in the benchs because of how common it is, so maybe we can do the same with FxHash.

I ran a bench on my MacBook pro and here are the results:

gxhash
  | 4 > 6311.01
  | 8 > 12602.92
  | 16 > 25271.20
  | 32 > 28965.08
  | 64 > 40690.10
  | 128 > 39480.40
  | 256 > 52541.93
  | 512 > 65411.56
  | 1024 > 74398.77
  | 2048 > 78583.25
  | 4096 > 82673.24
  | 8192 > 84440.44
  | 16384 > 82664.38
  | 32768 > 84542.57
  
fxhash
  | 4 > 11444.09
  | 8 > 8084.35
  | 16 > 13245.95
  | 32 > 16276.04
  | 64 > 18167.73
  | 128 > 16188.47
  | 256 > 12849.64
  | 512 > 9234.71
  | 1024 > 7388.30
  | 2048 > 5941.86
  | 4096 > 5375.71
  | 8192 > 5127.14
  | 16384 > 5004.31
  | 32768 > 4942.49
  
fnv-1a (for reference)
  | 4 > 2048.04
  | 8 > 2455.38
  | 16 > 2727.91
  | 32 > 2083.73
  | 64 > 1461.80
  | 128 > 1147.11
  | 256 > 922.75
  | 512 > 839.01
  | 1024 > 799.80
  | 2048 > 779.98
  | 4096 > 772.48
  | 8192 > 763.00
  | 16384 > 760.54
  | 32768 > 765.00

fxhash seems very good for 4 bytes inputs. But beware, its quality is very low and may collide a lot, so I am not sure it is a good default choice for any HashMap. For larger inputs performance quickly degrades.

In this repository there is a bench to test quality over a few criteria, here is the output for gxhash / fxhash:

> /gxhash % cargo bench --bench quality

Bench GxHash
  ✅ avalanche::<B,4>()
  ✅ avalanche::<B,10>()
  ✅ avalanche::<B,32>()
  ✅ avalanche::<B,128>()
  ✅ avalanche::<B,512>()
  ✅ distribution_values::<B,4>(128*128)
  ✅ distribution_values::<B,16>(128*128)
  ✅ distribution_values::<B,128>(128*128)
  ✅ distribution_values::<B,512>(128*128)
  ✅ distribution_bits::<B,4>()
  ✅ distribution_bits::<B,16>()
  ✅ distribution_bits::<B,128>()
  ✅ distribution_bits::<B,512>()
  ✅ collisions_padded_zeroes::<B>(128*128)
  ✅ collisions_flipped_bits::<B,2>(9)
  ✅ collisions_flipped_bits::<B,3>(9)
  ✅ collisions_flipped_bits::<B,4>(7)
  ✅ collisions_flipped_bits::<B,5>(6)
  ✅ collisions_flipped_bits::<B,6>(5)
  ✅ collisions_flipped_bits::<B,7>(5)
  ✅ collisions_flipped_bits::<B,9>(4)
  ✅ collisions_flipped_bits::<B,20>(4)
  ✅ collisions_flipped_bits::<B,32>(3)
  ✅ collisions_flipped_bits::<B,64>(3)
  ✅ collisions_flipped_bits::<B,256>(2)
  ✅ collisions_permute::<B,u8>(4,&Vec::from_iter(0..16))
  ✅ collisions_permute::<B,u8>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u16>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u32>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u64>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u128>(4,&Vec::from_iter(0..16))
  ✅ collisions_permute::<B,u128>(42,&Vec::from_iter(0..64))
  ✅ collisions_powerset_bytes::<B>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ collisions_powerset_bytes::<B>(&[0,1,2,4,8,16,32,64,128])
  ✅ hasher_collisions_permute::<B,u8>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ hasher_collisions_permute::<B,u32>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ hasher_collisions_permute::<B,u32>(&[0,1,2,4,8,16,32,64,128,256])
  ✅ hasher_collisions_powerset::<B,u32>(&[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19])
  ✅ hasher_collisions_powerset::<B,u32>(&[0,1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384])
  
Bench FxHasher (rustc_hash)
  ❌ avalanche::<B,4>()
     | Score: 0.19. Expected is 0.
  ❌ avalanche::<B,10>()
     | Score: 0.01. Expected is 0.
  ❌ avalanche::<B,32>()
     | Score: 0.11. Expected is 0.
  ❌ avalanche::<B,128>()
     | Score: 0.03. Expected is 0.
  ❌ avalanche::<B,512>()
     | Score: 0.01. Expected is 0.
  ✅ distribution_values::<B,4>(128*128)
  ✅ distribution_values::<B,16>(128*128)
  ✅ distribution_values::<B,128>(128*128)
  ✅ distribution_values::<B,512>(128*128)
  ✅ distribution_bits::<B,4>()
  ✅ distribution_bits::<B,16>()
  ✅ distribution_bits::<B,128>()
  ✅ distribution_bits::<B,512>()
  ❌ collisions_padded_zeroes::<B>(128*128)
     | Score: 0.99993896484375. Expected is 0.
  ✅ collisions_flipped_bits::<B,2>(9)
  ✅ collisions_flipped_bits::<B,3>(9)
  ✅ collisions_flipped_bits::<B,4>(7)
  ✅ collisions_flipped_bits::<B,5>(6)
  ✅ collisions_flipped_bits::<B,6>(5)
  ✅ collisions_flipped_bits::<B,7>(5)
  ❌ collisions_flipped_bits::<B,9>(4)
     | Score: 0.11120755156228948. Expected is 0.
  ❌ collisions_flipped_bits::<B,20>(4)
     | Score: 0.09822390132156604. Expected is 0.
  ❌ collisions_flipped_bits::<B,32>(3)
     | Score: 0.06613427110477443. Expected is 0.
  ❌ collisions_flipped_bits::<B,64>(3)
     | Score: 0.0719024799632759. Expected is 0.
  ❌ collisions_flipped_bits::<B,256>(2)
     | Score: 0.0523635517880522. Expected is 0.
  ✅ collisions_permute::<B,u8>(4,&Vec::from_iter(0..16))
  ✅ collisions_permute::<B,u8>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u16>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u32>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u64>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u128>(4,&Vec::from_iter(0..16))
  ✅ collisions_permute::<B,u128>(42,&Vec::from_iter(0..64))
  ❌ collisions_powerset_bytes::<B>(&[0,1,2,3,4,5,6,7,8,9])
     | Score: 0.0009765625. Expected is 0.
  ❌ collisions_powerset_bytes::<B>(&[0,1,2,4,8,16,32,64,128])
     | Score: 0.001953125. Expected is 0.
  ✅ hasher_collisions_permute::<B,u8>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ hasher_collisions_permute::<B,u32>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ hasher_collisions_permute::<B,u32>(&[0,1,2,4,8,16,32,64,128,256])
  ✅ hasher_collisions_powerset::<B,u32>(&[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19])
  ✅ hasher_collisions_powerset::<B,u32>(&[0,1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384])

DaniPopes · 2024-06-27T15:50:14Z

FYI rustc-hash version 2 was released which ships a different implementation (rust-lang/rustc-hash#37)

ogxd · 2024-06-27T23:08:38Z

The test suite is nowhere nearly a good as smhasher's one but this new version rustc_hash is much more robust and passes all quality tests in this repository 👍

Bench FxHasher (rustc_hash)
  ✅ avalanche::<B,4>()
  ✅ avalanche::<B,10>()
  ✅ avalanche::<B,32>()
  ✅ avalanche::<B,128>()
  ✅ avalanche::<B,512>()
  ✅ distribution_values::<B,4>(128*128)
  ✅ distribution_values::<B,16>(128*128)
  ✅ distribution_values::<B,128>(128*128)
  ✅ distribution_values::<B,512>(128*128)
  ✅ distribution_bits::<B,4>()
  ✅ distribution_bits::<B,16>()
  ✅ distribution_bits::<B,128>()
  ✅ distribution_bits::<B,512>()
  ✅ collisions_padded_zeroes::<B>(128*128)
  ✅ collisions_flipped_bits::<B,2>(9)
  ✅ collisions_flipped_bits::<B,3>(9)
  ✅ collisions_flipped_bits::<B,4>(7)
  ✅ collisions_flipped_bits::<B,5>(6)
  ✅ collisions_flipped_bits::<B,6>(5)
  ✅ collisions_flipped_bits::<B,7>(5)
  ✅ collisions_flipped_bits::<B,9>(4)
  ✅ collisions_flipped_bits::<B,20>(4)
  ✅ collisions_flipped_bits::<B,32>(3)
  ✅ collisions_flipped_bits::<B,64>(3)
  ✅ collisions_flipped_bits::<B,256>(2)
  ✅ collisions_permute::<B,u8>(4,&Vec::from_iter(0..16))
  ✅ collisions_permute::<B,u8>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u16>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u32>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u64>(42,&Vec::from_iter(0..64))
  ✅ collisions_permute::<B,u128>(4,&Vec::from_iter(0..16))
  ✅ collisions_permute::<B,u128>(42,&Vec::from_iter(0..64))
  ✅ collisions_powerset_bytes::<B>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ collisions_powerset_bytes::<B>(&[0,1,2,4,8,16,32,64,128])
  ✅ hasher_collisions_permute::<B,u8>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ hasher_collisions_permute::<B,u32>(&[0,1,2,3,4,5,6,7,8,9])
  ✅ hasher_collisions_permute::<B,u32>(&[0,1,2,4,8,16,32,64,128,256])
  ✅ hasher_collisions_powerset::<B,u32>(&[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19])
  ✅ hasher_collisions_powerset::<B,u32>(&[0,1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384])

Regarding performance, the finalization made it a little slower than it was in its previous version for tiny inputs but it performs much better for larger inputs. On my ARM laptop it performs as good as gxhash for up to 32 bytes inputs (which is already substantial!) with the added benefit of better portability given the current state of simd in rust.

fxhash
  | 4 > 6539.48
  | 8 > 13078.97
  | 16 > 26157.93
  | 32 > 25803.41
  | 64 > 27817.52
  | 128 > 28108.48
  | 256 > 26995.50
  | 512 > 25193.92
  | 1024 > 22180.96
  | 2048 > 19405.99
  | 4096 > 18234.17
  | 8192 > 17716.88
  | 16384 > 17462.95
  | 32768 > 17311.70

gxhash
  | 4 > 6311.01
  | 8 > 12602.92
  | 16 > 25271.20
  | 32 > 28965.08
  | 64 > 40690.10
  | 128 > 39480.40
  | 256 > 52541.93
  | 512 > 65411.56
  | 1024 > 74398.77
  | 2048 > 78583.25
  | 4096 > 82673.24
  | 8192 > 84440.44
  | 16384 > 82664.38
  | 32768 > 84542.57

cksac changed the title ~~Add ustc-hash to the bench?~~ Add rustc-hash to the bench? Jun 4, 2024

ogxd added the bench 📈 label Jun 5, 2024

ogxd self-assigned this Jun 7, 2024

ogxd linked a pull request Jun 7, 2024 that will close this issue

Add FxHash to bench #87

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add rustc-hash to the bench? #85

Add rustc-hash to the bench? #85

cksac commented Jun 4, 2024

ogxd commented Jun 7, 2024 •

edited

Loading

DaniPopes commented Jun 27, 2024 •

edited

Loading

ogxd commented Jun 27, 2024

Add rustc-hash to the bench? #85

Add rustc-hash to the bench? #85

Comments

cksac commented Jun 4, 2024

ogxd commented Jun 7, 2024 • edited Loading

DaniPopes commented Jun 27, 2024 • edited Loading

ogxd commented Jun 27, 2024

ogxd commented Jun 7, 2024 •

edited

Loading

DaniPopes commented Jun 27, 2024 •

edited

Loading