TL;DR: I think there are two bugs in the shuffling function. I suggest how to fix them. I also built a "sandbox" for testing shuffling functions.
I recently went on a deep-dive into shuffling functions to figure out why the beacon_chain one gets stuck in a loop. I have some findings which I will report here.
I believe there are two errors in the present implementation:
Modulo bias when pulling a "random" list index from the seed.
There is one unnecessary extra iteration in the swap.
I'll give a quick intro to the Fisher-Yates shuffle and modulo bias for some background. I found it interesting, so you might too.
Background
Durstenfeld-Knuth-Fisher-Yates Shuffle
This is a method of shuffling a list "introduced" by Richard Durstenfeld in 1964 and "popularized" by Donald E. Knuth in The Art of Computer Programming. Apparently it's similar to work done by Ronald Fisher and Frank Yates in 1938. More @ wikipedia.
History aside, it looks like this:
```
-- To shuffle an array a of n elements (indices 0..n-1):
for i from 0 to n-2 do
    j ← random integer such that i ≤ j < n
    exchange a[i] and a[j]
```
It seems that this is the primitive for the shuffling algorithm used in the spec, so let's assume this is what we're going to use. Seems optimal.
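For concreteness, here is that pseudocode as a small Python function. This is purely illustrative; the spec derives `j` from a seeded hash rather than Python's `random` module.

```python
import random

def fisher_yates_shuffle(a: list) -> None:
    """Shuffle `a` in place using the Durstenfeld variant shown above."""
    n = len(a)
    for i in range(n - 1):  # i runs 0..n-2; the last element needs no swap
        j = random.randrange(i, n)  # random j such that i <= j < n
        a[i], a[j] = a[j], a[i]
```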
Modulo-Bias
For background, modulo bias is introduced when using modulo to "fit" a random integer from a large range into a smaller range, when the size of the larger range is not a clean multiple of the size of the smaller one. In other words, `large_range % small_range > 0`.
As an example, consider a function `rng()` which generates an integer between 0..3 (inclusive) and an application which requires an integer in 0..2 (inclusive). A developer might simply call `rng() % 3`, however this introduces a bias because both `0 % 3 = 0` and `3 % 3 = 0`. This skews the post-modulo result in favor of 0. This is a Bad Thing™.
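A quick empirical check of the bias (using Python's `random` as a stand-in for the hypothetical `rng()` above):

```python
import random
from collections import Counter

RAND_MAX = 3  # rng() yields integers in 0..RAND_MAX (inclusive)

def rng() -> int:
    return random.randint(0, RAND_MAX)

# Tally a million naive `rng() % 3` draws; 0 lands roughly twice as
# often as 1 or 2, since both 0 and 3 map to 0.
print(Counter(rng() % 3 for _ in range(1_000_000)))
```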
A method to fix this is to simply filter out all "biased" results, that is, any results greater than or equal to `(RAND_MAX + 1) - ((RAND_MAX + 1) % n)`, where `RAND_MAX` defines the upper bound of an rng (0..RAND_MAX, inclusive), `n` is the size of the desired range (0..n-1), and `RAND_MAX + 1 ≥ n`. Let's apply this knowledge to the previous example and define the following function:
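(The function itself didn't survive in this copy of the issue, so the following is a sketch of its shape, reusing the `rng` and `RAND_MAX` from the example above.)

```python
import random

RAND_MAX = 3  # rng() yields integers in 0..RAND_MAX (inclusive)

def rng() -> int:
    return random.randint(0, RAND_MAX)

def filtered_rng(n: int) -> int:
    """Return an unbiased integer in 0..n-1 by discarding biased samples."""
    # Largest multiple of n the rng can cover; samples at or above this
    # bound would wrap unevenly under `% n`, so they are re-rolled.
    rand_max = (RAND_MAX + 1) - ((RAND_MAX + 1) % n)
    x = rng()
    while x >= rand_max:
        x = rng()
    return x % n
```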
The developer can now call `filtered_rng(3)`. This will give them results without modulo bias. The downside is that the function may now run for an unbounded number of iterations, because it relies on `rng` not generating certain numbers. We can assume it won't actually run forever; it'll just slow us down every now and then.
Proposed changes
I propose the following code to fix these issues (the unused config var stays for simplicity):
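(The proposed block also didn't survive in this copy. The sketch below reconstructs its shape from the sandbox and the two fixes described next; the 24-bit sample range and the `blake` helper are assumptions based on the spec, so treat names and constants as illustrative.)

```python
from hashlib import blake2b

SAMPLE_RANGE = 2 ** 24  # each 3-byte draw is an integer in 0..2**24 - 1

def blake(data: bytes) -> bytes:
    return blake2b(data).digest()[:32]

def shuffle(lst: list, seed: bytes, config=None) -> list:
    """Durstenfeld shuffle of a copy of `lst`, seeded by `seed`.

    `config` is unused; it stays for interface compatibility.
    """
    o = list(lst)
    source = seed
    i = 0
    while i < len(o) - 1:  # fix 2: the final element never needs a swap
        source = blake(source)  # expand the seed into fresh entropy
        for pos in range(0, 30, 3):  # consume the digest 3 bytes at a time
            remaining = len(o) - i
            if remaining == 1:
                break
            # Fix 1: filter per swap, against the *current* bounds,
            # rejecting samples that would bias `m % remaining`.
            rand_max = SAMPLE_RANGE - SAMPLE_RANGE % remaining
            m = int.from_bytes(source[pos:pos + 3], 'big')
            if m < rand_max:
                replacement = i + (m % remaining)
                o[i], o[replacement] = o[replacement], o[i]
                i += 1
    return o
```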
You can find this code here in a "shuffling sandbox" I made to test the conclusions I have come to.
First, to address error 1 (modulo bias), the `rand_max` calculation has been moved so it can dynamically filter the "rng" results on each call to the "rng". If you look at the shuffling pseudocode (above) you can see that the rng is required to be within the dynamic bounds `i ≤ j < n`, therefore the "modulo-biased" rng results need to be filtered relative to these per-swap bounds, not relative to the overall list size (as in the current implementation).
Secondly, to address error 2 (unnecessary iteration), the `while` loop has been changed to do one less iteration. This last iteration is unnecessary because it attempts to swap the last element in the list, at index `n-1`, with an element at an index greater than or equal to `n-1` but less than `n`. The only possible outcome is for the last element to be swapped with itself, which is meaningless.
Extra Reading
As part of this process, I made another shuffling implementation here. It is slower than the other implementations (due to abstraction overhead in Python), however it separates the logic into "shuffler" and "rng" components. This is useful for understanding the fundamental structures involved.
Optimizations
There are a couple of potential speed-ups I thought of whilst doing this; I'll test them at some point.
Only use the minimum required number of bits from the seed. E.g., for each swap consume `ceil(log2(lst_count - i))` bits. This should reduce the number of hashes at the cost of doing a little extra math. <-- After some brief testing I no longer think this is a sensible optimization.
Don't copy the list before performing the shuffle; use a shuffle method that naturally creates a new list instead of swapping elements in an existing one.
Point (2) is not very important as it would not change the outcome of the shuffle; it's just worth noting for client devs. However, point (1) would change the results of the shuffling function and would therefore need to be included in the spec. I'll come back to this once I have some benchmarks on speed increases.
As always, happy for feedback. Sorry about the essay.
FYI, I played around with optimization 1 (only use the minimum number of bits) and it doesn't seem to be feasible/sensible. The "optimization" ran orders of magnitude slower than the original version in Python.
The blake2 hashes are impressively quick, so their overhead isn't significant.
FYI, this was discussed in the last Eth 2.0 Implementers Call (approx. 2 mins ago) and the spec has been changed. The spec still does the extra iteration, however that doesn't affect the outcome of the shuffle, so it's not particularly important.