Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shuffling tests #1

Closed
mratsim opened this issue Jul 23, 2018 · 4 comments
Closed

Shuffling tests #1

mratsim opened this issue Jul 23, 2018 · 4 comments

Comments

@mratsim
Copy link
Contributor

mratsim commented Jul 23, 2018

The cutoffs function description is the followinf:

An algorithm that we use to split up validators into groups at the start of every epoch, determining at what height they can make attestations and what shard they are making crosslinks for.

The schema:

image

A full Python implementation:

SHARD_COUNT = 1024
ETH_SUPPLY_CAP = 2**27
DEPOSIT_SIZE = 32
MAX_VALIDATOR_COUNT = ETH_SUPPLY_CAP // DEPOSIT_SIZE
EPOCH_LENGTH = 64
SLOT_DURATION = 8
MIN_COMMITTEE_SIZE = 128
END_EPOCH_GRACE_PERIOD = 8

def get_cutoffs(validator_count):
    height_cutoffs = [0]
    # EPOCH_LENGTH // phi
    cofactor = 39
    STANDARD_COMMITTEE_SIZE = MAX_VALIDATOR_COUNT // SHARD_COUNT
    # If there are not enough validators to fill a minimally
    # sized committee at every height, skip some heights
    if validator_count < EPOCH_LENGTH * MIN_COMMITTEE_SIZE:
        height_count = validator_count // MIN_COMMITTEE_SIZE or 1
        heights = [(i * cofactor) % EPOCH_LENGTH
                   for i in range(height_count)]
    # If there are enough validators, fill all the heights
    else:
        height_count = EPOCH_LENGTH
        heights = list(range(EPOCH_LENGTH))

    filled = 0
    for i in range(EPOCH_LENGTH - 1):
        if not i in heights:
            height_cutoffs.append(height_cutoffs[-1])
        else:
            filled += 1
            height_cutoffs.append(filled * validator_count // height_count)
    height_cutoffs.append(validator_count)

    # For the validators assigned to each height, split them up
    # into committees for different shards. Do not assign the
    # last END_EPOCH_GRACE_PERIOD heights in an epoch to any shards.
    shard_cutoffs = [0]
    for i in range(EPOCH_LENGTH - END_EPOCH_GRACE_PERIOD):
        size = height_cutoffs[i+1] - height_cutoffs[i]
        shards = (size + STANDARD_COMMITTEE_SIZE - 1) // STANDARD_COMMITTEE_SIZE
        pre = shard_cutoffs[-1]
        for j in range(1, shards+1):
            shard_cutoffs.append(pre + size * j // shards)

    return height_cutoffs, shard_cutoffs

(h, s) = get_cutoffs(16)

# Note: Python 2 because Apple/Mac ...
print height # [0, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16]
print len(height) # 65
print shard # [0, 16]

The Python implementation does not split like in the example
The example may be a simplification that shows the spirit of the spec but no sharding occurs because 16 validators is too small.

Nim implementation:
https://github.com/status-im/nim-beacon-chain/blob/585072ae154e3d10276b51397e707628a401c096/beacon_chain/private/helpers.nim#L35-L79

# to run from ./build
import ../beacon_chain/private/helpers
import ../beacon_chain/datatypes

let a = 16
var seed: Blake2_256_Digest

echo getShuffling(seed, a) # @[14, 9, 13, 10, 3, 6, 12, 1, 2, 0, 5, 15, 8, 7, 11, 4]
let (height, shard) = getCutoffs(a)

echo height # @[0, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16]
echo height.len # 65
echo shard # @[0, 16]
@mratsim mratsim changed the title Cutoff split function spec or example issue Shuffling tests Nov 26, 2018
@mratsim
Copy link
Contributor Author

mratsim commented Nov 26, 2018

Renaming to shuffling tests as the specs changed significantly and cutoffs are now just shuffling.

See also comments in #20.

The most comprehensive shuffling tests are by Sigp: https://github.com/sigp/lighthouse/blob/ba548e49a52687a655c61b443b6835d79c6d4236/beacon_chain/validator_shuffling/src/shuffle.rs

@mratsim mratsim mentioned this issue Nov 26, 2018
@mratsim
Copy link
Contributor Author

mratsim commented Nov 30, 2018

Also a topic for reference on the current shuffling algorithm design: ethereum/beacon_chain#57

@mratsim
Copy link
Contributor Author

mratsim commented Dec 3, 2018

As of today it seems like besides us no one is in line with the specs, which states:

Specs

shuffle

def shuffle(values: List[Any], seed: Hash32) -> List[Any]:
    """
    Returns the shuffled ``values`` with ``seed`` as entropy.
    """
    values_count = len(values)

    # Entropy is consumed from the seed in 3-byte (24 bit) chunks.
    rand_bytes = 3
    # The highest possible result of the RNG.
    rand_max = 2 ** (rand_bytes * 8) - 1

    # The range of the RNG places an upper-bound on the size of the list that
    # may be shuffled. It is a logic error to supply an oversized list.
    assert values_count < rand_max

    output = [x for x in values]
    source = seed
    index = 0
    while index < values_count - 1:
        # Re-hash the `source` to obtain a new pattern of bytes.
        source = hash(source)
        # Iterate through the `source` bytes in 3-byte chunks.
        for position in range(0, 32 - (32 % rand_bytes), rand_bytes):
            # Determine the number of indices remaining in `values` and exit
            # once the last index is reached.
            remaining = values_count - index
            if remaining == 1:
                break

            # Read 3-bytes of `source` as a 24-bit big-endian integer.
            sample_from_source = int.from_bytes(source[position:position + rand_bytes], 'big')

            # Sample values greater than or equal to `sample_max` will cause
            # modulo bias when mapped into the `remaining` range.
            sample_max = rand_max - rand_max % remaining

            # Perform a swap if the consumed entropy will not cause modulo bias.
            if sample_from_source < sample_max:
                # Select a replacement index for the current index.
                replacement_position = (sample_from_source % remaining) + index
                # Swap the current index with the replacement index.
                output[index], output[replacement_position] = output[replacement_position], output[index]
                index += 1
            else:
                # The sample causes modulo bias. A new sample should be read.
                pass

    return output

split

def split(values: List[Any], split_count: int) -> List[Any]:
    """
    Splits ``values`` into ``split_count`` pieces.
    """
    list_length = len(values)
    return [
        values[(list_length * i // split_count): (list_length * (i + 1) // split_count)]
        for i in range(split_count)
    ]

clamp

def clamp(minval: int, maxval: int, x: int) -> int:
    """
    Clamps ``x`` between ``minval`` and ``maxval``.
    """
    if x <= minval:
        return minval
    elif x >= maxval:
        return maxval
    else:
        return x

get_new_shuffling

def get_new_shuffling(seed: Hash32,
                      validators: List[ValidatorRecord],
                      crosslinking_start_shard: int) -> List[List[ShardAndCommittee]]:
    """
    Shuffles ``validators`` into shard committees using ``seed`` as entropy.
    """
    active_validator_indices = get_active_validator_indices(validators)

    committees_per_slot = clamp(
        1,
        SHARD_COUNT // EPOCH_LENGTH,
        len(active_validator_indices) // EPOCH_LENGTH // TARGET_COMMITTEE_SIZE,
    )

    # Shuffle with seed
    shuffled_active_validator_indices = shuffle(active_validator_indices, seed)

    # Split the shuffled list into epoch_length pieces
    validators_per_slot = split(shuffled_active_validator_indices, EPOCH_LENGTH)

    output = []
    for slot, slot_indices in enumerate(validators_per_slot):
        # Split the shuffled list into committees_per_slot pieces
        shard_indices = split(slot_indices, committees_per_slot)

        shard_id_start = crosslinking_start_shard + slot * committees_per_slot

        shards_and_committees_for_slot = [
            ShardAndCommittee(
                shard=(shard_id_start + shard_position) % SHARD_COUNT,
                committee=indices,
                total_validator_count=len(active_validator_indices),
            )
            for shard_position, indices in enumerate(shard_indices)
        ]
        output.append(shards_and_committees_for_slot)

    return output

Testing

Is a bit tricky due to the random seed, the test vectors should be produced by a literal copy of the python specs with the random seed.

Our implementation

https://github.com/status-im/nim-beacon-chain/blob/0d775eefcb17e528429d123e999c47d567eaafaf/beacon_chain/spec/helpers.nim#L12-L66

https://github.com/status-im/nim-beacon-chain/blob/0d775eefcb17e528429d123e999c47d567eaafaf/beacon_chain/spec/validator.nim#L80-L111

Ethereum/beacon_chain repo

It is still using the old dynasty, and seems like development is now down in Py-EVM.

def shuffle(lst: List[Any],
            seed: Hash32,
            config: Dict[str, Any]=DEFAULT_CONFIG) -> List[Any]:
    lst_count = len(lst)
    assert lst_count <= 16777216
    o = [x for x in lst]
    source = seed
    i = 0
    while i < lst_count:
        source = blake(source)
        for pos in range(0, 30, 3):
            m = int.from_bytes(source[pos:pos+3], 'big')
            remaining = lst_count - i
            if remaining == 0:
                break
            rand_max = 16777216 - 16777216 % remaining
            if m < rand_max:
                replacement_pos = (m % remaining) + i
                o[i], o[replacement_pos] = o[replacement_pos], o[i]
                i += 1
    return o


def split(lst: List[Any], N: int) -> List[Any]:
    list_length = len(lst)
    return [
        lst[(list_length * i // N): (list_length * (i+1) // N)] for i in range(N)
    ]


def get_new_shuffling(seed: Hash32,
                      validators: List['ValidatorRecord'],
                      dynasty: int,
                      crosslinking_start_shard: ShardId,
                      config: Dict[str, Any]=DEFAULT_CONFIG) -> List[List[ShardAndCommittee]]:
    cycle_length = config['cycle_length']
    min_committee_size = config['min_committee_size']
    shard_count = config['shard_count']
    avs = get_active_validator_indices(dynasty, validators)
    if len(avs) >= cycle_length * min_committee_size:
        committees_per_slot = len(avs) // cycle_length // (min_committee_size * 2) + 1
        slots_per_committee = 1
    else:
        committees_per_slot = 1
        slots_per_committee = 1
        while (len(avs) * slots_per_committee < cycle_length * min_committee_size and
               slots_per_committee < cycle_length):
            slots_per_committee *= 2
    o = []

    shuffled_active_validator_indices = shuffle(avs, seed, config)
    validators_per_slot = split(shuffled_active_validator_indices, cycle_length)
    for slot, slot_indices in enumerate(validators_per_slot):
        shard_indices = split(slot_indices, committees_per_slot)
        shard_id_start = crosslinking_start_shard + (
            slot * committees_per_slot // slots_per_committee
        )
        o.append([ShardAndCommittee(
            shard_id=(shard_id_start + j) % shard_count,
            committee=indices
        ) for j, indices in enumerate(shard_indices)])
    return o

Py-EVM

also uses dynasty in get_new_shuffling

https://github.com/ethereum/py-evm/blob/f2d0d5d187400ba46a6b8f5b1f1c9997dc7fbb5a/eth/beacon/utils/random.py#L27-L86

def shuffle(values: Sequence[Any],
            seed: Hash32) -> Iterable[Any]:
    """
    Returns the shuffled ``values`` with seed as entropy.
    Mainly for shuffling active validators in-protocol.
    Spec: https://github.com/ethereum/eth2.0-specs/blob/0941d592de7546a428066c0473fd1000a7e3e3af/specs/beacon-chain.md#helper-functions  # noqa: E501
    """
    values_count = len(values)

    if values_count > SAMPLE_RANGE:
        raise ValueError(
            "values_count (%s) should less than or equal to SAMPLE_RANGE (%s)." %
            (values_count, SAMPLE_RANGE)
        )

    output = [x for x in values]
    source = seed
    index = 0
    while index < values_count:
        # Re-hash the source
        source = blake(source)
        for position in range(0, 30, 3):  # gets indices 3 bytes at a time
            # Select a 3-byte sampled int
            sample_from_source = int.from_bytes(source[position:position + 3], 'big')
            # `remaining` is the size of remaining indices of this round
            remaining = values_count - index
            if remaining == 0:
                break

            # Set a random maximum bound of sample_from_source
            rand_max = SAMPLE_RANGE - SAMPLE_RANGE % remaining

            # Select `replacement_position` with the given `sample_from_source` and `remaining`
            if sample_from_source < rand_max:
                # Use random number to get `replacement_position`, where it's not `index`
                replacement_position = (sample_from_source % remaining) + index
                # Swap the index-th and replacement_position-th elements
                (output[index], output[replacement_position]) = (
                    output[replacement_position],
                    output[index]
                )
                index += 1
            else:
                pass

    return output


@to_tuple
def split(seq: Sequence[TItem], pieces: int) -> Iterable[Any]:
    """
    Returns the split ``seq`` in ``pieces`` pieces in protocol.
    Spec: https://github.com/ethereum/eth2.0-specs/blob/0941d592de7546a428066c0473fd1000a7e3e3af/specs/beacon-chain.md#helper-functions  # noqa: E501
    """
    list_length = len(seq)
    return [
        seq[(list_length * i // pieces): (list_length * (i + 1) // pieces)]
        for i in range(pieces)
    ]

https://github.com/ethereum/py-evm/blob/f2d0d5d187400ba46a6b8f5b1f1c9997dc7fbb5a/eth/beacon/helpers.py#L272-L344

@to_tuple
def get_new_shuffling(*,
                      seed: Hash32,
                      validators: Sequence['ValidatorRecord'],
                      dynasty: int,
                      crosslinking_start_shard: int,
                      cycle_length: int,
                      min_committee_size: int,
                      shard_count: int) -> Iterable[Iterable[ShardAndCommittee]]:
    """
    Return shuffled ``shard_and_committee_for_slots`` (``[[ShardAndCommittee]]``) of
    the given active ``validators``.
    Two-dimensional:
    The first layer is ``slot`` number
        ``shard_and_committee_for_slots[slot] -> [ShardAndCommittee]``
    The second layer is ``shard_indices`` number
        ``shard_and_committee_for_slots[slot][shard_indices] -> ShardAndCommittee``
    Example:
        validators:
            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
        After shuffling:
            [6, 0, 2, 12, 14, 8, 10, 4, 9, 1, 5, 13, 15, 7, 3, 11]
        Split by slot:
            [
                [6, 0, 2, 12, 14], [8, 10, 4, 9, 1], [5, 13, 15, 7, 3, 11]
            ]
        Split by shard:
            [
                [6, 0], [2, 12, 14], [8, 10], [4, 9, 1], [5, 13, 15] ,[7, 3, 11]
            ]
        Fill to output:
            [
                # slot 0
                [
                    ShardAndCommittee(shard_id=0, committee=[6, 0]),
                    ShardAndCommittee(shard_id=1, committee=[2, 12, 14]),
                ],
                # slot 1
                [
                    ShardAndCommittee(shard_id=2, committee=[8, 10]),
                    ShardAndCommittee(shard_id=3, committee=[4, 9, 1]),
                ],
                # slot 2
                [
                    ShardAndCommittee(shard_id=4, committee=[5, 13, 15]),
                    ShardAndCommittee(shard_id=5, committee=[7, 3, 11]),
                ],
            ]
    NOTE: The spec might be updated to output an array rather than an array of arrays.
    """
    active_validators = get_active_validator_indices(dynasty, validators)
    active_validators_size = len(active_validators)
    committees_per_slot = clamp(
        1,
        shard_count // cycle_length,
        active_validators_size // cycle_length // (min_committee_size * 2) + 1,
    )
    shuffled_active_validator_indices = shuffle(active_validators, seed)

    # Split the shuffled list into cycle_length pieces
    validators_per_slot = split(shuffled_active_validator_indices, cycle_length)
    for index, slot_indices in enumerate(validators_per_slot):
        # Split the shuffled list into committees_per_slot pieces
        shard_indices = split(slot_indices, committees_per_slot)
        shard_id_start = crosslinking_start_shard + index * committees_per_slot
        yield _get_shards_and_committees_for_shard_indices(
            shard_indices,
            shard_id_start,
            shard_count,
        )

Sigp

Sigp is still using the old min_committe_size instead of target committee size

https://github.com/sigp/lighthouse/blob/ba548e49a52687a655c61b443b6835d79c6d4236/beacon_chain/validator_shuffling/src/shuffle.rs

/// Given the validator list, delegates the validators into slots and comittees for a given cycle.
fn generate_cycle(
    validator_indices: &[usize],
    shard_indices: &[usize],
    crosslinking_shard_start: usize,
    cycle_length: usize,
    min_committee_size: usize,
) -> Result<DelegatedCycle, ValidatorAssignmentError> {
    let validator_count = validator_indices.len();
    let shard_count = shard_indices.len();

    if shard_count / cycle_length == 0 {
        return Err(ValidatorAssignmentError::TooFewShards);
    }

    let (committees_per_slot, slots_per_committee) = {
        if validator_count >= cycle_length * min_committee_size {
            let committees_per_slot = min(
                validator_count / cycle_length / (min_committee_size * 2) + 1,
                shard_count / cycle_length,
            );
            let slots_per_committee = 1;
            (committees_per_slot, slots_per_committee)
        } else {
            let committees_per_slot = 1;
            let mut slots_per_committee = 1;
            while (validator_count * slots_per_committee < cycle_length * min_committee_size)
                & (slots_per_committee < cycle_length)
            {
                slots_per_committee *= 2;
            }
            (committees_per_slot, slots_per_committee)
        }
    };

    let cycle = validator_indices
        .honey_badger_split(cycle_length)
        .enumerate()
        .map(|(i, slot_indices)| {
            let shard_start =
                crosslinking_shard_start + i * committees_per_slot / slots_per_committee;
            slot_indices
                .honey_badger_split(committees_per_slot)
                .enumerate()
                .map(|(j, shard_indices)| ShardAndCommittee {
                    shard: ((shard_start + j) % shard_count) as u16,
                    committee: shard_indices.to_vec(),
                }).collect()
        }).collect();
    Ok(cycle)
}

@tersec
Copy link
Contributor

tersec commented Feb 22, 2019

The new shuffling approach should probably, though tests remain always useful, live in a new issue. Closing.

@tersec tersec closed this as completed Feb 22, 2019
stefantalpalaru added a commit that referenced this issue Oct 31, 2019
cheatfate added a commit that referenced this issue Jul 19, 2021
tersec pushed a commit that referenced this issue Jul 19, 2021
* Fix firstSuccess() template missing timeouts.

* Fix validator race condition.
Fix logs to be compatible with beacon_node logs.
Add CatchableError handlers to avoid crashes.
Move some logs from Notice to Debug level.
Fix some [unused] warnings.

* Fix block proposal issue for slots in the past and from the future.

* Change sent to published.

* Address review comments #1.
etan-status pushed a commit that referenced this issue May 12, 2022
etan-status pushed a commit that referenced this issue May 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants