TODO: SimpleSerialize (SSZ) spec #2
Comments
Repost the gitter discussion about integer types:
|
Good call @hwwhww! Here's our README: https://github.com/sigp/lighthouse/tree/master/ssz EDIT: Oh, I see it's already in your list :) |
This is great; I love having so much information all in one place. A few quick comments: Some of the above on uint vs int and explicit types is covered here and will hopefully clear up this small(?) issue. I'm writing a blog post on the basics of SSZ and the rationale/motivation for introducing a new serialization format. I think the biggest issue with something like this is getting people to understand why a new format is necessary. Cheers. |
One thing I did want to question: in RLP the main motivations were simplicity and guaranteed absolute byte-perfect consistency. Simplicity is obviously carried over, and it seems that guaranteed byte-perfect consistency is also part of SimpleSerialize. Can someone confirm this is the case? |
I have made an archaeological discovery that might shed some light on this issue.
Buterin, V. 2014 RLP v.s. external technologies |
Here is the Nim implementation, simple tests and the discussion for a common test format. One thing that I'm not too sure of regarding current SSZ is how container types are serialized. For example:
# CrystallizedState
fields = {
# List of validators
'validators': [ValidatorRecord],
# Last CrystallizedState recalculation
'last_state_recalc': 'int64',
# What active validators are part of the attester set
# at what slot, and in what shard. Starts at slot
# last_state_recalc - CYCLE_LENGTH
'shard_and_committee_for_slots': [[ShardAndCommittee]],
# The last justified slot
'last_justified_slot': 'int64',
# Number of consecutive justified slots ending at this one
'justified_streak': 'int64',
# The last finalized slot
'last_finalized_slot': 'int64',
# The current dynasty
'current_dynasty': 'int64',
# Records about the most recent crosslink for each shard
'crosslink_records': [CrosslinkRecord],
# Used to select the committees for each shard
'dynasty_seed': 'hash32',
# Start of the current dynasty
'dynasty_start': 'int64'
}
# ValidatorRecord
fields = {
# The validator's public key
'pubkey': 'int256',
# What shard the validator's balance will be sent to
# after withdrawal
'withdrawal_shard': 'int16',
# And what address
'withdrawal_address': 'address',
# The validator's current RANDAO beacon commitment
'randao_commitment': 'hash32',
# Current balance
'balance': 'int128',
# Dynasty where the validator is inducted
'start_dynasty': 'int64',
# Dynasty where the validator leaves
'end_dynasty': 'int64'
}
# ShardAndCommittee
fields = {
# The shard ID
'shard_id': 'int16',
# Validator indices
'committee': ['int24']
}
# CrosslinkRecord
fields = {
# What dynasty the crosslink was submitted in
'dynasty': 'int64',
# What slot
'slot': 'int64',
# The block hash
'hash': 'hash32'
} |
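To make the container question concrete, here is a minimal sketch of one possible rule set. Everything in it is an assumption for illustration, not the SSZ spec: fields are written in declared order, integers are unsigned big-endian at the width their name suggests, and lists and containers get a 4-byte big-endian length prefix. `FIXED_SIZES` and `serialize` are hypothetical names.

```python
# Hypothetical byte widths for the fixed-size integer types named above.
FIXED_SIZES = {"int16": 2, "int24": 3, "int64": 8, "int128": 16, "int256": 32}


def serialize(value, typ):
    """Serialize `value` as `typ` under the illustrative rules described above."""
    if isinstance(typ, str) and typ in FIXED_SIZES:
        # Assumption: unsigned, big-endian; negative values raise OverflowError.
        return value.to_bytes(FIXED_SIZES[typ], "big")
    if typ == "hash32":
        assert len(value) == 32
        return value
    if typ == "address":
        assert len(value) == 20
        return value
    if isinstance(typ, list):
        # ['int24'] means "list of int24": concatenate items, then length-prefix.
        body = b"".join(serialize(item, typ[0]) for item in value)
        return len(body).to_bytes(4, "big") + body
    if isinstance(typ, dict):
        # Container: fields in declared (dict insertion) order, then length-prefix.
        body = b"".join(serialize(value[name], ftype) for name, ftype in typ.items())
        return len(body).to_bytes(4, "big") + body
    raise TypeError(f"unsupported type: {typ!r}")


SHARD_AND_COMMITTEE = {"shard_id": "int16", "committee": ["int24"]}
encoded = serialize({"shard_id": 1, "committee": [5, 6]}, SHARD_AND_COMMITTEE)
print(encoded.hex())
```

Whichever field order is chosen (declared or sorted by name), it has to be fixed by the spec, otherwise two implementations can produce different bytes for the same object.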
Some thoughts from the past 2 days:
Type prefixes
Replying to @AlexeyAkhunov ethereum/beacon_chain#94 and ethereum/eth2.0-pm#8. As mentioned by @raulk in the ethereum/eth2.0-pm#8 call, Protocol Labs (libp2p) will probably develop specialized Wireshark dissectors for Eth2.0, which should help traffic analysis a lot. However, I do think type prefixes would be useful at the top level. For instance, let's look at ActiveState and CrystallizedState:
# ActiveState
fields = {
# Attestations that have not yet been processed
'pending_attestations': [AttestationRecord],
# Most recent 2 * CYCLE_LENGTH block hashes, older to newer
'recent_block_hashes': ['hash32']
}
# CrystallizedState
fields = {
# List of validators
'validators': [ValidatorRecord],
# Last CrystallizedState recalculation
'last_state_recalc': 'int64',
# What active validators are part of the attester set
# at what slot, and in what shard. Starts at slot
# last_state_recalc - CYCLE_LENGTH
'shard_and_committee_for_slots': [[ShardAndCommittee]],
# The last justified slot
'last_justified_slot': 'int64',
# Number of consecutive justified slots ending at this one
'justified_streak': 'int64',
# The last finalized slot
'last_finalized_slot': 'int64',
# The current dynasty
'current_dynasty': 'int64',
# Records about the most recent crosslink for each shard
'crosslink_records': [CrosslinkRecord],
# Used to select the committees for each shard
'dynasty_seed': 'hash32',
# Start of the current dynasty
'dynasty_start': 'int64'
}
To be able to deserialize and distinguish between those efficiently, you would need to send or agree on the types beforehand. Also, having type prefixes for top-level types would allow concatenating multiple messages into a single one. I don't think we should use type prefixes for 'hash32' or 'address' unless they are serialized stand-alone; that would add a lot of bytes, especially for types that can appear in a list.
On schema version
From ethereum/beacon_chain#94 as well. I think we should have 2 bytes at the start to encode the SSZ major.minor version.
On length prefix vs length suffix
ethereum/beacon_chain#94 and ethereum/eth2.0-pm#8
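Going back to the type-prefix and schema-version points, here is a rough sketch of what a top-level envelope could look like. All of it is assumed for illustration: the 7-byte header layout (major, minor, type tag, 4-byte payload length), the tag values, and the names `wrap` and `unwrap_all` are hypothetical, and the version is repeated per message here only to keep the example short.

```python
import struct

SSZ_VERSION = (0, 1)  # hypothetical major.minor
TYPE_TAGS = {"ActiveState": 1, "CrystallizedState": 2}  # hypothetical tag values
TAG_NAMES = {tag: name for name, tag in TYPE_TAGS.items()}
HEADER = struct.Struct(">BBBI")  # major, minor, type tag, payload length


def wrap(type_name, payload):
    """Prefix an already-serialized top-level object with version, tag and length."""
    major, minor = SSZ_VERSION
    return HEADER.pack(major, minor, TYPE_TAGS[type_name], len(payload)) + payload


def unwrap_all(buf):
    """Split a buffer of concatenated envelopes into (type_name, payload) pairs."""
    messages = []
    pos = 0
    while pos < len(buf):
        major, minor, tag, length = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        if (major, minor) != SSZ_VERSION:
            raise ValueError(f"unsupported version {major}.{minor}")
        if tag not in TAG_NAMES:
            raise ValueError(f"unknown type tag {tag}")
        if pos + length > len(buf):
            raise ValueError("truncated payload")
        messages.append((TAG_NAMES[tag], buf[pos:pos + length]))
        pos += length
    return messages


stream = wrap("ActiveState", b"\x01\x02") + wrap("CrystallizedState", b"")
print(unwrap_all(stream))
```

Only the outermost object carries a tag, so nested 'hash32' or 'address' values inside lists pay no extra bytes.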
|
A community member has made progress with developing Wireshark dissectors for some basic libp2p protocols, like SecIO (security), Multistream (protocol selection), Yamux (multiplexing), etc. However, the challenge is accessing the session secrets to further decrypt the payload. We've briefly discussed that in this issue: libp2p/specs#46. Wireshark uses an "onion" approach to dissect frames. It calls dissectors iteratively to decapsulate the different layers in the message. At one point the dissection is going to encounter an application specific protocol in the payload (eth2.0 protocol), and to decrypt it, a Wireshark dissector should be registered for it. It's best if those dissectors are developed and maintained by the Ethereum 2.0 community, as they are application-specific and they'll need to evolve in lockstep with the protocol. EDIT: the existing dissectors are likely outdated and not exhaustively tested. EDIT 2: Contributions super welcome. Let’s buidl this tooling together. |
the argument is that you don't keep all data in memory at all - ie not even in a vector. Consider a linked list - there are two common implementations - one that has
This depends - if you know the size of the full buffer (as is common when working with a datagram-style protocol, or if like above, you're storing the total size somewhere), you can work backwards from there to figure out the positions of the fields (just like when it's at the front, and you're implicitly assuming the buffer starts at "0"). |
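A small sketch of the "work backwards" idea: if each item is followed by its length and the total buffer size is known, the decoder can start at the end and walk towards offset 0. The 4-byte big-endian suffix and the names `encode_with_suffix` and `decode_from_end` are assumptions made for this example only.

```python
def encode_with_suffix(items):
    """Write each item followed by its 4-byte big-endian length (a length suffix)."""
    out = bytearray()
    for item in items:
        out += item
        out += len(item).to_bytes(4, "big")
    return bytes(out)


def decode_from_end(buf):
    """Recover the items by walking backwards from the known end of the buffer."""
    items = []
    end = len(buf)
    while end > 0:
        if end < 4:
            raise ValueError("buffer too short for a length suffix")
        length = int.from_bytes(buf[end - 4:end], "big")
        start = end - 4 - length
        if start < 0:
            raise ValueError("length suffix points before the start of the buffer")
        items.append(buf[start:end - 4])
        end = start
    items.reverse()  # items were collected last-to-first
    return items


roundtrip = decode_from_end(encode_with_suffix([b"ab", b"", b"xyz"]))
print(roundtrip)
```

The trade-off is that a suffix lets the writer stream an item without knowing its size up front, while the reader needs the whole buffer (or at least its end) before it can start.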
Based on my understanding, one of the priorities of the format was efficient in-memory access (along with a willingness to pay a price in terms of space efficiency). I just went through the exercise of hardening our implementation against overruns and misaligned pointers (status-im/nimbus-eth2#7 - hopefully caught all of them 😄). The following things are noteworthy when writing cross-platform code:
A possible solution to this issue is to introduce a rule to the serializer that ensures that padding is mandatory and deterministic:
For a good example of a well-aligned format, check out https://google.github.io/flatbuffers/flatbuffers_internals.html It is worth noting that modern x86 processors have mostly solved unaligned access and the performance penalty is small. Nonetheless, for several cases of optimized access, even x86 requires aligned data as the optimization level of the compiler gets cranked up - SSE4.2 for example, as seen here: http://pzemtsov.github.io/2016/11/06/bug-story-alignment-on-x86.html |
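As an illustration of what "mandatory and deterministic" padding could mean, the sketch below aligns every fixed-size integer to a multiple of its own size, always pads with zero bytes, and makes the decoder reject anything else. The alignment rule, the big-endian encoding and the names `align_up`, `pack_aligned` and `unpack_aligned` are assumptions for this example, not a proposal from the thread.

```python
def align_up(offset, alignment):
    """Round `offset` up to the next multiple of `alignment`."""
    return (offset + alignment - 1) // alignment * alignment


def pack_aligned(fields):
    """Pack (value, size) pairs; each field starts at a multiple of its size."""
    out = bytearray()
    for value, size in fields:
        out += b"\x00" * (align_up(len(out), size) - len(out))  # zero padding only
        out += value.to_bytes(size, "big")
    return bytes(out)


def unpack_aligned(buf, sizes):
    """Inverse of pack_aligned; rejects non-zero padding so encodings stay unique."""
    values = []
    pos = 0
    for size in sizes:
        start = align_up(pos, size)
        if start + size > len(buf):
            raise ValueError("buffer too short")
        if any(buf[pos:start]):
            raise ValueError("non-zero padding bytes")
        values.append(int.from_bytes(buf[start:start + size], "big"))
        pos = start + size
    if pos != len(buf):
        raise ValueError("trailing bytes")
    return values


packed = pack_aligned([(1, 1), (2, 8), (3, 2)])
print(len(packed), unpack_aligned(packed, [1, 8, 2]))
```

Rejecting non-zero padding matters for hashing: if padding bytes were free-form, the same logical object could serialize to many different byte strings.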
I was reading through the current spec. Couple of thoughts:
@arnetheduck Here's another nice example of a well-designed serialization format meant for fast mmapping: https://capnproto.org/index.html I do think that a format that takes alignment into consideration will necessarily not be 'simple'.
@recmo |
@arnetheduck I know it handles alignment, that's why I brought it up. :) The point I failed to make was that handling alignment and creating a "Simple" format are somewhat conflicting goals. |
Let's take the discussion of alignment and its relevance to another issue if necessary |
…diasg First pass review
Open an issue for following up our discussion on gitter.
Specification requirements
Anything else? :)
Reference
cc @vbuterin @djrtwo @arnetheduck @mratsim @paulhauner @NatoliChris @PoSeyy