Collator Protocol Revamp Draft #616

Open
2 tasks
eskimor opened this issue Jun 12, 2023 · 18 comments
Labels
I6-meta A specific issue for grouping tasks or bugs of a specific category.

Comments

@eskimor
Member

eskimor commented Jun 12, 2023

Board (to be filled): https://github.com/orgs/paritytech/projects/68

Note: Spam resilient accepting of messages is generally useful. So ideally we abstract that part and make it useful in other places as well. E.g. for offchain XCMP on collators.

Scope

In addition to the collator protocol changes below, it would be good to take the opportunity to revamp the whole backing pipeline. Those subsystems are very much interconnected, and merging them into one subsystem with a hierarchical pipeline structure would allow for better code encapsulation, improving maintainability and making the code easier to reason about. E.g. quite a few types would no longer need to be exposed globally, but could be the interface of concrete modules, which

(a) makes the scope and purpose of those types immediately clear to the reader.
(b) allows for better guarantees of invariants: Fields of the type can be protected and invariants can be guaranteed by the interface.

Current Status

The collator protocol as implemented has not aged very well. Code quality is modest, missing structure, separation of concerns and an overarching design. Apart from that, the connection management to collators needs improvements.

Requirements and Goals of the Collator Protocol

The purpose of the collator protocol is to provide the means for collators to deliver collations to validators. What complicates matters is that collators are not trusted and, even worse, unknown. Any computer on the internet can pretend to be a collator and try to consume validator resources. One important ramification of this is that collations are not sent by collators directly. Instead, collators announce them first and the validator can then decide whether or not to fetch them via request/response.

Therefore it is not easy for nodes to just eat all the bandwidth. Still, with lots of nodes they could open up lots of connections and cause DoS this way.

Another concern is nodes announcing a collation, but then, when the validator tries to fetch it, either delivering the data very slowly or sending invalid data. In both cases the validator would waste time and resources, which could also be used for DoS.

We would like to establish a persistent reputation system for the collator protocol, where nodes can gain the "trust" of validators over time and other collators can then get their collations to the validator via those "trusted" nodes. I am quoting the word "trust" here, because obviously we never trust anybody; what I mean here is that the validator will be willing to invest more resources for those proven peers than for others.

Even a persistent reputation system, where only some nodes need to be able to connect, needs to be bootstrapped. Matters are worse for on-demand parachains as there the bootstrapping could take even longer. For this reason there will also be the possibility of paying fees and locking DOT on the relay chain to register PeerIds which will then be in that preferred/trusted set from the very beginning.

Collator Side

The protocol on the collator side should take care of the following:

  • Find the correct backing group to connect to, based on the ParaId we are a collator for.
  • Connection management: When to connect, when to disconnect.
  • Announcing and transmitting collations as provided.

Currently connection management is tied to collations. If a collation is provided via DistributeCollation, we try to establish the required connections. This should be changed so that connection management is independent of actually provided collations. We should be instructed to establish connections for some set of ParaIds. By separating concerns here, we can connect in advance before we even have a collation and thus speed things up: Once we receive the collation we can immediately advertise.

If a collation is provided without a request to establish connections first, we can still try to connect on the fly. This means this could be a backwards compatible change.
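The separation described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ConnectForParas` is a hypothetical message name, `ParaId` is a plain integer stand-in, and `DistributeCollation` is reduced to the one field the example needs.

```rust
use std::collections::HashSet;

/// Stand-in for the real parachain identifier type (assumption).
type ParaId = u32;

/// Hypothetical messages handled by the collator side.
enum CollatorMsg {
    /// Hypothetical: request connections ahead of any collation.
    ConnectForParas(Vec<ParaId>),
    /// Simplified form of the existing `DistributeCollation` message.
    DistributeCollation { para: ParaId },
}

/// What the collator side decides to do in response to a message.
#[derive(Debug, PartialEq)]
enum Action {
    Connect(ParaId),
    Advertise(ParaId),
}

#[derive(Default)]
struct CollatorSide {
    /// Paras for which connections to the backing group were already requested.
    connected: HashSet<ParaId>,
}

impl CollatorSide {
    fn handle(&mut self, msg: CollatorMsg) -> Vec<Action> {
        let mut actions = Vec::new();
        match msg {
            CollatorMsg::ConnectForParas(paras) => {
                for para in paras {
                    // Only connect once per para.
                    if self.connected.insert(para) {
                        actions.push(Action::Connect(para));
                    }
                }
            }
            CollatorMsg::DistributeCollation { para } => {
                // Backwards compatible path: connect on the fly if nobody
                // asked for the connection beforehand.
                if self.connected.insert(para) {
                    actions.push(Action::Connect(para));
                }
                actions.push(Action::Advertise(para));
            }
        }
        actions
    }
}
```

With a prior `ConnectForParas`, a later `DistributeCollation` only yields an `Advertise` action, which is the speed-up the text is after.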

In general it should be possible for a single node to collate for multiple parachains at the same time and at least to provide collations on behalf of other collators as well. The goal is: If there is at least one "good" node able to connect to validators, there should be a path for collators to get their collation to validators. Since persistent reputations are per ParaId, being a collator/collation providing node for multiple parachains is not directly benefiting your score, but you gain a slight advantage as your connection might already be established when the next parachain you are collating for comes up. Being a collation providing node for multiple parachains is a rather weak goal; if it complicates matters we can drop it. Separating collation production from providing it to validators is a must though.

Requirements of projects like https://www.tanssi.network/ should be taken into account as well.

Validator Side

Retrieving collations from nodes/collators while being robust against the following DoS attacks:

  • Attackers opening up connections at a massive rate, making it impossible for honest nodes to even connect.
  • Attackers pretending to have a collation, but then letting the fetch time out or providing garbage data.

Separating PoV production from providing a PoV

Maintaining liveness becomes a lot easier, if we don't require the collator producing a collation also being the node advertising to the backing group. With this, instead of requiring all honest nodes to be able to connect, we reduce the problem to: "Some honest node needs to be able to connect."

Connection Management

We keep a persisted HashMap of known peers for each ParaId. This HashMap is keyed by the PeerId of the connected peer and stores a reputation score. We will keep in memory those HashMaps for any currently relevant assignments, let's call them score maps.

By default we will accept any incoming connection, up to a certain threshold of already established connections, at which point we will start becoming picky: Once the threshold is reached we will look up the PeerId of incoming connection requests in our score maps to get the score of the peer; if not found, we default to a score of 0. We will then only accept the connection if its score is higher than the connection score (see below) of at least one established connection, and we will disconnect that lower scored connection (if there are multiple, then the lowest scored). If the connecting peer is scored lower than all the live connections, we will reject the connection. In case the connecting peer has the same score as the lowest scored connection, we decide in some random fashion (reservoir sampling) whether to reject the incoming connection or terminate one of those low scored existing connections. Connections having a collation fetch in progress will be immune.
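The admission rule can be sketched roughly as follows. This is a simplified illustration, not the actual implementation: `PeerId` and `Score` are integer stand-ins, all names are hypothetical, and the reservoir sampling used for ties is reduced to a caller-supplied `coin_flip`.

```rust
use std::collections::HashMap;

// Stand-ins for the real types (assumptions for this sketch).
type PeerId = u64;
type Score = u32;

#[derive(Debug, PartialEq)]
enum Admission {
    /// Below the threshold: accept without looking at scores.
    Accept,
    /// Accept and disconnect the given lower scored peer.
    AcceptAndEvict(PeerId),
    Reject,
}

struct Connection {
    peer: PeerId,
    score: Score,
    /// Connections with a collation fetch in progress are immune.
    fetch_in_progress: bool,
}

fn admit(
    incoming: PeerId,
    score_map: &HashMap<PeerId, Score>,
    live: &[Connection],
    threshold: usize,
    coin_flip: bool,
) -> Admission {
    if live.len() < threshold {
        return Admission::Accept;
    }
    // Unknown peers default to a score of 0.
    let incoming_score = score_map.get(&incoming).copied().unwrap_or(0);
    // Lowest scored connection that is not protected by a running fetch.
    let lowest = live
        .iter()
        .filter(|c| !c.fetch_in_progress)
        .min_by_key(|c| c.score);
    match lowest {
        Some(c) if incoming_score > c.score => Admission::AcceptAndEvict(c.peer),
        // Tie: decide randomly (simplified to a coin flip here).
        Some(c) if incoming_score == c.score && coin_flip => Admission::AcceptAndEvict(c.peer),
        _ => Admission::Reject,
    }
}
```

The lookup is a single hash map access per connection request, which is what makes churning through many low scored requests cheap.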

Given that we will always only utilize relevant score maps at any given point in time, we will automatically stop accepting connections from peers associated with parachains we currently don't care about - iff there is pressure on available connection slots. This is different from the status quo, where we will immediately drop connections if the announced ParaId is not relevant right now.

Put differently: Connection management on the validator side is changed to be very permissive if available slots allow for it. We allow any incoming connection and also keep connections as long as there is no scarcity. Only then will we disconnect based on score, which is time dependent, based on current assignments.

Assuming a suitable score management, this should allow connections of at least some honest peers.

Robust Collation Fetching Problem

Once peers have successfully connected, they will usually advertise collations. Now we are facing the next problem and DoS vector: From which peer will we actually fetch the collation? Collations are heavy; we expect them to take several seconds to download. If we picked a bad peer, it could provide a very slow transmission, having us run into a timeout, with the need to try the next peer, which might do the same. Torrent style fetching of PoVs could actually help there, as we could then start fetching different chunks from all peers advertising the same collation in parallel, without wasting resources. If a peer does not deliver a particular chunk, we can cheaply request it from another peer.

For simplicity we will not start up with torrent style fetching though, but would like to resort to simpler means. One thing that immediately comes to mind, is using the peer's score also for deciding from whom to fetch. The problem: If we simply only fetch from the peer with the highest score, this leads to only one peer building up a score (if always available).

Solving this problem is harder and ties in strongly with how we determine the score, see section Determining the Score.

Connection Score

Connections, when established, get the score as determined by the entry in the current score maps of the connecting peer (peer score), but then it will change over time: At some point the current assignments go out of scope and the score maps get replaced. For peers only serving one parachain, this will usually mean the connection score will drop to 0, if the peer is serving multiple parachains it will be its score as found in the now current score maps.
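A small sketch of how a connection score could be re-derived whenever the score maps are replaced. The text does not say how to combine scores when a peer appears in several current score maps, so taking the maximum is an assumption of this sketch; the type aliases are stand-ins as well.

```rust
use std::collections::HashMap;

// Stand-ins for the real types (assumptions for this sketch).
type PeerId = u64;
type ParaId = u32;
type Score = u32;

/// Score maps of the currently relevant assignments, one per ParaId.
type ScoreMaps = HashMap<ParaId, HashMap<PeerId, Score>>;

/// Connection score of `peer` under the current score maps.
/// Assumption: if the peer is known for several paras, the best score counts.
/// A peer found in none of the current maps drops to 0.
fn connection_score(peer: PeerId, current: &ScoreMaps) -> Score {
    current
        .values()
        .filter_map(|map| map.get(&peer).copied())
        .max()
        .unwrap_or(0)
}
```

Re-evaluating this function after each change of assignments gives the time dependent behavior described above: a peer serving only a para that just went out of scope ends up with a score of 0.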

Peer Score

Proof of Work

We don't know anything about collators. In fact any libp2p node on the internet can try to connect to backing validators pretending to be a collator. This is problematic, because they would eat up available connection slots, potentially preventing any real collators from connecting. The underlying problem is that being a libp2p node (having a PeerId) is very cheap, hence it is easy to issue millions of connection requests. What we need to be able to do, is churn through them as quickly as possible. By simply looking up the PeerId in those score maps, we have a very fast way of dropping low scored peers. Therefore to be able to cause DoS you will need to build up a good score for your PeerId, which requires effort and is no longer cheap. In essence we are introducing a proof of work concept, but with the work being useful (advertising and providing collations).

Determining the Score

Finding a good algorithm for determining the peer score is challenging. We have a few conflicting requirements:

  1. Advancing the peer score should not be parallelizable: Not so much a problem for connection management, but would be a problem in collation fetching.
  2. We would like to avoid centralization, where a single peer is always chosen. This would be bad, especially if scores of all other peers had no way of advancing because of this, as then we would have a single point of failure and the system would not "learn".
  3. We would like to fetch from peers with higher scores as those have been proven and are less likely to be messing with us.

For solving (2), we could increase peer score on each valid/expected advertisement, if a corresponding collation was successfully fetched and validated from any peer. The problem here is, that this would violate (1) as we would potentially advance scores for a lot of peers at the same time. This would likely still be fine for connection management, but not for collation fetching: E.g. assume 100 bad peers in the set and we try one after the other, every time with a timeout of a few seconds: This would be a significant service degradation, which is also easy to keep up, as we can only punish on actual fetches. The good news is, a complete DoS over longer periods of time could be avoided, but an attacker could still mess with block times quite a bit.

For solving (1) we could only advance score on successful collation fetches after they passed validation. We can think of at least two potential issues with this approach:

  1. Score values will only advance very slowly. Hence those peers should be very long lived and we need a good bootstrapping story.
  2. In combination with (3): We would only improve scores of peers already having a high score.

The compromise, I am proposing, should work with async backing:

  1. We randomly pick a connected peer, preferring lower scored ones.
  2. If that peer fails to provide a good collation, we fall back to the highest scored connection.

This way, lower scored peers get an opportunity ramping up their score, but at the same time malicious actors will hardly affect quality of service. We could even make the system adaptive, by keeping track of the current error rate: If it reaches some threshold we could fallback to parallel fetches: Instead of trying a low scored peer first and then falling back to a high score peer, we download from both at the same time. The purpose of the download from the lower scored peer is only for advancing its score.
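The two-step compromise could look roughly like this. It is a sketch under assumptions: the weighting (`max_score - score + 1`) is just one possible way to prefer lower scored peers, randomness is passed in as `roll` to keep the example free of external crates, and all names are hypothetical.

```rust
// Stand-ins for the real types (assumptions for this sketch).
type PeerId = u64;
type Score = u32;

/// Returns the peer to try first and, if different, the highest scored
/// peer to fall back to. `roll` is caller-supplied randomness.
fn fetch_order(
    advertisers: &[(PeerId, Score)],
    roll: u64,
) -> Option<(PeerId, Option<PeerId>)> {
    let max_score = advertisers.iter().map(|(_, s)| *s).max()?;
    // Lower score means higher weight; every peer gets at least weight 1.
    let weights: Vec<u64> = advertisers
        .iter()
        .map(|(_, s)| u64::from(max_score - *s) + 1)
        .collect();
    let total: u64 = weights.iter().sum();
    // Weighted pick: walk the list until the ticket falls into a peer's range.
    let mut ticket = roll % total;
    let mut first = advertisers[0].0;
    for ((peer, _), w) in advertisers.iter().zip(&weights) {
        if ticket < *w {
            first = *peer;
            break;
        }
        ticket -= *w;
    }
    let best = advertisers.iter().max_by_key(|(_, s)| *s).map(|(p, _)| *p)?;
    let fallback = if best == first { None } else { Some(best) };
    Some((first, fallback))
}
```

In the adaptive variant mentioned above, the caller would start both fetches at once when the error rate is high, instead of waiting for the first one to fail.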

The downside of this algorithm is, that it will learn very slowly when under attack. Good seeding would help in those situations, see the next section.

Not providing an advertised collation in time when requested, will lower the peer's reputation the same amount as it would have been increased when providing it. Given that we only score up and down on providing/not providing valid collations that amount can likely be 1 score point.

Note: With asynchronous backing it becomes a lot easier to provide lots of valid collations (bounded, but not just 1), which are valid, but garbage in the sense that only one of them will end up getting backed on chain. Therefore it might make sense to keep track of candidates and only reward providers once we saw them backed on chain. The scraping for this is already in place.

Seeding and on-demand parachains

As mentioned, the system is learning rather slowly, especially when under attack. To make the system more robust it makes sense to offer the possibility of seeding the score map. We can provide an extrinsic for registering PeerIds on chain, which will then be used to seed the score map. By requiring a fee and a deposit for this registration, DoS attempts would become very expensive.
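Seeding from on-chain registrations might be sketched as below. The `Registration` struct, the `min_deposit` check, and the flat `seed` value are all assumptions for illustration; no such extrinsic or on-chain format is defined in this ticket.

```rust
use std::collections::HashMap;

// Stand-ins for the real types (assumptions for this sketch).
type PeerId = u64;
type Score = u32;

/// Hypothetical view of one on-chain PeerId registration.
struct Registration {
    peer: PeerId,
    deposit: u128,
}

/// Give every sufficiently backed registration at least the `seed` score.
/// Scores that were already earned and are higher are left untouched.
fn seed_score_map(
    map: &mut HashMap<PeerId, Score>,
    registrations: &[Registration],
    min_deposit: u128,
    seed: Score,
) {
    for reg in registrations {
        if reg.deposit >= min_deposit {
            let entry = map.entry(reg.peer).or_insert(0);
            *entry = (*entry).max(seed);
        }
    }
}
```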

Especially in the light of shared collator sets, where collators offer their services to multiple paras it might make sense to even make this weighted. E.g. there can be a high QoS collator service provider who asks for rather high fees, but in exchange accepts higher opportunity costs via a larger deposit - which grants it a higher initial score.

Alternatives we punt on for now

While being able to churn quickly through connection requests should get us a long way, there is actually an even more resilient way of having "good" connections: Have the validator connect to the collator! This is super robust and virtually non DoS able, the problem is that we have the requirement that we want the collator set to be open and opaque to validators. For this reason we punt on this possibility for now.

What could be sensible in a future iteration is, once we reached our connection threshold and we are dropping connection requests, we could try to establish a few connections to nodes with a known high score ourselves.

Relation to substrate peer set slots

In a first version we will implement this purely in the Polkadot code base in the collator protocol itself. We will set the limit on incoming connections to a quite high value (e.g. 3000) and then start connection dropping at a fraction of that value. Later on we might want to think of pushing this to the substrate level itself as then we could drop connections faster - e.g. right at connection establishment.

Relevant Assignments

Validators have to keep track of their core assignments. They should always maintain a view of currently relevant parachains (as given by currently active leaves) and also any upcoming parachains. We should then make sure to always have the score maps relevant to those relevant assignments loaded.

For a start we will not differentiate between current and upcoming assignments and assume they are equally important. Which should be good enough: It might seem that current assignments are more important, but in fact those might already have provided a collation, while the upcoming ones could profit a lot from the early connection establishment, especially when there is an attack where it might take a bit to establish "good" connections.
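Keeping the loaded score maps in sync with relevant assignments could be sketched like this. The `load` callback stands in for reading a persisted score map and is an assumption, as are the type aliases; current and upcoming assignments are treated equally, as described above.

```rust
use std::collections::{HashMap, HashSet};

// Stand-ins for the real types (assumptions for this sketch).
type ParaId = u32;
type PeerId = u64;
type Score = u32;

/// Drop score maps of paras that are no longer relevant and load the
/// ones for newly relevant paras.
fn refresh_loaded(
    loaded: &mut HashMap<ParaId, HashMap<PeerId, Score>>,
    current: &[ParaId],
    upcoming: &[ParaId],
    load: impl Fn(ParaId) -> HashMap<PeerId, Score>,
) {
    let relevant: HashSet<ParaId> = current.iter().chain(upcoming).copied().collect();
    // Unload everything that went out of scope.
    loaded.retain(|para, _| relevant.contains(para));
    // Load what is missing; already loaded maps are kept as they are.
    for para in relevant {
        loaded.entry(para).or_insert_with(|| load(para));
    }
}
```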

Code Structure

WIP
The when and how a connection should be established should be separated from the how and when to produce/provide a collation:

  • Pre-connect
  • Separating block producers from block providers

Cumulus Changes

To fully utilize the new protocol, parachains should gossip blocks within their own network and have multiple collators/nodes provide the collation to the backing group.

The providing node does not even have to be a collator. Technically it would be possible to have service providers maintaining a good reputation with validators to offer their "collating" services to multiple parachains.

Previous Work on this

Anything still relevant should have made it through to this ticket. Regardless, for reference, there exists an older variant of this ticket.

Alternative discussion on connection management leading to the solution proposed here.

Implementation steps

Draft, just notes for me to not forget things:

  • Write RFC
  • Implement backing tracking - only ramp up score once successfully backed on chain: Might be no silver bullet though, as we might end up "punishing" the wrong guy. Problem statement: If we assume that there is only one collator eligible at a given point in time, then the attack vector would be that the collator messes with providers by providing additional (equivalent, but not identical) collations to other providers. The provider could easily protect himself though, by detecting this behavior. Also, for the separation an incentivization scheme has to be found anyway, which should cover these things. Worth mentioning though that such an attack with the right timing would work 100% of the time. For collators being the providers (initially), the attack does not work.
@rphmeier
Contributor

rphmeier commented Jun 13, 2023

Thanks for this writeup! It would make a good RFC once the fellowship/OpenGov goes live on Polkadot.


Let's consider supporting collators which simultaneously collate for several parachains at once. A few reasons:

  • Collators which rotate across parachains as they are part of some higher-level mercenary collator pool (cc @girazoki, Tanssi is the initial real user of this functionality)
  • Collators which don't collate at all but are just responsible for distributing collations made by parachains to Polkadot validators (strengthening the "one honest node" property mentioned)
  • Collators which simultaneously collate on multiple parachains (and author blocks with atomic transactions between them, i.e. shared collators)

On another point: In light of the research I've been doing on blockspace regions I believe that collators (or even collations themselves) should tag parachain blocks with the region they are intended to occupy. We should support this in the initial design.

The goal should be for parachains to be able to hold an unbounded number of regions and for validators not to have to do the messy work of figuring out which parachain blocks are assigned to which regions (as the chain extension logic for parachains has to take this into account anyway). In the last post in that forum thread I suggested that regions should have a unique identifier, which validators receiving announcements can check to see if the region is in surplus or will be soon.

@eskimor
Member Author

eskimor commented Feb 15, 2024

Slightly related: #3168

Also consider merging all tightly coupled backing subsystems into one with encapsulated modules and nice separation of concerns - clear parent/child relationships.

@eskimor
Member Author

eskimor commented Apr 25, 2024

Async backing parameters (max depth and such), should take into account how many cores are assigned to a parachain.

@eskimor
Member Author

eskimor commented Oct 18, 2024

Update:

The original ticket still holds, except for the following:

Drop idea of relayers

It is no longer the goal that other nodes can provide a collation for you. Reasoning:

  1. The idea of separate relayer nodes sounds nice in theory, but in practice they would only be useful in attack scenarios, so it is questionable whether people would be interested in incentivizing them. Setting up that incentivization layer would also add a significant amount of complexity.
  2. This would actually require collators to not only gossip blocks, but also provide the full PoV to other nodes. Right now they don't do it, which means an attacker who wanted to gain some reputation by providing good collations would need to rebuild the PoV itself from the block, which requires time and effort. This is a good property worth keeping.

New Reputation Mechanism

We will use a much faster learning algorithm and thus are also able to punt on the bootstrapping improvements.

Faster learning

So far, building up a good reputation happened locally on each validator, due to the lack of consensus data. This means each validator has to learn individually. But since RFC-0103 we have the means to do better.

We will introduce another UMPSignal in addition to SelectCore, namely ApprovedPeer:

pub enum UMPSignal {
    /// already existing `SelectCore`
    SelectCore(CoreSelector, ClaimQueueOffset),

    /// Signal to the relay chain a peer that should be considered good by the relay chain:
    ApprovedPeer(PeerId),
}

A collator can provide that PeerId to the runtime via an inherent and the parachain runtime will put it in the commitments by sending the ApprovedPeer relay chain signal.

The relay chain will now emit events for those approved peers whenever a candidate gets backed successfully. Validators will scrape the chain for those events, and thus all validators will learn about those approved peers whenever the chain produces a block.
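As a rough illustration of the scraping side, the sketch below filters a list of signals for approved peers. The enum mirrors the one shown above, but `PeerId`, `CoreSelector` and `ClaimQueueOffset` are simple stand-in types, and how the signals actually reach the validator (event format, scraping code) is left out and not specified here.

```rust
// Stand-in types (assumptions); the real ones live in the Polkadot code base.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PeerId(u64);
#[allow(dead_code)]
struct CoreSelector(u8);
#[allow(dead_code)]
struct ClaimQueueOffset(u8);

enum UMPSignal {
    SelectCore(CoreSelector, ClaimQueueOffset),
    ApprovedPeer(PeerId),
}

/// Collect the approved peers found among the signals of backed candidates.
fn approved_peers(signals: &[UMPSignal]) -> Vec<PeerId> {
    signals
        .iter()
        .filter_map(|signal| match signal {
            UMPSignal::ApprovedPeer(peer) => Some(*peer),
            UMPSignal::SelectCore(..) => None,
        })
        .collect()
}
```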

What PeerId the collator provides does largely not matter. Usually it would be its own, but it could also use its opportunity to promote other peers of the network.

Reputation Management

Increasing reputation can now be sourced entirely from those relay chain events. Every time you see such an event for a parachain, you would increase the reputation of that peer by 1 for that parachain.

We would decrease reputation whenever a peer wasted our time: It advertised a collation that could not be fetched or turned out invalid. In those cases we would decrease the peer's reputation by the number of cores available on the system times 2. Increases happen system wide, while decreases happen per core/validator. Reputations are non-negative numbers; we don't drop below 0. We do want negative reputation changes to be more aggressive than positive ones (factor 2 above), to prevent censoring collators from pushing out new colleagues.
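The update rules above translate into a small sketch: plus 1 per observed event, minus twice the number of cores per wasted fetch, never below 0. Type aliases and method names are hypothetical stand-ins.

```rust
use std::collections::HashMap;

// Stand-ins for the real types (assumptions for this sketch).
type PeerId = u64;
type ParaId = u32;

#[derive(Default)]
struct Reputations {
    /// Reputation per (parachain, peer) pair.
    scores: HashMap<(ParaId, PeerId), u32>,
}

impl Reputations {
    /// An approved peer event was seen on the relay chain: +1.
    fn on_approved_peer_event(&mut self, para: ParaId, peer: PeerId) {
        let score = self.scores.entry((para, peer)).or_insert(0);
        *score = score.saturating_add(1);
    }

    /// The peer advertised a collation that could not be fetched or was
    /// invalid: decrease by `2 * num_cores`, saturating at 0.
    fn on_wasted_time(&mut self, para: ParaId, peer: PeerId, num_cores: u32) {
        let penalty = num_cores.saturating_mul(2);
        let score = self.scores.entry((para, peer)).or_insert(0);
        *score = score.saturating_sub(penalty);
    }

    fn get(&self, para: ParaId, peer: PeerId) -> u32 {
        self.scores.get(&(para, peer)).copied().unwrap_or(0)
    }
}
```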

Staying connected while not advertising a collation should not be punished. We can either just allow enough connections to keep all connections to peers with good reputations open, e.g. 2000 would allow for each parachain to have up to a thousand good nodes. This might work; an alternative is to disconnect connections without pending advertisements in a random fashion when running out of connections.

Reputations will be stored in a persisted LRU. The LRU should have a size of 1000 per parachain.
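A minimal in-memory sketch of such a bounded LRU for one parachain follows. Persistence is left out, eviction does a linear scan for simplicity, and the names are hypothetical; a real implementation would likely use a proper LRU structure. The sketch assumes a capacity greater than 0.

```rust
use std::collections::HashMap;

// Stand-in for the real peer identifier type (assumption).
type PeerId = u64;

/// Reputation store for one parachain, bounded to `capacity` peers.
struct ReputationLru {
    capacity: usize,
    /// Logical clock, advanced on every update.
    tick: u64,
    /// Peer -> (reputation, tick of last update).
    entries: HashMap<PeerId, (u32, u64)>,
}

impl ReputationLru {
    fn new(capacity: usize) -> Self {
        Self { capacity, tick: 0, entries: HashMap::new() }
    }

    /// Increase the reputation of `peer` by 1, evicting the least
    /// recently updated peer if the store is full.
    fn bump(&mut self, peer: PeerId) {
        self.tick += 1;
        let tick = self.tick;
        if !self.entries.contains_key(&peer) && self.entries.len() >= self.capacity {
            let oldest = self
                .entries
                .iter()
                .min_by_key(|(_, (_, last_used))| *last_used)
                .map(|(p, _)| *p);
            if let Some(oldest) = oldest {
                self.entries.remove(&oldest);
            }
        }
        let entry = self.entries.entry(peer).or_insert((0, tick));
        entry.0 = entry.0.saturating_add(1);
        entry.1 = tick;
    }

    fn get(&self, peer: PeerId) -> Option<u32> {
        self.entries.get(&peer).map(|(score, _)| *score)
    }
}
```

With the size from the text, one such store would be created as `ReputationLru::new(1000)` per parachain.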

When multiple connected peers advertise a collation for some ParaId, we will always fetch from the peer with the highest reputation (modulo cheap sanity checks, e.g. parent_head_data_hash: if the candidate forms a fork, we can ignore it right away).
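The selection rule can be sketched as a filter followed by a maximum. The `Advertisement` struct and its fields are hypothetical, and the sanity check is reduced to comparing `parent_head_data_hash` against the expected parent.

```rust
// Stand-ins for the real types (assumptions for this sketch).
type PeerId = u64;
type Hash = [u8; 32];

/// Hypothetical, simplified view of one advertisement.
struct Advertisement {
    peer: PeerId,
    reputation: u32,
    parent_head_data_hash: Hash,
}

/// Pick the advertiser to fetch from: discard candidates that do not build
/// on the expected parent, then take the highest reputation.
fn pick_advertiser(ads: &[Advertisement], expected_parent: &Hash) -> Option<PeerId> {
    ads.iter()
        .filter(|ad| &ad.parent_head_data_hash == expected_parent)
        .max_by_key(|ad| ad.reputation)
        .map(|ad| ad.peer)
}
```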

@burdges

burdges commented Oct 18, 2024

We're adding this into existing backing events, not adding new events, right?

I'd kinda assumed the PVF knew the block builder already when I suggested outputting this id, but yeah fine it's some new input instead. I'm now worried the PVF should've more control over block building though, but maybe that's another topic.

1000 parachain nodes should be overkill, but likely your LRU data structure reallocates with more memory when required, with 1000 being some hard limit?

@sandreim
Contributor

I'd kinda assumed the PVF knew the block builder already when I suggested outputting this id, but yeah fine it's some new input instead. I'm now worried the PVF should've more control over block building though, but maybe that's another topic.

I think it is fine in this case, not sure what you mean by PVF to have more control over block building.

@sandreim
Contributor

sandreim commented Oct 19, 2024

  1. The idea of separate relayer nodes sounds nice in theory, but in practice they would only be useful in attack scenarios, so it is questionable whether people would be interested in incentivizing them. Setting up that incentivization layer would also add a significant amount of complexity.
  2. This would actually require collators to not only gossip blocks, but also provide the full PoV to other nodes. Right now they don't do it, which means an attacker who wanted to gain some reputation by providing good collations would need to rebuild the PoV itself from the block, which requires time and effort. This is a good property worth keeping.

Requiring collator advertising the collation to be also the block author would be a really good simplification.

We introduce an additional PVF entry point (collator_peer_id or smth ) that just returns the peer id of the block author. We call this before executing the PVF. This will be a very fast check and allows to avoid wasteful execution and drop candidates early.
To get the PeerId we need to require collators to provide it to the collator-selection cumulus pallet.

@bkchr
Member

bkchr commented Oct 20, 2024

Reputations will be stored in a persisted LRU. The LRU should have a size of 1000 per parachain.

So, how will new parachain nodes be able to join this reputation system? If the answer is "the parachain will need to handle this somehow", you are just moving the problem to someone else.

We introduce an additional PVF entry point (collator_peer_id or smth ) that just returns the peer id of the block author. We call this before executing the PVF. This will be a very fast check and allows to avoid wasteful execution and drop candidates early.

If you already have downloaded the entire POV, you can also directly call into validate_block because one of the first checks is that the signature is valid. Not sure what else you would get from checking the peer id of the block author.

@sandreim
Contributor

sandreim commented Oct 21, 2024

We introduce an additional PVF entry point (collator_peer_id or smth ) that just returns the peer id of the block author. We call this before executing the PVF. This will be a very fast check and allows to avoid wasteful execution and drop candidates early.

If you already have downloaded the entire POV, you can also directly call into validate_block because one of the first checks is that the signature is valid. Not sure what else you would get from checking the peer id of the block author.

I was assuming this happens after the block is imported. If we do check and stop execution immediately we should also check if the provided peer id in the inherent is the one of the collator that built the block.

Currently a malicious collator needs to import the block before he can create a PoV with his own peer id in the inherent to gain some more reputation, so this makes it not possible. But if we implement transaction streaming or any kind of consensus on what goes in the next block, a malicious collator could build the same block without waiting for it to execute and potentially provide the collation fast enough to gain some free reputation.

@bkchr
Member

bkchr commented Oct 21, 2024

I was assuming this happens after the block is imported.

Imported where? I'm confused.

If we do check and stop execution immediately we should also check if the provided peer id in the inherent is the one of the collator that built the block.

I don't get this. Where do we check this?

Currently a malicious collator needs to import the block before he can create a PoV with his own peer id in the inherent to gain some more reputation, so this makes it not possible.

A malicious collator doesn't have the private key of the expected author and thus cannot sign any valid block that the validation function will accept. They cannot just add some inherent, because that would change the hash of the block and thus the signature would fail to verify the block.

But if we implement transaction streaming or any kind of consensus on what goes in the next block, a malicious collator could build the same block without waiting for it to execute and potentially provide the collation fast enough to gain some free reputation.

The same as above. The malicious collator doesn't have the private key of the expected author and cannot just add some inherent.

@eskimor
Member Author

eskimor commented Oct 21, 2024

Not sure what else you would get from checking the peer id of the block author.

You learn faster as you learn good peers from the entire network not just from your own backing. If a collation makes it through somewhere in the entire network, all learn that good peer immediately.

So, how will new parachain nodes be able to join this reputation system? If the answer is "the parachain will need to handle this somehow", you are just moving the problem to someone else.

The very first block will be best effort - just as it is now (always). Once the collator brought one block through, it is good. Another possibility is cooperation of an existing collator. There is no requirement that that output PeerId is your own, hence a friendly collator with an already sufficient reputation can instead choose to provide a PeerId of a new-joiner.

Requiring collator advertising the collation to be also the block author would be a really good simplification.

Not sure what this would gain us, apart from preventing the above. You provided a good candidate, you get the opportunity to bump the reputation of some peer. Put differently, why would we want to turn down a good collation because we received it from a different peer?

We introduce an additional PVF entry point (collator_peer_id or smth ) that just returns the peer id of the block author. We call this before executing the PVF. This will be a very fast check and allows to avoid wasteful execution and drop candidates early.

Could be a useful additional enhancement. If that entry point has a much shorter timeout than the actual PVF, we could more cheaply try out peers with a low reputation (0), to give them an opportunity to gain a reputation. In practice this should not really be an issue though: Conceptually for any non-forky parachain (e.g. Aura based) there is only one eligible block producer, hence only one collator can advertise a collation and then also being able to provide it. Not providing it will cause a reputation drop, hence you can not DoS a reputation 0 node by getting preferred, without losing reputation on your own. (Reputation losses must be larger than reputation increase for providing good collations)

By assuming block producers are PoV providers, the reputation system is made easier, because we no longer need to pick lower scored peers to give them an opportunity. We only need to pick them if no higher scored peer is advertising a collation for that chain.
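The selection rule described above could be sketched roughly as follows in Rust. This is a sketch under stated assumptions, not the subsystem's implementation: `Advertisement`, `pick_advertiser`, and the `&'static str` stand-in for a PeerId are hypothetical, and the cheap sanity checks mentioned in the proposal (e.g. on the parent head data hash) are modelled as a precomputed boolean.

```rust
/// Hypothetical stand-in for a network PeerId.
type PeerId = &'static str;

/// Illustrative view of one advertisement for a given para.
#[derive(Debug, Clone, PartialEq)]
struct Advertisement {
    peer: PeerId,
    reputation: i64,
    /// Outcome of cheap checks done before fetching (assumed precomputed).
    passes_sanity_checks: bool,
}

/// Pick the advertiser to fetch from: the highest reputation among those
/// that pass the cheap checks. A lower scored peer is chosen only when no
/// higher scored peer is advertising.
fn pick_advertiser(ads: &[Advertisement]) -> Option<&Advertisement> {
    ads.iter()
        .filter(|ad| ad.passes_sanity_checks)
        .max_by_key(|ad| ad.reputation)
}
```

Note that `max_by_key` returns the last maximum on ties, so a real implementation would need an explicit tie-breaking policy (the thread does not specify one).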

@eskimor
Member Author

eskimor commented Oct 22, 2024

Once ready, we can also use this for improving the bootstrapping story.

Technically #6173 could also replace the mechanism suggested here, but I am undecided whether that is a good idea:

  1. What is proposed here is conceptually simpler. E.g. validators don't need to connect to a parachain to retrieve reputation data. I am also pretty sure that we can get the solution sketched above implemented faster.
  2. Apart from bootstrapping, it is also more robust in practice as collators are not required to do anything "extra" to get good reputation. They just need to upgrade their nodes.
  3. (2) also translates to a better UX.
  4. Proof of DOT #6173 does not implement a reputation system in itself. We would still need to keep track of provided good and garbage collations. Doing this network wide is still beneficial: e.g. if a peer is not always able to provide a collation in time to you specifically, we would not make matters worse by banning that peer if we learned from the network that this is only a local issue.

@burdges

burdges commented Nov 19, 2024

We should be careful that reputation still permits archive nodes to download from availability.

A useful infrastructure model for a parachain is: There exist parachain archive nodes, which the parachain team has high confidence will never disappear, in part because they never make blocks or output anything. There are on-chain rewards which encourage parachain collators, but if they all go away or become malicious, then the parachain team or others can create new collators from the state held by the archive node. I guess this requires having some reserved collator slots too, which folks have always imagined doing by PoW but should probably be done by delegation via some certificate.

@tmpolaczyk
Contributor

Connection management: When to connect, when to disconnect

Any update on this? I believe currently collators don't properly disconnect from validators, because I see these logs in the validators when a parachain runs out of core time:

2024-11-27 15:04:30.422 DEBUG tokio-runtime-worker parachain::network-bridge-rx: action="PeerConnected" peer_set=Collation version=2 peer=PeerId("12D3KooWPJT4QoqgwDWJzHZHDL8iCgkjKzswTwWGcYHHfEjBEerv") role=Full
2024-11-27 15:04:30.423 DEBUG tokio-runtime-worker parachain::collator-protocol: Declared as collator for unneeded para. Current assignments: {} peer_id=PeerId("12D3KooWPJT4QoqgwDWJzHZHDL8iCgkjKzswTwWGcYHHfEjBEerv") collator_id=Public(8e6e0feedba7494a19662e3178bc66b6801716ee4c12e304c78fde02cc96941c (14DkVhzA...)) para_id=Id(2001)
2024-11-27 15:04:30.423 DEBUG tokio-runtime-worker parachain::reputation-aggregator: Reduce reputation peer=PeerId("12D3KooWPJT4QoqgwDWJzHZHDL8iCgkjKzswTwWGcYHHfEjBEerv") rep=CostMinor("An unneeded collator connected")
2024-11-27 15:04:30.423 DEBUG tokio-runtime-worker parachain::network-bridge-rx: action="PeerDisconnected" peer_set=Collation peer=PeerId("12D3KooWPJT4QoqgwDWJzHZHDL8iCgkjKzswTwWGcYHHfEjBEerv")
2024-11-27 15:04:30.549 DEBUG tokio-runtime-worker parachain::network-bridge-rx: action="PeerConnected" peer_set=Collation version=2 peer=PeerId("12D3KooWPjcG7TYtZfkoyiTK1esowizuP48uff7RNZHR5BraMXqH") role=Full

It stays like this forever, with the same peer trying to connect and getting banned exactly every 1 second. Should I create a separate issue? Or any advice on how to fix it? Although since it keeps connecting forever, the collator doesn't actually get banned so this is not a problem? Not sure

@sandreim
Contributor

@tmpolaczyk please create a separate ticket. This one is just for discussing the proposed collator protocol design updates.

@alindima
Contributor

alindima commented Jan 7, 2025

With this revamp, I don't think we're solving the big problem of a malicious collator being able to regenerate their peerid very easily after getting a bad reputation.

When multiple connected peers advertise a collation for some paraid, we will always fetch from the peer with the highest reputation (modulo cheap sanity checks - e.g. parent_head_data_hash, if the candidate forms a fork, we can ignore it right away).

so how is this not leading to centralisation?
or is this no longer a problem because we assume the block author accepted by the parachain runtime is very likely unique (via aura or some other consensus algorithm).

@eskimor
Member Author

eskimor commented Jan 9, 2025

or is this no longer a problem because we assume the block author accepted by the parachain runtime is very likely unique (via aura or some other consensus algorithm).

Exactly. An important prerequisite is that we drop the idea of block providers being disjoint from block producers. We now assume that the block producers themselves are providing the PoV. Then the level of decentralization is up to the parachain.

What we then also get from the parachain consensus is some cost to the attack. While a collator with good reputation could try to mess with the block production opportunities of his colleagues, he would get punished and would therefore lose out on rewards himself.

@eskimor
Member Author

eskimor commented Jan 9, 2025

A good point came up in a discussion with @alindima today: if the only advertisements we have come from reputation 0 peers, we should likely wait some time before attempting to fetch, to give a positive reputation node a chance to connect and advertise too. This would further help reduce the impact malicious nodes can have.
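A rough Rust sketch of that waiting rule, under loudly stated assumptions: `FetchDecision`, `decide`, and the 300 ms value of `ZERO_REPUTATION_GRACE` are all hypothetical, since no concrete delay or API has been proposed here. Reputations are passed in as plain integers, one per advertising peer.

```rust
use std::time::Duration;

// Assumed grace period; no concrete value has been proposed in this thread.
const ZERO_REPUTATION_GRACE: Duration = Duration::from_millis(300);

/// Illustrative outcome of looking at the current advertisements for a para.
#[derive(Debug, PartialEq)]
enum FetchDecision {
    /// Nobody has advertised anything yet.
    NothingAdvertised,
    /// Fetch right away (from the best advertiser).
    FetchNow,
    /// Only reputation 0 peers advertised: wait this much longer first.
    Wait(Duration),
}

/// `reputations`: scores of all peers currently advertising for the para.
/// `since_first_advertisement`: time elapsed since the first one arrived.
fn decide(reputations: &[i64], since_first_advertisement: Duration) -> FetchDecision {
    if reputations.is_empty() {
        return FetchDecision::NothingAdvertised;
    }
    let any_positive = reputations.iter().any(|&r| r > 0);
    if any_positive || since_first_advertisement >= ZERO_REPUTATION_GRACE {
        FetchDecision::FetchNow
    } else {
        // Safe subtraction: the branch above handles elapsed >= grace.
        FetchDecision::Wait(ZERO_REPUTATION_GRACE - since_first_advertisement)
    }
}
```

The grace period is bounded, so a parachain whose collators all have reputation 0 (e.g. a freshly onboarded one) is only delayed, never starved.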
