litep2p authority discovery is very slow 5m vs 1h #7077

Open
Tracked by #7076
alexggh opened this issue Jan 7, 2025 · 2 comments · May be fixed by #7099

Comments

@alexggh
Contributor

alexggh commented Jan 7, 2025

Our Kusama nodes running litep2p discover their peers very slowly, which leaves them sparsely connected. With libp2p a node connects to 95% of nodes within 5 minutes, while with litep2p it takes around 30 minutes to reach about 90% of connections and another 30 minutes to reach about 95%.

Screenshot 2025-01-07 at 15 47 20

Why is this bad

Sparsely connected validators contribute negatively to the network in a few ways:

  1. They won't see enough assignments and approvals to approve the candidate, so they won't vote on finality.

  2. Because they aren't connected to at least 1/3 of the network, they won't be able to approve candidates. Their assignments will still be triggered and distributed, so they will count as no-shows on other nodes. Other nodes will cover the no-show, but finality is still delayed by at least the no-show period.

My take

If enough nodes of this type restart at the same time, we will end up stress testing the network. Theoretically we should be able to support 1/3 of the network being in this state at the same time, but if we get past the 1/3 threshold, finality will lag until the fail-safe kicks in. With that in mind, I think we should first improve litep2p on this dimension before we enable it on a significant number of validators on Kusama.

cc: @paritytech/networking

@lexnv
Contributor

lexnv commented Jan 7, 2025

The slow discovery of authority records comes from the fact that litep2p provides found records only after the Kademlia query finishes execution. In contrast, libp2p provides each record as soon as it receives it from the network.

Adopting a similar approach for litep2p (providing records as soon as we discover them) leads to significant improvements, outperforming libp2p:

  • libp2p: discovering 1k records took 10 minutes (left graph)
  • litep2p without improvements: 37 minutes (middle graph)
  • litep2p with improvements: 2.5 minutes (right graph)
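
To illustrate the difference between the two delivery models, here is a minimal sketch in Rust. It is not litep2p's or libp2p's actual API: `FoundRecord`, `deliver_on_finish`, `deliver_as_found` and `mean_delivery_ms` are hypothetical names, and the timestamps are simulated values chosen only for illustration. It shows why handing records to the caller only when the query finishes delays every record to the finish time of the whole query, while handing them over as found lets the caller act on each one at its arrival time.

```rust
/// A record found during a simulated Kademlia query, tagged with the
/// time (in milliseconds since query start) at which the response arrived.
/// Hypothetical type, not part of litep2p or libp2p.
#[derive(Debug, Clone, PartialEq)]
struct FoundRecord {
    key: &'static str,
    arrived_at_ms: u64,
}

/// Batch delivery: the caller sees every record only once the whole
/// query has finished. Returns (delivery_time_ms, key) pairs.
fn deliver_on_finish(
    records: &[FoundRecord],
    query_finished_at_ms: u64,
) -> Vec<(u64, &'static str)> {
    records
        .iter()
        .map(|r| (query_finished_at_ms, r.key))
        .collect()
}

/// Streaming delivery: each record is handed to the caller at the
/// moment it arrives from the network.
fn deliver_as_found(records: &[FoundRecord]) -> Vec<(u64, &'static str)> {
    records.iter().map(|r| (r.arrived_at_ms, r.key)).collect()
}

/// Mean time until a record becomes usable by the caller.
/// Returns 0.0 for an empty input to avoid dividing by zero.
fn mean_delivery_ms(deliveries: &[(u64, &'static str)]) -> f64 {
    if deliveries.is_empty() {
        return 0.0;
    }
    let total: u64 = deliveries.iter().map(|(t, _)| *t).sum();
    total as f64 / deliveries.len() as f64
}
```

Calling `deliver_on_finish` and `deliver_as_found` on the same slice of records and comparing `mean_delivery_ms` for the two results shows the gap: in the batch model the slowest peer in the query sets the delivery time for every record.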

I'll have another look tomorrow and investigate CPU consumption, which might increase with the number of propagated messages:

Screenshot 2025-01-07 at 17 57 56

@paradox-tt
Contributor

paradox-tt commented Jan 9, 2025

I appreciate that these issues are generally for bugs and the like. However, thus far my experience has been good on all thirty of my validators, minus a few hiccups at the beginning. Given that we may now have more peers using litep2p, how do the numbers look? Has the time to reach 95% of peers reduced?
