litep2p authority discovery is very slow 5m vs 1h #7077

alexggh · 2025-01-07T14:00:23Z

Our Kusama nodes that run with litep2p discover their peers very slowly, which leaves them sparsely connected. With libp2p we connect to 95% of nodes within 5 minutes, while with litep2p it takes around 30 minutes to reach about 90% of connections and another 30 minutes to reach about 95%.

Why is this bad

Sparsely connected validators contribute negatively to the network in a few ways:

They won't see enough assignments and approvals to approve the candidate, so they won't vote on finality.
Because they aren't connected to at least 1/3 of the network they won't be able to approve candidates, but their assignments will still be triggered and distributed, so they will count as no-shows on other nodes. Other nodes will cover the no-show, but finality is still delayed by at least the no-show period.

My take

If enough nodes of this type restart at the same time, we will end up stress-testing the network. Theoretically we should be able to support 1/3 of the network being in this state at the same time, but if we get past the 1/3 threshold, finality will lag until the fail-safe kicks in. With that in mind, I think we should first improve litep2p on this dimension before we enable it on a significant number of validators on Kusama.
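As a rough illustration of the 1/3 threshold mentioned above, here is a minimal sketch. The function name and the validator counts in the usage note are hypothetical, and the check is a simplification of the statement in this comment, not the actual finality logic.

```rust
/// Returns true when strictly more than one third of the validators are
/// sparsely connected. Hypothetical helper for illustration only.
fn exceeds_one_third(total_validators: u32, sparsely_connected: u32) -> bool {
    // Integer arithmetic avoids floating-point rounding:
    // sparse / total > 1/3  is equivalent to  3 * sparse > total.
    3 * u64::from(sparsely_connected) > u64::from(total_validators)
}
```

With an assumed set of 1000 validators, `exceeds_one_third(1000, 333)` is false and `exceeds_one_third(1000, 334)` is true, so one more restarting node can move the network from the tolerated side to the lagging side.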

cc: @paritytech/networking

lexnv · 2025-01-07T16:03:39Z

The slow discovery of authority records is caused by litep2p providing found records only after the Kademlia query finishes execution. In contrast, libp2p provides each record as soon as it receives it from the network.

Adopting a similar approach for litep2p (providing records as soon as we discover them) leads to significant improvements, outperforming libp2p:

libp2p: discovering 1k records took 10 minutes (left graph)
litep2p without improvements: 37 minutes (middle graph)
litep2p with improvements: 2.5 minutes (right graph)
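The difference between the two delivery strategies can be sketched with a small simulation. This is not litep2p's or libp2p's API: the `Response` type, both function names, and the timings in the usage note are hypothetical, and a real Kademlia query is asynchronous rather than a pre-recorded list of responses.

```rust
/// A simulated response to a GetRecord query (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Response {
    /// Milliseconds after the query started when this response arrived.
    arrived_at_ms: u64,
    /// The record carried by the response, if the peer had one.
    record: Option<String>,
}

/// Delivery on finish: every record reaches the caller only once the whole
/// query has completed, i.e. at the arrival time of the slowest response.
fn deliver_on_finish(responses: &[Response]) -> Vec<(u64, String)> {
    let finished_at = responses
        .iter()
        .map(|r| r.arrived_at_ms)
        .max()
        .unwrap_or(0);
    responses
        .iter()
        .filter_map(|r| r.record.clone())
        .map(|record| (finished_at, record))
        .collect()
}

/// Delivery as found: each record reaches the caller at the moment its
/// response arrives, without waiting for the remaining peers.
fn deliver_as_found(responses: &[Response]) -> Vec<(u64, String)> {
    let mut delivered: Vec<(u64, String)> = responses
        .iter()
        .filter_map(|r| r.record.clone().map(|record| (r.arrived_at_ms, record)))
        .collect();
    // Order deliveries by the time the caller would observe them.
    delivered.sort_by_key(|entry| entry.0);
    delivered
}
```

For three assumed responses arriving at 40 ms (with a record), 120 ms (with a record), and 900 ms (empty), `deliver_on_finish` hands over both records at 900 ms, while `deliver_as_found` hands them over at 40 ms and 120 ms. One slow or unresponsive peer therefore delays every record in the first strategy but none in the second.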

I'll have another look tomorrow and investigate CPU consumption, which might increase with the number of propagated messages.

paradox-tt · 2025-01-09T15:27:10Z

I appreciate that these issues are generally for bugs and the like. However, my experience so far has been good on all thirty of my validators, apart from a few hiccups at the beginning. Given that we may now have more peers using litep2p, how do the numbers look? Has the time to reach 95% of peers decreased?

alexggh mentioned this issue Jan 7, 2025

Kusama Validators Litep2p - Monitoring and Feedback #7076

Open

This was referenced Jan 8, 2025

kad: Provide partial results to speedup GetRecord queries paritytech/litep2p#315

Open

litep2p: Provide partial results to speedup GetRecord queries #7099

Open
