Our Kusama nodes that run with litep2p discover their peers very slowly, which results in them being sparsely connected. With libp2p, a node connects to ~95% of its peers within about 5 minutes of starting, while with litep2p it takes around 30 minutes to reach ~90% of connections and another 30 minutes to reach ~95%.
Why is this bad
Sparsely connected validators contribute negatively to the network in a few ways:
They won't see enough assignments and approvals to approve the candidate, so they won't vote on finality.
Because they aren't connected to at least 1/3 of the network, they won't be able to approve candidates, but their assignments will still be triggered and distributed, so they will count as no-shows on other nodes. Other nodes will cover the no-shows, but that still means they contribute to finality being delayed by at least the no-show period.
My take
If enough nodes of this type restart at the same time, we will end up stress-testing the network. Theoretically we should be able to tolerate 1/3 of the network being in this state at once, but if we get past the 1/3 threshold, finality will lag until the fail-safe kicks in. With that in mind, I think we should first improve litep2p on this dimension before we enable it on a significant number of validators on Kusama.
cc: @paritytech/networking
The slow discovery of authority records is related to the fact that litep2p provides found records only after the Kademlia query finishes execution. In contrast, libp2p provides each record as soon as it receives it from the network.
Adopting a similar approach for litep2p (providing records as soon as we discover them) leads to significant improvements, outperforming libp2p:
libp2p: discovering 1k records took 10 minutes (left graph)
litep2p without improvements: 37 minutes (middle graph)
litep2p with improvements: 2.5 minutes (right graph)
I'll have another look tomorrow and investigate CPU consumption, which might increase with the number of propagated messages.
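For illustration, here is a minimal sketch of the two delivery strategies. The types, channel-based event loop, and function names are hypothetical and are not litep2p's actual API; the sketch only contrasts buffering records until a Kademlia query completes versus emitting each record as soon as it arrives.

```rust
// Hypothetical sketch (not the actual litep2p API): contrasts delivering
// Kademlia GET_RECORD results in one batch at query completion versus
// streaming each record to the consumer as soon as it is received.

use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;
use std::time::Duration;

/// A found DHT record (e.g. an authority discovery record), simplified.
#[derive(Debug, Clone)]
struct PeerRecord {
    key: String,
    value: Vec<u8>,
}

/// Events the network backend emits towards higher layers.
#[derive(Debug)]
enum DhtEvent {
    /// Emitted per record, as soon as it arrives (libp2p-style / improved litep2p).
    RecordFound(PeerRecord),
    /// Emitted once, with everything, when the query finishes (old litep2p behaviour).
    QueryFinished(Vec<PeerRecord>),
}

/// Old behaviour: buffer all records and deliver them only when the query ends.
/// The consumer cannot act on any record until the whole query has finished.
fn run_query_batched(records: Vec<PeerRecord>, events: Sender<DhtEvent>) {
    let mut found = Vec::new();
    for record in records {
        thread::sleep(Duration::from_millis(50)); // simulate per-record network latency
        found.push(record);
    }
    let _ = events.send(DhtEvent::QueryFinished(found));
}

/// Improved behaviour: forward each record immediately, so the consumer can
/// start dialing the corresponding peer without waiting for the whole query.
fn run_query_streaming(records: Vec<PeerRecord>, events: Sender<DhtEvent>) {
    for record in records {
        thread::sleep(Duration::from_millis(50)); // simulate per-record network latency
        let _ = events.send(DhtEvent::RecordFound(record));
    }
}

fn consume(events: Receiver<DhtEvent>) {
    for event in events {
        match event {
            DhtEvent::RecordFound(r) => println!("can dial peer from record {} now", r.key),
            DhtEvent::QueryFinished(rs) => println!("query done, {} records usable only now", rs.len()),
        }
    }
}

fn main() {
    let records: Vec<PeerRecord> = (0..5)
        .map(|i| PeerRecord { key: format!("authority-{i}"), value: vec![] })
        .collect();

    // Streaming delivery: records become actionable one by one.
    let (tx, rx) = channel();
    let producer = thread::spawn({
        let records = records.clone();
        move || run_query_streaming(records, tx)
    });
    consume(rx);
    producer.join().unwrap();

    // Batched delivery: nothing is actionable until the query completes.
    let (tx, rx) = channel();
    let producer = thread::spawn(move || run_query_batched(records, tx));
    consume(rx);
    producer.join().unwrap();
}
```

The only difference is where the send happens: per record versus once at the end. In the real node the consumer would be the authority-discovery worker, which can start connecting to validators as records trickle in instead of waiting for the full query to complete.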
I appreciate that these issues are generally for bugs and the like. However, my experience so far has been good on all thirty of my validators, minus a few hiccups at the beginning. Given that we may now have more peers using litep2p, how do the numbers look? Has the time to reach 95% of peers decreased?