Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent fork-choice: Validator sometimes chooses own secondary block over others primary #6013

Open
2 tasks done
ToufeeqP opened this issue Oct 10, 2024 · 5 comments
Open
2 tasks done
Assignees
Labels
D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. I2-bug The node fails to follow expected behavior.

Comments

@ToufeeqP
Copy link
Contributor

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

I have observed an inconsistency in the fork-choice rule on our Substrate-based chain. As per the protocol, when there are two blocks for the same slot—one primary and one secondary—the validator should consider the primary block as the best block. However, this is not consistently followed.

Issue Details:

  • Expected behaviour: When a validator has to choose between its own secondary block and another validator’s primary block for the same slot, the primary block should be considered the best.
  • Actual behaviour: Sometimes, a validator chooses its own secondary block over the primary block. This does not occur consistently but has been observed multiple times.
  • Impact: As a result of this issue, some slots are missed, resulting in fewer blocks in an epoch than expected. This happens very rarely but has a noticeable impact.

Steps to reproduce

You can reproduce this using a substrate-node and a local chain setup with two validators. Monitor the fork-choice rule by observing the best block selection when validators must choose between primary and secondary blocks for the same slot. With a two-validator setup, this phenomenon typically occurs within 4 or 5 epochs

@ToufeeqP ToufeeqP added I10-unconfirmed Issue might be valid, but it's not yet known. I2-bug The node fails to follow expected behavior. labels Oct 10, 2024
@bkchr
Copy link
Member

bkchr commented Oct 10, 2024

Could you please provide some logs from when this is happening? Best with -lbabe=trace.

@ToufeeqP
Copy link
Contributor Author

node1.txt
node2.txt

Focus on slot 576204411 where node2 was primary & node1 was secondary, but node1 selected its own secondary as best & proposed the next block on top of it for slot 576204412.

Let me know if you need additional info

@bkchr
Copy link
Member

bkchr commented Oct 14, 2024

I checked the code and I think I know what the problem is. Block production and block import from the network are running in parallel.

The way the best weight is determined by loading it from the backend, but the backend is only updated when the block is imported. This means that when a block is imported from the network and the locally built block is imported in parallel, we can run into the issue that you are describing.

To solve this we need to store the block weight information in the shared state.

@bkchr bkchr added this to SDK Node Oct 14, 2024
@github-project-automation github-project-automation bot moved this to backlog in SDK Node Oct 14, 2024
@bkchr bkchr added D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. and removed I10-unconfirmed Issue might be valid, but it's not yet known. labels Oct 14, 2024
@Nathy-bajo
Copy link

I want to try attempting this, assign me please.

@bkchr
Copy link
Member

bkchr commented Jan 24, 2025

The issue should be in this code:

block.fork_choice = {
let (last_best, last_best_number) = (info.best_hash, info.best_number);
let last_best_weight = if &last_best == block.header.parent_hash() {
// the parent=genesis case is already covered for loading parent weight,
// so we don't need to cover again here.
parent_weight
} else {
aux_schema::load_block_weight(&*self.client, last_best)
.map_err(|e| ConsensusError::ChainLookup(e.to_string()))?
.ok_or_else(|| {
ConsensusError::ChainLookup(
"No block weight for parent header.".to_string(),
)
})?
};

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. I2-bug The node fails to follow expected behavior.
Projects
Status: backlog
Development

No branches or pull requests

3 participants