Core Index Mismatch in Commitments and Descriptor #7107

Closed
sw10pa opened this issue Jan 9, 2025 · 0 comments · Fixed by #7104
Labels
I2-bug The node fails to follow expected behavior. I10-unconfirmed Issue might be valid, but it's not yet known.

sw10pa (Member) commented Jan 9, 2025

Description

A bug was identified while implementing the malus collator on top of the undying collator (PR #6924). The collation-generation subsystem reported the following error when the normal (non-malus) undying collator tried to generate and submit collations for 3 assigned cores:

ERROR tokio-runtime-worker parachain::collation-generation: Failed to construct and distribute collation: V2 core index check failed: The core index in commitments doesn't match the one in descriptor.

The issue arises because the current code assigns core indexes to descriptors sequentially, in claim-queue order, whereas the commitments may contain a core index chosen by the parachain through a core selector in its UMP signals, which can follow a different order. Whenever the two differ for a given collation, the V2 core index check fails with the error above.
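The mismatch can be illustrated with a minimal Rust sketch. It is a simplified model, not the polkadot-sdk code: `CoreIndex`, `descriptor_core`, `committed_core` and `mismatches` are hypothetical names, and the sketch assumes the committed core is obtained by taking the parachain's selector modulo the number of assigned cores.

```rust
/// Hypothetical stand-in for a core index; not the polkadot-sdk type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CoreIndex(u32);

/// Old behaviour: the collation at `position` gets the core at the same
/// position in the claim queue.
fn descriptor_core(assigned: &[CoreIndex], position: usize) -> CoreIndex {
    assigned[position]
}

/// Core implied by a parachain-provided selector
/// (assumption: selector modulo the number of assigned cores).
fn committed_core(assigned: &[CoreIndex], selector: u8) -> CoreIndex {
    assigned[selector as usize % assigned.len()]
}

/// Returns the positions whose descriptor core differs from the committed core.
fn mismatches(assigned: &[CoreIndex], selectors: &[u8]) -> Vec<usize> {
    selectors
        .iter()
        .enumerate()
        .filter(|(pos, sel)| descriptor_core(assigned, *pos) != committed_core(assigned, **sel))
        .map(|(pos, _)| pos)
        .collect()
}

fn main() {
    let assigned = [CoreIndex(0), CoreIndex(1), CoreIndex(2)];
    // Hypothetical selectors: the parachain picks cores 2, 0, 1 for three consecutive blocks.
    let selectors = [2u8, 0, 1];
    for pos in mismatches(&assigned, &selectors) {
        println!(
            "collation {pos}: descriptor core {:?} != commitments core {:?}",
            descriptor_core(&assigned, pos),
            committed_core(&assigned, selectors[pos])
        );
    }
}
```

With these selectors every one of the three collations would fail the V2 core index check, even though each selected core is legitimately assigned to the parachain.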

Steps to Reproduce

  • Set up a network with an undying collator;
  • Ensure that the parachain is assigned to 3 or more cores;
  • Configure the parachain to select its core indexes via UMP signals (core selectors).

Why was this not detected earlier?

  • Test collators (e.g., adder and undying) did not use UMP signals, since they are optional;
  • Elastic scaling collators (e.g., slot-based collators) are not affected, as this was addressed in PR #5372;
  • UMP signals are a recent addition, and collators built on collator_fn have seen little testing, since they are not meant for production use.

Proposed Solution

  • Modify the logic for constructing the CandidateDescriptorV2 so that it takes the core index from the commitments (via UMP signals) instead of using sequential indexes obtained from the claim queue;
  • Ensure backward compatibility for parachains that do not use UMP signals, allowing the system to function as before in those cases;
  • Add a check to stop processing if the parachain selects the same core multiple times when multiple cores are assigned.
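The proposed logic can be sketched the same way. Again this is a hypothetical, simplified model rather than the actual fix in #7104: `pick_core`, `assign_cores` and `SelectError` are illustrative names, a `None` selector stands for a parachain that emits no UMP signals, and the selector-to-core mapping (selector modulo the number of assigned cores) is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a core index; not the polkadot-sdk type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct CoreIndex(u32);

#[derive(Debug, PartialEq, Eq)]
enum SelectError {
    /// The parachain has no cores assigned at all.
    NoCoresAssigned,
    /// Without a selector there is no claim-queue core left for this position.
    NoCoreForPosition(usize),
    /// The same core was chosen for more than one collation.
    DuplicateCore(CoreIndex),
}

/// Chooses the core for the collation at `position`.
/// `Some(selector)`: the parachain emitted a core selector UMP signal
/// (assumed mapping: selector modulo the number of assigned cores).
/// `None`: no UMP signal, so the old sequential claim-queue order is kept.
fn pick_core(
    assigned: &[CoreIndex],
    position: usize,
    selector: Option<u8>,
) -> Result<CoreIndex, SelectError> {
    if assigned.is_empty() {
        return Err(SelectError::NoCoresAssigned);
    }
    match selector {
        Some(s) => Ok(assigned[s as usize % assigned.len()]),
        None => assigned
            .get(position)
            .copied()
            .ok_or(SelectError::NoCoreForPosition(position)),
    }
}

/// Picks a core for every collation and stops at the first core chosen twice.
fn assign_cores(
    assigned: &[CoreIndex],
    selectors: &[Option<u8>],
) -> Result<Vec<CoreIndex>, SelectError> {
    let mut used = HashSet::new();
    let mut cores = Vec::with_capacity(selectors.len());
    for (position, selector) in selectors.iter().enumerate() {
        let core = pick_core(assigned, position, *selector)?;
        // `HashSet::insert` returns false if the core was already present.
        if !used.insert(core) {
            return Err(SelectError::DuplicateCore(core));
        }
        cores.push(core);
    }
    Ok(cores)
}

fn main() {
    let assigned = [CoreIndex(0), CoreIndex(1), CoreIndex(2)];
    // Non-sequential selection by the parachain is accepted.
    println!("{:?}", assign_cores(&assigned, &[Some(2), Some(0), Some(1)]));
    // No UMP signals: falls back to claim-queue order.
    println!("{:?}", assign_cores(&assigned, &[None, None, None]));
    // Selector 4 maps to core 1 again (4 % 3 == 1), so processing stops.
    println!("{:?}", assign_cores(&assigned, &[Some(1), Some(4)]));
}
```

Tracking used cores in a `HashSet` keeps the duplicate check linear in the number of collations, and returning an error on the first repeated core mirrors the proposal to stop processing rather than submit two collations for the same core.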
@sw10pa sw10pa added I2-bug The node fails to follow expected behavior. I10-unconfirmed Issue might be valid, but it's not yet known. labels Jan 9, 2025
@sw10pa sw10pa self-assigned this Jan 9, 2025
github-merge-queue bot pushed a commit that referenced this issue Jan 22, 2025
…ents core index (#7104)

## Issue
[[#7107] Core Index Mismatch in Commitments and
Descriptor](#7107)

## Description
This PR resolves a bug where normal (non-malus) undying collators failed
to generate and submit collations, resulting in the following error:

`ERROR tokio-runtime-worker parachain::collation-generation: Failed to
construct and distribute collation: V2 core index check failed: The core
index in commitments doesn't match the one in descriptor.`

More details about the issue and reproduction steps are described in the
[related issue](#7107).

## Summary of Fix
- When core selectors are provided in the UMP signals, core indexes will
be chosen using them;
- The fix ensures that functionality remains unchanged for parachains
not using UMP signals;
- Added checks to stop processing if the same core is selected
repeatedly.

## TODO
- [x] Implement the fix;
- [x] Add tests;
- [x] Add PRdoc.
@github-project-automation github-project-automation bot moved this from Backlog to Completed in parachains team board Jan 22, 2025
Projects
Status: Completed
1 participant