Do election in order based on failed primary rank to avoid voting conflicts #1018

Merged (9 commits), Jan 11, 2025

enjoy-binbin (Member) commented Sep 11, 2024

When multiple primary nodes fail simultaneously, the cluster cannot recover
within the default effective time (the data_age limit). The main reason is that
replicas of the different failed primaries start their elections at the same
time with no ordering among them, which causes too many epoch conflicts.

Therefore, we introduce a ranking based on the failed primary's shard-id.
A new variable, failed_primary_rank, holds the rank of this node (myself)
within the list of all failed primaries. It is used during failover: replicas
send their failover election packets in order of this rank, which effectively
avoids voting conflicts.

If a single primary is down, the behavior is the same as before. If multiple
primaries are down, the election start time of their replicas is delayed in
500ms steps according to the rank.
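
The ranking step can be sketched in C as below. This is a minimal illustration, not the code merged into src/cluster_legacy.c: the node_view struct, SHARD_ID_LEN, and failed_primary_rank() are hypothetical names, and the sketch assumes shard-ids are fixed-length byte strings compared with memcmp() and that "failed" means the node is flagged FAIL in this node's view of the cluster.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: shard-ids treated as fixed-length, non-NUL-terminated
 * byte strings of this length. */
#define SHARD_ID_LEN 40

/* Hypothetical, simplified view of one entry in the cluster node table. */
typedef struct {
    char shard_id[SHARD_ID_LEN];
    bool is_primary;
    bool failed; /* node is flagged FAIL in our view */
} node_view;

/* Rank of my_shard_id among all failed primaries: the number of failed
 * primaries whose shard-id sorts strictly before it (0 = goes first).
 * Replicas and healthy primaries are ignored. */
int failed_primary_rank(const node_view *nodes, size_t n,
                        const char my_shard_id[SHARD_ID_LEN]) {
    assert(nodes != NULL || n == 0);
    assert(my_shard_id != NULL);
    int rank = 0;
    for (size_t i = 0; i < n; i++) {
        if (!nodes[i].is_primary || !nodes[i].failed) continue;
        if (memcmp(nodes[i].shard_id, my_shard_id, SHARD_ID_LEN) < 0) rank++;
    }
    return rank;
}
```

Because the ordering depends only on shard-ids, replicas of different failed primaries can each compute a distinct rank locally, without extra messages, as long as their views of which primaries have failed agree.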
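
The resulting election delay can be sketched as one function. The base terms follow the replica election delay documented in the Redis Cluster specification (500 ms fixed, plus a random jitter below 500 ms, plus replica rank * 1000 ms); the extra failed_primary_rank * 500 ms term is our reading of "delayed in 500ms steps according to the rank". The function name and the explicit jitter parameter are hypothetical, chosen so the result is deterministic.

```c
#include <assert.h>

/* Hypothetical helper: milliseconds a replica waits before starting its
 * election once its primary is marked FAIL.
 *   500 ms fixed delay (lets the FAIL message propagate)
 * + random_jitter_ms in [0, 500)
 * + replica_rank * 1000 ms (among replicas of the same primary)
 * + failed_primary_rank * 500 ms (assumed: stagger across failed shards) */
long long election_delay_ms(int replica_rank, int failed_primary_rank,
                            int random_jitter_ms) {
    assert(replica_rank >= 0 && failed_primary_rank >= 0);
    assert(random_jitter_ms >= 0 && random_jitter_ms < 500);
    return 500LL + random_jitter_ms
         + (long long)replica_rank * 1000
         + (long long)failed_primary_rank * 500;
}
```

With a single failed primary, failed_primary_rank is 0 and the extra term vanishes, which matches the unchanged single-failure behavior; with several, each shard's replicas start roughly 500 ms after the previous shard's.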

…flicts

When multiple primary nodes fail simultaneously, the cluster cannot recover
within the default effective time (the data_age limit). The main reason is that
replicas of the different failed primaries start their elections at the same
time with no ordering among them, which causes too many epoch conflicts.

Therefore, we introduce a ranking based on the failed primary's node name.
A new variable, failed_primary_rank, holds the rank of this node (myself)
within the list of all failed primaries. It is used during failover: replicas
send their failover election packets in order of this rank, which effectively
avoids voting conflicts.

Signed-off-by: Binbin <[email protected]>
enjoy-binbin added the run-extra-tests label (Run extra tests on this PR: runs all tests from daily except valgrind and RESP) on Sep 11, 2024
Signed-off-by: Binbin <[email protected]>

codecov bot commented Sep 14, 2024

Codecov Report

Attention: Patch coverage is 96.15385% with 1 line in your changes missing coverage. Please review.

Project coverage is 70.85%. Comparing base (b3b4bdc) to head (6abc3c1).

Files with missing lines   Patch %   Lines
src/cluster_legacy.c       96.15%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1018      +/-   ##
============================================
+ Coverage     70.83%   70.85%   +0.01%     
============================================
  Files           120      120              
  Lines         64911    64937      +26     
============================================
+ Hits          45982    46012      +30     
+ Misses        18929    18925       -4     
Files with missing lines   Coverage Δ
src/cluster_legacy.c       86.77% <96.15%> (-0.02%) ⬇️

... and 12 files with indirect coverage changes

PingXie (Member) left a review

LGTM overall. I like this idea. Thanks @enjoy-binbin!

Review thread on src/cluster_legacy.c (outdated, resolved)
Review thread on src/cluster_legacy.c (resolved)
madolson (Member) commented Jan 3, 2025

Seems like a good idea to me as well.

madolson (Member) left a review

Directionally good with this change; the remaining tests still look good.

Signed-off-by: Binbin <[email protected]>
enjoy-binbin added the release-notes label (This issue should get a line item in the release notes) on Jan 11, 2025
enjoy-binbin merged commit 211b250 into valkey-io:unstable on Jan 11, 2025
1 check passed
enjoy-binbin deleted the primary_fail_rank branch on January 11, 2025 at 02:43
proost pushed a commit to proost/valkey that referenced this pull request on Jan 17, 2025
…flicts (valkey-io#1018)

Signed-off-by: Binbin <[email protected]>
Signed-off-by: proost <[email protected]>
Labels
release-notes (This issue should get a line item in the release notes)
run-extra-tests (Run extra tests on this PR: runs all tests from daily except valgrind and RESP)
Projects
Status: Done
3 participants