Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow down consensus (increase timeouts) when vertex store is close to being full #859

Open
wants to merge 3 commits into
base: feature/vertex-store-overflow-mitigations
Choose a base branch
from

Conversation

LukasGasior1
Copy link
Contributor

No description provided.

Copy link

github-actions bot commented Mar 5, 2024

Docker tags
docker.io/radixdlt/private-babylon-node:pr-859
docker.io/radixdlt/private-babylon-node:ed4abba607
docker.io/radixdlt/private-babylon-node:sha-ed4abba


// It should already be in the [0, 1] range, but we're nonetheless sanitizing the input
final var vertexStoreUtilizationRatioClamped =
Math.max(0, Math.min(1, vertexStoreUtilizationRatio));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guava has the clamp function, with a conveniently overcomplicated name: Doubles.constrainToRange(vertexStoreUtilizationRatio, 0, 1)

final var multiplier =
Math.max(
1, // Multiplier is 1 (i.e. no-op) if we're below the threshold
lerp(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess what... Guava has the lerp function, with a cool (if not too-fluent) style: LinearTransformation.mapping(0.66, 1.0).and(1.0, 10.0).transform(ratio);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guava has it all 😄

public @interface AdditionalRoundTimeIfProposalReceivedMs {}
public PacemakerTimeoutCalculatorConfig {
Preconditions.checkArgument(
baseTimeoutMs > 0, "timeoutMs must be > 0 but was " + baseTimeoutMs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nitpick) this supports formatting (%s).

@jakrawcz-rdx
Copy link
Contributor

image

😱

Copy link

sonarqubecloud bot commented Mar 6, 2024

Copy link
Contributor

@jakrawcz-rdx jakrawcz-rdx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I only have one non-critical remark 👍

new PacemakerTimeoutCalculatorConfig(baseTimeout, 2.0, 0, 0L, 0.6, 10));

// spotless:off
final Map<Double, Long> testCases =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) There is a lightweight way for test parameterization: @RunWith(JUnitParamsRunner.class) (do not confuse with the lame @RunWith(Parameterized.class)!)

(see e.g.

@Parameters(method = "unsolicitedSyncResponseExceptions")
)

Copy link
Contributor

@dhedey dhedey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice, thanks

bind(PacemakerTimeoutCalculatorConfig.class)
.toInstance(new PacemakerTimeoutCalculatorConfig(3000L, 1.2, 8, 30_000L, 0.66, 10));

// Delayed resolution is disabled for now.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it be worth explaining why in this comment? (i.e. because we cannot create QCs on fallback vertices, because we don't just sign the ledger header, but also the BFT header, which captures the previous certificate chain, and all the nodes have a different certificate chain for their fallback vertex)

(double) -1, baseTimeout,
0.5, baseTimeout,
0.6, baseTimeout,
0.61, 1225L,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's slightly weird to me that half of these are in terms of baseTimeout and half of them are absolute numbers - can we standardize?

we start multiplying the timeout by a linearly increasing value, up to 10x.
So a maximum theoretical timeout is 130s. */
bind(PacemakerTimeoutCalculatorConfig.class)
.toInstance(new PacemakerTimeoutCalculatorConfig(3000L, 1.2, 8, 30_000L, 0.66, 10));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm finding it a little hard to read this - because all of these don't have names any more.

Perhaps we should have a PacemakerTimeoutCalculatorConfig::default() and a PacemakerTimeoutCalculatorConfig::testing()? And inside those methods, we can label the values with variable names, and then pass the variables into the constructor?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For review purposes, these are:

    long baseTimeoutMs: 3 000
    double consecutiveTimeoutSlowdownRate: 1.2
    int consecutiveTimeoutMaxExponent: 8 // 1.2^8 = 4.29981696
    long additionalRoundTimeIfProposalReceivedMs: 30 000
    double vertexStoreMultiplierThreshold: 0.66
    double maxVertexStoreMultiplier: 10


// We're only applying the multiplier if vertexStoreUtilizationRatio is
// on or above vertexStoreMultiplierThreshold: we're translating from
// range [vertexStoreMultiplierThreshold, 1] to [1, maxVertexStoreMultiplier].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was partially expecting to see an exponential here

(So roughly take linearly map: [vertexStoreMultiplierThreshold, 1] to exponent: [0, maxVertexStoreExponent] and then return Math.round(config.baseTimeoutMs() * timeoutExponential * vertexSizeExponential);)

But I think linear is fine too and does its job.

import javax.inject.Qualifier;
public record PacemakerTimeoutCalculatorConfig(
long baseTimeoutMs,
double rate,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should rename these to e.g. consecutiveTimeoutSlowdownRate, consecutiveTimeoutMaxExponent to make clear they just apply to that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants