Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build blocks asynchronously #38

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Build blocks asynchronously #38

wants to merge 7 commits into from

Conversation

yacovm
Copy link
Collaborator

@yacovm yacovm commented Jan 2, 2025

When it is a node's turn to propose a block, it calls the BlockBuilder's BuildBlock method.
However, this method may take a long time to complete, and during this time, the Epoch instance
cannot process messages (since proposing a new block may be called indirectly via HandleMessage).

This commit adds an asynchronous task scheduler which the Epoch uses to register a callback to make itself propose the block after building it asynchronously.

@yacovm yacovm marked this pull request as draft January 2, 2025 00:32
epoch.go Outdated Show resolved Hide resolved
@yacovm yacovm changed the title [WIP] Build blocks asynchronously Build blocks asynchronously Jan 2, 2025
@yacovm yacovm marked this pull request as ready for review January 2, 2025 21:52
@yacovm yacovm marked this pull request as draft January 2, 2025 21:55
@yacovm yacovm marked this pull request as ready for review January 2, 2025 22:18
Base automatically changed from epoch to main January 3, 2025 17:56
@yacovm yacovm force-pushed the async branch 3 times, most recently from 0ccaf2d to 3425471 Compare January 3, 2025 21:46
@yacovm yacovm marked this pull request as draft January 3, 2025 23:33
@yacovm yacovm force-pushed the async branch 10 times, most recently from 92eb71e to 0f3ab5e Compare January 14, 2025 21:27
When it is a node's turn to propose or verify a block, it calls the BlockBuilder's BuildBlock() and Verify() methods respectively.
However, this method may take a long time to complete, and during this time, the Epoch instance cannot process messages (since proposing a new block may be called indirectly via HandleMessage).

This commit adds an asynchronous task scheduler which the Epoch uses to register a callback to make itself propose the block or verify it
it asynchronously and continue the program flow from that point onwards.

While the block is verified or built, Simplex is free to handle other messages.

Signed-off-by: Yacov Manevich <[email protected]>
@yacovm yacovm marked this pull request as ready for review January 14, 2025 21:31
epoch.go Outdated

// This finalization may correspond to a proposal from a future round, or to the proposal of the current round
// which we are still verifying.
if (e.round < finalization.Round && finalization.Round-e.round < e.maxRoundWindow) || pendingProposal {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Would be a little more clear if we dont have a pendingProposal variable and simply perform all the logic in if statement above? Would make the log more descriptive, and any common code could be separated into a helper method.

Copy link
Collaborator Author

@yacovm yacovm Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

epoch.go Outdated
msgsForRound.finalization = message
return nil
}

// Have we already finalized this round?
round, exists := e.rounds[finalization.Round]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we just initialize round, exists before line 428 and remove this line?

Copy link
Collaborator Author

@yacovm yacovm Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

epoch.go Outdated
msgsForRound.vote = message
return nil
}

// TODO: what if we've received a vote for a round we didn't instantiate yet?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont think this comment is relevant anymore

Copy link
Collaborator Author

@yacovm yacovm Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

epoch.go Outdated
return nil
}

func (e *Epoch) scheduleBlockVerification(block Block, md BlockHeader, from NodeID, vote Vote) func() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: a better name could be createBlockVerificationTask since this method doesn't actually schedule

Copy link
Collaborator Author

@yacovm yacovm Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -705,11 +775,24 @@ func (e *Epoch) handleBlockMessage(message *BlockMessage, _ NodeID) error {
return nil
}

pendingBlocks := e.sched.Size()
if pendingBlocks > e.maxPendingBlocks {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im concerned some malicious node in the future could dos the scheduler and single handedly fill it up. This would starve all blockMessageMessages from other nodes even if they are from rounds before the malicious nodes block. Should we add some check to avoid adding duplicate tasks to the scheduler or is this handled in a different way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point.

We should not accept a block for a round that is more than e.maxRoundWindow ahead. I thought this check was being performed, but after going through the code I see it doesn't enforce it explicitly.

I added the check + a test

epoch.go Outdated
// TODO: timeout
}
// Create a task that will verify the block in the future, after its predecessors have also been verified.
task := e.scheduleBlockVerification(block, md, from, vote)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the point of lines 813-827 to do some basic pre verification?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we couple votes from block proposers with the blocks they propose, we don't want some nefarious actor to take a vote from the real proposer and send us its own block + the vote from the real block proposer.

We therefore check the vote actually votes on the block, and is not voting for an alternative block.

epoch.go Outdated
round.votes[string(vote.Signature.Signer)] = &vote

if err := e.doProposed(block, vote, from); err != nil {
e.Logger.Debug("Failed voting on block", zap.Error(err))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe change the logger levels to Warn on error?

Copy link
Collaborator Author

@yacovm yacovm Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

epoch.go Outdated
Comment on lines 897 to 911
func (e *Epoch) isBlockReadyToBeScheduled(seq uint64, prev Digest) bool {
var ready bool

if seq > 0 {
// A block can be scheduled if its predecessor either exists in storage,
// or there exists a round object for it.
// Since we only create a round object after we verify the block,
// it means we have verified this block in the past.
_, ok := e.locateBlock(seq-1, prev[:])
ready = ok
} else {
// The first block is always ready to be scheduled
ready = true
}
return ready
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (e *Epoch) isBlockReadyToBeScheduled(seq uint64, prev Digest) bool {
var ready bool
if seq > 0 {
// A block can be scheduled if its predecessor either exists in storage,
// or there exists a round object for it.
// Since we only create a round object after we verify the block,
// it means we have verified this block in the past.
_, ok := e.locateBlock(seq-1, prev[:])
ready = ok
} else {
// The first block is always ready to be scheduled
ready = true
}
return ready
func (e *Epoch) isBlockReadyToBeScheduled(seq uint64, prev Digest) bool {
if seq > 0 {
// A block can be scheduled if its predecessor either exists in storage,
// or there exists a round object for it.
// Since we only create a round object after we verify the block,
// it means we have verified this block in the past.
_, ok := e.locateBlock(seq-1, prev[:])
return ok
}
return false

Copy link
Collaborator Author

@yacovm yacovm Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed but keep in mind that your suggestion isn't semantically the same as the original code, because if seq == 0 then we should return true, not false.

if as.close {
return
}
as.signal.Wait()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im concerned we could deadlock here.

  • Say we enter maybeExecuteTask and execute a task that calls Schedule(ready = true)
  • Schedule will call Signal on as.signal
  • We will then finish executing the original task and this line as.signal.Wait() would have missed the prior Signal call, therefore waiting when it should be executing another task.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we execute a task that is ready to run, it will add itself to the ready queue.

Then, it will perform another iteration of the loop in maybeExecuteTask which will not exit, as the ready queue is still not empty, and will execute the next task.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants