[Feature] Batch proposal spend limits #3471
base: staging
Conversation
This commit also modifies how transmissions are included in a batch. They are only drained from the workers once their validity has been checked.
Hope it helps
```rust
let process = self.ledger.vm().process();

// Deserialize the transaction. If the transaction exceeds the maximum size, then return an error.
let transaction = match transaction {
```
Deserialization is extremely expensive. I would consider moving this calculation into `ledger.rs:check_transaction_basic`, where we already deserialize. Perhaps that function can return the compute cost? Or some other design?
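For illustration, a self-contained sketch of that idea, with placeholder names standing in for the real API (`SerializedTx`, `Tx`, and the free functions are all assumptions, not the current snarkOS code): deserialize once inside the basic check and return the compute cost as a by-product.

```rust
use anyhow::Result;

// Placeholders standing in for `Data<Transaction<N>>` and `Transaction<N>`.
struct SerializedTx(Vec<u8>);
struct Tx;

// Stands in for the expensive deserialization step.
fn deserialize(_bytes: &SerializedTx) -> Result<Tx> {
    Ok(Tx)
}

// Stands in for the real cost calculation on a deserialized transaction.
fn compute_cost(_tx: &Tx) -> Result<u64> {
    Ok(1)
}

// The basic check already deserializes; returning the cost as a by-product
// means callers never deserialize a second time just to price the transaction.
fn check_transaction_basic(serialized: &SerializedTx) -> Result<u64> {
    let tx = deserialize(serialized)?;
    // ... existing validity checks on `tx` would run here ...
    compute_cost(&tx)
}
```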
I did consider it and should revisit the idea, but intuitively we might want to avoid coupling the cost calculation with `check_transaction_basic`, as it's only called in `propose_batch` and not in `process_batch_propose_from_peer`, at least not directly.
Ah, good point... That is unfortunate, because in the case of `process_batch_propose_from_peer` we might be retrieving the transmission from disk, in which case we'll most certainly have to incur the deserialization cost.

Maybe if we let `fn compute_cost` take in a `Transaction` instead of `Data<Transaction>`, it can at least be made explicit. For our own proposal we call it from within `check_transaction_basic`; for incoming proposals we'll need to deserialize before calling it.

And if it turns out to be a bottleneck, we can always refactor the locations where we deserialize more comprehensively, and potentially create a cache for the compute cost if needed.
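To make the trade-off concrete, here is a sketch of the two shapes under discussion (placeholder types; neither signature is the PR's actual code):

```rust
use anyhow::Result;

// Placeholders for snarkVM's `Transaction<N>` and snarkOS's
// `Data<Transaction<N>>` wrapper around possibly-serialized values.
struct Transaction;
struct Data<T>(T);

trait LedgerServiceSketch {
    /// Taking an already-deserialized transaction makes the expensive
    /// deserialization explicit at each call site: `propose_batch` gets the
    /// `Transaction` for free via `check_transaction_basic`, while
    /// `process_batch_propose_from_peer` must deserialize before calling.
    fn compute_cost(&self, transaction: &Transaction) -> Result<u64>;

    /// By contrast, this shape hides a deserialization inside every call.
    fn compute_cost_from_data(&self, transaction: Data<Transaction>) -> Result<u64>;
}
```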
```rust
@@ -235,4 +235,9 @@ impl<N: Network> LedgerService<N> for MockLedgerService<N> {
        self.height_to_round_and_hash.lock().insert(block.height(), (block.round(), block.hash()));
        Ok(())
    }

    /// TODO: is this reasonable?
```
> /// TODO: is this reasonable?

I guess if our mock ledger doesn't have the `compute_cost` function, yes, it's reasonable; no need to extend every test abstraction which exists.
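If the mock does need to satisfy the trait, a fixed value is probably all it takes. A minimal sketch against a hypothetical slice of the trait (names and signature are assumptions):

```rust
use anyhow::Result;

// A hypothetical slice of the `LedgerService` trait, reduced to the one
// method relevant here.
trait ComputeCost {
    fn compute_cost(&self, transaction_bytes: &[u8]) -> Result<u64>;
}

struct MockLedgerService;

impl ComputeCost for MockLedgerService {
    // Tests that don't exercise the spend limit only need *a* cost, not an
    // accurate one, so a constant keeps the mock trivial.
    fn compute_cost(&self, _transaction_bytes: &[u8]) -> Result<u64> {
        Ok(0)
    }
}
```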
```diff
- // If the worker is empty, break early.
- if worker_transmissions.peek().is_none() {
-     break 'outer;
+ let mut worker_transmissions = worker.transmissions().into_iter();
```
So correct me if I'm wrong, but the previous behaviour was to selectively drain transmissions, and if they are duplicates or are incorrect, we discard them. Your current implementation completely drains transmissions, so besides discarding duplicates or invalid ones, once we hit the size or compute limit, the remaining transmissions are discarded.

Some options:

- Preprocess the compute spend of a transmission and cache it in the worker's `Ready` queue. But this would unfortunately require a lot of code changes and wouldn't make sense everywhere.
- Drain transmissions one by one. If we notice post-deserialization that we're above the compute limit, we reinsert that one transmission back into the queue using `worker.reinsert`. This has strictly less overhead than draining all transmissions (see the sketch after this list).
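A self-contained sketch of the second option, using a `VecDeque<u64>` of per-transmission compute costs as a stand-in for the worker's ready queue (the limit and types are illustrative):

```rust
use std::collections::VecDeque;

const BATCH_SPEND_LIMIT: u64 = 100;

fn drain_with_reinsert(queue: &mut VecDeque<u64>) -> Vec<u64> {
    let mut batch = Vec::new();
    let mut batch_cost = 0u64;
    while let Some(cost) = queue.pop_front() {
        if batch_cost + cost > BATCH_SPEND_LIMIT {
            // Over the limit: put this one transmission back at the front
            // so it keeps its place for the next proposal, then stop.
            queue.push_front(cost);
            break;
        }
        batch_cost += cost;
        batch.push(cost);
    }
    batch
}
```

Note that `push_front` is O(1) on a `VecDeque`; as the discussion below covers, front insertion into the worker's actual queue is costlier.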
Previously, the primary would preemptively drain the workers in batches sized according to the remaining proposal space. This was fine when transactions were either included or dropped altogether, but we need more nuance now: we might choose to save a transaction for the following batch if it pushes the batch cost over the spend limit.

For the PoC, I opted to clone the transmissions (this happens in `worker.transmissions()`) to avoid reinsertion logic initially, leaving the underlying collection untouched until the number of valid transactions is determined and draining it only at the end of the worker loop. Ideally, I'd want to `peek` the next item in the collection, and that might still be the most efficient option. The next best thing would be to reinsert, but that's still O(n) and requires more logic.
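Sketched with the same stand-in queue as above, the PoC's shape is: a read-only pass decides how many transmissions fit, and the queue is only drained at the end.

```rust
use std::collections::VecDeque;

const BATCH_SPEND_LIMIT: u64 = 100;

fn select_then_drain(queue: &mut VecDeque<u64>) -> Vec<u64> {
    let mut batch_cost = 0u64;
    let mut accepted = 0;
    // First pass: count how many transmissions fit under the spend limit,
    // without removing anything from the queue.
    for &cost in queue.iter() {
        if batch_cost + cost > BATCH_SPEND_LIMIT {
            break;
        }
        batch_cost += cost;
        accepted += 1;
    }
    // Second pass: drain exactly the accepted prefix; the rest stay queued
    // for the next proposal.
    queue.drain(..accepted).collect()
}
```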
Peeking one by one would work! Or draining one by one: if we only optionally have to reinsert the very last transmission we drained, when it crosses the compute threshold, I'm not sure what would be O(n).
O(n) comes from shifting all the transmissions by one in the collection when reinserting at index 0 (to preserve fairness in the ordering of transmissions). Peeking directly likely won't work because it requires either holding the lock (not a good idea) or cloning. But perhaps doing the lookup one-by-one, followed by a removal could work. Though any operation that isn't inserting or removing values from the tail of the map will incur an O(n) shift cost.
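To make the shift cost concrete, assuming the queue is an `IndexMap`-style insertion-ordered map (an assumption consistent with the description above, not a confirmed detail of the worker):

```rust
use indexmap::IndexMap; // assumes the `indexmap` crate

// Reinsert a transmission at the front of an insertion-ordered map.
// `shift_insert` preserves iteration order by shifting all later entries,
// so a front insertion is O(n), while pushes/pops at the tail are O(1).
fn reinsert_front(queue: &mut IndexMap<u64, Vec<u8>>, id: u64, tx: Vec<u8>) {
    queue.shift_insert(0, id, tx);
}
```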
I see, thank you for explaining. I take everything back; your approach seems close to optimal (assuming we don't change or introduce additional data structures).

> But perhaps doing the lookup one-by-one, followed by a removal could work.

Indeed, you can clone and look up one-by-one (to avoid cloning the entire ready queue), and then batch-drain at the end like you're already doing.
```rust
let mut worker_transmissions = worker.transmissions().into_iter();

// Check the transactions for inclusion in the batch proposal.
while num_transmissions_included_for_worker < num_transmissions_per_worker {
```
Wondering whether it is possible and desirable to put this check in the same place as the compute cost check, because they're both indicating "the batch proposal is too full now". The loop could then be something like `while let Some(transmission) = worker.drain(1)`.

Up to you, just a suggestion.
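Using the stand-in queue from earlier, the suggested shape could fold both limits into a single exit point (illustrative only; the real `worker.drain(1)` API may differ):

```rust
use std::collections::VecDeque;

const BATCH_SPEND_LIMIT: u64 = 100;

fn drain_until_full(queue: &mut VecDeque<u64>, max_count: usize) -> Vec<u64> {
    let mut batch = Vec::new();
    let mut batch_cost = 0u64;
    while let Some(cost) = queue.pop_front() {
        // "Too many transmissions" and "too much compute" both mean the
        // proposal is full, so they share a single exit point.
        if batch.len() >= max_count || batch_cost + cost > BATCH_SPEND_LIMIT {
            queue.push_front(cost);
            break;
        }
        batch_cost += cost;
        batch.push(cost);
    }
    batch
}
```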
This PR introduces spend limit checks on batch proposals at construction and prior to signing. This requires some changes to batch construction: the workers are now drained only after the transmissions have been checked for inclusion in a batch. This avoids reinserting transmissions into the memory pool once the spend limit is surpassed. It was also necessary to expose a `compute_cost` function on the `LedgerService` trait; its internals might still be moved into snarkVM.

This changeset will need to be refined and tested, hence the draft. CI is currently expected to fail.
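A hedged sketch of what that trait addition might look like; the actual signature in the PR may differ (for instance, it may accept serialized `Data<Transaction<N>>` instead):

```rust
use anyhow::Result;

// Placeholder for snarkVM's `Transaction<N>`, to keep the sketch standalone.
struct Transaction;

trait LedgerService {
    // ... existing ledger queries ...

    /// Returns the compute cost of the given transaction, used to enforce
    /// the spend limit both when constructing a batch proposal and before
    /// signing a proposal received from a peer.
    fn compute_cost(&self, transaction: &Transaction) -> Result<u64>;
}
```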
Related PRs:

- `BLOCK_SPEND_LIMIT`: snarkVM#2565 (previous discussion)