[Feature] Batch proposal spend limits #3471
base: staging
Conversation
This commit also modifies how transmissions are included in a batch. They are only drained from the workers once their validity has been checked.
Hope it helps
```rust
let process = self.ledger.vm().process();

// Deserialize the transaction. If the transaction exceeds the maximum size, then return an error.
let transaction = match transaction {
```
Deserialization is extremely expensive. I would consider moving this calculation into `ledger.rs:check_transaction_basic`, where we already deserialize. Perhaps that function can return the compute cost? Or some other design?
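For illustration, a self-contained sketch of that idea, with placeholder names standing in for the real API (`SerializedTx`, `Tx`, and the free functions are all assumptions, not the current snarkOS code): deserialize once inside the basic check and return the compute cost as a by-product.

```rust
use anyhow::Result;

// Placeholders standing in for `Data<Transaction<N>>` and `Transaction<N>`.
struct SerializedTx(Vec<u8>);
struct Tx;

// Stands in for the expensive deserialization step.
fn deserialize(_bytes: &SerializedTx) -> Result<Tx> {
    Ok(Tx)
}

// Stands in for the real cost calculation on a deserialized transaction.
fn compute_cost(_tx: &Tx) -> Result<u64> {
    Ok(1)
}

// The basic check already deserializes; returning the cost as a by-product
// means callers never deserialize a second time just to price the transaction.
fn check_transaction_basic(serialized: &SerializedTx) -> Result<u64> {
    let tx = deserialize(serialized)?;
    // ... existing validity checks on `tx` would run here ...
    compute_cost(&tx)
}
```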
I did consider it and should revisit the idea, but intuitively we might want to avoid coupling the cost calculation with `check_transaction_basic`, as it's only called in `propose_batch` and not in `process_batch_propose_from_peer`, at least not directly.
Ah, good point... That is unfortunate, because in the case of `process_batch_propose_from_peer` we might be retrieving the transmission from disk, in which case we'll most certainly have to incur the deserialization cost.

Maybe if we let `fn compute_cost` take in a `Transaction` instead of `Data<Transaction>`, it can at least be made explicit. For our own proposal we call it from within `check_transaction_basic`; for incoming proposals we'll need to deserialize before calling it.

And if it turns out to be a bottleneck, we can always refactor the locations where we deserialize more comprehensively, and potentially create a cache for the compute cost if needed.
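To make the trade-off concrete, here is a sketch of the two shapes under discussion (placeholder types; neither signature is the PR's actual code):

```rust
use anyhow::Result;

// Placeholders for snarkVM's `Transaction<N>` and snarkOS's
// `Data<Transaction<N>>` wrapper around possibly-serialized values.
struct Transaction;
struct Data<T>(T);

trait LedgerServiceSketch {
    /// Taking an already-deserialized transaction makes the expensive
    /// deserialization explicit at each call site: `propose_batch` gets the
    /// `Transaction` for free via `check_transaction_basic`, while
    /// `process_batch_propose_from_peer` must deserialize before calling.
    fn compute_cost(&self, transaction: &Transaction) -> Result<u64>;

    /// By contrast, this shape hides a deserialization inside every call.
    fn compute_cost_from_data(&self, transaction: Data<Transaction>) -> Result<u64>;
}
```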
```rust
@@ -235,4 +235,9 @@ impl<N: Network> LedgerService<N> for MockLedgerService<N> {
        self.height_to_round_and_hash.lock().insert(block.height(), (block.round(), block.hash()));
        Ok(())
    }

    /// TODO: is this reasonable?
```
> /// TODO: is this reasonable?

I guess if our mock ledger doesn't have the `compute_cost` function, yes, it's reasonable; no need to extend every test abstraction which exists.
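If the mock does need to satisfy the trait, a fixed value is probably all it takes. A minimal sketch against a hypothetical slice of the trait (names and signature are assumptions):

```rust
use anyhow::Result;

// A hypothetical slice of the `LedgerService` trait, reduced to the one
// method relevant here.
trait ComputeCost {
    fn compute_cost(&self, transaction_bytes: &[u8]) -> Result<u64>;
}

struct MockLedgerService;

impl ComputeCost for MockLedgerService {
    // Tests that don't exercise the spend limit only need *a* cost, not an
    // accurate one, so a constant keeps the mock trivial.
    fn compute_cost(&self, _transaction_bytes: &[u8]) -> Result<u64> {
        Ok(0)
    }
}
```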
```diff
- // If the worker is empty, break early.
- if worker_transmissions.peek().is_none() {
-     break 'outer;
+ let mut worker_transmissions = worker.transmissions().into_iter();
```
So correct me if I'm wrong, but the previous behaviour was to selectively drain transmissions, and if they are duplicates or are incorrect, we discard them. Your current implementation completely drains transmissions, so besides discarding duplicates or invalid ones, once we hit the size or compute limit, the remaining transmissions are discarded.

Some options:

- Preprocess the compute spend of a transmission and cache it in the worker's `Ready` queue. But this would unfortunately require a lot of code changes and wouldn't make sense everywhere.
- Drain transmissions one by one. If we notice post-deserialization that we're above the compute limit, we reinsert that one transmission back into the queue using `worker.reinsert`. This has strictly less overhead than draining all transmissions (see the sketch after this list).
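A self-contained sketch of the second option, using a `VecDeque<u64>` of per-transmission compute costs as a stand-in for the worker's ready queue (the limit and types are illustrative):

```rust
use std::collections::VecDeque;

const BATCH_SPEND_LIMIT: u64 = 100;

fn drain_with_reinsert(queue: &mut VecDeque<u64>) -> Vec<u64> {
    let mut batch = Vec::new();
    let mut batch_cost = 0u64;
    while let Some(cost) = queue.pop_front() {
        if batch_cost + cost > BATCH_SPEND_LIMIT {
            // Over the limit: put this one transmission back at the front
            // so it keeps its place for the next proposal, then stop.
            queue.push_front(cost);
            break;
        }
        batch_cost += cost;
        batch.push(cost);
    }
    batch
}
```

Note that `push_front` is O(1) on a `VecDeque`; as the discussion below covers, front insertion into the worker's actual queue is costlier.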
Previously, the primary would preemptively drain the workers in batches sized according to the remaining proposal space. This was fine when transactions were either included or dropped altogether, but we need more nuance now: we might choose to save a transaction for the following batch if it pushes the batch cost over the spend limit.

For the PoC, I opted to clone the transmissions (this happens in `worker.transmissions()`) to avoid reinsertion logic initially, leaving the underlying collection untouched until the number of valid transactions is determined and draining it only at the end of the worker loop. Ideally, I'd want to `peek` the next item in the collection, and that might still be the most efficient option. The next best thing would be to reinsert, but that's still O(n) and requires more logic.
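Sketched with the same stand-in queue as above, the PoC's shape is: a read-only pass decides how many transmissions fit, and the queue is only drained at the end.

```rust
use std::collections::VecDeque;

const BATCH_SPEND_LIMIT: u64 = 100;

fn select_then_drain(queue: &mut VecDeque<u64>) -> Vec<u64> {
    let mut batch_cost = 0u64;
    let mut accepted = 0;
    // First pass: count how many transmissions fit under the spend limit,
    // without removing anything from the queue.
    for &cost in queue.iter() {
        if batch_cost + cost > BATCH_SPEND_LIMIT {
            break;
        }
        batch_cost += cost;
        accepted += 1;
    }
    // Second pass: drain exactly the accepted prefix; the rest stay queued
    // for the next proposal.
    queue.drain(..accepted).collect()
}
```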
Peeking one by one would work! Or draining one by one: if we only optionally have to reinsert the very last transmission we drained, when it crosses the compute threshold, I'm not sure what would be O(n).
O(n) comes from shifting all the transmissions by one in the collection when reinserting at index 0 (to preserve fairness in the ordering of transmissions). Peeking directly likely won't work because it requires either holding the lock (not a good idea) or cloning. But perhaps doing the lookup one-by-one, followed by a removal could work. Though any operation that isn't inserting or removing values from the tail of the map will incur an O(n) shift cost.
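To make the shift cost concrete, assuming the queue is an `IndexMap`-style insertion-ordered map (an assumption consistent with the description above, not a confirmed detail of the worker):

```rust
use indexmap::IndexMap; // assumes the `indexmap` crate

// Reinsert a transmission at the front of an insertion-ordered map.
// `shift_insert` preserves iteration order by shifting all later entries,
// so a front insertion is O(n), while pushes/pops at the tail are O(1).
fn reinsert_front(queue: &mut IndexMap<u64, Vec<u8>>, id: u64, tx: Vec<u8>) {
    queue.shift_insert(0, id, tx);
}
```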
I see, thank you for explaining. I take everything back; your approach seems close to optimal (assuming we don't change or introduce additional data structures).

> But perhaps doing the lookup one-by-one, followed by a removal could work.

Indeed, you can clone and look up one-by-one (to avoid cloning the entire ready queue), and then batch-drain at the end like you're already doing.
```rust
let mut worker_transmissions = worker.transmissions().into_iter();

// Check the transactions for inclusion in the batch proposal.
while num_transmissions_included_for_worker < num_transmissions_per_worker {
```
Wondering whether it is possible and desirable to put this check in the same place as the compute cost check, because they're both indicating "the batch proposal is too full now". The loop could then be something like `while let Some(transmission) = worker.drain(1)`.

Up to you, just a suggestion.
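Using the stand-in queue from earlier, the suggested shape could fold both limits into a single exit point (illustrative only; the real `worker.drain(1)` API may differ):

```rust
use std::collections::VecDeque;

const BATCH_SPEND_LIMIT: u64 = 100;

fn drain_until_full(queue: &mut VecDeque<u64>, max_count: usize) -> Vec<u64> {
    let mut batch = Vec::new();
    let mut batch_cost = 0u64;
    while let Some(cost) = queue.pop_front() {
        // "Too many transmissions" and "too much compute" both mean the
        // proposal is full, so they share a single exit point.
        if batch.len() >= max_count || batch_cost + cost > BATCH_SPEND_LIMIT {
            queue.push_front(cost);
            break;
        }
        batch_cost += cost;
        batch.push(cost);
    }
    batch
}
```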
This PR introduces spend limit checks on batch proposals at construction and prior to signing. This requires some changes to batch construction: the workers are now drained only after the transmissions have been checked for inclusion in a batch. This avoids reinserting transmissions into the memory pool once the spend limit is surpassed. It was also necessary to expose a `compute_cost` function on the `LedgerService` trait; its internals might still be moved into snarkVM.

This changeset will need to be refined and tested, hence the draft. CI is currently expected to fail.
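A hedged sketch of what that trait addition might look like; the actual signature in the PR may differ (for instance, it may accept serialized `Data<Transaction<N>>` instead):

```rust
use anyhow::Result;

// Placeholder for snarkVM's `Transaction<N>`, to keep the sketch standalone.
struct Transaction;

trait LedgerService {
    // ... existing ledger queries ...

    /// Returns the compute cost of the given transaction, used to enforce
    /// the spend limit both when constructing a batch proposal and before
    /// signing a proposal received from a peer.
    fn compute_cost(&self, transaction: &Transaction) -> Result<u64>;
}
```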
Related PRs:

- `BLOCK_SPEND_LIMIT`: snarkVM#2565 (previous discussion)