Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf] Skip costly validation for stored data #2574

Draft
wants to merge 3 commits into
base: staging
Choose a base branch
from

Conversation

vicsn
Copy link
Collaborator

@vicsn vicsn commented Nov 13, 2024

Motivation

Currently, every time an object is deserialized, input validation is performed. Ideally, we only perform input validation on data received from the network, not from disk.

This PR takes an initial stab at skipping the most expensive input validation with the smallest possible code diff. Deserialization of a Subdag and its contents is now about 10x faster, see the new dag.rs benchmark (half of which is due to skipping is_in_correct_subgroup_assuming_on_curve).

Whenever we (de)serialize from disk, we use bincode (which in its current use just prepends some bytes to to_bytes_le). Bincode calls into Deserialize::deserialize, which now calls into FromBytesUncheckedDeserializer, which now calls into from_bytes_le_unchecked.

Test Plan

  • CI passes
  • Run a local network, with traffic and restarts

Open TODOs

  • Ensure that Deserialize::deserialize is only used when getting data from disk, consider enforcing types are Verified in the DataMap interface.
  • Profile which objects we're actually reading from disk in practice, given that we have so many caches.
  • Look into correctness of from_x_coordinate_unchecked
  • New functions and types are all defined consistently, in the same order, with clear comments
  • Update Cargo.lock
  • Review if the recently added transmissions cache is still warranted

@@ -146,6 +146,11 @@ impl<N: Network> Subdag<N> {
// Ensure the leader certificate is an even round.
Ok(Self { subdag })
}

/// Initializes a new subdag.
pub fn from_unchecked(subdag: BTreeMap<u64, IndexSet<BatchCertificate<N>>>) -> Self {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: a more std-like name for such a function would be from_unchecked_parts or from_raw_parts; I'd personally prefer it, as it clearly indicates that the underlying objects are of interest, as opposed to an alternative representation of the same object

the same suggestion applies to the other instances of this name

Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very much in favor of this direction; once something is in our own local storage, it should be considered trustworthy/valid.

Check that Deserialize::deserialize is only used when getting data from disk, consider enforcing this somehow.

If we feel that we need to be more careful about the handling of objects that may not be verified, we could introduce a wrapper struct to indicate with certainty that they either are or are not (determined by how they were deserialized), and use it as a parameter/generic in the relevant functions/methods. This, however, could have a large impact on the diff (but not performance, as it would be zero-cost).

@vicsn vicsn changed the title Skip costly validation for stored data [Perf] Skip costly validation for stored data Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants