Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch usize conversion and Err more consistently #1499

Open
wants to merge 11 commits into
base: testnet3
Choose a base branch
from

Conversation

vicsn
Copy link
Collaborator

@vicsn vicsn commented May 2, 2023

Motivation

See for a motivation this comment.

However, my initial intuition was wrong, we can't reduce developer confusion with a "use usize as little as possible" because Rust's library uses usize all over the place. We instead increase clarity and safety by being more rigorous about our usage of usize. See the CONTRIBUTING doc for more details.

Test Plan

Given that the main concern for this is consensus issues, I suggest to take extra care to stress test snark{VM,OS} on both 32-bit and 64-bit machines, generating test input from the different machine or constructing it manually. Let me know if you want me to help setting that up already, otherwise I'll make a TODO for once we start testing AleoBFT more extensively.

@vicsn vicsn requested review from Pratyush and howardwu May 2, 2023 13:44
Comment on lines 242 to 250
let mut z_b_s = BTreeMap::new();
for (circuit_id, circuit_state) in state.circuit_specific_states.iter() {
let mut z_b_i = Vec::with_capacity(usize::try_from(circuit_state.batch_size)?);
for i in 0..circuit_state.batch_size {
let z_b = witness_label(*circuit_id, "z_b", usize::try_from(i)?);
z_b_i.push(LinearCombination::new(z_b.clone(), [(F::one(), z_b)]));
}
z_b_s.insert(circuit_id, z_b_i);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use Itertools::try_collect here, instead of rewriting as a for loop

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to return the Result, I have to unwrap the result in order to use z_b on line 246. I got it to compile if we return a Result anyway for all paths, but I don't think we'd be happy with this (the compiler asks for those type annotations:

        let z_b_s = state
        .circuit_specific_states
        .iter()
        .map(|(&circuit_id, circuit_state)| {
            let z_b_i = (0..circuit_state.batch_size)
                .map(|i| {
                    let z_b = witness_label(circuit_id, "z_b", usize::try_from(i)?);
                    Ok::<LinearCombination<F>, std::num::TryFromIntError>(LinearCombination::new(z_b.clone(), [(F::one(), z_b)]))
                }).collect::<Result<Vec<_>,_>>();
            Ok::<(CircuitId, Vec<LinearCombination<F>>), std::num::TryFromIntError>((circuit_id, z_b_i?))
        })
        .collect::<Result<BTreeMap<_, _>,_>>()?;

Meanwhile, online fora on related issues keep suggesting to just use a forloop.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it down to 4916a3c but it's still more code than the for-loop

})
.collect::<BTreeMap<_, _>>();
.collect::<Result<Vec<_>, _>>()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here, I think we can use try_collect to still collect into a BTreeMap

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we do that, the code that uses z_b_s_at_beta should go back to its old state.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't use try_collect because I need to annotate the types, but I can use ? within the map now and collect: 4916a3c

@@ -74,13 +74,14 @@ impl<F: PrimeField, MM: MarlinMode> AHPForR1CS<F, MM> {
#[allow(clippy::type_complexity)]
pub fn prover_first_round<'a, R: RngCore>(
mut state: prover::State<'a, F, MM>,
batch_sizes: impl Iterator<Item = (&'a CircuitId, &'a usize)>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we passing this separately? Isn't it included in state?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in state, but we created this iterator already earlier so I thought why not reuse it for line 138 below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we passing this in if it's in state?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment right above yours... But since you both think it's weird, removed it: 7f2a6c8

@@ -94,7 +95,7 @@ impl<F: PrimeField, MM: MarlinMode> AHPForR1CS<F, MM> {
let c_domain = circuit_state.constraint_domain;
let i_domain = circuit_state.input_domain;

let mut circuit_r_b_s = Vec::with_capacity(batch_size);
let mut circuit_r_b_s = Vec::with_capacity(z_a.len());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change? We do still have batch_size in scope

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rng: &mut R,
) -> Result<prover::State<'a, F, MM>, AHPError> {
let round_time = start_timer!(|| "AHP::Prover::FirstRound");
let mut r_b_s = Vec::with_capacity(state.circuit_specific_states.len());
let mut job_pool = snarkvm_utilities::ExecutionPool::with_capacity(3 * state.total_instances);
let mut job_pool = snarkvm_utilities::ExecutionPool::with_capacity(usize::try_from(3 * state.total_instances)?);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the following works, I think we should switch to it, instead of using the more verbose usize::try_from

Suggested change
let mut job_pool = snarkvm_utilities::ExecutionPool::with_capacity(usize::try_from(3 * state.total_instances)?);
let mut job_pool = snarkvm_utilities::ExecutionPool::with_capacity((3 * state.total_instances).try_into()?);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -114,13 +114,13 @@ impl<'a, F: PrimeField, MM: MarlinMode> State<'a, F, MM> {

let first_padded_public_inputs = &variable_assignments[0].0;
let input_domain = EvaluationDomain::new(first_padded_public_inputs.len()).unwrap();
let batch_size = variable_assignments.len();
let batch_size = u32::try_from(variable_assignments.len())?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, let's change this to try_into()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let mut x_polys = Vec::with_capacity(batch_size);
let mut padded_public_variables = Vec::with_capacity(batch_size);
let mut private_variables = Vec::with_capacity(batch_size);
let mut z_as = Vec::with_capacity(variable_assignments.len());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should make a batch_size_usize local variable and use that here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed would be easier to read 5ebe7b2

Comment on lines 96 to 101
let mut batch_sizes = BTreeMap::new();
for (circuit_id, batch_size) in
state.circuit_specific_states.iter().map(|(circuit_id, s)| (circuit_id, s.batch_size))
{
batch_sizes.insert(*circuit_id, usize::try_from(batch_size)?);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, try_collect and try_into.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@howardwu howardwu requested a review from ljedrz May 5, 2023 01:49
CONTRIBUTING.md Outdated Show resolved Hide resolved

use std::{collections::BTreeMap, fmt};

/// Stores a sparse polynomial in coefficient form.
#[derive(Clone, PartialEq, Eq, Hash, Default, CanonicalSerialize, CanonicalDeserialize)]
#[derive(Clone, PartialEq, Eq, Hash, Default)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note: it might be a good idea to check if any other objects can be "trimmed" in a similar fashion for faster compilation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a thumbsup! rust-lang/rust#50927

Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@ljedrz
Copy link
Collaborator

ljedrz commented May 5, 2023

If we're concerned with verbosity, we could always introduce helper macros in snarkvm_utilities that would substitute calls like

usize::try_from(x)?

with sth like

to_usize!(x)

Such calls are cheap and they provide us with much more cross-platform safety, so even if it's an annoyance, it is IMO well worth it.

@vicsn
Copy link
Collaborator Author

vicsn commented May 5, 2023

For the record, I cannot reproduce this CI error locally: thread 'package::run::tests::test_run_with_import' panicked at 'called `Result::unwrap()` on an `Err` value: marlin: PolynomialCommitmentError(AnyhowError(Expected size of 12582920, found size of 12075008))', vm/package/run.rs:145:44

@ljedrz
Copy link
Collaborator

ljedrz commented May 5, 2023

For the record, I cannot reproduce this CI error locally: thread 'package::run::tests::test_run_with_import' panicked at 'called Result::unwrap() on an Err value: marlin: PolynomialCommitmentError(AnyhowError(Expected size of 12582920, found size of 12075008))', vm/package/run.rs:145:44

some CI tests checking object sizes have been flaky recently

@howardwu
Copy link
Member

howardwu commented May 5, 2023

@vicsn the CI failures on synthesizer are concerning. Our serialization sizes appear to have genuinely changed. Those failures aren't flaky. (only the ParameterError ones are flaky, which are not present in synthesizer).

@vicsn
Copy link
Collaborator Author

vicsn commented May 7, 2023

@vicsn the CI failures on synthesizer are concerning. Our serialization sizes appear to have genuinely changed. Those failures aren't flaky. (only the ParameterError ones are flaky, which are not present in synthesizer).

I should have tackled this right away! The failing tests can be explained by the fact that we moved from usize (which was 64 bit on everyone's machine so far) to u32. So we can either (1) change the code to use u64 or (2) resample parameters yet again and update the tests. As a batch size of u32 gives us enough space I vote for this option.

In more detail:

  • The block:: tests fail because they can't deserialize Proof - which is hardcoded. Resampling genesis resolved the issue for me.
  • The vm:: tests fail because they are 8 bytes (for two proof instances) or 4 bytes (for one proof instance) less than expected. 1690b3b

Base automatically changed from staging to testnet3 May 10, 2023 19:02
@vicsn vicsn force-pushed the catch_usize_and_err branch from 1690b3b to 8fb4a79 Compare May 12, 2023 10:20
@howardwu
Copy link
Member

@vicsn can you try adding the following line to the top of snarkvm-algorithms/src/lib.rs and see if the compiler warns?

#![warn(clippy::cast_possible_truncation)]

We use this flag in circuit and console to ensure there are no unsafe casts.

@vicsn
Copy link
Collaborator Author

vicsn commented May 15, 2023

warn(clippy::cast_possible_truncation)]

Added it. For the adjustments in the last commits I:

  • let structs use fixed-size types instead of usize to make it clear to readers what size to expect for operations
  • if the callchain doesn't use Results or requires a lot of casting, see if we can avoid the overhead of try_from and wrap the function in #[allow(...)]. I added comments about why casting is safe, if you think it is too ugly we can remove those but at least it can help your review. A big downside of this approach is that it can give a false sense of security in the future - just like all uses of #[allow(...)] in the codebase. It would be cool if #[allow(...)] codeblocks would have to be manually approved/enabled in CI whenever there is a commit which changes something in the respective codeblock. Perhaps for now it suffices to add a line to the PR template or CONTRIBUTING: "- [ ] I checked whether the functions I changed are still allowed to have it's #[allow(...)]". @ljedrz shall I add that?

@vicsn vicsn requested a review from ljedrz May 15, 2023 06:41
@ljedrz
Copy link
Collaborator

ljedrz commented May 17, 2023

@vicsn I think it's always better to have more comments to avoid any misunderstanding/misuse 👍. As for future-proofing, I think the safest course of action is to have more comprehensive tests, including extremes in terms of the number of items, but having some extra information in CONTRIBUTING is also valuable.

What we could also do is add assertions in some of the places (the ones where we know the value is smaller than some other value) where we perform the "currently valid" casts - simple comparisons shouldn't be noticeable at all perf-wise.

Comment on lines 287 to 288
let mut batch_sizes = Vec::with_capacity(batch_sizes_in.len());
for (z_b_evals, batch_size) in evaluations.z_b_evals.iter().zip(batch_sizes_in.values()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice allocation removal 👌

Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The further changes look fine to me 👍.

@vicsn vicsn force-pushed the catch_usize_and_err branch from a1b5201 to 0c8a7f6 Compare May 17, 2023 15:18
@vicsn
Copy link
Collaborator Author

vicsn commented May 19, 2023

@howardwu if you like the PR's logic, I suggest we merge and I will tackle this piece by piece in the coming weeks: https://github.com/AleoHQ/snarkVM/issues/1567

@vicsn vicsn mentioned this pull request May 24, 2023
@howardwu howardwu changed the base branch from testnet3 to staging May 25, 2023 21:38
})
}
}

impl<E: PairingEngine> ToBytes for CommitterKey<E> {
// Values are safe to cast to u32 because they are read as u32.
#[allow(clippy::cast_possible_truncation)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: f339c4c

@@ -131,6 +131,8 @@ impl<E: PairingEngine> FromBytes for UniversalParams<E> {
}
}

// It is safe to cast `supported_degree_bounds.len()` and individual bounds to a `u32` because they are read and used as u32
#[allow(clippy::cast_possible_truncation)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: f339c4c

fn update_buckets<G: AffineCurve>(
base: &G,
mut scalar: <G::ScalarField as PrimeField>::BigInteger,
w_start: usize,
c: usize,
buckets: &mut [G::Projective],
) {
assert!(w_start <= u32::MAX as usize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using assert?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for extra clarity, removed the assert now

fn batched_window<G: AffineCurve>(
bases: &[G],
scalars: &[<G::ScalarField as PrimeField>::BigInteger],
w_start: usize,
c: usize,
) -> (G::Projective, usize) {
assert!(w_start <= u32::MAX as usize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using assert?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for extra clarity, we'd already panic if the conversion would fail so I now removed the asserts.

pub(super) fn batch_add<G: AffineCurve>(
num_buckets: usize,
bases: &[G],
bucket_positions: &mut [BucketPosition],
) -> Vec<G> {
// assert_eq!(bases.len(), bucket_positions.len());
assert!(!bases.is_empty());
assert!(num_buckets <= u32::MAX as usize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using assert?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So instead of panicking on a failed cast, we'd panic somewhat more clearly on this assert.

But anyway, gave it more thought, removed the assert and now passing a u32 to this function so we only have to reason about safe casting one level higher.

/// Obtain the limbs directly from a big int
pub fn get_limbs_representations_from_big_integer<TargetField: PrimeField>(
elem: &<TargetField as PrimeField>::BigInteger,
optimization_type: OptimizationType,
) -> SmallVec<[F; 10]> {
let params = get_params(TargetField::size_in_bits(), F::size_in_bits(), optimization_type);
assert!(params.bits_per_limb <= u32::MAX as usize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using assert?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I think it's fine to error out, or we can have params.bits_per_limb be a u32. Generally when we have a malformed config it's better to assert and fail quickly. This avoid polluting function signatures with Results

@vicsn vicsn force-pushed the catch_usize_and_err branch 2 times, most recently from d56a16c to d33b4e2 Compare June 6, 2023 16:43
@vicsn vicsn requested a review from howardwu June 6, 2023 16:45
Base automatically changed from staging to testnet3 June 6, 2023 16:57
@vicsn vicsn requested a review from ljedrz June 6, 2023 17:04
@vicsn vicsn force-pushed the catch_usize_and_err branch from 2500e08 to 5239635 Compare June 6, 2023 17:41
@vicsn vicsn mentioned this pull request Sep 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants