Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recovery data usage improvements #6635

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

maan2003
Copy link
Member

@maan2003 maan2003 commented Jan 1, 2025

goal: reduce data usage for recovery

  • changed from hex to base64: 33% saving
  • client can only download from single peer and verify using signature: num peers times saving

for 4 guardian federation: overall 6x less data download

Happy new year!

goal: reduce data usage for recovery

- changed from hex to base64: 33% saving
- client can only download from single peer and verify using signature: num peers times saving
@maan2003 maan2003 force-pushed the ma/recover-happy-new-year branch from df42b92 to 51fd30e Compare January 1, 2025 12:18
/// access to a decoder registry also makes decoding structs impossible that
/// themselves contain module dyn-types (e.g. a module output containing a
/// fedimint transaction).
pub fn try_into_inner_known_module_kind(&self, decoder: &Decoder) -> Result<T, DecodeError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sus. Why.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied everything from original

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: verify that the original wasn't written by me and silly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so it's used to decode session/output outcomes in client modules sometimes, doesn't panic. We always take it from current consensus of peers (except now, where we check the signature so effectively the same thing). Looks OK.

fedimint-core/src/module/mod.rs Outdated Show resolved Hide resolved
async fn get_session_status_raw(
&self,
block_index: u64,
decoders: &ModuleDecoderRegistry,
) -> anyhow::Result<SessionStatus> {
debug!(target: LOG_CLIENT_NET_API, block_index, "Fetching block's outcome from Federation");
let verify = |_: &SignedSessionOutcome| -> bool { todo!() };
// TODO: check api version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably just make it a new function, and let the caller verify. I think all over the code we do the api versioning checks in the higher layer. Though passing api version here wouldn't be bad either.

@@ -129,12 +132,56 @@ where
.map_err(|e| anyhow!(e.to_string()))
}

fn select_peers_for_status(&self) -> impl Iterator<Item = PeerId> + '_ + Clone {
// TODO: better selection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shuffle should do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was thinking about avoiding the down guardian but random is still pretty good

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still a massive improvement and whatever is simple should work.

@maan2003 maan2003 force-pushed the ma/recover-happy-new-year branch 2 times, most recently from c0b7822 to ef7a1f6 Compare January 1, 2025 18:53
@maan2003
Copy link
Member Author

maan2003 commented Jan 1, 2025

simplified the control flow

@maan2003
Copy link
Member Author

maan2003 commented Jan 1, 2025

@dpc thoughts about 9bf8539?

I don't think it will create any locking issues

@maan2003 maan2003 force-pushed the ma/recover-happy-new-year branch from 9bf8539 to 9912576 Compare January 1, 2025 19:17
@dpc
Copy link
Contributor

dpc commented Jan 1, 2025

@maan2003 I might be off and not have time to fully bring it all in, but there were two caches because session status can have partial results, or something.

I would avoid adding too much changes to one PR for now unless absolutely necessary.

@maan2003 maan2003 force-pushed the ma/recover-happy-new-year branch from 9912576 to ac9db1a Compare January 1, 2025 19:35
@dpc
Copy link
Contributor

dpc commented Jan 1, 2025

    /// In theory these two LRUs have the same content, but one is locked by
    /// potentially long-blocking operation, while the other non-blocking one.
    /// Given how tiny they are, it's not worth complicating things to unify
    /// them.

Await blocks, while holding the lock there, right. So there is an issue, or at least at some point I believe there was (could have been wrong). Later this/next week I could investigate to fully understand the issue, but for now not worth blocking on.

@maan2003 maan2003 force-pushed the ma/recover-happy-new-year branch from ac9db1a to e8b4413 Compare January 1, 2025 19:39
@dpc
Copy link
Contributor

dpc commented Jan 1, 2025

Right. So the goal of "await" is blocking until the result is there, while a concurrent "check status" for the same session wants to get a partial result, and not block, so would get stuck if there was a blocked "await", leading to random and unwanted hanging in certain cases (which we might add later). Even if there's a way to somehow unify these caches, the complexity of it is probably not worth the tiny memory savings.

@maan2003 maan2003 force-pushed the ma/recover-happy-new-year branch 2 times, most recently from 241d5c0 to 8a99bb9 Compare January 3, 2025 16:32
@maan2003 maan2003 force-pushed the ma/recover-happy-new-year branch from 8a99bb9 to 3c30dc9 Compare January 3, 2025 16:33
@maan2003
Copy link
Member Author

maan2003 commented Jan 3, 2025

👀 looks like recovery tests is failing, will debug

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants