Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return effective trim point LSN on successful trim request #2468

Closed
wants to merge 4 commits into from

Conversation

pcholakov
Copy link
Contributor

@pcholakov pcholakov commented Jan 6, 2025

This change enables restatectl reporting of the effective trim point as well as positive validation in places like the upcoming trim-gap handling test (#2463).

It's mostly just plumbing and a couple of opportunistic cleanups.

@tillrohrmann - making you the primary reviewer since you've been quite engaged with this!

cc: @AhmedSoliman for visibility - I've slightly updated the Bifrost log server Trimmed response RPC to support the updated semantics.

Copy link

github-actions bot commented Jan 6, 2025

Test Results

  7 files  ±0    7 suites  ±0   4m 21s ⏱️ -6s
 47 tests ±0   46 ✅ ±0  1 💤 ±0  0 ❌ ±0 
182 runs  ±0  179 ✅ ±0  3 💤 ±0  0 ❌ ±0 

Results for commit d55eee4. ± Comparison against base commit 3ff1c0c.

♻️ This comment has been updated with latest results.

@pcholakov pcholakov force-pushed the feat/trim-log-report-lsn branch from c54764d to 7f9d532 Compare January 6, 2025 12:46
@pcholakov pcholakov requested a review from muhamadazmy January 6, 2025 12:47
@pcholakov pcholakov force-pushed the feat/trim-log-report-lsn branch from 7f9d532 to b86ac06 Compare January 6, 2025 12:50
@pcholakov pcholakov marked this pull request as ready for review January 6, 2025 12:57
@@ -587,7 +587,7 @@ impl<T: TransportConnect> Service<T> {
info!(
?log_id,
trim_point_inclusive = ?trim_point,
"Manual trim log command received");
"Trim log command received");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is slightly more accurate; it might have been issued by automation.

Comment on lines -219 to -220
/// Trim the log to the minimum of new_trim_point and last_committed_offset
/// new_trim_point is inclusive (will be trimmed)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better documented in the trait.

@@ -46,7 +46,7 @@ pub(super) struct ReplicatedLoglet<T> {
#[debug(skip)]
networking: Networking<T>,
#[debug(skip)]
logservers_rpc: LogServersRpc,
log_servers_rpc: LogServersRpc,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could not resist some trivial fixups.

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for creating this PR @pcholakov. This is a good improvement. Before we can merge this PR we need to revisit the contract of trim wrt to the return value. I would suggest to not return Option<Lsn> and instead the last trim point independent whether this call did the trimming or not. We should also make sure that the different loglet implementations are aligned wrt the contract we settle on.

@@ -94,6 +94,11 @@ message TrimLogRequest {
uint64 trim_point = 2;
}

message TrimLogResponse {
uint32 log_id = 1;
optional uint64 trim_point = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under which condition would a none trim_point value a valid response? Maybe document it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack; one could also make the case we don't need an optional here as we could represent Lsn::INVALID as a uint64 but I rather like the cleaner semantics of a None value.

Comment on lines 408 to 420
/// Trim the log to the specified LSN trim point (inclusive). Returns the new trim point LSN if
/// the log was actually trimmed by this call, or `None` otherwise.
pub async fn trim(&self, log_id: LogId, trim_point: Lsn) -> Result<Option<Lsn>, Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether changing this contract t always return the lowest available lsn is a bit nicer. In the current form if I send trim(0, 5) twice, my first request will receive a Some(5) as response and the second None. Differently asked, why is it important to be able to distinguish whether a trim call has actually performed the trim or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It behaves in the way you describe now! I was over-indexing on the needs of testing, if you zoom out a bit it's pretty clear this is how the API should work.

crates/bifrost/src/bifrost.rs Outdated Show resolved Hide resolved
if current_trim_point >= actual_trim_point {
return Ok(());
if current_trim_point >= requested_trim_point {
return Ok(Some(current_trim_point));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this implementation is not in line with how you described it earlier. According to your specification, it should return None if nothing is done. The local loglet is implemented differently in this regard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for flagging this! I've revised the method contract as you suggested, and this is now the correct behavior. I've also added some minimal coverage for these edge cases to gapless_loglet_smoke_test which runs against all loglet providers.

@pcholakov
Copy link
Contributor Author

Agree, this is probably the better API contract as it makes the operation naturally idempotent. I was keen to determine whether a specific call had caused the trim for the purposes of testing but I think that's not very important overall. Will update, thanks for the input!

@pcholakov pcholakov force-pushed the feat/trim-log-report-lsn branch from b3bc1c4 to 73943fb Compare January 7, 2025 15:11
@pcholakov pcholakov requested a review from tillrohrmann January 7, 2025 15:33
@pcholakov pcholakov dismissed tillrohrmann’s stale review January 7, 2025 15:34

Updated the method contract to return the actual trim point regardless of whether a trim was performed (or None, if the log has never been trimmed / the trim position is 0).

@pcholakov pcholakov removed the request for review from muhamadazmy January 7, 2025 15:36
@AhmedSoliman AhmedSoliman self-requested a review January 7, 2025 15:59
Comment on lines +141 to +143
// If we get a quorum, we will acknowledge the trim request with the highest trim point
// reported by any log server. This may be higher than what was requested.
let mut response_trim_point = LogletOffset::INVALID;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On re-reading the change, I realized that the replicated loglet retured as the actual trim point that reported by the latest log server to respond as we reach the write consensus requirement. A more paranoid approach is to take the max of all the reported trim points thus far - this could be higher than was was requested, but it's probably safer to over-report.

@AhmedSoliman
Copy link
Contributor

As discussed offline. There is a directional change that needs to happen with trimming and I'm a little concerned that starting to return the trim-point may introduce an unintentional dependency on the current semantics. I'd prefer if we park this and use get_trim_point() after trimming instead.

@pcholakov
Copy link
Contributor Author

Ack, let's can this.

@pcholakov pcholakov closed this Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants