Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better support stale-while-revalidate option #43

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Embraser01
Copy link

@Embraser01 Embraser01 commented Nov 16, 2023

Currently, there was an issue with the stale-while-revalidate options. The satisfiesWithoutRevalidation method didn't allow stale response to be returned when swr was set.

I changed the logic so that the stale response is returned in that case.
I also added tests on this and on some edge case with the max-stale request option (I'm not 100% sure it the right way but didn't find any specs on this).

Related issues: #41, #27

@kornelski
Copy link
Owner

Thanks for the PR. This functionality is quite desirable.

However, my concern is how to communicate to library users that they still need to do the revalidation. If someone simply wrote code like recommended so far:

if (cached.satisfiesWithoutRevalidation(req)) {
    return recycle(cached);
    // gone! no request made
}

then they won't know that they still need to perform revalidation in the background. I wouldn't want them to have to check headers themselves separately.

Any ideas how the interface could be changed to be more robust in this case?

I'm thinking that perhaps satisfiesWithoutRevalidation could continue to return false, and users would use some other method in the "cache miss" code path to check if they can return stale in the meantime. I'd rather require caches supporting stale-while-revalidate to have extra code, than to have a pitfall of existing users not revalidating when needed.

@Embraser01
Copy link
Author

I agree that this is not the best user interface. I tried to limit the impact of my changes so that it could be usable (for now it isn't really usable).
It would be nice for the user to know if the response given by satisfiesWithoutRevalidation is staled or fresh (given that max-stale can also return stale data).

I think having a boolean as a return value prevent complexity and is simple to use and understand however, there is some cases (swr, max-stale) where this boolean doesn't give enough info on what to do.
For now with max stale, it's easy to know if the res is staled content or not with the canStale() function (if it satisfies and can stale => staled content), but it isn't straight forward and doesn't work with stale while revalidate.

I'm not sure on what interface would work better, maybe instead of returning a boolean, it should return an enum or maybe something more flexible like an object like:

const { satisfied, revalidate } = cached.satisfiesWithoutRevalidation(req);

Where satisfied is true as long as the cached content can be returned ASAP
revalidate is true if cache is expired and should be refreshed

Most cases would be satisfied(true), revalidate(false) or satisfied(false); revalidate(true).
When using stale-while-revalidate you could have satisfied(true), revalidate(true)

That could be a new method so that would break anything

@kornelski
Copy link
Owner

Perhaps a small change could be to keep satisfiesWithoutRevalidation return false for s-w-r, and repurpose useStaleWhileRevalidate(req) to be equivalent of satisfiesWithoutRevalidation that supports s-w-r.

I also maintain a Rust version of the same package, and I'll have to add it there too. Thanks to Rust's support for enums the API ended up looking completely different:

match policy.before_request(req) {
     Fresh(response) => return response,
     Stale(request) => return policy.after_response(send(request)),
     // SWF(request, response) => { update_cache(request)); return response }
}

So maybe in JS this could be generalized to something like this:

let { request, response } = policy.evaluate(request, response);
if (request) { update_cache(request) }
if (response) { return response } else { wait for the cache here…? }

@Embraser01
Copy link
Author

Perhaps a small change could be to keep satisfiesWithoutRevalidation return false for s-w-r, and repurpose useStaleWhileRevalidate(req) to be equivalent of satisfiesWithoutRevalidation that supports s-w-r.

I think it would be better to keep a single API for all the cases. To not create a breaking change, it can be a new recommended API that would handle the swr case and maybe other cases like in your evaluate solution.

So maybe in JS this could be generalized to something like this:

let { request, response } = policy.evaluate(request, response);
if (request) { update_cache(request) }
if (response) { return response } else { wait for the cache here…? }

I'm a bit confused on the request & response naming, it doesn't give any semantics information. I think the evaluate function could return (naming TBD):

  • revalidate: true as soon as the request should be sent to the origin server
  • asyncRevalidate: true for stale-while-revalidate, false otherwise
  • cacheHit: true when the cached response can be sent (cannot be false if revalidate is false)
  • responseHeaders: (optional) The updated response headers (only if applicable and not too computing intensive 🤔)

That system would allow users to only implement simple case (no swr)

const { revalidate, asyncRevalidate, cacheHit } = policy.evaluate(req, res);

if (revalidate) {
	const newRes = await update_cache(req);
	return newRes;
}
if (cacheHit) {
	res.headers = policy.responseHeaders(res);
	return res;
}

but with easy support of swr

const { revalidate, asyncRevalidate, cacheHit } = policy.evaluate(req, res);

if (asyncRevalidate) {
	update_cache(req).catch(err => logError(err));
} else if (revalidate) {
	const newRes = await update_cache(req);
	return newRes;
}
if (cacheHit) {
	res.headers = policy.responseHeaders(res); // Could directly come from evaluate
	return res;
}

@kornelski
Copy link
Owner

kornelski commented Nov 20, 2023

I've jumped a few steps ahead in my explanation.

The updated response headers (only if applicable and not too computing intensive 🤔)

Update of headers is not optional. You're simply not allowed to skip that step, even if it took whole day to compute (but it doesn't). Therefore, I want to change the API to always give you the updated headers.

Similarly with response revalidation. If the cached version is unusable, you'll have to make a request. If it's s-w-r, you have to make the request anyway. So I want to provide up-to-date request headers for fetch/revalidate whenever it's needed.

So if you have something to return (it's fresh or s-w-r), you'll get response headers to use. If you have to make a request to the origin (because it's a miss or s-w-r), you'll get a set of headers for making that request. The observation here is that it doesn't really matter if it's a request due to cache miss or a revalidation request, because you're going to send it to the origin anyway, and it'll go back to the cache to update it.

So you don't really need to know the details of what happened. You only have two possible actions: return a response or make an upstream request. S-w-r is a case where you do both, but doesn't change what inputs you need — it's still a response to return and a request to make. That's why I suggested { request, response } as a returned object, but I take this may be unclear naming.

How about this:

  • { response: { headers: {…} } } if it matches,
  • { staleResponse: { headers: {…} }, originRequest: { headers: {…} } } if it's s-w-r,
  • { originRequest: { headers: {…} } } if it's a miss.

@Embraser01
Copy link
Author

Thanks for the explanation, much clearer now!
I like the new naming, but after trying to write some of this to test a bit how that would work in real life, I found that it wasn't really easy to use and still let some tricky thing up to the user.

I tried again but with some inspiration from your rust example. I think I found something that would solve this:

The idea would be to have the evaluate function look like this:

type EvaluateRes<T> = {
  cacheResponse: Response;
  asyncRevalidationPromise?: Promise<T>;
};

async evaluate<T>(
  req: Request,
  res: Response,
  doRequest: (req: Request) => Promise<T>,
): Promise<EvaluateRes<T>> {
  // If cache is fresh
  //    return updated response
  // If cache is stale and stale-while-revalidate is valid
  //    call doRequest(req).catch(logSomewhere)
  //    return updated response
  // Else
  //    call await doRequest(req);
  //    send updated response

  return {} as EvaluateRes<T>;
}

And then, use it like this:

async function doRequest<T>(request: Request): Promise<T> {
  const response = await callHTTPEndpoint(request);

  const policy = new CachePolicy(request, response, {});
  if (policy.storable()) {
    await cache.save(request, { policy, response }, policy.timeToLive());
  }
  return response;
}

export async function test() {
  // And later, when you receive a new request:
  const newReq: Request = { headers: {} };

  const cacheRes = await cache.get(newReq);
  if (!cacheRes) {
    return await doRequest(newReq);
  }
  const { policy, response } = cacheRes;

  const { cacheResponse, asyncRevalidationPromise } = await policy.evaluate(
    newReq,
    response,
    doRequest,
  );

  // (optional) Let the user have some control on stale while revalidate
  if (asyncRevalidationPromise) {
    asyncRevalidationPromise.catch(() =>
      console.error('SWR request was not successful'),
    );
  }
  return cacheResponse;
}

This system remove the need for the user to know when to do the request and if they should await or not the response. It always gives back an updated response.

@Embraser01
Copy link
Author

@kornelski friendly ping on this discussion 🙂

@kornelski
Copy link
Owner

kornelski commented Dec 5, 2023

I'm not a fan of inversion of control approach. While it does make it a bit more foolproof, it also makes the API more opaque, and I'm worried it could be too restrictive for some advanced uses.

I think there can be two ways forward:

  1. Keep satisfiesWithoutRevalidation return false for s-w-r, and repurpose useStaleWhileRevalidate(req) to be equivalent of satisfiesWithoutRevalidation that supports s-w-r, e.g. evaluateRequest(req).

  2. Add beforeRequest and afterResponse like this: https://docs.rs/http-cache-semantics/latest/http_cache_semantics/struct.CachePolicy.html#method.before_request https://docs.rs/http-cache-semantics/latest/http_cache_semantics/struct.CachePolicy.html#method.after_response mapped to something like {freshResponse|staleResponse|originRequest} mentioned in my previous comment.

davidaucoin7377

This comment was marked as off-topic.

davidaucoin7377

This comment was marked as spam.

Copy link
Owner

@kornelski kornelski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

davidaucoin7377

This comment was marked as spam.

@kornelski kornelski requested review from davidaucoin7377 and kornelski and removed request for davidaucoin7377 and kornelski July 1, 2024 17:26
@stevengssns
Copy link

Is there any update on this issue?
I would love to try and use this library, but I need stale-while-revalidate. It seems simple enough to fork and modify, but I would like to avoid that if possible.

I have tried to read the thread, and I was wondering if it would be a valid solution to add an option to satisfiesWithoutRevalidation to modify its behaviour and take stale-while-revalidate into account. The default behaviour would remain as it is today, but it would allow new code to opt in to the behaviour.

If this would seem acceptable I wouldn't mind to try and provide a PR.

@kornelski
Copy link
Owner

I'm not planning to merge this PR as-is, because the issues were not addressed: #43 (comment)

If you could apply those changes and make another PR I'd be happy to move this forward.

@stevengssns
Copy link

Thanks for your fast reply.

I could have a look if I can provide a PR that would implement option 1. It shouldn't be that very hard (famous last words).

The only thing I was wondering is whether you had considered the "opt in" suggestion? But I'll have a look if I can provide this alternative PR if I find the time.

@kornelski
Copy link
Owner

Can you tell me more what do you expect the opt-in to do?

If you want stale-while-revalidate to be automatic, handled as an implementation detail by the library, then I'm afraid this is not possible with the current design. This library is not controlling any caches, and it's not making any requests. It merely answers questions about cacheability.

The answer to what to do with stale-while-revalidate is more complicated than the current interface can say, and it requires users of the library to perform the revalidation while using a stale response. That's a new behavior that requires code changes from users.

@stevengssns
Copy link

stevengssns commented Jan 14, 2025

I quickly created a PR #49 that maybe illustrates better what I mean. Though I can imagine that there might be some subtle details that I am missing or misinterpreting.

The way I see it, as an implementor that is interested in adding support for stale-while-revalidate, I would explicitly set the option to also satisfy these additional cases. This would of course imply that, when satisfiesWithoutRevalidation returns true, that I would need to start a background job that actually does revalidation (if the response is stale of course).

I do understand that the naming ....WithoutRevalidation is somewhat akward. Though useStaleWhileRevalidate(req) does not feel all that clear when it comes to naming, or as an alternative for satisfiesWithoutRevalidation if you need the extra functionality.

@stevengssns
Copy link

stevengssns commented Jan 14, 2025

In my head the usage of PR #49 would look like this.

if (cached.satisfiesWithoutRevalidation(req, { enableStaleWhileRevalidate: true })) {
    if (cached.stale()) {
        // start revalidation in the background.
    }
    return recycle(cached); // maybe also return a Promise for the result of the background job.
}

But again, I know I might be missing something here. So feel free to point it out to me.

P.S. When looking at it know, then I guess I'm missing some cases where stale is true, but there was no `stale-while-revalidate' header. In which case you should not trigger the background revalidation job.... I'll have a fresh look at it tomorrow.

@kornelski
Copy link
Owner

kornelski commented Jan 15, 2025

As you see, satisfiesWithoutRevalidation needing revalidation is a misleading function name, and the cached entry can be stale and still usable without revalidation (max-stale can be in the request asking for a stale non-revalidated response).

So the instructions for the user of the library need to be clear, so that they don't have to understand HTTP semantics themselves.

Instead of bending the meaning of satisfiesWithoutRevalidation, just add a more neutral function evaluateRequest(req). You can make satisfiesWithoutRevalidation be implemented around evaluateRequest to avoid copypasting the logic.

evaluateRequest instead of returning a simple boolean, should return an Object with instructions what to do, since hit/s-w-r/miss is no longer a true/false answer.

In #43 (comment) I've suggested beforeRequest and afterResponse, but this may be unidiomatic in JS. Optional object fields like in #43 (comment) could be better?

The complete usage example could look like this:

// When serving requests from cache:
const { oldPolicy, oldResponse } = letsPretendThisIsSomeCache.get(
    newRequest.url
);

const { withoutRevalidation, revalidationRequest } = oldPolicy.evaluateRequest(newRequest);

if (withoutRevalidation) {
    // satisfies without revalidation
    oldResponse.headers = withoutRevalidation.responseHeaders;
    return oldResponse;
}

const doRequestAsync = async () => {
    // Change the request to ask the origin server if the cached response can be used
    newRequest.headers = revalidationRequest.requestHeaders;

    // Send request to the origin server. The server may respond with status 304
    const newResponse = await makeRequest(newRequest);

    // Create updated policy and combined response from the old and new data
    const { policy, modified, responseHeaders } = oldPolicy.revalidatedPolicy(
        newRequest,
        newResponse
    );
    const response = modified ? newResponse : oldResponse;

    // Update the cache with the newer/fresher response
    letsPretendThisIsSomeCache.set(
        newRequest.url,
        { policy, response },
        policy.timeToLive()
    );

    // And proceed returning cached response as usual
    response.headers = responseHeaders;
    return response;
};

if (revalidationRequest.staleWhileRevalidate) {
    doRequestAsync(); // no await        
    oldResponse.headers = revalidationRequest.staleWhileRevalidate.responseHeaders;
    return oldResponse;
} else {
    return doRequestAsync();
}

I've changed the API to return headers you need immediately instead of asking user to call policy.revalidationHeaders() or policy.responseHeaders(). I think this will make it easier to use the API, since you don't need to know that you have to call these methods, you can inspect the returned objects and see what's there for you.

Also making revalidationRequest.staleWhileRevalidate nested, instead of sharing response data with withoutRevalidation, it avoids having an API that could be implemented incorrectly and on s-w-r return stale response and not revalidate.

@stevengssns
Copy link

Thanks for the reply. I needed to make sure I understood the needs for the PR correctly. That they are mostly related to API design, and that I was not making any mistakes about the actual HTTP semantics.

I will try to make some time to see if I can forge something based on all the input.

Of course I would avoid duplication logic for the new API, but I still have one question about this. I assume you want the current satisfiesWithoutRevalidation to remain completely unaltered? In other words, it should not return true if swr allows it?
Not doing so could have some effect on existing caches that use this library. Because suddenly they might start to serve more stale content, depending on their context.

@stevengssns
Copy link

Hi again,
before diving into the coding attempt, I decided to go through the Mozilla docs and the RFC a bit more. And when doing so I have the feeling that the newly suggested API is still missing some subtle nuances that could be interesting for a cache developer.

I have compiled the following cases in my head:

  1. The cached response can be used without any revalidation needs.

An advanced cache implementation MAY still implement some pro-active prefetch strategy (e.g. Akamai has this feature), so it remains useful to also keep a separate method to obtain the revalidation request headers from the policy.

  1. The cached response can be used, but a revalidation COULD be done asynchronously (obviously synchronously would defeat the whole purpose of the cache).

This is the case where a max-stale request CC directive has been set, and a stale response can be served. A cache implementation could chose to still do a revalidation in the background to be ready for the next request that desires a fresh response.

  1. The cached response can be used, but a revalidation SHOULD be done asynchronously.

This is effectively the SWR case, and differs only in should vs could. I have not read every letter of the RFC yet, but I did not see any mention that doing the revalidation is a hard requirement (i.e. a MUST).

  1. The cached response can be used, but a revalidation SHOULD be done synchronously.

This case might not be very obvious, and it is not very well documented in the RFC or any of the less formal documentation sites. And I actually think it is a minor bug (or incompleteness) in the current implementation of satisfiesWithoutRevalidation.

When providing a max-age request CC directive, the server is still allowed to respond with a response that has an age that is larger, as long as it is still fresh. However, it is preferred to do the (synchronous) revalidation (i.e. SHOULD but not MUST).

Assuming my interpretation is correct, the current implementation does not allow me this choice. So maybe not really a bug, but rather an incompleteness.

However, I do think there might be an actual bug there. According to the specification, the response can only be used if it is still fresh (unless if there is also a max-stale request directive, but that is a separate case). And I don't think I see an extra freshness check in that place.

  1. The cached response can NOT be used.

Obviously a revalidation should be used, or a 504 Gateway Timeout should be returned by the cache if not successful.

I feel I know enough to get started on something, but initially it might look slightly different from your API suggestion in order to cater for the cases mentioned above. I actually would like to abstract away any notion of SWR in the API.

Feel free to take your time to reply, and to poke holes in my understanding of the HTTP semantics.

Kind regards.

@kornelski
Copy link
Owner

kornelski commented Jan 17, 2025

SHOULD in RFCs has a specific definition:

there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

For the purpose of this feature you can treat SHOULD like MUST. Speculative "but maybe someone wants to use it differently" is not a specific exception that can be fully understood, so it doesn't count.

@kornelski
Copy link
Owner

I think the approach I suggested above with doRequestAsync does give library user enough control to revalidate synchronously if they want to, or even return the stale response without revalidating.

@stevengssns
Copy link

I do understand the meaning of SHOULD, and I do understand that in this case it is very similar to MUST. However I feel there are still the cases where you MAY want to do revalidation. Which is definitely less demanding than SHOULD, so it feels like it would be helpful to communicate this in the response of the evaluateRequest API in a more explicit way.

And then there is the case where you should do synchronous revalidation, but you are still allowed to use the cached response (if you have a valid reason): I.e When the requestCC max-age cannot be satisfied, and revalidation fails, but the cached response is still fresh, then it can still be served. At least according to my reading of the specs, because it is a bit vague in this area.

But to be honest I think we are discussing details here. And what I have in my head right now is almost the same as what you suggest. So I'll start by trying it out and provide a usage example that covers the cases as I see them. Maybe I'll prove myself wrong in the process, and otherwise we can continue the discussion starting from some a PR.

I might open a separate PR if I feel I might have found a bug in the current implementation (I still have to double check the one that I mentioned above).

Regards.

@stevengssns
Copy link

I updated #49

  • The changes are essentially the same as those that you requested. The only differences are that I have put a stale property inside the response result object instead of using a different property name for the response itself. I added a revalidation property that holds the (origin) 'request' and a flag to indicate if it needs to happen synchronous or not.
  • I did not yet extend the unit tests yet because I first would like to get some feedback. Any preferences on how to add them would also be welcome (e.g. should I copy, modify and extend the tests for satisfiesWithoutRevalidation).
  • The unit tests that were already added show that satisfiesWithoutRevalidation should work just like before the changes. However, the timeToLive is affected, which could affect cache implementations. If they receive responses with swr headers they will probably store the response longer because of the higher TTL.

P.S.: There is no bug (at least not the one I mentioned I was going to double check).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants