Feature request: implement client-defined deployment grace period #144

andy108369 · 2023-11-13T17:05:51Z

I think the Alternative proposal (client-defined) would be ideal: the timeout (how long the lease may stay down when it cannot redeploy because the worker node is down) could be configured by the clients themselves in their SDL (say deployment_grace_period or tolerate_downtime in deployment manifests).

This way the clients could specify the amount of time (in hours/days) they can tolerate their app being down.

It could also be useful when providers have been running a deployment with persistent storage and they aren't willing to lose it and can accept downtime measured even in days, just so they can get their data back. (I know they should back up the data and monitor the backups, but many don't, and backups can go stale, break, or get corrupted.)
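To make the idea concrete, here is a minimal sketch of how a client-defined grace period might be evaluated. It is only an illustration: the value format ("12h", "2d") and the helper names parse_grace_period and should_close_lease are assumptions of mine, not anything the provider or SDL currently implements.

```python
import re
from datetime import datetime, timedelta

# Hypothetical value format for the proposed setting: "<number><unit>",
# where unit is "h" (hours) or "d" (days). Not part of the real SDL.
_UNITS = {"h": "hours", "d": "days"}


def parse_grace_period(value: str) -> timedelta:
    """Turn a string such as "12h" or "2d" into a timedelta."""
    match = re.fullmatch(r"\s*(\d+)\s*([hd])\s*", value)
    if match is None:
        raise ValueError(f"unsupported grace period: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def should_close_lease(down_since: datetime, now: datetime, grace: timedelta) -> bool:
    """Close the lease only once the downtime exceeds the client's grace period."""
    return now - down_since > grace
```

With a value of "2d", a deployment that has been down for an hour would be kept, while one that has been down for three days would be closed.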


SGC41 · 2023-11-13T19:37:39Z

i think the default of such a setting should also be pretty high...
the only good reason to kill a deployment due to some downtime would be in some sort of HA setup where replacements will have taken over the work anyways.

all these fast lease closures give tenants grief.
especially in the case of persistent storage, where they could have been working on their deployment for weeks, making potentially massive changes, which might only exist in the persistent storage.

which then is destroyed after just 10 or 30 minutes of issues from the provider side.

vpavlin · 2024-04-29T14:41:06Z

I am not 100% sure this is the same case, but I'll put it here just in case it is:)

I had a deployment which takes ~20mins to start (syncing some on-chain data) and then I needed to update the image. I screwed up the image format, so the deployment failed. I then fixed the image format, but it seemed like the scheduler/node did not pick up the fix (probably due to backoff) before the lease got automatically closed.

For this case I could definitely see myself setting this grace period to an hour if that would mean I'd not have to start from scratch in case of a small mess-up:)

andy108369 · 2024-11-18T14:59:18Z

FWIW: The provider-defined deployment grace period has been adjustable since provider v0.6.4:
akash-network/provider@99cb9ac

I'll update the helm charts to support this in
akash-network/helm-charts#289

andy108369 added repo/provider Akash provider-services repo issues awaiting-triage labels Nov 13, 2023

troian removed the awaiting-triage label Dec 6, 2023
