Replies: 3 comments 4 replies
-
I ask because if your success and failure expressions both evaluate to false, the result is indeterminate. The Promotion will idle in the queue for a while and the request will be retried after some time. If you were getting back a 403 or 503 or something, you'd keep retrying and you'd exhaust the timeout every time. Do you have any way of confirming that connectivity to the endpoint in question isn't being obstructed by a network policy or something?
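To illustrate the indeterminate case: below is a minimal sketch of an http promotion step, assuming the `successExpression`/`failureExpression` fields of Kargo's http step and a placeholder URL. It is not the poster's actual configuration.

```yaml
# Hypothetical sketch; URL and status codes are placeholders.
- uses: http
  config:
    method: GET
    url: https://example.com/healthz
    successExpression: response.status == 200
    failureExpression: response.status == 404
    # A 403 or 503 matches neither expression above, so the step's
    # outcome is indeterminate and the request keeps being retried
    # until the timeout is exhausted.
```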
-
The endpoint is reachable, but I believe the retry of the HTTP request from the Kargo http step is not working. Prior to that step I have a git push step, and the HTTP step is executed afterward. The changes pushed to GitHub take a couple of minutes to reconcile. I believe the issue is with the retry mechanism: either it is not working as expected, or there is no option to define a time interval to wait before retrying. The Kargo stage therefore shows the HTTP step as failed. However, when I re-promote the same version after the changes are deployed in Kubernetes, it succeeds.
-
@krancour It took some time to troubleshoot, but I found out that the issue was with the CDN. The version got changed, but due to the cache, the Kargo health check failed. Sorry for the inconvenience.
-
I have the code snippet below in the Kargo stage steps, which performs an HTTP GET to check whether the new version is deployed.
The URL endpoint health check output is:
{"status":"Healthy","version":"v0.1.0"}
When the promotion is done, it triggers the deployment in Kubernetes. After the new version is deployed, this step should pass. It typically takes at most 4-5 minutes for the changes to be reconciled in the Kubernetes cluster. However, with the above configuration, the step shows as failed even though the new version is already deployed and the endpoint is healthy.
How can I configure it to re-run the health check every 30 or 60 seconds to verify that the new version is deployed, and fail only if the timeout is reached? In my case, it does not take 10 minutes to deploy a new version.
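One possible shape for such a step, sketched under the assumption that Kargo's http step exposes the JSON response body as `response.body` in expressions and that promotion steps accept a step-level `retry` block with a `timeout`. The URL and version string are placeholders, not the poster's real values.

```yaml
# Hypothetical sketch, not a verified configuration.
- uses: http
  retry:
    timeout: 10m   # keep retrying the check for up to 10 minutes
  config:
    method: GET
    url: https://example.com/healthz
    successExpression: >-
      response.status == 200 &&
      response.body.status == "Healthy" &&
      response.body.version == "v0.1.0"
    # With no failureExpression, anything short of success is treated
    # as indeterminate and retried until the timeout above is reached.
```

The idea is to avoid a `failureExpression` that matches the "not yet deployed" state, so the step stays indeterminate (and keeps being retried) while reconciliation is in progress, rather than failing outright.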