This repository has been archived by the owner on May 18, 2020. It is now read-only.
jaeger-cassandra-schema-job: unset activeDeadlineSeconds (#125)
* jaeger-cassandra-schema-job: unset activeDeadlineSeconds

The current strategy is to abort the job after two minutes (across all corresponding pod invocations). In that case the job shows as 'failed' and a human is required to intervene (there is no concept of an automatic restart of a job in Kubernetes).

With this patch the job keeps trying to create the schema until one of the pods it starts succeeds in doing so. That is, with this change the job never goes into the permanent 'failed' state.

This change is expected to smooth deployment in environments where the time Cassandra takes to become available is less predictable. So far, in those environments the deadline can be hit, and then a human needs to re-schedule the job to address this *transient* problem. With this patch the system heals itself instead.

Notes:

- activeDeadlineSeconds is a mechanism for aborting a retry loop in case of a *permanent* error such as misconfiguration.
- Here, `activeDeadlineSeconds: 120` was introduced three years ago in the first major commit of this template. It stands to reason that it was simply copied and pasted without a deeper rationale. Also, since then the job execution semantics around failure handling have changed a bit: https://github.com/kubernetes/community/pull/583/files

Signed-off-by: Dr. Jan-Philip Gehrcke <[email protected]>
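
For illustration, a Job manifest with the deadline unset might look roughly like the sketch below. This is not the actual template touched by this commit: the metadata name, container name, image, and restartPolicy are assumptions chosen for the example, and only the commented-out `activeDeadlineSeconds: 120` line comes from the commit message.

```yaml
# Hypothetical sketch, not the real template; names, image, and restartPolicy are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: jaeger-cassandra-schema-job          # assumed name
spec:
  # activeDeadlineSeconds: 120               # previously set; removed so the job has no job-wide time limit
  template:
    spec:
      restartPolicy: OnFailure               # assumed; a Job's pod template allows OnFailure or Never
      containers:
        - name: jaeger-cassandra-schema      # assumed container name
          image: jaegertracing/jaeger-cassandra-schema   # assumed image
```

With the field absent, Kubernetes applies no overall deadline to the job, so a slow Cassandra start no longer causes the job to be terminated for exceeding a time limit.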