Use envtest to parallelize tests by giving each test its own etcd & kube-apiserver to use #33
Ginkgo can run in parallel, but doing so spins up separate `go test` processes and an operator for *each* worker / CPU core. This creates competing operators that fight to process the same Stack CRs, causing concurrency issues that ultimately lead to nondeterministic update states. Spawning a single operator in Ginkgo to share amongst a set of tests would be ideal, but Ginkgo does not support running shared services in a global context for the entirety of the test suite. Ultimately, the operator will need to be configured with leader election to settle contention between multiple operator instances. Once available, this should allow Ginkgo to run in parallel again. See:

- #33
- operator-framework/operator-sdk#3585
- https://onsi.github.io/ginkgo/#parallel-specs
- https://docs.openshift.com/container-platform/4.5/operators/operator_sdk/osdk-leader-election.html
Leader election is actually enabled by default within clusters thanks to the operator-sdk. This issue is w.r.t. running concurrent controller binaries in the test suite when using test parallelism. When controllers run like this, connected to the same cluster but with the binaries not actually running in-cluster, leader election can't resolve the contention. The long-term solution is to not share a cluster amongst the controller binaries when using test parallelism, and instead configure each test with its own ephemeral envtest to simulate the API with a per-test, isolated etcd & kube-apiserver instance.
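For reference, here's a minimal sketch of what a per-test-binary envtest setup could look like using controller-runtime's `envtest` package. The package name, CRD directory path, and variable names are illustrative assumptions, not the project's actual layout:

```go
package controller_test

import (
	"os"
	"testing"

	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

var (
	testEnv   *envtest.Environment
	cfg       *rest.Config
	k8sClient client.Client
)

func TestMain(m *testing.M) {
	// Start an ephemeral etcd + kube-apiserver for this test binary only,
	// so parallel test processes never share a cluster.
	testEnv = &envtest.Environment{
		CRDDirectoryPaths: []string{"../deploy/crds"}, // path is an assumption
	}

	var err error
	cfg, err = testEnv.Start()
	if err != nil {
		panic(err)
	}

	// Build a client against the per-suite API server for test assertions.
	k8sClient, err = client.New(cfg, client.Options{})
	if err != nil {
		panic(err)
	}

	code := m.Run()

	// Tear down this test binary's isolated control plane.
	_ = testEnv.Stop()
	os.Exit(code)
}
```

Because every test binary gets its own API server and etcd, there is nothing for concurrent controller binaries to contend over, and leader election is unnecessary in the test suite.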
Hi, any ETA on when this will be fixed? We have currently deployed a single-replica Pulumi Operator in our dev env, which currently manages 10 stacks and consumes around 8-10 GB of memory and 1-2 CPUs. In production we might need to manage around 20-30 stacks, so it would be helpful if we could deploy a multi-replica Pulumi Operator, so that each instance runs on a separate node with sufficient resources instead of needing a single beefy node.
Good news everyone, we just released a preview of Pulumi Kubernetes Operator v2. This new release has a whole new architecture that uses pods as the execution environment. The scalability issue mentioned by @infa-nang has been addressed. Please read the announcement blog post for more information. Would love to hear your feedback! Feel free to engage with us on the #kubernetes channel of the Pulumi Slack workspace.
The operator-sdk supports `MaxConcurrentReconciles`, which is the number of workers a given operator can spawn to handle reconciliation loops. By default this is set to 1, but we set it to 10 (an arbitrary value greater than the default). While this handles concurrency within a single operator working against the API server, it does not handle concurrency across multiple instances/replicas of an operator. This means competing operators will fight to process the same Stack CRs, causing concurrency issues that ultimately lead to extraneous reconciliation loops and nondeterministic stack update sequences.
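As a rough sketch of where that worker count lives, this is how a controller-runtime controller could be wired up with `MaxConcurrentReconciles`; the function name and the `pulumiv1` import path are assumptions for illustration:

```go
package controllers

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	// Import path for the Stack API type is an assumption.
	pulumiv1 "github.com/pulumi/pulumi-kubernetes-operator/pkg/apis/pulumi/v1alpha1"
)

// setupStackController registers the Stack reconciler with the manager.
// MaxConcurrentReconciles only bounds worker goroutines inside a single
// operator process; it does not coordinate separate operator replicas.
func setupStackController(mgr ctrl.Manager, r reconcile.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&pulumiv1.Stack{}).
		WithOptions(controller.Options{MaxConcurrentReconciles: 10}).
		Complete(r)
}
```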
The operator will need to be configured with leader election to settle contention between multiple Operator instances by using an active-passive setup.
Related: operator-framework/operator-sdk#3585
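For context, enabling leader election in a controller-runtime manager is roughly what an active-passive setup would look like; this is a sketch assuming a recent controller-runtime, and the lock name and namespace are illustrative assumptions:

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// With leader election enabled, only the replica holding the lease
	// runs its controllers; the other replicas stay passive until the
	// leader goes away.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "pulumi-kubernetes-operator-lock", // lock name is an assumption
		LeaderElectionNamespace: "pulumi-system",                   // namespace is an assumption
	})
	if err != nil {
		panic(err)
	}

	// ... register controllers with mgr, then block until shutdown.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```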