Use envtest to parallelize tests by giving each test its own etcd & kube-apiserver to use #33

Closed
metral opened this issue Jul 29, 2020 · 3 comments
Labels: kind/enhancement (Improvements or new features), resolution/fixed (This issue was fixed)

metral (Contributor) commented Jul 29, 2020

The operator-sdk supports MaxConcurrentReconciles, the number of workers a given operator can spawn to handle reconciliation loops. This defaults to 1, but we set it to 10 (an arbitrary value greater than the default).
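For context, a minimal sketch of how this knob is set via controller-runtime (which the operator-sdk builds on); the reconciler and CR type here are placeholders rather than this repo's exact wiring:

```go
package controllers

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// setupWithManager registers a reconciler with 10 concurrent workers
// instead of controller-runtime's default of 1. `object` stands in for
// the Stack CR type.
func setupWithManager(mgr ctrl.Manager, r reconcile.Reconciler, object client.Object) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(object).
		WithOptions(controller.Options{MaxConcurrentReconciles: 10}).
		Complete(r)
}
```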

While this handles concurrency within a single operator talking to the API server, it does nothing for concurrency across multiple instances/replicas of an operator. Competing operators will fight to process the same Stack CRs, causing concurrency issues that ultimately lead to extraneous reconciliation loops and nondeterministic stack update sequences.

The operator will need to be configured with leader election to settle contention between multiple operator instances, using an active-passive setup.
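A rough sketch of what that could look like with controller-runtime's manager options; the lock name and namespace are illustrative, not the operator's actual values:

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// With LeaderElection enabled, only the replica holding the lease
	// runs its controllers; the others stand by (active-passive).
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "pulumi-kubernetes-operator-lock", // illustrative lock name
		LeaderElectionNamespace: "default",                         // illustrative namespace
	})
	if err != nil {
		panic(err)
	}
	// ... register controllers here, then hand control to the manager:
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```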

Related: operator-framework/operator-sdk#3585

metral added a commit that referenced this issue Jul 30, 2020
Ginkgo can run in parallel, but doing so spins up separate `go test`
processes and an operator for *each* worker / CPU core.

This creates competing operators that fight to process the same Stack CRs,
causing concurrency issues that ultimately lead to nondeterministic update states.

Spawning a single operator in Ginkgo to share amongst a set of tests
would be ideal, but Ginkgo does not support running shared services in a
global context for the entirety of the test suite.

Ultimately, the operator will need to be configured with leader election
to settle contention between multiple operator instances. Once available,
this should allow Ginkgo to run in parallel again.

See:
 - #33
 - operator-framework/operator-sdk#3585
 - https://onsi.github.io/ginkgo/#parallel-specs
 - https://docs.openshift.com/container-platform/4.5/operators/operator_sdk/osdk-leader-election.html
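To make the failure mode concrete, here is a hedged sketch of the suite pattern in question; `startLocalOperator` is a hypothetical stand-in for the suite's real bootstrap. Under `ginkgo -p`, the `BeforeSuite` below runs once per worker process, so each worker starts its own operator against the shared cluster:

```go
package operator_test

import (
	"testing"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

func TestOperator(t *testing.T) {
	RegisterFailHandler(Fail)
	// With `ginkgo -p`, RunSpecs executes in a separate `go test`
	// process per CPU core, each of which runs this suite's BeforeSuite.
	RunSpecs(t, "Operator Suite")
}

var _ = BeforeSuite(func() {
	startLocalOperator() // duplicated once per parallel worker
})

func startLocalOperator() {
	// hypothetical: exec the operator binary against the shared test cluster
}
```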
metral added a commit that referenced this issue Jul 30, 2020
metral (Contributor, Author) commented Oct 5, 2020

Leader election is actually enabled by default when running in-cluster, thanks to the operator-sdk.

This issue concerns running concurrent controller binaries in the test suite when using `ginkgo -p`: the controllers run locally on the client, yet they must connect to a Kubernetes cluster to work with the API. In testing, we currently stand up an ephemeral GKE cluster and share it amongst tests.

When controllers run like this, connected to the same cluster but not actually running in-cluster, leader election can't resolve the contention.

The long-term solution is to stop sharing a cluster amongst the controller binaries when using test parallelism, and instead configure each test with its own ephemeral envtest environment that simulates the API with a per-test, isolated etcd & kube-apiserver instance.
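A minimal sketch of that direction using controller-runtime's envtest package; the CRD directory path is an assumption about the repo layout:

```go
package stack_test

import (
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

// startTestEnv boots an isolated etcd + kube-apiserver for one test
// (or one parallel worker), so controllers never contend over shared
// Stack CRs.
func startTestEnv() (*envtest.Environment, client.Client, error) {
	testEnv := &envtest.Environment{
		CRDDirectoryPaths: []string{"../../deploy/crds"}, // assumed CRD location
	}
	cfg, err := testEnv.Start() // spins up the local control plane
	if err != nil {
		return nil, nil, err
	}
	k8sClient, err := client.New(cfg, client.Options{})
	if err != nil {
		_ = testEnv.Stop()
		return nil, nil, err
	}
	return testEnv, k8sClient, nil
}
```

Each parallel worker would call this in its `BeforeSuite` and `testEnv.Stop()` in `AfterSuite`, so no two workers ever share an API server.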

metral changed the title from "Consider adding leader election to support multiple running instances of the operator" to "Use envtest to parallelize tests by giving each test its own etcd & kube-apiserver to use" Oct 5, 2020
infin8x added the kind/enhancement (Improvements or new features) label and removed the enhancement label Jul 10, 2021
infa-nang commented

Hi, any ETA on when this will be fixed?

We currently run a single-replica Pulumi Operator in our dev environment; it manages 10 stacks and consumes around 8-10 GB of memory and 1-2 CPUs.

In production we might need to manage around 20-30 stacks, so it would be helpful if we could deploy a multi-replica Pulumi Operator, so that each instance runs on a separate node with sufficient resources instead of needing a single beefy node.

EronWright added the resolution/fixed (This issue was fixed) label Oct 30, 2024
EronWright self-assigned this Oct 30, 2024
EronWright (Contributor) commented Oct 30, 2024

Good news everyone, we just released a preview of Pulumi Kubernetes Operator v2. This release has an all-new architecture that uses pods as the execution environment. The scalability issue mentioned by @infa-nang has been addressed, and the MaxConcurrentReconciles parameter was tuned for each controller.

Please read the announcement blog post for more information:
https://www.pulumi.com/blog/pulumi-kubernetes-operator-2-0/

Would love to hear your feedback! Feel free to engage with us on the #kubernetes channel of the Pulumi Slack workspace.
cc @infa-nang
