Use envtest to parallelize tests by giving each test its own etcd & kube-apiserver to use #33

Closed
metral opened this issue Jul 29, 2020 · 3 comments
Labels: kind/enhancement (Improvements or new features), resolution/fixed (This issue was fixed)

metral (Contributor) commented Jul 29, 2020

The operator-sdk supports MaxConcurrentReconciles, the number of workers a given operator can spawn to handle reconciliation loops. This defaults to 1, but we set it to 10 (an arbitrary value greater than the default).
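For context, a minimal sketch of how this knob is set via controller-runtime (which the operator-sdk builds on); the reconciler and CR type here are placeholders rather than this repo's exact wiring:

```go
package controllers

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// setupWithManager registers a reconciler with 10 concurrent workers
// instead of controller-runtime's default of 1. `object` stands in for
// the Stack CR type.
func setupWithManager(mgr ctrl.Manager, r reconcile.Reconciler, object client.Object) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(object).
		WithOptions(controller.Options{MaxConcurrentReconciles: 10}).
		Complete(r)
}
```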

While this handles concurrency within a single operator talking to the API server, it does nothing for concurrency across multiple instances/replicas of an operator. Competing operators will fight to process the same Stack CRs, causing concurrency issues that ultimately lead to extraneous reconciliation loops and nondeterministic stack update sequences.

The operator will need to be configured with leader election to settle contention between multiple operator instances, using an active-passive setup.
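A rough sketch of what that could look like with controller-runtime's manager options; the lock name and namespace are illustrative, not the operator's actual values:

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// With LeaderElection enabled, only the replica holding the lease
	// runs its controllers; the others stand by (active-passive).
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "pulumi-kubernetes-operator-lock", // illustrative lock name
		LeaderElectionNamespace: "default",                         // illustrative namespace
	})
	if err != nil {
		panic(err)
	}
	// ... register controllers here, then hand control to the manager:
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```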

Related: operator-framework/operator-sdk#3585

metral added a commit that referenced this issue Jul 30, 2020
Ginkgo can run in parallel, but doing so spins up separate `go test`
processes and an operator for *each* worker / CPU core.

This creates competing operators that fight to process the same Stack CRs,
causing concurrency issues that ultimately lead to nondeterministic update states.

Spawning a single operator in Ginkgo to share amongst a set of tests
would be ideal, but Ginkgo does not support running shared services in a
global context for the entirety of the test suite.

Ultimately, the operator will need to be configured with leader election
to settle contention between multiple operator instances. Once available,
this should allow Ginkgo to run in parallel again.

See:
 - #33
 - operator-framework/operator-sdk#3585
 - https://onsi.github.io/ginkgo/#parallel-specs
 - https://docs.openshift.com/container-platform/4.5/operators/operator_sdk/osdk-leader-election.html
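To make the failure mode concrete, here is a hedged sketch of the suite pattern in question; `startLocalOperator` is a hypothetical stand-in for the suite's real bootstrap. Under `ginkgo -p`, the `BeforeSuite` below runs once per worker process, so each worker starts its own operator against the shared cluster:

```go
package operator_test

import (
	"testing"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

func TestOperator(t *testing.T) {
	RegisterFailHandler(Fail)
	// With `ginkgo -p`, RunSpecs executes in a separate `go test`
	// process per CPU core, each of which runs this suite's BeforeSuite.
	RunSpecs(t, "Operator Suite")
}

var _ = BeforeSuite(func() {
	startLocalOperator() // duplicated once per parallel worker
})

func startLocalOperator() {
	// hypothetical: exec the operator binary against the shared test cluster
}
```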
metral added a commit that referenced this issue Jul 30, 2020
metral (Contributor, Author) commented Oct 5, 2020

Leader election is actually enabled by default when running in-cluster, thanks to the operator-sdk.

This issue concerns running concurrent controller binaries in the test suite when using `ginkgo -p`: the controllers run locally on the client, yet they must connect to a Kubernetes cluster to work with the API. In testing, we currently stand up an ephemeral GKE cluster and share it amongst tests.

When controllers run like this, connected to the same cluster but not actually running in-cluster, leader election can't resolve the contention.

The long-term solution is to stop sharing a cluster amongst the controller binaries when using test parallelism, and instead configure each test with its own ephemeral envtest environment that simulates the API with a per-test, isolated etcd & kube-apiserver instance.
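A minimal sketch of that direction using controller-runtime's envtest package; the CRD directory path is an assumption about the repo layout:

```go
package stack_test

import (
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

// startTestEnv boots an isolated etcd + kube-apiserver for one test
// (or one parallel worker), so controllers never contend over shared
// Stack CRs.
func startTestEnv() (*envtest.Environment, client.Client, error) {
	testEnv := &envtest.Environment{
		CRDDirectoryPaths: []string{"../../deploy/crds"}, // assumed CRD location
	}
	cfg, err := testEnv.Start() // spins up the local control plane
	if err != nil {
		return nil, nil, err
	}
	k8sClient, err := client.New(cfg, client.Options{})
	if err != nil {
		_ = testEnv.Stop()
		return nil, nil, err
	}
	return testEnv, k8sClient, nil
}
```

Each parallel worker would call this in its `BeforeSuite` and `testEnv.Stop()` in `AfterSuite`, so no two workers ever share an API server.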

metral changed the title from "Consider adding leader election to support multiple running instances of the operator" to "Use envtest to parallelize tests by giving each test its own etcd & kube-apiserver to use" Oct 5, 2020
infin8x added the kind/enhancement (Improvements or new features) label and removed the enhancement label Jul 10, 2021
infa-nang commented

Hi, any ETA on when this will be fixed?

We currently run a single-replica Pulumi Operator in our dev environment; it manages 10 stacks and consumes around 8-10 GB of memory and 1-2 CPUs.

In production we might need to manage around 20-30 stacks, so it would be helpful if we could deploy a multi-replica Pulumi Operator, so that each instance runs on a separate node with sufficient resources instead of needing a single beefy node.

EronWright added the resolution/fixed (This issue was fixed) label Oct 30, 2024
EronWright self-assigned this Oct 30, 2024
EronWright (Contributor) commented Oct 30, 2024

Good news everyone, we just released a preview of Pulumi Kubernetes Operator v2. This release has an all-new architecture that uses pods as the execution environment. The scalability issue mentioned by @infa-nang has been addressed, and the MaxConcurrentReconciles parameter was tuned for each controller.

Please read the announcement blog post for more information:
https://www.pulumi.com/blog/pulumi-kubernetes-operator-2-0/

Would love to hear your feedback! Feel free to engage with us on the #kubernetes channel of the Pulumi Slack workspace.
cc @infa-nang
