Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Admission Controllers #48

Open
dtornow opened this issue Dec 15, 2021 · 8 comments
Open

Admission Controllers #48

dtornow opened this issue Dec 15, 2021 · 8 comments
Assignees

Comments

@dtornow
Copy link

dtornow commented Dec 15, 2021

Author: Dominik Tornow

Summary of the feature being proposed

1. Admission Controllers

Provide Temporal Admission Controllers, similar to Kubernetes Admission Controllers, that intercept requests to Temporal before they get processed

Admission Controllers come in two flavors:

  • Validating Admission Controller
  • Mutating Admission Controller

Validating Admission Controllers may not alter a request, Mutating Admission Controller may alter a request.

2. Admission Workflows & Activities

Enable developers to implement Admission Controllers as Temporal Workflows and Temporal Activities, therefore enabling the developer to extend Temporal with Temporal.

What value does this feature bring to Temporal?

Admission Controllers have proven to be a surprisingly simple yet surprisingly powerful extension mechanism for Kubernetes, they may prove to be a simple yet powerful extension mechanism for Temporal

  1. Example, Validating Admission Controller

The user may define a Validating Admission Controller that rejects a StartWorkflowExecutionRequest if the workflow_id does not match some regex

  1. Example, Mutating Admission Controller

The user may define a Mutating Admission Controller that implements a simple versioning strategy:

  • Initially, the Admission Controller modifies every StartWorkflowExecutionRequest of type Foo to Foo-v1
  • Later, the Admission Controller modifies every StartWorkflowExecutionRequest of type Foo to Foo-v2

Are you willing to implement this feature yourself?

The temporal team

@cretz
Copy link
Member

cretz commented Dec 15, 2021

Can you clarify whether you are encouraging webhooks? Anyone can of course put a gRPC proxy in front or https://pkg.go.dev/go.temporal.io/server/temporal#WithAuthorizer and start Temporal programmatically. See https://docs.temporal.io/docs/server/security/#authorizer-plugin-interface. Do we need yet another mechanism? Do we need to support mutating or maybe allow gRPC interceptors directly?

@dtornow
Copy link
Author

dtornow commented Dec 15, 2021

  • Yes, I am encouraging generic (web)hooks, open for discussion are hooks both on API calls as well Commands within RespondWorkflowTaskCompleted API calls.
  • Ideally hooks can be added to a vanilla Temporal deployment via configuration, again similar to K8s
  • Ideally hooks can be validating as well as mutating and including side effects like querying 3rd part APIs.

@yiminc
Copy link
Member

yiminc commented Dec 16, 2021

Custom gRPC interceptor is already supported: temporalio/temporal#2156 @cretz

@robzienert
Copy link

How would these admission controller workflows be registered? Would a worker startup and signal Temporal itself that it is available? Would these workflows be invoked for all start requests, or would the registration process include filtering criteria?

@dtornow
Copy link
Author

dtornow commented Dec 16, 2021

While there is a wide range of possibilities, I would advocate for: Admission Controller Workflows are registered with the Temporal Cluster, probably per (namespace, workflow type, workflow queue) and (namespace, activity type, activity queue) with wildcard support and Temporal starts the Admission Controller Workflow - which is itself not subject to Admission Controllers (duh, haha)

@cretz
Copy link
Member

cretz commented Dec 16, 2021

I am not sure workflows are a good primitive for intercepting every RPC request due to performance issues. I think webhooks are reasonable. I don't think the benefit of dogfooding here (very minimal surely) outweighs the cost.

@dtornow
Copy link
Author

dtornow commented Dec 16, 2021

Absolutely possible that workflows are not the right primitive here! Could be webhooks, could be WebAssembly hooks, could be all three.

@jlegrone
Copy link

jlegrone commented Dec 17, 2021

This sounds really interesting! I can provide some context on how my company might leverage this feature internally.

Re #48 (comment), we use the temporal.WithAuthorizer option in production today. Our authorizer implementation is both "validating" and "mutating" in that it:

  1. Extracts auth claims from the grpc headers (we probably don't want to pass grpc headers to admission controllers, so this would likely remain the same)
  2. Modifies the workflow or activity headers to include the auth claims
  3. Passes auth claims and request payloads to a rego policy engine (at this point the request may be rejected)

We've also considered performing other request modifications here, like rewriting task queues or aliasing workflow types.

I think there are a few things we've observed with this setup that this proposal might help improve upon:

  • Multitenancy/distributed ownership: developers in different business domains, security, etc can deploy and run their own admission webhooks and we don't need to support each use case in the same authorizer plugin.
  • Better observability: Temporal could provide more visibility into what changes were made by mutating admission webhooks. See also monitoring admission webhooks in the k8s docs.
  • Less Temporal server deployment churn: updating our authorizer plugin means redeploying Temporal server (or at least the frontend). In practice we haven't gone to the trouble of not compiling the plugin into the binary we run for history/matching/worker so those get rolled out too.
  • The authorizer plugin is low level: We've abstracted validation use cases a good bit via rego policies, but when mutating requests it's hard to know the rules of the road.
    • Ie. modifying a workflow ID is probably a bad idea, but what about a task queue? Or did you remember to make the same change in the case for both StartWorkflowExecutionRequest and SignalWithStartWorkflowExecutionRequest?

Other things we care about:

  • Minimal latency
    • We'd want to allow teams to layer a handful of admission webhooks together without worrying about the overhead. We don't currently have use cases for stateful/long-running validations, and if we did we could probably support those via interceptors at the worker level.
  • Filtering webhook evaluations by namespace, task queue, or workflow/activity/signal/query type
  • Guarantees around order of evaluation when multiple webhooks are registered
  • Timeouts and "fail open" option
    • Fail open would be helpful in evaluating new webhooks with lower risk in production. See also failure policy in k8s.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants