Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting Istio RBAC issues when submitting a pipeline from a Notebook #2747

Closed
7 tasks done
Tracked by #2763
kimwnasptd opened this issue Jun 5, 2024 · 16 comments
Closed
7 tasks done
Tracked by #2763

Comments

@kimwnasptd
Copy link
Member

Validation Checklist

  • Is this a Kubeflow issue?
  • Are you posting in the right repository ?
  • Did you follow the installation guide https://github.com/kubeflow/manifests?tab=readme-ov-file ?
  • Is the issue report properly structured and detailed with version numbers?
  • Is this for Kubeflow development ?
  • Would you like to work on this issue?
  • Join our slack channel using wg-manifests.

Version

1.9

Describe your issue

I tried to create a pipeline from the KFP SDK, from a Notebook, but the python code that creates the experiment/run is failing with the following errors

From the notebook cell

ERROR:root:Failed to get healthz info attempt 1 of 5.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/kfp/client/client.py", line 424, in get_kfp_healthz
    return self._healthz_api.get_healthz()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
kfp_server_api.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'content-length': '19', 'content-type': 'text/plain', 'date': 'Wed, 05 Jun 2024 17:28:20 GMT', 'server': 'envoy', 'x-envoy-upstream-service-time': '11'})
HTTP response body: RBAC: access denied

And from the logs of the KFP API Server istio sidecar

[2024-06-05T17:28:20.788Z] "GET /apis/v2beta1/healthz HTTP/1.1" 403 - rbac_access_denied_matched_policy[none] - "-" 0 19 0 - "-" "OpenAPI-Generator/2.0.3/python" "9a5
0db59-9e33-420d-bb16-efb29c5e504f" "ml-pipeline.kubeflow.svc.cluster.local:8888" "-" inbound|8888|| - 10.244.1.33:8888 10.244.2.23:47694 outbound_.8888_._.ml-pipeline.kubeflow.svc.cluster.local -

Steps to reproduce the issue

  1. Deploy the latest v1.9-branch
  2. Use a KinD cluster
  3. Create the PodDefault for KFP from https://www.kubeflow.org/docs/components/pipelines/v1/sdk/connect-api/#full-kubeflow-subfrom-inside-clustersub
  4. Create an ipynb with Data Passing pipeline https://github.com/kubeflow/pipelines/blob/master/samples/tutorials/Data%20passing%20in%20python%20components/Data%20passing%20in%20python%20components%20-%20Files.py
  5. Submit the pipeline

Put here any screenshots or videos (optional)

No response

@kimwnasptd
Copy link
Member Author

kimwnasptd commented Jun 5, 2024

@juliusvonkohout @kromanow94 after playing a bit around and catching up with the oauth2-proxy with some quick testing the root cause must have been the RequestAuthentication for the K8s issuer.

Specifically, this RequestAuthentication will create a header kubeflow-userid from the ServiceAccount token that the KFP SDK adds as an Authentication: Bearer <token> header to the request.

But by adding this header it means that the request will not match the current AuthorizationPolicy, that blocks requests that do NOT come from the IngressGateway to set the kubeflow-userid header

@kimwnasptd
Copy link
Member Author

kimwnasptd commented Jun 5, 2024

In this case by removing the above lines, from the RequestAuthentication, that set this header for K8s issuer tokens removes the issue.

This comes back a bit to the discussion about using headers vs tokens everywhere. But to move us a bit forward I suggest that for the K8s issuer RequestAuthentication objects we just don't set these headers for now.

@thesuperzapper
Copy link
Member

thesuperzapper commented Jun 5, 2024

I haven't been following the oauth2-proxy stuff, but this is important because this issue will not just affect Notebook Pods.

Users have always assumed that Pods that directly access the ml-pipeline-ui.kubeflow.svc.cluster.local service can pass their Authorization header (with a Pod ServiceAccount Token) and KFP will give them the associated RBAC of that ServiceAccount.

That is, there will be no JWT (from dex/oauth2-proxy) for these requests, because they authenticate themselves with their Pod service account token.

See the "Full Kubeflow (from inside cluster)" part of this page for examples which need to work:

@kimwnasptd
Copy link
Member Author

I've put quite a bit of more thought after seeing the above. I'm preparing a small spec to go over the details. We'll need to indeed be careful on both the expectations around JWTs and also what changes to do, by having a uniform plan

@kimwnasptd
Copy link
Member Author

kimwnasptd commented Jun 6, 2024

So what I propose for unblocking this is to

  1. update the AuthorizationPolicy in KFP
    - when:
    - key: request.headers[kubeflow-userid]
    notValues: ['*']
  2. Change it to always needing a JWT for talking to KFP
    - from:
      - source:
          requestPrincipals: ["*"]

And for the above to ensure

  1. Istio is configured with RequestAuthentication to recognise id_tokens (ServiceAccount tokens) where the issuer is the K8s cluster
  2. The above RequestAuthentication must set the kubeflow-userid header from the JWT

If the header is not set, via RequestAuthentication, then this means anyone can forge a request with a K8s ServiceAccount token but also set this header. Then because the backend evaluates first the header, they can impersonate any user.

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Jun 6, 2024

@thesuperzapper yes, it is a critical feature that we have to fix for 1.9
@kimwnasptd I can only review this next week. Hopefully by then @kromanow94 is back from vacation as well.

I really like the JWT specification proposal and we should also create an additional test for that usecase to our github actions.

@kromanow94
Copy link
Contributor

Hey All,

I'm finally back from vacation and can chime in.

@kimwnasptd, to comment on the #2747 (comment):

(...) the root cause must have been the RequestAuthentication for the K8s issuer.

I wouldn't call this the root cause. By removing the configuration from RequestAuthentication that puts the sub claim in kubeflow-userid, we're only mitigating the issue in KFP backend but the other components that require kubeflow-userid header will stop working as they will not have the header available for auth decisions.

But, I'd call the root cause what you've described about the AuthorizationPolicy/ml-pipeline which is blocking traffic with the kubeflow-userid header configured.

We also should take into account if we want to force auth check with oauth2-proxy directly for ml-pipeline or not. The difference is that:

  • if we enable that through AuthorizationPolicy with custom action, oauth2-proxy as the provider and app=ml-pipeline as the selector, oauth2-proxy will stand as guard of the KFP API and will only allow traffic that is accepted by auth middleware.

    • with current setup, it also means that the oauth2-proxy has to be configured with extra jwt issuer configured
      with sub=aud pair like so:

      https://kubernetes.default.svc.cluster.local=pipelines.kubeflow.org
      
  • if we don't configure AuthorizationPolicy with custom action and oauth2-proxy as the provider for ml-pipeline, we omit oauth2-proxy and rely directly on istio and RequestAuthentication. I guess this is not very bad, but from my perspective oauth2-proxy (or other configured auth middleware) should be the tool that makes the decision if a request should be accepted to the service mesh or not. This opens some flexibility in the future where we can loose some coupling with Istio and allow other service meshes to work with Kubeflow and maybe improve the boundaries of each tool/depdenency.

I'm a fan of the first option because it is more secure, streamlined and promotes decoupling.

I present you this diff that you can apply locally on your v1.9-branch branch to make it work with the preffered scenario:

diff --git a/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml b/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml
index a9a45e5e..cbef023f 100644
--- a/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml
+++ b/apps/pipeline/upstream/base/installs/multi-user/istio-authorization-config.yaml
@@ -32,10 +32,24 @@ spec:
         - cluster.local/ns/kubeflow/sa/ml-pipeline-scheduledworkflow
         - cluster.local/ns/kubeflow/sa/ml-pipeline-viewer-crd-service-account
         - cluster.local/ns/kubeflow/sa/kubeflow-pipelines-cache
-  # For user workloads, which cannot user http headers for authentication
-  - when:
-    - key: request.headers[kubeflow-userid]
-      notValues: ['*']
+  - from:
+    - source:
+        requestPrincipals: ["*"]
+---
+apiVersion: security.istio.io/v1
+kind: AuthorizationPolicy
+metadata:
+  name: ml-pipeline-oauth2-proxy
+  namespace: kubeflow
+spec:
+  action: CUSTOM
+  provider:
+    name: oauth2-proxy
+  selector:
+    matchLabels:
+      app: ml-pipeline
+  rules:
+  - {}
 ---
 apiVersion: security.istio.io/v1beta1
 kind: AuthorizationPolicy
diff --git a/common/oidc-client/oauth2-proxy/overlays/m2m-self-signed/kustomization.yaml b/common/oidc-client/oauth2-proxy/overlays/m2m-self-signed/kustomization.yaml
index fd56fa31..8e88dff7 100644
--- a/common/oidc-client/oauth2-proxy/overlays/m2m-self-signed/kustomization.yaml
+++ b/common/oidc-client/oauth2-proxy/overlays/m2m-self-signed/kustomization.yaml
@@ -16,4 +16,4 @@ configMapGenerator:
   literals:
   - ALLOW_SELF_SIGNED_ISSUER=true
   - ENABLE_M2M_TOKENS=true
-  - EXTRA_JWT_ISSUERS=https://kubernetes.default.svc.cluster.local=https://kubernetes.default.svc.cluster.local
+  - EXTRA_JWT_ISSUERS=https://kubernetes.default.svc.cluster.local=https://kubernetes.default.svc.cluster.local,https://kubernetes.default.svc.cluster.local=pipelines.kubeflow.org

Now, assuming you also have the PodDefaults configured for accessing ml-pipeline, running such script from a KF Notebook successfully creates a KF Pipeline Run:

#!/usr/bin/env python3

from kfp import dsl
import kfp


client = kfp.Client()
experiment_name = "my-experiment"
experiment_namespace = "kubeflow-user-example-com"


@dsl.component
def add(a: float, b: float) -> float:
    """Calculates sum of two arguments"""
    return a + b


@dsl.pipeline(
    name="Addition pipeline",
    description="An example pipeline that performs addition calculations.",
)
def add_pipeline(
    a: float = 1.0,
    b: float = 7.0,
):
    first_add_task = add(a=a, b=4.0)
    second_add_task = add(a=first_add_task.output, b=b)


try:
    print("getting experiment...")
    experiment = client.get_experiment(
        experiment_name=experiment_name, namespace=experiment_namespace
    )
    print("got experiment!")
except Exception:
    print("creating experiment...")
    experiment = client.create_experiment(
        name=experiment_name, namespace=experiment_namespace
    )
    print("created experiment!")

client.create_run_from_pipeline_func(
    add_pipeline,
    arguments={"a": 7.0, "b": 8.0},
    experiment_id=experiment.experiment_id,
    enable_caching=False,
)

@kimwnasptd, I saw your proposal about unifying JWT handling. I like that you've already proposed usage of the rules.from.source.requestPrincipals. I think this is the right approach from istio perspective as it would force usage of RequestAuthentication, which will also enable usage of group and sub claims in all the AuthorizationPolicies.

I have a few thoughts and I'll put them there in incoming days, but in general, I'd suggest:

  • dropping the istio-ingressgateway-service-account principal and just ensure that any principal is available through rules.from.source.requestPrincipals,
  • dropping the pipelines.kubeflow.org audience and rely on the standard kubernetes audience so we don't have to use PodDefaults to enable secure api access from KF Notebook,
  • specifying AuthorizationPolicy objects with custom action and oauth2-proxy as the provider for every component that's exposed to users (api-server, jupyter-web-app, central-dashboard, etc...), not just for the istio-ingressgateway, to bring security closer to the components,
  • not dropping usage of JWTs in the components. If we drop JWT verification in the components, we're tightening coupling with Istio and if Istio is misconfigured, it poses a serious security risk where kubeflow-userid and kubeflow-groups headers can be injected. Relying on the JWTs directly would also bring maturity and more possibilities/flexibility in the future.
    • But, I think the implementation of JWT verification in the components should be improved. AFAIR, currently the JWTs are verified only against K8s API with TokenReview. If we were to rely on a more general approach and the kf components can verify the JWTs by themself, this would be much more secure. We can use a combination of both (so TokenReview if the token was issued by K8s or github.com/golang-jwt/jwt/v5 otherwise; both should end up with SubjectAccessReview).
      • But, I admit that your proposal with dropping JWT verification in the components and relying directly on Istio would work. I'm just not sure if this would be considered a good decision from project maturity perspective.

This way we have uniformed way of authz, both from out of cluster and in-cluster perspective, and we don't drop the security behind istio-ingressgateway. RBAC can still be defined with RoleBindings but the source of truth for auth should always be a JWT and never a header.


So, from my perspective, I think it would be best to define a goal where at end we use only JWTs from Authorization headers and drop the kubeflow-userid and kubeflow-groups headers completely. This will not be possible for 1.9 because of how close we are to the release but maybe for 1.10. I'd be definitely available to help with this initiative, both for implementation of manifests, golang and python code.

Reference on the implementation of direct JWT verification: https://medium.com/@cheickzida/golang-implementing-jwt-token-authentication-bba9bfd84d60

@kromanow94
Copy link
Contributor

Hey, I continued and updated some thoughts in the proposal: Uniform way for handling JWTs: #2748 (comment)

In summary, I had some thoughts and decided to drop the emphasis on not removing and tightening the implementation of JWT in Kubeflow components. While this is still something I'd love to see, I understand that have to balance between the functionality and development and offload whatever possible to external components, so just relying on Istio to verify and authorize requests based on the JWTs is fine.

With the above in mind, I agree that it should be enough to change the AuthorizationPolicy/ml-pipeline rules to only allow user traffic if any trusted principal is available. My proposed changes to add pipelines.kubeflow.org audience to oauth2-proxy and AuthorizationPolicy/ml-pipeline-oauth2-proxy is not something we should add as part of the resolution for this issue.

@kromanow94
Copy link
Contributor

kromanow94 commented Jun 18, 2024

@thesuperzapper

That is, there will be no JWT (from dex/oauth2-proxy) for these requests, because they authenticate themselves with their Pod service account token.

The ServiceAccount Token is a K8s Issued JWT and dex also provides JWT after login. Also, oauth2-proxy is configured to accept JWTs from dex and kubernetes.svc.cluster.local.

Please see:

From the surrounding architecture perspective, we're fully capable of using JWTs directly in KF Components. But,
we would have to switch from TokenReview API to some generic JWT Validation code/routine to fully use it.

@kromanow94
Copy link
Contributor

I put my comments in the proposal: Uniform way for handling JWTs. I like where this is going but we also have to cover for current release.

I've made a PR where I change the AuthorizationPolicy/ml-pipeline to allow access from any trusted principal and will add test with gh-workflow to create KF Notebook and start KF Pipeline Run from that Notebook. This is still in progress.

authorize requests to ml-pipeline endpoint if contains any trusted principals (#2753)

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Jun 25, 2024

@kromanow94 we also need a negative test to check that it does not work without the proper token, but just the faked userid header.
Lets do that in a follow up PR.

since we have to get RC.2 ready and Kimonas approved it in #2747 (comment). I merged your PR #2753, but we have to do a full assessment next week with more security tests.

And we have to bring some of those changes to Kubeflow/pipelines or it will break with the next synchronization again.

@kromanow94
Copy link
Contributor

@juliusvonkohout didn't you mean @kromanow94 ? 😅

Yes, negative test also makes sense. I can take care of this but only next week.

If you'd like my help on bringing those changes to kubeflow/pipelines, I can take care of that but maybe you can describe what should be a part of these changes?

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Jun 26, 2024

@juliusvonkohout didn't you mean @kromanow94 ? 😅

Yes, negative test also makes sense. I can take care of this but only next week.

If you'd like my help on bringing those changes to kubeflow/pipelines, I can take care of that but maybe you can describe what should be a part of these changes?

@juliusvonkohout didn't you mean @kromanow94 ? 😅

Yes, negative test also makes sense. I can take care of this but only next week.

If you'd like my help on bringing those changes to kubeflow/pipelines, I can take care of that but maybe you can describe what should be a part of these changes?

Yes the right Romanov :-D Especially the change of the KFP authorizationpolicy to requestprincipals in their multi-user manifests is important.

@juliusvonkohout
Copy link
Member

@nagar-ajay it works in https://github.com/kubeflow/manifests/actions/runs/9741004851/job/26879289585?pr=2781 and is in 1.9rc.2, but we will leave this issue open until it is fixed upstream.

@rimolive rimolive moved this from In Progress to Done in 1.9 Release Jul 17, 2024
@juliusvonkohout
Copy link
Member

It is fixed in manifests and upstreaming is tracked in #2804

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

No branches or pull requests

4 participants