🌱 Add SSA cache metrics #11635

Merged (6 commits) on Jan 21, 2025

cahillsf
Member

@cahillsf cahillsf commented Jan 2, 2025

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #10527

/area util

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress, area/util, and cncf-cla: yes labels Jan 2, 2025
@k8s-ci-robot k8s-ci-robot added the size/L label Jan 2, 2025
internal/util/ssa/cache.go Outdated
type newCacheOption func(*newCacheConfig)

// WithOwner allows definition of the owner field to be used in NewCache.
func WithOwner(owner string) newCacheOption {
Member Author

@cahillsf cahillsf Jan 2, 2025

It actually looks like we already get a provider label when pulling this metric, which may make the cache_owner label redundant?

I added an example call in local testing:
r.ssaCache = ssa.NewCache(ssa.WithOwner("kubeadmcontrolplane"))

and got metrics like this:

  {
    "metric": {
      "__name__": "capi_ssa_cache_request_total",
      "cache_owner": "kubeadmcontrolplane",
      "instance": "10.244.0.17:8443",
      "job": "capi-providers",
      "kind": "ssa_cache",
      "namespace": "capi-kubeadm-control-plane-system",
      "pod": "capi-kubeadm-control-plane-controller-manager-85d788c785-gw247",
      "provider": "control-plane-kubeadm",
      "status": "miss"
    },
    "values": [
      [
        1735767358.995,
        "358"
      ],
...

Member

@sbueringer sbueringer Jan 20, 2025

IIRC the provider label is added by Prometheus when scraping the metrics. What I'm looking for is basically a "controller" (i.e. reconciler) label, not one for the entire binary (I'll add an additional comment).
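
For readers following this thread, here is a minimal, standard-library-only sketch of the functional-option pattern under discussion. The names `newCacheOption`, `newCacheConfig`, `WithOwner`, and `NewCache` come from the diff and the example call above; everything else (the `Cache` type, its `Add`, `Has`, `Count`, and `Owner` methods, the `"unknown"` default, and the in-memory counter map standing in for a real Prometheus counter) is a hypothetical illustration and not the code merged in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// newCacheConfig collects the settings applied by options (name from the diff).
type newCacheConfig struct {
	owner string
}

// newCacheOption mutates a newCacheConfig (same shape as in the diff).
type newCacheOption func(*newCacheConfig)

// WithOwner sets the owner value attached to every recorded request.
func WithOwner(owner string) newCacheOption {
	return func(c *newCacheConfig) { c.owner = owner }
}

// Cache is a hypothetical stand-in for the SSA cache: a key set plus
// per-status request counters (a real implementation would likely
// export these through a metrics library instead of a map).
type Cache struct {
	mu     sync.Mutex
	owner  string
	keys   map[string]struct{}
	counts map[string]int // keyed by status: "hit" or "miss"
}

// NewCache applies the options in order and returns a ready-to-use cache.
func NewCache(opts ...newCacheOption) *Cache {
	cfg := &newCacheConfig{owner: "unknown"} // assumed default, not from the PR
	for _, opt := range opts {
		opt(cfg)
	}
	return &Cache{
		owner:  cfg.owner,
		keys:   map[string]struct{}{},
		counts: map[string]int{},
	}
}

// Add stores a key in the cache.
func (c *Cache) Add(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.keys[key] = struct{}{}
}

// Has reports whether key is cached and records the lookup as a hit or miss.
func (c *Cache) Has(key string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	_, ok := c.keys[key]
	if ok {
		c.counts["hit"]++
	} else {
		c.counts["miss"]++
	}
	return ok
}

// Count returns how many lookups ended with the given status.
func (c *Cache) Count(status string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[status]
}

// Owner returns the configured owner value.
func (c *Cache) Owner() string { return c.owner }

func main() {
	c := NewCache(WithOwner("kubeadmcontrolplane"))
	c.Has("obj-1") // first lookup: miss
	c.Add("obj-1")
	c.Has("obj-1") // second lookup: hit
	fmt.Printf("owner=%s hit=%d miss=%d\n", c.Owner(), c.Count("hit"), c.Count("miss"))
	// prints "owner=kubeadmcontrolplane hit=1 miss=1"
}
```

Because the owner is set per `NewCache` call, each reconciler that creates its own cache can carry its own value, which is the per-controller granularity asked for above, as opposed to a label that Prometheus attaches to the whole binary at scrape time.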

@cahillsf cahillsf force-pushed the add-ssa-cache-metrics branch from 540ea95 to 87ba1bc January 2, 2025 23:52
@cahillsf cahillsf force-pushed the add-ssa-cache-metrics branch from 87ba1bc to 33017f9 January 2, 2025 23:56
@sbueringer sbueringer changed the title from "🌱 Add ssh cache metrics" to "🌱 Add SSA cache metrics" Jan 20, 2025
internal/util/ssa/cache.go Outdated
internal/util/ssa/cache.go Outdated
internal/util/ssa/metrics.go Outdated

@sbueringer
Member

Might be worth removing the draft status and adding [WIP] to the PR title to keep the PR in WIP. Then you get CI jobs running.

@cahillsf cahillsf changed the title from "🌱 Add SSA cache metrics" to "[WIP] 🌱 Add SSA cache metrics" Jan 20, 2025
@cahillsf cahillsf marked this pull request as ready for review January 20, 2025 18:37
@sbueringer sbueringer added the tide/merge-method-squash label Jan 21, 2025
Member

@sbueringer sbueringer left a comment

Thx, looks pretty good.

I think you can remove WIP

controlplane/kubeadm/internal/controllers/controller.go Outdated
internal/util/ssa/metrics.go Outdated
internal/util/ssa/metrics.go Outdated
internal/util/ssa/metrics.go Outdated
@cahillsf cahillsf changed the title from "[WIP] 🌱 Add SSA cache metrics" to "🌱 Add SSA cache metrics" Jan 21, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label Jan 21, 2025
@sbueringer
Member

Thank you!

/lgtm

Flake is independent (already opened an issue for that one)
/retest

/test pull-cluster-api-e2e-main

/assign @fabriziopandini @chrischdi

@k8s-ci-robot k8s-ci-robot added the lgtm label Jan 21, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: e64d38d2446d7ea30efad7b4895e1067bf868163

@chrischdi
Member

/approve

Thanks!

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrischdi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Jan 21, 2025
@chrischdi
Member

/retest (independent flake)

@k8s-ci-robot
Contributor

@chrischdi: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

/test pull-cluster-api-build-main
/test pull-cluster-api-e2e-blocking-main
/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main
/test pull-cluster-api-e2e-latestk8s-main
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-mink8s-main
/test pull-cluster-api-e2e-upgrade-1-32-1-33-main
/test pull-cluster-api-test-main
/test pull-cluster-api-test-mink8s-main
/test pull-cluster-api-verify-main

The following commands are available to trigger optional jobs:

/test pull-cluster-api-apidiff-main

Use /test all to run the following jobs that were automatically triggered:

pull-cluster-api-apidiff-main
pull-cluster-api-build-main
pull-cluster-api-e2e-blocking-main
pull-cluster-api-test-main
pull-cluster-api-verify-main

In response to this:

/retest (independent flake)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@chrischdi
Member

/retest

@fabriziopandini
Member

Nice work!
I'm curious to see values for the new metrics when we run the next stress test
/lgtm

@fabriziopandini
Member

/hold
It seems there is a race, but I'm not sure it is related to the current changes: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api/11635/pull-cluster-api-test-main/1881753541584883712

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold label Jan 21, 2025
@sbueringer
Member

That one is independent of this PR. Occurs pretty frequently since we merged this test case via another PR: #11722

/retest
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold label Jan 21, 2025
@k8s-ci-robot k8s-ci-robot merged commit 2096276 into kubernetes-sigs:main Jan 21, 2025
20 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.10 milestone Jan 21, 2025
Labels
approved - Indicates a PR has been approved by an approver from all required OWNERS files.
area/util - Issues or PRs related to utils
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
lgtm - "Looks good to me", indicates that a PR is ready to be merged.
size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.
tide/merge-method-squash - Denotes a PR that should be squashed by tide when it merges.
Development

Successfully merging this pull request may close these issues.

Metrics for SSA cache
5 participants