Promote dev capi changes to staging #262

Open · wants to merge 3 commits into main
13 changes: 13 additions & 0 deletions charts/dev/capi-infra/values.yaml
@@ -354,6 +354,19 @@ openstack-cluster:
autoscale: false
machineFlavor: l3.micro

+  rolloutStrategy:
+    type: RollingUpdate
+    rollingUpdate:
+      # The maximum number of node group machines that can be unavailable during the update
+      # Can be an absolute number or a percentage of the desired count
+      maxUnavailable: 0
+      # The maximum number of machines that can be scheduled above the desired count for
+      # the group during an update
+      # Can be an absolute number or a percentage of the desired count
+      maxSurge: 1
+      # One of Random, Newest, Oldest
+      deletePolicy: Random
+
healthCheck:
enabled: true
spec:
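
As a sketch of what these values presumably drive (field names are taken from the upstream Cluster API MachineDeployment type; that the chart passes them through verbatim is an assumption, as is the metadata here):

apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: example-md-0  # hypothetical name, for illustration only
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0     # the rollout never drops below the desired machine count
      maxSurge: 1           # at most one extra machine exists during the rollout
      deletePolicy: Random  # which old machine the controller removes first
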
2 changes: 1 addition & 1 deletion charts/staging/capi-infra/Chart.yaml
@@ -4,4 +4,4 @@ version: 1.3.0
dependencies:
- repository: https://azimuth-cloud.github.io/capi-helm-charts
name: openstack-cluster
-  version: 0.11.2
+  version: 0.12.2
17 changes: 15 additions & 2 deletions charts/staging/capi-infra/values.yaml
@@ -1,6 +1,6 @@
openstack-cluster:
kubernetesVersion: "1.30.6"
machineImage: "capi-ubuntu-2204-kube-v1.30.6-2024-11-15"
kubernetesVersion: "1.31.4"
machineImage: "capi-ubuntu-2204-kube-v1.31.4-2025-01-07"

# The PEM-encoded CA certificate for openstack.stfc.ac.uk
# this expires 2023-12-05T23:59:59Z (UTC)
@@ -354,6 +354,19 @@ openstack-cluster:
autoscale: false
machineFlavor: l3.micro

+  rolloutStrategy:
+    type: RollingUpdate
+    rollingUpdate:
+      # The maximum number of node group machines that can be unavailable during the update
+      # Can be an absolute number or a percentage of the desired count
+      maxUnavailable: 0
+      # The maximum number of machines that can be scheduled above the desired count for
+      # the group during an update
+      # Can be an absolute number or a percentage of the desired count
+      maxSurge: 1
+      # One of Random, Newest, Oldest
+      deletePolicy: Random
+
healthCheck:
enabled: true
spec:
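
Taken together with the version bump above, these rollout settings mean the 1.30.6 → 1.31.4 upgrade replaces machines one at a time: with maxSurge: 1 and maxUnavailable: 0, Cluster API brings up one machine on the new image, waits for it to become Ready, then drains and removes one old machine, so capacity never drops below the desired count. (This is standard RollingUpdate behaviour; the exact sequencing is inferred from the Cluster API docs, not verified against this chart.)
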
5 changes: 5 additions & 0 deletions charts/staging/longhorn/Chart.yaml
@@ -1,3 +1,8 @@
apiVersion: v2
name: longhorn
version: 1.0.0
+dependencies:
+  # https://github.com/longhorn/charts/releases
+  - name: longhorn
+    version: 1.7.1
+    repository: https://charts.longhorn.io
5 changes: 0 additions & 5 deletions charts/staging/longhorn/requirements.yaml

This file was deleted.

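These two changes are the standard Helm v2-to-v3 chart migration: with apiVersion: v2 (already declared above), dependencies are declared in Chart.yaml and requirements.yaml goes away. After a change like this one would typically run helm dependency update charts/staging/longhorn to refresh the lock file against https://charts.longhorn.io (a standard Helm command, not something shown in this PR).
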
9 changes: 5 additions & 4 deletions charts/staging/longhorn/values.yaml
@@ -21,14 +21,15 @@ longhorn:
defaultSettings:
taintToleration: "nvidia.com/gpu:NoSchedule"
snapshotMaxCount: 10
-  snapshotDataIntegrity: true
-  snapshotDataIntegtrityCronjob: true
-  replicaAutoBalance: true
+  snapshotDataIntegrity: "enabled"
+  snapshotDataIntegrityCronjob: "0 12 * * 1"
+  replicaAutoBalance: "best-effort"
autoDeletePodWhenVolumeDetachedUnexpectedly: true
allowVolumeCreationWithDegradedAvailability: true
nodeDrainPolicy: "block-for-eviction"

persistence:
defaultClassReplicaCount: 3
defaultDataLocality: disabled
migratable: "true"


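For reviewers, per the upstream Longhorn settings reference (not this diff): snapshotDataIntegrity takes an enum ("disabled", "enabled", or "fast-check") rather than a boolean, snapshotDataIntegrityCronjob takes a cron expression ("0 12 * * 1" runs the check at 12:00 every Monday), and replicaAutoBalance takes "disabled", "least-effort", or "best-effort". So as well as fixing the snapshotDataIntegtrityCronjob key typo, this change replaces boolean values that don't match Longhorn's expected formats.
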
1 change: 0 additions & 1 deletion clusters/dev/worker/infra-values.yaml
@@ -9,7 +9,6 @@ openstack-cluster:
machineFlavor: l3.micro

nodeGroupDefaults:
-  machineFlavor: l3.nano
nodeLabels:
# we're running longhorn on this cluster
# set label so worker nodes can host longhorn volumes
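
With this default removed, dev worker node groups that don't set machineFlavor themselves fall back to whatever the chart or a higher-precedence values file provides; the effective default isn't visible in this diff.
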
12 changes: 6 additions & 6 deletions clusters/staging/management/infra-values.yaml
@@ -24,25 +24,25 @@ openstack-cluster:
env: staging
ingress:
hosts:
-  - prometheus-mgmt.staging.nubes.stfc.ac.uk
+  - prometheus.staging-mgmt.nubes.stfc.ac.uk
tls:
- hosts:
-  - prometheus-mgmt.staging.nubes.stfc.ac.uk
+  - prometheus.staging-mgmt.nubes.stfc.ac.uk
secretName: tls-keypair
grafana:
ingress:
hosts:
-  - grafana-mgmt.staging.nubes.stfc.ac.uk
+  - grafana.staging-mgmt.nubes.stfc.ac.uk
tls:
- hosts:
-  - grafana-mgmt.staging.nubes.stfc.ac.uk
+  - grafana.staging-mgmt.nubes.stfc.ac.uk
secretName: tls-keypair
alertmanager:
enabled: true
ingress:
hosts:
-  - alertmanager-mgmt.staging.nubes.stfc.ac.uk
+  - alertmanager.staging-mgmt.nubes.stfc.ac.uk
tls:
- hosts:
-  - alertmanager-mgmt.staging.nubes.stfc.ac.uk
+  - alertmanager.staging-mgmt.nubes.stfc.ac.uk
secretName: tls-keypair
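
Note the naming scheme flips from service-mgmt.staging.nubes.stfc.ac.uk to service.staging-mgmt.nubes.stfc.ac.uk, i.e. the environment moves into the parent domain. This assumes DNS for staging-mgmt.nubes.stfc.ac.uk and the certificate behind the tls-keypair secret already cover the new names; neither is visible in this PR.
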
19 changes: 12 additions & 7 deletions clusters/staging/worker/infra-values.yaml
@@ -1,8 +1,12 @@
openstack-cluster:

+  controlPlane:
+    machineCount: 3
Comment on lines +3 to +4 (Collaborator):

Could this be left implicit at 5? At 3, we cannot lose 2 members without a full rebuild, so trying to recover becomes scarier than losing 1 and having another to start recovery from.
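
For reference, the quorum arithmetic behind this comment: etcd needs floor(n/2)+1 members for quorum, so a 3-member control plane tolerates losing 1 member while a 5-member one tolerates losing 2.
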
nodeGroups:
- name: default-md-0
-    machineCount: 5
+    machineCount: 3
machineFlavor: l3.micro

Comment on lines +8 to 10 (Collaborator):

Could we use the dep-l2 flavours, as l3 is quite tight?
nodeGroupDefaults:
machineFlavor: l3.nano
@@ -22,6 +26,7 @@ openstack-cluster:
loadBalancerIP: "130.246.81.242"

monitoring:
+  enabled: true
# no need to send alerts around certs/openstack API endpoints for dev/staging clusters
# ends up with too many messages in the ticket queue
blackBoxExporter:
@@ -36,25 +41,25 @@ openstack-cluster:
env: staging
ingress:
hosts:
-  - prometheus-worker.staging.nubes.stfc.ac.uk
+  - prometheus.staging-worker.nubes.stfc.ac.uk
tls:
- hosts:
-  - prometheus-worker.staging.nubes.stfc.ac.uk
+  - prometheus.staging-worker.nubes.stfc.ac.uk
secretName: tls-keypair
grafana:
ingress:
hosts:
-  - grafana-worker.staging.nubes.stfc.ac.uk
+  - grafana.staging-worker.nubes.stfc.ac.uk
tls:
- hosts:
-  - grafana-worker.staging.nubes.stfc.ac.uk
+  - grafana.staging-worker.nubes.stfc.ac.uk
secretName: tls-keypair
alertmanager:
enabled: true
ingress:
hosts:
-  - alertmanager-worker.staging.nubes.stfc.ac.uk
+  - alertmanager.staging-worker.nubes.stfc.ac.uk
tls:
- hosts:
-  - alertmanager-worker.staging.nubes.stfc.ac.uk
+  - alertmanager.staging-worker.nubes.stfc.ac.uk
secretName: tls-keypair