docs: add ADRs and RFCs
JasperHG90 committed Feb 17, 2024
1 parent 4e3d710 commit ec37b50
Showing 11 changed files with 866 additions and 5 deletions.
Empty file removed docs/ADRs/.gitkeep
Empty file.
File renamed without changes
70 changes: 70 additions & 0 deletions docs/ADRs/⚙️ Limiting concurrency for Dagster jobs.md
@@ -0,0 +1,70 @@
---
creation date: 2024-02-17 13:02
tags:
- ADR
- template
- dagster
- concurrency
- GKE
- partitions
template: "[[🏷 Templates/ADR template]]"
status: ✅ Accepted
---
- [[#✍️ Context|✍️ Context]]
- [[#🤝 Decision|🤝 Decision]]
- [[#🤝 Decision#🏷️ Tag-based concurrency limits|🏷️ Tag-based concurrency limits]]
- [[#☝️Consequences|☝️Consequences]]
- [[#➡️ Follow-ups|➡️ Follow-ups]]
- [[#🔗 References|🔗 References]]

## ✍️ Context
---
We want to limit concurrency on Dagster jobs. When we trigger lots of jobs at once (e.g. because a job has many partitions), we don't want to open too many connections to the PostgreSQL database, or start too many pods at the same time on the GKE cluster.

## 🤝 Decision
---
We did some research in [[⚙️ Dagster concurrency]], and found that the easiest way to limit concurrency is to impose it cluster-wide at the **job** level (all options are described [here](https://docs.dagster.io/guides/limiting-concurrency-in-data-pipelines)). It is possible to scope this to the type of job (e.g. backfills versus other runs; see the example below).

> [!important]
> You can limit concurrency at different levels. This ADR only describes limiting **job** concurrency. It does not describe how to limit asset concurrency (e.g. when many assets run within a single run); that should be configured in the job specification of a DAG. More information can be found [here](https://docs.dagster.io/guides/limiting-concurrency-in-data-pipelines#configuring-opasset-level-concurrency).

Limiting concurrency using this approach is achieved by setting the following property when deploying the Dagster Helm chart:

```terraform
set {
  name  = "dagsterDaemon.runCoordinator.config.queuedRunCoordinator.maxConcurrentRuns"
  value = 5
}
```

This uses the default `QueuedRunCoordinator`. See the [dagster documentation](https://docs.dagster.io/deployment/run-coordinator) for more information.
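Under the hood, the Helm chart renders this setting into the instance's `dagster.yaml`. A rough sketch of the resulting run-coordinator section, based on the Dagster docs (the exact rendered output of the chart may differ):

```yaml
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 5
```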

### 🏷️ Tag-based concurrency limits
Dagster allows you to set concurrency limits [based on tags](https://docs.dagster.io/guides/limiting-concurrency-in-data-pipelines#limiting-concurrency-using-tags). For now, this is only configured for backfills, which are limited to three concurrent runs.

```terraform
set {
  name  = "dagsterDaemon.runCoordinator.config.queuedRunCoordinator.tagConcurrencyLimits[0].key"
  value = "dagster/backfill"
}
set {
  name  = "dagsterDaemon.runCoordinator.config.queuedRunCoordinator.tagConcurrencyLimits[0].limit"
  value = 3
}
```
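For intuition, the coordinator's dequeue behaviour can be sketched in plain Python. This is an illustrative model of tag-based limiting only, not Dagster's actual `QueuedRunCoordinator` implementation:

```python
from collections import Counter


def runnable(queued_runs, in_progress, tag_limits):
    """Return the queued runs that may start, given tag-based limits.

    queued_runs / in_progress: lists of dicts mapping tag key -> tag value.
    tag_limits: mapping of tag key -> max concurrent runs carrying that key.
    Illustrative sketch only; the real coordinator handles more cases.
    """
    # Count in-progress runs per limited tag key.
    counts = Counter(key for run in in_progress for key in run if key in tag_limits)
    started = []
    for run in queued_runs:
        limited_keys = [k for k in run if k in tag_limits]
        # A run without limited tags is only subject to the global limit
        # (not modelled here); otherwise every limited key must have headroom.
        if all(counts[k] < tag_limits[k] for k in limited_keys):
            started.append(run)
            for k in limited_keys:
                counts[k] += 1
    return started
```

With `tag_limits={"dagster/backfill": 3}` and four queued backfill runs, only the first three are dequeued, matching the Helm configuration above.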

## ☝️Consequences
---
Easier:
- We don't have to worry about concurrency at the DAG level, since this is taken care of by the system.
- We can use tags to change the default limit of five concurrent runs where required.

## ➡️ Follow-ups
---
- Check out advanced limits on orchestration using e.g. [tags defined on jobs](https://docs.dagster.io/guides/limiting-concurrency-in-data-pipelines#limiting-concurrency-using-tags)
- Check out limiting global [op/asset concurrency across runs](https://docs.dagster.io/guides/limiting-concurrency-in-data-pipelines#limiting-opasset-concurrency-across-runs)

## 🔗 References
---
- [[⚙️ Dagster concurrency]]
@@ -1,5 +1,3 @@
# 🎚️ Triggering one-time scripts with Dagster OS on GKE

---
creation date: 2024-02-16 10:02
tags:
@@ -13,8 +11,11 @@ tags:
- "#pydantic"
- one-time-jobs
template: "[[🏷 Templates/ADR template]]"
status: Proposed
status: ✅ Accepted
---
- [[#✍️ Context|✍️ Context]]
- [[#🤝 Decision|🤝 Decision]]
- [[#☝️Consequences|☝️Consequences]]

## ✍️ Context
---
@@ -30,9 +31,9 @@
5. To actually run a manual production job, we create a CI/CD pipeline that needs to be manually triggered. The design of this pipeline is subject of another ADR.
6. The CI/CD pipeline triggers a kubernetes job, which pulls the docker image containing the CLI and runs the desired command.
7. The kubernetes job template will be added to the repo, and should be templated to allow for CLI version specification.
![](static/backfills.png)
![](attachment/3700d996b63719480ecb6cc84ce2ccb9.png)

## ☝️ Consequences
## ☝️Consequences
---
Harder:
- Quickly running backfills. These should not be executed using the Dagster UI, but through a PR, which adds overhead.
@@ -43,3 +44,9 @@ Easier:
- Tracing manual production jobs and their settings.
- History of manual production jobs executed over time.
- Everything is declared in code.

## 🔗 References
---
- [[🔘 Manually triggering jobs on the Dagster GKE production server using a configuration file]]
- [[💪 Executing manually triggered Dagster jobs within GKE]]
- [[🚀 Defining a CICD job to manually trigger Dagster jobs in GKE from a configuration file]]
61 changes: 61 additions & 0 deletions docs/ADRs/📉 Limiting resource usage in Dagster jobs.md
@@ -0,0 +1,61 @@
---
creation date: 2024-02-17 13:02
tags:
- ADR
- template
- dagster
- GKE
- configuration
- orchestration
template: "[[🏷 Templates/ADR template]]"
status: ✅ Accepted
---
- [[#✍️ Context|✍️ Context]]
- [[#🤝 Decision|🤝 Decision]]
- [[#☝️Consequences|☝️Consequences]]
- [[#➡️ Follow-ups|➡️ Follow-ups]]

## ✍️ Context
---
We need to be able to control the resources that are used by a Dagster job running on GKE. Each job uses its own pod, but not all pods require the same resources.

## 🤝 Decision
---
We leave it up to the engineer designing the DAG to decide which resources are required. This can be done by specifying the following configuration on a Dagster job:

```python
from dagster import define_asset_job

ingestion_job = define_asset_job(
    ...,  # job name, asset selection, etc. (elided in the original)
    tags={
        "dagster-k8s/config": {
            "container_config": {
                "resources": {
                    "requests": {"cpu": "100m", "memory": "64Mi"},
                    "limits": {"cpu": "100m", "memory": "64Mi"},
                },
            },
        }
    },
)
```
Other options are described [here](https://docs.dagster.io/deployment/guides/kubernetes/customizing-your-deployment).
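The resource quantities use Kubernetes notation: `100m` means 0.1 CPU core, and `64Mi` means 64 MiB. As a quick illustration (a hypothetical helper, not part of Dagster or Kubernetes), these can be converted as follows:

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity ('100m', '2') to cores."""
    if quantity.endswith("m"):
        # 'm' denotes millicores: 1000m == 1 core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_memory(quantity: str) -> int:
    """Convert a Kubernetes memory quantity ('64Mi', '1Gi') to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain integer means bytes
```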
## ☝️Consequences
---
Easier:
- Users can specify their own needs in terms of compute resources.

Harder:
- We need to monitor compute resources to spot whether a lot of requested compute sits idle.

## ➡️ Follow-ups
---
- RFC: monitoring compute resources (and alerting)
- Write documentation about setting resource constraints and how to determine them.
Empty file removed docs/RFCs/.gitkeep
Empty file.
