docs: add ADRs and RFCs
JasperHG90 committed Feb 17, 2024
1 parent 4e3d710 commit ec37b50
Showing 11 changed files with 866 additions and 5 deletions.
Empty file removed docs/ADRs/.gitkeep
Empty file.
File renamed without changes
70 changes: 70 additions & 0 deletions docs/ADRs/⚙️ Limiting concurrency for Dagster jobs.md
@@ -0,0 +1,70 @@
---
creation date: 2024-02-17 13:02
tags:
- ADR
- template
- dagster
- concurrency
- GKE
- partitions
template: "[[🏷 Templates/ADR template]]"
status: ✅ Accepted
---
- [[#✍️ Context|✍️ Context]]
- [[#🤝 Decision|🤝 Decision]]
- [[#🤝 Decision#🏷️ Tag-based concurrency limits|🏷️ Tag-based concurrency limits]]
- [[#☝️Consequences|☝️Consequences]]
- [[#➡️ Follow-ups|➡️ Follow-ups]]
- [[#🔗 References|🔗 References]]

## ✍️ Context
---
We want to limit concurrency on Dagster jobs. When we trigger lots of jobs at once (e.g. because a job has many partitions), we don't want to open too many connections to the PostgreSQL database, or start too many pods at the same time on the GKE cluster.

## 🤝 Decision
---
We did some research in [[⚙️ Dagster concurrency]], and found that the easiest way to limit concurrency is to impose it cluster-wide at the **job** level (all options are described [here](https://docs.dagster.io/guides/limiting-concurrency-in-data-pipelines)). It is possible to scope this to the type of job (e.g. backfills versus other runs; see the example below).

> [!important]
> You can limit concurrency at different levels. This ADR only describes limiting **job** concurrency. It does not describe how to limit asset concurrency (e.g. when many assets run within a single run); that should be configured in the job specification of a DAG. More information can be found [here](https://docs.dagster.io/guides/limiting-concurrency-in-data-pipelines#configuring-opasset-level-concurrency).

Limiting concurrency using this approach is achieved by setting the following property when deploying the Dagster Helm chart:

```terraform
set {
  name  = "dagsterDaemon.runCoordinator.config.queuedRunCoordinator.maxConcurrentRuns"
  value = 5
}
```

This uses the default `QueuedRunCoordinator`. See the [dagster documentation](https://docs.dagster.io/deployment/run-coordinator) for more information.
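Under the hood, the Helm chart renders this setting into the instance's `dagster.yaml`. A rough sketch of the resulting run-coordinator section, based on the Dagster docs (the exact rendered output of the chart may differ):

```yaml
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 5
```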

### 🏷️ Tag-based concurrency limits
Dagster allows you to set concurrency limits [based on tags](https://docs.dagster.io/guides/limiting-concurrency-in-data-pipelines#limiting-concurrency-using-tags). For now, this is only configured for backfills, which are limited to three concurrent runs.

```terraform
set {
  name  = "dagsterDaemon.runCoordinator.config.queuedRunCoordinator.tagConcurrencyLimits[0].key"
  value = "dagster/backfill"
}
set {
  name  = "dagsterDaemon.runCoordinator.config.queuedRunCoordinator.tagConcurrencyLimits[0].limit"
  value = 3
}
```
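For intuition, the coordinator's dequeue behaviour can be sketched in plain Python. This is an illustrative model of tag-based limiting only, not Dagster's actual `QueuedRunCoordinator` implementation:

```python
from collections import Counter


def runnable(queued_runs, in_progress, tag_limits):
    """Return the queued runs that may start, given tag-based limits.

    queued_runs / in_progress: lists of dicts mapping tag key -> tag value.
    tag_limits: mapping of tag key -> max concurrent runs carrying that key.
    Illustrative sketch only; the real coordinator handles more cases.
    """
    # Count in-progress runs per limited tag key.
    counts = Counter(key for run in in_progress for key in run if key in tag_limits)
    started = []
    for run in queued_runs:
        limited_keys = [k for k in run if k in tag_limits]
        # A run without limited tags is only subject to the global limit
        # (not modelled here); otherwise every limited key must have headroom.
        if all(counts[k] < tag_limits[k] for k in limited_keys):
            started.append(run)
            for k in limited_keys:
                counts[k] += 1
    return started
```

With `tag_limits={"dagster/backfill": 3}` and four queued backfill runs, only the first three are dequeued, matching the Helm configuration above.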

## ☝️Consequences
---
Easier:
- We don't have to worry about concurrency at the DAG level, since this is taken care of by the system.
- We can use tags to change the default limit of five concurrent runs where required.

## ➡️ Follow-ups
---
- Check out advanced limits on orchestration using e.g. [tags defined on jobs](https://docs.dagster.io/guides/limiting-concurrency-in-data-pipelines#limiting-concurrency-using-tags)
- Check out limiting global [op/asset concurrency across runs](https://docs.dagster.io/guides/limiting-concurrency-in-data-pipelines#limiting-opasset-concurrency-across-runs)

## 🔗 References
---
- [[⚙️ Dagster concurrency]]
@@ -1,5 +1,3 @@
# 🎚️ Triggering one-time scripts with Dagster OS on GKE

---
creation date: 2024-02-16 10:02
tags:
@@ -13,8 +11,11 @@ tags:
- "#pydantic"
- one-time-jobs
template: "[[🏷 Templates/ADR template]]"
status: Proposed
status: ✅ Accepted
---
- [[#✍️ Context|✍️ Context]]
- [[#🤝 Decision|🤝 Decision]]
- [[#☝️Consequences|☝️Consequences]]

## ✍️ Context
---
@@ -30,9 +31,9 @@
5. To actually run a manual production job, we create a CI/CD pipeline that needs to be manually triggered. The design of this pipeline is subject of another ADR.
6. The CI/CD pipeline triggers a kubernetes job, which pulls the docker image containing the CLI and runs the desired command.
7. The kubernetes job template will be added to the repo, and should be templated to allow for CLI version specification.
![](static/backfills.png)
![](attachment/3700d996b63719480ecb6cc84ce2ccb9.png)

## ☝️ Consequences
## ☝️Consequences
---
Harder:
- Quickly running backfills. These should not be executed using the Dagster UI, but through a PR, which adds overhead.
@@ -43,3 +44,9 @@ Easier:
- Tracing manual production jobs and their settings.
- History of manual production jobs executed over time.
- Everything is declared in code.

## 🔗 References
---
- [[🔘 Manually triggering jobs on the Dagster GKE production server using a configuration file]]
- [[💪 Executing manually triggered Dagster jobs within GKE]]
- [[🚀 Defining a CICD job to manually trigger Dagster jobs in GKE from a configuration file]]
61 changes: 61 additions & 0 deletions docs/ADRs/📉 Limiting resource usage in Dagster jobs.md
@@ -0,0 +1,61 @@
---
creation date: 2024-02-17 13:02
tags:
- ADR
- template
- dagster
- GKE
- configuration
- orchestration
template: "[[🏷 Templates/ADR template]]"
status: ✅ Accepted
---
- [[#✍️ Context|✍️ Context]]
- [[#🤝 Decision|🤝 Decision]]
- [[#☝️Consequences|☝️Consequences]]
- [[#➡️ Follow-ups|➡️ Follow-ups]]

## ✍️ Context
---
We need to be able to control the resources that are used by a Dagster job running on GKE. Each job uses its own pod, but not all pods require the same resources.

## 🤝 Decision
---
We leave it up to the engineer designing the DAG to decide which resources are required. This can be done by specifying the following configuration on a Dagster job:

```python
from dagster import define_asset_job

ingestion_job = define_asset_job(
    ...,  # job name, asset selection, etc. (elided in the original)
    tags={
        "dagster-k8s/config": {
            "container_config": {
                "resources": {
                    "requests": {"cpu": "100m", "memory": "64Mi"},
                    "limits": {"cpu": "100m", "memory": "64Mi"},
                },
            },
        }
    },
)
```
Other options are described [here](https://docs.dagster.io/deployment/guides/kubernetes/customizing-your-deployment).
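The resource quantities use Kubernetes notation: `100m` means 0.1 CPU core, and `64Mi` means 64 MiB. As a quick illustration (a hypothetical helper, not part of Dagster or Kubernetes), these can be converted as follows:

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity ('100m', '2') to cores."""
    if quantity.endswith("m"):
        # 'm' denotes millicores: 1000m == 1 core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_memory(quantity: str) -> int:
    """Convert a Kubernetes memory quantity ('64Mi', '1Gi') to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain integer means bytes
```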
## ☝️Consequences
---
Easier:
- Users can specify their own needs in terms of compute resources.

Harder:
- We need to monitor compute resources to spot whether a lot of requested compute sits idle.

## ➡️ Follow-ups
---
- RFC: monitoring compute resources (and alerting)
- Write documentation about setting resource constraints and how to determine them.
Empty file removed docs/RFCs/.gitkeep
Empty file.
