Commit

Initial commit
brabster authored Jan 2, 2024
0 parents commit d3d3933
Showing 26 changed files with 538 additions and 0 deletions.
5 changes: 5 additions & 0 deletions .env_template
@@ -0,0 +1,5 @@
# copy this file to gitignored `.env` and set the environment for your personal workspace

export DBT_DATASET=sandbox_your_name
export DBT_LOCATION=EU
export DBT_PROJECT=some-project-id # must be the GCP project id, not the project name!
3 changes: 3 additions & 0 deletions .envs/README.md
@@ -0,0 +1,3 @@
This directory contains environment-specific configurations for use in pipeline deployment.

Example to follow...
3 changes: 3 additions & 0 deletions .envs/prod.env
@@ -0,0 +1,3 @@
export DBT_DATASET=pypi
export DBT_LOCATION=US
export DBT_PROJECT=pypi-408816
3 changes: 3 additions & 0 deletions .envs/test.env
@@ -0,0 +1,3 @@
export DBT_DATASET=pypi_test
export DBT_LOCATION=US
export DBT_PROJECT=pypi-408816
32 changes: 32 additions & 0 deletions .github/actions/dbt_build/action.yml
@@ -0,0 +1,32 @@

name: dbt build in venv
description: Runs dbt build from venv
inputs:
  env:
    required: true
    description: Environment file to source
runs:
  using: composite
  steps:
    - name: dbt build for ${{ inputs.env }}
      shell: bash
      run: |
        source .venv/bin/activate
        source .envs/${{ inputs.env }}.env
        rm -rf logs
        dbt clean
        dbt deps
        dbt debug
        dbt run
        echo "dbt test goes here"
        dbt docs generate
    - name: upload target artifacts
      uses: actions/upload-artifact@v3
      with:
        name: dbt_artifacts_${{ inputs.env }}
        path: |
          target
          logs
19 changes: 19 additions & 0 deletions .github/actions/setup_dbt/action.yml
@@ -0,0 +1,19 @@

name: Setup DBT in virtualenv
description: Sets up environment suitable for DBT
runs:
  using: composite
  steps:
    - uses: actions/setup-python@v5
      with:
        python-version: '3.11' # dbt does not support 3.12 yet
        check-latest: true
    - name: setup-python-venv
      shell: bash
      run: |
        python --version
        python -m venv .venv
        source .venv/bin/activate
        pip install -U pip
        pip install -U -r requirements.txt
36 changes: 36 additions & 0 deletions .github/workflows/deploy.yml
@@ -0,0 +1,36 @@
name: deploy-to-gcp
on:
  push: {}
jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      PIP_REQUIRE_VIRTUALENV: true
    permissions:
      contents: read
      id-token: write
      actions: read
      pages: write
    steps:
      - uses: actions/checkout@v4
        with:
          base-ref: ref
      - uses: ./.github/actions/setup_dbt
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
          service_account: ${{ secrets.GCP_SERVICE_ACCOUNT }}
      - uses: google-github-actions/setup-gcloud@v2
        with:
          version: '>= 363.0.0'
      - uses: ./.github/actions/dbt_build
        with:
          env: test
      - uses: ./.github/actions/dbt_build
        with:
          env: prod
      - uses: actions/upload-pages-artifact@v3
        with:
          path: target
      - uses: actions/deploy-pages@v4

15 changes: 15 additions & 0 deletions .gitignore
@@ -0,0 +1,15 @@
# Python virtualenv files
.venv/

# User's environment settings
.env

# DBT logs
logs/

# DBT target dir
target/

# DBT packages
dbt_packages/
package-lock.yml
50 changes: 50 additions & 0 deletions .vscode/tasks.json
@@ -0,0 +1,50 @@
{
  // See https://go.microsoft.com/fwlink/?LinkId=733558
  // for the documentation about the tasks.json format
  "version": "2.0.0",
  "tasks": [
    {
      "label": "init_venv",
      "type": "shell",
      "command": "python",
      "args": ["-m", "venv", ".venv"]
    },
    {
      "label": "ensure_pip_version",
      "type": "shell",
      "command": "pip",
      "args": ["install", "--upgrade", "pip"],
      "dependsOn": ["init_venv"]
    },
    {
      "label": "ensure_python_deps_updated",
      "type": "shell",
      "command": "pip",
      "args": ["install", "-U", "-r", "${workspaceFolder}/requirements.txt"],
      "dependsOn": ["init_venv"]
    },
    {
      "label": "load_user_env",
      "type": "shell",
      "command": ". ${workspaceFolder}/.env"
    },
    {
      "label": "ensure_dbt_packages_updated",
      "type": "shell",
      "command": "dbt",
      "args": ["deps", "--upgrade"],
      "dependsOn": ["ensure_python_deps_updated", "load_user_env"]
    },
    {
      "label": "ensure_updated",
      "dependsOn": [
        "ensure_pip_version",
        "ensure_python_deps_updated",
        "ensure_dbt_packages_updated"
      ],
      "runOptions": {
        "runOn": "folderOpen"
      }
    }
  ]
}
77 changes: 77 additions & 0 deletions CONTRIBUTORS.md
@@ -0,0 +1,77 @@
# Contributing

When contributing to this repository, please first discuss the change you wish to make via issue,
email, or any other method with the owners of this repository before making a change.

Please note that we have a code of conduct; please follow it in all your interactions with the project.

## Pull Request Process

A pull request process will be agreed with the first contributor.

## Code of Conduct

### Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, gender identity and expression, level of experience,
nationality, personal appearance, race, religion, or sexual identity and
orientation.

### Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
  advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
  address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

### Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

### Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.

### Enforcement

Maintainers will monitor the project for breaches of the Code of Conduct.
An enforcement policy will be set up should the need arise.

### Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at [http://contributor-covenant.org/version/1/4][version]

[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/4/
21 changes: 21 additions & 0 deletions LICENCE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Paul Brabban

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
70 changes: 70 additions & 0 deletions README.md
@@ -0,0 +1,70 @@
A candidate template for a standalone dbt-core/dbt-bigquery based repository.

Thanks to [Equal Experts](https://equalexperts.com) for supporting this work.

dbt docs are automatically published on deployment to https://brabster.github.io/dbt_bigquery_template/

# Pre-Reqs

- Python == 3.11 (see https://docs.getdbt.com/faqs/Core/install-python-compatibility)
- [RECOMMENDED] VSCode, to use the built-in tasks
- Access to a GCP project with BigQuery enabled
- [RECOMMENDED] set the environment variable `PIP_REQUIRE_VIRTUALENV=true`
  - Prevents accidentally installing packages into your system Python installation (if you have permission to do so)
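A minimal pre-flight check for the Python requirement might look like the sketch below. It is plain POSIX `sh`; `is_supported_python` and `check_prereqs` are hypothetical helper names, not part of this repo, and the sketch assumes that `python` on your `PATH` is the interpreter you will create the venv from.

```shell
# Hypothetical helpers; not part of this repo.

# Succeeds only for a "major.minor[.patch]" version string in the 3.11 series.
is_supported_python() {
  case "$1" in
    3.11 | 3.11.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Checks the interpreter on PATH and warns if pip's venv guard is not enabled.
check_prereqs() {
  local version
  # `python --version` prints e.g. "Python 3.11.7"; keep only the number.
  version="$(python --version 2>&1 | awk '{print $2}')"
  if ! is_supported_python "$version"; then
    echo "Python 3.11 is required, found '${version}'" >&2
    return 1
  fi
  if [ "${PIP_REQUIRE_VIRTUALENV:-}" != "true" ]; then
    echo "Recommended: export PIP_REQUIRE_VIRTUALENV=true" >&2
  fi
}
```

`check_prereqs` returns non-zero only when the interpreter is not 3.11.x; the venv guard produces a warning rather than a failure because it is listed as recommended.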

# Setup

- open a terminal
  - in VSCode: `Terminal` - `New Terminal`
- copy `.env_template` to `.env` and update it with appropriate values
  - use the project ID, not the project name (using the name manifests as a 404 error)
- run `. .env` to load the values into the current terminal session
- get credentials
  - without a valid credential, the error message says that default credentials were not found
  - the credential must be an application default credential:
    - `gcloud auth application-default login`
- `dbt debug` should now succeed and list settings/versions
  - if `dbt` is not found, you may need to activate your venv in the terminal:
    - `. .venv/bin/activate` (`. .venv/Scripts/activate` on Windows/Git-Bash)
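Before running `dbt debug`, a quick sanity check of the values loaded by `. .env` can catch the "project name instead of project ID" mistake early. This is a sketch with hypothetical helper names (`check_dbt_env`, `looks_like_project_id`). The pattern encodes GCP's project ID rules (6 to 30 characters of lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen) and is only a heuristic.

```shell
# Hypothetical helpers; not part of this repo.

# Heuristic based on GCP's project ID rules: 6-30 chars of lowercase letters,
# digits and hyphens, starting with a letter and not ending with a hyphen.
# A project *name* (spaces, capitals) will usually fail this test.
looks_like_project_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# Fails if any DBT_* variable from .env is unset/empty, or DBT_PROJECT is not ID-shaped.
check_dbt_env() {
  local missing=""
  [ -n "${DBT_DATASET:-}" ] || missing="$missing DBT_DATASET"
  [ -n "${DBT_LOCATION:-}" ] || missing="$missing DBT_LOCATION"
  [ -n "${DBT_PROJECT:-}" ] || missing="$missing DBT_PROJECT"
  if [ -n "$missing" ]; then
    echo "missing:${missing} (did you run '. .env'?)" >&2
    return 1
  fi
  if ! looks_like_project_id "$DBT_PROJECT"; then
    echo "DBT_PROJECT='${DBT_PROJECT}' does not look like a project ID" >&2
    return 1
  fi
}
```

For example, `. .env && check_dbt_env && dbt debug` only reaches `dbt debug` once the variables look plausible.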

# Assumptions

This repo is set up based on assumptions about specific ways of working that I have found to work well.
I'll try to describe them here.

The aim is to apply tried and tested practices that I generally refer to as "engineering" to analytics, so that trust and value can develop.
The following set of principles helps explain the choices in this repo structure.

## Data-as-a-Product

Whilst this repo can be used for ad-hoc exploration, it's intended to support a shared set of data that consumers can influence and then build on with confidence.

## You Build It You Run It

A team is responsible for actively developing the data product this repository describes. That team is responsible for operating the product, resolving issues, and maintaining appropriate stability and robustness to build trust with consumers.

## Trunk-Based Development

There is a `main` branch, which is the current version of the data product. This is the only long-lived branch, and will persist from creation of the repository until it is decommissioned. Engineers will branch from `main` to implement a change, then a Pull Request process with appropriate approvals will control the merge of that change back to `main` as the next iteration of the data product.

## Developer Sandbox Datasets

In order to develop in a branching style without risk of collision between different pieces of work-in-progress, engineers need a sandbox dataset to work in. I've found that personal sandboxes in the same GCP project as the `main` deployment are a simple approach that works well.
This repo assumes that developers will have such a sandbox (or permission to create one; see the `on-run-start` hook in [dbt_project.yml](dbt_project.yml)) and have set the variables in their local, personal `.env` to refer to it.
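If you want to follow the `sandbox_your_name` convention from `.env_template`, a helper like the hypothetical `sandbox_dataset_name` below could derive the dataset name. BigQuery dataset names may only contain letters, digits and underscores, so any other character is replaced with `_`; lowercasing is a stylistic assumption, not a BigQuery requirement.

```shell
# Hypothetical helper; not part of this repo.
# Prints sandbox_<name>, where <name> defaults to $USER, lowercased and with any
# character BigQuery does not allow in dataset names replaced by "_".
sandbox_dataset_name() {
  local raw="${1:-${USER:-}}"
  if [ -z "$raw" ]; then
    echo "usage: sandbox_dataset_name <name>" >&2
    return 1
  fi
  printf 'sandbox_%s\n' "$(printf '%s' "$raw" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9_]/_/g')"
}
```

For example, `sandbox_dataset_name "Jane.Doe"` prints `sandbox_jane_doe`, which you could then use as `DBT_DATASET` in your `.env`.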

## Always Up-To-Date

There are several supply chains providing dependencies for this repo. When developing interactively, important sources are:

- Your Python runtime, including the venv module
- `pip` package manager in the virtualenv
- Python packages via PyPI
- dbt packages

Aside from the Python runtime, which must be present to bootstrap the repo, these sources are set by default to update automatically to the latest available versions. A VSCode task is included to update your local environment automatically when the folder is opened, and the CI system updates to the latest versions on each run.

I believe this setup minimises the dependency-related risk that users of this template are exposed to by default.
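Outside VSCode, the update sequence the tasks perform can be run by hand. The sketch below mirrors `.vscode/tasks.json` (create the venv, upgrade `pip`, upgrade the requirements, then `dbt deps --upgrade`). The `run` wrapper and the `DRY_RUN` variable are hypothetical additions that let you preview the commands without executing them, and the sketch assumes the POSIX venv layout (`.venv/bin/activate`); on Windows/Git-Bash the activate script lives under `.venv/Scripts/`.

```shell
# Sketch mirroring .vscode/tasks.json; `run` and DRY_RUN are hypothetical additions.

# Execute a command, or just print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

update_environment() {
  run python -m venv .venv || return 1
  if [ -z "${DRY_RUN:-}" ]; then
    # Enter the venv so pip and dbt resolve to .venv/bin (POSIX layout assumed).
    . .venv/bin/activate || return 1
    # Mirrors the load_user_env task; "./" stops `.` from searching PATH for the file.
    if [ -f .env ]; then . ./.env; fi
  fi
  run pip install -U pip &&
    run pip install -U -r requirements.txt &&
    run dbt deps --upgrade
}
```

Running `DRY_RUN=1 update_environment` prints the four commands in order; running `update_environment` on its own executes them.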

## Self-Contained and Self-Describing

The repo aims to be as self-contained as possible, minimising what's needed in an engineer's development environment, and making the CI setup as similar as possible to that of the engineer's environment.
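One way to keep local runs close to CI is to replay the commands from `.github/actions/dbt_build/action.yml` against one of the `.envs/*.env` files. The sketch below assumes the venv already exists (for example, created by the setup steps above); `env_file_for` and `local_ci_build` are hypothetical names. As in the action, `dbt test` is not run yet. Note that pointing it at `prod` would write to the production dataset, so you may prefer to use it only with `test`.

```shell
# Hypothetical helpers; not part of this repo.

# Map an environment name (e.g. "test") to its file under .envs/, rejecting
# names that could escape the directory or that have no matching file.
env_file_for() {
  case "$1" in
    "" | *[!a-z0-9_-]*)
      echo "invalid environment name '$1'" >&2
      return 1
      ;;
  esac
  if [ ! -f ".envs/$1.env" ]; then
    echo "no such environment file: .envs/$1.env" >&2
    return 1
  fi
  printf '%s\n' ".envs/$1.env"
}

# Replays the dbt_build action's commands in a subshell, stopping at the first failure.
local_ci_build() {
  local env_file
  env_file="$(env_file_for "$1")" || return 1
  (
    . .venv/bin/activate &&
      . "./$env_file" &&
      rm -rf logs &&
      dbt clean &&
      dbt deps &&
      dbt debug &&
      dbt run &&
      dbt docs generate
  )
}
```

Chaining with `&&` inside the subshell keeps the environment variables from leaking into your terminal and stops at the first failing step, much like a failing step in the CI job.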