Week 3 #38

Open
wants to merge 12 commits into
base: main
34 changes: 34 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 2.3.5
    hooks:
      - id: sqlfluff-lint
        additional_dependencies: [
          'dbt-snowflake==1.7.1',
          'sqlfluff-templater-dbt==2.3.5'
        ]
      - id: sqlfluff-fix
        stages: [manual] # this command is available only to run manually
        additional_dependencies: [
          'dbt-snowflake==1.7.1',
          'sqlfluff-templater-dbt==2.3.5'
        ]
  - repo: https://github.com/dbt-checkpoint/dbt-checkpoint
    rev: v1.2.0
    hooks:
      - id: dbt-compile # Compiles dbt (necessary for future hooks)
      - id: dbt-docs-generate # Generates the dbt docs (necessary for some future hooks)
      - id: check-source-table-has-description # Ensures all source tables have descriptions
      - id: check-model-has-tests # Ensures all models have at least 2 tests
        args: ["--test-cnt", "2", "--"]
        files: ^models/
      - id: check-script-semicolon # Ensure that the model does not have a semicolon at the end of the file.
      - id: check-script-has-no-table-name # Ensures models only use source or ref macro to specify the table name.
      - id: check-model-has-all-columns # Ensures that mart models have all columns in the database also specified in the .yml
        files: ^models/marts
12 changes: 12 additions & 0 deletions Pipfile
@@ -0,0 +1,12 @@
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
dbt-core = "==1.7.4"
dbt-snowflake = "==1.7.1"
pre-commit = "==3.6.0"

[requires]
python_version = "3.11"
1,076 changes: 1,076 additions & 0 deletions Pipfile.lock

Large diffs are not rendered by default.

11 changes: 6 additions & 5 deletions README.md
@@ -18,8 +18,9 @@

#### Models
- The primary key column must have not_null and unique schema tests.
- All boolean columns must have an accepted_values schema test. The accepted values are true and false.
- Columns that contain category values must have an accepted_values schema test.
- Columns that should never be null must have a not_null schema test.
- Columns that should be unique must have a unique schema test.
- Where possible, use schema tests from the dbt_utils or dbt_expectations packages to perform extra verification.
- For models that AREN'T staging models:
- All boolean columns must have an accepted_values schema test. The accepted values are true and false.
- Columns that contain category values must have an accepted_values schema test.
- Columns that should never be null must have a not_null schema test.
- Columns that should be unique must have a unique schema test.
- Where possible, use schema tests from the dbt_utils or dbt_expectations packages to perform extra verification.
48 changes: 44 additions & 4 deletions dbt_project.yml
@@ -15,6 +15,8 @@ log-path: 'logs'
target-path: 'target'
packages-install-path: 'dbt_packages'

on-run-end: "{{ dbt_project_evaluator.print_dbt_project_evaluator_issues() }}"

clean-targets:
- "target"
- "dbt_packages"
@@ -24,10 +26,10 @@ models:
+materialized: view
+grants:
select: ['transformer', 'reporter']

staging:
+schema: staging

intermediate:
+schema: intermediate

@@ -43,5 +45,43 @@ seeds:
dbt_project_evaluator:
dbt_project_evaluator_exceptions:
+enabled: false

# on-run-end: "{{ dbt_project_evaluator.print_dbt_project_evaluator_issues() }}"
course_advanced_dbt:
unit_testing:
+schema: unit_testing
+tags: unit_testing
unit_test_expected_output_int_dates:
+column_types:
calendar_date: date
date_week: date
date_month: date
date_quarter: date
date_year: date
day_of_month: number
year_num: number
quarter_num: number
month_num: number
unit_test_expected_output_dim_subscriptions:
+column_types:
subscription_id: number
subscription_plan_id: number
user_id: number
starts_at: timestamp
ends_at: timestamp
plan_name: varchar
pricing: number
billing_period: varchar
unit_test_expected_output_dim_mrr:
+column_types:
surrogate_key: varchar
date_month: date
user_id: number
subscription_id: number
starts_at: timestamp
ends_at: timestamp
plan_name: varchar
mrr_amount: number
mrr_change: number
retained_mrr_amount: number
previous_month_mrr_amount: number
change_category: varchar
month_retained_number: number
1 change: 0 additions & 1 deletion macros/cents_to_dollars.sql
@@ -1,4 +1,3 @@
{% macro cents_to_dollars(column_name) %}
({{ column_name }} / 100.0)::DECIMAL(18, 2)
{% endmacro %}

6 changes: 6 additions & 0 deletions macros/find_previous_partition.sql
@@ -0,0 +1,6 @@
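{% macro find_previous_partition(column_name, partition_by='user_id', partition_by_2='subscription_date', order_by='date_month', coalesce_value='FALSE') %}
    {#- The default is the quoted string 'FALSE' so that it renders as the SQL literal FALSE; an unquoted FALSE is an undefined Jinja name and renders as nothing. -#}
    COALESCE(
        LAG({{ column_name }}) OVER (PARTITION BY {{ partition_by }}, {{ partition_by_2 }} ORDER BY {{ order_by }}),
        {{ coalesce_value }}
    )
{% endmacro %}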
16 changes: 16 additions & 0 deletions macros/find_previous_partition.yml
@@ -0,0 +1,16 @@
macros:
  - name: find_previous_partition
    description: >
      This macro finds the previous value of a column using the LAG window function, and coalesces the result. Use this when you are looking
      to compare values of a current row to values of a previous row.
    arguments:
      - name: column_name
        description: The field whose value you are finding
      - name: partition_by
        description: The field you are partitioning by
      - name: partition_by_2
        description: The 2nd field you are partitioning by
      - name: order_by
        description: The field you are ordering the values by
      - name: coalesce_value
        description: The value to be returned if the LAG function returns NULL
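The SQL that `find_previous_partition` renders can be tried outside the warehouse. The following is a minimal sketch, not the project's own test: it assumes Python's bundled SQLite (3.25 or newer, for window functions) as a stand-in for Snowflake, and the `mrr` table, its columns and its rows are hypothetical sample data.

```python
import sqlite3

# Hypothetical sample data: the table name, columns and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mrr (user_id INTEGER, subscription_id INTEGER, date_month TEXT, mrr_amount REAL)"
)
conn.executemany(
    "INSERT INTO mrr VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2023-01-01", 5.0),
        (1, 10, "2023-02-01", 7.0),
        (1, 10, "2023-03-01", 7.0),
        (2, 20, "2023-01-01", 9.0),
    ],
)

# Same shape as the expression the macro renders, here with coalesce_value = 0
# and the two partition columns set to user_id and subscription_id.
rows = conn.execute(
    """
    SELECT
        user_id,
        date_month,
        COALESCE(
            LAG(mrr_amount) OVER (PARTITION BY user_id, subscription_id ORDER BY date_month),
            0
        ) AS previous_month_mrr_amount
    FROM mrr
    ORDER BY user_id, date_month
    """
).fetchall()

for row in rows:
    print(row)
```

The first row of each partition has no predecessor, so `LAG` returns NULL there and the coalesce value is used instead.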
7 changes: 0 additions & 7 deletions macros/rolling_average_7_periods.sql

This file was deleted.

7 changes: 7 additions & 0 deletions macros/rolling_average_n_periods.sql
@@ -0,0 +1,7 @@
{% macro rolling_average_n_periods(column_name, partition_by, order_by='created_at', n_periods=6) %}
avg( {{ column_name }} ) OVER (
PARTITION BY {{ partition_by }}
ORDER BY {{ order_by }}
ROWS BETWEEN {{ n_periods }} PRECEDING AND CURRENT ROW
) AS avg_{{ n_periods }}_periods_{{ column_name }}
{% endmacro %}
16 changes: 16 additions & 0 deletions macros/rolling_average_n_periods.yml
@@ -0,0 +1,16 @@
macros:
  - name: rolling_average_n_periods
    description: This macro calculates the rolling average of a field over a window made up of the current row and the n_periods preceding rows
    arguments:
      - name: column_name
        type: integer
        description: The field you are taking the rolling average of
      - name: partition_by
        type: varchar
        description: The field that you want the rolling average calculation to be grouped by
      - name: order_by
        type: timestamp
        description: The timestamp you want the data to be ordered by in the rolling average calculation
      - name: n_periods
        type: integer
        description: The number of preceding rows included in the window in addition to the current row. The default of 6 averages over 7 rows
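The frame `ROWS BETWEEN {{ n_periods }} PRECEDING AND CURRENT ROW` includes the current row, so the default `n_periods=6` averages over up to 7 rows. The sketch below illustrates this with Python's bundled SQLite (3.25 or newer assumed) in place of Snowflake; the `daily` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (user_id INTEGER, created_at TEXT, amount REAL)")

# Hypothetical data: eight consecutive days for one user, with amounts 1.0 to 8.0.
conn.executemany(
    "INSERT INTO daily VALUES (1, ?, ?)",
    [(f"2023-01-{day:02d}", float(day)) for day in range(1, 9)],
)

# Same frame as the macro renders with n_periods = 6.
rows = conn.execute(
    """
    SELECT
        created_at,
        COUNT(*) OVER (
            PARTITION BY user_id ORDER BY created_at
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS rows_in_window,
        AVG(amount) OVER (
            PARTITION BY user_id ORDER BY created_at
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS avg_amount
    FROM daily
    ORDER BY created_at
    """
).fetchall()

for row in rows:
    print(row)
```

The window grows from 1 row on the first day to 7 rows from the seventh day onward, which is worth keeping in mind when reading the generated `avg_6_periods_...` column alias.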
5 changes: 5 additions & 0 deletions macros/truncate_date.sql
@@ -0,0 +1,5 @@
{% macro truncate_date(column_name, date_type='month') %}

DATE( DATE_TRUNC('{{ date_type }}', {{ column_name }}) )

{% endmacro %}
12 changes: 12 additions & 0 deletions macros/truncate_date.yml
@@ -0,0 +1,12 @@
macros:
  - name: truncate_date
    description: >
      This macro truncates a timestamp to the specified date part and returns the result as a date. Use it whenever you need
      the start of the period (month, day, year, etc.) that a timestamp falls in.
    arguments:
      - name: column_name
        type: timestamp
        description: The timestamp column you wish to truncate
      - name: date_type
        type: string
        description: The date part to truncate the timestamp to, e.g. month
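To make the truncation semantics concrete, here is a rough Python analogue of `DATE(DATE_TRUNC(date_type, column_name))`. It is only a sketch: the function below is hypothetical, covers just four date parts, and leaves out `week` because the first day of the week depends on warehouse settings.

```python
from datetime import date, datetime


def truncate_date(ts: datetime, date_type: str = "month") -> date:
    """Return the first day of the period of `date_type` that `ts` falls in."""
    if date_type == "day":
        return ts.date()
    if date_type == "month":
        return date(ts.year, ts.month, 1)
    if date_type == "quarter":
        # Quarters start in months 1, 4, 7 and 10.
        first_month = 3 * ((ts.month - 1) // 3) + 1
        return date(ts.year, first_month, 1)
    if date_type == "year":
        return date(ts.year, 1, 1)
    raise ValueError(f"Unsupported date_type in this sketch: {date_type}")


ts = datetime(2023, 8, 17, 13, 45)
print(truncate_date(ts))             # start of the month
print(truncate_date(ts, "quarter"))  # start of the quarter
print(truncate_date(ts, "year"))     # start of the year
```

The result is always a date at the start of a period, not an extracted component such as the month number.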
19 changes: 19 additions & 0 deletions models/docs/docs_bingeflix.md
@@ -7,3 +7,22 @@ This section contains documentation from the Bingeflix Users table.
{% docs bingeflix_column_user_id %}
The unique identifier of the Bingeflix user. A user is created when...
{% enddocs %}

## Subscriptions
This section contains documentation from the Bingeflix Subscriptions table.

{% docs bingeflix_column_subscription_id %}
The unique identifier for the subscription.
{% enddocs %}

{% docs bingeflix_column_subscription_plan_id %}
The unique identifier of the subscription plan.
{% enddocs %}

{% docs bingeflix_column_subscription_starts_at %}
When the subscription started.
{% enddocs %}

{% docs bingeflix_column_subscription_ends_at %}
When the subscription ends. This value is NULL if the subscription is active.
{% enddocs %}
4 changes: 3 additions & 1 deletion models/intermediate/int_dates.sql
@@ -1,3 +1,5 @@
{% set import_dates_final = unit_testing_select_table('final', ref('unit_test_expected_output_int_dates')) %}

WITH

date_spine AS (
@@ -25,4 +27,4 @@ final AS (
date_spine
)

SELECT * FROM final
SELECT * FROM {{ import_dates_final }}
4 changes: 4 additions & 0 deletions models/intermediate/int_dates.yml
@@ -2,6 +2,10 @@ models:
- name: int_dates
description: This is a calendar table including all consecutive dates between 2019-01-01 and 2030-01-01.
It also includes week, month, quarter, year, and other values associated with a specific date.
tests:
- dbt_utils.equality:
compare_model: ref('unit_test_expected_output_int_dates')
tags: ['unit_testing']
columns:
- name: calendar_date
description: The calendar date.
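Conceptually, an equality test such as `dbt_utils.equality` passes when no row appears in one relation but not in the other. The sketch below illustrates that idea only; it is not the package's implementation. It assumes Python's bundled SQLite as a stand-in warehouse, and the helper `equality_failures`, the table names and the sample rows are all hypothetical.

```python
import sqlite3


def equality_failures(conn, model, expected):
    """Return rows present in one table but not the other (an empty list means equal)."""
    # Table names are interpolated directly, so pass trusted identifiers only.
    query = f"""
        SELECT * FROM (SELECT * FROM {model} EXCEPT SELECT * FROM {expected})
        UNION ALL
        SELECT * FROM (SELECT * FROM {expected} EXCEPT SELECT * FROM {model})
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
for table in ("int_dates", "expected_int_dates"):
    conn.execute(f"CREATE TABLE {table} (calendar_date TEXT, month_num INTEGER)")
    conn.execute(f"INSERT INTO {table} VALUES ('2019-01-01', 1)")

# Identical contents: no differing rows.
matching = equality_failures(conn, "int_dates", "expected_int_dates")

# An extra row in the model output makes the comparison fail.
conn.execute("INSERT INTO int_dates VALUES ('2019-01-02', 1)")
mismatched = equality_failures(conn, "int_dates", "expected_int_dates")

print(matching, mismatched)
```

Because `EXCEPT` is set-based, this sketch ignores duplicate rows; it is meant only to show why the expected-output seed must match the model column for column and row for row.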
@@ -17,4 +17,4 @@ models:
description: The number of times the user logged out during the specified session.

- name: video_watched_count
description: The number of times the user watched videos during the specified session.
description: The number of times the user watched videos during the specified session.
5 changes: 4 additions & 1 deletion models/marts/core/dim_subscriptions.sql
@@ -1,3 +1,6 @@
{% set import_subscriptions_final = unit_testing_select_table('final', ref('unit_test_expected_output_dim_subscriptions')) %}


WITH

final AS (
@@ -16,4 +19,4 @@ final AS (
ON s.subscription_plan_id = sp.subscription_plan_id
)

SELECT * FROM final
SELECT * FROM {{ import_subscriptions_final }}
16 changes: 10 additions & 6 deletions models/marts/core/dim_subscriptions.yml
@@ -1,32 +1,36 @@
models:
- name: dim_subscriptions
description: This model contains information about Bingeflix subscriptions.
description: This model contains information about Bingeflix subscriptions.
tests:
- dbt_utils.equality:
compare_model: ref('unit_test_expected_output_dim_subscriptions')
tags: ['unit_testing']
columns:
- name: subscription_id
description: The unique identifier of the subscription.
description: '{{ doc("bingeflix_column_subscription_id") }}'
tests:
- not_null
- unique

- name: subscription_plan_id
description: The subscription plan identifier.
description: '{{ doc("bingeflix_column_subscription_plan_id") }}'
tests:
- not_null
- accepted_values:
values: [1, 2, 3]

- name: user_id
description: The identifier of the user.
description: '{{ doc("bingeflix_column_user_id") }}'
tests:
- not_null

- name: starts_at
description: When the subscription started.
description: '{{ doc("bingeflix_column_subscription_starts_at") }}'
tests:
- not_null

- name: ends_at
description: When the subscription ends.
description: '{{ doc("bingeflix_column_subscription_ends_at") }}'

- name: plan_name
description: The name of the subscription plan.
6 changes: 6 additions & 0 deletions models/marts/core/dim_users.yml
@@ -71,3 +71,9 @@ models:
description: Where the user lives.
tests:
- not_null

- name: first_subscription_starts_at
description: The timestamp at which the user's first subscription starts.

- name: count_of_subscriptions
description: The total number of subscriptions for the user.
2 changes: 0 additions & 2 deletions models/marts/core/fct_events.sql
@@ -6,6 +6,4 @@ SELECT
user_id,
event_name,
event_id

FROM {{ ref('stg_bingeflix__events') }}

27 changes: 27 additions & 0 deletions models/marts/core/fct_events.yml
@@ -0,0 +1,27 @@
models:
- name: fct_events
description: This model contains information about Bingeflix events.
columns:
- name: session_id
description: The unique id of a session.
tests:
- not_null
- unique

- name: created_at
description: The timestamp that the session was created.
tests:
- not_null

- name: user_id
description: '{{ doc("bingeflix_column_user_id") }}'
tests:
- not_null

- name: event_name
description: The name of the event.
tests:
- not_null

- name: event_id
description: The unique ID of an event.
tests:
- not_null
- unique