Sander project 1 #51

Open
wants to merge 9 commits into
base: main
7 changes: 5 additions & 2 deletions .sqlfluff
@@ -23,7 +23,7 @@ template_blocks_indent = True

[sqlfluff:layout:type:comma]
spacing_before = touch
-line_position = trailing
+line_position = leading

[sqlfluff:templater]
unwrap_wrapped_queries = true
@@ -35,4 +35,7 @@ apply_dbt_builtins = true
single_table_references = consistent

[sqlfluff:ambiguous.column_references]
-group_by_and_order_by_style = consistent
+group_by_and_order_by_style = consistent
+
+[sqlfluff:rules:capitalisation.keywords]
+capitalisation_policy = upper
13 changes: 13 additions & 0 deletions Pipfile
@@ -0,0 +1,13 @@
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
dbt-core = "==1.8.3"
dbt-snowflake = "==1.8.3"
sqlfluff = "==3.1.1"
sqlfluff-templater-dbt = "==3.1.1"

[requires]
python_version = "3.12"
1,322 changes: 1,322 additions & 0 deletions Pipfile.lock

Large diffs are not rendered by default.

20 changes: 20 additions & 0 deletions Project_1.md
@@ -0,0 +1,20 @@
## Task 2: Doc Blocks

* Added doc blocks for all columns and tables in the Bingeflix data source.
* Created a new doc block file for the ads platform.
* Replaced all table and column descriptions in the source and staging YAML files with doc blocks.
* To save time, I did not replace downstream descriptions with doc blocks, though I recognize that reuse is one of the primary benefits of doc blocks.
* In a real project, I would probably identify high-priority columns based on their usage in joins and key metrics and prioritize those.
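
As a rough illustration of the replacement described above, a properties file can point a description at a doc block with `doc()`. The doc block names below come from `models/docs/docs_bingeflix.md`; the choice of model and the subset of columns shown are assumptions, since the staging YAML is not part of this diff.

```yaml
# Hypothetical excerpt of a staging properties file (columns shown are assumed).
models:
  - name: stg_bingeflix__events
    # Table-level description pulled from a doc block.
    description: '{{ doc("bingeflix_table_events") }}'
    columns:
      - name: event_id
        description: '{{ doc("bingeflix_column_event_id") }}'
      - name: event_name
        description: '{{ doc("bingeflix_column_event_name") }}'
```

The same `doc()` call can be repeated in downstream models, so each description is written once and reused.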

## Task 3: Project Evaluator

* Added missing `fct_events.yml`. This resolved 4 issues relating to documentation coverage, undocumented models, missing primary key tests, and test coverage.
* Renamed `mmr.sql` to `fct_mrr.sql`.
* Added source description to `source_ads_platform.yml`.
* Added `stg_bingeflix__events` to `dbt_project_evaluator_exceptions.csv` to exclude this table from `is_empty_fct_model_fanout_`.
* Added freshness tests on `source_ads_platform.yml` and `source_bingeflix.yml`.
* Without much context on this data, I'm not convinced this was the best decision. It may have been better to mark the test as ignored and add freshness checks only where they intuitively make sense (events on created date, users on created date, etc.).
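
A minimal sketch of the alternative mentioned above, where freshness is checked only on tables with a meaningful timestamp. The source name, table names, `loaded_at_field` column, and thresholds are all assumptions for illustration; the actual `source_bingeflix.yml` is not shown in this diff.

```yaml
# Hypothetical source definition; names and thresholds are assumed.
sources:
  - name: bingeflix
    tables:
      - name: events
        # Freshness is measured against the newest value in this column.
        loaded_at_field: created_at
        freshness:
          warn_after: {count: 24, period: hour}
          error_after: {count: 48, period: hour}
      - name: subscription_plans
        # Slowly changing reference data: opt out of freshness checks.
        freshness: null
```

Setting `freshness: null` on a table skips the check for that table only, which keeps the check where it is informative.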

## Optional: sqlfluff

* Capitalized keywords and changed to leading commas.
2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -44,4 +44,4 @@ seeds:
dbt_project_evaluator_exceptions:
+enabled: false

-# on-run-end: "{{ dbt_project_evaluator.print_dbt_project_evaluator_issues() }}"
+on-run-end: "{{ dbt_project_evaluator.print_dbt_project_evaluator_issues() }}"
32 changes: 32 additions & 0 deletions models/docs/docs_ads_platform.md
@@ -0,0 +1,32 @@
# Ads Platform Docs
This file contains documentation for Ads Platform data sources.

## Daily Ads

{% docs ads_platform_table_daily_ads %}
The daily ad campaigns table.
{% enddocs %}

{% docs ads_platform_column_date %}
The calendar date of the campaign reporting period.
{% enddocs %}

{% docs ads_platform_column_campaign_id %}
The unique identifier for the campaign.
{% enddocs %}

{% docs ads_platform_column_surrogate_key %}
The surrogate key.
{% enddocs %}

{% docs ads_platform_column_spend %}
The amount spent on the campaign.
{% enddocs %}

{% docs ads_platform_column_cpm %}
The cost charged by the ads platform per thousand impressions.
{% enddocs %}

{% docs ads_platform_column_ctr %}
The click-through rate on the campaign impressions.
{% enddocs %}
108 changes: 108 additions & 0 deletions models/docs/docs_bingeflix.md
@@ -1,9 +1,117 @@
# Bingeflix Docs
This file contains documentation for Bingeflix data sources.

## Events
This section contains documentation from the Bingeflix Events table.

{% docs bingeflix_table_events %}
This table contains information about the behavioral events of users while they interact with the Bingeflix platform.
{% enddocs %}

{% docs bingeflix_column_session_id %}
The unique identifier of the session in the Bingeflix application.
{% enddocs %}

{% docs bingeflix_column_event_created_at %}
When the event was logged.
{% enddocs %}

{% docs bingeflix_column_event_id %}
The unique identifier of the event.
{% enddocs %}

{% docs bingeflix_column_event_name %}
The name of the event.
{% enddocs %}

## Subscription Plans
This section contains documentation from the Bingeflix Subscription Plans table.

{% docs bingeflix_table_subscription_plans %}
This table contains information about various subscription plans available on Bingeflix.
{% enddocs %}

{% docs bingeflix_column_subscription_plan_id %}
The unique identifier of the subscription plan.
{% enddocs %}

{% docs bingeflix_column_plan_name %}
The name of the subscription plan.
{% enddocs %}

{% docs bingeflix_column_pricing %}
The price of the subscription.
{% enddocs %}

{% docs bingeflix_column_payment_period %}
The recurring payment period for the subscription.
{% enddocs %}

## Subscriptions

{% docs bingeflix_table_subscriptions %}
This model contains information about Bingeflix subscriptions.
{% enddocs %}

{% docs bingeflix_column_starts_at %}
When the subscription starts.
{% enddocs %}

{% docs bingeflix_column_ends_at %}
When the subscription ends.
{% enddocs %}

{% docs bingeflix_column_subscription_id %}
The unique identifier of the subscription.
{% enddocs %}

## Users
This section contains documentation from the Bingeflix Users table.

{% docs bingeflix_table_users %}
This table contains information about Bingeflix users.
{% enddocs %}

{% docs bingeflix_column_user_id %}
The unique identifier of the Bingeflix user. A user is created when...
{% enddocs %}

{% docs bingeflix_column_user_created_at %}
When the user was created.
{% enddocs %}

{% docs bingeflix_column_phone_number %}
The user's phone number.
{% enddocs %}

{% docs bingeflix_column_deleted_at %}
When the user's account was deleted. The value is NULL if the account has not been deleted.
{% enddocs %}

{% docs bingeflix_column_username %}
The username for login to Bingeflix.
{% enddocs %}

{% docs bingeflix_column_name %}
The name of the user.
{% enddocs %}

{% docs bingeflix_column_sex %}
The user's sex at birth.
{% enddocs %}

{% docs bingeflix_column_email %}
The user's email address.
{% enddocs %}

{% docs bingeflix_column_birthdate %}
The user's birthdate.
{% enddocs %}

{% docs bingeflix_column_region %}
Where the user resides (i.e. the state or province).
{% enddocs %}

{% docs bingeflix_column_country %}
The country where the user resides.
{% enddocs %}
22 changes: 11 additions & 11 deletions models/intermediate/int_dates.sql
@@ -8,19 +8,19 @@ date_spine AS (
end_date="cast('2030-01-01' as date)"
)
}}
-),
+)

-final AS (
+, final AS (
SELECT
-date_day AS calendar_date,
-CAST(DATE_TRUNC('week', date_day) AS DATE) AS date_week,
-CAST(DATE_TRUNC('month', date_day) AS DATE) AS date_month,
-CAST(DATE_TRUNC('quarter', date_day) AS DATE) AS date_quarter,
-CAST(DATE_TRUNC('year', date_day) AS DATE) AS date_year,
-DAY(date_day) AS day_of_month,
-YEAR(date_day) AS year_num,
-QUARTER(date_day) AS quarter_num,
-MONTH(date_day) AS month_num
+date_day AS calendar_date
+, CAST(DATE_TRUNC('week', date_day) AS DATE) AS date_week
+, CAST(DATE_TRUNC('month', date_day) AS DATE) AS date_month
+, CAST(DATE_TRUNC('quarter', date_day) AS DATE) AS date_quarter
+, CAST(DATE_TRUNC('year', date_day) AS DATE) AS date_year
+, DAY(date_day) AS day_of_month
+, YEAR(date_day) AS year_num
+, QUARTER(date_day) AS quarter_num
+, MONTH(date_day) AS month_num
FROM
date_spine
)
@@ -1,9 +1,9 @@
{% set event_names = dbt_utils.get_column_values(table=ref('stg_bingeflix__events'), column='event_name') -%}

SELECT
-session_id,
-user_id,
-{% for event_name in event_names %}
+session_id
+, user_id
+, {% for event_name in event_names %}
SUM(CASE WHEN event_name = '{{ event_name }}' THEN 1 ELSE 0 END) AS {{ event_name|replace(" ", "_")|lower }}_count
{% if not loop.last %},{% endif -%}
{% endfor %}
16 changes: 8 additions & 8 deletions models/marts/core/dim_subscriptions.sql
@@ -2,14 +2,14 @@ WITH

final AS (
SELECT
-s.subscription_id,
-s.subscription_plan_id,
-s.user_id,
-s.starts_at,
-s.ends_at,
-sp.plan_name,
-sp.pricing,
-sp.payment_period AS billing_period
+s.subscription_id
+, s.subscription_plan_id
+, s.user_id
+, s.starts_at
+, s.ends_at
+, sp.plan_name
+, sp.pricing
+, sp.payment_period AS billing_period
FROM
{{ ref('stg_bingeflix__subscriptions') }} AS s
LEFT JOIN {{ ref('stg_bingeflix__subscription_plans') }} AS sp
44 changes: 22 additions & 22 deletions models/marts/core/dim_users.sql
@@ -5,35 +5,35 @@ users AS (
*
FROM
{{ ref('stg_bingeflix__users') }}
-),
+)

-users_subscription_facts AS (
+, users_subscription_facts AS (
SELECT
-user_id,
-MIN(starts_at) AS first_subscription_starts_at,
-COUNT(DISTINCT subscription_id) AS count_of_subscriptions
+user_id
+, MIN(starts_at) AS first_subscription_starts_at
+, COUNT(DISTINCT subscription_id) AS count_of_subscriptions
FROM
{{ ref('stg_bingeflix__subscriptions') }}
GROUP BY 1
-),
+)

-final AS (
+, final AS (
SELECT
-u.user_id,
-created_at,
-phone_number,
-deleted_at,
-username,
-name,
-sex,
-email,
-birthdate,
-TRUNCATE(DATEDIFF(MONTH, birthdate, CURRENT_DATE)/12) AS current_age,
-TRUNCATE(DATEDIFF(MONTH, birthdate, created_at)/12) AS age_at_acquisition,
-region,
-country,
-usf.first_subscription_starts_at,
-usf.count_of_subscriptions
+u.user_id
+, created_at
+, phone_number
+, deleted_at
+, username
+, name
+, sex
+, email
+, birthdate
+, TRUNCATE(DATEDIFF(MONTH, birthdate, CURRENT_DATE)/12) AS current_age
+, TRUNCATE(DATEDIFF(MONTH, birthdate, created_at)/12) AS age_at_acquisition
+, region
+, country
+, usf.first_subscription_starts_at
+, usf.count_of_subscriptions
FROM
users AS u
LEFT JOIN users_subscription_facts AS usf ON u.user_id = usf.user_id
11 changes: 5 additions & 6 deletions models/marts/core/fct_events.sql
@@ -1,11 +1,10 @@
{{ config(materialized='table') }}

SELECT
-session_id,
-created_at,
-user_id,
-event_name,
-event_id
+session_id
+, created_at
+, user_id
+, event_name
+, event_id

FROM {{ ref('stg_bingeflix__events') }}

32 changes: 32 additions & 0 deletions models/marts/core/fct_events.yml
@@ -0,0 +1,32 @@
models:
  - name: fct_events
    config:
      tags: p0
    description: '{{ doc("bingeflix_table_events") }}'

    columns:
      - name: session_id
        description: '{{ doc("bingeflix_column_session_id") }}'
        data_tests:
          - not_null

      - name: created_at
        description: '{{ doc("bingeflix_column_event_created_at") }}'
        data_tests:
          - not_null

      - name: user_id
        description: '{{ doc("bingeflix_column_user_id") }}'
        data_tests:
          - not_null

      - name: event_name
        description: '{{ doc("bingeflix_column_event_name") }}'
        data_tests:
          - not_null

      - name: event_id
        description: '{{ doc("bingeflix_column_event_id") }}'
        data_tests:
          - not_null
          - unique