DOP v0.3.0

@dinigo dinigo released this 17 Aug 12:17
· 2 commits to master since this release

Features

  • Support for "generic" Airflow operators: you can now use regular Python
    operators as part of your config files.

  • Support for the “dbt docs” command to generate documentation for all dbt
    tasks: Users can now add “docs generate” as a target in their DOP
    configuration and additionally specify a GCS bucket with the --bucket
    and --bucket-path options, to which the generated documents are copied.

  • Serve dbt docs: Documents generated by dbt can be served as a web page by
    deploying the provided app on GAE. Note that deploying is an additional step
    that needs to be carried out after docs have been generated. See
    infrastructure/dbt-docs/README.md for details.

  • dbt run_results artifacts saved to BigQuery: This JSON file, created by
    dbt tasks, contains information on completed dbt invocations and is saved
    in the BQ table “run_results” for analysis and debugging.

  • Add support for Airflow v1.10.14 and v1.10.15 local environments:
    Users can specify which version they want to use by setting
    the AIRFLOW_VERSION environment variable.

  • Pre-commit linters: added pre-commit hooks to keep Python, YAML and, to
    some extent, plain text files consistent in formatting and style
    throughout the DOP codebase.
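
As an illustration of the run_results feature above, here is a minimal sketch of how such a JSON file could be flattened into table rows before loading. The field names (metadata, results, unique_id, status, execution_time) are assumed from dbt's run_results artifact, the sample payload is made up, and flatten_run_results is a hypothetical helper; the actual schema of the “run_results” table in DOP is not described in these notes.

```python
import json

# Hypothetical, trimmed-down sample of a dbt run_results.json payload.
SAMPLE = json.loads("""
{
  "metadata": {"dbt_version": "0.19.1", "invocation_id": "abc-123"},
  "results": [
    {"unique_id": "model.demo.orders", "status": "success", "execution_time": 1.5},
    {"unique_id": "model.demo.users", "status": "error", "execution_time": 0.2}
  ]
}
""")


def flatten_run_results(payload):
    """Return one flat dict per node result, ready to load as table rows."""
    metadata = payload.get("metadata", {})
    rows = []
    for result in payload.get("results", []):
        rows.append({
            # Invocation-level fields are repeated on every row (assumed layout).
            "invocation_id": metadata.get("invocation_id"),
            "dbt_version": metadata.get("dbt_version"),
            "unique_id": result.get("unique_id"),
            "status": result.get("status"),
            "execution_time": result.get("execution_time"),
        })
    return rows


rows = flatten_run_results(SAMPLE)
# Debugging example: list the nodes that did not succeed.
failed = [r["unique_id"] for r in rows if r["status"] != "success"]
```

Keeping one row per node result makes it straightforward to query failures or slow models per invocation once the rows are in BigQuery.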

Changes

  • Ensure DAGs using the same dbt project do not run concurrently: Safety
    feature that allows selective execution of workflows by calling specific
    commands or tags (e.g. dbt run --m) within a single dbt project. Workflows
    no longer need to be made interdependent just to avoid overwriting each
    other's artifacts, which share the same target location (within the dbt
    container).

  • Test time-partitioning: Time-partitioning on datetime columns is now
    properly validated as part of schema validation.

  • Use Python 3.7 and dbt 0.19.1 in Composer K8s Operator

  • Add Dataflow example task: with the introduction of "regular" Airflow
    operators in the yaml config, it is now possible to run compute-intensive
    Dataflow jobs. Check example_dataflow_template for an example of how to
    implement a Dataflow pipeline.
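
The concurrency rule listed under Changes (DAGs sharing a dbt project must not run at the same time) can be pictured with the sketch below. This is not DOP's actual mechanism, which these notes do not describe; ProjectRunGuard, its methods, and the project and DAG names are all hypothetical and only illustrate the idea of one active run per project.

```python
import threading


class ProjectRunGuard:
    """Hypothetical guard: at most one active run per dbt project."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = {}  # project name -> DAG id currently running

    def try_start(self, project, dag_id):
        """Claim the project for dag_id; return False if another DAG holds it."""
        with self._lock:
            if project in self._active:
                return False
            self._active[project] = dag_id
            return True

    def finish(self, project, dag_id):
        """Release the project, but only if dag_id is the current holder."""
        with self._lock:
            if self._active.get(project) == dag_id:
                del self._active[project]


guard = ProjectRunGuard()
first = guard.try_start("analytics", "dag_daily")    # claims the project
second = guard.try_start("analytics", "dag_hourly")  # refused while daily runs
guard.finish("analytics", "dag_daily")
third = guard.try_start("analytics", "dag_hourly")   # allowed after release
```

Because both DAGs write to the same target location, refusing the second run until the first finishes keeps one run's artifacts from being overwritten by the other.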