Merge branch 'devel' into 832-synapse-destination
jorritsandbrink authored Feb 5, 2024
2 parents d7d9e35 + d2dd951 commit e931ffb
Showing 128 changed files with 3,148 additions and 623 deletions.
4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -7,7 +7,7 @@ body:
attributes:
value: |
Thanks for reporting a bug for dlt! Please fill out the sections below.
- If you are not sure if this is a bug or not, please join our [Slack](https://join.slack.com/t/dlthub-community/shared_invite/zt-1n5193dbq-rCBmJ6p~ckpSFK4hCF2dYA)
+ If you are not sure if this is a bug or not, please join our [Slack](https://dlthub.com/community)
and ask in the #3-technical-help channel.
- type: input
attributes:
@@ -34,7 +34,7 @@ body:
attributes:
label: Steps to reproduce
description: >
- How can we replicate the issue? If it's not straightforward to reproduce, please join our [Slack](https://join.slack.com/t/dlthub-community/shared_invite/zt-1n5193dbq-rCBmJ6p~ckpSFK4hCF2dYA)
+ How can we replicate the issue? If it's not straightforward to reproduce, please join our [Slack](https://dlthub.com/community)
and ask in the #3-technical-help channel.
placeholder: >
Provide a step-by-step description of how to reproduce the problem you are running into.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/config.yml
@@ -2,5 +2,5 @@
blank_issues_enabled: true
contact_links:
- name: Ask a question or get support on dlt Slack
- url: https://join.slack.com/t/dlthub-community/shared_invite/zt-1n5193dbq-rCBmJ6p~ckpSFK4hCF2dYA
+ url: https://dlthub.com/community
about: Need help or support? Join our dlt community on Slack and get assistance.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/feature_request.yml
@@ -7,7 +7,7 @@ body:
attributes:
value: |
Thanks for suggesting a feature for dlt!
- If you like to discuss your idea first, please join our [Slack](https://join.slack.com/t/dlthub-community/shared_invite/zt-1n5193dbq-rCBmJ6p~ckpSFK4hCF2dYA)
+ If you like to discuss your idea first, please join our [Slack](https://dlthub.com/community)
and pose your questions in the #3-technical-help channel.
For minor features and improvements, feel free to open a [pull request](https://github.com/dlt-hub/dlt/pulls) directly.
- type: textarea
88 changes: 88 additions & 0 deletions .github/workflows/test_destination_databricks.yml
@@ -0,0 +1,88 @@

name: test databricks

on:
pull_request:
branches:
- master
- devel
workflow_dispatch:

env:
DLT_SECRETS_TOML: ${{ secrets.DLT_SECRETS_TOML }}

RUNTIME__SENTRY_DSN: https://[email protected]/4504819859914752
RUNTIME__LOG_LEVEL: ERROR

ACTIVE_DESTINATIONS: "[\"databricks\"]"
ALL_FILESYSTEM_DRIVERS: "[\"memory\"]"

jobs:
get_docs_changes:
uses: ./.github/workflows/get_docs_changes.yml
if: ${{ !github.event.pull_request.head.repo.fork }}

run_loader:
name: Tests Databricks loader
needs: get_docs_changes
if: needs.get_docs_changes.outputs.changes_outside_docs == 'true'
strategy:
fail-fast: false
matrix:
os: ["ubuntu-latest"]
# os: ["ubuntu-latest", "macos-latest", "windows-latest"]
defaults:
run:
shell: bash
runs-on: ${{ matrix.os }}

steps:

- name: Check out
uses: actions/checkout@master

- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: "3.10.x"

- name: Install Poetry
uses: snok/[email protected]
with:
virtualenvs-create: true
virtualenvs-in-project: true
installer-parallel: true

- name: Load cached venv
id: cached-poetry-dependencies
uses: actions/cache@v3
with:
path: .venv
key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}-gcp

- name: Install dependencies
run: poetry install --no-interaction -E databricks -E s3 -E gs -E az -E parquet --with sentry-sdk --with pipeline

- name: create secrets.toml
run: pwd && echo "$DLT_SECRETS_TOML" > tests/.dlt/secrets.toml

- run: |
poetry run pytest tests/load
if: runner.os != 'Windows'
name: Run tests Linux/MAC
- run: |
poetry run pytest tests/load
if: runner.os == 'Windows'
name: Run tests Windows
shell: cmd
matrix_job_required_check:
name: Databricks loader tests
needs: run_loader
runs-on: ubuntu-latest
if: always()
steps:
- name: Check matrix job results
if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
run: |
echo "One or more matrix job tests failed or were cancelled. You may need to re-run them." && exit 1
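The `matrix_job_required_check` job collapses all matrix results into a single required status check: the expression `contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')` makes the step run, and exit non-zero, if any needed job failed or was cancelled. The exit-code logic can be sketched in Python (the function name is illustrative, not part of the workflow):

```python
def required_check_exit_code(results):
    """Return 1 (fail) if any matrix job failed or was cancelled, else 0.

    Mimics: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
    """
    if any(r in ("failure", "cancelled") for r in results):
        return 1
    return 0
```

Branch protection can then require only this one check instead of every matrix combination.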
4 changes: 2 additions & 2 deletions README.md
@@ -13,7 +13,7 @@ Be it a Google Colab notebook, AWS Lambda function, an Airflow DAG, your local l
</h3>

<div align="center">
- <a target="_blank" href="https://join.slack.com/t/dlthub-community/shared_invite/zt-1n5193dbq-rCBmJ6p~ckpSFK4hCF2dYA" style="background:none">
+ <a target="_blank" href="https://dlthub.com/community" style="background:none">
<img src="https://img.shields.io/badge/slack-join-dlt.svg?labelColor=191937&color=6F6FF7&logo=slack" style="width: 260px;" />
</a>
</div>
@@ -101,7 +101,7 @@ We suggest that you allow only `patch` level updates automatically:

The dlt project is quickly growing, and we're excited to have you join our community! Here's how you can get involved:

- - **Connect with the Community**: Join other dlt users and contributors on our [Slack](https://join.slack.com/t/dlthub-community/shared_invite/zt-1n5193dbq-rCBmJ6p~ckpSFK4hCF2dYA)
+ - **Connect with the Community**: Join other dlt users and contributors on our [Slack](https://dlthub.com/community)
- **Report issues and suggest features**: Please use the [GitHub Issues](https://github.com/dlt-hub/dlt/issues) to report bugs or suggest new features. Before creating a new issue, make sure to search the tracker for possible duplicates and add a comment if you find one.
- **Track progress of our work and our plans**: Please check out our [public Github project](https://github.com/orgs/dlt-hub/projects/9)
- **Contribute Verified Sources**: Contribute your custom sources to the [dlt-hub/verified-sources](https://github.com/dlt-hub/verified-sources) to help other folks in handling their data tasks.
5 changes: 4 additions & 1 deletion dlt/cli/_dlt.py
@@ -498,7 +498,10 @@ def main() -> int:
)
pipe_cmd_schema = pipeline_subparsers.add_parser("schema", help="Displays default schema")
pipe_cmd_schema.add_argument(
- "--format", choices=["json", "yaml"], default="yaml", help="Display schema in this format"
+ "--format",
+ choices=["json", "yaml"],
+ default="yaml",
+ help="Display schema in this format",
)
pipe_cmd_schema.add_argument(
"--remove-defaults", action="store_true", help="Does not show default hint values"
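The change above only reflows the `--format` argument definition across multiple lines; behavior is unchanged. A minimal standalone sketch of the same `argparse` wiring (parser objects here are illustrative, not dlt's actual CLI internals):

```python
import argparse

parser = argparse.ArgumentParser(prog="dlt")
subparsers = parser.add_subparsers(dest="command")

# Mirrors the `schema` subcommand options shown in the diff
pipe_cmd_schema = subparsers.add_parser("schema", help="Displays default schema")
pipe_cmd_schema.add_argument(
    "--format",
    choices=["json", "yaml"],
    default="yaml",
    help="Display schema in this format",
)
pipe_cmd_schema.add_argument(
    "--remove-defaults", action="store_true", help="Does not show default hint values"
)

args = parser.parse_args(["schema", "--format", "json"])
```

`choices` rejects anything other than `json` or `yaml` at parse time, so the downstream rendering code only needs to branch on two values.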
8 changes: 7 additions & 1 deletion dlt/cli/pipeline_command.py
@@ -263,7 +263,13 @@ def _display_pending_packages() -> Tuple[Sequence[str], Sequence[str]]:
fmt.warning("Pipeline does not have a default schema")
else:
fmt.echo("Found schema with name %s" % fmt.bold(p.default_schema_name))
- schema_str = p.default_schema.to_pretty_yaml(remove_defaults=True)
+ format_ = command_kwargs.get("format")
+ remove_defaults_ = command_kwargs.get("remove_defaults")
+ s = p.default_schema
+ if format_ == "json":
+     schema_str = json.dumps(s.to_dict(remove_defaults=remove_defaults_), pretty=True)
+ else:
+     schema_str = s.to_pretty_yaml(remove_defaults=remove_defaults_)
fmt.echo(schema_str)

if operation == "drop":
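The new branch picks JSON or YAML rendering based on the parsed `--format` value (dlt's own `json.dumps` takes `pretty=True`; the stdlib equivalent is `indent`). A simplified sketch of the dispatch with a stub schema object (the stub and its YAML rendering are placeholders, not dlt's real `Schema` API):

```python
import json


class StubSchema:
    """Illustrative stand-in for dlt's Schema object."""

    def __init__(self, d):
        self._d = d

    def to_dict(self, remove_defaults=False):
        return self._d

    def to_pretty_yaml(self, remove_defaults=False):
        # placeholder YAML rendering
        return "\n".join(f"{k}: {v}" for k, v in self._d.items())


def render_schema(schema, format_="yaml", remove_defaults=True):
    # Same dispatch shape as in pipeline_command.py above
    if format_ == "json":
        return json.dumps(schema.to_dict(remove_defaults=remove_defaults), indent=2)
    return schema.to_pretty_yaml(remove_defaults=remove_defaults)
```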
24 changes: 24 additions & 0 deletions dlt/common/configuration/specs/aws_credentials.py
@@ -38,6 +38,13 @@ def to_native_representation(self) -> Dict[str, Optional[str]]:
"""Return a dict that can be passed as kwargs to boto3 session"""
return dict(self)

def to_session_credentials(self) -> Dict[str, str]:
return dict(
aws_access_key_id=self.aws_access_key_id,
aws_secret_access_key=self.aws_secret_access_key,
aws_session_token=self.aws_session_token,
)


@configspec
class AwsCredentials(AwsCredentialsWithoutDefaults, CredentialsWithDefault):
@@ -47,6 +54,23 @@ def on_partial(self) -> None:
if self._from_session(session) and not self.is_partial():
self.resolve()

def to_session_credentials(self) -> Dict[str, str]:
"""Return configured or new aws session token"""
if self.aws_session_token and self.aws_access_key_id and self.aws_secret_access_key:
return dict(
aws_access_key_id=self.aws_access_key_id,
aws_secret_access_key=self.aws_secret_access_key,
aws_session_token=self.aws_session_token,
)
sess = self._to_botocore_session()
client = sess.create_client("sts")
token = client.get_session_token()
return dict(
aws_access_key_id=token["Credentials"]["AccessKeyId"],
aws_secret_access_key=token["Credentials"]["SecretAccessKey"],
aws_session_token=token["Credentials"]["SessionToken"],
)

def _to_botocore_session(self) -> Any:
try:
import botocore.session
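`AwsCredentials.to_session_credentials` returns the configured credentials when a session token is already present, and otherwise mints a temporary one via the STS `get_session_token` API. The control flow can be sketched with an injectable client factory (the free function and the injected factory are illustrative; in the real class the client comes from `self._to_botocore_session().create_client("sts")`):

```python
def session_credentials(creds, make_sts_client):
    """Return existing session credentials, or fetch temporary ones via STS."""
    if (
        creds.get("aws_session_token")
        and creds.get("aws_access_key_id")
        and creds.get("aws_secret_access_key")
    ):
        # a complete session triple is already configured; pass it through
        return {
            "aws_access_key_id": creds["aws_access_key_id"],
            "aws_secret_access_key": creds["aws_secret_access_key"],
            "aws_session_token": creds["aws_session_token"],
        }
    # otherwise ask STS for temporary credentials
    token = make_sts_client().get_session_token()
    c = token["Credentials"]
    return {
        "aws_access_key_id": c["AccessKeyId"],
        "aws_secret_access_key": c["SecretAccessKey"],
        "aws_session_token": c["SessionToken"],
    }
```

Injecting the client factory keeps the branch testable without real AWS access.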
4 changes: 3 additions & 1 deletion dlt/common/configuration/specs/gcp_credentials.py
@@ -27,7 +27,9 @@ class GcpCredentials(CredentialsConfiguration):

project_id: str = None

- location: str = ( # DEPRECATED! and present only for backward compatibility. please set bigquery location in BigQuery configuration
+ location: (
+     str
+ ) = ( # DEPRECATED! and present only for backward compatibility. please set bigquery location in BigQuery configuration
"US"
)

21 changes: 21 additions & 0 deletions dlt/common/data_writers/escape.py
@@ -130,3 +130,24 @@ def escape_snowflake_identifier(v: str) -> str:
# Snowcase uppercase all identifiers unless quoted. Match this here so queries on information schema work without issue
# See also https://docs.snowflake.com/en/sql-reference/identifiers-syntax#double-quoted-identifiers
return escape_postgres_identifier(v.upper())


escape_databricks_identifier = escape_bigquery_identifier


DATABRICKS_ESCAPE_DICT = {"'": "\\'", "\\": "\\\\", "\n": "\\n", "\r": "\\r"}


def escape_databricks_literal(v: Any) -> Any:
if isinstance(v, str):
return _escape_extended(v, prefix="'", escape_dict=DATABRICKS_ESCAPE_DICT)
if isinstance(v, (datetime, date, time)):
return f"'{v.isoformat()}'"
if isinstance(v, (list, dict)):
return _escape_extended(json.dumps(v), prefix="'", escape_dict=DATABRICKS_ESCAPE_DICT)
if isinstance(v, bytes):
return f"X'{v.hex()}'"
if v is None:
return "NULL"

return str(v)
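The new `escape_databricks_literal` wraps strings (and JSON-serialized lists/dicts) in single quotes with backslash escapes, renders `bytes` as hex literals, maps `None` to `NULL`, and falls back to `str()` for everything else. A self-contained approximation (`_escape_extended` here is a simplified character-by-character version of dlt's internal helper):

```python
import json
from datetime import date, datetime, time

DATABRICKS_ESCAPE_DICT = {"'": "\\'", "\\": "\\\\", "\n": "\\n", "\r": "\\r"}


def _escape_extended(v, prefix="'"):
    # simplified: map each character through the escape dict, then quote
    return prefix + "".join(DATABRICKS_ESCAPE_DICT.get(ch, ch) for ch in v) + "'"


def escape_databricks_literal(v):
    if isinstance(v, str):
        return _escape_extended(v)
    if isinstance(v, (datetime, date, time)):
        return f"'{v.isoformat()}'"
    if isinstance(v, (list, dict)):
        # complex values are serialized to JSON, then escaped like strings
        return _escape_extended(json.dumps(v))
    if isinstance(v, bytes):
        return f"X'{v.hex()}'"
    if v is None:
        return "NULL"
    return str(v)
```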
4 changes: 4 additions & 0 deletions dlt/common/destination/capabilities.py
@@ -53,6 +53,9 @@ class DestinationCapabilitiesContext(ContainerInjectableContext):
timestamp_precision: int = 6
max_rows_per_insert: Optional[int] = None
insert_values_writer_type: str = "default"
supports_multiple_statements: bool = True
supports_clone_table: bool = False
"""Destination supports CREATE TABLE ... CLONE ... statements"""

# do not allow to create default value, destination caps must be always explicitly inserted into container
can_create_default: ClassVar[bool] = False
@@ -78,4 +81,5 @@ def generic_capabilities(
caps.is_max_text_data_type_length_in_bytes = True
caps.supports_ddl_transactions = True
caps.supports_transactions = True
caps.supports_multiple_statements = True
return caps
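The two new capability flags let destination clients choose SQL shapes at runtime: a destination that cannot batch statements gets one statement per request, and a destination with `supports_clone_table` can use a zero-copy `CREATE TABLE ... CLONE ...` instead of a full copy. A sketch of such gating (the dataclass and SQL below are illustrative, not dlt's actual client code):

```python
from dataclasses import dataclass


@dataclass
class Caps:
    supports_multiple_statements: bool = True
    supports_clone_table: bool = False  # CREATE TABLE ... CLONE ... support


def copy_table_sql(caps: Caps, src: str, dst: str) -> str:
    # prefer a cheap metadata-only clone when the destination supports it
    if caps.supports_clone_table:
        return f"CREATE TABLE {dst} CLONE {src}"
    # otherwise fall back to a full CTAS copy
    return f"CREATE TABLE {dst} AS SELECT * FROM {src}"
```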