Introduce hard_delete and dedup_sort columns hint for merge #960
Merged: rudolfix merged 32 commits into devel from 947-core-extensions-to-support-database-replication on Feb 24, 2024
Changes from 3 commits
82c3634 black formatting
97c5512 remove unused exception
400d84b add initial support for replicate write disposition
24f362e add hard_delete hint and sorted deduplication for merge
f3a4878 undo config change
deb816f undo unintentional changes
4a38d56 refactor hard_delete handling and introduce dedup_sort hint
0d1c977 update docstring
474d8bc replace dialect-specific SQL
568ef26 add parentheses to ensure proper clause evaluation order
81ea426 add escape defaults and temp tables for non-primary key case
a04a238 exclude destinations that don't support merge from test
8ac0f9c correct typo
ec115e9 extend docstring
a1afeb8 remove redundant copies for immutable strings
f07205d simplify boolean logic
a64580d add more test cases for hard_delete and dedup_sort hints
3308549 refactor table chain resolution
189c2fb marks tables that seen data in normalizer, skips empty jobs if never … (rudolfix)
a649b0e ignores tables that didn't seen data when loading, tests edge cases (rudolfix)
9778f0e Merge branch 'devel' into 947-core-extensions-to-support-database-rep… (rudolfix)
4b3c59b add sort order configuration option
c984c4e bumps schema engine to v9, adds migrations (rudolfix)
935748a filters tables without data properly in load (rudolfix)
d125556 converts seen-data to boolean, fixes tests (rudolfix)
ecaf6ef Merge branch '947-core-extensions-to-support-database-replication' of… (rudolfix)
af0b344 disables filesystem tests config due to merge present (rudolfix)
262018b add docs for hard_delete and dedup_sort column hints
0814bb0 Merge branch '947-core-extensions-to-support-database-replication' of…
44a9ff2 fixes extending table chains in load (rudolfix)
9384148 Merge branch '947-core-extensions-to-support-database-replication' of… (rudolfix)
9921b89 refactors load and adds unit tests with dummy (rudolfix)
@@ -37,6 +37,7 @@
     TTypeDetections,
     TWriteDisposition,
     TSchemaContract,
+    TCdcConfig,
 )
 from dlt.common.schema.exceptions import (
     CannotCoerceColumnException,
@@ -317,6 +318,19 @@ def validate_stored_schema(stored_schema: TStoredSchema) -> None:
             if parent_table_name not in stored_schema["tables"]:
                 raise ParentTableNotFoundException(table_name, parent_table_name)

+        # check for "replicate" tables that miss a primary key or "cdc_config"
Review comment: This makes sense, but we should move it to [...]. We should also check merge disposition. Also take a look at [...].
+        if table.get("write_disposition") == "replicate":
+            if len(get_columns_names_with_prop(table, "primary_key", True)) == 0:
+                raise SchemaException(
+                    f'Primary key missing for table "{table_name}" with "replicate" write'
+                    " disposition."
+                )
+            if "cdc_config" not in table:
+                raise SchemaException(
+                    f'"cdc_config" missing for table "{table_name}" with "replicate" write'
+                    " disposition."
+                )
+

 def migrate_schema(schema_dict: DictStrAny, from_engine: int, to_engine: int) -> TStoredSchema:
     if from_engine == to_engine:
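The check added in this hunk can be tried outside dlt with a small stand-in. This is a minimal sketch, assuming a table schema is a plain dict with a `columns` mapping. `columns_with_prop` and `validate_replicate_table` are hypothetical names, `ValueError` stands in for `SchemaException`, and the empty `cdc_config` dict is a placeholder because the shape of `TCdcConfig` is not shown in this diff.

```python
from typing import Any, Dict, List

Table = Dict[str, Any]


def columns_with_prop(table: Table, prop: str) -> List[str]:
    """Hypothetical stand-in for get_columns_names_with_prop:
    return names of columns whose `prop` hint is truthy."""
    return [name for name, col in table.get("columns", {}).items() if col.get(prop)]


def validate_replicate_table(table_name: str, table: Table) -> None:
    """Mirror the two checks from the hunk; ValueError stands in for SchemaException."""
    if table.get("write_disposition") != "replicate":
        return  # other write dispositions are not checked here
    if not columns_with_prop(table, "primary_key"):
        raise ValueError(
            f'Primary key missing for table "{table_name}" with "replicate" write'
            " disposition."
        )
    if "cdc_config" not in table:
        raise ValueError(
            f'"cdc_config" missing for table "{table_name}" with "replicate" write'
            " disposition."
        )


# A table that passes both checks (cdc_config content is a placeholder).
valid_table: Table = {
    "write_disposition": "replicate",
    "columns": {"id": {"primary_key": True}, "name": {}},
    "cdc_config": {},
}
validate_replicate_table("customers", valid_table)
```

The primary key check runs first, so a table that lacks both a primary key and `cdc_config` reports the missing primary key.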
@@ -724,6 +738,7 @@ def new_table(
     resource: str = None,
     schema_contract: TSchemaContract = None,
     table_format: TTableFormat = None,
+    cdc_config: TCdcConfig = None,
 ) -> TTableSchema:
     table: TTableSchema = {
         "name": table_name,
@@ -742,6 +757,8 @@ def new_table(
         table["schema_contract"] = schema_contract
     if table_format:
         table["table_format"] = table_format
+    if cdc_config is not None:
+        table["cdc_config"] = cdc_config
     if validate_schema:
         validate_dict_ignoring_xkeys(
             spec=TColumnSchema,
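One detail in the `new_table` hunks is that `table_format` is guarded by truthiness while `cdc_config` is guarded by `is not None`, so an empty `cdc_config` would still be stored. The sketch below isolates that difference. `new_table_sketch` is a hypothetical, stripped-down stand-in, not the real `new_table`, and the values passed in are placeholders.

```python
from typing import Any, Dict, Optional


def new_table_sketch(
    table_name: str,
    table_format: Optional[str] = None,
    cdc_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical reduction of new_table to the two optional keys in the diff."""
    table: Dict[str, Any] = {"name": table_name}
    if table_format:  # truthiness: None and "" are both skipped
        table["table_format"] = table_format
    if cdc_config is not None:  # identity check: an empty dict is kept
        table["cdc_config"] = cdc_config
    return table


print(new_table_sketch("events"))
print(new_table_sketch("events", table_format="", cdc_config={}))
```

With the `is not None` guard, a later `"cdc_config" not in table` check only fires when the argument was omitted entirely.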
Review comment: This is one way to go, but IMO a better way would be to define a column-level hint, cdc_op, which could be an integer or a single char (u/d/i). Do we really need a sequence? If so, we could reuse sort or add a new hint, i.e. cdc_seq. There are helper methods to find column(s) with hints. It looks simpler to me.
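The column-level alternative proposed in this comment could look roughly like the following. This is a minimal sketch, assuming plain-dict column schemas and an in-memory target keyed by primary key. `cdc_op` and `cdc_seq` are only the hint names floated in the comment, and `find_hint_column` and `apply_changes` are hypothetical helpers, not dlt functions.

```python
from typing import Any, Dict, List, Optional

Columns = Dict[str, Dict[str, Any]]
Row = Dict[str, Any]


def find_hint_column(columns: Columns, hint: str) -> Optional[str]:
    """Return the single column carrying `hint`, or None if no column has it."""
    names = [name for name, col in columns.items() if col.get(hint)]
    if len(names) > 1:
        raise ValueError(f'Multiple columns carry the "{hint}" hint: {names}')
    return names[0] if names else None


def apply_changes(target: Dict[Any, Row], rows: List[Row], columns: Columns) -> Dict[Any, Row]:
    """Apply change rows to `target`, using hints to locate key, operation and sequence."""
    pk = find_hint_column(columns, "primary_key")
    op_col = find_hint_column(columns, "cdc_op")
    seq_col = find_hint_column(columns, "cdc_seq")  # optional in this sketch
    if pk is None or op_col is None:
        raise ValueError('Both "primary_key" and "cdc_op" hints are required')
    # Without a sequence column, rows are applied in arrival order.
    ordered = sorted(rows, key=lambda r: r[seq_col]) if seq_col else rows
    for row in ordered:
        if row[op_col] == "d":
            target.pop(row[pk], None)  # delete
        else:
            # "i" and "u" are both treated as upserts; drop the bookkeeping columns
            target[row[pk]] = {k: v for k, v in row.items() if k not in (op_col, seq_col)}
    return target


columns: Columns = {
    "id": {"primary_key": True},
    "name": {},
    "op": {"cdc_op": True},
    "lsn": {"cdc_seq": True},
}
rows: List[Row] = [
    {"id": 2, "name": "b", "op": "d", "lsn": 4},
    {"id": 1, "name": "a", "op": "i", "lsn": 1},
    {"id": 1, "name": "a2", "op": "u", "lsn": 3},
    {"id": 2, "name": "b", "op": "i", "lsn": 2},
]
print(apply_changes({}, rows, columns))
```

In this sketch the sequence hint only matters when change rows can arrive out of order, which is one way to read the "do we really need a sequence?" question.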