Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
filesystem state sync #1184
filesystem state sync #1184
Changes from 9 commits
0369496
9a87f0f
f6d5c9c
d58a38b
2913c33
e32ad95
cd21ff6
6b7c16d
95cc882
b5eb47d
15ac9bf
a6ce1b1
bce2837
40f1f3e
5e8c233
e7e0192
7cd51b4
c406600
0c52fcd
f0635b2
bdaf094
a09f896
fce47c6
0d5423c
b2b5913
cd4dd23
c8b3429
6522f87
de34a48
abfc170
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is totally fine and should stay like that (we need to document this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use
os.path
, useposixpath
. here paths are normalized from fsspec.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this path? maybe we should save it where the previous path was
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wanted to align it with the way all the other dlt tables are stored. i somehow like it more, we could have a setting though for backwards compatibility or something? your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm you will not get same hash twice. we do not emit state if hash is not changing. also I think load_id is a must in the file name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i put load id there now, not sure what you mean with the "you will not get the same hash twice" comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will not work, yes we process package in order but do not assume that (because we do not have to)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you mean here, maybe we should discuss it briefly. Imho this setup replicates the behavior of the other destinations, with the same lookups/where clauses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed it to iterate over the files in the dir and select the correct one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to reproduce the WHERE clause of other destinations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what is does, no? I mean it starts with the pipeline name, so you can look up the state with the pipeline, and instead of looking for the highest load_id (which we shouldn't do anyway, since we should not rely on load ids being timestamps) it has this current marker. I have a screenshot above of the file layout this pr produces. This way we can avoid iterating through a list of files to find the newest one which will eventually slow down against destinations with many loads, at least that would be my expectation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO hash is enough. also it would be good to have load_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need the name in the filepath so I can find the right schema when looking for the newest version of a schema, so I will keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing like in state: find the oldest load id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done (assuming you mean the newest load id :) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep those in some ignored folder ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i have that also, but since i am working on two different machines i need to do this sometimes ;)