Store the compiled code from hooks, or statistics on it #428
Comments
million dollar matt! love this idea -- we rely on the
Nice! Thank you for researching. Off the top:
(+1) - Thanks @matt-winkler for thinking of this idea!
TL;DR: Re-Framing
Perhaps we reframe the feature to fit dbt's larger, overall view of code compilation/visibility and model contracts. We can do this while still reserving actionable progress for the feature's intended focus: hooks. The feature, as I understand it, is not limited to project admins alone. The goal appears to be to guardrail developers toward patterns aligned with a dbt ethos. Any code can go into a hook. But it probably shouldn't.
RE Code compilation/visibility:
Code compilation of hook components does not alter the DAG in ways perhaps intended by the developer. Dependencies can be created within hooks that are left unexpressed. That's bad. Furthermore, compiled code is presented without principal components of execution. An update statement in a hook, a workhorse macro, ... many components are left out of visibility in terms of what will be executed (i.e., the compiled and parsed code). dbt simply doesn't know what will be executed in such code. However, dbt could surface visibility that {some} code will be executed.
RE Model Contracts
You can't have a contract if there is work on its assets outside visibility! Matt's listed examples illustrate how use of hooks can unintentionally (or intentionally) obfuscate code and dilute contract fulfillment.
Analysis would extend to each of the model contract focus domains. Quality: If a hook has a DML statement (which is effectively a hidden model), does it have a test? Observability: How does code in a hook impact the DAG? etc... Examples of dbt's model contract focus domains include:
Outcome
Building guardrails will entail surfacing warnings/errors related to hook usage. Enhancing how the node config is parsed, compiled, and presented will advance dbt's mission (e.g. in their contracts) and community packages (e.g. project evaluator) in their scope/effectiveness.
@matt-winkler @algarbar i spent some time on this today, and there are a couple caveats to what we have available that make me wonder if it's worth investing in!
1. the If a hook calls a macro, all we'll see in the graph is
2. we already collect hard coded references Because we parse the entirety of the raw sql in our
Given these 2 things, i'm wondering what we should do next! open to thoughts and suggestions
Hey @dave-connors-3 I'm not sure I agree with the assumption in point 1 above. A major use case for this is identifying when migrations to dbt from other transformation tools go awry, or aren't done according to dbt best practices. Many of those other tools don't include macros as dbt conceives of them at all. It does make sense that e.g. permissions grants via macros might not be as inspectable with this, but that's a different use case than what I (at least) considered for this. Good point on hard-coded references. Do you know if there's a way we can identify that those hard-coded references are created via a hook specifically? More curious on this one.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
tl;dr
Use job artifacts for this insight
Progress to Date
A great deal of work has been done to resolve this need through an alternative avenue (DM for link). Although not a perfect solution, successful runs generate artifacts (E.G.
Pros
Cons
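For anyone wanting to try the artifact route, here is a rough sketch of pulling hook SQL out of a run artifact. The comment above does not say which artifact is meant, so this assumes a manifest-like JSON document in which each node carries a `config` with `pre-hook` and `post-hook` lists, and each hook entry is a dict with a `sql` key. The function name `iter_hook_sql` is hypothetical, and the assumed layout should be checked against the artifacts your dbt version actually writes.

```python
def iter_hook_sql(manifest: dict):
    """Yield (node_id, hook_type, sql) for every hook in a manifest-like dict.

    Assumed layout (not confirmed by this thread):
    manifest["nodes"][node_id]["config"]["pre-hook" | "post-hook"]
    is a list of dicts, each with a "sql" key.
    """
    for node_id, node in manifest.get("nodes", {}).items():
        config = node.get("config", {})
        for hook_type in ("pre-hook", "post-hook"):
            for hook in config.get(hook_type, []):
                # Tolerate plain-string hooks as well as dict-shaped ones.
                sql = hook.get("sql") if isinstance(hook, dict) else str(hook)
                if sql:
                    yield node_id, hook_type, sql


# Tiny in-memory example standing in for a parsed artifact file.
sample = {
    "nodes": {
        "model.demo.orders": {
            "config": {
                "pre-hook": [],
                "post-hook": [{"sql": "update {{ this }} set loaded = true"}],
            }
        }
    }
}
hooks = list(iter_hook_sql(sample))
```

In practice the dict would come from `json.load` on the artifact file. This sketch does not check whether the stored SQL is raw or compiled.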
Describe the feature
As a project admin, I want to be able to analyze the code written in hooks on dbt models. dbt hooks can execute arbitrary SQL statements. The goal of capturing this data is to be able to catch situations in which business logic is embedded in hooks instead of being coded in dbt models themselves.
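As a minimal sketch of the kind of check this could enable, the snippet below flags hook SQL that contains data-changing keywords, which is one possible signal that business logic lives in a hook. The keyword list and the function name `find_dml_keywords` are assumptions for illustration, not anything dbt provides. The match is a naive regex, so it will also fire on keywords inside comments, string literals, or statements like `grant update on ...`.

```python
import re

# Hypothetical keyword list: statements that change data and may hide business logic.
DML_PATTERN = re.compile(r"\b(insert|update|delete|merge|truncate)\b", re.IGNORECASE)


def find_dml_keywords(hook_sql: str) -> list[str]:
    """Return the distinct data-changing keywords in a hook's SQL, lowercased and sorted."""
    return sorted({m.group(1).lower() for m in DML_PATTERN.finditer(hook_sql)})


# A hook that mutates data versus one that only grants access.
suspicious = find_dml_keywords("update {{ this }} set flag = 1")
harmless = find_dml_keywords("grant select on {{ this }} to role reporter")
```

A real implementation would probably want a proper SQL parser rather than keyword matching, but this is enough to triage a migrated project.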
Describe alternatives you've considered
Linters might be able to achieve something similar. Scanning SQLFluff's dbt templater, I didn't see any obvious controls for hooks.
If storing the SQL code itself is problematic for practical or security reasons, it might be useful to compute summary statistics instead, e.g. total number of lines.
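To make the summary-statistics alternative concrete, here is a small sketch that describes a hook without retaining its SQL. The function name `hook_stats` and the choice of fields are assumptions for illustration. The digest is included only so that changes to a hook can be detected between runs without storing the text.

```python
import hashlib


def hook_stats(hook_sql: str) -> dict:
    """Summarize a hook's SQL without keeping the SQL itself."""
    # Count only lines that contain something other than whitespace.
    non_blank = [line for line in hook_sql.splitlines() if line.strip()]
    return {
        "line_count": len(non_blank),
        "char_count": len(hook_sql),
        "sha256": hashlib.sha256(hook_sql.encode("utf-8")).hexdigest(),
    }


stats = hook_stats("update t\nset x = 1\n\n")
```

A hook with an unusually high line count would then be a candidate for review, even though its contents were never stored.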
Additional context
By having additional tools to align the dbt codebase, dbt users can better leverage additional features like restarting jobs from the point of failure.
Who will this benefit?
Groups migrating from legacy SQL systems to dbt.
Are you interested in contributing this feature?
Yes