Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unique schema test will fail on BigQuery if tested on column with name identical to model #2061

Closed
4 tasks
tnightengale opened this issue Jan 20, 2020 · 6 comments
Labels
bug Something isn't working stale Issues that have gone stale

Comments

@tnightengale
Copy link

Describe the bug

There appears to be a bug where the unique schema test will fail on BigQuery if it tests a column with an identical name to the model. This fails because of the compilation formatting, here is an example of the failing compiled test sql:

select count(*)
from (

    select
        promo_code

    from `starhopp3r`.`tnightengale_stingray`.`promo_code`
    where promo_code is not null
    group by promo_code
    having count(*) > 1

) validation_errors

On the BigQuery data warehouse, the column name is ambiguous with the view/table, and thus the warehouse thinks the instruction is to group by a STRUCT object (ie, the entire view/table). This compilation issue can be solved with a simple alias of the column, to differentiate it from the view:

select count(*)
from (

    select
        promo_code

    from `starhopp3r`.`tnightengale_stingray`.`promo_code` AS table1
    where table1.promo_code is not null
    group by table1.promo_code
    having count(*) > 1

) validation_errors

Steps To Reproduce

  1. Create a model called 'mock' with a column also called 'mock' and define a schema for it, where you apply the unique schema test to the column called mock
  2. Run dbt test --models mock
  3. Note failure and find compiled test under /target
  4. [Optional] Paste compiled .sql into BigQuery, given error is: Grouping by expressions of type STRUCT is not allowed at [9:14]

Expected behavior

The unique test should compile and pass without error.

Screenshots and log output

If applicable, add screenshots or log output to help explain your problem.

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • [x ] bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.15.0
   latest version: 0.15.1

Your version of dbt is out of date! You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation


The operating system you're using:

The output of python --version:

Additional context

Add any other context about the problem here.

@tnightengale tnightengale added bug Something isn't working triage labels Jan 20, 2020
@drewbanin drewbanin removed the triage label Jan 21, 2020
@drewbanin
Copy link
Contributor

drewbanin commented Jan 21, 2020

Thanks for the report @tnightengale!

The relevant code for this test is here:
https://github.com/fishtown-analytics/dbt/blob/7a07017b96ac332c872725343833a94b49129c68/core/dbt/include/global_project/macros/schema_tests/unique.sql#L2-L19

I think a similar issue affects the not_null test in the same directory. If I use this example code, I find that the not_null test dbt uses does not correctly report that the table contains null values:

create or replace table dbt_dbanin.debug_not_null as (
  select cast(null as int64) as debug_not_null
);

select count(*)
from dbt_dbanin.debug_not_null
where debug_not_null is null
--------------------------------------
Returns 0, but I would expect it to return 1

I think it would be a good idea to update the not_null test to alias the model table here too.

Is this something you're interested in contributing a fix for? We're happy to help out however we can if so :)

@tnightengale
Copy link
Author

tnightengale commented Jan 21, 2020

Hi @drewbanin !

Happy to contribute - it seems like a quick fix. I have a WIP local commit, but alas I am not overly familiar with github, and it seems I am unable to push/create a remote branch of my local branch, on this repository. I am sure I am just missing something obvious: could you help me understand how to open a remote branch on here?

When running git push --set-upstream origin fix/aliasing_schema_tests on my HTTPS clone, I was prompted for my github username and password, however, when I entered them, the command still failed. I think this might be because I have 2FA set up on my github account.

I have now run the following commands to re-clone the repository using SSH, and made changes:

  1. git clone [email protected]:fishtown-analytics/dbt.git
  2. cd dbt/core/dbt/include/global_project/macros/
  3. made changes to the schema tests
  4. git add schema_tests/*
  5. git commit -m "WIP: first commit for aliasing schema tests"
  6. git push --set-upstream origin fix/aliasing_schema_tests

Which yields the following error:

ERROR: Permission to fishtown-analytics/dbt.git denied to tnightengale.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

I'm sure this is obvious to other people but I actually don't really know all the protocols for contributing to public projects on github. Would you or anyone else lend some guidance? Is this expected behaviour because I am approaching it wrong, or do I need to do some kind of additional set up with my SSH key?

@drewbanin
Copy link
Contributor

Hey @tnightengale - cool to hear you have a local commit that addresses the issue!

To contribute a change to this repo, you'll want to:

  1. fork the repo into your own GitHub account
  2. push your changes up to a branch
  3. open a PR from your accounts fork of the repo into the dev/0.15.2 branch of this repo

You can find more docs on how to do this here: https://github.com/fishtown-analytics/dbt/blob/dev/0.15.1/CONTRIBUTING.md#external-contributors

This flow is necessary because you are not a member of the fishtown-analytics organization, so you can't push code directly to the dbt repo.

Looking forward to seeing the PR!

@github-actions
Copy link
Contributor

github-actions bot commented Dec 5, 2021

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Dec 5, 2021
@steffan-roughsedge
Copy link

Still an issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working stale Issues that have gone stale
Projects
None yet
Development

No branches or pull requests

4 participants