Float precision in relationship table causes issues in cohort diagnostics unit tests #65
Comments
Here is a reprex for this issue. "0.0" is getting stored in the text column where "0" should be. SQLite does not have a varchar(1) datatype as far as I know.
remotes::install_github("ohdsi/Eunomia")
#> Using github PAT from envvar GITHUB_PAT. Use `gitcreds::gitcreds_set()` and unset GITHUB_PAT in .Renviron (or elsewhere) if you want to use the more secure git credential store instead.
#> Skipping install of 'Eunomia' from a github remote, the SHA1 (79c89443) has not changed since last install.
#> Use `force = TRUE` to force installation
library(Eunomia)
library(DatabaseConnector)
cd <- createConnectionDetails("sqlite", server = getDatabaseFile("GiBleed", overwrite = T))
con <- connect(cd)
#> attempting to download GiBleed
#> attempting to extract and load: /Users/ablack/eunomia_data/GiBleed_5.3.zip to: /Users/ablack/eunomia_data/GiBleed_5.3.sqlite
#> Connecting using SQLite driver
#> attempting to download GiBleed
#>
#> attempting to extract and load: /Users/ablack/eunomia_data/GiBleed_5.3.zip to: /Users/ablack/eunomia_data/GiBleed_5.3.sqlite
querySql(con, "select is_hierarchical from main.relationship") |> dplyr::tibble()
#> # A tibble: 480 × 1
#> IS_HIERARCHICAL
#> <chr>
#> 1 0.0
#> 2 0.0
#> 3 0.0
#> 4 0.0
#> 5 0.0
#> 6 0.0
#> 7 0.0
#> 8 0.0
#> 9 0.0
#> 10 0.0
#> # ℹ 470 more rows
disconnect(con)
Created on 2024-09-18 with reprex v2.1.1
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.3 (2024-02-29)
#> os macOS Sonoma 14.1
#> system aarch64, darwin20
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Amsterdam
#> date 2024-09-18
#> pandoc 3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> backports 1.5.0 2024-05-23 [1] CRAN (R 4.3.3)
#> bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.0)
#> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.0)
#> blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.0)
#> cachem 1.1.0 2024-05-16 [1] CRAN (R 4.3.3)
#> checkmate 2.3.2 2024-07-29 [1] CRAN (R 4.3.3)
#> cli 3.6.3 2024-06-21 [1] CRAN (R 4.3.3)
#> CommonDataModel 0.2.0 2024-02-07 [1] CRAN (R 4.3.1)
#> crayon 1.5.3 2024-06-20 [1] CRAN (R 4.3.3)
#> curl 5.2.2 2024-08-26 [1] CRAN (R 4.3.3)
#> DatabaseConnector * 6.3.2 2023-12-11 [1] CRAN (R 4.3.1)
#> DBI 1.2.3 2024-06-02 [1] CRAN (R 4.3.3)
#> digest 0.6.37 2024-08-19 [1] CRAN (R 4.3.3)
#> dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.3.1)
#> Eunomia * 2.0.0 2024-09-18 [1] Github (ohdsi/Eunomia@79c8944)
#> evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.3.3)
#> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.1)
#> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.3.3)
#> fs 1.6.4 2024-04-25 [1] CRAN (R 4.3.1)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
#> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.1)
#> hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
#> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
#> knitr 1.48 2024-07-07 [1] CRAN (R 4.3.3)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.0)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
#> readr 2.1.5 2024-01-10 [1] CRAN (R 4.3.1)
#> remotes 2.5.0 2024-03-17 [1] CRAN (R 4.3.1)
#> reprex 2.1.1 2024-07-06 [1] CRAN (R 4.3.3)
#> rJava 1.0-11 2024-01-26 [1] CRAN (R 4.3.1)
#> rlang 1.1.4 2024-06-04 [1] CRAN (R 4.3.3)
#> rmarkdown 2.28 2024-08-17 [1] CRAN (R 4.3.3)
#> RSQLite 2.3.7 2024-05-27 [1] CRAN (R 4.3.3)
#> rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.3.1)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
#> SqlRender 1.18.1 2024-08-21 [1] CRAN (R 4.3.3)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
#> tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.1)
#> tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
#> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1)
#> vroom 1.6.5 2023-12-05 [1] CRAN (R 4.3.1)
#> withr 3.0.1 2024-07-31 [1] CRAN (R 4.3.3)
#> xfun 0.47 2024-08-17 [1] CRAN (R 4.3.3)
#> yaml 2.3.10 2024-07-26 [1] CRAN (R 4.3.3)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
And here is the reprex using the issue65 branch.
remotes::install_github("ohdsi/Eunomia", ref = "issue65")
#> Using github PAT from envvar GITHUB_PAT. Use `gitcreds::gitcreds_set()` and unset GITHUB_PAT in .Renviron (or elsewhere) if you want to use the more secure git credential store instead.
#> Skipping install of 'Eunomia' from a github remote, the SHA1 (29dc6dd6) has not changed since last install.
#> Use `force = TRUE` to force installation
library(Eunomia)
library(DatabaseConnector)
cd <- createConnectionDetails("sqlite", server = getDatabaseFile("GiBleed", overwrite = T))
con <- connect(cd)
#> attempting to download GiBleed
#> attempting to extract and load: /Users/ablack/eunomia_data/GiBleed_5.3.zip to: /Users/ablack/eunomia_data/GiBleed_5.3.sqlite
#> Connecting using SQLite driver
#> attempting to download GiBleed
#>
#> attempting to extract and load: /Users/ablack/eunomia_data/GiBleed_5.3.zip to: /Users/ablack/eunomia_data/GiBleed_5.3.sqlite
querySql(con, "select is_hierarchical from main.relationship") |> dplyr::tibble()
#> # A tibble: 480 × 1
#> IS_HIERARCHICAL
#> <chr>
#> 1 0
#> 2 0
#> 3 0
#> 4 0
#> 5 0
#> 6 0
#> 7 0
#> 8 0
#> 9 0
#> 10 0
#> # ℹ 470 more rows
disconnect(con)
Created on 2024-09-18 with reprex v2.1.1
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.3 (2024-02-29)
#> os macOS Sonoma 14.1
#> system aarch64, darwin20
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Amsterdam
#> date 2024-09-18
#> pandoc 3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> backports 1.5.0 2024-05-23 [1] CRAN (R 4.3.3)
#> bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.0)
#> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.0)
#> blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.0)
#> cachem 1.1.0 2024-05-16 [1] CRAN (R 4.3.3)
#> checkmate 2.3.2 2024-07-29 [1] CRAN (R 4.3.3)
#> cli 3.6.3 2024-06-21 [1] CRAN (R 4.3.3)
#> CommonDataModel 0.2.0 2024-02-07 [1] CRAN (R 4.3.1)
#> crayon 1.5.3 2024-06-20 [1] CRAN (R 4.3.3)
#> curl 5.2.2 2024-08-26 [1] CRAN (R 4.3.3)
#> DatabaseConnector * 6.3.2 2023-12-11 [1] CRAN (R 4.3.1)
#> DBI 1.2.3 2024-06-02 [1] CRAN (R 4.3.3)
#> digest 0.6.37 2024-08-19 [1] CRAN (R 4.3.3)
#> dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.3.1)
#> Eunomia * 2.0.0 2024-09-18 [1] Github (ohdsi/Eunomia@29dc6dd)
#> evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.3.3)
#> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.1)
#> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.3.3)
#> fs 1.6.4 2024-04-25 [1] CRAN (R 4.3.1)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
#> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.1)
#> hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
#> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
#> knitr 1.48 2024-07-07 [1] CRAN (R 4.3.3)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.0)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
#> readr 2.1.5 2024-01-10 [1] CRAN (R 4.3.1)
#> remotes 2.5.0 2024-03-17 [1] CRAN (R 4.3.1)
#> reprex 2.1.1 2024-07-06 [1] CRAN (R 4.3.3)
#> rJava 1.0-11 2024-01-26 [1] CRAN (R 4.3.1)
#> rlang 1.1.4 2024-06-04 [1] CRAN (R 4.3.3)
#> rmarkdown 2.28 2024-08-17 [1] CRAN (R 4.3.3)
#> RSQLite 2.3.7 2024-05-27 [1] CRAN (R 4.3.3)
#> rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.3.1)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
#> SqlRender 1.18.1 2024-08-21 [1] CRAN (R 4.3.3)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
#> tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.1)
#> tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
#> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1)
#> vroom 1.6.5 2023-12-05 [1] CRAN (R 4.3.1)
#> withr 3.0.1 2024-07-31 [1] CRAN (R 4.3.3)
#> xfun 0.47 2024-08-17 [1] CRAN (R 4.3.3)
#> yaml 2.3.10 2024-07-26 [1] CRAN (R 4.3.3)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
The change I made is to be explicit about all column types when we read them into R from CSV. However, I set the types based on column order in the CSV file, which doesn't always match the CDM specification, so I'd like some advice on how to handle that situation.
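One possible way to avoid depending on column order is to key the types by column name instead. The following is only a minimal sketch, not the code on the issue65 branch: the temporary CSV, its column order, and the hand-written `cdm_types` specification are all hypothetical stand-ins, and in practice the types would presumably be derived from the CDM specification.

```r
library(readr)

# Hypothetical stand-in for one of the CSV files. The column order here is
# deliberately different from what a positional type string might assume.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "relationship_id,is_hierarchical,relationship_name,defines_ancestry",
  "Is a,0,Is a,1",
  "Subsumes,0,Subsumes,0"
), csv_path)

# Types are matched by column name, so the order in the file does not matter.
# These types are written by hand for illustration (an assumption of this sketch).
cdm_types <- cols(
  relationship_id   = col_character(),
  relationship_name = col_character(),
  is_hierarchical   = col_character(),
  defines_ancestry  = col_character()
)

relationship <- read_csv(csv_path, col_types = cdm_types)

# The flag columns stay character ("0"/"1") instead of being guessed as numeric.
str(relationship$is_hierarchical)
```

With a named specification like this, any column not listed falls back to readr's default guessing, so the specification would need to cover every column whose type matters.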
Description
When inserting relationships into postgres tables in cohort diagnostics, I get an error that the 'is_hierarchical' and 'defines_ancestry' columns overflow (note they are defined as varchar(1) here, but as unbounded TEXT in the sqlite common data model ddl). When checking in sqlite, the values all appear to be '0.0'. This causes the cohort diagnostics unit tests to fail because they take the 0.0 values from sqlite and try to insert them into postgres:
https://github.com/OHDSI/CohortDiagnostics/actions/runs/10579576275/job/29312378749#step:8:5416
I think this may be a bug I have seen before, caused by the sqlite DBI driver, which also crept up in DatabaseConnector:
OHDSI/DatabaseConnector#280
Essentially, the value is written to a csv as 0, and when the csv is loaded it is turned into a numeric. When the DBI sqlite driver sees a numeric, it stores it as a floating-point value, so 0 ends up as '0.0' in the text column. This value then finds its way into the CohortDiagnostics export (which I can work around).
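The mechanism described above can be sketched in isolation with an in-memory SQLite database. This is an illustrative sketch rather than the Eunomia loading code: the one-column `relationship` table is a simplified, hypothetical version of the real one, and it assumes the DBI and RSQLite packages are installed.

```r
library(DBI)

# In-memory database with a simplified, hypothetical relationship table.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE relationship (is_hierarchical TEXT)")

# Row 1: the value as it looks after being parsed from CSV as a numeric.
dbWriteTable(con, "relationship", data.frame(is_hierarchical = 0), append = TRUE)

# Row 2: the same value kept as character.
dbWriteTable(con, "relationship", data.frame(is_hierarchical = "0"), append = TRUE)

# typeof() reports the storage class SQLite actually used for each row.
stored <- dbGetQuery(
  con,
  "SELECT is_hierarchical, typeof(is_hierarchical) AS storage FROM relationship"
)
print(stored)

dbDisconnect(con)
```

Both rows should be stored as text, but the numeric row would be expected to come back as "0.0", matching the reprex output, while the character row stays "0". That is why forcing the column to character before writing is enough to avoid the overflow in a varchar(1) column.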