diff --git a/DESCRIPTION b/DESCRIPTION
index 2b845b2c..e20de898 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,8 +1,8 @@
 Package: ctrdata
 Type: Package
 Title: Retrieve and Analyze Clinical Trials in Public Registers
-Version: 1.6.0.9000
-Imports: jsonlite, httr, curl, clipr, xml2, rvest, nodbi (>= 0.4.2.9000), DBI, stringi
+Version: 1.7.0
+Imports: jsonlite, httr, curl, clipr, xml2, rvest, nodbi (>= 0.4.3), DBI, stringi
 SystemRequirements: sed, php, cat, perl
 URL: https://cran.r-project.org/package=ctrdata
 BugReports: https://github.com/rfhb/ctrdata/issues
@@ -22,7 +22,7 @@ Description: Provides functions for querying, retrieving and analyzing
 the design and conduct as well as results of clinical trials.
 License: MIT + file LICENSE
 RoxygenNote: 7.1.1
-Suggests: devtools, knitr, rmarkdown, RSQLite (>= 2.1.2), mongolite,
+Suggests: devtools, knitr, rmarkdown, RSQLite (>= 2.2.4), mongolite,
 tinytest (>= 1.2.1), R.rsp
 VignetteBuilder: R.rsp
 NeedsCompilation: no
diff --git a/NAMESPACE b/NAMESPACE
index 1e881bd8..6ade6ec7 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -24,6 +24,7 @@ importFrom(httr,HEAD)
 importFrom(httr,content)
 importFrom(httr,headers)
 importFrom(httr,progress)
+importFrom(httr,status_code)
 importFrom(httr,write_disk)
 importFrom(jsonlite,toJSON)
 importFrom(jsonlite,validate)
diff --git a/NEWS.md b/NEWS.md
index b9ac4008..6a681762 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,8 +1,11 @@
-# ctrdata 1.6.0.9000
- - 2021-05-10
- - minimised database-specific code, using nodbi 0.4.2.9000
- - temporary directory creation when needed and automated deletion
-
+# ctrdata 1.7.0
+ - 2021-07-24
+ - much reduced database backend-specific code, using nodbi 0.4.3 (released 2021-07-23)
+ which also introduces transactions for sqlite using RSQLite >=2.2.4 (released 2021-03-12)
+ - temporary directory creation only when needed, more automated deletion
+ - changes in detecting non-functioning register servers
+ - further streamlined unit testing
+
 # ctrdata 1.6.0
  - 2021-05-09
  - added support for ISRCTN
diff --git a/README.Rmd b/README.Rmd
index 704f310e..fb5c18d4 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -38,7 +38,7 @@ The package `ctrdata` provides functions for retrieving (downloading) informatio
 - ClinicalTrials.gov ("CTGOV", https://clinicaltrials.gov/)
 - ISRCTN (https://www.isrctn.com/) `r emo::ji("bell")`new in version 1.6.0 `r emo::ji("+1")`
-The motivation is to understand trends in design and conduct of trials, their availability for patients and their detailled results. The package is to be used within the [R](https://www.r-project.org/) system; this README was reviewed on 2021-05-09 for version 1.6.0.
+The motivation is to understand trends in design and conduct of trials, their availability for patients and their detailled results. The package is to be used within the [R](https://www.r-project.org/) system; this README was reviewed on 2021-07-24 for version 1.7.0.
 Main features:
@@ -165,15 +165,16 @@ ctrLoadQueryIntoDb(
 queryterm = q,
 con = db)
 # * Found search query from EUCTR: query=cancer&age=under-18&phase=phase-one&status=completed
-# (1/3) Checking trials in EUCTR:
-# Retrieved overview, multiple records of 64 trial(s) from 4 page(s) to be downloaded.
-# Checking helper binaries: done.
-# Downloading trials (max. 10 pages in parallel)...
+# (1/3) Checking trials in EUCTR:
+# Retrieved overview, multiple records of 66 trial(s) from 4 page(s) to be downloaded
+# Checking helper binaries: done
+# Downloading trials (4 pages in parallel)...
 # Note: register server cannot compress data, transfer takes longer, about 0.4s per trial
-# (2/3) Converting to JSON...
+# Pages: 4 done, 0 ongoing
+# (2/3) Converting to JSON, 248 records converted
 # (3/3) Importing JSON records into database...
-# = Imported or updated 241 records on 64 trial(s).
-# * Updated history in meta-info of "some_collection_name"
+# = Imported or updated 248 records on 66 trial(s)
+# * Updated history ("meta-info" in "some_collection_name")
 ```
 * Analyse
@@ -193,11 +194,11 @@ result <- dbGetFieldsIntoDf(
 # one record, for example for several EU Member States:
 uniqueids <- dbFindIdsUniqueTrials(con = db)
 # Searching for duplicate trials...
-# - Getting trial ids, 241 found in collection
+# - Getting trial ids, 248 found in collection
 # - Finding duplicates among registers' and sponsor ids...
-# - 177 EUCTR _id were not preferred EU Member State record for 64 trials
-# - Keeping 64 records from EUCTR
-# = Returning keys (_id) of 64 records in collection "some_collection_name".
+# - 182 EUCTR _id were not preferred EU Member State record for 66 trials
+# - Keeping 66 records from EUCTR
+# = Returning keys (_id) of 66 records in collection "some_collection_name"
 # Keep only unique / de-duplicated records:
 result <- result[ result[["_id"]] %in% uniqueids, ]
@@ -207,12 +208,13 @@ with(result,
 table(
 p_end_of_trial_status,
 a7_trial_is_part_of_a_paediatric_investigation_plan))
-# a7_trial_is_part_of_a_paediatric_investigation_plan
+# a7_trial_is_part_of_a_paediatric_investigation_plan
 # p_end_of_trial_status Information not present in EudraCT No Yes
-# Completed 6 31 15
-# GB - no longer in EU/EEA 0 4 4
+# Completed 6 31 16
+# GB - no longer in EU/EEA 0 5 4
 # Ongoing 0 1 0
 # Prematurely Ended 1 1 0
+# Restarted 0 1 0
 ```
 * Add records from another register (CTGOV) into the same database
@@ -225,13 +227,13 @@ ctrLoadQueryIntoDb(
 con = db)
 # * Found search query from CTGOV: cond=neuroblastoma&rslt=With&recrs=e&age=0&intr=Drug
 # (1/3) Checking trials in CTGOV:
-# Retrieved overview, records of 37 trial(s) are to be downloaded.
-# Checking helper binaries: done.
-# Downloading: 500 kB
-# (2/3) Converting to JSON...
+# Retrieved overview, records of 40 trial(s) are to be downloaded
+# Checking helper binaries: done
+# Downloading: 580 kB
+# (2/3) Converting to JSON, 40 records converted
 # (3/3) Importing JSON records into database...
-# = Imported or updated 37 trial(s).
-# * Updated history in meta-info of "some_collection_name"
+# = Imported or updated 40 trial(s)
+# * Updated history ("meta-info" in "some_collection_name")
 ```
 * Add records from another register (ISRCTN) into the same database
@@ -243,13 +245,13 @@ ctrLoadQueryIntoDb(
 con = db)
 # * Found search query from ISRCTN: q=neuroblastoma
 # (1/3) Checking trials in ISRCTN:
-# Retrieved overview, records of 9 trial(s) are to be downloaded.
-# Checking helper binaries: done.
-# Downloading: 92 kB
-# (2/3) Converting to JSON...
+# Retrieved overview, records of 9 trial(s) are to be downloaded
+# Checking helper binaries: done
+# Downloading: 89 kB
+# (2/3) Converting to JSON, 9 records converted
 # (3/3) Importing JSON records into database...
-# = Imported or updated 9 trial(s).
-# * Updated history in meta-info of "some_collection_name"
+# = Imported or updated 9 trial(s)
+# * Updated history ("meta-info" in "some_collection_name")
 ```
 * Result-related trial information
@@ -269,7 +271,7 @@ result <- dbGetFieldsIntoDf(
 # Transform all fields into long name - value format
 result <- dfTrials2Long(df = result)
-# Total 5012 rows, 12 unique names of variables
+# Total 6140 rows, 12 unique names of variables
 # [1.] get counts of subjects for all arms into data frame
 # This count is in the group that has "Total" in its name
diff --git a/README.md b/README.md
index 1e2b75df..01deba74 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ aggregating and analysing this information; it can be used for the
 The motivation is to understand trends in design and conduct of trials,
 their availability for patients and their detailled results.
 The package is to be used within the [R](https://www.r-project.org/) system; this
-README was reviewed on 2021-05-09 for version 1.6.0.
+README was reviewed on 2021-07-24 for version 1.7.0.
 Main features:
@@ -197,15 +197,16 @@ ctrLoadQueryIntoDb(
 queryterm = q,
 con = db)
 # * Found search query from EUCTR: query=cancer&age=under-18&phase=phase-one&status=completed
-# (1/3) Checking trials in EUCTR:
-# Retrieved overview, multiple records of 64 trial(s) from 4 page(s) to be downloaded.
-# Checking helper binaries: done.
-# Downloading trials (max. 10 pages in parallel)...
+# (1/3) Checking trials in EUCTR:
+# Retrieved overview, multiple records of 66 trial(s) from 4 page(s) to be downloaded
+# Checking helper binaries: done
+# Downloading trials (4 pages in parallel)...
 # Note: register server cannot compress data, transfer takes longer, about 0.4s per trial
-# (2/3) Converting to JSON...
+# Pages: 4 done, 0 ongoing
+# (2/3) Converting to JSON, 248 records converted
 # (3/3) Importing JSON records into database...
-# = Imported or updated 241 records on 64 trial(s).
-# * Updated history in meta-info of "some_collection_name"
+# = Imported or updated 248 records on 66 trial(s)
+# * Updated history ("meta-info" in "some_collection_name")
 ```
 - Analyse
@@ -226,11 +227,11 @@ result <- dbGetFieldsIntoDf(
 # one record, for example for several EU Member States:
 uniqueids <- dbFindIdsUniqueTrials(con = db)
 # Searching for duplicate trials...
-# - Getting trial ids, 241 found in collection
+# - Getting trial ids, 248 found in collection
 # - Finding duplicates among registers' and sponsor ids...
-# - 177 EUCTR _id were not preferred EU Member State record for 64 trials
-# - Keeping 64 records from EUCTR
-# = Returning keys (_id) of 64 records in collection "some_collection_name".
+# - 182 EUCTR _id were not preferred EU Member State record for 66 trials
+# - Keeping 66 records from EUCTR
+# = Returning keys (_id) of 66 records in collection "some_collection_name"
 # Keep only unique / de-duplicated records:
 result <- result[ result[["_id"]] %in% uniqueids, ]
@@ -240,12 +241,13 @@ with(result,
 table(
 p_end_of_trial_status,
 a7_trial_is_part_of_a_paediatric_investigation_plan))
-# a7_trial_is_part_of_a_paediatric_investigation_plan
+# a7_trial_is_part_of_a_paediatric_investigation_plan
 # p_end_of_trial_status Information not present in EudraCT No Yes
-# Completed 6 31 15
-# GB - no longer in EU/EEA 0 4 4
+# Completed 6 31 16
+# GB - no longer in EU/EEA 0 5 4
 # Ongoing 0 1 0
 # Prematurely Ended 1 1 0
+# Restarted 0 1 0
 ```
 - Add records from another register (CTGOV) into the same database
@@ -258,13 +260,13 @@ ctrLoadQueryIntoDb(
 con = db)
 # * Found search query from CTGOV: cond=neuroblastoma&rslt=With&recrs=e&age=0&intr=Drug
 # (1/3) Checking trials in CTGOV:
-# Retrieved overview, records of 37 trial(s) are to be downloaded.
-# Checking helper binaries: done.
-# Downloading: 500 kB
-# (2/3) Converting to JSON...
+# Retrieved overview, records of 40 trial(s) are to be downloaded
+# Checking helper binaries: done
+# Downloading: 580 kB
+# (2/3) Converting to JSON, 40 records converted
 # (3/3) Importing JSON records into database...
-# = Imported or updated 37 trial(s).
-# * Updated history in meta-info of "some_collection_name"
+# = Imported or updated 40 trial(s)
+# * Updated history ("meta-info" in "some_collection_name")
 ```
 - Add records from another register (ISRCTN) into the same database
@@ -276,13 +278,13 @@ ctrLoadQueryIntoDb(
 con = db)
 # * Found search query from ISRCTN: q=neuroblastoma
 # (1/3) Checking trials in ISRCTN:
-# Retrieved overview, records of 9 trial(s) are to be downloaded.
-# Checking helper binaries: done.
-# Downloading: 92 kB
-# (2/3) Converting to JSON...
+# Retrieved overview, records of 9 trial(s) are to be downloaded
+# Checking helper binaries: done
+# Downloading: 89 kB
+# (2/3) Converting to JSON, 9 records converted
 # (3/3) Importing JSON records into database...
-# = Imported or updated 9 trial(s).
-# * Updated history in meta-info of "some_collection_name"
+# = Imported or updated 9 trial(s)
+# * Updated history ("meta-info" in "some_collection_name")
 ```
 - Result-related trial information
@@ -302,7 +304,7 @@ result <- dbGetFieldsIntoDf(
 # Transform all fields into long name - value format
 result <- dfTrials2Long(df = result)
-# Total 5012 rows, 12 unique names of variables
+# Total 6140 rows, 12 unique names of variables
 # [1.] get counts of subjects for all arms into data frame
 # This count is in the group that has "Total" in its name
diff --git a/cran-comments.md b/cran-comments.md
index 956e5880..637b1d64 100644
--- a/cran-comments.md
+++ b/cran-comments.md
@@ -1,16 +1,21 @@
 ## Test environments
-* local: macOS (darwin15.6.0), R 3.6.3; Windows (19042.928), R 4.0.4
+* local: macOS (darwin17.0), R 3.6.3, R 4.1.0; Windows (19043.1110), R 4.1.0
 * github-actions: Windows (Microsoft Windows Server 2019), R release
 * github-actions: macOS (10.15.7), R release and R oldrel
-* win-builder: x86_64-w64-mingw32, 4.1.0 beta (2021-05-06 r80268)
 ## R CMD check results
 0 errors | 0 warnings | 0 notes
-## Downstream dependencies
-None so far
+## Reverse dependencies
+None
 ## Submission reason
-* new feature: extended to a third register of clinical trials (ISRCTN)
-* refactored and improved query handling, checking binaries, deduplicating ids
-* reduced code complexity, accelerated functions and reduced memory use
+* removed database backend-specific code
+* now requires nodbi >=0.4.3
+* bug fixes (typing for certain date fields, closing interrupted connections)
+* better testing for register server availability and functioning
+
+
+----------
+Thank you
+Ralf
diff --git a/inst/image/README-ctrdata_results_neuroblastoma.png b/inst/image/README-ctrdata_results_neuroblastoma.png
index 8d655eb5..5fc3403f 100644
Binary files a/inst/image/README-ctrdata_results_neuroblastoma.png and b/inst/image/README-ctrdata_results_neuroblastoma.png differ
diff --git a/vignettes/ctrdata_analyse.pdf b/vignettes/ctrdata_analyse.pdf
index 06bb9e43..3c01a8e4 100644
Binary files a/vignettes/ctrdata_analyse.pdf and b/vignettes/ctrdata_analyse.pdf differ
diff --git a/vignettes/ctrdata_install.pdf b/vignettes/ctrdata_install.pdf
index c3cda40c..c8079cac 100644
Binary files a/vignettes/ctrdata_install.pdf and b/vignettes/ctrdata_install.pdf differ
diff --git a/vignettes/ctrdata_retrieve.pdf b/vignettes/ctrdata_retrieve.pdf
index 5a0a9fe3..2c5698fa 100644
Binary files a/vignettes/ctrdata_retrieve.pdf and b/vignettes/ctrdata_retrieve.pdf differ