Code Review #9

ablack3 · 2022-12-08T14:12:08Z

Overview

Overall the package is really useful and the code looks good to me. I made some suggestions for documentation updates and a few implementation notes.

Documentation

Overall the documentation is good. I've made some suggestions below but these are minor suggestions that I think might help improve the user experience.

The readme describes basic usage well. However it might be worth putting the information in a vignette as well with some more context (e.g. what are "permitted packages"?)

The package title and description in the DESCRIPTION file are a bit vague: "Tool Suite to Investigate Other Packages". Maybe the title should be something like "Tool Suite to Investigate R package dependencies".

Some of the prose in the readme could be improved. For example here is my suggested edit of one of the paragraphs.

original

Function use in all .R files can be investigated using the sumariseFunctionUse function. It assumes the function is runs inside an R-project and will automatically look through the .R files in the /R folder. Functions are grouped by the package they are imported from. When no package can be identified are grouped under unknown package. Usually base functions are not used as base::mean(). Therefore the function goes through all the base functions to bin them under the base package rather than unknown.

edit

Function use in all .R files can be investigated using the summariseFunctionUse function. sumariseFunctionUse is intended to be called from inside an R-project and will automatically look through the .R files in the /R folder. Functions used in .R files are are grouped together by the package they are imported from. Functions whose package cannot be identified are grouped together under the unknown label.

Make sure to spell check the package using spelling::spell_check_package(). Once all corrections are made you can update the package wordlist (words that should be added to the dictionary) with spelling::update_wordlist().

It looks like DependencyReviewer imports a few packages that need to be whitelisted.

Function descriptions should describe what they return or what side-effect they have. The documentation for checkDependencies should say something about its side effect of printing a report of dependencies that have not been approved.

Some arguments like in_package (in screenshot below) need a description. in_package is used in two exported functions so you could centralize the documentation for this argument. I also think you might be able to handle both relative and absolute paths in the files argument with normalizePath() and possibly deprecate the argument altogether. That just a thought though.

Argument names mix camel case and snake case. I wouldn't change that at this point. CDMConnector also does that. In Darwin we seem to be mixing the two styles.

Consider not exporting lower level functions like funsUsedInLine. Actually it looks like you already made this change in the main branch.

getGraphData documentation needs to be filled out a bit more. What is the input and output. What graph is being produced?

darwinLintScore takes in a linting function. However that linting function needs to return a very specific data format. It's not clear what other lint functions could be used with this function or if the user would want to have the flexibility to pass in any lint function. There should be some more description about what lint functions are available and what the requirements for a lint function that is accepted by darwinLintScore are.

Consider creating a package website with pkgdown and hosting it on github.

Implementation

Namespace and exports look good. For GGally consider only importing the one function you need (i.e. @importFrom GGally ggnet2) to silence the message "Registered S3 method overwritten by 'GGally': method from +.gg ggplot2" that is printed when the package is loaded.

The package lacks testing and I'd suggest that improving test coverage should be a focus on the next sprint for DependencyReviewer. However I can see how it is tricky to test these functions.

Below is alternative option for the bind_rows(lapply()) pattern that uses tidyr::unnest.

library(dplyr, warn.conflicts = F) # needed for %>% 
not_permitted <- c("dplyr", "pak")

example <-
  dplyr::bind_rows(
    lapply(
      X = not_permitted,
      FUN = function(pkg) {
        dplyr::tibble(pak::pkg_search(pkg)) %>%
          dplyr::filter(.data$package == pkg) %>%
          dplyr::select(.data$package, version, date, .data$downloads_last_month,
                        license, url)
      }))
#> Warning: Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0.
#> ℹ Please use `"package"` instead of `.data$package`
#> Warning: Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0.
#> ℹ Please use `"downloads_last_month"` instead of `.data$downloads_last_month`
example
#> # A tibble: 2 × 6
#>   package version    date                downloads_last_month license      url  
#>   <chr>   <pckg_vrs> <dttm>                             <int> <chr>        <chr>
#> 1 dplyr   1.0.10     2022-09-01 08:20:06              1873114 MIT + file … http…
#> 2 pak     0.3.1      2022-09-08 20:30:02                39420 GPL-3        http…

# could be rewritten as

example <-
  dplyr::tibble(pkg = not_permitted) %>% 
  dplyr::mutate(pkg_search  = purrr::map(pkg, pak::pkg_search)) %>% 
  tidyr::unnest(pkg_search) %>% 
  dplyr::filter(.data$package == .data$pkg) %>%
  dplyr::select(package, version, date, downloads_last_month,
                license, url)

example
#> # A tibble: 2 × 6
#>   package version    date                downloads_last_month license      url  
#>   <chr>   <pckg_vrs> <dttm>                             <int> <chr>        <chr>
#> 1 dplyr   1.0.10     2022-09-01 08:20:06              1873114 MIT + file … http…
#> 2 pak     0.3.1      2022-09-08 20:30:02                39420 GPL-3        http…

Also note that you no longer need to use .data in dplyr::select. (https://tidyselect.r-lib.org/news/index.html#lifecycle-changes-1-2-0)

One comment about using "sapply": sapply is not type stable meaning that it sometimes simplifies it's output to a vector and sometimes it will return a list. From what I've heard it is safer to use vapply or lapply and explicitly specify the type of output.

# for example this is guaranteed to return a character vector
tidyversePackages <- vapply(
  X = tidyverse::tidyverse_packages(include_self = TRUE),
  FUN = function(pkg) {
    as.character(utils::packageVersion(pkg))
  },
  FUN.VALUE = character(1L)
)

When you call sapply for a side effect and not the return value (example, example) I think it does not matter since the return value is discarded.

Shiny app

This app is very nice. My only suggestion is that on the function review tab the selection of rows in the table should have some kind of effect. Currently it seems that selecting rows in the function use table does not have any effect on the app. I'm thinking that if the user selects a row then they should be taken to the corresponding row in the code viewer. Also users should only be allowed to select on row at a time.
You might also consider increasing the width of the checkBoxGroup input on the package review page to conserve the vertical screen real estate.

The shiny app code looks good to me. Overall its a great tool!

Miscellaneous

In

"Error was caught during the linting of your package. The package
might be to large to lint all together. Use: darwinLintFile(fileName)"

"to" should be "too".

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Code Review #9

Code Review #9

ablack3 commented Dec 8, 2022 •

edited

Loading

Code Review #9

Code Review #9

Comments

ablack3 commented Dec 8, 2022 • edited Loading

Overview

Documentation

Implementation

Shiny app

Miscellaneous

ablack3 commented Dec 8, 2022 •

edited

Loading