
dso-r: R companion package for dso

dso is a command line helper for building reproducible data analysis projects on top of dvc. To learn more about dso, please refer to the dso documentation. {dso-r} is the R companion package for dso. The purpose of this package is to provide access to files and configuration organized in a dso project.

Installation

For now, only the development version can be installed, directly from GitHub:

# install.packages("remotes")  # only needed if remotes is not installed yet
remotes::install_github("Boehringer-Ingelheim/dso-r")

Typical usage

The dso R package provides convenient access to stage parameters from R scripts or notebooks. read_params compiles the params.yaml file of the specified stage and loads it into a list-like object whose entries are accessed with $. The stage path must be given relative to the project root; this ensures that the correct stage is found irrespective of the current working directory, as long as that directory is the project root or any subdirectory thereof. Only parameters declared under params, deps, or outs in dvc.yaml are loaded, which ensures that dvc.yaml is kept up to date.

library(dso)

params <- read_params("subfolder/my_stage")

# Access parameters
params$thresholds
params$samplesheet
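The lookup described above can be pictured as walking upward from the working directory until the project root is found, then keeping only the parameters that dvc.yaml declares. The base-R sketch below illustrates both ideas under stated assumptions; it is not the package's implementation. The marker file name `.dso.json` and the helpers `find_project_root()` and `keep_declared()` are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: walk up from `start` until a directory containing
# `marker` is found. The default marker name is an assumption.
find_project_root <- function(start = getwd(), marker = ".dso.json") {
  dir <- normalizePath(start, mustWork = TRUE)
  repeat {
    if (file.exists(file.path(dir, marker))) {
      return(dir)
    }
    parent <- dirname(dir)
    # dirname() of the filesystem root returns the root itself: stop there.
    if (identical(parent, dir)) {
      stop("No project root found above ", start)
    }
    dir <- parent
  }
}

# Hypothetical helper: keep only the top-level entries of a parameter list
# whose names appear in `declared` (e.g. names listed in dvc.yaml).
# Nested keys are not handled in this simplified sketch.
keep_declared <- function(params, declared) {
  params[names(params) %in% declared]
}
```

For example, `keep_declared(list(a = 1, b = 2), "a")` returns `list(a = 1)`, so an undeclared parameter simply never reaches the script.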

By default, dso compiles paths in configuration files to paths relative to each stage (see configuration). From R, you can use stage_here to resolve paths relative to the current stage, independent of your current working directory. This works because read_params stores the path of the current stage in a configuration object that persists for the current R session, and stage_here uses this information to resolve relative paths.

samplesheet <- readr::read_csv(stage_here(params$samplesheet))
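The mechanism behind stage_here can be sketched as a session-level environment that remembers the absolute directory of the current stage, plus a function that joins relative paths onto it. The names `.stage_config`, `set_stage()` and `stage_path()` are hypothetical; this only illustrates the idea and is not how {dso-r} is implemented internally.

```r
# Hypothetical session-wide store: an environment persists for the whole
# R session and can be modified from inside functions.
.stage_config <- new.env(parent = emptyenv())

# Record the absolute directory of the current stage (the role the text
# above assigns to read_params).
set_stage <- function(stage_dir) {
  assign("stage_dir", normalizePath(stage_dir, mustWork = TRUE),
         envir = .stage_config)
  invisible(stage_dir)
}

# Resolve a path given relative to the stage, independent of getwd().
stage_path <- function(...) {
  if (!exists("stage_dir", envir = .stage_config, inherits = FALSE)) {
    stop("No stage set; call set_stage() first")
  }
  file.path(get("stage_dir", envir = .stage_config), ...)
}
```

Because the stored directory is absolute, changing the working directory with setwd() does not change what stage_path() returns.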

When modifying the dvc.yaml, params.in.yaml, or params.yaml files during development, call reload(params) to rebuild and reload the configuration so that the changes take effect.

reload(params)

To create a new stage from within R, call create_stage with the stage's path relative to the project root and a description:

create_stage(name = "subfolder/my_stage", description = "This stage does something")

API documentation

Please refer to the documentation website.