Merge pull request #21 from bnicenboim/dev
Dev
bnicenboim authored Jul 22, 2024
2 parents 1e2b13a + a435069 commit a17a534
Showing 13 changed files with 214 additions and 38 deletions.
16 changes: 5 additions & 11 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: pangoling
Type: Package
Title: Access to Large Language Model Predictions
Version: 0.0.0.9009
Version: 0.0.0.9010
Authors@R: c(
person("Bruno", "Nicenboim",
email = "[email protected]",
@@ -18,14 +18,7 @@ BugReports: https://github.com/bnicenboim/pangoling/issues
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: false
Config/reticulate/autoconfigure:
list(
packages = list(
list(package = "torch"),
list(package = "transformers")
)
)
Imports:
Imports:
data.table,
memoise,
reticulate,
@@ -39,9 +32,10 @@ Suggests:
testthat (>= 3.0.0),
tictoc,
covr,
spelling
spelling,
rstudioapi
Config/testthat/edition: 3
RoxygenNote: 7.2.3
RoxygenNote: 7.3.1
Roxygen: list(markdown = TRUE)
Depends:
R (>= 4.1.0)
1 change: 1 addition & 0 deletions NAMESPACE
@@ -8,6 +8,7 @@ export(causal_lp_mats)
export(causal_next_tokens_tbl)
export(causal_preload)
export(causal_tokens_lp_tbl)
export(install_py_pangoling)
export(masked_config)
export(masked_lp)
export(masked_preload)
141 changes: 127 additions & 14 deletions R/utils.R
@@ -1,23 +1,133 @@
#' Install the Python packages needed for `pangoling`
#'
#' @description
#' `install_py_pangoling()` installs the Python packages needed to use `pangoling` from R,
#' relying on the `reticulate` package to manage the Python environment. It supports several installation methods,
#' environment names, and Python versions.
#'
#' @usage
#' install_py_pangoling(method = c("auto", "virtualenv", "conda"),
#' conda = "auto",
#' version = "default",
#' envname = "r-pangoling",
#' restart_session = TRUE,
#' conda_python_version = NULL,
#' ...,
#' pip_ignore_installed = FALSE,
#' new_env = identical(envname, "r-pangoling"),
#' python_version = NULL)
#'
#' @param method A character vector specifying the environment management method.
#' Options are 'auto', 'virtualenv', and 'conda'. Default is 'auto'.
#' @param conda Specifies the conda binary to use. Default is 'auto'.
#' @param version The Python version to use. Default is 'default', automatically selected.
#' @param envname Name of the virtual environment. Default is 'r-pangoling'.
#' @param restart_session Logical, whether to restart the R session after installation.
#' Default is TRUE.
#' @param conda_python_version Python version for conda environments.
#' @param ... Additional arguments passed to `reticulate::py_install`.
#' @param pip_ignore_installed Logical, whether to ignore already installed packages.
#' Default is FALSE.
#' @param new_env Logical, whether to create a new environment, removing any existing environment named `envname` first.
#' Default is `TRUE` only when `envname` is 'r-pangoling', and `FALSE` otherwise.
#' @param python_version Specifies the Python version for the environment.
#'
#' @details
#' This function automatically selects the appropriate method for environment management and Python installation,
#' with a focus on virtual and conda environments. It ensures flexibility in dependency management and Python version control.
#' If a new environment is created, existing environments with the same name are removed.
#'
#' @return
#' The function returns `NULL` invisibly, but outputs a message on successful installation.
#' @export
install_py_pangoling <- function(method = c("auto", "virtualenv", "conda"),
conda = "auto",
version = "default",
envname = "r-pangoling",
restart_session = TRUE,
conda_python_version = NULL,
...,
pip_ignore_installed = FALSE,
new_env = identical(envname, "r-pangoling"),
python_version = NULL
){

method <- match.arg(method)

python_version <- python_version %||% conda_python_version
if(method %in% c("auto", "virtualenv") &&
is.null(python_version)) {

# virtualenv_starter() picks the most recent version available, but older
# versions of the required Python packages (transformers, torch) may not work
# with the latest Python release. In general, we're better off picking the
# oldest available Python version that satisfies the ">=3.9" constraint.

available <- reticulate::virtualenv_starter(version = ">=3.9", all = TRUE)
# pick the smallest minor version, ignoring patchlevel
if(nrow(available))
python_version <- min(available$version[, 1:2])
}

if (isTRUE(new_env)) {

if (method %in% c("auto", "virtualenv") &&
reticulate::virtualenv_exists(envname))
reticulate::virtualenv_remove(envname = envname, confirm = FALSE)

if (method %in% c("auto", "conda")) {
if (!is.null(tryCatch(reticulate::conda_python(envname, conda = conda),
error = function(e) NULL)))
reticulate::conda_remove(envname, conda = conda)
}

}
packages <- c("transformers", "torch")
py_install_args <- list(
packages = packages,
envname = envname,
method = method,
conda = conda,
python_version = python_version,
pip = TRUE,
pip_ignore_installed = pip_ignore_installed,
...
)

do.call(reticulate::py_install, py_install_args)
cat("\nInstallation complete.\n\n")

if (restart_session &&
requireNamespace("rstudioapi", quietly = TRUE) &&
rstudioapi::hasFun("restartSession"))
rstudioapi::restartSession()

invisible(NULL)

}
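
The version selection inside `install_py_pangoling()` can be sketched in isolation. The vector below is a hypothetical stand-in for the `version` column that `reticulate::virtualenv_starter(all = TRUE)` is assumed to return (a `numeric_version` object, as the indexing in the function suggests). Nothing here touches Python.

```r
# Hypothetical versions; in the real function these come from
# reticulate::virtualenv_starter(version = ">=3.9", all = TRUE)$version.
versions <- numeric_version(c("3.12.3", "3.9.18", "3.10.12"))

# Keep only major.minor, ignoring the patch level.
minor_versions <- versions[, 1:2]

# numeric_version compares component by component, so 3.9 sorts before 3.10.
oldest <- min(minor_versions)
print(oldest)  # 3.9

# A plain string comparison would pick the wrong version.
string_min <- min(c("3.12", "3.9", "3.10"))
print(string_min)  # "3.10"
```

Comparing `numeric_version` objects rather than strings is what makes "pick the smallest minor version" behave as intended once Python 3.10 and later are among the candidates.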


#' @noRd
message_verbose <- function(...) {
if (options()$pangoling.verbose > 0) message(...)
}


#' @noRd
stop2 <- function(...) {
stop(..., call. = FALSE)
}

#' #' Replacement of str_match
#' #' @noRd
#' chr_match <- function(string, pattern) {
#' matches <- regexec(pattern = pattern, text = string)
#' list_matches <- lapply(
#' regmatches(x = string, m = matches),
#' function(x) if (length(x) == 0) NA else x
#' )
#' do.call("rbind", list_matches)
#' }
# #' Replacement of str_match
# #' @noRd
# chr_match <- function(string, pattern) {
# matches <- regexec(pattern = pattern, text = string)
# list_matches <- lapply(
# regmatches(x = string, m = matches),
# function(x) if (length(x) == 0) NA else x
# )
# do.call("rbind", list_matches)
# }


#' Replacement of str_detect
@@ -27,7 +137,10 @@ chr_detect <- function(string, pattern, ignore.case = FALSE) {
}


#' #' @noRd
#' message_debug <- function(...) {
#' if (options()$pangoling.verbose > 1) message(...)
#' }
#
# message_debug <- function(...) {
# if (options()$pangoling.verbose > 1) message(...)
# }

#' @noRd
"%||%" <- function(x, y) if (is.null(x)) y else x
9 changes: 3 additions & 6 deletions R/zzz.R
@@ -7,17 +7,16 @@ torch <- NULL
#' @noRd
.onLoad <- function(libname, pkgname) {

# This will instruct reticulate to immediately try to configure the
# active Python environment, installing any required Python packages
# as necessary.
reticulate::configure_environment(pkgname)
reticulate::use_virtualenv("r-pangoling", required = FALSE)

# use superassignment to update global reference
transformers <<- reticulate::import("transformers",
delay_load = TRUE,
convert = FALSE
)
torch <<- reticulate::import("torch", delay_load = TRUE, convert = FALSE)
# TODO message or something if it's not installed
# ask about the env
op <- options()
op.pangoling <- list(
pangoling.debug = FALSE,
@@ -47,5 +46,3 @@ torch <- NULL
"\nAn introduction to the package can be found in https://bruno.nicenboim.me/pangoling/articles/\n Notice that pretrained models and tokenizers are downloaded from https://huggingface.co/ the first time they are used.\n For changing the cache folder use:\n
set_cache_folder(my_new_path)")
}


6 changes: 6 additions & 0 deletions README.Rmd
@@ -41,6 +41,12 @@ There is still no released version of `pangoling`. The package is in the ** earl
remotes::install_github("bnicenboim/pangoling")
```

The `install_py_pangoling()` function installs the Python packages that pangoling needs, using the `reticulate` package to manage the Python environment. This needs to be done only once.

```{r, eval = FALSE}
install_py_pangoling()
```

## Example

This is a basic example which shows you how to get log-probabilities of words in a dataset:
14 changes: 13 additions & 1 deletion README.md
@@ -50,6 +50,15 @@ changes. To install the latest version from github use:
remotes::install_github("bnicenboim/pangoling")
```

The `install_py_pangoling()` function installs the Python packages that
pangoling needs, using the `reticulate` package to manage the Python
environment. This needs to be done only once.

``` r
install_py_pangoling()
```
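
A call with explicit arguments might look like the sketch below. The argument names come from the function signature in `R/utils.R`; the particular values are only an illustration. The call is guarded so that it runs only in an interactive session with `pangoling` installed, since it downloads Python packages.

```r
# Illustrative argument values; the names match the function signature.
install_args <- list(
  method = "virtualenv",      # skip the "auto" selection
  envname = "r-pangoling",    # the environment that .onLoad() looks for
  restart_session = FALSE     # do not restart RStudio afterwards
)

# Only run the installation interactively and when pangoling is installed.
if (interactive() && requireNamespace("pangoling", quietly = TRUE)) {
  do.call(pangoling::install_py_pangoling, install_args)
}
```

With the default `envname`, `new_env` is `TRUE`, so an existing `r-pangoling` environment is removed and recreated.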

## Example

This is a basic example which shows you how to get log-probabilities of
@@ -101,6 +110,9 @@ df_sent <- df_sent |>
#> `The apple doesn't fall far from the tree.`
#> Text id: 2
#> `Don't judge a book by its cover.`
```

``` r
df_sent
#> # A tidytable: 15 × 3
#> sent_n word lp
@@ -125,7 +137,7 @@ df_sent
## How to cite

> Nicenboim B (2023). *pangoling: Access to language model predictions
> in R*. R package version 0.0.0.9008, DOI:
> in R*. R package version 0.0.0.9010, DOI:
> [10.5281/zenodo.7637526](https://zenodo.org/badge/latestdoi/497831295),
> <https://github.com/bnicenboim/pangoling>.
2 changes: 1 addition & 1 deletion man/causal_config.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/causal_next_tokens_tbl.Rd

2 changes: 1 addition & 1 deletion man/causal_preload.Rd

2 changes: 1 addition & 1 deletion man/causal_tokens_lp_tbl.Rd

Binary file removed man/figures/logo.png
Binary file not shown.
55 changes: 55 additions & 0 deletions man/install_py_pangoling.Rd

2 changes: 0 additions & 2 deletions man/pangoling-package.Rd
