Skip to content

Commit

Permalink
Tighted DelayedMask behavior to align with missing value policies.
Browse files Browse the repository at this point in the history
The major change is that, if 'is.na()' is TRUE for a placeholder, we will mask
all values for which 'is.na()' is TRUE; this includes NaNs that were previously
not masked. This aligns with the takane policy of not trusting NaN payloads.

Another change is that, if we see an integer dataset with a NA and the
placeholder is not NA, we throw an error because the dataset's NA was likely an
attempt at -2^31, which we just can't represent as a regular integer in R.
  • Loading branch information
LTLA committed Nov 27, 2023
1 parent 5ca5c17 commit f74651f
Show file tree
Hide file tree
Showing 3 changed files with 56 additions and 29 deletions.
43 changes: 26 additions & 17 deletions R/DelayedMask.R
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,15 @@
#' @param x An existing \pkg{DelayedArray} seed.
#' @param placeholder Placeholder value to replace with \code{NA}.
#' This should be of the same type as \code{\link{type}(x)}.
#' @param force Whether to forcibly create a DelayedMask if \code{placeholder} is already \code{NA}.
#'
#' @details
#' If \code{\link{is.na}(placeholder)} is true for double-precision \code{x}, masking is performed for all values of \code{x} where \code{is.na} is true.
#' This includes both NaNs and NAs; no attempt is made to distinguish between the NaN payloads.
#'
#' Currently, an error is raised for any integer \code{x} that produces non-missing values of -2^31 without a placeholder of \code{NA_integer_}.
#' This is because R cannot distinguish the integer -2^31 from an integer-type NA.
#'
#' @return A DelayedMask object, to be wrapped in a \code{\link{DelayedArray}}.
#' If \code{force=FALSE} and \code{placeholder} is already \code{NA}, \code{x} is directly returned.
#'
#' @author Aaron Lun
#'
Expand All @@ -33,17 +38,7 @@
#' @export
#' @import methods
#' @importClassesFrom DelayedArray DelayedArray
DelayedMask <- function(x, placeholder, force=FALSE) {
if (!force && is.na(placeholder)) {
if (type(x) == "double") {
if (!is.nan(placeholder)) {
return(x)
}
} else {
return(x)
}
}

DelayedMask <- function(x, placeholder) {
if (is(x, "DelayedArray")) {
x <- x@seed
}
Expand Down Expand Up @@ -99,12 +94,26 @@ setMethod("OLD_extract_sparse_array", "DelayedMask", function(x, index) {

.replace_missing <- function(vec, placeholder) {
if (is.na(placeholder)) {
if (is.nan(placeholder)) {
vec[is.nan(vec)] <- NA
if (anyNA(vec)) {
if (is.double(vec)) {
vec[is.na(vec)] <- NA # convert NaNs as well.
} else {
# no-op, everything is already NA.
}
}
} else {
# Using which() here to avoid problems with existing NAs.
vec[which(vec == placeholder)] <- NA
if (anyNA(vec)) {
if (is.integer(vec)) {
# We don't want to incorrectly propagate -2^31 as NAs, so we
# just throw an error here. Better to fail than to silently
# modify the meaning of the data.
stop("integer arrays containing both -2^31 and NA are not yet supported")
} else {
vec[which(vec == placeholder)] <- NA # "which()" avoids problems with existing NaNs.
}
} else {
vec[vec == placeholder] <- NA
}
}
vec
}
Expand Down
12 changes: 8 additions & 4 deletions man/DelayedMask.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

30 changes: 22 additions & 8 deletions tests/testthat/test-DelayedMask.R
Original file line number Diff line number Diff line change
Expand Up @@ -52,9 +52,7 @@ test_that("sparse tests for DelayedMask", {

test_that("DelayedMask works with special values", {
original <- matrix(rpois(400, lambda=2), ncol=10)
expect_identical(DelayedMask(original, NA), original)

masked <- DelayedMask(original, NA, force=TRUE)
masked <- DelayedMask(original, NA)
expect_identical(original, as.matrix(masked))

# Also works with NaN values, whether the placeholder is NaN or not.
Expand All @@ -70,9 +68,25 @@ test_that("DelayedMask works with special values", {
ref[is.nan(ref)] <- NA
expect_identical(ref, as.matrix(masked))

# Same with a different type.
copy <- original
storage.mode(copy) <- "integer"
expect_identical(DelayedMask(copy, NA), copy)
expect_identical(DelayedMask(copy, NaN), copy)
# If the placeholder is NA, both NaNs and NAs are considered to be missing,
# as we don't rely on the payload (i.e., same logic as is.na()).
masked <- DelayedMask(original, NA)
ref <- as.matrix(original)
ref[is.nan(ref)] <- NA
expect_identical(ref, as.matrix(masked))

# No problem if the dataset is integer.
as.int <- original
storage.mode(as.int) <- "integer"
masked <- DelayedMask(as.int, NA_integer_)
expect_identical(as.int, as.matrix(masked))
})

test_that("DelayedMask fails for integers with non-NA placeholders", {
original <- matrix(NA_integer_, 10, 10)
masked <- DelayedMask(original, NA)
expect_identical(original, as.matrix(masked))

masked <- DelayedMask(original, 10L)
expect_error(as.matrix(masked), "not yet supported")
})

0 comments on commit f74651f

Please sign in to comment.