How to predict detection probabilities for observations on specific distances? #189

wlangera · 2024-11-28T14:27:22Z

Hi

I want to get detection probabilities for observations on specific distances, but I only manage to predict average detection probabilities.

Is there any way to do this?
In this example you see that I get the same value for 50 m and 150 m although the detection probability clearly decreases over distance.

# Load required library
library(Distance)
#> Loading required package: mrds
#> This is mrds 3.0.0
#> Built: R 4.4.1; ; 2024-11-04 02:50:10 UTC; windows
#> 
#> Attaching package: 'Distance'
#> The following object is masked from 'package:mrds':
#> 
#>     create.bins

# Set seed for reproducibility
set.seed(123)

# Simulate dataset
n <- 200  # Number of observations
truncation_distance <- 250  # Maximum detection distance (truncation)

# Generate distances with a higher probability of being detected at shorter distances
distances <- rexp(n, rate = 0.01)  # Exponential distribution for decreasing probability
distances <- distances[distances <= truncation_distance]  # Truncate at 250 meters

# Add a categorical covariate (e.g., habitat type)
habitat <- sample(c("forest", "grassland"), length(distances), replace = TRUE)

# Combine into a data frame
data <- data.frame(
  distance = distances,
  habitat = habitat
)

# View the first few rows of the simulated data
head(data)
#>     distance   habitat
#> 1  84.345726    forest
#> 2  57.661027 grassland
#> 3 132.905487    forest
#> 4   3.157736    forest
#> 5   5.621098    forest
#> 6  31.650122    forest

# Fit a detection function using the Distance package
ds_model <- ds(
  data = data,
  truncation = truncation_distance,
  key = "hn",  # Half-normal key function
  adjustment = NULL,  # No adjustment terms
  formula = ~habitat  # Include habitat as a cov
)
#> Fitting half-normal key function
#> AIC= 1952.067
#> No survey area information supplied, only estimating detection function.

# Visualise detection curve
plot(ds_model)

# Predict detection probability for custom distance
predict(ds_model,
        newdata = data.frame(distance = 50, habitat = "forest"))
#> $fitted
#> [1] 0.4669516

predict(ds_model,
        newdata = data.frame(distance = 150, habitat = "forest"))
#> $fitted
#> [1] 0.4669516

predict(ds_model$ddf,
        newdata = data.frame(distance = 150, habitat = "forest"),
        integrate = FALSE)
#> $fitted
#> [1] 0.4669516

^{Created on 2024-11-28 with reprex v2.1.1}

erex · 2024-11-28T15:41:05Z

@wlangera This isn't actually a bug, instead it is predicting detection probability within the covered region, integrating over all distances < truncation.

When I have wanted to calculate detection probability at a given distance, in the face of covariates in the detection function (as you have), I have resorted to writing my own code to do so. Here is an example of the function, and its use:

```{r, detfnfn}
gz<-function(z,
             beta, sigintercept, sigcoef, DistWin=FALSE,
             key="HR", w=max(z)){
#this is a generic detection function that returns the probability of detecting an animal
#    z               generic distance (perpendicular) - can be scalar or vector
#    beta            shape coefficient
#    sigintercept  intercept coefficient for sigma
#    sigcoef         coefficient for specific factor level
#    DistWin         coefficients from Distance for Windows or from R
#    key             the detection function key, works for hazard rate and half normal
#    w               truncation distance, by default the max of the distances
#
#RETURNS: a probability
  
  if(key != "HN" & key != "HR") {
    stop("Argument 'key' must be either HN or HR")
  }
  if (DistWin) {
    sigma <- sigintercept + exp(sigcoef)
    exponent <- beta
  } else {
    numterms <- length(sigcoef)
    predictor <- 0
    for (i in 1:numterms) {
      predictor <- predictor + sigcoef[i]
    }
    sigma <- exp(sigintercept + predictor)
    exponent <- exp(beta)
  }
  if(key=="HR") {
    scale.dist <- z/sigma
    inside <- -(scale.dist)^(-exponent)
    gx <- 1 - exp(inside)
  } else {
    scale.dist <- z  # debatably don't scale for half normal
    inside <- -(scale.dist^2/(2*sigma^2))
    gx <- exp(inside)
  }
  return(gx)
}

Apply function to the selected `yr24_hr_2level` model

Code below plucks out the relevant estimated coefficients from our model that included species and bout as covariates. I demonstrate only for two species and the two bouts. Code then plots the derived detection function from the fitted covariate coefficients. This gives us a sense check whether we've picked up the coefficients correctly.

coefs <- yr24_hr_2level$ddf$par
xmax <- unname(yr24_hr_2level$ddf$meta.data$width)
xvals <- seq(0,xmax, length=100)
am1 <- gz(z=xvals, key="HR",beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
          sigcoef = c(coefs["speciesamerican"]))
am2 <- gz(z=xvals, key="HR",beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
          sigcoef = c(coefs["speciesamerican"], coefs["as.factor.sample.bout.2"]))
bomb1 <- gz(z=xvals, key="HR",beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
          sigcoef = c(coefs["speciesbombus"]))
bomb2 <- gz(z=xvals, key="HR",beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
          sigcoef = c(coefs["speciesbombus"], coefs["as.factor.sample.bout.2"]))
plot(xvals, am1, lwd=2, type="l",
     main="Construct species x bout detection functions manually",
     xlab="Distance", ylab="Detection probability")
lines(xvals, am2, lwd=2, col="blue")
lines(xvals, bomb1, lwd=2, col="darkgreen")
lines(xvals, bomb2, lwd=2, col="red")
legend("topright", legend=c("American1", "American2", "Bombus1", "Bombus2"),
       col=c("black","blue","darkgreen","red"), lwd=2)

Integration and conversion to detection probability

Sense check completed, we now use our gz function to "manually" compute species- and bout-specific detection probabilities for the for examples created above. The formula being applied here (see Lecture 2 of notes) is $\hat{P}_a$ is "area under curve" divided by "area of rectangle"

$$\hat{P}_a=\frac{\int^w_0 \hat{g}(x) dx}{w} $$

am1_p_int <- integrate(gz, lower=0, upper=xmax, key="HR",
                    beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
                    sigcoef = c(coefs["speciesamerican"]))$value / xmax
am2_p_int <- integrate(gz, lower=0, upper=xmax, key="HR",
                    beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
                    sigcoef = c(coefs["speciesamerican"], coefs["as.factor.sample.bout.2"]))$value / xmax
bomb1_p_int <- integrate(gz, lower=0, upper=xmax, key="HR",
                    beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
                    sigcoef = c(coefs["speciesbombus"]))$value / xmax
bomb2_p_int <- integrate(gz, lower=0, upper=xmax, key="HR",
                    beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
                    sigcoef = c(coefs["speciesbombus"], coefs["as.factor.sample.bout.2"]))$value / xmax

It's a bit of a faff, let me know if it works and makes sense for your situation.

If you're happy, I'll close the issue.

lenthomas · 2024-11-28T16:23:17Z

It occured to me that there must be code to do this already, as used in the plot.ds function. It turns out to be the detfct function in the mrds package. It takes as input a "ddfobject" list, which can be pulled out of the ds_model object:

prediction_distances <- 0:100 #Just some example distances
#index = 1 uses covariates from the first row of data, which has habitat covariate forest
g_x_forest <- detfct(prediction_distances, ds_model$ddf$ds$aux$ddfobj, index = 1)
#index = 2 uses covariates from the second row of data, which has habitat covariate grassland
g_x_grassland <- detfct(prediction_distances, ds_model$ddf$ds$aux$ddfobj, index = 2)
plot(prediction_distances, g_x_forest, ylim = c(0, 1), xlab = "distance", ylab = "g(x)", type = "l")
lines(prediction_distances, g_x_grassland, col = "red")

There may be a still more elegant method that doesn't require you to know which row contains which covariate!

I also wonder if this might not make a nice article (vignette) at some point. (Also it took me a while to work out where the ddfobj was so that could be added to the help for the detfct function. Happy to go with your guidance on how best to document this, @erex.

wlangera · 2024-11-28T17:05:37Z

@erex @lenthomas thanks for your quick replies!

I will test the code you provided.

I was not sure if I was using the wrong function or the predict.ds() function the wrong way.
It would be nice to add this option in the predict.ds() function I think.
The documentation already states that newdata "must include a column called distance".

erex · 2024-11-28T17:09:25Z

Probably would be handy to have an example such as this kicking around in our vignettes/documentation.

I'll think about it. Unfortunately both predict.ds as well as detfct live within mrds so putting a vignette in that package might get missed by users of the Distance package.

lenthomas · 2024-11-28T17:17:33Z

Hi @wlangera, good suggestion regarding extending predict.ds to allow it to predict individual detection probabilities. I'll add it as a new issue, labelling it "enhancement". I expect, however, it will be some time before we get to it!

In the meantime, we could add a bit to the documentation for predict.ds to make it clear it doesn't produce distance-specific detection probabilities and pointing to the detfct function in mrds. We could put a small example there to show users.

This doesn't preclude also having a more extensive example @erex. Regarding which package to put the vignette in -- how about (in any case) we add a web-only article to both the Distance and mrds github sites called something like "Other articles" and in it simply point out that other examples are available at [fill in the address of the other package here]. So that way users of our web site who use the drop-down to get a list of examples will see the "Other articles" link and can go browse the other site. Just a thought.

erex · 2024-11-28T17:21:11Z

capturing the elements of this issue and placing them into a nascent .qmd vignette for further development

erex · 2024-11-28T17:50:46Z

nascent-vignette-about-detfct.pdf

erex · 2024-11-29T13:23:04Z

Somewhat better; half-normal for detection probability; cleaner way to identify index argument for detfct
nascent-vignette-about-detfct.pdf

erex · 2024-11-30T10:45:39Z

This version interrogates the design matrix for proper index to pass to detfct. This should remedy mismatch of index when there is truncation and interrogation of the original data.
nascent-vignette-about-detfct.pdf

erex · 2024-12-02T08:09:54Z

Newest version: using dsims to simulate data with actual differing detection functions for four covariate levels

library(dsims)
study.region <- make.region()
density <- make.density(region = study.region)
covariate.list <- list()
# The population equal proportion in four age/sex classes
covariate.list$agesex <- list(data.frame(level = c("youngmale", "youngfemale", "adultmale", "adultfemale"), 
                                          prob = rep(0.25,4)))
pop.desc <- make.population.description(region = study.region, 
                                        density = density, 
                                        N = 400,
                                        covariates = covariate.list,
                                        fixed.N = TRUE)
cov.params <- list()
cov.params$agesex = data.frame(level = c("youngmale", "youngfemale", "adultmale", "adultfemale"), 
                            param = c(0.2, 0.7, 1.0, 1.3))
detect.cov <- make.detectability(key.function = "hn" ,
                                 scale.param = 10,
                                 cov.param = cov.params, 
                                 truncation = 50)
plot(detect.cov, pop.desc, lwd=3)
design <- make.design(region = study.region,
                      transect.type = "line",
                      design = "systematic",
                      spacing = 100,
                      truncation = 50)
analysis.cov <- make.ds.analysis(dfmodel = list(~agesex),
                                 key = "hn",
                                 truncation = 50)
my.simulation <- make.simulation(reps = 999, 
                                 design = design,
                                 population.description = pop.desc,
                                 detectability = detect.cov,
                                 ds.analysis = analysis.cov)
my.survey <- run.survey(my.simulation)
plot(my.survey, study.region)
out <- analyse.data(analysis.cov, my.survey)

# simulation completed with a dsmodel object in "out"

ddfobj <- out$ddf$ds$aux$ddfobj
ym.indx <- which(ddfobj$xmat$agesex=="youngmale")[1]
yf.indx <- which(ddfobj$xmat$agesex=="youngfemale")[1]
am.indx <- which(ddfobj$xmat$agesex=="adultmale")[1]
af.indx <- which(ddfobj$xmat$agesex=="adultfemale")[1]

prediction_distances <- seq(0, 50) 
g_x_ym <- mrds::detfct(prediction_distances, ddfobj, index = ym.indx)
g_x_yf <- mrds::detfct(prediction_distances, ddfobj, index = yf.indx)
g_x_am <- mrds::detfct(prediction_distances, ddfobj, index = am.indx)
g_x_af <- mrds::detfct(prediction_distances, ddfobj, index = af.indx)

plot(out, pl.col="white", cex=0.7)
lines(prediction_distances, g_x_ym, col="darkgreen", lwd=2)
lines(prediction_distances, g_x_yf, col = "green", lwd=2)
lines(prediction_distances, g_x_am, col = "darkblue", lwd=2)
lines(prediction_distances, g_x_af, col = "brown", lwd=2)
legend("topright", title="Age/sex class", lwd=2,
       legend=c("young male", "young female", "adult male", "adult female"),
       col=c("darkgreen", "green", "darkblue", "brown"))

mydist <- 35
Young_male <- round(g_x_ym[prediction_distances==mydist],3)
Young_female <- round(g_x_yf[prediction_distances==mydist],3)
Adult_male <- round(g_x_am[prediction_distances==mydist],3)
Adult_female <- round(g_x_af[prediction_distances==mydist],3)
abline(v=mydist, lwd=3)
segments(0, Young_male, mydist, Young_male, col="darkgreen")
segments(0, Young_female, mydist, Young_female, col="green")
segments(0, Adult_male, mydist, Adult_male, col="darkblue")
segments(0, Adult_female, mydist, Adult_female, col="brown")
text(0, Young_male, Young_male, adj=c(-.1, -.3), cex=0.7)
text(0, Young_female, Young_female, adj=c(-.1, -.3), cex=0.7)
text(0, Adult_male, Adult_male, adj=c(-.1, -.3), cex=0.7)
text(0, Adult_female, Adult_female, adj=c(-.1, -.3), cex=0.7)

wlangera · 2024-12-02T13:45:04Z

Thanks to your quick replies I was able to write a simple function that produces what I want :)

predict_det_probs <- function(ds_model, ...) {
  # Extract data
  data <- ds_model$ddf$ds$aux$ddfobj$xmat

  # Extract ddf object
  ddf_object <- ds_model$ddf$ds$aux$ddfobj

  # Calculate observation specific detection probabilities
  det_probs <- sapply(seq_len(nrow(data)), function(i) {
    mrds::detfct(
      distance = data$distance[i],
      ddfobj = ddf_object,
      index = i,
      ...)
  })

  # Add detection probabilities to dataframe
  data$det_prob <- det_probs

  return(data)
}

Example:

# Load required library
library(Distance)

# Set seed for reproducibility
set.seed(123)

# Simulate dataset
n <- 200  # Number of observations
truncation_distance <- 250  # Maximum detection distance (truncation)

# Generate distances with a higher probability of being detected at shorter distances
distances <- rexp(n, rate = 0.01)  # Exponential distribution for decreasing probability
distances <- distances[distances <= truncation_distance]  # Truncate at 250 meters

# Add a categorical covariate (e.g., habitat type)
habitat <- sample(c("forest", "grassland"), length(distances), replace = TRUE)

# Combine into a data frame
data <- data.frame(
  distance = distances,
  habitat = habitat
)

# View the first few rows of the simulated data
head(data)
#>     distance   habitat
#> 1  84.345726    forest
#> 2  57.661027 grassland
#> 3 132.905487    forest
#> 4   3.157736    forest
#> 5   5.621098    forest
#> 6  31.650122    forest

# Fit a detection function using the Distance package
ds_model <- ds(
  data = data,
  truncation = truncation_distance,
  key = "hn",  # Half-normal key function
  adjustment = NULL,  # No adjustment terms
  formula = ~habitat  # Include habitat as a cov
)
#> Fitting half-normal key function
#> AIC= 1952.067
#> No survey area information supplied, only estimating detection function.

# Predict detection probability for custom distances
predict_df <- predict_det_probs(ds_model)
head(predict_df)
#>   object   habitat   distance binned observer detected  det_prob
#> 1      1    forest  84.345726  FALSE        1        1 0.6678533
#> 2      2 grassland  57.661027  FALSE        1        1 0.8852882
#> 3      3    forest 132.905487  FALSE        1        1 0.3670289
#> 4      4    forest   3.157736  FALSE        1        1 0.9994344
#> 5      5    forest   5.621098  FALSE        1        1 0.9982087
#> 6      6    forest  31.650122  FALSE        1        1 0.9447434

^{Created on 2024-12-02 with reprex v2.1.1}

wlangera · 2024-12-02T13:51:42Z

Of course this only returns predicted values and no measure of uncertainty ...

lenthomas added enhancement question labels Dec 3, 2024

lenthomas assigned lenthomas and erex Dec 3, 2024

erex mentioned this issue Dec 6, 2024

Distance new vignette for predicting detection probabilities #190

Open

LHMarshall added documentation needs further documentation To discuss Requires discussion labels Jan 14, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to predict detection probabilities for observations on specific distances? #189

How to predict detection probabilities for observations on specific distances? #189

wlangera commented Nov 28, 2024

erex commented Nov 28, 2024

lenthomas commented Nov 28, 2024

wlangera commented Nov 28, 2024

erex commented Nov 28, 2024

lenthomas commented Nov 28, 2024

erex commented Nov 28, 2024

erex commented Nov 28, 2024

erex commented Nov 29, 2024

erex commented Nov 30, 2024

erex commented Dec 2, 2024

wlangera commented Dec 2, 2024

wlangera commented Dec 2, 2024

How to predict detection probabilities for observations on specific distances? #189

How to predict detection probabilities for observations on specific distances? #189

Comments

wlangera commented Nov 28, 2024

erex commented Nov 28, 2024

Apply function to the selected yr24_hr_2level model

Integration and conversion to detection probability

lenthomas commented Nov 28, 2024

wlangera commented Nov 28, 2024

erex commented Nov 28, 2024

lenthomas commented Nov 28, 2024

erex commented Nov 28, 2024

erex commented Nov 28, 2024

erex commented Nov 29, 2024

erex commented Nov 30, 2024

erex commented Dec 2, 2024

wlangera commented Dec 2, 2024

wlangera commented Dec 2, 2024

Apply function to the selected `yr24_hr_2level` model