Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to predict detection probabilities for observations on specific distances? #189

Open
wlangera opened this issue Nov 28, 2024 · 12 comments
Assignees
Labels
documentation needs further documentation enhancement question To discuss Requires discussion

Comments

@wlangera
Copy link

Hi

I want to get detection probabilities for observations on specific distances, but I only manage to predict average detection probabilities.

Is there any way to do this?
In this example you see that I get the same value for 50 m and 150 m although the detection probability clearly decreases over distance.

# Load required library
library(Distance)
#> Loading required package: mrds
#> This is mrds 3.0.0
#> Built: R 4.4.1; ; 2024-11-04 02:50:10 UTC; windows
#> 
#> Attaching package: 'Distance'
#> The following object is masked from 'package:mrds':
#> 
#>     create.bins

# Set seed for reproducibility
set.seed(123)

# Simulate dataset
n <- 200  # Number of observations
truncation_distance <- 250  # Maximum detection distance (truncation)

# Generate distances with a higher probability of being detected at shorter distances
distances <- rexp(n, rate = 0.01)  # Exponential distribution for decreasing probability
distances <- distances[distances <= truncation_distance]  # Truncate at 250 meters

# Add a categorical covariate (e.g., habitat type)
habitat <- sample(c("forest", "grassland"), length(distances), replace = TRUE)

# Combine into a data frame
data <- data.frame(
  distance = distances,
  habitat = habitat
)

# View the first few rows of the simulated data
head(data)
#>     distance   habitat
#> 1  84.345726    forest
#> 2  57.661027 grassland
#> 3 132.905487    forest
#> 4   3.157736    forest
#> 5   5.621098    forest
#> 6  31.650122    forest

# Fit a detection function using the Distance package
ds_model <- ds(
  data = data,
  truncation = truncation_distance,
  key = "hn",  # Half-normal key function
  adjustment = NULL,  # No adjustment terms
  formula = ~habitat  # Include habitat as a cov
)
#> Fitting half-normal key function
#> AIC= 1952.067
#> No survey area information supplied, only estimating detection function.

# Visualise detection curve
plot(ds_model)

# Predict detection probability for custom distance
predict(ds_model,
        newdata = data.frame(distance = 50, habitat = "forest"))
#> $fitted
#> [1] 0.4669516

predict(ds_model,
        newdata = data.frame(distance = 150, habitat = "forest"))
#> $fitted
#> [1] 0.4669516

predict(ds_model$ddf,
        newdata = data.frame(distance = 150, habitat = "forest"),
        integrate = FALSE)
#> $fitted
#> [1] 0.4669516

Created on 2024-11-28 with reprex v2.1.1

@erex
Copy link
Member

erex commented Nov 28, 2024

@wlangera This isn't actually a bug, instead it is predicting detection probability within the covered region, integrating over all distances < truncation.

When I have wanted to calculate detection probability at a given distance, in the face of covariates in the detection function (as you have), I have resorted to writing my own code to do so. Here is an example of the function, and its use:

```{r, detfnfn}
gz<-function(z,
             beta, sigintercept, sigcoef, DistWin=FALSE,
             key="HR", w=max(z)){
#this is a generic detection function that returns the probability of detecting an animal
#    z               generic distance (perpendicular) - can be scalar or vector
#    beta            shape coefficient
#    sigintercept  intercept coefficient for sigma
#    sigcoef         coefficient for specific factor level
#    DistWin         coefficients from Distance for Windows or from R
#    key             the detection function key, works for hazard rate and half normal
#    w               truncation distance, by default the max of the distances
#
#RETURNS: a probability
  
  if(key != "HN" & key != "HR") {
    stop("Argument 'key' must be either HN or HR")
  }
  if (DistWin) {
    sigma <- sigintercept + exp(sigcoef)
    exponent <- beta
  } else {
    numterms <- length(sigcoef)
    predictor <- 0
    for (i in 1:numterms) {
      predictor <- predictor + sigcoef[i]
    }
    sigma <- exp(sigintercept + predictor)
    exponent <- exp(beta)
  }
  if(key=="HR") {
    scale.dist <- z/sigma
    inside <- -(scale.dist)^(-exponent)
    gx <- 1 - exp(inside)
  } else {
    scale.dist <- z  # debatably don't scale for half normal
    inside <- -(scale.dist^2/(2*sigma^2))
    gx <- exp(inside)
  }
  return(gx)
}

Apply function to the selected yr24_hr_2level model

Code below plucks out the relevant estimated coefficients from our model that included species and bout as covariates. I demonstrate only for two species and the two bouts. Code then plots the derived detection function from the fitted covariate coefficients. This gives us a sense check whether we've picked up the coefficients correctly.

coefs <- yr24_hr_2level$ddf$par
xmax <- unname(yr24_hr_2level$ddf$meta.data$width)
xvals <- seq(0,xmax, length=100)
am1 <- gz(z=xvals, key="HR",beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
          sigcoef = c(coefs["speciesamerican"]))
am2 <- gz(z=xvals, key="HR",beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
          sigcoef = c(coefs["speciesamerican"], coefs["as.factor.sample.bout.2"]))
bomb1 <- gz(z=xvals, key="HR",beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
          sigcoef = c(coefs["speciesbombus"]))
bomb2 <- gz(z=xvals, key="HR",beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
          sigcoef = c(coefs["speciesbombus"], coefs["as.factor.sample.bout.2"]))
plot(xvals, am1, lwd=2, type="l",
     main="Construct species x bout detection functions manually",
     xlab="Distance", ylab="Detection probability")
lines(xvals, am2, lwd=2, col="blue")
lines(xvals, bomb1, lwd=2, col="darkgreen")
lines(xvals, bomb2, lwd=2, col="red")
legend("topright", legend=c("American1", "American2", "Bombus1", "Bombus2"),
       col=c("black","blue","darkgreen","red"), lwd=2)

Integration and conversion to detection probability

Sense check completed, we now use our gz function to "manually" compute species- and bout-specific detection probabilities for the for examples created above. The formula being applied here (see Lecture 2 of notes) is $\hat{P}_a$ is "area under curve" divided by "area of rectangle"

$$\hat{P}_a=\frac{\int^w_0 \hat{g}(x) dx}{w} $$

am1_p_int <- integrate(gz, lower=0, upper=xmax, key="HR",
                    beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
                    sigcoef = c(coefs["speciesamerican"]))$value / xmax
am2_p_int <- integrate(gz, lower=0, upper=xmax, key="HR",
                    beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
                    sigcoef = c(coefs["speciesamerican"], coefs["as.factor.sample.bout.2"]))$value / xmax
bomb1_p_int <- integrate(gz, lower=0, upper=xmax, key="HR",
                    beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
                    sigcoef = c(coefs["speciesbombus"]))$value / xmax
bomb2_p_int <- integrate(gz, lower=0, upper=xmax, key="HR",
                    beta=coefs["V1"], sigintercept = coefs["X.Intercept."], 
                    sigcoef = c(coefs["speciesbombus"], coefs["as.factor.sample.bout.2"]))$value / xmax

It's a bit of a faff, let me know if it works and makes sense for your situation.

If you're happy, I'll close the issue.

@lenthomas
Copy link
Member

It occured to me that there must be code to do this already, as used in the plot.ds function. It turns out to be the detfct function in the mrds package. It takes as input a "ddfobject" list, which can be pulled out of the ds_model object:

prediction_distances <- 0:100 #Just some example distances
#index = 1 uses covariates from the first row of data, which has habitat covariate forest
g_x_forest <- detfct(prediction_distances, ds_model$ddf$ds$aux$ddfobj, index = 1)
#index = 2 uses covariates from the second row of data, which has habitat covariate grassland
g_x_grassland <- detfct(prediction_distances, ds_model$ddf$ds$aux$ddfobj, index = 2)
plot(prediction_distances, g_x_forest, ylim = c(0, 1), xlab = "distance", ylab = "g(x)", type = "l")
lines(prediction_distances, g_x_grassland, col = "red")

There may be a still more elegant method that doesn't require you to know which row contains which covariate!

I also wonder if this might not make a nice article (vignette) at some point. (Also it took me a while to work out where the ddfobj was so that could be added to the help for the detfct function. Happy to go with your guidance on how best to document this, @erex.

@wlangera
Copy link
Author

@erex @lenthomas thanks for your quick replies!

I will test the code you provided.

I was not sure if I was using the wrong function or the predict.ds() function the wrong way.
It would be nice to add this option in the predict.ds() function I think.
The documentation already states that newdata "must include a column called distance".

@erex
Copy link
Member

erex commented Nov 28, 2024

Probably would be handy to have an example such as this kicking around in our vignettes/documentation.

I'll think about it. Unfortunately both predict.ds as well as detfct live within mrds so putting a vignette in that package might get missed by users of the Distance package.

@lenthomas
Copy link
Member

Hi @wlangera, good suggestion regarding extending predict.ds to allow it to predict individual detection probabilities. I'll add it as a new issue, labelling it "enhancement". I expect, however, it will be some time before we get to it!

In the meantime, we could add a bit to the documentation for predict.ds to make it clear it doesn't produce distance-specific detection probabilities and pointing to the detfct function in mrds. We could put a small example there to show users.

This doesn't preclude also having a more extensive example @erex. Regarding which package to put the vignette in -- how about (in any case) we add a web-only article to both the Distance and mrds github sites called something like "Other articles" and in it simply point out that other examples are available at [fill in the address of the other package here]. So that way users of our web site who use the drop-down to get a list of examples will see the "Other articles" link and can go browse the other site. Just a thought.

@erex
Copy link
Member

erex commented Nov 28, 2024

capturing the elements of this issue and placing them into a nascent .qmd vignette for further development

@erex
Copy link
Member

erex commented Nov 28, 2024

@erex
Copy link
Member

erex commented Nov 29, 2024

Somewhat better; half-normal for detection probability; cleaner way to identify index argument for detfct
nascent-vignette-about-detfct.pdf

@erex
Copy link
Member

erex commented Nov 30, 2024

This version interrogates the design matrix for proper index to pass to detfct. This should remedy mismatch of index when there is truncation and interrogation of the original data.
nascent-vignette-about-detfct.pdf

@erex
Copy link
Member

erex commented Dec 2, 2024

Newest version: using dsims to simulate data with actual differing detection functions for four covariate levels

library(dsims)
study.region <- make.region()
density <- make.density(region = study.region)
covariate.list <- list()
# The population equal proportion in four age/sex classes
covariate.list$agesex <- list(data.frame(level = c("youngmale", "youngfemale", "adultmale", "adultfemale"), 
                                          prob = rep(0.25,4)))
pop.desc <- make.population.description(region = study.region, 
                                        density = density, 
                                        N = 400,
                                        covariates = covariate.list,
                                        fixed.N = TRUE)
cov.params <- list()
cov.params$agesex = data.frame(level = c("youngmale", "youngfemale", "adultmale", "adultfemale"), 
                            param = c(0.2, 0.7, 1.0, 1.3))
detect.cov <- make.detectability(key.function = "hn" ,
                                 scale.param = 10,
                                 cov.param = cov.params, 
                                 truncation = 50)
plot(detect.cov, pop.desc, lwd=3)
design <- make.design(region = study.region,
                      transect.type = "line",
                      design = "systematic",
                      spacing = 100,
                      truncation = 50)
analysis.cov <- make.ds.analysis(dfmodel = list(~agesex),
                                 key = "hn",
                                 truncation = 50)
my.simulation <- make.simulation(reps = 999, 
                                 design = design,
                                 population.description = pop.desc,
                                 detectability = detect.cov,
                                 ds.analysis = analysis.cov)
my.survey <- run.survey(my.simulation)
plot(my.survey, study.region)
out <- analyse.data(analysis.cov, my.survey)

# simulation completed with a dsmodel object in "out"

ddfobj <- out$ddf$ds$aux$ddfobj
ym.indx <- which(ddfobj$xmat$agesex=="youngmale")[1]
yf.indx <- which(ddfobj$xmat$agesex=="youngfemale")[1]
am.indx <- which(ddfobj$xmat$agesex=="adultmale")[1]
af.indx <- which(ddfobj$xmat$agesex=="adultfemale")[1]

prediction_distances <- seq(0, 50) 
g_x_ym <- mrds::detfct(prediction_distances, ddfobj, index = ym.indx)
g_x_yf <- mrds::detfct(prediction_distances, ddfobj, index = yf.indx)
g_x_am <- mrds::detfct(prediction_distances, ddfobj, index = am.indx)
g_x_af <- mrds::detfct(prediction_distances, ddfobj, index = af.indx)

plot(out, pl.col="white", cex=0.7)
lines(prediction_distances, g_x_ym, col="darkgreen", lwd=2)
lines(prediction_distances, g_x_yf, col = "green", lwd=2)
lines(prediction_distances, g_x_am, col = "darkblue", lwd=2)
lines(prediction_distances, g_x_af, col = "brown", lwd=2)
legend("topright", title="Age/sex class", lwd=2,
       legend=c("young male", "young female", "adult male", "adult female"),
       col=c("darkgreen", "green", "darkblue", "brown"))

mydist <- 35
Young_male <- round(g_x_ym[prediction_distances==mydist],3)
Young_female <- round(g_x_yf[prediction_distances==mydist],3)
Adult_male <- round(g_x_am[prediction_distances==mydist],3)
Adult_female <- round(g_x_af[prediction_distances==mydist],3)
abline(v=mydist, lwd=3)
segments(0, Young_male, mydist, Young_male, col="darkgreen")
segments(0, Young_female, mydist, Young_female, col="green")
segments(0, Adult_male, mydist, Adult_male, col="darkblue")
segments(0, Adult_female, mydist, Adult_female, col="brown")
text(0, Young_male, Young_male, adj=c(-.1, -.3), cex=0.7)
text(0, Young_female, Young_female, adj=c(-.1, -.3), cex=0.7)
text(0, Adult_male, Adult_male, adj=c(-.1, -.3), cex=0.7)
text(0, Adult_female, Adult_female, adj=c(-.1, -.3), cex=0.7)

@wlangera
Copy link
Author

wlangera commented Dec 2, 2024

Thanks to your quick replies I was able to write a simple function that produces what I want :)

predict_det_probs <- function(ds_model, ...) {
  # Extract data
  data <- ds_model$ddf$ds$aux$ddfobj$xmat

  # Extract ddf object
  ddf_object <- ds_model$ddf$ds$aux$ddfobj

  # Calculate observation specific detection probabilities
  det_probs <- sapply(seq_len(nrow(data)), function(i) {
    mrds::detfct(
      distance = data$distance[i],
      ddfobj = ddf_object,
      index = i,
      ...)
  })

  # Add detection probabilities to dataframe
  data$det_prob <- det_probs

  return(data)
}

Example:

# Load required library
library(Distance)

# Set seed for reproducibility
set.seed(123)

# Simulate dataset
n <- 200  # Number of observations
truncation_distance <- 250  # Maximum detection distance (truncation)

# Generate distances with a higher probability of being detected at shorter distances
distances <- rexp(n, rate = 0.01)  # Exponential distribution for decreasing probability
distances <- distances[distances <= truncation_distance]  # Truncate at 250 meters

# Add a categorical covariate (e.g., habitat type)
habitat <- sample(c("forest", "grassland"), length(distances), replace = TRUE)

# Combine into a data frame
data <- data.frame(
  distance = distances,
  habitat = habitat
)

# View the first few rows of the simulated data
head(data)
#>     distance   habitat
#> 1  84.345726    forest
#> 2  57.661027 grassland
#> 3 132.905487    forest
#> 4   3.157736    forest
#> 5   5.621098    forest
#> 6  31.650122    forest

# Fit a detection function using the Distance package
ds_model <- ds(
  data = data,
  truncation = truncation_distance,
  key = "hn",  # Half-normal key function
  adjustment = NULL,  # No adjustment terms
  formula = ~habitat  # Include habitat as a cov
)
#> Fitting half-normal key function
#> AIC= 1952.067
#> No survey area information supplied, only estimating detection function.

# Predict detection probability for custom distances
predict_df <- predict_det_probs(ds_model)
head(predict_df)
#>   object   habitat   distance binned observer detected  det_prob
#> 1      1    forest  84.345726  FALSE        1        1 0.6678533
#> 2      2 grassland  57.661027  FALSE        1        1 0.8852882
#> 3      3    forest 132.905487  FALSE        1        1 0.3670289
#> 4      4    forest   3.157736  FALSE        1        1 0.9994344
#> 5      5    forest   5.621098  FALSE        1        1 0.9982087
#> 6      6    forest  31.650122  FALSE        1        1 0.9447434

Created on 2024-12-02 with reprex v2.1.1

@wlangera
Copy link
Author

wlangera commented Dec 2, 2024

Of course this only returns predicted values and no measure of uncertainty ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation needs further documentation enhancement question To discuss Requires discussion
Projects
None yet
Development

No branches or pull requests

4 participants