CTDS-prac5-soln.Rmd

---
title: Camera trap distance sampling workshop
description: |
  <p style="color: red; font-size: 20px;">Practical 5 solution:<br> accounting for temporal activity and incorporating uncertainty</p>
author:
  - name: Workshop development group 
    url: https://workshops.distancesampling.org
    affiliation: CREEM, Univ of St Andrews
    affiliation_url: https://www.creem.st-andrews.ac.uk
date: "`r Sys.Date()`"
output: 
  distill::distill_article:
    toc: true
    toc_depth: 2
bibliography: howeetal18.bib
csl: apa.csl
---

```{r include=FALSE}
knitr::opts_chunk$set(eval=TRUE, echo=TRUE, message=FALSE, warnings=FALSE)
solution <- TRUE
```


# Estimating temporal availability for detection

Heat- and motion-sensitive camera traps detect only moving animals within the range of the sensor and the field of view of the camera. Animals are therefore unavailable for detection by camera traps when they are stationary, and when they are above (e.g., semi-arboreal species) or below (e.g., semi-fossorial species) the range of the sensor or the camera, regardless of their distance from the camera in two dimensions. This temporally limited availability for detection must be accounted for to avoid negative bias in estimated densities. When data are abundant, researchers may choose to include only data from times when 100% of the population can be assumed to be active within the vertical range of camera traps @howeetal. However, for rarely-detected species or surveys with lower effort, it might be necessary to include most or all observations of distance. In these situations, survey duration ($T_k$) might be 12- or 24-hours per day, and it becomes necessary to estimate the proportion of time included in $T_k$ when animals were available for detection. Methods for estimating this proportion directly from CT data have been described @rowcliffe_2014, and it can be included in analyses to estimate density @bessone_2020, for example as another multiplier, potentially with an associated standard errors.  This will be the subject of Practical 5.

## Different data used for this practical

Howe et al. [-@howeetal] analysed data for cameras in C&ocirc;te d'Ivoire that were operating 11.5 hours per day.  They conducted analyses using data in two ways:

- only from the crepuscular times of day (dawn and dusk); that's the analysis you conducted in Practical 2, and
- detections from the entire 11.5 period each day; the data we will analyse in this practical.

# Data formatting for use in `activity` package

The file containing the start times of video cameras forms the basis of temporal availability calculations.  Data from the camera logs are stored in a text file, `VideoStartTimes_Daytime.txt` and need to be massaged so the data can be analysed by the `activity` package [@activity_pkg].

```{r datamunge, eval=solution}
library(activity)
duikers <- "VideoStartTimes_Daytime.txt"
actdata <- read.table(duikers, header=TRUE)
# data wrangling to make a datetime value from separate
# fields for year, month, day, hour and minute
actdata$date <- paste("2016",
                      sprintf("%02i", actdata$month), 
                      sprintf("%02i", actdata$day),
                      sep="/")
actdata$time <- paste(sprintf("%02i", actdata$hour),
                      sprintf("%02i", actdata$minute),
                      sep=":")
actdata$datetime <- paste(actdata$date, actdata$time)
```


#  Functions in the `activity` package

We will employ two functions from the `activity` package.  First, convert the time of day of a camera triggering event into the fraction of the 24hr cycle when the event took place, measured in radians.  In other words, an event occurring at midday is recorded as $\pi$ and an event occurring at midnight is recorded as 2$\pi$.

```{r radian, eval=solution}
actdata$rtime <- gettime(actdata$datetime, "%Y/%m/%d %H:%M")
```

With the radian conversion of the camera triggering times, the distribution of the triggering events times is smoothed, using a kernel smoother by the function `fitact`.  The function estimates the proportion of time (in a 24hr day) animals were active.  In addition, the triggering time data can be resampled to provide a measure of uncertainty in the point estimate of activity proportion.

```{r activity, eval=solution}
act_result <- fitact(actdata$rtime, sample="data", reps=100)
```

A plot of the histogram of triggering times, along with the fitted smooth is provided by a plot function applied to the object returned by `fitact`.

```{r actplot, eval=solution, fig.cap="Fitted smooth to histogram of camera triggering times for Maxwell's duiker data."}
plot(act_result)
```

The value computed by the smooth through the activity histogram can be extracted from the object created by `fitact`.  The extraction reaches into the object to look at the `slot` called `act`.  The uncertainty around the point estimate is derived from resampling that takes place within `fitact`.  The slot will display the point estimates, standard error and confidence interval bounds.

```{r thenumber, eval=solution}
print(act_result@act)
```

Note, this is **not** the value we will need in our subsequent calculations.  Details about this coming shortly.

# Detection distances in "Daytime" data set 

The full "daytime" data set, detections record between 0630 and 1800, including detections outside the crepuscular period used in "peak" data set.  We read in the file containing the daytime detections and reorganise it for detection function fitting.

```{r organise}
daytime <- read.csv("DaytimeDistances.txt", header=TRUE, sep="\t")
daytime <- subset(daytime, select=-c(utm.e, utm.n))
daytime <- daytime[, c("Region.Label", "Area", "multiplier",
                       "Sample.Label", "Effort", "distance")]
```

# Detection function fitting to the "daytime" data

Before fitting a detection function to these data, there are a few constants that will be useful in the subsequent analysis steps.  We will define them here.

```{r utility, eval=solution}
library(Distance)
viewangle <- 42
field.of.view <- viewangle / 360
camera.operation.per.day <- 11.5
conversion <- convert_units("meter", NULL, "square kilometer")
trunc.list <- list(right=15, left=2)
mybreaks <- c(seq(2,8,1), 10, 12, 15)
```

`viewangle` is the field of view of the camera lens (in degrees), `camera.operation.per.day` is number of hours cameras were in operation each day, `conversion` are units of measure of quantities in the "daytime" data file, `trunc.list` are right and left truncation values for the detection function modelling and `mybreaks` contains cutpoints for distance bins.

We should perform a complete analysis, fitting a suite of detection function models to the "daytime" data set.  However, this would repeat much of the work you performed in Practical 2, so we will simply fit the hazard rate model without adjustments.

```{r dayhr0, eval=solution}
daytime.hr0 <- ds(daytime, transect = "point", key="hr", adjustment = NULL,
                  cutpoints = mybreaks, truncation = trunc.list,
                  convert_units=conversion)
plot(daytime.hr0, pdf=TRUE)
```

# Adjustments to detection function to estimate duiker density

Simply fitting a detection function to the detection distances and incorporating the encounter rates is not sufficient to estimate duiker density.  A simplifed analysis assumes:

- animals are equally available at all times the cameras are active and
- cameras make detections in a $360^\circ$ around the camera.

Neither of these are true, so adjustments must be made, using information provided about the camera and the daily duration of operation.

## Adjustment for temporal availability

We use the temporal availability information to create a *multiplier*.  Our multiplier must be defined as
> proportion of the *camera operation time* animals were available to be detected

This is not equivalent to the value produced by the `fitact` function; that value is the proportion of *24hr* animals were available to be detected.  The availability multiplier must be adjusted based on the daily camera operation period.  Uncertainty in this proportion is also included in our computations.

The point estimate and standard error are pulled from the `fitact` object, adjusted for daily camera operation time and placed into a data frame named `creation` in a named list, specifically in the fashion shown.

```{r avmultiplier, eval=solution}
prop.camera.time <- camera.operation.per.day / 24
avail <- list(creation=data.frame(rate = act_result@act[1]/prop.camera.time,
                                  SE   = act_result@act[2]/prop.camera.time))
```

## Adjustment for viewing angle

This second adjustment is much simpler as it is not a computed quantity and is known with certainty.  The function we are about to use has an argument `sampling_fraction`, which exactly suits the purpose of partial field of view for each camera.

# `dht2` for density estimation

Combining these adjustments with the computations associated with detection functions and encounter rates, calls for the use of a new function in the Distance package: `dht2`.  Arguments to this function are considerable because of all the things this function must do.

- ddf: the detection function model
- flatfile: the original data frame
- strat_formula: in case this is a stratified survey design
- convert_units: distance measures in the original data
- sample_fraction: for the camera field of view
- multipliers: availability named list

```{r dht2, eval=solution}
daytime_dht2 <- dht2(ddf=daytime.hr0, flatfile=daytime, strat_formula=~1,
                    convert_units=conversion, sample_fraction=field.of.view, 
                    multipliers=avail)
print(daytime_dht2, report="density")
```


# Bootstrap for variance estimation

To produce a more reliable estimate of the precision of the point estimate, produce bootstrap estimates using `bootdht`. The user needs to create a function and another named list to facilitate use of the bootstrap: a summary function to extract information from each replicate and a multiplier list describing how temporal availability is being derived.

## Summary function

As constructed, `mysummary` will keep the density estimate produced by each bootstrap replicate and the stratum (if any) to which the estimate pertains.

```{r mysummary}
mysummary <- function(ests, fit){
  return(data.frame(Label = ests$individuals$D$Label,
                    Dhat = ests$individuals$D$Estimate))
}
```

## Multiplier function

This rather complex list makes use of `make_activity_fn` that exists in the `Distance` package used to call the `fitact` function from the `activity` package.  For the user, your responsibility is to provide three arguments to this function:

- vector containing the detection times in radians,
- the manner in which precision of the temporal availability estimate is produced and
- the number of hours per day the cameras are in operation

```{r multifunc, eval=solution}
mult <- list(availability= make_activity_fn(actdata$rtime, sample="data",
                                            detector_daily_duration=camera.operation.per.day))
```

## Remaining arguments to `bootdht`

Just as with `dht2` there are arguments for the `model`, `flatfile`,  `sample_fraction`, `convert.units` and `multipliers` (although for `bootdht` `multipliers` uses a function rather than a single value).  The only novel arguments to `dht2` are `resample_transects` indicating camera stations are to be resampled with replacement, `nboot` for the number of bootstrap replicates and `cores` which can take advantage of multiple cores in your computer to speed computation.

```{r, bootstrap, results='hide', eval=solution}
daytime.boot.hr <- bootdht(model=daytime.hr0, flatfile=daytime,
                          resample_transects = TRUE, nboot=300, cores = 11,
                          summary_fun=mysummary, sample_fraction = field.of.view,
                          convert_units = conversion, multipliers=mult)
```

Confidence limits computed via the percentile method of the bootstrap.

```{r bootresult, eval=solution}
print(summary(daytime.boot.hr))
```

```{r, fig.width=8, fig.cap="Distribution of density estimates from bootstrap replicates.", eval=solution}
hist(daytime.boot.hr$Dhat, breaks = 20, 
     xlab="Estimated density", main="D-hat estimates bootstraps")
abline(v=quantile(daytime.boot.hr$Dhat, probs = c(0.025,0.975), na.rm=TRUE), lwd=2, lty=3)
```

# Questions

Comment upon the following comparisons

- Point estimate of density from `dht2` with median of the bootstrap replicate distribution,
  -<p style="color: blue; font-size: 110%; font-style: italic;">Reasonably good agreement between estimate computed by `dht2` and the median of the bootstrap distribution</p>
- Mean and median of the bootstrap replicate distribution,
  -<p style="color: blue; font-size: 110%; font-style: italic;">Mean is larger than the median of the distribution because the distribution is skewed right (long right tail in the distribution)</p>
- Width of the 95% confidence interval from `dht2` and from `bootdht`
  -<p style="color: blue; font-size: 110%; font-style: italic;">The confidence interval width is wider when computed using the bootstrap than when computed analytically using `dht2`.  This is particularly notable with the upper confidence bound that is larger with the bootstrap because of the long right tail.</p>

- Discuss the magnitude of components (detection function, encounter rate and multiplier) of the uncertainty in the density estimate as shown in `dht2` output.
  - Are there any lessons in those components for the design of future duiker camera trap surveys in the region?
    -<p style="color: blue; font-size: 110%; font-style: italic;">A minute amount of uncertainty in the duiker density estimate comes from the detection function modelling.  Much much more uncertainty comes from the large variability in encounter rates between camera stations, with a small amount contributed by uncertainty in the temporal availability estimate.  For future studies, the objective should be to reduce the encounter rate variability, or improve the estimate of encounter rate variability by increasing the number of camera stations.  Also perhaps worth considering a stratified survey design, grouping together stations with high encounter rates and stations with low encounter rates.</p>