Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Survey with single transect (capercaillie), CV(D) seems incorrect #192

Open
erex opened this issue Dec 12, 2024 · 7 comments
Open

Survey with single transect (capercaillie), CV(D) seems incorrect #192

erex opened this issue Dec 12, 2024 · 7 comments
Assignees
Labels

Comments

@erex
Copy link
Member

erex commented Dec 12, 2024

I thought CV(D) would be computed using the delta method approximation; sqrt(CV(Pa)^2 + CV(ER)^2). That calculation checks for the ducknest data when analysed.

However, the delta method does not seem to be applied to the capercaillie data. That data set has only a single transect, hence CV(ER)=0. If the delta method was being used, when CV(ER)=0, CV(D)=CV(Pa). Correct?

That is not the results from the code below. Why not?

MRE

library(Distance)
data(capercaillie)
capercaillie$size <- NULL
conversion.factor <- convert_units("meter", "kilometer", "hectare")
caper <- ds(data=capercaillie, key="hn", convert_units=conversion.factor)
summary(caper)

Result

                       Estimate          SE         CV
Average p             0.6128171  0.05302834 0.08653208
N in covered region 182.7625285 19.12013229 0.10461735

Summary statistics:
            Region Area CoveredArea Effort   n k        ER se.ER cv.ER
1 Monaughty Forest 1472        3840    240 112 1 0.4666667     0     0

Abundance:
  Label Estimate       se        cv      lcl      ucl       df
1 Total 70.05897 10.33494 0.1475178 40.21721 122.0437 2.318619

Density:
  Label   Estimate          se        cv        lcl        ucl       df
1 Total 0.04759441 0.007021021 0.1475178 0.02732148 0.08291015 2.318619

Reconstructing the calculation, to arrive at CV(D)=0.14752 when CV(Pa)=0.086532, ds has fabricated CV(ER)=0.11947, when in reality CV(ER)=0.

@erex
Copy link
Member Author

erex commented Dec 12, 2024

Similarly, the ETP_dolphin dataset shipped with Distance also possesses just a single transect, hence CV(ER)=0. CV(D) does not seem to be computed via the delta method. I've removed group size from the data set to focus upon CV(D)

MRE

library(Distance)
data("ETP_Dolphin")
unique(ETP_Dolphin$Sample.Label)
ETP_Dolphin$size <- NULL
etp <- ds(ETP_Dolphin, key="hr")
summary(etp)

Result

                        Estimate           SE         CV
Average p              0.6006503   0.03836829 0.06387792
N in covered region 1814.6999155 121.01156175 0.06668406

Summary statistics:
   Region Area CoveredArea Effort    n k   ER se.ER cv.ER
1 Default   10          10      1 1090 1 1090     0     0

Density:
  Label Estimate       se         cv      lcl      ucl       df
1 Total   181.47 12.34989 0.06805471 158.4468 207.8386 67.24963

Reconstructing this calculation, I surmise that CV(ER) would have to be 0.02347 to result in CV(D)=0.06805 when CV(Pa)=0.063878,

Ducknest

Manipulate the data such that all detections are assigned to Sample.Label=1

MRE

data("ducknest")
ducknest$Sample.Label <- 1
duck <- ds(ducknest, key="hn")
summary(duck)

Result

                       Estimate          SE         CV
Average p             0.8693482  0.03902051 0.04488479
N in covered region 614.2533225 29.19681554 0.04753221

Summary statistics:
   Region Area CoveredArea Effort   n k       ER se.ER cv.ER
1 Default  618         618 128.75 534 1 4.147573     0     0

Density:
  Label  Estimate         se         cv       lcl     ucl       df
1 Total 0.9939374 0.05998836 0.06035426 0.8511562 1.16067 4.992072

Documentation about uncertainty fot dht says the following:

The second component (encounter rate variance) can be computed in one of several ways depending on the form taken for the encounter rate and the estimator used. To begin with there three possible values for varflag to calculate encounter rate:

  • 0 uses a binomial variance for the number of observations (equation 13 of Borchers et al. 1998). This estimator is only useful if the sampled region is the survey region and the objects are not clustered; this situation will not occur very often;
  • 1 uses the encounter rate n/L (objects observed per unit transect) from Buckland et al. (2001) pg 78-79 (equation 3.78) for line transects (see also Fewster et al, 2009 estimator R2). This variance estimator is not appropriate if size or a derivative of size is used in the detection function;
  • 2 is the default and uses the encounter rate estimator N^/L (estimated abundance per unit transect) suggested by Innes et al (2002) and Marques & Buckland (2004).

this doesn't resolve the issue for me

@lenthomas
Copy link
Member

If I look under the help for dht.se (which is linked from the uncertainty section of dht, it says:

image

Does that solve this issue?

@lenthomas lenthomas self-assigned this Dec 13, 2024
@erex
Copy link
Member Author

erex commented Dec 13, 2024

Thanks @lenthomas . I vaguely remembered something about the Poisson assumption, but I'm not clear what $x$ is in that notation?

My first thought is that the Poisson is invoked to approximate var(ER) and hence CV(ER). If that was the case, then for the cappercaillie, we would have:

er <- .46666667
var.er <- er  # by Poisson
cv.er <- sqrt(var.er)/er
print(cv.er)
[1] 1.46385

whereas, by my calculation dht is using CV(ER)=0.11947.

For ETP_Dolphin

er <- 1090
var.er <- er  # by Poisson
cv.er <- sqrt(var.er)/er
print(cv.er)
[1] 0.03028913

whereas I surmise that CV(ER) would have to be 0.02347.

Either I am misunderstanding what $x$ is in $Var(x)=x$, or Poisson is not the answer.

In any event, the software is creating a measure of uncertainty for encounter rate; but that measure is not printed in the output as both se(ER) and CV(ER) are printed as 0. Would be nice to know what is happening under the hood.

@erex
Copy link
Member Author

erex commented Dec 14, 2024

Running debug(mrds::dht.se) reveals the mystery.

The Poisson gets invoked, not on encounter rate, but rather on $\hat{N}$ here in a curious way ($Ni^2 / Ni$).

The Poisson variance of $\hat{N}$ is then added to the variance associated with the parameter estimates of the detection function model.

It doesn't seem right to me that the Poisson is applied to $\hat{N}$ rather than ER, but that's how the computation is carried out.

@lenthomas
Copy link
Member

I had a quick look through the code. I think that's how the Innes et al. estimator works, although I agree it seems strange. The encounter rate version that you're thinkng of is what gets invoked when options$varflag == 1. The Ni^2/Ni is a weighting -- scaling up variance on density to variance on abundance. I agree it can be simplified in that case but is probably left the way it is for clarity. (I'd have been tempted to just put Ni and then have a comment right after that explaining where it came from.

A proper check is going to require a thorough re-reading of the Marques and Buckland chapter, Innes et al paper and some thinking! If we do find any issues here, it'll be important to check what's implemented in Distance::dht2 to see if any changes are needed there also.

@erex
Copy link
Member Author

erex commented Dec 15, 2024

video for debugging session here
https://www.loom.com/share/9d5d4100dde44327a2bb3d29dededa4c?sid=8f2ba2f8-ac9b-465d-9377-8d03f0c92a29

vc1 comes from model parameter estimates, vc2 (heart of question) comes from Poisson $var(\hat{N})=\hat{N}$ but then vc1 is added together with vc2 to compute $SE(\hat{N})$. That doesn't seem right.

Section 4.7.2 of Buckland et al. (2001) discusses using the Poisson for $\widehat{var}(n)=n$, but not for capital N.

@erex
Copy link
Member Author

erex commented Dec 16, 2024

For entertainment, I check what DistWin 8.0 did with the capercaillie data with a single transect.

DistWin provides three ways of dealing with single transects:

Assume var(ER)=0

              Point        Standard    Percent Coef.        95% Percent
  Parameter   Estimate       Error      of Variation     Confidence Interval
  ---------  -----------  -----------  --------------  ----------------------
    D        0.47594E-01  0.41184E-02       8.65      0.40108E-01  0.56479E-01
    N         70.000       6.0572           8.65       59.000       83.000    
  ---------  -----------  -----------  --------------  ----------------------
          Estimate      %CV     df     95% Confidence Interval
                        ------------------------------------------------------
                 n/L    0.46667        0.00     0.00 0.46667      0.46667    

Not the CV(D)=0.1475 provided by ds()

Assume Poisson with overdispersion=0

              Point        Standard    Percent Coef.        95% Percent
  Parameter   Estimate       Error      of Variation     Confidence Interval
  ---------  -----------  -----------  --------------  ----------------------
    D        0.47594E-01  0.60981E-02      12.81      0.36961E-01  0.61287E-01
    N         70.000       8.9688          12.81       54.000       90.000    
  ---------  -----------  -----------  --------------  ----------------------
                        Estimate      %CV     df     95% Confidence Interval
                        ------------------------------------------------------
                 n/L    0.46667        9.45     0.00 0.38793      0.56138    

also not the CV(D)=0.1475 reported by ds

Assume Poisson with some overdispersion parameter, DistWin suggests b=2

              Point        Standard    Percent Coef.        95% Percent
  Parameter   Estimate       Error      of Variation     Confidence Interval
  ---------  -----------  -----------  --------------  ----------------------
    D        0.47594E-01  0.75771E-02      15.92      0.34786E-01  0.65119E-01
    N         70.000       11.144          15.92       51.000       96.000    
  ---------  -----------  -----------  --------------  ----------------------
                         Estimate      %CV     df     95% Confidence Interval
                        ------------------------------------------------------
                 n/L    0.46667       13.36     0.00 0.35955      0.60569    

also not the CV(D) produced by ds

Conclusion

The approach used by dht.se to estimate precision of abundance or density estimates in the situation of no replicate transects does not agree with any of the three approaches to the problem employed by DistWin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants