
Implement parametric bootstrapping for censored data #378

Open
joethorley opened this issue Sep 23, 2024 · 5 comments

Comments

@joethorley
Collaborator

Currently, bootstrapping with censored data can only be performed using parametric = FALSE.

@Zhenglei-BCS

Can ssdtools handle censored data currently?

@joethorley
Collaborator Author

In short, it can handle left censored or interval censored data, provided the distributions have the same number of parameters and non-parametric bootstrapping is used.

From the NEWS.md

ssdtools 2.0.0

Finally, with censored data, confidence intervals can now only be estimated by non-parametric bootstrapping as the methods of parametrically bootstrapping censored data require review.

ssdtools 1.0.0

Censored Data

Censoring can now be specified by providing a data set with one or more rows that have

  • a finite value for the left column that is smaller than the finite value in the right column (interval censored)
  • a zero or missing value for the left column and a finite value for the right column (left censored)

It is currently not possible to fit distributions to data sets that have

  • an infinite or missing value for the right column and a finite value for the left column (right censored)

Rows that have a zero or missing value for the left column and an infinite or missing value for the right column (fully censored) are uninformative and will result in an error.
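To make the column conventions above concrete, here is a minimal sketch (illustrative Python, not ssdtools' actual R code) of how left-censored and interval-censored (left, right) rows contribute to a log-normal log-likelihood:

```python
# Sketch only: how (left, right) censoring bounds, as described in the NEWS
# excerpt above, enter a log-normal log-likelihood. Assumed helper, not ssdtools.
import math
from scipy.stats import norm

def censored_loglik(rows, mu, sigma):
    """rows: list of (left, right) bounds; left of 0/None means left censored."""
    ll = 0.0
    for left, right in rows:
        if left in (0, None):
            # left censored: concentration X <= right, contributes log F(right)
            ll += math.log(norm.cdf((math.log(right) - mu) / sigma))
        elif left < right:
            # interval censored: left < X <= right, contributes log(F(right) - F(left))
            ll += math.log(
                norm.cdf((math.log(right) - mu) / sigma)
                - norm.cdf((math.log(left) - mu) / sigma)
            )
        else:
            # uncensored (left == right): ordinary log-density contribution
            ll += norm.logpdf((math.log(left) - mu) / sigma) - math.log(sigma * left)
    return ll
```

A right-censored row would analogously contribute log(1 - F(left)), which is the case not yet supported.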

Akaike Weights

For uncensored data, Akaike Weights are calculated using AICc (which corrects for small sample size).
In the case of censored data, Akaike Weights are calculated using AIC (as the sample size cannot be estimated) but only if all the distributions have the same number of parameters (to ensure the weights are valid).
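The weights described above are the standard Akaike weights (Burnham & Anderson); whether AIC or AICc is fed in only changes the input values, not the formula. A minimal sketch:

```python
# Standard Akaike-weight formula: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
# where delta_i is each model's IC value minus the minimum. Illustration only.
import math

def akaike_weights(ics):
    """ics: information-criterion values (AIC or AICc), one per distribution."""
    best = min(ics)
    rel = [math.exp(-0.5 * (ic - best)) for ic in ics]  # relative likelihoods
    total = sum(rel)
    return [r / total for r in rel]
```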

@Zhenglei-BCS

Thanks for pointing me to the information.

In our ecotoxicological studies we often encounter endpoints that exceed the highest tested concentration, which means the tested species is not sensitive to the test item. It has been recommended to include these censored values in the SSD analysis, following the approach outlined in http://arxiv.org/abs/1311.5772. This method essentially extends the maximum likelihood approach by incorporating P(X > C) into the objective function.
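The extension described above can be sketched as follows (a minimal illustration assuming a log-normal SSD; the data and starting values are hypothetical, and this is not the preprint's or ssdtools' implementation): each right-censored species tested up to concentration C with no effect adds a survival term log P(X > C) to the log-likelihood.

```python
# Minimal sketch: log-normal SSD fitted by MLE, with right-censored species
# contributing log P(X > C) = log(1 - F(C)). Hypothetical data throughout.
import math
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(params, observed, right_censored):
    mu, log_sigma = params
    sigma = math.exp(log_sigma)  # optimize log(sigma) to keep sigma positive
    z = (np.log(observed) - mu) / sigma
    ll = np.sum(norm.logpdf(z) - math.log(sigma) - np.log(observed))
    for c in right_censored:  # endpoint known only to exceed tested concentration c
        ll += math.log(norm.sf((math.log(c) - mu) / sigma))
    return -ll

obs = [1.2, 3.4, 5.6, 9.1, 12.0]  # hypothetical endpoint concentrations
fit = minimize(neg_loglik, x0=[1.0, 0.0], args=(obs, [100.0]))
mu_hat, sigma_hat = fit.x[0], math.exp(fit.x[1])
hc5 = math.exp(mu_hat + sigma_hat * norm.ppf(0.05))  # HC5 of the fitted log-normal
```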

However, I find this puzzling: in the extreme case, including a very insensitive species could misleadingly suggest the presence of a very sensitive species, given that distributions like the lognormal are symmetric after taking the logarithm. This is counter-intuitive and could potentially skew the results.

I will certainly need to read how this is handled in ssdtools. I would appreciate any insights or clarifications.

@joethorley
Collaborator Author

What you describe is right censoring and it is not yet implemented in ssdtools. And yes, you simply give it the information that the concentration is greater than a particular value. I'm not sure why you think that including a very insensitive species could misleadingly suggest the presence of a very sensitive species?

@Zhenglei-BCS


Sorry for the late reply. I got that impression from my experience and from the MOSAIC tool, where a maximum likelihood approach is used to estimate the parameters of the lognormal and log-logistic distributions. Including a distinctly larger greater-than value in the data results in a much smaller HC5 compared to simply leaving it out; I suspect this occurs because MLE is also sensitive to outliers. I haven't delved deeper into this yet, but I will provide a reproducible example later when I have the opportunity to revisit this topic.
