Dealing with pathogen nondetects #356
Comments
Are you thinking this should be a separate function or an additional argument in the simple censored data function? I'm interested in working on this issue.
Great! I think it should go into the simple censored data function. However, this would be the only exception to the way all other censored data is handled in that function, so we will need good documentation of this nuance for bacteria in both the Shiny app and the function itself.
Good catch! Is the one with the space coming from NWIS (see the provider field)? The WQX version is "MPN/100ML", so let's go with that for the harmonization. I think the best place to address this would be in the TADA_ConvertResultUnits function that is run automatically in autoclean. I think we can add it to USGS_units_speciation.csv. Can you share the specific query where you found this for testing?
Here is the Fecal Coliform data set I downloaded for testing and where I observed both "MPN/100ML" and "MPN/100 ML" as units. I'll take a look at the TADA_ConvertResultUnits function and see if I can standardize it to the WQX version.
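As a rough illustration of the unit synonym conversion discussed above, here is a minimal Python sketch of a lookup-table harmonization. It is not the TADA_ConvertResultUnits logic and does not reflect the actual contents of USGS_units_speciation.csv; the `UNIT_SYNONYMS` table and `harmonize_unit` helper are hypothetical names used only for this example.

```python
# Hypothetical synonym table: observed spelling -> WQX-preferred spelling.
# Only the one pair mentioned in this thread is included.
UNIT_SYNONYMS = {"MPN/100 ML": "MPN/100ML"}


def harmonize_unit(unit: str) -> str:
    """Return the WQX spelling for a known synonym, else the unit unchanged."""
    cleaned = unit.strip()
    return UNIT_SYNONYMS.get(cleaned, cleaned)


if __name__ == "__main__":
    for raw in ["MPN/100 ML", "MPN/100ML", "mg/L"]:
        print(repr(raw), "->", repr(harmonize_unit(raw)))
```

An explicit table (rather than stripping all whitespace) keeps the change limited to synonyms that have been reviewed, which matches the idea of adding one row to a reference file.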
I took a closer look into the "Coerced to NA" Fecal Coliform results that I mentioned during the Team Meeting. Specifically, I looked at those from the sample data set.
I found that all had the same OrganizationFormalName, "Maryland Dept. of the Environment Shellfish Data". The original ResultMeasureValue for all of these entries was "." and they were coerced to NA in TADA_ConvertSpecialChars(), but they are flagged as "Uncensored" in TADA.CensoredData.Flag. Currently, I have the coerced-to-NA pathogen results also receiving a TADA.ResultMeasureValue of 1 (default) or the user-specified value. But I can remove this if it is confusing or incorrect to have a censored data function acting on non-censored data. Additionally, there are TADA.ResultMeasureValues of -99, -1, and 0 in the test data set. These will also cause problems when calculating geometric means. I can add these situations (ResultMeasureValue less than or equal to 0) to the function too, but this is another instance of the censored data function acting on non-censored data. What are your thoughts? Should I continue developing this as part of the TADA_SimpleCensoredMethod function? Or would it make more sense to create a separate function, given that there are non-censored data which will require substitute values in order for users to calculate geometric means?
Excellent! Let me know if you run into any questions or issues with the "MPN/100 ML" to "MPN/100ML" unit synonym conversion. Regarding the Maryland data with the result value "." being coerced to NA: I looked at the data and don't see any metadata that we could use to identify that data as censored, or that indicates how the result value "." should be interpreted. Currently, all non-numeric results like this that cannot be easily interpreted or wrangled end up getting removed. My initial thought is that the censored data functions should only handle the data we are able to identify as censored based on the metadata provided in the Detection Limit Type Name, Result Detection Condition Text, and possibly the MeasureQualifierCode (the other issue you're working on). I don't think the censored data function should act on non-censored data. Would it make more sense to include substitutions for negative values within a new geometric mean calculator function?
I like the idea of a geometric mean calculator function. That would be an appropriate place to substitute negative or zero values. Do you think TADA.SimpleCensoredMethods should handle substitutions for non-detect pathogen data with additional substitutions for non-censored data in the geometric mean function? Or should all substitutions happen in the geometric mean function?
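To make the proposal above concrete, here is a minimal Python sketch of a geometric mean helper that substitutes non-positive values before averaging. It is only an illustration under stated assumptions, not a TADA function: the name `geometric_mean`, the `substitute` parameter, and the choice to skip missing (`None`) results are all hypothetical. The default of 1 mirrors the default substitute value mentioned earlier in this thread.

```python
import math


def geometric_mean(values, substitute=1.0):
    """Geometric mean with non-positive values replaced by `substitute`.

    Assumptions for this sketch: missing results (None) are skipped, and
    any value <= 0 (e.g. -99, -1, 0) is replaced before taking logs.
    """
    if substitute <= 0:
        raise ValueError("substitute must be positive")
    cleaned = [v if v > 0 else substitute for v in values if v is not None]
    if not cleaned:
        raise ValueError("no usable values")
    # Average the logs, then exponentiate; avoids overflow from a long product.
    return math.exp(sum(math.log(v) for v in cleaned) / len(cleaned))


if __name__ == "__main__":
    # -99 and 0 become 1, so this is the geometric mean of [1, 1, 100, 100].
    print(geometric_mean([-99, 0, 100, 100, None]))
```

Keeping the substitution inside the geometric mean helper means it only applies when an assessment method actually needs a geometric mean.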
Yes, I think TADA.SimpleCensoredMethods should handle substitutions for non-detect pathogen data, with additional substitutions for non-censored data in the geometric mean function. I think most states do calculate a geometric mean for bacteria, but many do not (https://usepa.sharepoint.com/:x:/r/sites/WQPDataAssessmentTeam/_layouts/15/Doc.aspx?sourcedoc=%7B49ACC717-A45E-41F0-9F3F-00BAE8743D81%7D&file=Summary%20of%20CALMs.xlsx&action=default&mobileredirect=true). I recommend separating these two because the censored data substitutions should be a prerequisite for any assessment methodology (any criteria/methodology), but the substitutions for negative values may only be needed if the assessment method includes calculation of the geomean. Some E. coli, enterococci, and fecal coliform assessment criteria and methods simply use a "not to exceed x percent of the time" methodology (e.g., AZ: fecal coliform not to exceed 400 cfu/100mL more than 25% of the time from May to September, with a minimum of 8 samples evenly distributed across the season). In that case, the TADA user would still need to run the censored data substitution function, but wouldn't need the negative value substitutions from the geometric mean function.
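For comparison with a geometric mean approach, here is a minimal Python sketch of a "not to exceed x percent of the time" check using the numbers from the AZ example above (400 cfu/100mL, 25%, minimum of 8 samples). The function name and parameters are hypothetical, and the sketch does not check the seasonal window or whether samples are evenly distributed.

```python
def exceeds_too_often(results, threshold=400.0, max_fraction=0.25, min_samples=8):
    """Return True if more than `max_fraction` of results exceed `threshold`.

    Returns None when there are fewer than `min_samples` results, since the
    criterion cannot be evaluated. Seasonal filtering is assumed to have
    happened upstream.
    """
    if len(results) < min_samples:
        return None
    n_exceed = sum(1 for r in results if r > threshold)
    return n_exceed / len(results) > max_fraction


if __name__ == "__main__":
    samples = [1, 1, 20, 50, 120, 450, 500, 900]  # 1 = substituted nondetect
    print(exceeds_too_often(samples))
```

Note that substituted nondetects (here, 1) still count toward the sample total, which is why the censored data substitution is needed even when no geometric mean is calculated.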
I updated TADA.SimpleCensoredMethods to include only non-detect pathogen data. I think the next step is to figure out the list of characteristics to include by discussing with the work group.
Is your feature request related to a problem? Please describe.
TADA's simple censored data function does not allow for characteristic-specific non-detect handling, e.g., for bacteria. Regardless of the method chosen and applied for all other characteristics, should bacteria (e.g., E. coli) always default to a whole number close to zero, such as 1-2 Colony Forming Units (CFU) or Most Probable Number (MPN) per 100 mL? Does applying a single method across all characteristics cause issues for any other characteristics or characteristic groups in addition to bacteria?
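One way to picture the characteristic-specific handling asked about above is the following Python sketch. It is an assumption-laden illustration, not TADA's implementation: the `PATHOGEN_CHARACTERISTICS` set is a placeholder list, and the half-detection-limit multiplier for other characteristics is just one example of a general method.

```python
# Placeholder list for illustration only; not an agreed set of characteristics.
PATHOGEN_CHARACTERISTICS = {"ESCHERICHIA COLI", "ENTEROCOCCUS", "FECAL COLIFORM"}


def substitute_nondetect(characteristic, detection_limit,
                         pathogen_value=1.0, multiplier=0.5):
    """Pick a substitute value for a nondetect result.

    Pathogen characteristics get a fixed low count (default 1 per 100 mL);
    everything else gets detection_limit * multiplier (an assumed general rule).
    """
    if characteristic.strip().upper() in PATHOGEN_CHARACTERISTICS:
        return pathogen_value
    return detection_limit * multiplier


if __name__ == "__main__":
    print(substitute_nondetect("Fecal Coliform", 2.0))
    print(substitute_nondetect("Nitrate", 0.1))
```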
Here are the notes from the meeting in Jan 2021 where we discussed nondetect handling methods:
Subgroup A
IssuePaper_RetrievalQAQC_Jan2021.docx
There is some more information in the TADA Master List of Requirements (starting on pg 12).
“Substituted values can include the reporting limit or a very low pathogen count (1 to 2 Colony Forming Units (CFU) or Most Probable Number (MPN) per 100 mL). A concentration of 0 CFU or MPN per 100 mL is not typically used because the summary statistic used for pathogens (geometric mean) cannot be calculated when the dataset includes zeros.”
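The reason zeros break the geometric mean, as the quoted requirement states, can be seen from its standard definition for n positive results x_1, ..., x_n:

```latex
G = \left( \prod_{i=1}^{n} x_i \right)^{1/n}
  = \exp\!\left( \frac{1}{n} \sum_{i=1}^{n} \ln x_i \right),
  \qquad x_i > 0 .
```

If any single x_k = 0, the product is 0 no matter how large the other results are, and ln x_k is undefined in the log form. Substituting 1 avoids this: since ln 1 = 0, the substituted result adds nothing to the sum but still counts toward n.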
TADA Master List of Requirements.docx