Upper and Lower Threshold Flag Functions/Reference File Issues #364

Open
cristinamullin opened this issue Nov 22, 2023 · 8 comments

@cristinamullin
Collaborator

cristinamullin commented Nov 22, 2023

Describe the bug

Data from Upper Sioux in MN (organization ID: USIOUX_WQX) include negative values for depth and pH, extremely high and extremely low (0) values for conductivity and dissolved oxygen, and extremely high sodium values over 1,000,000 mg/L. We also found numbers containing commas or multiple decimal points, which may be the values the TADA Shiny app picked up as non-numeric.

The extreme high and low values are not currently being picked up on the flagging page of the Shiny app.
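
For anyone triaging the comma and multiple-decimal-point cases, here is a minimal Python sketch of how a raw result value string could be sorted into those buckets. It is only an illustration: `classify_result_value` and its labels are hypothetical names, and this is not how TADA or the Shiny app actually classifies values.

```python
import re

# Plain decimal number with an optional sign, e.g. "7", "-0.5", ".25", "12."
_NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def classify_result_value(raw):
    """Hypothetical helper: label a raw result value string.

    The labels are illustrative and do not match TADA's own flag text.
    """
    text = raw.strip()
    if _NUMERIC.fullmatch(text):
        return "numeric"
    # A value such as "1,000,000" parses once the commas are removed.
    if "," in text and _NUMERIC.fullmatch(text.replace(",", "")):
        return "comma in number"
    if text.count(".") > 1:
        return "multiple decimal points"
    return "non-numeric"


for value in ["-0.5", "1,000,000", "7.4.1", "ND"]:
    print(value, "->", classify_result_value(value))
```

Separating these cases from truly non-numeric text would make it easier to report back to the data owner what needs fixing in WQX.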

To Reproduce

Query:
start date: 2021-11-22
end date: 2023-11-22
organization ID: USIOUX_WQX

Expected behavior

These issues should be flagged as part of the WQX upper and lower threshold checks. Review functions:

TADA_FlagAboveThreshold(): Check Result Value Against WQX Upper Threshold
TADA_FlagBelowThreshold(): Check Result Value Against WQX Lower Threshold

Review reference table:
TADA_GetWQXCharValRef(): WQX QAQC Characteristic Validation Reference Table
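
To make the expected behavior concrete, here is a rough Python sketch of an upper/lower threshold check on a single numeric value. It does not reproduce the signatures or output columns of the R functions listed above. `flag_against_thresholds` and its flag strings are hypothetical, the strict comparisons are an assumption, and the pH upper bound of 14 in the usage lines is only an illustrative value (the lower bound of 0 is the one discussed in this thread).

```python
def flag_against_thresholds(value, lower=None, upper=None):
    """Hypothetical helper: compare one numeric result to optional bounds.

    A bound of None means the reference table has no threshold for that
    characteristic/unit combination, so that side is not checked.
    """
    if upper is not None and value > upper:
        return "above upper threshold"
    if lower is not None and value < lower:
        return "below lower threshold"
    return "within thresholds"


# Illustrative pH bounds: lower = 0, upper = 14 (the upper bound is assumed).
print(flag_against_thresholds(-3.2, lower=0, upper=14))
print(flag_against_thresholds(7.1, lower=0, upper=14))
print(flag_against_thresholds(250.0, lower=0, upper=14))
```

Rows with no bound on either side would come back as "within thresholds" in this sketch, so a real check may want a separate label for them.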

@cristinamullin
Collaborator Author

Possibly related to: #265

@cristinamullin
Collaborator Author

Notes from 5/14

A while back, Jonathan and I talked to you about some of the database issues, and we went over an example from Upper Sioux showing how some extreme outliers were getting into the WQX/Portal. I know these things are works in progress, but I was wondering how the effort to correct some of the issues with the review process, like those we saw with the Upper Sioux data, has been going. I believe Adam had mentioned some sort of notice that goes back to the entity that entered the data when potential errors are detected, and that the notice system was going to be expanded, maybe to folks in the Region, etc. I am not sure if that has come about yet. One of the reasons I ask is that we just went through another round of extreme data entries with the same Tribe: battery data entered for pH, including negatives and millivolts, percent saturation in the 1,000s, etc.

Ed


Ultimately, the data owner (tribe) must update the data in WQX to fix these issues – we do not edit the data for the submitting organization.

When organizations submit data to WQX, they do currently receive a QAQC report (on both locations and results) that should be flagging these extreme outliers. However, I just reviewed the ranges we had for pH and DO and made improvements – some of the ranges for pH were not right in our WQX validation table. From WQX, ideally the data submitter would review and correct these kinds of issues right after reviewing the QAQC report. However, the data can still be uploaded even if data is flagged by WQX – meaning the WQX QAQC service is a recommendation but if a user chooses to ignore it, they can still upload the data with these issues. Do you know if the tribe is submitting their own data to WQX or if they are working with a contractor?

I am not aware of efforts to auto-share the WQX QAQC reports with the regions (assuming that would be for state and tribal organizations only, not all orgs in the region). Adam, can you clarify?

In TADA, we leverage the WQX QAQC service and can help flag these extreme outliers and invalid units as well. However, the tribe would still need to go back into WQX to make changes to the data. TADA cannot make any changes to data in WQX, though the reports from TADA should make it easier to find issues. For TADA, I’ve documented these examples here (#364) and we will take a closer look to make sure the functions (which leverage the same validation tables as WQX) are catching these issues. This is a good test case for us to use to make sure our functions are working appropriately.

Cristina

@cristinamullin
Collaborator Author

We are planning to make the following edits to the WQX validation table to address this issue:

• The current upper threshold for Sodium (10,467.75 mg/L) is reasonable, but as is, it would flag valid sodium values in coastal waters and the Great Salt Lake. A quick Google search suggests the upper range for seawater is 15,850 mg/L. We will update the upper threshold to 15,850 mg/L.

• The current upper threshold for DO in mg/L is too high. 20 mg/L would be a good upper limit for dissolved oxygen (DO) in water. The equivalent dissolved oxygen saturation upper threshold would be ~242% at STP. We will update both.

• I agree that Depth and Temperature, water could be negative, and we don’t want to flag values that are reasonable (e.g., cold freshwater stream temperature can be lower than 0 °C). Ocean water temperature could go as low as -1 or -2 °C, so maybe we want to use -2 °C as the lower threshold for water temperature? For depth, I am not sure, since this depends on the starting reference point (zero). For now, we could start with -1 as the lower threshold for depth, which would cover the values in this example dataset, and then if issues come up in the future with other datasets, we can always readjust.

• Agree, we can keep the current lower thresholds of zero in WQX for now for these characteristics: “Chlorophyll a (probe)”, “Chlorophyll a (probe) concentration, Cyanobacteria (bluegreen)”, “pH”, and “Turbidity”.
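
For reference, the ~242% figure above can be reproduced with simple arithmetic. This Python sketch assumes a 100% saturation concentration of about 8.26 mg/L, a commonly tabulated value for freshwater near 25 °C and 1 atm. That number is my assumption and is not stated in this thread, and `percent_saturation` is a hypothetical helper.

```python
# Assumption: DO concentration at 100% saturation, in mg/L
# (freshwater near 25 degC and 1 atm; not taken from this thread).
DO_SATURATION_MG_PER_L = 8.26


def percent_saturation(do_mg_per_l, saturation_mg_per_l=DO_SATURATION_MG_PER_L):
    """Hypothetical helper: convert a DO concentration to percent saturation."""
    return 100.0 * do_mg_per_l / saturation_mg_per_l


# 20 mg/L is the proposed upper threshold for DO concentration.
print(round(percent_saturation(20.0)))  # 242
```

Saturation concentration changes with temperature, pressure, and salinity, so the equivalence between 20 mg/L and 242% only holds under the assumed conditions.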

@cristinamullin
Collaborator Author

Here are some other common WQX characteristic and unit combinations – I think it would be helpful to review and confirm the ranges and validation table flags are appropriate for these as well:
https://github.com/USEPA/TADA/blob/develop/inst/extdata/TADAPriorityCharUnitRef.csv

@cefergus
Collaborator

I updated the WQX QAQC Characteristic Validation Table (on CDX) with the following changes: Sodium (WATER) now has an upper threshold of 15,850,000 ug/L; DO (WATER) has a more conservative upper threshold of 20,000 ug/L; DO % saturation (WATER) has an upper threshold of 242% to match the upper DO concentration; and Temperature (WATER) has a lower threshold of -2 degC.
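
Since the thresholds were discussed in mg/L earlier in the thread but are listed here in ug/L, here is a quick Python check that the two sets of numbers agree. `mg_per_l_to_ug_per_l` is a hypothetical helper written only for this comparison.

```python
UG_PER_MG = 1000  # 1 mg = 1,000 ug


def mg_per_l_to_ug_per_l(value_mg_per_l):
    """Hypothetical helper: convert a concentration from mg/L to ug/L."""
    return value_mg_per_l * UG_PER_MG


# Thresholds as discussed in mg/L earlier in the thread.
print(mg_per_l_to_ug_per_l(15850))  # Sodium: 15850000 ug/L
print(mg_per_l_to_ug_per_l(20))     # DO: 20000 ug/L
```

Any threshold comparison needs the result value and the threshold in the same unit before flagging, otherwise a value reported in mg/L would be compared against a ug/L limit.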

@cefergus
Collaborator

I'm working through the priority characteristic/unit combos in https://github.com/USEPA/EPATADA/blob/develop/inst/extdata/TADAPriorityCharUnitRef.csv to evaluate the range of values in the WQX QAQCValidation table. I am suggesting other minimum and maximum values where appropriate and documenting the reasoning in an Excel spreadsheet. I still need to figure out how best to share it with the group for review...

@cristinamullin
Collaborator Author

Hi Emi, I think email would be easiest (to both TADA and WQX teams) and then if further discussion is needed the weekly WQX meeting would be a good place for that, since the ranges will ultimately be updated in the WQX domain table (you or I can implement the range updates via WQX web): QAQCCharacteristicValidation (ZIP) | (XML) | (CSV)

Thanks for reviewing these!

@hillarymarler
Collaborator

While working on #397, I noticed that some Characteristics have Maximum values listed but no value units. In many of these cases, there is also a row for the same Characteristic with no Max/Min and no value units. Is this something that should be considered as part of this issue?

@hillarymarler hillarymarler linked a pull request Nov 7, 2024 that will close this issue
@hillarymarler hillarymarler removed a link to a pull request Nov 7, 2024