
Duplicate picks in events from FDSN- (fixed from Post May2023 onward) #105

Open
calum-chamberlain opened this issue Mar 13, 2023 · 24 comments


@calum-chamberlain

Hi all,
I wondered whether there is a good reason for duplicate picks appearing in events downloaded from the FDSN service. For instance, event 2022p788472 has duplicate picks for multiple stations, including duplicate P-picks for BFZ. There are two P-picks sharing the same pick ID (smi:nz.org.geonet/20221019.173832.28-AIC-NZ.BFZ.10.HHZ) that, to my eyes, carry identical information.

If there isn't a good reason, would it be possible to fix this?

Thanks,
Calum.

@salichon
Contributor

Hello @calum-chamberlain
That's indeed a feature I noticed a while ago. It is very occasional, I reckon, and a direct output from SeisComP.
I have had a few cases, but I had no chance to get clarification on the processing, e.g. to reproduce the feature and report it to Gempa (a glitch in the streams from the station, or in the picking in SeisComP?).
I'll revive this topic in the forum.

With regard to fixing this, the answer is yes, and it is easy, although the "process" for doing it
in an operational manner is much less so, for now.

To me this is an example of event-level catalogue data curation (FYI @elidana @JonoHanson) that should be added to the to-do list.
We can expect an environment that allows this to be performed immediately to be implemented in the coming year.

NB: Please keep posting events here to keep building the case.

thanks a lot
Jerome

@calum-chamberlain
Author

This isn't rare at all in my experience. Consider the example below, which assumes that duplicate pick resource ids represent duplicated picks without checking other information explicitly:

```python
from collections import Counter

from obspy import UTCDateTime
from obspy.clients.fdsn import Client

client = Client("GEONET")
cat = client.get_events(starttime=UTCDateTime(2022, 1, 1), endtime=UTCDateTime(2022, 2, 1))

duplicate_pick_count = 0  # number of distinct pick IDs that occur more than once
events_with_duplicates = 0
for ev in cat:
    # Count how often each pick resource ID appears within this event
    pid_counts = Counter(str(p.resource_id) for p in ev.picks)
    duplicate_picks = [pid for pid, count in pid_counts.items() if count > 1]
    duplicate_pick_count += len(duplicate_picks)
    if duplicate_picks:
        events_with_duplicates += 1

print(f"From {len(cat)} events, {events_with_duplicates} had duplicate picks.")
print(f"Total duplicate picks: {duplicate_pick_count}")
```

For this month there were 2,101 events returned, of which 1,877 had duplicate picks; there were 15,395 duplicated pick IDs. Note that not all picks are duplicated, and it isn't obvious from my end which ones are. I haven't checked extensively whether only automatic picks are duplicated, but so far I haven't seen any duplicated manual picks.
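The loop above treats any repeated resource ID as a duplicate without comparing the picks themselves. A minimal client-side sketch, assuming picks expose `resource_id`, `time`, `phase_hint` and `waveform_id` attributes as ObsPy `Pick` objects do: it keeps the first copy of each ID and reports IDs whose copies differ in the compared fields. `deduplicate_picks` is a hypothetical helper, not part of ObsPy, and the demo uses `SimpleNamespace` stand-ins rather than real picks.

```python
from types import SimpleNamespace


def deduplicate_picks(picks, fields=("time", "phase_hint", "waveform_id")):
    """Keep the first pick for each resource ID.

    Returns (unique_picks, conflicting_ids), where conflicting_ids lists the
    IDs whose repeated copies differ in at least one of ``fields``.
    """
    first_seen = {}  # resource ID string -> first pick carrying that ID
    unique = []
    conflicting = []
    for pick in picks:
        pid = str(pick.resource_id)
        kept = first_seen.get(pid)
        if kept is None:
            first_seen[pid] = pick
            unique.append(pick)
            continue
        # Repeated ID: it is only a true duplicate if the compared fields match
        same = all(getattr(kept, f, None) == getattr(pick, f, None) for f in fields)
        if not same and pid not in conflicting:
            conflicting.append(pid)
    return unique, conflicting


if __name__ == "__main__":
    a = SimpleNamespace(resource_id="smi:x/1", time=0.0, phase_hint="P",
                        waveform_id="NZ.BFZ.10.HHZ")
    picks, conflicts = deduplicate_picks([a, SimpleNamespace(**vars(a))])
    print(len(picks), conflicts)  # 1 []
```

With ObsPy, something like `ev.picks, conflicts = deduplicate_picks(ev.picks)` inside the loop above should strip the repeats while flagging any ID whose copies are not actually identical.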

@salichon
Contributor

salichon commented Mar 13, 2023

@calum-chamberlain thanks a lot for making the case...
I went through this observation and would like to amend my comments above:
1 - This is not the same odd, random feature I was thinking of, where two picks for the same station are set in the SeisComPML file solutions within a very short delay (without exactly the same times).

2 - Indeed, 2,101 events returned, of which 1,877 had duplicate picks, is definitely more than I was expecting (see above!).

So...
3 - With regard to #105 (comment),
event file verification gives:
- for SeisComPML (the base/reference event format): only one BFZ pick
- for FDSN XML (the event format converted in flight for the web services): duplicated BFZ picks (*swear word here)

e.g.

SeisComPML event file seiscompml07.s3-website-ap-southeast-2.amazonaws.com/2022p788472.xml:

```
grep -A3 -E "publicID=\"20221019.173832.28-AIC-NZ.BFZ.10.HHZ" dup2022p788472.xml

<pick publicID="20221019.173832.28-AIC-NZ.BFZ.10.HHZ">
  <time>
    <value>2022-10-19T17:38:32.28Z</value>
  </time>
```

FDSN XML event file https://service.geonet.org.nz/fdsnws/event/1/query?eventid=2022p788472:

```
grep -A3 -E "publicID=\"smi:nz.org.geonet/20221019.173832.28-AIC-NZ.BFZ.10.HHZ" dup2022p788472_fdsn.xml

      <pick publicID="smi:nz.org.geonet/20221019.173832.28-AIC-NZ.BFZ.10.HHZ">
        <time>
          <value>2022-10-19T17:38:32.28Z</value>
        </time>
--
      <pick publicID="smi:nz.org.geonet/20221019.173832.28-AIC-NZ.BFZ.10.HHZ">
        <time>
          <value>2022-10-19T17:38:32.28Z</value>
        </time>
--
```

"Arrival" for BFZ is unique in both xml files
NB, for clarity (and for other users): Picks are not Arrivals. Arrivals are used in the event location; Picks may or may not be associated with that solution.
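The check described above (the BFZ pick duplicated, its Arrival unique) can be scripted. A minimal standard-library sketch, assuming a QuakeML 1.2 document passed as a string or bytes; `pick_and_arrival_counts` and `duplicated_picks` are hypothetical helpers, and tags are matched by local name so the QuakeML namespaces need not be spelled out.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def _local(tag):
    """Return a tag name without the '{namespace}' prefix ElementTree adds."""
    return tag.rsplit("}", 1)[-1]


def pick_and_arrival_counts(xml_text):
    """Count pick publicIDs and arrival pickID references in a QuakeML document.

    Returns two Counters: how many <pick> elements carry each publicID, and
    how many <arrival> elements reference each pick through <pickID>.
    """
    root = ET.fromstring(xml_text)
    picks = Counter()
    arrivals = Counter()
    for elem in root.iter():
        name = _local(elem.tag)
        if name == "pick":
            picks[elem.get("publicID")] += 1
        elif name == "arrival":
            for child in elem:
                if _local(child.tag) == "pickID":
                    arrivals[(child.text or "").strip()] += 1
    return picks, arrivals


def duplicated_picks(xml_text):
    """Map each repeated pick ID to (pick count, arrival reference count)."""
    picks, arrivals = pick_and_arrival_counts(xml_text)
    return {pid: (n, arrivals[pid]) for pid, n in picks.items() if n > 1}
```

Per the observation above, running `duplicated_picks` on the FDSN output for 2022p788472 would be expected to list the BFZ pick with two pick copies and a single arrival reference.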

First conclusion/assumption:
- the SeisComPML-to-FDSN-XML event file conversion looks buggy

Actions:

  • Replicate with the GeoNet conversion process

  • Confirm the pick hiccup, confirm a bug and its extent.

  • Remediate

  • Since this is most likely not related to the archived event files (SeisComPML), this could be fixed "easily" --> bug reports and fix. @calum-chamberlain

FYI @elidana - to me, this looks like an FDSN event XML service question to address.

dup2022p788472.xml.txt
dup2022p788472_fdsn.xml.txt

@salichon
Contributor

This may relate to this: https://github.com/GeoNet/fdsn/tree/main/cmd/fdsn-quake-consumer

@elidana
Contributor

elidana commented Mar 14, 2023

@calum-chamberlain thanks a lot from my end as well for raising this.
And good catch @salichon about the issue being only on the FDSN service side of things! Definitely agree that this is reassuring.
@calum-chamberlain, have you noticed this issue only recently, or is it something you have encountered for a while?
We have upgraded the event service about a month ago (main change from the user perspective was the addition of the event_type), so trying to pinpoint the timeframe might help with troubleshooting.

@calum-chamberlain
Author

@elidana I am pretty sure this is a recent thing, probably within the one-month timeframe, but I'm not certain. At least, I haven't had to cope with it until this year.

@elidana
Contributor

elidana commented Mar 14, 2023

thanks @calum-chamberlain , that's very useful to know!

@junghao

junghao commented Mar 15, 2023

@salichon @elidana
The QuakeML files are generated by the xsltproc tool (http://xmlsoft.org/xslt/xsltproc.html), not really by our own code.

You can try installing xsltproc on your computer and running the command:

xsltproc sc3ml_0.11__quakeml_1.2.xsl 2022p788472.xml 

where sc3ml_0.11__quakeml_1.2.xsl is provided by SeisComP3 here: https://github.com/SeisComP3/seiscomp3/blob/master/src/trunk/libs/xml/0.11/sc3ml_0.11__quakeml_1.2.xsl

I have tested the command above and got the same result (duplicated picks).
I'm not sure whether it's due to a bug (in xsltproc or the XSL file) or whether the source data (SC3ML) causes it.

I guess we could file a bug with SeisComP3, but it seems it would need some elaboration about the event, so it's probably not something I can do?

@salichon
Contributor

salichon commented Mar 15, 2023

Super @junghao, thanks for that.
Note 1: This repository was archived by the owner on Oct 14, 2022 and is now read-only.
We should now point to https://github.com/SeisComP/common/tree/master/libs/xml/0.11 as the point of reference and currently maintained repository,
and more generally to https://github.com/SeisComP/common/tree/master/libs/xml (0.12 is on the horizon :)

@junghao

junghao commented Mar 15, 2023

Thanks @salichon .

Based on this PR, SeisComP/common@132fc95#diff-e60a95631541fd9f0ff3a585e41fc7d493aec710d6f75c3e0edc3d7891d5ff71R262, I'm not sure whether the comment (we exclude picks already referenced in amplitudes) applies to our case. (If it does, it seems it didn't fix it.)

@salichon
Contributor

salichon commented Mar 15, 2023

@junghao I confirm the conversion produces the same output, thanks.
The only difference so far (with xsltproc) is the "agency name": from the repo it is "org.gfz-potsdam.de/geofon/",
so we would have to use the option "--stringparam ID_PREFIX smi:nz.org.geonet" instead, I suppose.
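To reproduce the conversion from a script, a minimal sketch that assembles the xsltproc call discussed above, passing the stylesheet's `ID_PREFIX` parameter through xsltproc's `--stringparam` option and an output file through `-o`. `xsltproc_command` and `convert` are hypothetical helpers; `convert` assumes xsltproc is installed and on the PATH, and the prefix value is simply the one suggested in this comment.

```python
import subprocess


def xsltproc_command(xsl_path, sc3ml_path, id_prefix=None, output_path=None):
    """Build an xsltproc argument list for an SC3ML -> QuakeML transform.

    ``id_prefix`` is passed to the stylesheet's ID_PREFIX parameter and
    ``output_path`` maps to ``-o``; options must precede the stylesheet.
    """
    cmd = ["xsltproc"]
    if output_path is not None:
        cmd += ["-o", output_path]
    if id_prefix is not None:
        cmd += ["--stringparam", "ID_PREFIX", id_prefix]
    cmd += [xsl_path, sc3ml_path]
    return cmd


def convert(xsl_path, sc3ml_path, id_prefix=None):
    """Run xsltproc and return the QuakeML text written to stdout."""
    result = subprocess.run(
        xsltproc_command(xsl_path, sc3ml_path, id_prefix),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `convert("sc3ml_0.11__quakeml_1.2.xsl", "2022p788472.xml", id_prefix="smi:nz.org.geonet")` should mirror the command-line test above, with the GeoNet prefix instead of the GFZ default.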

For event 2022p788472, the pick ID output gives 40 pick IDs (including duplicated ones such as BFZ),
as opposed to the 29 picks in the original.

Duplication as shown in that screenshot

This shows a "sort of" random duplication

Pick ID files are attached: qml1.txt for QuakeML and scml1.txt for the original.

qml1.txt
scml1.txt
early comments: What the damned!
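The 40-versus-29 comparison can be scripted along these lines. A minimal sketch, assuming the attached qml1.txt and scml1.txt each hold one pick ID per line (their layout is not shown here) and that QuakeML IDs differ from the SeisComPML ones only by a prefix such as `smi:nz.org.geonet/`, as in the BFZ example; `read_ids` and `compare_pick_ids` are hypothetical helpers.

```python
from collections import Counter


def _strip_prefix(pid, prefix):
    """Drop a leading ID prefix (e.g. 'smi:nz.org.geonet/') if present."""
    return pid[len(prefix):] if prefix and pid.startswith(prefix) else pid


def read_ids(path):
    """Read one pick ID per line, ignoring blank lines (assumed file layout)."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def compare_pick_ids(converted_ids, original_ids, prefix=""):
    """Compare pick IDs from a converted QuakeML file with the original IDs."""
    counts = Counter(_strip_prefix(pid, prefix) for pid in converted_ids)
    original = set(original_ids)
    return {
        "converted_total": sum(counts.values()),
        "converted_unique": len(counts),
        "original_total": len(original_ids),
        # IDs that appear more than once in the converted output
        "repeated": {pid: n for pid, n in counts.items() if n > 1},
        # IDs known to only one side (ideally both lists are empty)
        "only_in_converted": sorted(set(counts) - original),
        "only_in_original": sorted(original - set(counts)),
    }
```

Usage would be `compare_pick_ids(read_ids("qml1.txt"), read_ids("scml1.txt"), prefix="smi:nz.org.geonet/")`; a faithful conversion should give equal totals and an empty `repeated` map.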

@salichon
Contributor

  • Tried version 0.10 of the XSL: same output.

Can we confirm that it's an XSL template bug?

@junghao

junghao commented Mar 15, 2023

I also used Xalan with the same XSL template to do the transform and still got the duplicated picks.
Pretty sure it's an XSL template bug.

@salichon
Contributor

salichon commented Mar 15, 2023

@junghao
Now

  • seiscomp tools sccnv as in https://www.seiscomp.de/doc/apps/sccnv.html and its utility to convert into the qml by default
    output is correct from the seiscompml conversion

  • Conversion using "similar" but different SC3ML 0.12 to QuakeML 1.2 stylesheet converter (Real-Time version)
    xsltproc -v -o test0.12-RT sc3ml_0.12__quakeml_1.2-RT.xsl dup2022p788472_0.12.xml

output is correct from the seiscompml conversion

  • Would you confirm?: Yup confirmed (less picks)

See attached files
test0.12-RT.txt
sc3ml_0.11__quakeml_1.2-RT.xsl.txt

So, questions:

  • Are we using the correct stylesheet converter (0.11-quakeml1.2.xsl vs 0.11-quakeml1.2-RT.xsl (2023))?
  • Do the non-RT stylesheets sc3ml_0.*__quakeml_1.2.xsl contain... a bug or a feature...?

Ref source: https://github.com/SeisComP/common/tree/master/libs/xml

I have now provided some additional context for the questions above.

@salichon
Contributor

salichon commented Mar 22, 2023

Hi @junghao - no feedback from Gempa/users through the community channel yet,
so, an assumption from the docs (https://www.seiscomp.de/doc/apps/sccnv.html):

  • GeoNet might use the RT scheme, as in
    $ xsltproc -o quakeml.xml $SEISCOMP_ROOT/share/xml/0.12/sc3ml_0.12__quakeml_1.2-RT.xsl scml.xml
    as opposed to the one currently used

I'll confirm with Stephan et al. to get a solid answer on this.
cheers
j

@salichon
Contributor

@calum-chamberlain @junghao
Stephan is working on a solution

Picks are included in the resulting QuakeML file if they are either referenced by an Arrival or by an Amplitude. 
The XSLT already handles the case where the same Pick is referenced by an Amplitude and an Arrival, 
however it falls short in case the same Pick is referenced by different Amplitudes.
The RT version of the XSLT does not produce duplicated picks because in QuakeML-RT,
similar to SeisComP, Picks are top level elements independent of Events.
In QuakeML (non-RT) the Picks must be moved below the Event element 
and since the SeisComPML may contain multiple Events, references to the Picks via Event/OriginReference
and corresponding Origin/Arrival and Origin/StationMagnitude/Amplitude must be evaluated. 

I’ll try to improve the converter in this regards.

The QuakeML-RT converter is not an appropriate solution for your use case, since the FDSNWS event standard dictates QuakeML (non-RT).
  • Note: RT-quakeml xslt is not appropriate for our needs.
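Stephan's explanation can be illustrated with a toy model. This is not the SeisComP XSLT: it is a hypothetical Python rendering of the selection rule as described, with origins, station magnitudes and amplitudes reduced to plain dicts whose keys (`arrivals`, `stationMagnitudes`, `amplitudeID`, `pickID`) loosely follow the SeisComP/QuakeML element names. `picks_naive` reproduces the described shortcoming (a pick already referenced by an Arrival is skipped, but one referenced by two Amplitudes is emitted twice); `picks_deduplicated` collects each referenced pick ID once.

```python
def picks_naive(origin, amplitudes):
    """Emit arrival picks, then one pick per amplitude unless an arrival has it.

    Two amplitudes pointing at the same pick therefore emit it twice.
    """
    arrival_picks = [a["pickID"] for a in origin["arrivals"]]
    out = list(arrival_picks)
    for sm in origin["stationMagnitudes"]:
        pid = amplitudes[sm["amplitudeID"]]["pickID"]
        if pid not in arrival_picks:
            out.append(pid)
    return out


def picks_deduplicated(origin, amplitudes):
    """Collect every referenced pick ID once, preserving first-seen order."""
    seen = dict.fromkeys(a["pickID"] for a in origin["arrivals"])
    for sm in origin["stationMagnitudes"]:
        seen.setdefault(amplitudes[sm["amplitudeID"]]["pickID"])
    return list(seen)
```

Using an insertion-ordered dict as an ordered set keeps the output order stable while guaranteeing each pick is written once, which is the property the fixed stylesheet needs for non-RT QuakeML.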

@salichon
Contributor

salichon commented Mar 26, 2023

This will now rely on acceptance and deployment of the XML stylesheet into the SeisComP sources and the GeoNet services.

  • Apply fix when deployed and close

@salichon
Contributor

salichon commented Apr 3, 2023

@calum-chamberlain @junghao
the duplication issue is resolved with an update of the XSL templates: https://github.com/SeisComP/common/tree/master/libs/xml

This stylesheet fix needs to be propagated to the GeoNet services
to fix the GeoNet FDSN event service.

  • Update in progress (April 2023): polishing in action before deployment

@junghao

junghao commented Apr 3, 2023

@salichon They have sc3ml_0.12 files, do we want to add them to our FDSN as well?

@salichon
Contributor

salichon commented Apr 4, 2023

At the moment we're about to go to 4, and then, I reckon, pretty quickly to the versions above @junghao

@junghao junghao self-assigned this Apr 5, 2023
@salichon
Contributor

salichon commented May 15, 2023

Kia Ora @junghao
thanks, this can most likely be closed soon once deployed (may need more testing?)
thanks a lot and sorry for the delay

  • Test on event 2022p788472

  • Dev version uploaded; pick info extracted and compared to the original
    dev.txt
    ori.txt

  • Original version has duplicated picks

  • Dev version has unique pick information, consistent with the original but without duplication

Verdict: Dev looks good

@salichon
Contributor

salichon commented May 17, 2023

@calum-chamberlain Kia Ora
The updated stylesheet template was deployed; we will monitor it over the next few days.

  • The fix will apply to any newly generated events (QuakeML) for the moment

  • It will be applied to the older-than-May-2023 QuakeML events later; lower priority at the moment

Please keep us informed whether this is going okay for you, and send any feedback.

Once the happiness level is reached we'll close this ticket.

@salichon salichon changed the title Duplicate picks in events from FDSN Duplicate picks in events from FDSN- Post May2023 Jun 1, 2023
@salichon salichon changed the title Duplicate picks in events from FDSN- Post May2023 Duplicate picks in events from FDSN- ficxed from Post May2023 Jun 1, 2023
@salichon salichon added the Solved label Jun 1, 2023
@salichon salichon changed the title Duplicate picks in events from FDSN- ficxed from Post May2023 Duplicate picks in events from FDSN- fixed from Post May2023 Jun 1, 2023
@salichon salichon changed the title Duplicate picks in events from FDSN- fixed from Post May2023 Duplicate picks in events from FDSN- (fixed from Post May2023 onward) Jun 1, 2023
@elidana
Contributor

elidana commented Oct 18, 2023

Enough time has passed, and it looks like the issue is now fixed, so I think the happiness level has now been reached.

Closing this; @salichon, please reopen if that's not the case!

@elidana elidana closed this as completed Oct 18, 2023
@salichon salichon reopened this Oct 19, 2023
@salichon
Contributor

salichon commented Oct 19, 2023

@elidana this is not entirely resolved.
Although the problem is fixed for new events, the QuakeML database needs to be recomputed to "fix" the event
XML content from before May 2023.
AFAIK, ALL XSL templates were corrected with that patch.
:)

4 participants