FDSN Request: provide metadata for stations even when they do not have complete metadata. #103

Closed
calum-chamberlain opened this issue Jul 11, 2022 · 27 comments


@calum-chamberlain

When downloading station metadata from the FDSN webservice, stations that have no information at "channel" or "response" level return nothing for requests made with level=channel, but they do return (basic) information when requests are made with level=station. In the case of station WTSZ, the query:
https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=channel&format=text
returns nothing, but the query:
https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=station&format=text
returns the station location.

It would be helpful (to me) if the basic station information was returned for all stations, regardless of whether their metadata are complete. I appreciate that this may not be what everyone wants, so if there is good reason not to do this, or if this goes against FDSN protocol then I'm fine with it not changing, but wanted to at least post this somewhere so that others might find this before thinking that there were fewer stations active at a given time.

In my case, I query the station service to work out which stations are active, then look up the waveforms for those stations. I think that this is common practice (and is done by the ObsPy FDSN mass downloader), so it might help to provide all the metadata that are available, even if those metadata are incomplete.

@mnaguit

mnaguit commented Jul 13, 2022

Hi @calum-chamberlain
Thanks for posting this query. Indeed, it is helpful for other end users working on the same scenario to receive more info on this. We will check the metadata for WTSZ (or other stations of the same case) and will provide further details soon.

FYI @salichon

@calum-chamberlain
Author

WTSZ may not be the best station to worry about due to #101 - but it would be worth checking which stations are missing in this query (channel level):
https://service.geonet.org.nz/fdsnws/station/1/query?station=*&level=channel&format=text
vs this query (station level):
https://service.geonet.org.nz/fdsnws/station/1/query?station=*&level=station&format=text

I noticed this particularly for stations whose start time is earlier than their earliest channel start time, even though data are available for that period.
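A rough sketch of that comparison, assuming the two text responses have already been downloaded. The sample responses below are made up for illustration (network XX, stations AAAA and BBBB are hypothetical), and the helper names are not part of any library.

```python
def station_codes(fdsn_text):
    """Return the set of (network, station) pairs in an FDSN text response."""
    codes = set()
    for line in fdsn_text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#'-prefixed header row.
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        if len(fields) >= 2:
            codes.add((fields[0].strip(), fields[1].strip()))
    return codes


def missing_at_channel_level(station_text, channel_text):
    """Stations present at level=station but absent at level=channel."""
    return station_codes(station_text) - station_codes(channel_text)


# Hypothetical sample responses, not real service output.
STATION_LEVEL = (
    "#Network | Station | Latitude | Longitude | Elevation | SiteName | StartTime | EndTime\n"
    "XX|AAAA|-41.0|174.0|10.0|Site A|2010-01-01T00:00:00|\n"
    "XX|BBBB|-42.0|173.0|20.0|Site B|2012-01-01T00:00:00|\n"
)
CHANNEL_LEVEL = (
    "#Network | Station | Location | Channel | Latitude | Longitude | Elevation | Depth | "
    "Azimuth | Dip | SensorDescription | Scale | ScaleFreq | ScaleUnits | SampleRate | "
    "StartTime | EndTime\n"
    "XX|AAAA|10|HHZ|-41.0|174.0|10.0|0.0|0.0|-90.0|Hypothetical sensor|1.0|1.0|m/s|100.0|"
    "2010-01-01T00:00:00|\n"
)

print(missing_at_channel_level(STATION_LEVEL, CHANNEL_LEVEL))
```

With real data, the two strings would be the bodies returned by the two wildcard queries above.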

@salichon
Contributor

salichon commented Jul 13, 2022

Hello @calum-chamberlain!
thanks @mnaguit

  • 1 About incomplete/partial information provided by the FDSN station service:
    to me, this service is the front end of the Delta metadata GeoNet public repository (https://github.com/GeoNet/delta)

The FDSN station service relies on processes that build the XML information from the delta repo in compliance with the StationXML format.
As a consequence, any partial or missing piece of information limits what the service can provide, anywhere from a reduced level of service to none.
(That is what we have always checked at the end of our instrument change procedure since 2016/2017.)

So your idea, "It would be helpful (to me) if the basic station information was returned for all stations, regardless of whether their metadata are complete",
is possible without too much pain on the delta git repo, but not, to my knowledge, over a downstream service bound by standards.

  • 2 WTSZ, as referred to in "Stations WQSZ WMSZ WPSZ WDSZ WTSZ - Alpine Temp experiment - data metadata hosted by IRIS/Geonet -actions" (#101), was entered as temporary instrumentation in 2015, and its metadata were not maintained as they would be for a national/regional permanent station: responsibilities got lost and information was forgotten over time, given its temporary and non-GeoNet status.
    For instance, WTSZ is/was a seismic site that was never closed (it most likely ended in 2017) and whose instrument response was never actually finalised.
    I have been working on that family of mount/instrumentation to at least distribute this borehole instrument response.
    I think this needs to be bound to the datalogger, existing as a combo (Reftek).

So solutions exist :) though they require some proper work to be adequate and durable.
It would be great if we could make progress on that legacy area of temporary and experimental stations.

As a conclusion (sorry for the length):

Could you also detail in this ticket what, in your view, is the minimum basic information required?

cheers
regards

FYI @JonoHanson @ozym @staylorofford

@calum-chamberlain
Author

Thanks Jerome, I don't follow all of that, and I think that the WTSZ/Whataroa things are for a different issue.

Just to be clear, I'm not asking for all the metadata to be complete - I get that for the legacy data that is likely impossible. What I would like is for a request like:
https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=channel&format=text

where there are no channel or response level metadata to return the station information that is available, e.g.:

#Network | Station | Location | Channel | Latitude | Longitude | Elevation | Depth | Azimuth | Dip | SensorDescription | Scale | ScaleFreq | ScaleUnits | SampleRate | StartTime | EndTime
NZ|WTSZ|||-43.302000|170.412000|100.000000||||||||2014-06-23T00:00:05|

and similarly for the StationXML. For example, it would be good (in my mind) if the following two calls returned the same set of stations for stations without channel-level metadata:

from obspy.clients.fdsn import Client

client = Client("GEONET")
kwargs = dict(station="WTSZ", network="NZ")

# Same query at two levels of detail.
inv = client.get_stations(level="station", **kwargs)
inv_channel = client.get_stations(level="channel", **kwargs)

# Both inventories should contain the same station codes.
assert {sta.code for net in inv for sta in net} == {
    sta.code for net in inv_channel for sta in net}

@salichon
Contributor

salichon commented Jul 14, 2022

@CallumNZ
Okay, got it -
In that case it's more related to the specs of the StationXML service, and to what "https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=channel&format=text" returns as minimum info for a query, instead of a blank or something else.
Something to check with the sys dev team that manages the web parts of the service
... if I got it right ;)

By the way:
I had a closer look at the temporary station WTSZ.
In that case it is also missing the equivalent of the naming of the streams in delta.
So it seems legitimate that it is not provided
(hence blocking part of the (incomplete) information).
This can be modified for that case (until further work is done).

@salichon
Contributor

salichon commented Jul 14, 2022

Hello @sue-h-gns @junghao

FDSN station service query output question

Does the FDSN station service query mechanism allow for returning higher-level information when the lower-level query is "empty"?
e.g.:
level channel (low/more detailed)
"https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=channel&format=text"
vs
level station (high/less detailed)
"https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=station&format=text"

Is it limited by the specs of the service?

thank you cheers!

@calum-chamberlain
Author

@salichon I think you are tagging the wrong Calum.

@junghao

junghao commented Jul 14, 2022

The FDSN specification doesn't mention this situation.
FDSN spec only defines HTTP status 204 as "Request was properly formatted and submitted but no data matches the selection".

I met this dilemma while developing the station service: when there's no channel, should we simply respond with 204, or respond with the higher-level metadata? I chose the former. The idea was to let the client (presumably some application) know that there's nothing there, instead of providing a response and leaving the client to parse it and figure out the truth.

Also, I'm not sure whether all clients (again, applications) can cope with empty channel names when requesting level=channel.

However, since it's not well defined, we can discuss what would benefit users most.
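For comparison, a client can work around a 204 on its own side by retrying one level up. The sketch below assumes a hypothetical `fetch` callable that takes the query parameters and returns `(http_status, body_text)`; it stands in for a real HTTP client and is not part of any FDSN library.

```python
def query_with_fallback(fetch, station):
    """Ask for channel-level metadata; fall back to station level on HTTP 204.

    `fetch` is a hypothetical callable: fetch(params) -> (status, body).
    Returns (level_answered, body), or (None, "") if nothing matched at all.
    """
    params = {"station": station, "level": "channel", "format": "text"}
    status, body = fetch(params)
    if status == 200:
        return "channel", body
    if status == 204:
        # No channel matched: retry one level up rather than treating
        # the station as nonexistent.
        params["level"] = "station"
        status, body = fetch(params)
        if status == 200:
            return "station", body
        if status == 204:
            return None, ""
    raise RuntimeError(f"unexpected HTTP status {status}")


def fake_fetch(params):
    """Stand-in for a service that only has station-level metadata."""
    if params["level"] == "station":
        return 200, "XX|AAAA|-41.0|174.0|10.0|Site A|2010-01-01T00:00:00|"
    return 204, ""


def empty_fetch(params):
    """Stand-in for a service with no matching metadata at any level."""
    return 204, ""


level, body = query_with_fallback(fake_fetch, "AAAA")
print(level)
```

This costs the client a second request per station, which is part of the trade-off between the two server behaviours described above.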

@calum-chamberlain
Author

Thanks for that @junghao - my expectation (from the seismology side, rather than from how the data are handled in the backend) is that if some data (e.g. station information) but not all (e.g. no channel information) were available, then the available data would be returned. I get that that is not how "data" is defined for the backend, but with the hierarchical structure of StationXML it makes sense (in my opinion) to fall back to returning all metadata at or above the level requested (e.g. level=channel should return channel and station).

But that is just one biased opinion. I don't know how other organisations handle this, or what other seismologists think of this. I imagine @SquirrelKnight might have an opinion on this, as might others. Happy to ask around for opinions if it would help?

@ozym

ozym commented Jul 14, 2022

The problem I see is that the "Channel" level of the stationxml schema http://docs.fdsn.org/projects/stationxml/en/latest/reference.html#channel has the Code as required. When there is no code available this level can't be formed.

So the question is: does a service requesting data at the channel level get back what it would get if asking for the station level, or nothing (as is the case now)?

It is sort of saying "give me everything down to the channel level, and for those that don't have channel info, just match to the station level", etc.

But, this then has implications for wild-carding. Do you return station information that doesn't have channels that match the wildcards or do you skip those stations?

Or do you treat the case where no wildcards are given as a special case?
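The two wildcard policies can be put side by side in a small sketch. The inventory layout (station code mapped to a list of channel codes) and the `keep_channelless` flag are assumptions made for illustration, not how the service is implemented. Matching uses `fnmatchcase`, which supports the `*` and `?` wildcards (and also `[...]` sets).

```python
from fnmatch import fnmatchcase


def select_stations(inventory, channel_pattern="*", keep_channelless=True):
    """Select stations whose channels match a wildcard pattern.

    inventory: dict mapping station code -> list of channel codes
               (an empty list means no channel metadata exists).
    keep_channelless: if True, stations with no channel metadata are
               returned with an empty channel list instead of being skipped.
    """
    selected = {}
    for station, channels in inventory.items():
        matches = [c for c in channels if fnmatchcase(c, channel_pattern)]
        if matches:
            selected[station] = matches
        elif not channels and keep_channelless:
            # No channel metadata at all: we cannot tell whether the
            # pattern would match, so return the station on its own.
            selected[station] = []
    return selected


# Hypothetical inventory: BBBB has no channel metadata.
inventory = {"AAAA": ["HHZ", "HHN"], "BBBB": [], "CCCC": ["EHZ"]}

print(select_stations(inventory, "HH?"))
print(select_stations(inventory, "HH?", keep_channelless=False))
```

Stations that do have channel metadata but no matching channel (CCCC here) are skipped under both policies; only the channel-less case differs.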

@calum-chamberlain
Author

Good point on wildcarding - in my opinion the current option (not returning metadata for stations that do not have channel metadata even though there are data that might match the requested channel) is worse than returning station metadata that may not match the requested channel. That (biased) opinion is based on having ignored relevant stations in my research because I did not know that they were missing channel metadata and were not included in the stationxml because of this.

@junghao

junghao commented Jul 15, 2022

When querying at channel level, the output fields should carry the channels' information, so the expected output would be ambiguous:

#Network | Station | Location | Channel | Latitude | Longitude | Elevation | Depth | Azimuth | Dip | SensorDescription | Scale | ScaleFreq | ScaleUnits | SampleRate | StartTime | EndTime
NZ|WTSZ|||-43.302000|170.412000|100.000000||||||||2014-06-23T00:00:05|

The latitude/longitude/time fields are supposed to reflect the channel's metadata, not the station's. When there's no channel, they should be empty.

So if we're going to respond with common metadata, then the output would be:

#Network | Station | Location | Channel | Latitude | Longitude | Elevation | Depth | Azimuth | Dip | SensorDescription | Scale | ScaleFreq | ScaleUnits | SampleRate | StartTime | EndTime
NZ|WTSZ||||||||||||||

@calum-chamberlain
Author

Good point @junghao - I mostly care about the StationXML returned rather than the text output, and the StationXML should contain the station location. Nevertheless, returning just the network and station for text would be helpful. It might help users of the text output to fill the unknown fields with "unknown"? Although that would be a clear change that would affect other things and might break other people's code/work.

@calum-chamberlain
Author

In this issue it was pointed out that the preferred response for empty metadata is suggested in the FDSN spec:

In cases where the response is unknown, for example really old channels, or where a response is not applicable, like textual log channels, it is preferred that an empty response element be used, <Response/>, to positively indicate that no response exists.
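A minimal sketch of what a channel with an empty response element could look like, built with the standard library. The StationXML namespace and other elements are omitted for brevity; the channel code "HHZ", location code "10" and depth 0.0 are hypothetical placeholders, while the coordinates are the WTSZ station values quoted earlier in this thread.

```python
import xml.etree.ElementTree as ET


def channel_element(code, location_code, latitude, longitude, elevation, depth):
    """Build a bare-bones Channel element with an empty Response child."""
    chan = ET.Element("Channel", {"code": code, "locationCode": location_code})
    for tag, value in (
        ("Latitude", latitude),
        ("Longitude", longitude),
        ("Elevation", elevation),
        ("Depth", depth),
    ):
        ET.SubElement(chan, tag).text = str(value)
    # Empty element: positively states that no response is known.
    ET.SubElement(chan, "Response")
    return chan


# "HHZ", "10" and depth 0.0 are placeholders, not real WTSZ metadata.
chan = channel_element("HHZ", "10", -43.302, 170.412, 100.0, 0.0)
xml_text = ET.tostring(chan, encoding="unicode")
print(xml_text)
```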

@salichon
Contributor

salichon commented Aug 2, 2022

@junghao @ozym what do you think about the FDSN spec suggestion above, i.e. an empty <Response></Response> element (or otherwise empty) when not applicable? (It might have to be limited to very specific elements such as the response one.)

@ozym

ozym commented Aug 2, 2022

That's fine for stationxml but it won't help when the text format is being used (as discussed above).

@ozym

ozym commented Aug 2, 2022

There is still something odd with the input/output of StationXML. I think it's the requirement that there be at least one stream attached to the site.

@calum-chamberlain
Author

Yes channel is required, and channel requires latitude, longitude, elevation and depth, but each of these attributes can also be empty if they are unknown, which I assume is the issue here?

Agreed that it won't help with the text output, but this should just be consistent with the stationxml format (so empty everything except network code and station code as @junghao suggested?).

@ozym

ozym commented Aug 2, 2022

The problem is that the channel needs a code, e.g. "HHZ", which is the bit that is missing; we generally know all the rest. So this issue is not so much about the response as about knowing what was recorded.

@ozym

ozym commented Aug 2, 2022

But the way the system is written, even if the code is given it will then look up a response and skip the channel if it can't find one. So this is likely to be an area that can be improved now.

@calum-chamberlain
Author

calum-chamberlain commented Aug 2, 2022

The problem is that the channel needs a code, i.e. "HHZ" etc. which is the bit missing, we generally know all the rest.

But don't you have this information in the waveforms, along with the location code? Apologies if I'm missing something else there and being naive!

@ozym

ozym commented Aug 2, 2022

The issue is that they are disconnected. There is no list of waveforms, just a list of sensors, a list of dataloggers, and a list of times. They need to be joined together to essentially predict what the channel codes will be; this is where the "response" element comes in. It says something like "a broadband sensor will have 3 components called Z, N, E (or whatever)", then there will be something else that says this instrument records a 100 Hz stream which has a sensor attached to it, and because it's a broadband it will be called HH. So this makes up the HHZ etc. However, if there's a bit missing (due to not knowing the sensor or datalogger types) then the join doesn't happen and it looks like there's no channel available.
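A toy version of that join, to make the failure mode concrete. The lookup table and band-code thresholds below are a simplified reading of SEED channel naming and are assumptions for illustration only; they are not the actual delta logic.

```python
# Hypothetical lookup: sensor type -> SEED instrument code (simplified).
INSTRUMENT_CODES = {"broadband": "H", "short-period": "H", "accelerometer": "N"}
COMPONENTS = "ZNE"


def band_code(sensor_type, sample_rate):
    """Simplified SEED band code from sensor type and sample rate (Hz)."""
    long_period_capable = sensor_type in ("broadband", "accelerometer")
    if 80 <= sample_rate < 250:
        return "H" if long_period_capable else "E"
    if 10 <= sample_rate < 80:
        return "B" if long_period_capable else "S"
    return None  # other sample rates are left out of this sketch


def predict_channel_codes(sensor_type, sample_rate):
    """Join sensor and datalogger information into channel codes.

    Returns an empty list when either side of the join is unknown,
    which is how a station ends up looking like it has no channels.
    """
    if sensor_type not in INSTRUMENT_CODES or sample_rate is None:
        return []
    band = band_code(sensor_type, sample_rate)
    if band is None:
        return []
    return [band + INSTRUMENT_CODES[sensor_type] + c for c in COMPONENTS]


print(predict_channel_codes("broadband", 100))  # sensor and datalogger known
print(predict_channel_codes(None, 100))         # sensor type unknown
```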

@ozym

ozym commented Aug 2, 2022

I think in some ways the hold-up may be more along the lines of "we don't know the full response, so we're not going to even start the process", rather than saying "we know enough to at least determine what the code will be, so just give an empty response" (as suggested above).

@ozym

ozym commented Aug 2, 2022

I've been slowly working on a rewrite of the backend code. This scenario will be much easier to handle there, as the current system has some hidden assumptions and logic steps.

@salichon
Contributor

salichon commented Feb 12, 2023

  • (update) Some Work in Progress ...

@junghao

junghao commented Mar 15, 2023

The update has been deployed.

$ curl "https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=channel&format=text"
#Network | Station | Location | Channel | Latitude | Longitude | Elevation | Depth | Azimuth | Dip | SensorDescription | Scale | ScaleFreq | ScaleUnits | SampleRate | StartTime | EndTime
NZ|WTSZ||||||
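For anyone consuming this text output, a tolerant parser might look like the sketch below. It assumes rows may carry fewer fields than the header (as in the WTSZ row above) and pads the remainder with empty strings; the function names are illustrative only.

```python
HEADER = (
    "#Network | Station | Location | Channel | Latitude | Longitude | "
    "Elevation | Depth | Azimuth | Dip | SensorDescription | Scale | "
    "ScaleFreq | ScaleUnits | SampleRate | StartTime | EndTime"
)
# Header plus the WTSZ row shown above.
SAMPLE = HEADER + "\nNZ|WTSZ||||||\n"


def parse_rows(text):
    """Parse FDSN text output into dicts, padding short rows with ''."""
    names = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            names = [name.strip() for name in line.lstrip("#").split("|")]
            continue
        if names is None:
            raise ValueError("data row found before the header row")
        values = [value.strip() for value in line.split("|")]
        # Pad rows that carry fewer fields than the header declares.
        values += [""] * (len(names) - len(values))
        rows.append(dict(zip(names, values)))
    return rows


def is_station_only(row):
    """True when a channel-level row carries no channel code."""
    return not row["Channel"]


rows = parse_rows(SAMPLE)
print(rows[0]["Network"], rows[0]["Station"], is_station_only(rows[0]))
```

Rows flagged by `is_station_only` identify stations that exist but have no channel metadata yet.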

@elidana
Contributor

elidana commented Oct 18, 2023

Hi @calum-chamberlain , this should have been fixed back in March when we applied some improvements to the StationXML service.
I'm closing this, but please reopen if you are still having issues!

@elidana elidana closed this as completed Oct 18, 2023