FDSN Request: provide metadata for stations even when they do not have complete metadata. #103
Comments
Hi @calum-chamberlain FYI @salichon |
WTSZ may not be the best station to worry about due to #101 - but it would be worth checking which stations are missing in this query (channel level): I noticed this particularly for stations that have a starttime before their earliest channel starttime, but when data are available. |
Hello @calum-chamberlain!
The building of the FDSN station service relies on processes that build the XML information from the delta repo in compliance with the StationXML format. So, regarding your idea: "It would be helpful (to me) if the basic station information was returned for all stations, regardless of whether their metadata are complete"
So solutions exist :) though they require some proper work to be adequate and durable.
..... In conclusion (sorry for the length): would you also detail in that ticket what, in your view, is the minimum basic information required? Cheers |
Thanks Jerome, I don't follow all of that, and I think that the WTSZ/Whataroa things are for a different issue. Just to be clear, I'm not asking for all the metadata to be complete - I get that for the legacy data that is likely impossible. What I would like is for a request like: where there are no channel or response level metadata to return the station information that is available, e.g.:
and similar for the StationXML. For example, it would be good (in my mind) if the following two calls returned the same inventory for stations without channel-level metadata:
from obspy.clients.fdsn import Client

client = Client("GEONET")
kwargs = dict(station="WTSZ", network="NZ")
inv = client.get_stations(level="station", **kwargs)
inv_channel = client.get_stations(level="channel", **kwargs)
assert {sta.code for net in inv for sta in net} == {sta.code for net in inv_channel for sta in net}
|
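To see which stations actually drop out between the two levels, the comparison above could be turned into a small helper. This is only a sketch: `station_codes` and `missing_at_channel_level` are hypothetical names, and the code assumes nothing beyond what the snippet above already uses (iterating an inventory yields networks, iterating a network yields stations, and both have a `code` attribute).

```python
def station_codes(inventory):
    """Collect (network, station) code pairs from an ObsPy-style inventory.

    Only iteration and a ``code`` attribute are relied on, so any object
    shaped like Inventory -> Network -> Station works.
    """
    return {(net.code, sta.code) for net in inventory for sta in net}


def missing_at_channel_level(inv_station, inv_channel):
    """List stations present at level="station" but absent at level="channel"."""
    return sorted(station_codes(inv_station) - station_codes(inv_channel))
```

With a live client this would be called on the two inventories from the snippet above. Note that ObsPy raises `FDSNNoDataException` when the service answers 204, so a single-station request such as the WTSZ one may raise rather than return an empty inventory.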
@CallumNZ btw : |
Hello @sue-h-gns @junghao, a question about the FDSN station service query output: does the query mechanism allow returning higher-level information when the lower-level query is "empty"? Or is that limited by the specs of the service? Thank you, cheers! |
@salichon I think you are tagging the wrong Calum. |
The FDSN specification doesn't mention this situation. I met this dilemma while developing the station service: when there's no channel, should we simply respond with 204, or respond with the higher level of metadata? I chose the former. The idea was to let the client (presumably some application) know there's nothing there, instead of providing a response and letting it do the parsing and then figure out the truth. Also, I'm not sure all clients (again, applications) can cope with empty channel names when requesting level=channel. However, since it's not well defined, we can discuss what would benefit users most. |
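Given the 204 behaviour described above, a client could retry at a coarser level when a channel-level request comes back empty. The following is a minimal sketch, not part of the service or of any library: `get_stations_with_fallback` is a hypothetical helper, and the fetch function and the "no data" exception type are passed in as assumptions so that the sketch does not depend on a particular client.

```python
def get_stations_with_fallback(fetch, no_data_error, **query):
    """Try level="channel" first, then fall back to level="station".

    ``fetch`` is any callable accepting ``level=...`` plus query keywords,
    and ``no_data_error`` is the exception it raises on an empty (204)
    response. Both are injected by the caller.
    Returns a tuple of (result, level_actually_used).
    """
    try:
        return fetch(level="channel", **query), "channel"
    except no_data_error:
        # Nothing at channel level: ask again for station-level metadata.
        return fetch(level="station", **query), "station"
```

With ObsPy, `fetch` could be `client.get_stations` and `no_data_error` could be `FDSNNoDataException` from `obspy.clients.fdsn.header`. This only helps for single-station requests; in a multi-station request, stations without channel metadata would simply be absent from the result and nothing would be raised.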
Thanks for that @junghao - my understanding (from the seismology side rather than how the data are handled in the back end) is that if some data (e.g. station information) but not all (e.g. no channel information) were available, then those data would be returned. I get that that is not how "data" is defined for the backend, but with the hierarchical structure of StationXML it makes sense (in my opinion) to revert to returning all metadata that match. But that is just one biased opinion. I don't know how other organisations handle this, or what other seismologists think of this. I imagine @SquirrelKnight might have an opinion on this, as might others. Happy to ask around for opinions if it would help? |
The problem I see is that the "Channel" level of the StationXML schema http://docs.fdsn.org/projects/stationxml/en/latest/reference.html#channel has the code as a required attribute. When there is no code available this level can't be formed. So the question is: does a service requesting data at the channel level get back what it would get if asking for the station level, or nothing (as is the case now)? In other words: give me everything down to the channel level, and for those stations that don't have channel info, just match at the station level. But this then has implications for wildcarding. Do you return station information that doesn't have channels that match the wildcards, or do you skip those stations? Or do you treat the case of no wildcards given as a special case? |
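The wildcard question can be made concrete with a small sketch of the two policies. Everything here is hypothetical: `select_stations`, the `include_unknown` flag, and the dictionary layout are illustrations only, and station `AAAA` is made up. Python's `fnmatchcase` is used as a stand-in for the `*` and `?` wildcards; it also accepts `[...]` sets, which the service may not.

```python
from fnmatch import fnmatchcase


def select_stations(channels_by_station, pattern="*", include_unknown=True):
    """Pick stations whose channel codes match ``pattern``.

    ``channels_by_station`` maps a station code to its known channel codes;
    an empty list means no channel metadata exists for that station.
    With ``include_unknown=True`` such stations are kept (station-level
    match only); with ``False`` they are skipped, as the service does now.
    """
    selected = {}
    for station, codes in channels_by_station.items():
        if not codes:
            if include_unknown:
                selected[station] = []  # known station, unknown channels
            continue
        matched = [code for code in codes if fnmatchcase(code, pattern)]
        if matched:
            selected[station] = matched
    return selected
```

For a catalogue such as `{"WTSZ": [], "AAAA": ["HHZ", "EHZ"]}` and the pattern `"HH?"`, the first policy keeps WTSZ with an empty channel list, while the second returns only `AAAA`.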
Good point on wildcarding - in my opinion the current option (not returning metadata for stations that do not have channel metadata even though there are data that might match the requested channel) is worse than returning station metadata that may not match the requested channel. That (biased) opinion is based on having ignored relevant stations in my research because I did not know that they were missing channel metadata and were not included in the stationxml because of this. |
When querying at channel level, the output fields should describe the channel's information, so the expected output would be ambiguous:
The latitude/longitude/time fields are supposed to reflect the channel's metadata, not the station's. When there's no channel, they should be empty. So if we're going to respond with common metadata, then the output would be
|
Good point @junghao - I mostly care about the StationXML returned rather than the text output, which should contain the station location. Nevertheless, returning just the network and station for text would be helpful. It might help users who use the text output to fill the fields that are unknown with "unknown"? Although that would be a clear change that would affect other things and might break other people's code/work. |
In this issue it was pointed out that the preferred response for empty metadata is suggested in the FDSN spec:
|
That's fine for stationxml but it won't help when the text format is being used (as discussed above). |
There is still something odd with the input/output of StationXML; I think it's the requirement that there be at least one stream attached to the site. |
Yes channel is required, and channel requires latitude, longitude, elevation and depth, but each of these attributes can also be empty if they are unknown, which I assume is the issue here? Agreed that it won't help with the text output, but this should just be consistent with the stationxml format (so empty everything except network code and station code as @junghao suggested?). |
The problem is that the channel needs a code, i.e. "HHZ" etc. which is the bit missing, we generally know all the rest. So this issue will not be so much about the response, but knowing what was recorded. |
But the way the system is written, even if the code is given it will then look up a response and skip the channel if it can't find one. So this is likely to be an area which can be improved now. |
But don't you have this information in the waveforms, along with the location code? Apologies if I'm missing something else there and being naive! |
The issue is that they are disconnected. There is no list of waveforms, just a list of sensors, a list of dataloggers, and a list of times. They need to be joined together to essentially predict what the channel codes will be; this is where the "response" element comes in. It says something like "a broadband sensor will have 3 components called Z N E (or whatever)", then there will be something else that says this instrument records a 100 Hz stream, which has a sensor attached to it, and because it's a broadband it will be called HH. So this makes up the HHZ etc. However, if there's a bit missing (due to not knowing the sensor or datalogger types) then the join doesn't happen and it looks like there's no channel available. |
I think in some ways the hold-up may be more along the lines of "we don't know the full response so we're not going to even start the process" rather than saying we know enough to at least determine what the code will be, and just giving an empty response (as suggested above). |
I've been slowly working on a rewrite of the backend code. This scenario will be much easier to handle there, as in the current system there are some hidden assumptions and logic steps. |
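The join described above can be sketched roughly as follows. This is an illustration only, not the actual backend logic: `SENSOR_TYPES`, `band_code`, and `predict_channel_codes` are hypothetical names, the sensor table is invented for the example, and the band-code rule is a simplified version of the SEED naming convention covering only sample rates from 10 Hz up to (but not including) 250 Hz.

```python
# Hypothetical lookup: sensor type -> (instrument code, components,
# whether the sensor's corner period is long, i.e. 10 s or more).
SENSOR_TYPES = {
    "broadband": ("H", "ZNE", True),
    "short-period": ("H", "ZNE", False),
}


def band_code(sample_rate_hz, long_corner_period):
    """Simplified SEED band code for sample rates in [10, 250) Hz."""
    if 80 <= sample_rate_hz < 250:
        return "H" if long_corner_period else "E"
    if 10 <= sample_rate_hz < 80:
        return "B" if long_corner_period else "S"
    return None  # outside the ranges this sketch covers


def predict_channel_codes(sample_rate_hz, sensor_type):
    """Join datalogger sample rate and sensor type into channel codes.

    Returns an empty list when either piece is unknown, which mirrors the
    case where the join fails and no channel appears to be available.
    """
    sensor = SENSOR_TYPES.get(sensor_type)
    if sensor is None or sample_rate_hz is None:
        return []
    instrument, components, long_corner = sensor
    band = band_code(sample_rate_hz, long_corner)
    if band is None:
        return []
    return [band + instrument + component for component in components]
```

Under these assumptions a broadband sensor on a 100 Hz stream gives HHZ, HHN, and HHE, while an unknown sensor type gives no codes at all.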
|
The update has deployed.
|
Hi @calum-chamberlain, this should have been fixed back in March when we applied some improvements to the StationXML service. |
When downloading station metadata from the FDSN webservice, stations that do not have information at "channel" or "response" level have no information returned for them with level=channel, but they do return (basic) information when requests are made with level=station. In the case of station WTSZ the query:
https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=channel&format=text
returns nothing, but the query:
https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=station&format=text
returns the station location.
It would be helpful (to me) if the basic station information was returned for all stations, regardless of whether their metadata are complete. I appreciate that this may not be what everyone wants, so if there is good reason not to do this, or if this goes against FDSN protocol then I'm fine with it not changing, but wanted to at least post this somewhere so that others might find this before thinking that there were fewer stations active at a given time.
In my case, I query the station service to work out what stations are active, then look up the waveforms for those stations. I think that this is common practice (and is done by the ObsPy FDSN mass downloader) so it might help to provide all the metadata that are available, even if those metadata are incomplete.