v6.db.transport.rest API status page inconsistent with my test results #33

Closed
LilaHexe0 opened this issue Dec 17, 2024 · 3 comments
Labels
ops operations question Further information is requested

Comments

@LilaHexe0

The API status page (https://stats.uptimerobot.com/57wNLs39M/793274556) claims 100% uptime on days when my own monitoring (Datadog) indicates otherwise:

[Screenshot, 2024-12-17 15:09:45: Datadog test results for v6.db.transport.rest]

Most/all of the failures are caused by 503 errors.

How exactly does UptimeRobot check whether the service is operational?

@traines-source
Member

I think the status page only indirectly, if at all, monitors whether HAFAS requests themselves are working. Maybe one could switch to monitoring the /health endpoint directly, though that would of course entail many more requests to HAFAS.

The 503s you've been encountering are mostly due to errors on the HAFAS side, and it seems that the DB HAFAS mgate.exe endpoint will be shut off soon (see public-transport/hafas-client#331 and schildbach/public-transport-enabler#610).

@derhuerst added the "question" (Further information is requested) label on Jan 6, 2025
@schaerfo

schaerfo commented Jan 9, 2025

I agree, using the /health endpoint for status monitoring would result in uptime stats that reflect real-world use cases of the API better.

@derhuerst
Member

derhuerst commented Jan 9, 2025

At least with regard to DB's HAFAS API (and v6.db.transport.rest), this seems obsolete now that it's likely shut off for good.

However, let me make a more general point that applies to other HAFAS-based *.transport.rest APIs: Obviously, the /health endpoint does not use caching. If you all use it to monitor availability of the API, you'll quickly exhaust the shared resource "requests from the server's single static IP to HAFAS", so you effectively prioritise your personal insight into when the API is available over everyone's access to it. To keep the rate of requests low, I don't see any solution to this other than making the /health endpoint private.
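
To put rough numbers on the shared-resource point, here is a back-of-the-envelope sketch. It assumes that every /health probe triggers exactly one uncached request to HAFAS; the monitor count and probe interval are made-up illustration values, not measurements from this API.

```python
def upstream_requests_per_day(monitors: int, interval_seconds: int) -> int:
    """Upstream HAFAS requests per day caused by independent /health probes.

    Assumes one uncached upstream request per probe (an assumption).
    """
    probes_per_monitor = 86_400 // interval_seconds  # seconds per day / interval
    return monitors * probes_per_monitor


# Hypothetical example: 50 people each probing once per minute.
print(upstream_requests_per_day(monitors=50, interval_seconds=60))  # 72000
```

Under these assumptions, the monitoring traffic alone grows linearly with the number of people watching the endpoint, regardless of how much real usage there is.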

As an alternative, I suggest you monitor "user-need-driven" requests (for actual public transport data), specifically e.g. their rate of success/error and the time of the last successful one.
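
A minimal sketch of what tracking "user-need-driven" requests could look like on the client side. The class name `RequestStats` and its methods are hypothetical, and treating any 2xx status as a success is an assumption; the idea is only to record the outcome of requests your application makes anyway, rather than sending extra probes.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestStats:
    """Success/error counters for requests the application makes anyway."""

    successes: int = 0
    errors: int = 0
    last_success_at: Optional[float] = None  # Unix timestamp; None until first success

    def record(self, status_code: int, now: Optional[float] = None) -> None:
        """Count one finished request; 2xx counts as success (an assumption)."""
        if 200 <= status_code < 300:
            self.successes += 1
            self.last_success_at = time.time() if now is None else now
        else:
            self.errors += 1

    def error_rate(self) -> float:
        """Fraction of recorded requests that failed (0.0 if none recorded)."""
        total = self.successes + self.errors
        return self.errors / total if total else 0.0

    def seconds_since_last_success(self, now: Optional[float] = None) -> Optional[float]:
        """Age of the last success in seconds, or None if there was none yet."""
        if self.last_success_at is None:
            return None
        current = time.time() if now is None else now
        return current - self.last_success_at


# Usage: call record() with the status of each real request.
stats = RequestStats()
stats.record(200, now=1000.0)
stats.record(503, now=1010.0)
print(stats.error_rate())  # 0.5
print(stats.seconds_since_last_success(now=1030.0))  # 30.0
```

An alert could then fire when the error rate or the age of the last success crosses a threshold you choose, without adding any load on HAFAS.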

It might also be worthwhile to add Prometheus-/OpenMetrics-compatible metrics to hafas-rest-api and expose them to the public, so you can ingest and monitor them.
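
As an illustration of what such public metrics might look like, here is a sketch that renders two values in the Prometheus text exposition format. This is not how hafas-rest-api works today; the metric names `hafas_requests_total` and `hafas_last_success_timestamp_seconds` are hypothetical, and a real implementation would more likely use an existing metrics library than build the text by hand.

```python
def render_metrics(successes: int, errors: int, last_success_ts: float) -> str:
    """Render request counters in the Prometheus text exposition format.

    All metric names here are hypothetical examples.
    """
    lines = [
        "# HELP hafas_requests_total Upstream HAFAS requests by outcome.",
        "# TYPE hafas_requests_total counter",
        f'hafas_requests_total{{outcome="success"}} {successes}',
        f'hafas_requests_total{{outcome="error"}} {errors}',
        "# HELP hafas_last_success_timestamp_seconds "
        "Unix time of the last successful upstream request.",
        "# TYPE hafas_last_success_timestamp_seconds gauge",
        f"hafas_last_success_timestamp_seconds {last_success_ts}",
    ]
    return "\n".join(lines) + "\n"


print(render_metrics(successes=120, errors=4, last_success_ts=1736400000.0))
```

Because these values are derived from requests the server already handles, scraping them would not cause additional HAFAS traffic, unlike polling /health.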

@LilaHexe0 closed this as not planned on Jan 9, 2025
@derhuerst added the "ops" (operations) label on Jan 9, 2025