Any chance to add new metric: nats_stream_total_messages_(per_subject) ? #305
Comments
Hey. Thanks for the report.
For example, take a JetStream stream S (RetentionPolicy=workqueue) handling subjects: domain.>. If we only monitor the total count (e.g., total messages/bytes), it is impossible to tell which consumer (APP-x) is abnormal, whether because it is processing slowly or because it has crashed. If there could be a metric: nats_stream_total_messages_per_subject
Ok, I got it! Thanks for the explanation. There are a few issues with this approach:
However, the NATS server does not know why a client stopped using an ephemeral consumer. Nor does it know what the apps intended if they abandoned a client, and that is hard to derive from server metrics. Also keep in mind that having unprocessed messages does not mean that JetStream is overwhelmed. It only means that you are publishing faster than you are consuming. And again, that holds only in workqueue streams; it would not tell you anything in Limits-based streams. Of course, I'm aware that the above holds for your use case; I'm only pointing out that it can't be generalized. Maybe let's try to look at the problem differently: why use ephemeral consumers instead of durable ones?
I agree that this doesn't sound like a server metric. From my limited understanding, NATS can have the following use cases (modes):
For a NATS cluster, the health metrics required may vary by mode. Just as existing metrics are categorized from perspectives such as consumer, server, and stream, could we establish the necessary metrics for different usage modes? Clearly, my use case is MQ mode. Since messages in a workqueue stream are consumed only once, there is no need to rely on durable features to remember which record was last consumed. Additionally, there might be 1000+ ephemeral consumers (and 100+ durable ones) in the entire system; choosing ephemeral reduces load. In MQ mode, a high message count/percentage under a specific stream subject is usually considered abnormal, regardless of whether it's due to slow or dead consumers. The suggestion for "msg per subject" comes from two bases:
Thank you again for your reply.
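The "high message count/percentage under a specific subject" check described in this comment could be sketched roughly as follows. This is an illustration only: the thresholds, the function name, and the subject names are hypothetical, and the per-subject counts are assumed to come from whatever source ends up exposing them.

```python
def flag_backlogged_subjects(subject_counts, max_messages=1000,
                             max_share=0.5, min_total=100):
    """Return subjects whose backlog looks abnormal in a workqueue stream.

    A subject is flagged when its pending count exceeds `max_messages`, or
    when it holds more than `max_share` of all pending messages and the
    stream has at least `min_total` messages overall. All three thresholds
    are arbitrary example values.
    """
    total = sum(subject_counts.values())
    flagged = {}
    for subject, count in subject_counts.items():
        share = count / total if total else 0.0
        too_many = count > max_messages
        dominates = total >= min_total and share > max_share
        if too_many or dominates:
            flagged[subject] = (count, share)
    return flagged


if __name__ == "__main__":
    # Hypothetical snapshot of a workqueue stream with three subjects.
    pending = {"domain.app1": 1500, "domain.app2": 20, "domain.app3": 0}
    for subject, (count, share) in flag_backlogged_subjects(pending).items():
        print(f"{subject}: {count} pending ({share:.0%} of stream)")
```

As noted earlier in the thread, a flagged subject only shows that publishing is outpacing consumption on that subject; it does not by itself say whether the consumer is slow or gone.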
What motivated this proposal?
This would be very helpful for monitoring the health and performance of ephemeral, pull-type consumers in the system.
What is the proposed change?
This project already has the metrics nats_stream_total_messages and nats_stream_total_bytes, but they are not enough for the ephemeral, pull-type consumer situation.
Who benefits from this change?
No response
What alternatives have you evaluated?
No response