JetStream ordered consume stability fixes #202

Merged: mtmk merged 3 commits into main from js-ordered-consume-stability-fixes on Nov 13, 2023

@mtmk (Collaborator) commented Nov 10, 2023

Ordered consume operations use ephemeral consumers, so it's important to be able to create a new consumer when things go wrong. Two events strongly indicate that we won't be able to find our consumer on the server any more: disconnects and idle heartbeat timeouts.

With this fix, we recreate the consumer on server disconnects and idle heartbeat timeouts, making sure the new consumer carries on from where the old one left off (sequence state is maintained when this happens).
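
The recreate-and-resume idea can be illustrated with a small, language-neutral sketch in Python. This is not the NATS .NET implementation: `FakeStream`, `ConsumerLost` and `ordered_consume` are hypothetical names, and the in-memory stream only simulates losing a consumer (as a disconnect or idle heartbeat timeout would). The point it shows is that the client keeps the last seen stream sequence and starts each replacement consumer right after it.

```python
from dataclasses import dataclass, field


class ConsumerLost(Exception):
    """Stands in for a disconnect or an idle heartbeat timeout (hypothetical)."""


@dataclass
class FakeStream:
    """Hypothetical in-memory stream; messages are numbered from 1."""
    messages: list
    # Stream sequences after which the current consumer is lost (once each).
    fail_after: set = field(default_factory=set)
    consumers_created: int = 0

    def create_consumer(self, start_seq):
        # Models creating a fresh ephemeral consumer starting at start_seq.
        self.consumers_created += 1
        return self._deliver(start_seq)

    def _deliver(self, start_seq):
        for seq in range(start_seq, len(self.messages) + 1):
            yield seq, self.messages[seq - 1]
            if seq in self.fail_after:
                self.fail_after.discard(seq)
                raise ConsumerLost(f"consumer lost after seq {seq}")


def ordered_consume(stream):
    """Yield every message once, in order, recreating the consumer on loss."""
    last_seq = 0  # sequence state kept on the client across recreations
    while True:
        consumer = stream.create_consumer(start_seq=last_seq + 1)
        try:
            for seq, data in consumer:
                last_seq = seq
                yield data
            return  # stream drained without losing the consumer
        except ConsumerLost:
            continue  # recreate and resume from last_seq + 1
```

For example, `list(ordered_consume(FakeStream(["a", "b", "c"], fail_after={2})))` returns `["a", "b", "c"]` even though the first consumer is lost after the second message; the second consumer starts at sequence 3.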

Justification for the large code duplication: NatsJSOrderedConsume.cs is a copy of the NatsJSConsume.cs subscription class. The reason is to keep the normal consumer stable, since the main behaviour is fairly different. There is also a chance the behaviours will diverge even further as we discover other issues. We may consider merging these classes in a tidy-up effort later on.

We also introduced a general timeout (the same as the connection request timeout) for all JetStream API calls. We needed this because the consumer deletion process sometimes hung when the server receiving the request was killed and never sent a response back. Before this, we relied on CommandTimeout (1 minute by default) to kick in. Now we use RequestTimeout (5 seconds by default) on the subscription waiting for the reply.
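
As a rough illustration of bounding the wait for a reply, here is a minimal Python sketch using `asyncio.wait_for`. It is not the library's code: `js_request`, `dead_server` and `healthy_server` are hypothetical stand-ins, and only the 5 second default is taken from the description above.

```python
import asyncio

REQUEST_TIMEOUT = 5.0  # seconds; mirrors the RequestTimeout default quoted above


async def js_request(send, timeout=REQUEST_TIMEOUT):
    """Await the reply produced by `send()`, giving up after `timeout` seconds.

    `send` is a hypothetical coroutine function standing in for a
    JetStream API request/reply round trip.
    """
    try:
        return await asyncio.wait_for(send(), timeout)
    except asyncio.TimeoutError:
        raise TimeoutError(f"no JetStream API reply within {timeout}s") from None


async def dead_server():
    # Models a server killed after receiving the request: it never replies.
    await asyncio.Event().wait()


async def healthy_server():
    # Models a normal reply to an API call such as a consumer delete.
    return {"success": True}
```

With this shape, `asyncio.run(js_request(dead_server, timeout=0.05))` raises `TimeoutError` shortly after the deadline instead of hanging, while `asyncio.run(js_request(healthy_server))` returns the reply as usual.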

@mtmk mtmk requested a review from caleblloyd November 10, 2023 05:14
@caleblloyd (Collaborator) left a comment

LGTM

@mtmk mtmk merged commit 240a7e5 into main Nov 13, 2023
9 checks passed
@mtmk mtmk deleted the js-ordered-consume-stability-fixes branch November 13, 2023 13:17