the kafka reader got an unknown error reading partition #726
Hello @nsomarouthu, thanks for reporting the issue. Do you know whether these errors are causing an adverse effect on your application, or are they simply transient errors that are handled and recovered by kafka-go automatically? I/O timeouts are frequent in networked systems, so I would not be too concerned, but let me know if this is causing any other issues!
Hello @achille-roussel, is there any setting in kafka.ReaderConfig to reduce the number of errors we are currently experiencing with our configuration? The reader exits its loop when reporting this error and reconnects. This doesn't impact message reading, but it seems expensive.
@achille-roussel I am on the same team at the same company as @nsomarouthu, working together on the two issues. For the second issue, about 5 seconds after we finish handling a Kafka message and try to commit it, we see errors such as the `failed to commit message` one quoted below.
A MaxWait of 100ms definitely seems short. Typically I've seen this value configured to between 250ms and a couple of seconds. Kafka is designed for batch processing: clients use long polling to fetch batches of messages, which is what MaxWait configures. In my experience, Kafka can be complex to configure well for low latency. For your second question, it appears we are not exposing this configuration option when creating the ConsumerGroup internally in the Reader. Would you be able to contribute a change to add this option?
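For reference, a minimal sketch of the tuning being discussed, using the `kafka.ReaderConfig` fields `MaxWait`, `MinBytes`, and `MaxBytes` from segmentio/kafka-go. The broker address, group, topic, and the specific durations/sizes are placeholders, not values recommended by the maintainers:

```go
package main

import (
	"time"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Placeholder broker/topic names; tune MaxWait together with
	// MinBytes/MaxBytes rather than in isolation.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers:  []string{"localhost:9092"},
		GroupID:  "example-group",
		Topic:    "example-topic",
		MinBytes: 1,    // respond as soon as any data is available
		MaxBytes: 10e6, // up to ~10MB per fetch
		// Long-poll window for a fetch; 250ms to a couple of seconds
		// is the typical range mentioned above.
		MaxWait: 500 * time.Millisecond,
	})
	defer r.Close()
}
```

This cannot run meaningfully without a live broker, so treat it purely as a configuration sketch.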
@achille-roussel thanks for the feedback. We will test tuning MaxWait (likely in combination with MinBytes and MaxBytes) in our applications. We can also test configuring
Hello @chihweichang, how did your investigation go? Are there any follow ups we should discuss here? |
Same error message when reading large data. The server-side client reader has no issue, but the local dev environment reader cannot read the large data, failing in:

```go
func (r *reader) read(ctx context.Context, offset int64, conn *Conn) (int64, error) {
	r.stats.fetches.observe(1)
	r.stats.offset.observe(offset)

	t0 := time.Now()
	conn.SetReadDeadline(t0.Add(r.maxWait))

	batch := conn.ReadBatchWith(ReadBatchConfig{
		MinBytes:       r.minBytes,
		MaxBytes:       r.maxBytes,
		IsolationLevel: r.isolationLevel,
	})
	highWaterMark := batch.HighWaterMark()

	t1 := time.Now()
	r.stats.waitTime.observeDuration(t1.Sub(t0))

	var msg Message
	var err error
	var size int64
	var bytes int64

	const safetyTimeout = 60 * time.Second // changed from 10 to 60
	deadline := time.Now().Add(safetyTimeout)
	conn.SetReadDeadline(deadline)
```
I'm also not sure about the meaning of the `safetyTimeout` code at the end: it overwrites the deadline that was set earlier and cannot be configured through MaxWait in the Reader configuration. Can we remove it, or make it configurable?
@achille-roussel sorry, I haven't had time to investigate further on this issue. |
Hello @chihweichang! I wanted to follow up on this issue and ask whether you were able to investigate it further. @lujiacn thanks for following up with suggested changes. The 10s safeguard is definitely very opinionated; it would be better for this value to be configurable. Would you be available to send a pull request to support tuning this timeout?
One of the users of my project is experiencing the same issue and I don't know what's causing it. @nsomarouthu @chihweichang Have you been able to pinpoint the issue here? @moogacs You seem to have fixed the issue on your fork. Is there anything you can add here?
@mostafa, the issue has been identified, and indeed making the read timeout configurable will fix it, so users of the library can adjust read timeouts based on their needs. At the moment it's hard-coded to 10 seconds, and increasing that resolves it. I'm now waiting for my PR to be approved and merged.
@achille-roussel I think you can help in prioritizing #989.
@achille-roussel I saw your review on @moogacs's PR, #989. What does it take to merge that PR? Is there anything missing? Can I help in any way? |
#989 has been merged, let me know if you are still experiencing the issue on the latest version of kafka-go! |
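For anyone landing here later: on kafka-go versions that include the merged change, the batch-read safety timeout is exposed on `kafka.ReaderConfig` (as `ReadBatchTimeout` in recent releases; check the docs for your version). A hedged configuration fragment with placeholder broker/topic names:

```go
package main

import (
	"time"

	"github.com/segmentio/kafka-go"
)

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "example-topic",
		// Replaces the previously hard-coded 10s safety timeout on
		// batch reads; raise it if large fetches are timing out.
		ReadBatchTimeout: 30 * time.Second,
	})
	defer r.Close()
}
```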
@achille-roussel |
First issue
We're often getting reader errors at a rate of roughly 500k/day, from:
kafka-go/reader.go
Lines 1366 to 1375 in a4890bd
Kafka-go 0.4.16
Kafka 2.5.0
Second issue
The errors happened when the kafka reader was committing a message after it had been processed successfully; the message was then re-consumed by another replica:
"msg": "debezium.Consumer: failed to commit message: write tcp IP_ADDRESS:49610->IP_ADDRESS:9093: use of closed network connection"
We receive "Successfully handled message" and a while afterward get "failed to commit message".
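For context on the commit path involved: with segmentio/kafka-go, explicit commits use `FetchMessage` plus `CommitMessages`, and committing promptly after handling narrows the window in which a rebalance can close the connection under the commit. A minimal sketch, not the reporter's actual code; broker, group, topic, and the `handle` helper are placeholders:

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

// handle is a hypothetical stand-in for application processing.
func handle(m kafka.Message) {}

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		GroupID: "example-group", // a GroupID is required for CommitMessages
		Topic:   "example-topic",
	})
	defer r.Close()

	ctx := context.Background()
	for {
		// FetchMessage does not auto-commit; the offset is committed
		// explicitly below, after the message has been handled.
		msg, err := r.FetchMessage(ctx)
		if err != nil {
			log.Printf("fetch: %v", err)
			break
		}

		handle(msg)

		if err := r.CommitMessages(ctx, msg); err != nil {
			log.Printf("failed to commit message: %v", err)
		}
	}
}
```

This requires a live broker to run, so it is a sketch of the consume/commit loop shape rather than a reproduction of the error.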