Fix RecursionError because of repeated channel reconnections. #380
base: main
Repeated channel reconnections eventually raise a RecursionError:
Can you please add a unit test for the changes, so that we can reproduce the issue?
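A unit test along these lines might reproduce the failure mode without a live broker. This is a minimal sketch that assumes a hypothetical `FlakyChannel` whose broker answers every open with a close, and whose close handler reopens the channel from inside the callback. It is not py-amqp's actual `Channel` class or its revive logic. It only shows why reconnecting from inside the close callback ends in a `RecursionError`: each reconnection runs one stack frame deeper than the last.

```python
import unittest


class FlakyChannel:
    """Hypothetical stand-in for an AMQP channel whose broker keeps closing it."""

    def __init__(self):
        self.open_attempts = 0

    def open(self):
        self.open_attempts += 1
        # Simulate the broker replying with a channel close instead of open-ok.
        self.on_close()

    def on_close(self):
        # Naive revive: reopen from inside the close callback, so the
        # call stack grows by two frames on every reconnection.
        self.open()


class RecursionReproTest(unittest.TestCase):
    def test_repeated_revive_raises_recursion_error(self):
        channel = FlakyChannel()
        with self.assertRaises(RecursionError):
            channel.open()
        # Nothing stopped the loop until the interpreter's stack limit did.
        self.assertGreater(channel.open_attempts, 100)
```

It can be run with `python -m unittest` from the directory containing the test module.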
@pawl @michael-lazar can you guys please try this patch?
@auvipy I don't know much about this project. I think it is a real problem because it keeps recurring in my production environment, but I don't know how to reproduce it or how to really solve it.
@liuyaqiu Is this a recent issue for you, or has it been happening for a while?
This has been happening for a while, but now it happens consistently in my production environment. I think it is because of:
I think RabbitMQ may be in a bad state, so the client received too much
Then the RecursionError is not caught by the Celery framework. Celery treats it as a runtime error of the task, so it reports the task as failed. In fact, the task didn't even start (it failed while publishing the task's status to the RabbitMQ backend).
I think my solution is a quick fix for this problem. When the client sees too many on_close calls, it should stop reviving the channel and raise ChannelError, rather than reviving repeatedly until it causes a RecursionError that the downstream application cannot catch.
A better idea may be:
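The quick fix described in this comment can be sketched as a retry counter that gives up with a catchable exception. This is a minimal illustration, not the PR's actual diff. It assumes a hypothetical `BoundedChannel` whose simulated broker replies are plain strings (`"open-ok"` or `"close"`), and a hypothetical `max_closes` parameter. `ChannelError` here is a local stand-in defined in the snippet, not `amqp.exceptions.ChannelError`.

```python
class ChannelError(Exception):
    """Local stand-in for a channel-level AMQP error (not py-amqp's class)."""


class BoundedChannel:
    """Hypothetical channel that stops reviving after too many closes."""

    def __init__(self, broker_replies, max_closes=5):
        # Simulated broker replies to each open attempt: "open-ok" or "close".
        self._replies = iter(broker_replies)
        self.max_closes = max_closes
        self.closes = 0
        self.is_open = False

    def _send_open(self):
        # Once the scripted replies run out, the broker keeps closing.
        return next(self._replies, "close")

    def open(self):
        self.closes = 0
        # Retry in a loop instead of reopening from the close callback,
        # so the stack depth does not grow with every reconnection.
        while True:
            if self._send_open() == "open-ok":
                self.is_open = True
                return self
            self.closes += 1
            if self.closes >= self.max_closes:
                # Give up with an exception the caller can catch.
                raise ChannelError(
                    f"channel closed {self.closes} times while reviving; giving up"
                )
```

Retrying in a loop is a choice made for this sketch. It keeps the stack flat, so the only way out of a persistently failing broker is the `ChannelError`, which Celery or application code can handle like any other connection error.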
I agree with the following statement:
I think now my solution is a quick fix for this problem. When the client found too much on_close, it should stop channel reviving and raise ChannelError, rather than repeated to cause a RecursionError which can't be caught by downstream application.
A better idea may be:
While a channel is reviving, ignore all frames other than S:OPEN-OK. The channel should also stop auto-reviving after too many open operations within a period, and then exit to avoid an infinite loop.
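The alternative quoted here has two parts: drop every frame except S:OPEN-OK while a revive is in flight, and stop auto-reviving once too many opens happen within some period. Below is a minimal sketch of both. It assumes a hypothetical `ReviveGuard` helper that sees frames as plain strings (`"open-ok"` stands in for S:OPEN-OK) and tracks opens in a sliding window of timestamps. The limits are made-up defaults, not values from py-amqp.

```python
import time
from collections import deque


class ReviveGuard:
    """Hypothetical guard: filter frames while reviving and rate-limit opens."""

    def __init__(self, max_opens=10, window_seconds=60.0, clock=time.monotonic):
        self.max_opens = max_opens
        self.window_seconds = window_seconds
        self._clock = clock
        self._open_times = deque()  # timestamps of recent open attempts
        self.reviving = False

    def start_revive(self):
        """Record an open attempt; return False if the rate limit is exceeded."""
        now = self._clock()
        # Forget open attempts that have fallen out of the sliding window.
        while self._open_times and now - self._open_times[0] > self.window_seconds:
            self._open_times.popleft()
        if len(self._open_times) >= self.max_opens:
            # Too many opens in this period: stop auto-reviving.
            return False
        self._open_times.append(now)
        self.reviving = True
        return True

    def accept_frame(self, frame_name):
        """Return True if the frame should be processed, False to ignore it."""
        if not self.reviving:
            return True
        if frame_name == "open-ok":
            # The revive completed; resume normal frame handling.
            self.reviving = False
            return True
        # Ignore closes and any other frame until open-ok arrives.
        return False
```

The clock is injected so the window can be tested with a fake time source instead of `time.monotonic`. When `start_revive()` returns `False`, the caller would raise an error instead of looping.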
Personally, I would prefer a proper fix. This PR is honestly just a dirty fix that could lead to other hidden problems.
Thanks. I will try to solve it in a better way.
@pawl, if you have time in the coming days.
@auvipy @liuyaqiu @matusvalo Hello guys, I have had this issue too. Are there any updates on it?
I don't know the context of your problem. Previously, I called a subtask synchronously in a parent task and used the rpc result backend to store task state and results, then tried to get the subtask's state and result from within the parent task. My error occurred when getting the subtask's state and result from the rpc result backend. Now I use the MongoDB result backend instead, with RabbitMQ only as the broker, and the error no longer occurs. Also, you should not use the rpc result backend in a production environment: it creates a unique queue for every task to store its state and result, which leads to too many result queues in RabbitMQ, wasting RabbitMQ resources and hurting its performance.
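For reference, the setup described in this comment (RabbitMQ as the broker, MongoDB as the result backend) might look roughly like the following Celery configuration module. This is a sketch. The URLs, database name and collection name are placeholders. It assumes Celery 4+ lowercase setting names (`broker_url`, `result_backend`, `mongodb_backend_settings`) and that pymongo is installed for the MongoDB backend.

```python
# celeryconfig.py: a minimal sketch; hosts, ports and names are placeholders.

# RabbitMQ remains the message broker.
broker_url = "amqp://guest:guest@localhost:5672//"

# Task state and results are stored in MongoDB instead of the rpc:// backend.
result_backend = "mongodb://localhost:27017/"

# Database and collection used by Celery's MongoDB result backend.
mongodb_backend_settings = {
    "database": "celery_results",
    "taskmeta_collection": "taskmeta",
}
```

An app would load it with `app.config_from_object("celeryconfig")`.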
What I am describing still happens in v5.2.1. Has this changed in master?
No, you are right, that hasn't changed.