Issue 6372 - Deadlock while doing online backup #6475

Merged
merged 2 commits into from
Jan 7, 2025

@progier389 progier389 commented Jan 6, 2025

Sometimes the server hangs during an online backup because of a deadlock caused by a lock order inversion between the dse backup_lock mutex and the dse rwlock.

Solution:

  • Add functions to manage the lock/unlock operations so that the locking order stays consistent.
  • Ensure that threads always try to take the dse_backup_lock mutex before the dse write lock.

Code cleanup:

  • Move the backup_lock into the dse struct.
  • Avoid the obsolete warning during tests. (I think we will need a second cleanup phase later to see whether we can replace self.conn.add_s with self.create.)

Issue: #6372

Closing: #6372

Reviewed by: @mreynolds389 (Thanks!)

@mreynolds389 mreynolds389 left a comment

Very nice, ack. I know that historically in DS we use "int" for boolean purposes, but using a PRBool for "dse_backup_in_progress" would be nice. I'm not asking for it, but it's something to consider.

@progier389
Contributor Author

> I know that historically in DS we use "int" for boolean purposes, but using a PRBool for "dse_backup_in_progress" would be nice.

I have no strong opinion about that one. In fact, I hesitated before choosing 0/1!
I wonder whether we should rather use the C11 bool type and true/false (I think that these days most compilers support it, and it looks more logical for the parts of the code that are independent of nspr...), but once again I have no strong opinion.

@mreynolds389
Contributor

> > I know that historically in DS we use "int" for boolean purposes, but using a PRBool for "dse_backup_in_progress" would be nice.
>
> I have no strong opinion about that one. In fact, I hesitated before choosing 0/1!
> I wonder whether we should rather use the C11 bool type and true/false (I think that these days most compilers support it, and it looks more logical for the parts of the code that are independent of nspr...), but once again I have no strong opinion.

Yeah, I feel nspr is becoming less and less "needed", if not already obsolete. It would actually be nice to completely remove it from DS at some point :-) So let's do this: let's use the C11 bool type here, so it's in the code for all to see, and then we can all start using it moving forward.
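
For illustration, here is a small sketch of what a `bool` flag looks like in practice. It is not taken from the PR: the `backup_state` struct and the two helper functions are hypothetical names, and only the flag name `dse_backup_in_progress` comes from the discussion. In C11 the names `bool`, `true`, and `false` are provided by `<stdbool.h>` (the header has been part of the standard since C99), while `static_assert` comes from `<assert.h>`.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>  /* bool, true, false */

/* Unlike an int flag, any nonzero value converted to bool becomes exactly true. */
static_assert((bool)2 == true, "conversion to bool normalizes nonzero values");

/* Hypothetical struct; only the type of the flag matters here. */
struct backup_state {
    bool dse_backup_in_progress;
};

/* Returns true if a backup could be started, false if one is already running. */
bool backup_try_begin(struct backup_state *s)
{
    if (s->dse_backup_in_progress) {
        return false;
    }
    s->dse_backup_in_progress = true;
    return true;
}

void backup_end(struct backup_state *s)
{
    s->dse_backup_in_progress = false;
}
```

Compared with an `int` holding 0/1, the type documents the intent at the declaration, and it does so without pulling in an NSPR header the way `PRBool` would. The sketch omits locking; in the server the flag would still need to be read and written under the appropriate lock.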

@progier389
Copy link
Contributor Author

Pushed the change with C11 bool.

@mreynolds389
Contributor

> Pushed the change with C11 bool.

Thanks, re-ack!

@progier389 progier389 merged commit 7e98ab3 into 389ds:main Jan 7, 2025
198 of 199 checks passed
progier389 added a commit that referenced this pull request Jan 7, 2025
* Issue 6372 - Deadlock while doing online backup

Sometimes the server hangs during an online backup because of a deadlock caused by a lock order inversion between the dse backup_lock mutex and the dse rwlock.

Solution:

  • Add functions to manage the lock/unlock operations so that the locking order stays consistent.
  • Ensure that threads always try to take the dse_backup_lock mutex before the dse write lock.

Code cleanup:

  • Move the backup_lock into the dse struct.
  • Avoid the obsolete warning during tests. (I think we will need a second cleanup phase later to see whether we can replace self.conn.add_s with self.create.)

Issue: #6372

Reviewed by: @mreynolds389 (Thanks!)

(cherry picked from commit 7e98ab3)
progier389 added a commit that referenced this pull request Jan 7, 2025
* Issue 6372 - Deadlock while doing online backup

(cherry picked from commit 7e98ab3)
progier389 added a commit that referenced this pull request Jan 7, 2025
* Issue 6372 - Deadlock while doing online backup

(cherry picked from commit 7e98ab3)
progier389 added a commit that referenced this pull request Jan 9, 2025
* Issue 6372 - Deadlock while doing online backup

(cherry picked from commit 7e98ab3)