Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 5954 - Disable Transparent Huge Pages v2 #5964

Merged
merged 1 commit into from
Dec 11, 2023
Merged

Issue 5954 - Disable Transparent Huge Pages v2 #5964

merged 1 commit into from
Dec 11, 2023

Conversation

vashirov
Copy link
Member

@vashirov vashirov commented Oct 18, 2023

Bug Description:
THP can have negative effects on DS performance when large caches are
used.

Fix Description:

  • Add a new variable for ns-slapd THP_DISABLE.
    When THP_DISABLE is set to 1, THP is disabled for ns-slapd process
    via prctl(2). With any other value, THP settings are untouched.

Before:

$ grep THP /proc/$(pidof ns-slapd)/status
THP_enabled:    1

After

$ grep THP /proc/$(pidof ns-slapd)/status
THP_enabled:    0
  • Add a new healthcheck linter, that checks if THP is disabled system-wide
    or per instance. In case THP is enabled for both the system and the
    process, it prints recommendations how to disable THP.

Fixes: #5954

Reviewed-by: ???

Copy link
Contributor

@tbordaz tbordaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM
Let's see which option is preferred

@Firstyear
Copy link
Contributor

My concern here is we are addressing the symptom not the problem. I think if we disable THP then we'll likely never turn it back on again, and we could be losing out on benefits. Shouldn't we pursue a fix to the underlying issue?

@vashirov
Copy link
Member Author

I understand your concern. I think that the underlying issue is that THP is not applicable to certain workloads, where data access patterns are not contiguous, with a lot of small allocations and deallocations.
Some examples:
https://www.pingcap.com/blog/transparent-huge-pages-why-we-disable-it-for-databases/
https://www.mongodb.com/docs/manual/tutorial/transparent-huge-pages/
https://redis.io/docs/management/optimization/latency/#latency-induced-by-transparent-huge-pages
https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-HUGE-PAGES
https://docs.oracle.com/en/database/oracle/oracle-database/19/ladbi/disabling-transparent-hugepages.html

Every optimization is a tradeoff and comes with a cost. THP can reduce page faults, but in our case it also causes memory fragmentation, CPU spikes (khugepaged has a non-zero overhead) and latency spikes. If we can't take advantage of this optimization due to our workload and usage patterns, I see no reason to keep it enabled.

@Firstyear
Copy link
Contributor

Every optimization is a tradeoff and comes with a cost. THP can reduce page faults, but in our case it also causes memory fragmentation, CPU spikes (khugepaged has a non-zero overhead) and latency spikes. If we can't take advantage of this optimization due to our workload and usage patterns, I see no reason to keep it enabled.

Okay, lets disable it then.

@vashirov
Copy link
Member Author

@Firstyear, thanks!

@tbordaz, any thoughts on the healthcheck part? Do you think it's still needed?

@tbordaz
Copy link
Contributor

tbordaz commented Oct 21, 2023

@Firstyear, thanks!

@tbordaz, any thoughts on the healthcheck part? Do you think it's still needed?

IIUC, with the patch the server requests to disable THP for itself. So the benefit of healthcheck would be to double check that the request was successful. I think it worthes doing that check if it is not too complex to implement.

@vashirov
Copy link
Member Author

vashirov commented Dec 5, 2023

I've added healthcheck linter and tests for it, please review.
Thanks!

@vashirov
Copy link
Member Author

vashirov commented Dec 5, 2023

Newly added test failed, investigating.

@vashirov
Copy link
Member Author

I fixed the test, now it passes. @tbordaz, please let me know if you're happy with the healthcheck changes, thanks.

@tbordaz
Copy link
Contributor

tbordaz commented Dec 11, 2023

I fixed the test, now it passes. @tbordaz, please let me know if you're happy with the healthcheck changes, thanks.

Yes that looks perfect to me

Bug Description:
THP can have negative effects on DS performance when large caches are
used.

Fix Description:
* Add a new variable for `ns-slapd` THP_DISABLE.
  When THP_DISABLE is set to 1, THP is disabled for `ns-slapd` process
  via `prctl(2)`. With any other value, THP settings are untouched.

Before:
```
$ grep THP /proc/$(pidof ns-slapd)/status
THP_enabled:    1
```

After
```
$ grep THP /proc/$(pidof ns-slapd)/status
THP_enabled:    0
```

* Add a new healthcheck linter, that checks if THP is disabled system-wide
  or per instance. In case THP is enabled for both the system and the
  process, it prints recommendations how to disable THP.

Fixes: #5954

Reviewed-by: @tbordaz, @Firstyear, @droideck (Thank you all!)
@vashirov vashirov merged commit 8928951 into 389ds:main Dec 11, 2023
9 checks passed
@vashirov vashirov deleted the i5954v2 branch January 24, 2024 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Disable Transparent Huge Pages
4 participants