Memory issue with sorting_with_executors
#17
Comments
Turns out, it’s a known issue.
Also relates to this issue.
@ashvardanian I reproduced the issue with a manually built oneTBB. WSL Ubuntu 22.04, gcc version 13.1.0. Latest NB. Same issue outside of WSL.

#include <vector>
#include <cstdint>
#include <numeric>
#include <execution>
#include <algorithm>
#include <oneapi/tbb.h>

int main() {
    int count = 100000;
    std::vector<std::uint32_t> array(count);
    std::iota(array.begin(), array.end(), 1u);
#ifndef PAR
    auto policy = std::execution::seq;
#else
    auto policy = std::execution::par_unseq;
#endif
    for (int i = 0; i < 100000; ++i) {
        std::sort(policy, array.begin(), array.end());
    }
}

In contrast, using

#include <vector>
#include <cstdint>
#include <numeric>
#include <execution>
#include <algorithm>

int main() {
    int count = 100000;
    std::vector<std::uint32_t> array(count);
    std::iota(array.begin(), array.end(), 1u);
#pragma omp parallel for
    for (int i = 0; i < 100000; ++i) {
        std::sort(array.begin(), array.end());
    }
}
@ashvardanian The problem lies with libstdc++. We are waiting for a fix here:
Currently, one can fix it locally by modifying the libstdc++ files at:
This works for me:

 class __task : public tbb::detail::d1::task
 {
 protected:
@@ -646,10 +646,15 @@
         _PSTL_ASSERT(__parent != nullptr);
         _PSTL_ASSERT(__parent->_M_refcount.load(std::memory_order_relaxed) > 0);
-        if (--__parent->_M_refcount == 0)
+        auto __refcount = --__parent->_M_refcount;
+
+        // Placing the deallocation after the refcount decrement allows another thread to proceed with tree
+        // folding concurrently with this task cleanup.
+        __alloc.deallocate(this, *__ed);
+
+        if (__refcount == 0)
         {
             _PSTL_ASSERT(__next == nullptr);
-            __alloc.deallocate(this, *__ed);
             return __parent;
         }

Found the solution here: uxlfoundation/oneTBB#1533
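To make the effect of the patch above easier to see, here is a minimal single-threaded sketch. It is a toy model, not the libstdc++ code: `ToyTask`, `finalize_before_patch`, `finalize_after_patch`, and `unreleased_children` are hypothetical names, and a `released` flag stands in for `__alloc.deallocate(this, *__ed)`. It assumes the leak comes from sibling tasks that finish while the parent's refcount is still above zero.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a task node; NOT the libstdc++ __task type.
struct ToyTask {
    ToyTask* parent = nullptr;
    std::atomic<int> refcount{0};  // number of unfinished children
    bool released = false;         // models "__alloc.deallocate(this, *__ed)"
};

// Pre-patch shape: only the child that drops the count to zero is released.
ToyTask* finalize_before_patch(ToyTask& self) {
    ToyTask* parent = self.parent;
    if (--parent->refcount == 0) {
        self.released = true;
        return parent;
    }
    return nullptr;  // this child was never released
}

// Patched shape: every child is released right after the decrement.
ToyTask* finalize_after_patch(ToyTask& self) {
    ToyTask* parent = self.parent;
    int remaining = --parent->refcount;
    self.released = true;
    return remaining == 0 ? parent : nullptr;
}

// Finishes `children` toy tasks under one parent and counts how many
// were never marked as released.
std::size_t unreleased_children(bool patched, int children) {
    ToyTask parent;
    parent.refcount = children;

    std::vector<std::unique_ptr<ToyTask>> tasks;
    for (int i = 0; i < children; ++i) {
        tasks.push_back(std::make_unique<ToyTask>());
        tasks.back()->parent = &parent;
    }

    for (auto& task : tasks) {
        if (patched) {
            finalize_after_patch(*task);
        } else {
            finalize_before_patch(*task);
        }
    }

    std::size_t unreleased = 0;
    for (auto& task : tasks) {
        if (!task->released) ++unreleased;
    }
    return unreleased;
}
```

In this model, calling `unreleased_children(false, 4)` leaves three of the four children unreleased, while `unreleased_children(true, 4)` leaves none, which would be consistent with memory growing on every parallel sort call.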
@alexbarev, how about we add a
Run the sorting_with_executors benchmark using the std::execution::par_unseq policy.
Memory consumption quickly exceeds the 60 GB available on my machine after the second test variant finishes:
sorting_with_executors/par_unseq/4194304/
Observations:
Memory does not decrease between tests with inputs of different sizes.
I tried moving the policy inside the loop, but it did not resolve the issue:
I noticed that only 16 threads are spawned, which matches the number of available CPUs. The issue is puzzling, and I have yet to understand its root cause.
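One way to check the observation that memory does not decrease is to sample the process's resident set size between sort calls. The helper below is a rough sketch that assumes Linux with `/proc/self/statm` available. `parse_resident_pages` and `resident_bytes` are hypothetical names and are not part of the benchmark.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <unistd.h>

// Parses the second field of a /proc/<pid>/statm line (resident pages).
// Returns -1 if the line does not start with two integers.
long parse_resident_pages(const std::string& statm_line) {
    std::istringstream in(statm_line);
    long total_pages = 0;
    long resident_pages = 0;
    if (!(in >> total_pages >> resident_pages)) return -1;
    return resident_pages;
}

// Returns the resident set size of this process in bytes,
// or -1 if /proc/self/statm cannot be read (assumption: Linux only).
long long resident_bytes() {
    std::ifstream statm("/proc/self/statm");
    std::string line;
    if (!std::getline(statm, line)) return -1;
    long pages = parse_resident_pages(line);
    if (pages < 0) return -1;
    return static_cast<long long>(pages) * sysconf(_SC_PAGESIZE);
}
```

Printing `resident_bytes()` every few thousand iterations of the sorting loop should show whether the footprint keeps growing under `std::execution::par_unseq` while staying flat under `std::execution::seq`.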