Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix macros which call Bun.file operations causing build commands to hang #15991

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

brainkim
Copy link
Contributor

@brainkim brainkim commented Dec 26, 2024

What does this PR do?

  • Documentation or TypeScript types (it's okay to leave the rest blank in this case)
  • Code changes

Users reported in #7611 and #14104 that calling Bun.file().text() in a macro would cause builds to hang. This bug was introduced in v1.0.16, likely when some changes related to how Bun handles multithreading occurred.

I went through a minimal reproduction with a debugger and determined that work would get stuck in ThreadPool threads because the ThreadPool was initialized twice. The double instantiation of ThreadPool, which is assumed to be a global singleton, caused issues in macros because file I/O would get scheduled on a thread, the work would be pushed to the thread local queue/buffer, and then the thread would wait for the promise to resolve, causing a dead lock. There is code which allows for “work stealing” but threads initialized from different thread pools do not steal work each other.

The least invasive fix I found was to consistently call WorkPool.get() in the bundle_v2’s initialization code.

Fixes #7611 and fixes #14104.
Happy holidays!

How did you verify your code works?

  • I included a test for the new code, or existing tests cover it
  • I ran my tests locally and they pass (bun-debug test test-file-name.test)

@brainkim brainkim changed the title Fix macros which call Bun.file operations from causing build commands to hang Fix macros which call Bun.file operations causing build commands to hang Dec 26, 2024
@brainkim brainkim marked this pull request as draft December 26, 2024 13:56
@brainkim
Copy link
Contributor Author

I guess things are never this easy...

@Jarred-Sumner
Copy link
Collaborator

My guess is that by not using the bundler’s threadpool, the threads never call their init function

it sounds like we should have a global variable for “threadpool ID” and only use the global one in that case maybe?

@brainkim
Copy link
Contributor Author

brainkim commented Dec 28, 2024

I tried building on ubuntu and the failing test passes locally. I also tested it by hand.

Screenshot 2024-12-28 at 12 16 20 AM

Is it possible that the CI is running the tests against a stale build? I’m not sure how this failure could be platform dependent. Perhaps it is due to the number of threads available?

Running the build process in a container takes a painfully long time, wishing for cross-compilation here.

Screenshot 2024-12-27 at 7 44 59 PM

@brainkim brainkim marked this pull request as ready for review December 28, 2024 01:00
@brainkim
Copy link
Contributor Author

Sorry about the failures. The additional Windows failure may indicate that this is a race condition of some sort. I searched the codebase for additional direct uses of ThreadPool.init() and found one in installer.zig, which might be getting called during the test running process. Beyond that, I’m stumped unless I figure out how to reproduce the test failure locally.

Again, sorry about the test failures!

@brainkim brainkim force-pushed the fix-macros-hanging branch 2 times, most recently from 6ffdf04 to a07bb13 Compare January 23, 2025 21:27
@brainkim
Copy link
Contributor Author

brainkim commented Jan 23, 2025

@Jarred-Sumner
Took another look at this and I think I figured out the problem. The platform differences are likely due to maximum number of threads for the work pool being set to the number of logical cores available, which is probably 2 for the linux environments. The deadlock in the test suite happens when the max_threads is set to anything less than 3.

This would have been fine, except there’s a bug in ThreadPool.warm function:

bun/src/thread_pool.zig

Lines 416 to 435 in 7830e15

pub fn warm(self: *ThreadPool, count: u14) void {
var sync = @as(Sync, @bitCast(self.sync.load(.monotonic)));
if (sync.spawned >= count)
return;
const to_spawn = @min(count - sync.spawned, @as(u14, @truncate(self.max_threads)));
while (sync.spawned < to_spawn) {
var new_sync = sync;
new_sync.spawned += 1;
sync = @as(Sync, @bitCast(self.sync.cmpxchgWeak(
@as(u32, @bitCast(sync)),
@as(u32, @bitCast(new_sync)),
.release,
.monotonic,
) orelse break));
const spawn_config = std.Thread.SpawnConfig{ .stack_size = default_thread_stack_size };
const thread = std.Thread.spawn(spawn_config, Thread.run, .{self}) catch return self.unregister(null);
thread.detach();
}
}

In short, cmpxchngWeak will return null on success, but this code breaks on null, meaning the spawned property is incremented without actually spawning any new threads, causing an off-by-one error when spawning new threads normally. This, combined with the task dependency created by macros, where the parsing thread is blocked until the macro finishes, causes a deadlock at two max threads when the macro call schedules file I/O. You can confirm this caused the bug by commenting out the warm() call here (

this.pool.warm(8);
) at two max threads. I assumed that there are performance reasons for warming the thread pool so I fixed the function instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bun build stucks with async reading file in macro Build macros hang on Bun.file(...).text()
2 participants