
Ninja: default pool depth for link.exe #13937

Open · lb90 wants to merge 1 commit into master from default-pool-link-exe

Conversation

@lb90 (Contributor) commented Nov 21, 2024

link.exe doesn't benefit much from parallel invocations. It is internally threaded and consumes LOTS of memory, so it makes sense to limit the link-stage concurrency with a ninja pool. As always, the value can be overridden by the user with backend_max_links.

Before this change, I could easily run out of memory on a laptop with 8 logical CPUs and 16 GB of RAM. The linking stage was also very slow due to frequent swapping.
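The mechanism the PR relies on can be sketched as a hand-written build.ninja fragment (illustrative only; Meson generates this file itself, and the pool and rule names here are made up):

```ninja
# A pool caps how many edges assigned to it run concurrently,
# independent of the global -j level. Here: at most 2 links at once.
pool link_pool
  depth = 2

rule link_msvc
  command = link.exe /nologo /OUT:$out $in
  description = LINK $out
  pool = link_pool

# Compile edges stay outside the pool and use the full -j parallelism.
build app.exe: link_msvc main.obj util.obj
```

Ninja schedules at most `depth` edges from a pool at the same time; everything not in a pool remains governed only by -j.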

See also:

@lb90 lb90 requested a review from jpakkane as a code owner November 21, 2024 13:51
@lb90 lb90 force-pushed the default-pool-link-exe branch 2 times, most recently from 6f50df3 to f1c4984 Compare November 21, 2024 15:35
@eli-schwartz (Member)

Fixes #13573

It certainly does not! You propose to reset the existing default value of the one pool meson currently exposes. That ticket was asking for user-defined pools.

@eli-schwartz (Member)

Your quotes from the ninja issue tracker don't support the position you are taking here. One of them is about ninja's default -j value being processors + 2 resulting in too much parallelism for the compile stage of heavy C++ sources, and the other ticket is by the author of mold, who strenuously objects to the use of link pools in ninja "because it just results in mold dying of memory issues when running in parallel with compiling C++ code".

Your Microsoft links are very sparse on information, one of them is a problem report about CL.exe, not LINK.exe, and the other is an article about how CL.exe (again) can/should be made to operate in batched mode where one CL.exe compiles multiple files (or possibly a server mode where running CL.exe submits jobs to a daemon).

@eli-schwartz (Member) left a review comment


The whole approach is misguided, see how GCC's -flto=jobserver mode works for how to correctly do link-time multithreading that actually has the intended effect.

Linkers that currently perform fully uncontrolled resource exhaustion because they hardcode the assumption that they are the only process on the system that runs at the same time (and cannot even fathom the notion of another process performing heavy C++ compilation in the same build !!!) need to simply give up on their notion of automatically spawning $(nproc) threads, and collaborate via the jobserver instead.

Meantime, you can <highly opinionated comment>use ld.bfd</highly opinionated comment>.

mesonbuild/backend/ninjabackend.py (review thread, outdated, resolved)
mesonbuild/backend/ninjabackend.py (review thread, outdated, resolved)
@eli-schwartz (Member)

As an aside, mold run by hand, with no other processes running in the background or in another terminal, will exhaust the open file descriptor limit. This ticket has been open for over 2 years and the official response is:

I don't think we can fix it on our side, as I believe LTO uses that many files.

Despite frequent requests to add a very obvious solution, and despite the fact that other linkers handle this just fine, the mold author appears to be very worried that the problem may be unsolvable. At this point it's just become a betting game as to how long the ticket will stay open.

@lb90 (Contributor, Author)

lb90 commented Nov 21, 2024

Hi @eli-schwartz, thanks for the feedback! :)

Yeah, I linked some issues that provide relevant information, though they are not specifically about link.exe. That said, the author of mold suggests that modern linkers do not benefit from parallel invocations, and link.exe is one of those. MS link uses memory on the order of gigabytes even without LTO, while cl.exe generally stays in the range of a few megabytes (about 20-30 MB when compiling GTK, which is C-based). I experience OOM errors and severe slowdowns while building GStreamer.

The whole approach is misguided, see how GCC's -flto=jobserver mode works for how to correctly do link-time multithreading that actually has the intended effect.

Indeed, having link.exe in "server mode" would be ideal, but I don't know if such an interface is supported.

@eli-schwartz (Member)

eli-schwartz commented Nov 21, 2024

Indeed, having link.exe in "server mode" would be ideal, but I don't know if such an interface is supported.

To be clear, link.exe wouldn't run in server mode, and gcc doesn't run in server mode either.

-flto=jobserver means that when you link with gcc -flto=jobserver foo.o bar.o -o myprog, it checks whether a parent Make process exists, then asks GNU Make how many -j slots are available (presuming that when you run make -j8 it's frequently the case that e.g. two other compile jobs are running, for a total of three jobs -- that means that LTO is free to use five parallel LTO processes).

link.exe would need to be a client. It would also need to implement the desired functionality, likely by having Microsoft's corporate office decide they care about this.

You can read about it at https://www.gnu.org/software/make/manual/html_node/Job-Slots.html
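The token-passing protocol behind this can be sketched in Python. The snippet below simulates the pipe that make pre-loads with job tokens; a real client would instead discover the read/write file descriptors from the MAKEFLAGS environment variable (--jobserver-auth), and the helper names here are illustrative, not any real API:

```python
import os

def make_jobserver(extra_slots):
    """Simulate GNU Make's jobserver: a pipe pre-loaded with one token
    byte per job slot beyond the implicit slot every client owns.
    For make -jN, make writes N-1 tokens."""
    r, w = os.pipe()
    os.set_blocking(r, False)          # let acquire() fail fast when empty
    os.write(w, b"+" * extra_slots)
    return r, w

def acquire(r):
    """Try to take one extra job slot; True on success, False if none free."""
    try:
        return os.read(r, 1) == b"+"
    except BlockingIOError:
        return False

def release(w):
    """Return the token so other clients can use the slot."""
    os.write(w, b"+")

# make -j4: one implicit slot plus 3 tokens in the pipe.
r, w = make_jobserver(3)
got = [acquire(r) for _ in range(5)]
print(got)  # three successes, then the pipe is drained
```

The key property is cooperative: a parallel linker (or LTO driver) only spawns an extra worker after reading a token, and writes the token back when the worker exits, so compile jobs and link jobs share one global budget instead of multiplying.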

@lb90 lb90 force-pushed the default-pool-link-exe branch from f1c4984 to bd120d9 Compare November 22, 2024 09:55
@lb90 lb90 force-pushed the default-pool-link-exe branch from bd120d9 to 055d6b6 Compare November 22, 2024 09:57
@bonzini (Collaborator)

bonzini commented Dec 6, 2024

it checks to see if a parent Make process exists, then asks GNU Make how many available -j slots are available (presuming that when you run make -j8 it's frequently the case that e.g. two other compile jobs are running, for a total of three jobs -- that means that LTO is free to use five parallel LTO processes).

Except the Ninja patches for jobserver mode have been rejected for years (and though there's now some action, it won't be in a release until 1.13, i.e. for a year or so!). As to Samurai, my own jobserver PR has been ignored for months; the project seems completely dead and I am seriously considering forking it.

So the lecture is a bit rich from a project that only supports Ninja and that has itself rejected a PR for jobserver usage in mtest (#4898).

@eli-schwartz (Member)

Except the Ninja patches for jobserver mode have been rejected for years (and though there's now some action, it won't be in a release until 1.13, i.e. for a year or so!). As to Samurai, my own jobserver PR has been ignored for months; the project seems completely dead and I am seriously considering forking it.

I can empathize with your serious consideration as it reflects my own disappointment with both projects. ;)

So the lecture is a bit rich from a project that only supports Ninja and that has itself rejected a PR for jobserver usage in mtest (#4898).

  • that PR didn't have any involvement from me (I've never seen it before; it predates my activity?). Different people from the same team of project maintainers can have different opinions, and it's not being "a bit rich"; maybe we just disagree. ;)
  • that PR wasn't rejected. There was a discussion about preferred implementation details which frankly I don't understand at all; the review request felt like a very strange thing to worry over, asking to instead use an obviously inferior mechanism for reasons I don't understand. But instead of continuing to argue your case, you simply closed the PR yourself.

I would have probably recommended you leave that PR open and instead say "sorry but I'm convinced my approach is a more technically correct approach and don't want to redesign it. Can you please tell me if that's a dealbreaker for you and either merge or close this PR?"

I'll also say that I'm (consistent with my responses in this PR) quite sympathetic to your linked mtest PR (and would benefit from it myself, if portage learns to spin up a job server the way yocto does).
