feat: multithread linting #129

Open
wants to merge 3 commits into
base: main
Conversation

fasttime
Member

Summary

This document proposes adding a new multithread mode for ESLint#lintFiles(). The aim of multithread linting is to speed up ESLint by allowing workload distribution across multiple CPU cores.
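To make "workload distribution" concrete, here is a minimal sketch of one way a file list could be split into per-worker batches. `partitionFiles` is a hypothetical helper invented for illustration, and round-robin assignment is an assumption; the summary above does not say which distribution strategy the RFC uses.

```javascript
// Hypothetical helper, not part of ESLint: split a list of file paths into
// at most `workerCount` batches using round-robin assignment.
function partitionFiles(filePaths, workerCount) {
    if (!Number.isInteger(workerCount) || workerCount < 1) {
        throw new RangeError("workerCount must be a positive integer");
    }

    // Never create more batches than there are files, but always at least one.
    const batchCount = Math.max(1, Math.min(workerCount, filePaths.length));
    const batches = Array.from({ length: batchCount }, () => []);

    filePaths.forEach((filePath, index) => {
        batches[index % batchCount].push(filePath);
    });

    return batches;
}

const batches = partitionFiles(["a.js", "b.js", "c.js", "d.js", "e.js"], 2);
console.log(batches);
```

Each batch would then be linted on its own thread, and the per-file results merged back into a single list.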

Related Issues

eslint/eslint#3565

@bradzacher

bradzacher commented Dec 20, 2024

I don't see any mention of the previous RFCs around parallelisation:

#42
#87

Both of these have a lot of context about the difficulties of parallelisation outside of the core rules, e.g. in cases where the parser or the rules store state in any form.
Two quick and prevalent examples:

  • eslint-plugin-import and friends store a cache of the modules they have resolved and parsed outside of the ESLint cycle.
  • @typescript-eslint has type information produced by the parser and consumed by rules.

Naively parallelising by just "randomly" distributing files across threads may lead to a SLOWER lint run in cases where people use such stateful plugins because the cached work may need to be redone once for each thread.
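The concern can be illustrated with a toy model. This is only a sketch under stated assumptions: the top-level directory stands in for a unit of cached state (such as a type-checked project), every thread pays the setup cost once per unit it touches, and `countProjectLoads` and `projectOf` are hypothetical names, not ESLint or plugin APIs.

```javascript
// Toy model: each worker must initialize a project's state once for every
// distinct project found among the files assigned to it.
function countProjectLoads(batches, projectOf) {
    return batches.reduce(
        (total, batch) => total + new Set(batch.map(projectOf)).size,
        0
    );
}

// Assumption for illustration: the top-level directory identifies the project.
const projectOf = filePath => filePath.split("/")[0];

// Four files from two projects, linted on a single thread.
const singleThread = [["app/a.ts", "app/b.ts", "lib/c.ts", "lib/d.ts"]];

// The same files dealt out round-robin across two threads.
const roundRobin = [["app/a.ts", "lib/c.ts"], ["app/b.ts", "lib/d.ts"]];

// The same files grouped so that each thread sees only one project.
const groupedByProject = [["app/a.ts", "app/b.ts"], ["lib/c.ts", "lib/d.ts"]];

console.log(countProjectLoads(singleThread, projectOf));     // 2
console.log(countProjectLoads(roundRobin, projectOf));       // 4
console.log(countProjectLoads(groupedByProject, projectOf)); // 2
```

In this model the round-robin split doubles the setup work compared with a single thread, while a project-aware split keeps it constant. Whether the extra setup outweighs the gain from parallel linting depends on how expensive that setup is in practice.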

I would like to see such use cases addressed as part of this RFC given that they are very prevalent - with both mentioned plugins in use by the majority of the ecosystem.

These problems have been discussed before in the context of language plugins and parser contexts (I can try to find the threads a bit later).

@fasttime
Member Author

Thanks for the input @bradzacher. How would you go about incorporating context from #42 and #87 into this RFC?

I see that #42 suggests introducing a plugin setting disallowWorkerThreads and also limiting the number of threads depending on the number of files. Both of those measures could actually be useful when the concurrency is calculated automatically. Do you think that would be helpful?
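As a rough sketch of how those two measures might combine when concurrency is calculated automatically: the function below is hypothetical, the `minFilesPerThread` threshold of 50 is an arbitrary placeholder, and reading `disallowWorkerThreads` as a boolean property on the plugin object is an assumption about its shape rather than what #42 specifies.

```javascript
const os = require("node:os");

// os.availableParallelism() exists only in newer Node.js releases,
// so fall back to counting logical CPUs when it is missing.
function detectCpuCount() {
    return typeof os.availableParallelism === "function"
        ? os.availableParallelism()
        : os.cpus().length;
}

// Hypothetical helper: pick a thread count from the number of files,
// the available cores, and any plugin that opts out of worker threads.
function chooseThreadCount({
    fileCount,
    plugins = [],
    cpuCount = detectCpuCount(),
    minFilesPerThread = 50 // placeholder value, not a measured threshold
}) {
    // Assumed shape: a plugin opts out with `disallowWorkerThreads: true`.
    if (plugins.some(plugin => plugin.disallowWorkerThreads === true)) {
        return 1;
    }

    // Do not start more threads than the file count can keep busy.
    const threadsJustifiedByFiles = Math.ceil(fileCount / minFilesPerThread);

    return Math.max(1, Math.min(cpuCount, threadsJustifiedByFiles));
}

console.log(chooseThreadCount({ fileCount: 120, cpuCount: 8 }));
```

A small run therefore stays on few threads even on a machine with many cores, and a single opting-out plugin forces the single-threaded path.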

As for #87, it seems to be about an unrelated feature that doesn't even require multithreading. But I get why it would be beneficial to limit the number of instances of the same parser across threads, especially if the parser takes a long time to load its initial state, like typescript-eslint with type-aware parsing. If you have any concrete suggestions on how to do that, I'd love to know.

> Naively parallelising by just "randomly" distributing files across threads may lead to a SLOWER lint run in cases where people use such stateful plugins because the cached work may need to be redone once for each thread.
>
> I would like to see such usecases addressed as part of this RFC given that these mentioned usecases are very prevalent - with both mentioned plugins in use by the majority of the ecosystem.

I imagine the way one would address such use cases is by making no changes, i.e. not enabling multithread linting if the results are not satisfactory. But if something can be done to improve performance for popular plugins that would be awesome.

@bradzacher

To be clear - I'm all for such a system existing. Like caching, it can vastly improve the experience for those that fit within the bounds.

The thing I want to make sure of is that we ensure the bounds are either intentionally designed to be tight to avoid complexity explosion, or that we are at least planning a path forward for the mentioned cases.


#87 has some discussions around parallel parsing which are relevant to the sorts of ideas we'd need to consider here.

Some other relevant discussions can be found in
eslint/eslint#16819
eslint/eslint#16818 (some concepts discussed are semi-relevant here)
eslint/eslint#16557 (comment) (and other threads in that discussion)
eslint/eslint#14139 (some more relevant context about plugin setup)

I'm pretty swamped atm cos holiday season and kids and probably won't be able to get back to this properly until the new year.

@nzakas
Member

nzakas commented Dec 31, 2024

Thanks for putting this together. I'm going to need more time to dig into the details, and I really appreciate the amount of thought and explanation you've included in this RFC. I have a few high-level thoughts from reviewing this today:

  1. I'd like to see an exploration of how other tools handle concurrency. I know other tools aren't a one-to-one comparison with ESLint, but there are plenty of tools in the ecosystem that do concurrency. For instance, Jest was very early to implement concurrency, so it would be good to include how they do it. Ava also runs tests concurrently. (Threads vs. processes, at what level, how do they determine defaults etc.)
  2. There have been a number of forks that implement parallelization of ESLint over the years. Even though some are old and eslintrc-based, I'd still like to see those included in this RFC with a summary of how each worked. There's a lot of history here, so we should be sure we're considering all past attempts. Here are a few:
  3. I'm wondering if implementing this as part of ESLint#lintFiles() is the correct abstraction? If each worker needs to create an instance of ESLint, then I wonder if perhaps a separate class that solely manages concurrency would make things a bit cleaner?

@fasttime
Member Author

fasttime commented Jan 9, 2025

> 1. I'd like to see an exploration of how other tools handle concurrency. I know other tools aren't a one-to-one comparison with ESLint, but there are plenty of tools in the ecosystem that do concurrency. For instance, Jest was very early to implement concurrency, so it would be good to include how they do it. Ava also runs tests concurrently. (Threads vs. processes, at what level, how do they determine defaults etc.)

Yes, it would be interesting to look into other tools to understand how they handle concurrency. This could bring in some useful ideas even if the general approach is different. I was thinking of checking Prettier but haven't managed to do that yet. Jest and Ava are also good candidates.

> 2. There have been a number of forks that implement parallelization of ESLint over the years. Even though some are old and eslintrc-based, I'd still like to see those included in this RFC with a summary of how each worked. There's a lot of history here, so we should be sure we're considering all past attempts. Here are a few:

Thanks for the list. I missed most of those links while skimming through the discussion in eslint/eslint#3565. I'll be sure to go through the items and add a prior art mention.

> 3. I'm wondering if implementing this as part of ESLint#lintFiles() is the correct abstraction? If each worker needs to create an instance of ESLint, then I wonder if perhaps a separate class that solely manages concurrency would make things a bit cleaner?

Workers don't each need to create a new instance of ESLint. Only the constructor options and the list of files must be known in each thread, so it makes perfect sense to keep the runtime logic in a separate module/class. I will emphasize this in the wording.
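A minimal sketch of what "only the constructor options and the list of files" could look like as data handed to each thread. `createWorkerPayloads` and the payload shape are hypothetical and not taken from the RFC; the option names `cwd`, `fix` and `cache` are ordinary ESLint constructor options used here as sample values. The sketch assumes the options are plain data, since Node.js copies a worker's `workerData` with the structured clone algorithm.

```javascript
// Hypothetical helper: build one plain-data payload per worker thread.
// Each payload carries only the constructor options and that worker's files.
function createWorkerPayloads(eslintOptions, fileBatches) {
    return fileBatches.map((filePaths, workerIndex) => ({
        workerIndex,
        eslintOptions,
        filePaths
    }));
}

const payloads = createWorkerPayloads(
    { cwd: "/project", fix: false, cache: false },
    [["src/a.js", "src/c.js"], ["src/b.js"]]
);

// structuredClone() uses the same algorithm that worker_threads applies to
// `workerData`, so it throws here if a payload holds something that cannot
// be copied to another thread, such as a function.
const received = payloads.map(payload => structuredClone(payload));

console.log(received[1].filePaths);
```

In a real implementation each payload would be passed as `workerData` when constructing a `Worker`, and the worker module would hold the linting logic. Options that contain functions or other uncloneable values would need separate handling, which this sketch does not attempt.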

4 participants