Ramblings about our (lack of) query threading model #6310
Comments
Nice writeup. This proposal makes a lot of sense. We should do a passthrough of the code and see how many places we currently violate the proposed pre-determined query constraint. The sooner we can lock ourselves into this mental model the better. The other thing I'm still curious about is the future path to doing some kind of filtering or transformation. For many cases this probably pushes us in the direction of over-querying + doing more work in-visualizer. But is there a path to declaring that you're interested in a filtered view of the data?
Yeah, I think that can happen naturally over time with incremental improvements to the query DSL.
I'm new here and aware that this is a very complicated topic, but I solved a related issue in a previous project I was working on. Have you ever considered using persistent data types? It's all based on hash array mapped tries, B-trees and RRB-tree vectors - they have some really cool properties and in my experience this solves 99.9% of all concurrency related issues. You can basically get rid of all locking and everything just becomes simple and nice again 😄

In practice it works a lot like RCU in the Linux kernel: everyone (i.e. every thread) has a mutable copy of the data, so mutating it doesn't affect anyone else; it's internally mutable but externally read-only. So whenever you make a change and want it to be visible to everyone else you just replace an atomic pointer with the new state, and whenever you read from it again you get a new, updated state. And it's all super cheap to copy.

So really worth looking into imo!
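For context, the RCU-style pattern described above could be sketched in Rust roughly like this. This is a toy illustration, not Rerun code: real implementations swap an atomic pointer directly (e.g. via the `arc-swap` crate), where this sketch uses a briefly-held `RwLock` around an `Arc` for simplicity.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// A shared snapshot: readers grab an `Arc` (a cheap pointer copy) and then
// read lock-free; writers build a fresh value and publish it in one swap.
struct Shared<T> {
    current: RwLock<Arc<T>>, // stand-in for an atomic pointer swap
}

impl<T> Shared<T> {
    fn new(value: T) -> Self {
        Self { current: RwLock::new(Arc::new(value)) }
    }

    /// Readers get an immutable snapshot; the data itself is never locked.
    fn load(&self) -> Arc<T> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Writers mutate a private copy, then publish it in a single swap.
    fn store(&self, value: T) {
        *self.current.write().unwrap() = Arc::new(value);
    }
}

fn main() {
    let shared = Arc::new(Shared::new(vec![1, 2, 3]));

    let snapshot = shared.load(); // existing readers keep the old state
    {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            let mut copy = (*shared.load()).clone(); // private mutable copy
            copy.push(4);
            shared.store(copy); // publish: visible to all *new* readers
        })
        .join()
        .unwrap();
    }

    assert_eq!(*snapshot, vec![1, 2, 3]); // old snapshot is unchanged
    assert_eq!(*shared.load(), vec![1, 2, 3, 4]); // new readers see the update
}
```

The key property is the one the comment describes: readers are never blocked by a writer mid-update, because a writer only ever publishes a fully-built new state.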
Great that you're bringing up copy-on-write datastructures, they always slip my mind 👍

The specific caching datastructures mentioned in this issue exist pretty much solely for cache locality purposes though, so any tree-based datastructure is out of the picture (think e.g. crunching through millions of scalars for plot rendering). You could of course amortize things by making each entry its own contiguous bucket of data etc, but then you pretty much end up back where we are today (bucket splits & merges require synchronization across several keys, not just pointer swaps, and that's before you get into the whole in-place bucket modifications business...).

Similarly, our lack of a data-driven threading model still needs to be addressed in any case: low-level workarounds won't fix a flawed design, and contending on atomics isn't really any better than contending on locks anyhow.

That being said, there are a lot of places in the app that already deal exclusively with buckets/chunks of data, and those places could potentially benefit from CoW datastructures: if only to not have to deal with these nasty mutex guards that always turn into a usability and maintenance nightmare in any non-trivial scenario (and we have a lot of those). There might be some maintenance/usability wins to be had here.
Thanks for clarifying, that's definitely a very interesting challenge 🤓 Here is the original Clojure paper with benchmarks:

Hope I didn't misunderstand... there is also not really much contention or synchronization going on with persistent data structures, besides the usual.

Also, one huge advantage is that you can concurrently access data without any cache coherence issues: because it's read-only, the CPU caches don't get invalidated through writes. With mutable state this is impossible to achieve. If you think about it, it also makes sense: the only way for two threads to safely access the same data without any synchronization(!) is with read-only data.
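To make the structural-sharing argument concrete, here's a minimal persistent list in plain stdlib Rust. This is a toy sketch, not Rerun code: real persistent vectors/maps (HAMTs, RRB-trees) generalize the same idea so that an update copies only an O(log n) path instead of a single node.

```rust
use std::sync::Arc;

// A persistent (immutable) singly-linked list: "adding" an element allocates
// one new node and shares the entire existing tail with the old version.
enum Node<T> {
    Nil,
    Cons(T, Arc<Node<T>>),
}

struct PList<T> {
    head: Arc<Node<T>>,
}

impl<T> PList<T> {
    fn new() -> Self {
        Self { head: Arc::new(Node::Nil) }
    }

    /// Returns a *new* list sharing the old one's nodes:
    /// one allocation, no copying, and the old version stays valid.
    fn push(&self, value: T) -> Self {
        Self { head: Arc::new(Node::Cons(value, Arc::clone(&self.head))) }
    }

    fn len(&self) -> usize {
        let mut n = 0;
        let mut cur = &self.head;
        while let Node::Cons(_, tail) = &**cur {
            n += 1;
            cur = tail;
        }
        n
    }
}

fn main() {
    let v1 = PList::new().push(1).push(2);
    let v2 = v1.push(3); // shares v1's nodes; v1 is untouched

    assert_eq!(v1.len(), 2); // old version still observable, unchanged
    assert_eq!(v2.len(), 3);
}
```

Since every published node is read-only, concurrent readers never invalidate each other's cache lines, which is the cache-coherence point made above.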
@teh-cmc I'm sorry - I did some benchmarks out of curiosity. It really shines in a concurrent setting though:

Anyway, this is just to show that immutable data structures can be efficient - their properties make them auto-scalable. I was actually always wondering what an immutable ECS could look like - maybe I will try to implement it one day :)
Context
A Rerun frame is divided into a bunch of non-overlapping phases.
By "phase" I mean any logical unit of work. Individual phases might be parallelized internally, but as a whole, phases are purely sequential: no two phases ever run concurrently.
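As a rough sketch of that model (illustrative names only, not actual Rerun code), a frame runs its phases strictly one after another, while any single phase may fan out internally:

```rust
use std::thread;

// Phases run strictly one after another; a single phase may be internally
// parallel. Scoped threads stand in here for a work-stealing scheduler
// like Rayon; the names are illustrative, not actual Rerun APIs.
fn run_frame(views: &[&str]) -> Vec<String> {
    // Phase A: purely sequential setup (e.g. blueprint handling).
    let plan: Vec<&str> = views.to_vec();

    // Phase B: internally parallel, but the phase as a whole does not
    // overlap with any other phase.
    let results: Vec<String> = thread::scope(|s| {
        let handles: Vec<_> = plan
            .iter()
            .map(|view| s.spawn(move || format!("executed {view}")))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Phase C only starts once phase B has fully completed.
    results
}

fn main() {
    let out = run_frame(&["3d", "timeseries"]);
    assert_eq!(out, vec!["executed 3d", "executed timeseries"]);
}
```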
Here's a (non-exhaustive) list of the phases that make up a frame today (focusing on the data side of things):
In the before times, phase 3 ("Query and space view execution") was single-threaded, and everything was fine.
Then, Rayon was introduced into the codebase in order to provide parallelism at the space view level, using its work-stealing scheduler.
From that point on, space views were running in parallel, in a non-deterministic and potentially reentrant order.
And everything was fine, because space views were read-only back then.
Then, query caching was introduced, and more specifically it was introduced at the edge, in each individual space view, since we don't have a centralized query engine (yet).
Suddenly, space views were not read-only anymore: each space view would potentially write to a central, multi-tenant cache as it was running.
Parallel reentrant writes (both deletions and insertions), arriving in a non-deterministic order, into a central cache with multiple layers of granular locks is about as bad as news gets... And still, everything was fine, because space views had zero data overlap in practice: with no easy way to specify custom queries and no blueprint-from-code, you effectively never had any data overlap between your views in 99.5% of practical cases.
Finally, configurable queries and blueprint from code were introduced.
Suddenly, it became very easy to end up with multiple space views sharing the exact same component paths, and those previously benign concurrent/reentrant/multi-layered writes started to become a real problem (e.g. this or that).
These problems can always be fixed by adding more and more complexity, fixing one edge case after the other, but at some point you have to wonder whether that's worth the trouble at all: if these writes were problematic to begin with, then by definition it's because they were contending on the same resource -- all this added complexity is merely bringing more contention.
Rather, the issue lies in our threading model during phase 3 -- or lack thereof: we've reached a point where our parallelism partitioning scheme (space views) bears little relation to how the data actually flows through the app, and pain ensues.
Rayon knows nothing about how the data is supposed to flow through our pipes. We do, though.
At the end of phase 2 ("Blueprint handling and query planning"), we have all the information we need to schedule things in an optimal way that actually reflects the data flow [1].
There is still a lot of value to be had in using work-stealing schedulers though, especially in read-only scenarios, if used appropriately.
[1] That's actually not entirely true today -- we should make it so, as discussed below!
Proposal
The proposal is two-fold:
1. We should make it so that all queries are known and planned for at the beginning of the frame, no exception.
A prerequisite for optimizing the data flow is to know at the start of the frame, before any view starts rendering (i.e. after phase 2 but before phase 3), all the queries that will need to be executed for the entire viewport.
The corollary is that views shouldn't be allowed to run queries in their `execute()` implementation anymore. Rather, each view would implement a new method as part of the `SpaceView` trait where it would have to return all the queries it might possibly need for the upcoming frame.

Which of these query results the view actually ends up using in practice is up to the view.
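As a hypothetical sketch of what that could look like (the names `all_queries`, `QueryExpression`, and `QueryResults` are invented for illustration, not the actual Rerun API):

```rust
// Illustrative stub types standing in for the real query machinery.
struct QueryExpression(String); // e.g. an untyped entity-path + component query
struct QueryResults; // the pre-executed, cached results for the whole frame

trait SpaceView {
    /// Called after query planning (phase 2) but before any view executes:
    /// returns every query this view *might* need for the upcoming frame.
    /// Over-reporting is fine, since queries are cheap to run and cache.
    fn all_queries(&self) -> Vec<QueryExpression>;

    /// May only read from the pre-executed results; issuing new queries
    /// from here would no longer be allowed.
    fn execute(&mut self, results: &QueryResults);
}

struct TextView;

impl SpaceView for TextView {
    fn all_queries(&self) -> Vec<QueryExpression> {
        // Declare everything we might touch; unused results are fine.
        vec![QueryExpression("/logs/**:Text".to_owned())]
    }

    fn execute(&mut self, _results: &QueryResults) {
        // Reads only from `_results`; no queries are run here.
    }
}

fn main() {
    let view = TextView;
    assert_eq!(view.all_queries().len(), 1);
}
```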
This allows the view to implement data-driven behaviors, at the cost of running some superfluous queries. This is fine: queries are cheap to run and to cache, as opposed to deserialization of results.
(This is only possible because queries have recently become untyped!)
2. We should have appropriate threading models for the read-heavy and write-heavy parts of the query flow.
Running a query is a 5-step process today:
Note: Steps 4 & 5 are typed and therefore must run as part of the space view execution phase.
As of today, all 5 of these steps happen during the space view execution phase.
With the full query plan known ahead of the start of the frame (described just above), we now have an opportunity to split this phase into two more fine-grained phases, and to implement an appropriate threading strategy for each of them.
The first of these two query phases is the write-heavy phase: it encompasses steps 1 to 3 (inclusive) described above, and is executed just after the query planning phase and just before the space view execution phase.
Since both queries and the query cache work at the entity-component level, the logical thing to do is to group all queries that share the same `ComponentPath` into the same task.

All tasks can then be executed through a work-stealing scheduler, without contention nor nasty edge cases.
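As a rough sketch of this grouping (illustrative types only; a real implementation would hand the tasks to Rayon rather than scoped threads):

```rust
use std::collections::HashMap;
use std::thread;

// Hypothetical stand-ins for the real types; names are illustrative.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct ComponentPath(String);

#[derive(Clone, Debug)]
struct Query {
    path: ComponentPath,
    range: (u64, u64), // e.g. a time range
}

/// Group queries that share a `ComponentPath` into a single task, so that
/// each cache entry is written by exactly one task and tasks never contend.
fn group_by_path(queries: Vec<Query>) -> HashMap<ComponentPath, Vec<Query>> {
    let mut tasks: HashMap<ComponentPath, Vec<Query>> = HashMap::new();
    for q in queries {
        tasks.entry(q.path.clone()).or_default().push(q);
    }
    tasks
}

fn main() {
    let queries = vec![
        Query { path: ComponentPath("world/points:Position3D".into()), range: (0, 10) },
        Query { path: ComponentPath("world/points:Position3D".into()), range: (5, 15) },
        Query { path: ComponentPath("world/cam:Image".into()), range: (0, 10) },
    ];

    let tasks = group_by_path(queries);
    assert_eq!(tasks.len(), 2); // two independent tasks, zero write contention

    // Each task can now run on a work-stealing scheduler: all writes to a
    // given cache entry come from a single task.
    thread::scope(|s| {
        for (path, group) in &tasks {
            s.spawn(move || {
                // ...execute steps 1-3 for `group`, writing only to the
                // cache entry keyed by `path`...
                let _ = (path, group);
            });
        }
    });
}
```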
The second query phase is the read-heavy one: it covers steps 4 & 5 and still runs as part of the space view execution phase, like it does today.
Since it is mostly reads, blind work-stealing semantics are fine and even encouraged.
So, all in all we now have: