Frozen heap: a design proposal for an ancient-like heap #36
Conversation
The benefit of this feature is to avoid marking and sweeping some part of the heap, and possibly also to prevent the "ancient" part of the heap from affecting GC scheduling etc. Most systems which implement something like this do allow pointers from this distinguished part of the heap (called "tenured" in some systems) to the rest of the heap, usually by maintaining a remembered set of references (I'm using "remembered set" here in the most general sense, of any mechanism for efficiently finding pointers from one part of the heap to another, rather than our current implementation for major -> minor pointers). So: could we generalize?
cc @pascal-cuoq
Hello, I can confirm that TrustInSoft Analyzer would benefit from this feature. TrustInSoft Analyzer is a sound static analyzer for C and C++ programs, based on Abstract Interpretation, that records all intermediate steps of the analysis so that the user can inspect how the abstract value for an expression in a dangerous position (think: divisor of an integer division operation) ended up containing forbidden values (think: zero). This allows the user to rapidly triage the alarms emitted by the analyzer into bugs (the forbidden value happens for some inputs and some choices at the points where the real execution is non-deterministic) and false positives (the forbidden value cannot really happen; its presence was a consequence of an over-approximation somewhere, perhaps even one that the analyzer can be convinced not to make with the right tuning). This recording is fundamental to the way the analyzer is used during the “analysis tuning” phase, and it is known in advance that these intermediate results will never be discarded.

In theory, it's possible to build programs that accumulate a nearly-infinite graph of abstract states when analyzed, while maintaining a constant footprint of new allocations for the actual analysis. In this theoretical setup, the GC overhead of scanning values that are known by the analyzer to be forever alive can be made arbitrarily close to 100%, assuming infinite memory. Realistic values are, for some very specific advanced uses, 62 GiB analyses made of, say, 58 GiB of intermediate results to be kept forever and 4 GiB of transient memory allocated and possibly released as the analysis progresses.

The only particularity I can think of is that the values we want to freeze are hash-consed, so the new feature should support pointers from weak hashtables to ancient values: nothing conceptually difficult, but a corner case nonetheless.
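To make the weak-hashtable corner case concrete, here is a minimal hash-consing sketch using the stdlib's `Weak.Make` hash sets. The `expr` type and the smart constructors are hypothetical, chosen only for illustration; the point is that the interning table holds weak pointers to exactly the nodes one would want to freeze.

```ocaml
type expr = Const of int | Add of expr * expr

module H = Weak.Make (struct
  type t = expr
  let equal a b = match a, b with
    | Const x, Const y -> x = y
    (* Subterms are already hash-consed, so physical equality suffices. *)
    | Add (a1, b1), Add (a2, b2) -> a1 == a2 && b1 == b2
    | _ -> false
  let hash = function
    | Const x -> Hashtbl.hash (0, x)
    | Add (a, b) -> Hashtbl.hash (1, Hashtbl.hash a, Hashtbl.hash b)
end)

let table = H.create 256

(* [merge] returns the already-interned node when an equal one exists;
   otherwise it adds [e] weakly, so unreferenced nodes stay collectable. *)
let hashcons e = H.merge table e
let const x = hashcons (Const x)
let add a b = hashcons (Add (a, b))

let () =
  let e1 = add (const 1) (const 2) in
  let e2 = add (const 1) (const 2) in
  assert (e1 == e2)  (* maximal sharing: physically equal *)
```

Once such interned nodes become ancient, the entries of `table` are pointers from a weak structure into the frozen region, which is exactly the interaction the feature would need to support.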
We know exactly when a node becomes ancient, and the general case will be that most pointers below the node being frozen point to nodes that have already been frozen earlier. In some cases the node being frozen will itself already have been frozen.

EDIT: On second thought, another particularity may be that the value being frozen might ultimately contain pointers to references (I am thinking of localization information, which points to the AST, which is rife with references, and some of these references are even updated while the analysis is ongoing). There are also local caches for expensive computations, e.g. the size of a C type, which is computed the first time (recursively in the case of aggregate types) and stored in a mutable slot right inside the type. The ideal ancient system for us would automatically stop freezing when it encounters mutable fields; since I realize that this is not practical, it would perhaps allow us to declare that freezing should not proceed down some ad-hoc markers. The markers could be registered as permanent roots for the GC. This would have to be done atomically with the choice of a permanent location for the mutable field itself. (Does that last paragraph make sense?)

EDIT: On third thought, the OCaml values we need to freeze are memory states, and the problem with mutable fields in memory states comes entirely from the OCaml values representing C variables (which serve as keys in memory states, and which can appear on the right-hand side whenever the value of a variable is an address). We would only need to use the unique number that we already assign to variables instead of the variable itself, in memory states and in values, and maintain a hashtable from number to variable.
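The indirection described in the last paragraph could be sketched as below. All names (`varinfo`, `register`, `bind`, ...) are hypothetical, chosen only to show how replacing variable records by their unique id keeps mutable fields out of the (immutable, hence freezable) memory states.

```ocaml
(* Variables keep their mutable cache fields, but memory states refer to
   them only through their immutable unique id. *)
type varinfo = {
  vid : int;                  (* unique, immutable identifier *)
  vname : string;
  mutable vsize : int option; (* lazily-computed cache; stays outside
                                 anything we freeze *)
}

(* Side table from id back to the (mutable, never-frozen) variable. *)
let variables : (int, varinfo) Hashtbl.t = Hashtbl.create 97
let register v = Hashtbl.replace variables v.vid v
let find_var id = Hashtbl.find variables id

(* A memory state maps variable ids, not variables, to abstract values,
   so its transitive closure contains no mutable field. *)
module IdMap = Map.Make (Int)
type 'a state = 'a IdMap.t

let bind v value (s : 'a state) = IdMap.add v.vid value s
```

The hashtable itself remains an ordinary GC root; only the id-keyed states would be handed to the freezing primitive.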
I would like to mention that as far as our work is concerned, this RFC is only a second-hand account. Notably, our discussions revolved around the specific needs of the Coq Proof Assistant, such as doing hash-consing on the fly, among other things. So far, we have not considered this as a possible first-class, user-facing extension to OCaml, and we do not intend to eventually write up an RFC for such a feature. The choice of segfaulting when mutating a frozen value raises usability concerns for a user-facing language feature. Have you considered other options relying on the write barrier?
I think it's worth experimenting with a third generation (in addition to the existing "minor heap" - generation 0 - and "shared heap" - generation 1), to which objects are only promoted by some explicit function(s) (either with or without their transitive closure). We'd maintain a remembered set for pointers from this generation to the rest of the heap, and collect this generation with lower frequency than the shared heap (zero frequency, in a first implementation).
But how do you populate the remembered set while preserving the performance of the write barrier? There is a veto on implementing anything akin to a page table, which is what would let us do this efficiently.
Could we use memory protection? We need the common case in the write barrier to be fast, but maybe we don't care so much about rare cases (such as writes to this third generation)? As I say, I think it's a worthwhile experiment.
Or use the existing remembered set, and filter when we process it?
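The "filter when we process it" option could be modeled, very roughly, as follows. The real mechanism would live in the C runtime; this OCaml toy only illustrates the bookkeeping, and all the names are made up.

```ocaml
(* Which generation a recorded write originated from. *)
type generation = Minor | Major | Frozen

type slot = { gen : generation; target : int }

(* A single remembered set receives all inter-generational writes,
   so the write barrier stays exactly as cheap as it is today... *)
let remembered : slot list ref = ref []
let record s = remembered := s :: !remembered

(* ...and only the collector, when it consumes the set, separates the
   frozen-sourced entries (roots for the third generation) from the
   ordinary major -> minor entries. *)
let process () =
  let frozen_rs, normal =
    List.partition (fun s -> s.gen = Frozen) !remembered
  in
  remembered := [];
  (frozen_rs, normal)
```

The cost of the filtering is paid once per minor collection rather than on every write, which is the trade-off being suggested.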
Rendered
I have this draft RFC waiting on my laptop since November, co-designed with @damiendoligez. Today @NickBarnes and @sadiqj told me that they were thinking along similar lines, so I thought that sharing the RFC draft would be useful.