Frozen heap: a design proposal for an ancient-like heap #36

Conversation

@gasche commented Jan 16, 2024

Rendered

I have had this draft RFC, co-designed with @damiendoligez, waiting on my laptop since November. Today @NickBarnes and @sadiqj told me that they were thinking along similar lines, so I thought it would be useful to share the draft.

@NickBarnes commented Jan 16, 2024

The benefit of this feature is to avoid marking and sweeping some part of the heap, and possibly also to prevent the "ancient" part of the heap from affecting GC scheduling, etc.

Most systems which implement something like this do allow pointers from this distinguished part of the heap (called "tenured" in some systems) to the rest of the heap, usually by maintaining a remembered set of references (I'm using "remembered set" here in the most general sense, of any mechanism for efficiently finding pointers from one part of the heap to another, rather than our current implementation for major -> minor pointers). So: could we generalize caml_modify() for this purpose, rather than forbid such pointers?
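
To make that concrete, the shape I have in mind is something like the sketch below. None of this is existing runtime API: the ancient-region bounds, the remembered-set representation and `caml_ancient_modify` are all made-up names, and the real `caml_modify()` would presumably need adapting rather than wrapping.

```c
/* Rough sketch only: caml_modify() is the real write barrier; the
   ancient region, the remembered set and caml_ancient_modify are
   hypothetical names used purely for illustration. */
#include <stdlib.h>
#include <caml/mlvalues.h>
#include <caml/memory.h>

/* Bounds of the (hypothetical) ancient region. */
static char *ancient_start, *ancient_end;

static int is_ancient(void *p)
{
  return (char *) p >= ancient_start && (char *) p < ancient_end;
}

/* Naive remembered set: a growable array of addresses of ancient
   fields that may point into the normal heap. */
static value **remset;
static size_t remset_len, remset_cap;

static void remset_add(value *field)
{
  if (remset_len == remset_cap) {
    remset_cap = remset_cap ? 2 * remset_cap : 1024;
    remset = realloc(remset, remset_cap * sizeof(value *));
  }
  remset[remset_len++] = field;
}

/* Barrier for fields of ancient blocks: remember the field if the new
   value points back into the normal OCaml heap, then fall through to
   the ordinary barrier. */
void caml_ancient_modify(value *field, value newval)
{
  if (is_ancient(field) && Is_block(newval) && !is_ancient((void *) newval))
    remset_add(field);
  caml_modify(field, newval);
}
```

Whether the extra test can be kept off the fast path of the barrier is of course the real question.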

@damiendoligez

cc @pascal-cuoq

@pascal-cuoq commented Jan 18, 2024

Hello,

I can confirm that TrustInSoft Analyzer would benefit from this feature. TrustInSoft Analyzer is a sound static analyzer for C and C++ programs based on Abstract Interpretation. It records all intermediate steps of the analysis so that the user can inspect how the abstract value of an expression in a dangerous position (think: the divisor of an integer division) ended up containing forbidden values (think: zero). This allows the user to rapidly triage the alarms emitted by the analyzer into bugs (the forbidden value happens for some inputs and some choices at the points where the real execution is non-deterministic) and false positives (the forbidden value cannot really happen; its presence is a consequence of an over-approximation somewhere, perhaps even one that the analyzer can be convinced not to make with the right tuning).

This recording is fundamental to the way the analyzer is used during the “analysis tuning” phase, and it is known in advance that these intermediate results will never be discarded. In theory, it's possible to build programs that, when analyzed, accumulate a nearly infinite graph of abstract states while maintaining a constant footprint of new allocations for the actual analysis. In this theoretical setup, assuming infinite memory, the GC overhead of scanning values that the analyzer knows to be alive forever can be made arbitrarily close to 100%. Realistic figures, for some very specific advanced uses, are 62 GiB analyses made up of, say, 58 GiB of intermediate results to be kept forever and 4 GiB of transient memory allocated and possibly released as the analysis progresses.

The only particularity I can think of is that the values we want to freeze are hash-consed, so the new feature should support pointers from weak hash tables to ancient values: nothing conceptually difficult, but a corner case nonetheless.

We know exactly when a node becomes ancient, and the general case will be that most pointers below the node being frozen are pointing to nodes that have already been frozen earlier. In some cases the node being frozen will itself already have been frozen.

EDIT:

On second thought, another particularity may be that the value being frozen might ultimately contain pointers to references. (I am thinking of localization information, which points to the AST, which is rife with references; some of these references are even updated while the analysis is ongoing. There are also local caches for expensive computations, e.g. the size of a C type, which is computed the first time, recursively in the case of aggregate types, and stored in a mutable slot right inside the type.) The ideal ancient system for us would automatically stop freezing when it encounters mutable fields, and since I realize that this is not practical, it would perhaps allow the user to declare that freezing should not proceed down past some ad-hoc markers? The markers could be registered as permanent roots for the GC. This would have to be done atomically with the choice of a permanent location for the mutable field itself.

(Does that last paragraph make sense?)

EDIT:

On third thought, the OCaml values we need to freeze are memory states, and the problem with mutable fields in memory states comes entirely from the OCaml values representing C variables (which serve as keys in memory states and which can appear on the right-hand side whenever the value of a variable is an address). We would only need to use, in memory states and in values, the unique number that we already assign to variables instead of the variable itself, and maintain a hash table from number to variable.

@gadmm commented Feb 9, 2024

I would like to mention that, as far as our work is concerned, this RFC is only a second-hand account. Notably, our discussions revolved around the specific needs of the Coq Proof Assistant, such as doing hash-consing on the fly, among other things. So far, we have not considered it as a possible first-class, user-facing extension to OCaml, and we do not intend to eventually write up an RFC for such a feature.

The choice of segfaulting when mutating a frozen value raises usability concerns for a user-facing language feature. Have you considered other options relying on the write barrier?

@NickBarnes

I think it's worth experimenting with a third generation (in addition to the existing "minor heap" - generation 0 - and "shared heap" - generation 1), to which objects are only promoted by some explicit function(s) (either with or without their transitive closure). We'd maintain a remembered set for pointers from this generation to the rest of the heap, and collect this generation with lower frequency than the shared heap (zero frequency, in a first implementation).
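
Concretely, "zero frequency" just means the third generation is never marked or swept; the only per-cycle work is treating its remembered-set entries as extra roots. A rough sketch with made-up names (none of this is existing runtime API; the remembered set here is just a fixed array filled by the write barrier):

```c
/* Sketch only: what collecting the third generation "with zero
   frequency" could amount to.  The remembered set and the scanning
   hook are hypothetical names, not existing runtime API. */
#include <stddef.h>
#include <caml/mlvalues.h>

/* Remembered set: addresses of fields of ancient blocks that may point
   back into the minor or shared heap (filled by the write barrier). */
#define ANCIENT_REMSET_MAX 4096
static value *ancient_remset[ANCIENT_REMSET_MAX];
static size_t ancient_remset_len;

/* Called once per major cycle with the GC's root-visiting callback.
   Ancient blocks themselves are never marked or swept; only the fields
   recorded above are treated as roots, so whatever they reach in the
   shared heap stays alive. */
void ancient_scan_roots(void (*visit_root)(value *root))
{
  for (size_t i = 0; i < ancient_remset_len; i++)
    visit_root(ancient_remset[i]);
}
```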

@gadmm commented Feb 9, 2024

But how do you populate the remembered set while preserving the performance of the write barrier? There is a veto on implementing anything akin to a page table, which could be used to do this efficiently.

@NickBarnes

Could we use memory protection? We need the common case in the write barrier to be fast, but maybe we don't care so much about the rare cases (such as writes to this third generation)? As I say, I think it's a worthwhile experiment.
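
For the record, the memory-protection trick I have in mind is the classic one: map the third generation read-only and let the fault handler record the written page before unprotecting it. A standalone POSIX illustration (nothing OCaml-specific, all names made up; signal-safety and error handling are glossed over):

```c
/* Minimal POSIX illustration of a protection-based write barrier for an
   "ancient" region.  Not OCaml runtime code: bounds checks, portability
   and strict async-signal-safety (mprotect in a handler) are ignored. */
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE (1 << 20)          /* 1 MiB ancient region */

static char *region;                   /* base of the ancient region */
static long page_size;
static void *dirty_pages[1024];        /* pages written since last scan */
static volatile sig_atomic_t n_dirty;

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
  char *addr = (char *) info->si_addr;
  (void) sig; (void) ctx;
  if (addr < region || addr >= region + REGION_SIZE)
    _exit(1);                          /* a real crash, not our barrier */
  char *page = region + ((addr - region) / page_size) * page_size;
  dirty_pages[n_dirty++] = page;       /* remember the page ...          */
  mprotect(page, page_size, PROT_READ | PROT_WRITE); /* ... then let the write proceed */
}

int main(void)
{
  page_size = sysconf(_SC_PAGESIZE);
  region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  memset(region, 0, REGION_SIZE);

  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_SIGINFO;
  sa.sa_sigaction = on_segv;
  sigaction(SIGSEGV, &sa, NULL);

  /* "Freeze" the region: further writes fault into on_segv. */
  mprotect(region, REGION_SIZE, PROT_READ);

  region[12345] = 42;                  /* rare write: trapped, recorded, allowed */
  printf("dirty pages recorded: %d\n", (int) n_dirty);
  return 0;
}
```

The cost is one trap per first write to a protected page and nothing at all on the common path, which seems acceptable if writes to this generation really are rare.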

@NickBarnes

Or use the existing remembered set, and filter when we process it?

@sadiqj commented Feb 10, 2024

1. The proposed Gc.freeze API has some very sharp edges that I think will make it difficult to use, especially in larger codebases.

2. The 5.x runtime already supports off-heap blocks: they just need to have a valid object header with the NOT_MARKABLE colour. Fields in these blocks that point to non-NOT_MARKABLE blocks can be registered as global roots. Global roots could be made quicker, as scanning is serialised at present. (See the sketch after this list.)

3. Given that we have a way of creating off-heap, unscanned blocks that can also point into the normal heap, the complexity actually comes from figuring out how to move values there:

   • Do we need a design that doesn't promote all reachable memory? We can have references into the normal heap, but I feel this is also difficult to reason about in large codebases.
   • What if, instead of a move, we provided a copy?

4. Further, if we restricted this to immutable blocks only (a header bit which Gc.freeze could check?), then we could remove NOT_MARKABLE-to-NOT_MARKABLE references from the global roots at scanning time (or maybe at compaction).
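
To illustrate point 2, this is roughly what an off-heap block looks like today. A sketch only: I'm assuming the 5.x `Caml_out_of_heap_header` macro (which, if I remember correctly, exists for exactly this purpose) and the standard `caml_register_global_root`; error handling and deallocation are omitted.

```c
/* Sketch of point 2: a malloc'd block with a NOT_MARKABLE header whose
   single field, pointing back into the normal heap, is registered as a
   global root.  Assumes OCaml 5.x. */
#include <stdlib.h>
#include <caml/mlvalues.h>
#include <caml/memory.h>

/* Allocate a one-field off-heap cell holding [v].  The GC never marks
   or sweeps this block, so if [v] lives in the normal heap the field
   must be a global root for [v] to stay alive. */
CAMLprim value make_offheap_cell(value v)
{
  header_t *hp = malloc(Bhsize_wosize(1));    /* header + 1 field */
  hp[0] = Caml_out_of_heap_header(1, 0);      /* wosize 1, tag 0, colour NOT_MARKABLE */
  value *field = (value *) Val_hp(hp);        /* first (and only) field */
  field[0] = v;
  caml_register_global_root(&field[0]);       /* non-generational root: may be
                                                 mutated freely later on */
  return Val_hp(hp);
}
```

From the OCaml side such a block is indistinguishable from an ordinary one, which is why the remaining complexity is in the promotion/copy step rather than in the representation.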
