Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Native functions may cause a crash if GC happens #1034

Open
xxDark opened this issue Dec 30, 2024 · 4 comments
Open

Native functions may cause a crash if GC happens #1034

xxDark opened this issue Dec 30, 2024 · 4 comments

Comments

@xxDark
Copy link

xxDark commented Dec 30, 2024

Question

        FloatBuffer buffer = ByteBuffer.allocateDirect(64).order(ByteOrder.nativeOrder()).asFloatBuffer();
        GL11.glLoadMatrix(buffer);
        // buffer is out of scope
        // Reference.reachabilityFence(buffer);

I have not observed the crash in a real time scenario, but that might happen if a cleaner of a native resource deallocates it at the right moment.

@xxDark
Copy link
Author

xxDark commented Dec 30, 2024

The question is whether the caller should ensure that the buffer or whatever resource remains reachable during a native call or LWJGL should be taking care of this. (with Reference::reachabilityFence for example)

@Spasi
Copy link
Member

Spasi commented Jan 3, 2025

Hey @xxDark,

This is a legitimate concern and something I've been worrying about since the very first lwjgl3 commit. For reference, the issue is explained in detail here:

JVM Anatomy Quark #8: Local Variable Reachability

In the context of LWJGL, the problem may occur in code such as:

// in user code
ByteBuffer buffer = ByteBuffer.allocateDirect(...);
bindingMethod(..., buffer, ...);

// in LWJGL code
void bindingMethod(..., ByteBuffer buffer, ...) {
    long a = memAddress(buffer);
    nbindingMethod(..., a, ...);
}

LWJGL is being naughty here, because it reads the buffer's off-heap memory address, then proceeds to call the native method, which accepts the buffer address, not the buffer object itself. When this code gets JIT compiled (and everything gets inlined nicely), the buffer instance is considered unreachable after the memAddress call, which makes it fair game for GC. If that happens before the native method returns, the buffer's off-heap memory may be freed while still in use.

I've never been able to reproduce this and there have been no relevant bug reports in the 10 years since lwjgl3's release.

Before replying to this issue however, I figured it might be a good idea to retest with latest JVMs and GC algorithms. Turns out, it is indeed possible to reproduce, even on JDK 8. My mistake in previous attempts was using debug mode to identify buffer cleanup (with breakpoints inside JDK) but debug mode effectively disables GC of unreachable local variables (makes sense in hindsight). This time I used normal execution and Cleaner registration to detect cleanup. It indeed can happen and was able to reproduce actual crashes because of it.

So, yeah, bad news so far. The good news is that it's extremely unlikely to happen in actual programs, for the following reasons:

  • LWJGL 3 promotes extensive use of stack-allocated memory (via MemoryStack) and manually-managed memory (via MemoryUtil.memAlloc*/memFree), applicable to the majority of buffer allocations (temporary buffers, or buffers with trivial/clear life-cycles). GC-managed buffers (via BufferUtils, i.e. ByteBuffer.allocateDirect) are rarely used outside of demos & toy programs.
  • When GC-managed buffers are actually necessary (e.g. complicated/unclear life-cycle), there is almost always some kind of strong reference holding on to them, outside of any native calls.
  • Even in legitimate scenarios when that is not true, it's still very hard for a premature free to happen. When a reference registered with a Cleaner (like GC-managed buffers are) becomes unreachable, it is marked as such (becomes a phantom reference) in the next GC cycle, but is not cleaned immediately. The cleanup action happens in the GC cycle after that. So, memory pressure has to be so high, that GC happens TWICE during the same native call.

We also cannot use Reference::reachabilityFence in LWJGL, because of Java 8 compatibility (it was added in JDK 9). Supposedly we could define a custom method and it would achieve the same effect, but I feel it's too late to bother with this issue in lwjgl3. As I said no one has reported this as an actual problem, it's unlikely to ever happen in real programs and there is a very simple workaround for those few buffers that may of legitimate concern: just hold a strong reference to them while in use. That is, leave the responsibility to the caller as you suggested.

Note that in lwjgl4 this is automatically handled for MemorySegment instances by the FFM API itself. Wherever LWJGL decides to use raw pointers (i.e. long values) internally, the new binding generator will produce reachabilityFence calls as needed. So this will never be an issue in lwjgl4.

@furstenheim-goodnotes
Copy link

Hi @Spasi, super clear answer.

Maybe critical flag is an option?

In this fork of your library it is used
https://github.com/CRKatri/lwjgl3/blob/d620cd80295fd1631faaf1cf03d874ab0f1738ba/modules/lwjgl/lz4/src/generated/c/org_lwjgl_util_lz4_LZ4.c#L14

If I read this correctly it prevents gc compactation.

It has it's own problems like possibly leading to OOM

@Spasi
Copy link
Member

Spasi commented Jan 8, 2025

Hey @furstenheim-goodnotes,

We supported critical JNI in the past, but eventually removed it. It was never a supported feature and was eventually removed from the JDK (around versions 16-18). The overhead reduction was impressive in benchmarks, but the GC blocking was catastrophic in non-trivial (especially multithreaded) applications that actually needed the GC to do its job.

It also doesn't eliminate the above issue, because the buffer instance becomes unreachable before the JNI call, i.e. before the GC is actually blocked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants