C#: use stackalloc where appropriate for caller-allocated buffers #1012

Open · dicej opened this issue Aug 1, 2024 · 8 comments · May be fixed by #1144
Labels: gen-c# (Related to the C# code generator)

@dicej (Collaborator) commented Aug 1, 2024:

Per dotnet/runtimelab#2614 (comment), using stackalloc would likely be more efficient and avoid the need for GC pinning.

@jsturtevant (Collaborator):

I looked into this; my concern with using stackalloc here is that the list can be any length, and the buffer is sized based on the length of the List passed in.

@pavelsavara Maybe ArrayPool<T> is the appropriate option here?

For context, the relevant code in the generator currently looks like this:
```rust
self.needs_cleanup = true;
uwrite!(
    self.src,
    "
var {buffer_size} = {size} * (nuint){list}.Count;
var {address} = NativeMemory.AlignedAlloc({buffer_size}, {align});
cleanups.Add(()=> NativeMemory.AlignedFree({address}));
for (int {index} = 0; {index} < {list}.Count; ++{index}) {{
{ty} {block_element} = {list}[{index}];
int {base} = (int){address} + ({index} * {size});
{body}
}}
"
);
```

@pavelsavara:

stackalloc is really much cheaper/faster AFAIK. It doesn't create GC pressure and it doesn't need to do any (heap) synchronization.

But it only works if the current stack frame stays alive during the call to the external function (the WASI import). That means we can't put the stackalloc inside a LowerList helper function.

Can we calculate the size of the buffer during code gen? If so, we could conditionally generate stackalloc when the size is under some threshold, say 1 KB.

And if the payload size is dynamic, we can also generate the condition and both branches in the C# code (see the sketch below).
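For illustration, the generated code could take roughly this shape; the 1 KB threshold, the int element type, and the CallImport stub are assumptions, not what wit-bindgen emits today:

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

static class ConditionalLowering
{
    private const int StackThreshold = 1024; // assumed 1 KB cut-off

    public static unsafe void LowerAndCall(IReadOnlyList<int> list, int elementSize)
    {
        int bufferSize = elementSize * list.Count;

        // Both branches are emitted: small payloads go on the stack,
        // larger ones fall back to a heap array.
        Span<byte> buffer = bufferSize <= StackThreshold
            ? stackalloc byte[StackThreshold]
            : new byte[bufferSize];

        for (int i = 0; i < list.Count; ++i)
        {
            BinaryPrimitives.WriteInt32LittleEndian(
                buffer.Slice(i * elementSize, elementSize), list[i]);
        }

        // `fixed` over a Span pins the array in the heap branch and is
        // essentially free for stack memory.
        fixed (byte* address = buffer)
        {
            CallImport((nint)address, list.Count);
        }
    }

    static void CallImport(nint address, int count) { /* call the WASI import here */ }
}
```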

The third option is to pin the original array and pass its address without copying it.
That would only work if the element type is blittable (a sketch follows the list below).

  • There is the fixed keyword for that.
  • Or GCHandle.Alloc().
  • Or Memory<T>.Pin().
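As a sketch of the zero-copy pinning option for a blittable element type (the import stub is a stand-in):

```csharp
static class PinnedLowering
{
    // Only valid when the C# element layout already matches the canonical ABI,
    // so no per-element conversion is needed.
    public static unsafe void LowerAndCall(int[] canonicalList)
    {
        fixed (int* address = canonicalList) // pinned only for the duration of the call
        {
            CallImport((nint)address, canonicalList.Length);
        }
        // The array is free to move again once the fixed block exits.
    }

    static void CallImport(nint address, int count) { /* stand-in for the WASI import */ }
}
```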

@AaronRobinsonMSFT and his team have a lot of experience doing this.

They also understand the backward-compatibility consequences, although this looks like an implementation detail right now.

Also note that cleanups.Add(()=> NativeMemory.AlignedFree({address})); will allocate a closure.

@pavelsavara:

Maybe this is premature optimization until we get to WASI p3 or 1.0.

@AaronRobinsonMSFT:

@pavelsavara Thanks for including me on this.

I looked into this; my concern with using stackalloc here is that the list can be any length, and the buffer is sized…

That is correct. We address this by having the marshaller accept a caller-provided buffer (typically stackalloc). If the required size exceeds the buffer (BufferSize), we fall back to an allocation. This pattern is most common in our string marshallers; see Utf8StringMarshaller as an example.
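The general shape of that pattern, loosely modelled on Utf8StringMarshaller.ManagedToUnmanagedIn; the type and member names here are illustrative, not an existing API:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Caller-allocated-buffer pattern: the caller passes a (typically stackalloc'd)
// buffer; the marshaller only heap-allocates when the payload doesn't fit.
unsafe ref struct ListLowering
{
    private byte* _ptr;
    private bool _heapAllocated;

    // Callers are expected to do: Span<byte> buf = stackalloc byte[ListLowering.BufferSize];
    public static int BufferSize => 256;

    public void FromManaged(ReadOnlySpan<int> list, Span<byte> buffer)
    {
        int required = list.Length * sizeof(int);
        if (required > buffer.Length)
        {
            // Too big for the caller's stack buffer: fall back to native memory.
            _ptr = (byte*)NativeMemory.Alloc((nuint)required);
            _heapAllocated = true;
        }
        else
        {
            // Assumes the caller's buffer is stack memory (or otherwise pinned).
            _ptr = (byte*)Unsafe.AsPointer(ref MemoryMarshal.GetReference(buffer));
        }
        list.CopyTo(new Span<int>(_ptr, list.Length));
    }

    public byte* ToUnmanaged() => _ptr;

    public void Free()
    {
        if (_heapAllocated)
        {
            NativeMemory.Free(_ptr);
        }
    }
}
```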

Reference our design docs for other considerations.

/cc @jkoritzinsky

If so, we could conditionally generate stackalloc when the size is under some threshold, say 1 KB.

Yep, see the interop team's pattern in the response above. You will need to experiment with the size. It is too easy to exceed the stack and cause all sorts of issues and performance cliffs when performing interop.

That would only work if the element type is blittable.

At the interop boundary this should be a general requirement, but I'm not sure if it is practical in the WASI world.

There is the fixed keyword for that.

+1

Or GCHandle.Alloc()

I'm not sure how this would solve the interop issue: if pinning is needed, how would unmanaged code make use of the GCHandle contents?

Also note that cleanups.Add(()=> NativeMemory.AlignedFree({address})); will allocate a closure.

Agree. I would consider handling this in a stack-allocated struct that implements IDisposable or a ref struct with a Dispose() method. Note that Roslyn allows a ref struct with a Dispose(), not implementing IDisposable, to be used in a using statement.
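A minimal sketch of that approach, with illustrative names (this is not what the generator currently emits):

```csharp
using System.Runtime.InteropServices;

// A ref struct with a Dispose() method (no IDisposable) is accepted by Roslyn in a
// `using` statement, so cleanup needs no delegate/closure allocation.
unsafe ref struct NativeAllocation
{
    private void* _ptr;

    public NativeAllocation(nuint size, nuint alignment)
        => _ptr = NativeMemory.AlignedAlloc(size, alignment);

    public void* Pointer => _ptr;

    public void Dispose()
    {
        if (_ptr != null)
        {
            NativeMemory.AlignedFree(_ptr);
            _ptr = null;
        }
    }
}

// Possible use in generated code:
//   using var allocation = new NativeAllocation(bufferSize, align);
//   ... lower elements into allocation.Pointer and call the import ...
//   // AlignedFree runs when the scope ends, with no closure involved.
```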

@jsturtevant (Collaborator):

Can we calculate the size of the buffer during code gen? If so, we could conditionally generate stackalloc when the size is under some threshold, say 1 KB.

And if the payload size is dynamic, we can also generate the condition and both branches in the C# code.

This particular payload size is dynamic, so we should use a pattern like Utf8StringMarshaller's.

That would only work if the element type is blittable.

At the interop boundary this should be a general requirement, but I'm not sure if it is practical in the WASI world.

There is the fixed keyword for that.

Awesome. We're using a variety of all these patterns; see #1138 for the latest.

Or GCHandle.Alloc()

I'm not sure how this would solve the interop issue: if pinning is needed, how would unmanaged code make use of the GCHandle contents?

You're right, this won't work for this particular scenario, since the memory needs to be laid out according to the ABI; it does work with the primitive/canonical types.

Also note that cleanups.Add(()=> NativeMemory.AlignedFree({address})); will allocate a closure.

Agree. I would consider handling this in a stack-allocated struct that implements IDisposable or a ref struct with a Dispose() method. Note that Roslyn allows a ref struct with a Dispose(), not implementing IDisposable, to be used in a using statement.

I'll open an issue to clean this up.

@AaronRobinsonMSFT:

Awesome. We're using a variety of all these patterns; see #1138 for the latest.

From that PR:

// Despite the name GCHandle.Alloc here this does not actually allocate memory on the heap.

This is slightly misleading. Yes, it isn't re-allocating the object, but it is allocating a GCHandle, which comes from a special resource pool and can create a similar, but different, kind of GC pressure.

You're right, this won't work for this particular scenario, since the memory needs to be laid out according to the ABI; it does work with the primitive/canonical types.

I'd be curious to see why this is needed. Why isn't fixed enough? Does the object need to be pinned for longer than the associated call?

@jsturtevant (Collaborator):

This is slightly misleading. Yes, it isn't re-allocating the object, but it is allocating a GCHandle, which comes from a special resource pool and can create a similar, but different, kind of GC pressure.

👍 Makes sense, I'll adjust the comment to be more specific.

I'd be curious to see why this is needed. Why isn't fixed enough? Does the object need to be pinned for longer than the associated call?

For instance, if you have a WIT function whose list parameters contain variants, e.g. list-of-variants: func(a: list<bool>, b: list<result>, c: list<my-errno>), we need to convert the C# equivalent of a result (a ResultType) into the Wasm ABI representation in linear memory. Passing a pointer to where the list lives in C# would result in an error, since the ABI doesn't know how it is laid out in C# memory. Pinning works well for the blittable types since they have the same layout (see the sketch below).
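To make the contrast concrete, here is a hedged sketch; WitResult and the single-byte discriminant layout are stand-ins for the real generated types, not the actual ABI encoding:

```csharp
using System;

static class LoweringContrast
{
    // Stand-in for the C# projection of a payload-less WIT `result`.
    public readonly record struct WitResult(bool IsErr);

    // Canonical/blittable elements: the C# layout already matches linear memory,
    // so the array can simply be pinned and its address handed to the import.
    public static unsafe void LowerCanonical(uint[] list, Action<nint, int> callImport)
    {
        fixed (uint* address = list)
        {
            callImport((nint)address, list.Length);
        }
    }

    // Variant elements: the C# object layout does not match the canonical ABI,
    // so each element is lowered individually into the buffer.
    public static void LowerVariants(WitResult[] list, Span<byte> buffer)
    {
        for (int i = 0; i < list.Length; ++i)
        {
            buffer[i] = list[i].IsErr ? (byte)1 : (byte)0;
        }
    }
}
```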

@jsturtevant (Collaborator):

I may have misinterpreted the last message. Were you asking why we use GCHandle.Alloc vs fixed in the case of canonical lists? Is there an advantage to using fixed over GCHandle.Alloc?
