Cooperative Groups Impl #80
Comments
A basic version of cooperative groups could probably be done. The difficulty is that it's a C++ API, which means digging into the C++ code for it in the SDK files, which will probably be very painful.
Currently experimenting with creating a C++ bridge via: https://github.com/dtolnay/cxx. Bindgen itself does support C++ bindings, but there are a fair number of known limitations. Update: I'll try using bindgen first, and just enable the C++ features (-std=c++11). If I run into any serious difficulties there, I'll cut over to CXX.
@RDambrosio016 I'm having trouble determining how you were originally generating the cust_raw bindings via bindgen. The setup that is currently in master is a bit non-intuitive, and I don't see any docs for this. I see the I will probably expand things in the script so that:
As it is right now, the script is not portable and does not work on its own.
Quick update. My approach was off a bit initially. I did indeed need to update the bindgen.sh script for cust_raw, as it was not working correctly. I've updated it to work in a fairly nice way now with minimal updates. However, the cooperative groups API is pretty much all kernel side, so as long as the Next, looks like the real task is to update the cuda_std crate with some gpu_only code which will link to the correct symbols from the cooperative_groups API. Experimenting with that now.
@RDambrosio016 ok, another update here. Neither bindgen nor CXX seems suited to exposing the cooperative_groups internals to our cuda_std code.
I'm wondering if it would be reasonable to define an extern "C" wrapper around the needed cooperative_groups classes and functions, then we compile that down to PTX, ship it with cuda_std, and then have the cuda_builder just link our wrapper PTX with whatever PTX is generated for users. We would then simply update cuda_std to declare the extern bits and wrap them as needed.
I'm just not sure what other options we have. Are you familiar with any good ways to expose the following code to Rust (simplified C++ from cooperative_groups.h):
class grid_group : public thread_group_base<details::grid_group_id>
{
_CG_STATIC_CONST_DECL unsigned int _group_id = details::grid_group_id;
friend _CG_QUALIFIER grid_group this_grid();
private:
// .. snip ..
public:
_CG_QUALIFIER void sync() const {
if (!is_valid()) {
_CG_ABORT();
}
details::grid::sync(&_data.grid.gridWs->barrier);
}
// .. snip ..
};
_CG_QUALIFIER grid_group this_grid() {
grid_group gg(details::get_grid_workspace());
return gg;
}
The things that I need most right now are Thoughts?
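To make the extern "C" wrapper idea above a bit more concrete, here is a minimal sketch of what the Rust side might look like. Everything named here is an assumption for illustration: the cg_shim_* symbols, the GridGroup type, and the this_grid function are hypothetical and are not part of cuda_std or the CUDA SDK. In the proposed design the cg_shim_* functions would be extern "C" device functions written in C++ around cooperative_groups and shipped as PTX. The host_stand_in module below exists only so the sketch links and runs on a CPU.

```rust
// Requires Rust 1.82+ for `unsafe extern` and `#[unsafe(no_mangle)]`.

// Hypothetical C ABI of a small C++ shim around cooperative_groups.
// The `cg_shim_*` names are invented for this sketch.
mod ffi {
    unsafe extern "C" {
        pub fn cg_shim_grid_is_valid() -> bool;
        pub fn cg_shim_grid_size() -> u64;
        pub fn cg_shim_grid_sync();
    }
}

// Host-only stand-ins for the shim. On a GPU these symbols would instead
// come from the wrapper PTX linked in at build time (an assumption).
mod host_stand_in {
    use std::sync::atomic::{AtomicUsize, Ordering};

    // Counts calls to the fake grid-wide barrier.
    pub static SYNC_CALLS: AtomicUsize = AtomicUsize::new(0);

    #[unsafe(no_mangle)]
    pub extern "C" fn cg_shim_grid_is_valid() -> bool {
        true
    }

    #[unsafe(no_mangle)]
    pub extern "C" fn cg_shim_grid_size() -> u64 {
        1 // pretend the grid holds a single thread
    }

    #[unsafe(no_mangle)]
    pub extern "C" fn cg_shim_grid_sync() {
        SYNC_CALLS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical handle mirroring the role of `cooperative_groups::grid_group`.
#[derive(Clone, Copy)]
pub struct GridGroup {
    _private: (),
}

/// Hypothetical counterpart of `this_grid()`.
pub fn this_grid() -> GridGroup {
    GridGroup { _private: () }
}

impl GridGroup {
    /// Asks the shim whether the grid group is valid.
    pub fn is_valid(&self) -> bool {
        unsafe { ffi::cg_shim_grid_is_valid() }
    }

    /// Number of threads in the grid, as reported by the shim.
    pub fn size(&self) -> u64 {
        unsafe { ffi::cg_shim_grid_size() }
    }

    /// Grid-wide barrier, forwarded to the shim.
    pub fn sync(&self) {
        unsafe { ffi::cg_shim_grid_sync() }
    }
}
```

Kernel code would then call this_grid().sync() without touching the C++ types directly. Whether the wrapper PTX can actually be linked this way is exactly the open question in this comment, so treat this as a shape for the API rather than a working implementation.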
Otherwise, the best approach might be to use cbindgen to expose Rust bits to C++ kernels, compile those down to PTX, and then just launch them from the Rust code.
Ok, after lots of experimentation and dead ends, I've got a working solution here: #87. More to be done, but this proves that there is a viable path forward. Now I just need to make it pretty.
I believe I am at a point where I need the cooperative groups API. Rather than rewriting my kernel code in C++, or using CXX to bridge the Rust code into C++, I would prefer to implement the Cooperative Groups API (at least some portion of it).
I've read the documentation on it a few times now. Not sure if others have already looked into this. Just wanted to touch base if folks have concerns or pointers as I dig into implementation.