Add CUDA build support and some code refinements #581
Conversation
operators/math/cuda/negpos.cu
@souptc, @RandySheriffH, it looks like we need at least four files to write even the simplest CUDA kernel due to nvcc limitations. Any good ideas for this?
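For context, a minimal sketch of the split being discussed: the `__global__` kernel has to live in a `.cu` translation unit compiled by nvcc, while host-compiled code only sees a plain C++ launcher declaration. Everything here except `negpos.cu` itself (the header name and the neg/pos math) is an illustrative assumption, not code from this PR:

```cpp
// negpos_def.h (hypothetical name) -- the only thing host-compiled .cc
// files see: a plain C++ launcher declaration, no CUDA syntax.
#pragma once
#include <cuda_runtime.h>

void neg_pos_impl(cudaStream_t stream, const float* input,
                  float* pos_out, float* neg_out, int size);

// negpos.cu -- compiled by nvcc only: the __global__ kernel plus the
// thin launcher defined against the declaration above.
__global__ void neg_pos_kernel(const float* input, float* pos_out,
                               float* neg_out, int size) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < size) {
    pos_out[i] = input[i] > 0.f ? input[i] : 0.f;
    neg_out[i] = input[i] < 0.f ? input[i] : 0.f;
  }
}

void neg_pos_impl(cudaStream_t stream, const float* input,
                  float* pos_out, float* neg_out, int size) {
  constexpr int kThreads = 256;
  const int blocks = (size + kThreads - 1) / kThreads;
  neg_pos_kernel<<<blocks, kThreads, 0, stream>>>(input, pos_out,
                                                  neg_out, size);
}
```

Together with the custom-op registration in a separate `.cc` and the build glue, that already accounts for the four files.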
Yes, let's keep it this way to bypass the NVCC incompatibilities for now.
Hi @wenbingl, is it possible to support a general CUDA op? Such an op would only be responsible for converting tensors to DLPack and collecting the output tensors.
The current goal is to implement a kernel just like we did in onnxruntime. Are you suggesting a way to support CUDA custom ops in Python, which hasn't been implemented yet?
Yes. It would be quite useful to have a mechanism that supports user-customized CUDA kernels without touching Python inside the graph, as ort-training did with its python-op.
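For illustration, a hedged sketch of that mechanism: the op itself does no math, it just hands DLPack views of its inputs to a user-registered kernel and collects the outputs. All names here are hypothetical, not an existing API:

```cpp
#include <dlpack/dlpack.h>
#include <functional>
#include <vector>

// User-supplied CUDA kernel wrapper: consumes DLPack inputs and
// produces DLPack outputs; registered once at session setup.
using CudaKernelFn =
    std::function<void(const std::vector<DLManagedTensor*>& inputs,
                       std::vector<DLManagedTensor*>& outputs)>;

// Hypothetical "general CUDA op": only marshals tensors to and from
// DLPack, never touching Python inside the graph.
struct GenericCudaOp {
  CudaKernelFn kernel;

  void Compute(const std::vector<DLManagedTensor*>& inputs,
               std::vector<DLManagedTensor*>& outputs) const {
    kernel(inputs, outputs);  // all actual math lives in the user kernel
  }
};
```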
It's a good idea, and it would be another 'extension' to the current custom op mechanism. Do you have a use case, or good torch-extension/TVM ops that could be integrated?
Yes: PagedAttention, FlashAttention, and memory_efficient_attn. Notably, all of these kernels can communicate with ORT via DLPack tensors.
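For context, a sketch of what such a DLPack hand-off could look like on the C++ side, wrapping a raw CUDA buffer so an external kernel can consume it without copies (assumes `dlpack.h` from dmlc/dlpack; the function itself is hypothetical):

```cpp
#include <dlpack/dlpack.h>
#include <cstdint>

// Wrap an existing float32 CUDA buffer as a DLPack tensor. The caller
// keeps dev_ptr and shape alive; no ownership is transferred here.
DLManagedTensor* WrapCudaBuffer(float* dev_ptr, int64_t* shape,
                                int ndim, int device_id) {
  auto* mt = new DLManagedTensor{};
  mt->dl_tensor.data = dev_ptr;
  mt->dl_tensor.device = {kDLCUDA, device_id};
  mt->dl_tensor.ndim = ndim;
  mt->dl_tensor.dtype = {kDLFloat, 32, 1};  // float32
  mt->dl_tensor.shape = shape;
  mt->dl_tensor.strides = nullptr;          // nullptr => compact row-major
  mt->dl_tensor.byte_offset = 0;
  mt->manager_ctx = nullptr;
  mt->deleter = [](DLManagedTensor* self) { delete self; };
  return mt;
}
```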