🐛 [Bug] Encountered bug when using CudaGraph in Torch-TensorRT #3349
Comments
@keehyuna My trt version is 2.5.0. I applied https://github.com/pytorch/TensorRT/pull/3289/files to solve the dynamic shape error and https://github.com/pytorch/TensorRT/pull/3310/files to solve the mutex issue, but when I use cudagraph it still raises an error.
You can reproduce it with this sample.
Thanks @yjjinjie, I could reproduce the issue with your sample. With https://github.com/pytorch/TensorRT/pull/3310/files applied, we expect the input/output name logging to be atomic.
@keehyuna OK, that model is old, but I exported a new model using the new trt (with PR 3310) and it has the same problem. Because the problem occurs at runtime, the trt version used to script the model is not related.
@keehyuna Does your trt version contain PR 3310 or not? The problem is in the runtime: whether the model is old or newly exported with PR 3310, the logging is always atomic, so it is not related to the scripted model. If your trt does not contain PR 3310, the thread result will not equal the process result.
@keehyuna I found that ResNet has no problem either. If you change the thread count from 10 to 8 in test_model_trt_cudagraph, my model also has no problem. I think it is related to multi-threading, the speed, or the mutex in cudagraph?
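One way to probe the mutex hypothesis is to serialize every inference call behind a single lock and check whether the failure still appears at 10 threads. The following is only a minimal sketch under stated assumptions: `fake_predict` is a hypothetical stand-in for the real embedding + trt module call, and `SerializedPredictor` and `run_threaded` are illustrative helpers, not part of Torch-TensorRT.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SerializedPredictor:
    """Wraps a predict callable so that only one thread runs it at a time."""

    def __init__(self, predict):
        self._predict = predict
        self._lock = threading.Lock()

    def __call__(self, x):
        # Hold the lock for the whole call, so calls never overlap.
        with self._lock:
            return self._predict(x)


def fake_predict(x):
    # Hypothetical stand-in for the real model call (assumption).
    return [v * 2 for v in x]


def run_threaded(predict, inputs, num_threads=10):
    # pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(predict, inputs))


if __name__ == "__main__":
    inputs = [[i, i + 1] for i in range(50)]
    outputs = run_threaded(SerializedPredictor(fake_predict), inputs)
    print(outputs == [fake_predict(x) for x in inputs])
```

If the error goes away once calls are serialized, that would point toward a race between threads rather than toward the exported model itself; it would not by itself show where the race is.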
You are right, I was testing with the wrong trt version. Checking on it.
Hi @yjjinjie
@keehyuna Yes. My model is embedding + trt (dense). How can I give you the model source? Are you Chinese? We can discuss by voice.
@keehyuna If trt doesn't support some op, the model will always be a mix of torch ops and a trt module. Will it then always have this issue?
@keehyuna My code is too large to share here; you can use the code and container from https://github.com/alibaba/TorchEasyRec
Thanks @yjjinjie. Your model is a wrapper module that contains a PyTorch + trt model.
Yes, we have a recent change to support cudagraph with graph breaks: https://pytorch.org/TensorRT/tutorials/_rendered_examples/dynamo/torch_export_cudagraphs.html
Bug Description
When I use cudagraph via torch_tensorrt.runtime.set_cudagraphs_mode(True), the program occasionally hits an error.
To Reproduce
Steps to reproduce the behavior:
My code is too large to share; it uses multiple threads to run model prediction.
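Since the full code is not included, the shape of such a repro can be sketched as a small harness that computes a single-threaded baseline and then compares every threaded result against it. This is an illustrative sketch only: `find_mismatches` and its parameters are hypothetical names, and `predict` stands for whatever callable runs the model. For tensor outputs, an `equal` function such as `torch.equal` would be passed instead of the default `==`.

```python
import operator
from concurrent.futures import ThreadPoolExecutor


def find_mismatches(predict, inputs, num_threads=10, rounds=20, equal=operator.eq):
    """Return the input indices whose threaded output differs from the serial baseline."""
    # Single-threaded reference results, one per input.
    baseline = [predict(x) for x in inputs]

    def worker(idx):
        return idx, predict(inputs[idx])

    # Each input is submitted `rounds` times to increase thread contention.
    jobs = [i for _ in range(rounds) for i in range(len(inputs))]
    mismatches = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for idx, out in pool.map(worker, jobs):
            if not equal(out, baseline[idx]):
                mismatches.append(idx)
    return mismatches


if __name__ == "__main__":
    # Hypothetical stand-in model: a pure function, so no mismatches are expected.
    def stand_in(x):
        return x * 2

    print(find_mismatches(stand_in, list(range(16))))
```

Running the same harness with `num_threads=8` and `num_threads=10` would mirror the comparison discussed in the comments.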
Expected behavior
Environment