Update Linux docker images #23244
@tianleiwu, there is a strange error from the cuDNN frontend, caused by upgrading cuDNN from 9.5 to 9.6. Could you please help me take a look?
Tried upgrading both cudnn-frontend and cuDNN, and submitted a test build: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1579205&view=results In the worst case, we may add a fallback that calls the cuDNN backend directly, as before, for cases that the cuDNN frontend cannot handle.
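To make the fallback idea concrete, here is a minimal sketch of the control flow, not the actual ONNX Runtime code. `Status`, `ConvPath`, `PlanResult`, and `BuildConvPlanWithFallback` are hypothetical names introduced only for illustration, and the two callables stand in for whatever builds a plan through the cuDNN frontend and through the legacy backend path.

```cpp
#include <functional>
#include <string>

// Hypothetical status type for this sketch; this is NOT onnxruntime::common::Status.
struct Status {
  bool ok;
  std::string message;
};

// Records which path produced the final result.
enum class ConvPath { kFrontend, kLegacyBackend };

struct PlanResult {
  Status status;
  ConvPath path;
};

// Try the frontend builder first. If it reports failure (for example
// HEURISTIC_QUERY_FAILED), try the legacy backend builder instead.
// Both builders are assumptions: callers supply them as callables.
PlanResult BuildConvPlanWithFallback(
    const std::function<Status()>& build_with_frontend,
    const std::function<Status()>& build_with_legacy_backend) {
  Status fe = build_with_frontend();
  if (fe.ok) {
    return {fe, ConvPath::kFrontend};
  }

  Status legacy = build_with_legacy_backend();
  if (!legacy.ok) {
    // Keep both messages so the original frontend failure is not lost.
    legacy.message = "frontend: " + fe.message + "; legacy: " + legacy.message;
  }
  return {legacy, ConvPath::kLegacyBackend};
}
```

With this shape, the frontend error only surfaces to the user when the legacy path also fails, and the returned `path` makes it possible to log which path was taken.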
The error was:
[E:onnxruntime:yolov3, sequential_executor.cc:505 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'conv2d_2_0' Status Message: Failed to initialize CUDNN Frontend
/onnxruntime_src/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:99 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, common::Status> = void]
/onnxruntime_src/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:91 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, common::Status> = void]
CUDNN_FE failure 8: HEURISTIC_QUERY_FAILED ; GPU=0 ; hostname=98d137446008 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/nn/conv.cc ; line=225 ; expr=s_.cudnn_fe_graph->create_execution_plans({heur_mode});
@gedoensmax, @JTischbein, is it a known issue that cuDNN 9.6 has a regression in convolution support for YOLOv3? Here is the cuDNN 9.6 debug log:
I downgraded the CUDA 12 image's cuDNN version back to 9.5, and then the test passed.
This means we cannot use the same cuDNN version for both CUDA 11 and CUDA 12, but that's OK.
The new images contain the following updates:
This PR also updates some of the CPU EP's source code to make it compatible with GCC 14.