
onnxruntime-python on AWS #23291

Open
yaniv5678 opened this issue Jan 8, 2025 · 2 comments

Comments

yaniv5678 commented Jan 8, 2025

Describe the issue

Hi,
When I run pip install onnxruntime or pip install onnxruntime-gpu on EC2 (Amazon Linux based), the package index only serves older versions, not 1.20.1.
Is it possible to support 1.20.1 (the CPU-based onnxruntime package as well as the CUDA-based onnxruntime-gpu package) on Amazon Linux based machines?
Thanks!

To reproduce

pip install onnxruntime
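Not stated in the report, but a likely cause worth checking: recent onnxruntime wheels are tagged for newer manylinux profiles (e.g. manylinux_2_28, which needs glibc >= 2.28), and pip silently falls back to older releases when the host glibc or pip itself is too old — Amazon Linux 2, for example, ships glibc 2.26. A quick diagnostic sketch, assuming python3 and pip are on PATH:

```shell
# Show what this pip/glibc combination will accept; wheels tagged for a newer
# manylinux profile than the host supports are skipped without any warning.
python3 -m pip --version                 # an outdated pip also rejects newer wheel tags
ldd --version | head -n 1                # host glibc version
python3 -m pip debug --verbose 2>/dev/null | head -n 15   # platform tags pip accepts
```

If the accepted tags stop below the wheel's manylinux version, pip resolves to the newest release that still ships a compatible wheel, which matches the "older versions only" behavior described above.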

Urgency

No response

Platform

Linux

OS Version

Amazon Linux

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.20.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

No response

@yaniv5678 yaniv5678 changed the title onnxruntime-python on AWSS onnxruntime-python on AWS Jan 8, 2025
snnn (Member) commented Jan 8, 2025

What is your Amazon Linux version? We recommend using the latest one.

@m0hammadjaan

@yaniv5678 @snnn
I have an EC2 instance of type g5g.xlarge. I have installed the following:

CUDA Toolkit: Cuda compilation tools, release 12.4, V12.4.131
cuDNN version: 9.6.0
Python: 3.12
PyTorch: compiled from source, since v2.5 is not available for aarch64
ONNX Runtime: compiled from source, since the distribution package is not available for this architecture
Architecture: aarch64
OS: Amazon Linux 2023

To make it work, I had to build it from source. The steps are:

1. Clone the Repo:

git clone https://github.com/microsoft/onnxruntime.git --recursive
cd onnxruntime
git checkout v1.20.1

2. Update Submodule:

git submodule update --init --recursive

3. Build onnxruntime-gpu:

./build.sh --config Release \
           --build_shared_lib \
           --use_cuda \
           --cuda_home <path/to/cuda> \
           --cudnn_home <path/to/cudnn> \
           --enable_pybind \
           --build_wheel \
           --update --build --parallel <NUM_CORES>

4. Install the wheel using pip:

sudo pip3 install build/Linux/Release/dist/*.whl

The wheel installed fine, but when I run my PyTorch model converted to ONNX, I get the following error:

EP Error: [ONNXRuntimeError] : 11 : EP_FAIL : Non-zero status code returned while running Conv node. Name:'/features/features.0/Conv' Status Message: Failed to initialize CUDNN Frontend/home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:99 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:91 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDNN_FE failure 11: CUDNN_BACKEND_API_FAILED ; GPU=0 ; hostname=sg-gpu-1 ; file=/home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/nn/conv.cc ; line=224 ; expr=s_.cudnn_fe_graph->build_operation_graph(handle); 


with the cudnn frontend json:
{"context":{"compute_data_type":"FLOAT","intermediate_data_type":"FLOAT","io_data_type":"FLOAT","name":"","sm_count":-1},"cudnn_backend_version":"9.6.0","cudnn_frontend_version":10700,"json_version":"1.0","nodes":[{"compute_data_type":"FLOAT","dilation":[1,1],"inputs":{"W":"w","X":"x"},"math_mode":"CROSS_CORRELATION","name":"","outputs":{"Y":"::Y"},"post_padding":[2,2],"pre_padding":[2,2],"stride":[4,4],"tag":"CONV_FPROP"}],"tensors":{"::Y":{"data_type":"FLOAT","dim":[1,64,55,55],"is_pass_by_value":false,"is_virtual":false,"name":"::Y","pass_by_value":null,"reordering_type":"NONE","stride":[193600,3025,55,1],"uid":0,"uid_assigned":false},"w":{"data_type":"FLOAT","dim":[64,3,11,11],"is_pass_by_value":false,"is_virtual":false,"name":"w","pass_by_value":null,"reordering_type":"NONE","stride":[363,121,11,1],"uid":1,"uid_assigned":true},"x":{"data_type":"FLOAT","dim":[1,3,224,224],"is_pass_by_value":false,"is_virtual":false,"name":"x","pass_by_value":null,"reordering_type":"NONE","stride":[150528,50176,224,1],"uid":0,"uid_assigned":false}}} using ['CUDAExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CPUExecutionProvider'] and retrying.
2025-01-08 12:06:10.797719929 [E:onnxruntime:Default, cudnn_fe_call.cc:33 CudaErrString<cudnn_frontend::error_object>] CUDNN_BACKEND_TENSOR_DESCRIPTOR cudnnFinalize failed cudnn_status: CUDNN_STATUS_SUBLIBRARY_LOADING_FAILED
2025-01-08 12:06:10.797924540 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/features/features.0/Conv' Status Message: Failed to initialize CUDNN Frontend/home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:99 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/cudnn_fe_call.cc:91 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, SUCCTYPE, const char*, const char*, int) [with ERRTYPE = cudnn_frontend::error_object; bool THRW = true; SUCCTYPE = cudnn_frontend::error_code_t; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDNN_FE failure 11: CUDNN_BACKEND_API_FAILED ; GPU=0 ; hostname=sg-gpu-1 ; file=/home/ec2-user/onnxruntime/onnxruntime/core/providers/cuda/nn/conv.cc ; line=224 ; expr=s_.cudnn_fe_graph->build_operation_graph(handle); 

with the cudnn frontend json:
{"context":{"compute_data_type":"FLOAT","intermediate_data_type":"FLOAT","io_data_type":"FLOAT","name":"","sm_count":-1},"cudnn_backend_version":"9.6.0","cudnn_frontend_version":10700,"json_version":"1.0","nodes":[{"compute_data_type":"FLOAT","dilation":[1,1],"inputs":{"W":"w","X":"x"},"math_mode":"CROSS_CORRELATION","name":"","outputs":{"Y":"::Y"},"post_padding":[2,2],"pre_padding":[2,2],"stride":[4,4],"tag":"CONV_FPROP"}],"tensors":{"::Y":{"data_type":"FLOAT","dim":[1,64,55,55],"is_pass_by_value":false,"is_virtual":false,"name":"::Y","pass_by_value":null,"reordering_type":"NONE","stride":[193600,3025,55,1],"uid":0,"uid_assigned":false},"w":{"data_type":"FLOAT","dim":[64,3,11,11],"is_pass_by_value":false,"is_virtual":false,"name":"w","pass_by_value":null,"reordering_type":"NONE","stride":[363,121,11,1],"uid":1,"uid_assigned":true},"x":{"data_type":"FLOAT","dim":[1,3,224,224],"is_pass_by_value":false,"is_virtual":false,"name":"x","pass_by_value":null,"reordering_type":"NONE","stride":[150528,50176,224,1],"uid":0,"uid_assigned":false}}}

The error above comes from the following code:

def to_numpy(tensor):
    # torch.Tensor has no .gpu() method; .cpu() copies to host memory in both branches
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(input_batch)}
ort_outs = ort_session.run(None, ort_inputs)

Prints confirming that everything is installed correctly:

print("Pytorch CUDA:", torch.cuda.is_available())
print("Available Providers:", onnxruntime.get_available_providers())
print("Active Providers for this session:", ort_session.get_providers())

Output:

Pytorch CUDA: True
Available Providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
Active Providers for this session: ['CUDAExecutionProvider', 'CPUExecutionProvider']

Please find the detailed issue at: #23301
