
[TensorRT] Support Multiple EP Context #23294

Open · wants to merge 12 commits into base: main
Conversation

jingyanwangms
Contributor

Description

  • Use CreateEpContextModel from graph_partitioner.cc to save the model with EP context nodes. Multiple EP context nodes in a model are now supported.
  • Updated the merging of EP-context-related options from the session options and the TensorRT options.
  • Updated and added unit tests.

Supported scenarios:

  • Save/run a static single EP context node using the engine cache
  • Save/run a static single EP context node with embedded EP context info
  • Save/run static multiple EP context nodes using the engine cache
  • Save/run static multiple EP context nodes with embedded EP context info
  • Save/run dynamic multiple EP context nodes using the engine cache
  • Save/run dynamic multiple EP context nodes with embedded EP context info

Unsupported scenarios:

  • Subsequent runs with a dynamic EP context node where a dynamic input dimension has changed.
    Supporting this would require a call from the execution provider up to CreateEpContextModel in graph_partitioner.cc at run time, which would require significant changes to the existing infrastructure.

Motivation and Context

@jywu-msft jywu-msft requested a review from chilo-ms January 10, 2025 17:06
@chilo-ms
Contributor

chilo-ms commented Jan 14, 2025

You should modify tensorrt_execution_provider.cc, lines 3853 to 3856:

      // dump ep context model
      if (dump_ep_context_model_ && ep_context_embed_mode_) {
        UpdateCtxNodeModelEngineContext(model_proto_.get(), reinterpret_cast<char*>(serialized_engine->data()), serialized_engine->size());
        DumpCtxModel(model_proto_.get(), ctx_model_path_);
      }

The code above handles the case where the graph has dynamic-shape input(s) and the engine is updated during inference.
The old TRT EP behavior updates the engine binary embedded in the EP Context node and dumps the EP Context model to disk.
In this PR, to support EP Context models for partitioning, it is the graph partitioner that dumps the model to disk, but we still need to think about how to handle this special case for the TRT EP. Otherwise, the new TRT EP might not work for existing apps that rely on dynamic-shape input with ep_context_embed_mode set to 1.

@jingyanwangms
Contributor Author

if (dump_ep_context_model_ && ep_context_embed_mode_) {

I added a warning prompting the user to regenerate the EP context model. Handling this case would require changes to the overall EP context design. We have confirmed this is a lower-priority use case.
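The mitigation described above (warn instead of re-dumping the model when a dynamic-shape engine is rebuilt) can be sketched as follows. This is a minimal illustration only, not the PR's actual code: `TrtEpState` and `WarnOnDynamicEngineUpdate` are hypothetical names standing in for the corresponding TRT EP members and logging call.

```cpp
#include <string>

// Hypothetical stand-in for the relevant TRT EP member flags.
struct TrtEpState {
  bool dump_ep_context_model;  // corresponds to dump_ep_context_model_
  bool ep_context_embed_mode;  // corresponds to ep_context_embed_mode_
};

// Returns the warning text to log when the engine was rebuilt at run time
// for new dynamic input shapes, but the EP can no longer re-dump the EP
// Context model itself (the graph partitioner now owns model dumping).
// Returns an empty string when no warning applies.
std::string WarnOnDynamicEngineUpdate(const TrtEpState& state) {
  if (state.dump_ep_context_model && state.ep_context_embed_mode) {
    return "EP Context model was not updated after the engine was rebuilt "
           "for new dynamic input shapes; please regenerate the EP context "
           "model with the new input shapes.";
  }
  return "";
}
```

The old behavior (UpdateCtxNodeModelEngineContext + DumpCtxModel) is replaced by this warning only in the dynamic-shape rebuild path; the initial model dump still happens via the graph partitioner.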
