
[TensorRT] Support Multiple EP Context #23294

Open · wants to merge 12 commits into base: main
Conversation

jingyanwangms
Contributor

Description

  • Use CreateEpContextModel from graph_partitioner.cc to save the model with EP context nodes. Multiple EP context nodes in a model are now supported.
  • Updated the merging of EP-context-related options from the session options and the TensorRT options.
  • Updated and added unit tests.

Supported scenarios:

  • Save/run a static single EP context node using the engine cache
  • Save/run a static single EP context node with embedded EP context info
  • Save/run static multiple EP context nodes using the engine cache
  • Save/run static multiple EP context nodes with embedded EP context info
  • Save/run dynamic multiple EP context nodes using the engine cache
  • Save/run dynamic multiple EP context nodes with embedded EP context info

Unsupported scenarios:

  • Subsequent runs with a dynamic EP context node where a dynamic input dimension has changed.
    Supporting this would require a call from the execution provider up to CreateEpContextModel in graph_partitioner.cc at run time, which would require significant changes to the existing infrastructure.

Motivation and Context

@jywu-msft jywu-msft requested a review from chilo-ms January 10, 2025 17:06
@chilo-ms
Contributor

chilo-ms commented Jan 14, 2025

You should modify tensorrt_execution_provider.cc, lines 3853 to 3856:

      // dump ep context model
      if (dump_ep_context_model_ && ep_context_embed_mode_) {
        UpdateCtxNodeModelEngineContext(model_proto_.get(), reinterpret_cast<char*>(serialized_engine->data()), serialized_engine->size());
        DumpCtxModel(model_proto_.get(), ctx_model_path_);
      }

The code above handles the case where the graph has dynamic-shape input(s) and the engine is updated during inference.
The old TRT EP behavior updates the engine binary embedded in the EP Context node and dumps the EP Context model to disk.
In this PR, to support EP Context models for partitioning, it is the graph partitioner that dumps the model to disk, but we still need to think about how to handle this special case for the TRT EP. Otherwise, the new TRT EP might not work for existing apps that rely on dynamic-shape input with ep_context_embed_mode set to 1.

@jingyanwangms
Contributor Author

if (dump_ep_context_model_ && ep_context_embed_mode_) {

I added a warning prompting the user to regenerate the EP context model. Handling this case would require changes to the overall EP context design. We have confirmed this is a lower-priority use case.
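The mitigation described above (warn instead of re-dumping the model when a dynamic-shape engine is rebuilt) can be sketched as follows. This is a minimal illustration only, not the PR's actual code: `TrtEpState` and `WarnOnDynamicEngineUpdate` are hypothetical names standing in for the corresponding TRT EP members and logging call.

```cpp
#include <string>

// Hypothetical stand-in for the relevant TRT EP member flags.
struct TrtEpState {
  bool dump_ep_context_model;  // corresponds to dump_ep_context_model_
  bool ep_context_embed_mode;  // corresponds to ep_context_embed_mode_
};

// Returns the warning text to log when the engine was rebuilt at run time
// for new dynamic input shapes, but the EP can no longer re-dump the EP
// Context model itself (the graph partitioner now owns model dumping).
// Returns an empty string when no warning applies.
std::string WarnOnDynamicEngineUpdate(const TrtEpState& state) {
  if (state.dump_ep_context_model && state.ep_context_embed_mode) {
    return "EP Context model was not updated after the engine was rebuilt "
           "for new dynamic input shapes; please regenerate the EP context "
           "model with the new input shapes.";
  }
  return "";
}
```

The old behavior (UpdateCtxNodeModelEngineContext + DumpCtxModel) is replaced by this warning only in the dynamic-shape rebuild path; the initial model dump still happens via the graph partitioner.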
