feat: Runtime output buffer optimization #3276
I think this PR is unrelated to fake tensors, since the output shape is inferred from the TRT function(
core/runtime/execute_engine.cpp
Outdated
@@ -263,19 +284,15 @@ std::vector<at::Tensor> execute_engine(std::vector<at::Tensor> inputs, c10::intr
      output_profiler_guard =
          std::make_unique<torch::autograd::profiler::RecordProfile>(compiled_engine->output_profile_path);
    }
    if ((false == compiled_engine->use_pre_allocated_outputs) || shape_changed) {
Use !compiled_engine->use_pre_allocated_outputs here instead?
Create a context manager to enable this across subgraphs
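A context manager along these lines could look roughly like the sketch below. It is only an illustration: `FakeSubgraph` and `enable_pre_allocated_outputs` are hypothetical names, not the Torch-TensorRT API, and the only thing taken from the PR is the `use_pre_allocated_outputs` flag name.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class FakeSubgraph:
    """Hypothetical stand-in for one TRT-accelerated subgraph."""

    name: str
    use_pre_allocated_outputs: bool = False


@contextmanager
def enable_pre_allocated_outputs(subgraphs):
    """Turn the flag on for every subgraph, then restore the old values."""
    previous = [sg.use_pre_allocated_outputs for sg in subgraphs]
    for sg in subgraphs:
        sg.use_pre_allocated_outputs = True
    try:
        yield subgraphs
    finally:
        # Restore even if the body raised, so the setting cannot leak out.
        for sg, old in zip(subgraphs, previous):
            sg.use_pre_allocated_outputs = old
```

Restoring the previous per-subgraph values, rather than forcing `False`, keeps nested or repeated use from clobbering a setting that was already on.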
core/runtime/TRTEngine.h
Outdated
struct RuntimeStates {
  bool need_cudagraphs_record;
  bool can_use_pre_allocated_outputs;
};
If the weight streaming budget is changed in CUDA graph mode, a new capture is required.
A weight streaming state will be added.
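One way to read this comment, together with the `RuntimeStates` struct above, is as a small state tracker that compares the current call's settings with the previous call's. The Python sketch below is an assumption-laden illustration, not the PR's C++ code: `RuntimeStateTracker`, its `update` method, and the `weight_streaming_budget` argument are hypothetical, and the exact conditions are a guess at the intent.

```python
from dataclasses import dataclass


@dataclass
class RuntimeStates:
    need_cudagraphs_record: bool
    can_use_pre_allocated_outputs: bool


class RuntimeStateTracker:
    """Hypothetical tracker that remembers the previous call's settings."""

    def __init__(self):
        self.prev_cudagraphs_enabled = False
        self.prev_pre_allocated_enabled = False
        self.prev_weight_streaming_budget = None

    def update(self, cudagraphs_enabled, pre_allocated_enabled,
               shape_changed, weight_streaming_budget=None):
        budget_changed = (
            weight_streaming_budget != self.prev_weight_streaming_budget
        )
        # Re-record when graphs were just switched on, or when something the
        # captured graph depends on (shape, streaming budget) has changed.
        need_record = cudagraphs_enabled and (
            not self.prev_cudagraphs_enabled or shape_changed or budget_changed
        )
        # A pre-allocated output exists only if the feature was already on
        # during the previous call, and it fits only if the shape is the same.
        can_reuse = (
            self.prev_pre_allocated_enabled
            and pre_allocated_enabled
            and not shape_changed
        )
        self.prev_cudagraphs_enabled = cudagraphs_enabled
        self.prev_pre_allocated_enabled = pre_allocated_enabled
        self.prev_weight_streaming_budget = weight_streaming_budget
        return RuntimeStates(need_record, can_reuse)
```

Keeping the comparison in one place means each new piece of state, such as the weight streaming budget, only adds one remembered value and one term in the condition.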
@@ -17,6 +17,9 @@
    "Torch-TensorRT runtime is not available",
)
class TestCudagraphsCPP(TestCase):
    def tearDown(self):
        # Reset to default cuda graph mode after each test
        torch_tensorrt.runtime.set_cudagraphs_mode(False)
If multiple tests are run by pytest, other tests can end up running in CUDA graph mode. Make sure to turn CUDA graph mode off after each test.
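The leak described here can be reproduced without a GPU. In the sketch below, `set_cudagraphs_mode` and `get_cudagraphs_mode` are local stand-ins for a process-wide flag (they are not the real `torch_tensorrt.runtime` functions), and the two test classes are hypothetical; the point is only that `tearDown` keeps a later test from inheriting the mode.

```python
import unittest

# Hypothetical process-wide flag standing in for the runtime's global mode.
_CUDAGRAPHS_MODE = False


def set_cudagraphs_mode(enabled):
    global _CUDAGRAPHS_MODE
    _CUDAGRAPHS_MODE = bool(enabled)


def get_cudagraphs_mode():
    return _CUDAGRAPHS_MODE


class TestWithGraphs(unittest.TestCase):
    def tearDown(self):
        # Reset the global mode so later tests start from the default.
        set_cudagraphs_mode(False)

    def test_enables_graphs(self):
        set_cudagraphs_mode(True)
        self.assertTrue(get_cudagraphs_mode())


class TestDefault(unittest.TestCase):
    def test_default_mode(self):
        # Would fail if TestWithGraphs left the flag switched on.
        self.assertFalse(get_cudagraphs_mode())


def run_suite():
    """Run both classes in one process, in the order that exposes a leak."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(TestWithGraphs))
    suite.addTests(loader.loadTestsFromTestCase(TestDefault))
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Removing the `tearDown` method makes `TestDefault` fail when it runs after `TestWithGraphs`, which is the situation the comment warns about.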
"Error while setting the tensor address for shape inputs"); | ||

if (CUDAGRAPHS_MODE) {
  // @peri044 I dont know if this makes sense since they are supposed to be GPU buffers
For shape tensor input, TRT requires them to be on CPU. I'm not sure though if this holds true in the CUDA graphs case.
Thanks, I'm trying to find the use case that is using isShapeInferenceIO
LGTM, thanks for this
LGTM
Description
Hides latency by creating the output tensor for the next output buffer in advance.
Fixes #3275
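As a rough illustration of the idea in the description, the sketch below allocates the buffer for the next call right after launching the current one, so a later call with the same shape does not pay for allocation up front. Everything here is an assumption made for illustration: `PreAllocatedOutputRunner`, `alloc`, and `compute` are hypothetical, buffers are plain `bytearray` objects, and the code runs synchronously, whereas in a real runtime the allocation would overlap with asynchronous GPU work.

```python
class PreAllocatedOutputRunner:
    """Hypothetical runner that prepares the next output buffer early."""

    def __init__(self, alloc):
        self._alloc = alloc  # callable: shape -> new buffer
        self._next_output = None
        self._last_shape = None
        self.allocations_on_critical_path = 0

    def run(self, shape, compute):
        shape_changed = shape != self._last_shape
        if self._next_output is None or shape_changed:
            # No usable buffer: allocate now, before the work can start.
            output = self._alloc(shape)
            self.allocations_on_critical_path += 1
        else:
            output = self._next_output
        compute(output)  # stand-in for enqueueing the engine
        # Prepare the buffer for the next call; with an asynchronous launch
        # this cost would be hidden behind the work that is still running.
        self._next_output = self._alloc(shape)
        self._last_shape = shape
        return output


def _alloc_bytes(shape):
    rows, cols = shape
    return bytearray(rows * cols)


runner = PreAllocatedOutputRunner(_alloc_bytes)
first = runner.run((2, 3), lambda buf: buf.__setitem__(0, 1))
second = runner.run((2, 3), lambda buf: buf.__setitem__(0, 2))
```

Only the first call and calls that change the shape allocate before computing, which mirrors the `shape_changed` check in the `execute_engine.cpp` hunk.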