orioninthesky98 changed the title from "🐛 [Bug] compiled model gives different outputs from torch model" to "🐛 [Bug] compiled model gives different outputs from torch model (used to work on tensorRT 2.2.0)" on Jul 9, 2024
orioninthesky98 changed the title from "🐛 [Bug] compiled model gives different outputs from torch model (used to work on tensorRT 2.2.0)" to "🐛 [Bug] compiled model gives different outputs from torch model (used to work on torch_tensorrt 2.2.0)" on Jul 9, 2024
Bug Description
My model outputs a tuple of `mu` and `logvar`. For `mu`, there are 4 columns (features), consisting of 3 features of type A and 1 feature of type B. You can see the `FinalEncoder.forward()` code in the gist below for the details.

As seen below, for the 3 features of type A, only the first feature matches the PyTorch model; the 2nd and 3rd features are total garbage. The type B feature matches the PyTorch model.
This used to work perfectly fine on the previous version of Torch-TensorRT (2.2.0) before I updated to 2.3.0. In fact, if you look at the model code, I had to write the `trt_compat_mode` path specifically for 2.3.0. When I was using 2.2.0, the original PyTorch `forward()` actually compiled fine and gave the expected speedups (4 to 5x).

[screenshot: torch mu]
[screenshot: tensorRT mu, 2nd & 3rd columns are wrong]
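For context, the output layout described above looks schematically like this (a hand-written stand-in with assumed shapes, not the actual gist code):

```python
import torch

# Schematic stand-in for the described outputs (shapes are assumptions, not the gist code).
# mu and logvar each have 4 columns: columns 0-2 are the type-A ("inv") features,
# column 3 is the single type-B feature.
bs = 8
mu = torch.randn(bs, 4)      # mu[:, 0:3] -> type A, mu[:, 3] -> type B
logvar = torch.randn(bs, 4)  # same column layout as mu
# Observed bug: after compiling with torch_tensorrt 2.3.0, mu[:, 0] and mu[:, 3]
# match the PyTorch model, while mu[:, 1] and mu[:, 2] come out as garbage.
```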
To Reproduce
Steps to reproduce the behavior:
This is the model code: https://gist.github.com/orioninthesky98/d0a987197950bc0b945d28b240d5bc53#file-model-py-L327-L352
The problematic part is highlighted in the gist. You can see the for-loop there, and somehow only the 1st feature (`inv_mu` / `inv_logvar`) is correct, but the remaining 2 are garbage.

Things I have already tried (see the sketch right after this list for the loop-unrolling attempt):
- I unrolled the loop myself, hardcoding the indices passed to `torch.index_select()`, just in case something was going wrong when tracing the for-loop. It still didn't fix the issue.
- I tried `torch._constrain_as_size(bs)` and `torch._constrain_as_size(num_inv_feats)`, but without success: torch complained that those values are not of type `SymInt`.
- I also changed all the `.view()` calls to `.reshape()`, but that didn't change anything.
- I tried adding `.clone()` and `.contiguous()`, and that didn't help either.
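For illustration, the loop and the hand-unrolled variant I tried look roughly like this (a minimal sketch; `masked_input`, `num_inv_feats`, and all shapes are assumptions standing in for the gist code):

```python
import torch

num_inv_feats = 3
bs, feat_dim = 8, 16
masked_input = torch.randn(bs, num_inv_feats, feat_dim)

# Original pattern: loop over the type-A features with index_select.
feats = []
for i in range(num_inv_feats):
    curr_input = torch.index_select(masked_input, 1, torch.tensor([i])).squeeze(1)
    feats.append(curr_input)  # each is (bs, feat_dim)

# Hand-unrolled variant with hardcoded indices, which still produced garbage
# for features 1 and 2 after compilation:
feat0 = torch.index_select(masked_input, 1, torch.tensor([0])).squeeze(1)
feat1 = torch.index_select(masked_input, 1, torch.tensor([1])).squeeze(1)
feat2 = torch.index_select(masked_input, 1, torch.tensor([2])).squeeze(1)
```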
Also, something weird: I was forced to use `torch.index_select()`. Previously, with torch_tensorrt 2.2.0, I could use plain slice-indexing and it compiled just fine, something like `curr_input = masked_input[:, i, ...]`.
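In eager mode the two indexing styles are interchangeable, e.g. (same assumed shapes as above):

```python
import torch

masked_input = torch.randn(8, 3, 16)
i = 1

# Plain slice-indexing, which compiled fine under torch_tensorrt 2.2.0:
a = masked_input[:, i, ...]

# index_select equivalent, which 2.3.0 forced me to use instead:
b = torch.index_select(masked_input, 1, torch.tensor([i])).squeeze(1)

assert torch.equal(a, b)  # identical results in eager mode
```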
I tried to revert to torch_tensorrt 2.2.0, but very strangely, it rejects the use of `torch.index_select()`! With 2.2.0, I have to set `trt_compat_mode=False`, and then it compiles fine AND gives the correct outputs.

For the compilation I am using this code:
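(The snippet itself did not survive the page capture; for reference, a typical Torch-TensorRT 2.3.0 dynamo compile call looks roughly like the sketch below. The toy module, shapes, and settings are all my assumptions, not the original code.)

```python
import torch
import torch_tensorrt

# Toy stand-in for FinalEncoder (see the gist for the real model).
class ToyEncoder(torch.nn.Module):
    def forward(self, x):
        mu = x.mean(dim=-1)      # (bs, 4) stand-in for the real mu
        logvar = x.std(dim=-1)   # (bs, 4) stand-in for the real logvar
        return mu, logvar

model = ToyEncoder().eval().cuda()
example_input = torch.randn(8, 4, 16, device="cuda")  # assumed shape

trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",                        # the frontend used by torch_tensorrt 2.3.0
    inputs=[example_input],
    enabled_precisions={torch.float32},
)
mu, logvar = trt_model(example_input)
```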
Expected behavior
The compiled model's outputs need to match the torch model's outputs, at least approximately.
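A per-column comparison along these lines is what surfaces the mismatch (a sketch, assuming the 4-column mu layout described above):

```python
import torch

def compare_mu(torch_mu: torch.Tensor, trt_mu: torch.Tensor) -> None:
    # Compare each mu column separately, since only some columns diverge.
    for col in range(torch_mu.shape[1]):
        ok = torch.allclose(torch_mu[:, col], trt_mu[:, col], rtol=1e-3, atol=1e-3)
        print(f"column {col}: {'OK' if ok else 'MISMATCH'}")
```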
Environment
How you installed PyTorch (`conda`, `pip`, `libtorch`, source): `pip`

Additional context
Hi @orioninthesky98, thanks for the details. I'm able to get the same results from the torch_tensorrt and PyTorch models by using the repro you gave (with a few small changes).