-
Notifications
You must be signed in to change notification settings - Fork 326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Error in Shape inference of Constant Op #3014
Comments
@AlexandreEichenberger can you please look into this issue. |
Let me look into this. |
Ok, so very simply, we attempt to print an error because the range for an operation is wrong. The range is from [-rank... rank-1], but since rank is zero (possibly after errors in our code), the range is [0, -1] and our error printer asserts because of an illegal range. So it kind of defeat the purpose of having a nice printout for the error. I have fixed that, but I still don't get an error message, so looking into why. At a higher level, we will have to figure out why the rank is perceived as zero. I have seen (in other places) that Microsoft ONNX models have sometimes assumptions that are different from what we think is the ONNX specs. To be determined. I will have a PR to fix at least the error messaging, as it is truly unacceptable to provide such asserts on error for model inputs. If you have the time to determine why we see a rank of zero there, would be much appreciated, keeping in mind that possibly the model is non-conforming (but maybe a simple fix, aka a normalization rule may fix it). |
Ok, there were a few things wrong. We now get the diagnostic again, and we issue warning instead of assert (it was for reporting an error to begin with, so it's better that way). Fix are in PR #3020. |
@amd-abhikulk do you mind looking at the root cause of the error and submit a PR for it? Meaning adding code in the verifier to check the specs, namely
If that is the case in the model, its better to report the root cause of the error, not the subsequent axis issue that seems to derived from that root error (did not personally verify that claim but that seems very plausible). Thanks |
I was trying to lower Mistral-7B-v0.1.onnx model, when I get the following error.
This error doesn't much context into what went wrong.
I did some debugging and found out that it was Concat Op was giving this error here
https://github.com/onnx/onnx-mlir/blob/a5ae8baff62209567eec4cc1e8a621cd36292cce/src/Dialect/ONNX/ONNXOps/Tensor/Concat.cpp#L99C1-L103C1
This happens cause concat received tensors with different ranks and the logic kinda fell apart. IMO
verify
function must check whether all the operands have same rank. (I will send a PR for this soon).Now as to how it got different ranked tensors, that's the actual cause of this error.
The concat Op has a constant
[-1]
Tensor to concatenated to the input tensor. But for some reason the onnx-mlir lowers it toThe constant Op strips the constant tensors Op of it's dimensionality for some reason, and I can't undersrtand why
The text was updated successfully, but these errors were encountered: