-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QR on a single matrix: valgrind reports invalid reads and writes #2328
Comments
Thanks for reporting, we will have a look at this |
Hi @lucbv . Thanks for looking into this. It seems like the problem is related to handling rectangular matrices. If the
there are no errors under valgrind. Digging into the code a bit, but without a full understanding of it, I see that this loop over matrix rows:
that successively removes one row and one column to form the 3x3 partitioned matrix In the original case of the 16x10 matrix, running valgrind with the gdbserver I see that the first invalid read occurs in The test case in the repo for QR appears to only run with square matrices: test without column pivoting:
test 'WithColumnPivoting': kokkos-kernels/batched/dense/unit_test/Test_Batched_TeamVectorQR_WithColumnPivoting.hpp Line 121 in 2c4dd7e
|
Okay, thanks for digging a bit into this, I will run the code in valgrind / gdb as well and hopefully can reproduce and report my observation. The algorithm indeed uses a partitioning in the matrix to perform some operations but it should still work for rectangular matrices. Once I find something promising I will let you know about it : ) |
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well.
The PR above, #2342, has a fix for the rectangular matrices and introduces more tests for the Serial QR feature. The tests are not fully implemented yet but the fix seems to be okay if you want to give it a try. |
This is great. Thank you @lucbv. Using the PR branch (9121f0a) I ran the reproducer under valgrind again and the Running the expanded version of the reproducer: stack at first Trsv invalid readRunning under gdb reports the following values of variables at the point of the first reported invalid read.
Given this loop from
p twice to compute the index into A here (where the invalid read occurs):
valgrind log
|
Okay, I will try to wrap up the PR and get that tested and merged, then I can move on to trsv, hopefully it's not more complicated than the QR fix but writing proper tests is what takes time! |
So I have not looked at it in detail but my guess is that we are assuming the triangular matrix to be stored in a square matrix, size |
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well.
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well. Signed-off-by: Luc <[email protected]>
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well. Signed-off-by: Luc <[email protected]>
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well. Signed-off-by: Luc <[email protected]>
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well. Fixing unused variable error It looks like the Left NoTranspose ApplyQ is not doing the correct thing. Will have a look at that next. Signed-off-by: Luc <[email protected]>
The serial QR algorithms does not have unit-tests and is failing for non square matrices. See issue kokkos#2328. This first commit fixes the issue with rectangular matrices and adds a basic test for that use case. Next will work on adding a test that exercises the interfaces on multiple matrices of different sizes within a parallel_for. Finally equivalent tests will be added for the square case as well. Fixing unused variable error It looks like the Left NoTranspose ApplyQ is not doing the correct thing. Will have a look at that next. Signed-off-by: Luc <[email protected]> Signed-off-by: Luc Berger-Vergiat <[email protected]>
Hello,
Calling
SerialQR
on a single matrix defined asKokkos::View<double[16][10]>
and running with the Kokkos Serial backend results in valgrind invalid read and write errors (pasted below). The reproducer is pasted below.Interestingly, when using the CUDA backend in an expanded version of the reproducer (which includes a result comparison after applying the QR factorization via
ApplyQ
andTrsv
) there are no obvious issues.Note, I'm still figuring out how the QR interface works, hence the single matrix input to QR.
Am I doing anything obviously wrong here? Any help is appreciated.
reproducer
kokkos and kokkos-kernels build
I'm building kokkos (develop @ c2a342b26) and kokkos-kernels (develop @ f26fbca) with the following cmake commands using GCC 12.3.0 on a RHEL9 system.
valgrind errors
The text was updated successfully, but these errors were encountered: