QR on a single matrix: valgrind reports invalid reads and writes #2328

Open
cwsmith opened this issue Sep 5, 2024 · 7 comments

cwsmith commented Sep 5, 2024

Hello,

Calling SerialQR on a single matrix defined as Kokkos::View<double[16][10]> and running with the Kokkos Serial backend results in valgrind invalid read and write errors (pasted below). The reproducer is pasted below.

Interestingly, when using the CUDA backend in an expanded version of the reproducer (which includes a result comparison after applying the QR factorization via ApplyQ and Trsv) there are no obvious issues.

Note, I'm still figuring out how the QR interface works, hence the single matrix input to QR.

Am I doing anything obviously wrong here? Any help is appreciated.

reproducer

#include <KokkosBatched_QR_Decl.hpp>     //KokkosBatched::SerialQR
#include <KokkosBatched_Util.hpp>        //KokkosBlas::Algo
#include <Kokkos_Core.hpp>

void testQR() {
  typedef Kokkos::View<double[16][10]> MatrixViewType;
  typedef Kokkos::View<double[10]> ColVectorViewType;
  typedef Kokkos::View<double[10]> ColWorkViewType;

  MatrixViewType A("A");
  ColVectorViewType t("t");
  ColWorkViewType w("w");

  // roughly following
  // kokkos-kernels/batched/dense/unit_test/Test_Batched_TeamVectorQR.hpp
  typedef KokkosBlas::Algo::QR::Unblocked AlgoTagType;
  Kokkos::parallel_for("serialQR", 1, KOKKOS_LAMBDA(int) {
        // compute the QR factorization of A and store the results in A and t
        // (tau) - see the lapack dgeqp3(...) documentation:
        // www.netlib.org/lapack/explore-html-3.6.1/dd/d9a/group__double_g_ecomputational_ga1b0500f49e03d2771b797c6e88adabbb.html
        KokkosBatched::SerialQR<AlgoTagType>::invoke(A, t, w);
      });
}

int main(int argc, char **argv) {
  Kokkos::ScopeGuard scope_guard(argc, argv);
  testQR();
}

kokkos and kokkos-kernels build

I'm building kokkos (develop @ c2a342b26) and kokkos-kernels (develop @ f26fbca) with the following cmake commands using GCC 12.3.0 on a RHEL9 system.

bdir=buildKokkosSerial
cmake -S kokkos -B $bdir \
  -DBUILD_SHARED_LIBS=on \
  -DCMAKE_CXX_COMPILER=g++ \
  -DKokkos_ENABLE_SERIAL=ON \
  -DKokkos_ENABLE_OPENMP=off \
  -DKokkos_ENABLE_DEBUG=off \
  -DCMAKE_INSTALL_PREFIX=$PWD/$bdir/install
cmake --build $bdir -j 24 --target install

bdir=buildKokkosKernelsSerial
cmake -S kokkos-kernels -B $bdir \
  -DCMAKE_CXX_COMPILER=g++ \
  -DKokkos_ROOT=buildKokkosSerial/install \
  -DCMAKE_INSTALL_PREFIX=$bdir/install
cmake --build $bdir -j 24 --target install

valgrind errors

==3560139== Memcheck, a memory error detector
==3560139== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==3560139== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==3560139== Command: ./QRTests
==3560139==
==3560139== Invalid read of size 8
==3560139==    at 0x413BF5: int KokkosBatched::SerialLeftHouseholderInternal::invoke<double>(int, double*, double*, int, double*) (KokkosBatched_Householder_Serial_Internal.hpp:48)
==3560139==    by 0x4129B9: int KokkosBatched::SerialQR_Internal::invoke<double>(int, int, double*, int, int, double*, int, double*) (KokkosBatched_QR_Serial_Internal.hpp:68)
==3560139==    by 0x4118B0: int KokkosBatched::SerialQR<KokkosBlas::Algo::Level3::Unblocked>::invoke<Kokkos::View<double [16][10]>, Kokkos::View<double [10]>, Kokkos::View<double [10]> >(Kokkos::View<double [16][10]> const&, Kokkos::View<double [10]> const&, Kokkos::View<double [10]> const&) (KokkosBatched_QR_Serial_Impl.hpp:34)
==3560139==    by 0x40F5DF: testQR()::{lambda(int)#1}::operator()(int) const (testQR.cpp:21)
==3560139==    by 0x410015: std::enable_if<is_void_v<void>, void>::type Kokkos::Impl::ParallelFor<testQR()::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::exec<void>() const (Kokkos_Serial_Parallel_Range.hpp:37)
==3560139==    by 0x40FEBF: Kokkos::Impl::ParallelFor<testQR()::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::execute() const (Kokkos_Serial_Parallel_Range.hpp:56)
==3560139==    by 0x40FC85: void Kokkos::parallel_for<Kokkos::RangePolicy<Kokkos::Serial>, testQR()::{lambda(int)#1}, void>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::RangePolicy<Kokkos::Serial> const&, testQR()::{lambda(int)#1} const&) (Kokkos_Parallel.hpp:146)
==3560139==    by 0x40FA98: void Kokkos::parallel_for<testQR()::{lambda(int)#1}>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, testQR()::{lambda(int)#1} const&) (Kokkos_Parallel.hpp:167)
==3560139==    by 0x40F786: testQR() (testQR.cpp:17)
==3560139==    by 0x40F8C2: main (testQR.cpp:27)
==3560139==  Address 0x8991400 is 0 bytes after a block of size 1,408 alloc'd
==3560139==    at 0x484615B: operator new(unsigned long, std::align_val_t, std::nothrow_t const&) (vg_replace_malloc.c:663)
==3560139==    by 0x5C8C0A9: Kokkos::HostSpace::impl_allocate(char const*, unsigned long, unsigned long, Kokkos_Profiling_SpaceHandle) const (Kokkos_HostSpace.cpp:79)
==3560139==    by 0x5C8C284: Kokkos::HostSpace::allocate(char const*, unsigned long, unsigned long) const (Kokkos_HostSpace.cpp:58)
==3560139==    by 0x5C8C767: checked_allocation_with_header<Kokkos::HostSpace> (Kokkos_SharedAlloc.hpp:203)
==3560139==    by 0x5C8C767: Kokkos::Impl::SharedAllocationRecordCommon<Kokkos::HostSpace>::SharedAllocationRecordCommon(Kokkos::HostSpace const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, void (*)(Kokkos::Impl::SharedAllocationRecord<void, void>*)) (Kokkos_SharedAlloc_timpl.hpp:62)
==3560139==    by 0x4150F8: Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, void>::SharedAllocationRecordCommon(Kokkos::HostSpace const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, void (*)(Kokkos::Impl::SharedAllocationRecord<void, void>*)) (Kokkos_HostSpace.hpp:178)
==3560139==    by 0x41513E: Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, double> >::SharedAllocationRecord(Kokkos::HostSpace const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long) (Kokkos_SharedAlloc.hpp:419)
==3560139==    by 0x4144C9: Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, double> >::allocate(Kokkos::HostSpace const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long) (Kokkos_SharedAlloc.hpp:434)
==3560139==    by 0x4132B8: Kokkos::Impl::SharedAllocationRecord<void, void>* Kokkos::Impl::ViewMapping<Kokkos::ViewTraits<double [16][10]>, void>::allocate_shared<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::HostSpace, Kokkos::Serial>(Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<cha
==3560139==    by 0x4124F2: Kokkos::View<double [16][10]>::View<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, std::enable_if<!Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char
==3560139==    by 0x4116EC: Kokkos::View<double [16][10]>::View<char [2]>(char const (&) [2], std::enable_if<Kokkos::Impl::is_view_label<char [2]>::value, unsigned long const>::type, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) (Kokkos_ViewLegacy.hpp:1158)
==3560139==    by 0x40F666: testQR() (testQR.cpp:10)
==3560139==    by 0x40F8C2: main (testQR.cpp:27)
==3560139==
==3560139== Invalid write of size 8
==3560139==    at 0x413C62: int KokkosBatched::SerialLeftHouseholderInternal::invoke<double>(int, double*, double*, int, double*) (KokkosBatched_Householder_Serial_Internal.hpp:55)
==3560139==    by 0x4129B9: int KokkosBatched::SerialQR_Internal::invoke<double>(int, int, double*, int, int, double*, int, double*) (KokkosBatched_QR_Serial_Internal.hpp:68)
==3560139==    by 0x4118B0: int KokkosBatched::SerialQR<KokkosBlas::Algo::Level3::Unblocked>::invoke<Kokkos::View<double [16][10]>, Kokkos::View<double [10]>, Kokkos::View<double [10]> >(Kokkos::View<double [16][10]> const&, Kokkos::View<double [10]> const&, Kokkos::View<double [10]> const&) (KokkosBatched_QR_Serial_Impl.hpp:34)
==3560139==    by 0x40F5DF: testQR()::{lambda(int)#1}::operator()(int) const (testQR.cpp:21)
==3560139==    by 0x410015: std::enable_if<is_void_v<void>, void>::type Kokkos::Impl::ParallelFor<testQR()::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::exec<void>() const (Kokkos_Serial_Parallel_Range.hpp:37)
==3560139==    by 0x40FEBF: Kokkos::Impl::ParallelFor<testQR()::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::execute() const (Kokkos_Serial_Parallel_Range.hpp:56)
==3560139==    by 0x40FC85: void Kokkos::parallel_for<Kokkos::RangePolicy<Kokkos::Serial>, testQR()::{lambda(int)#1}, void>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::RangePolicy<Kokkos::Serial> const&, testQR()::{lambda(int)#1} const&) (Kokkos_Parallel.hpp:146)
==3560139==    by 0x40FA98: void Kokkos::parallel_for<testQR()::{lambda(int)#1}>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, testQR()::{lambda(int)#1} const&) (Kokkos_Parallel.hpp:167)
==3560139==    by 0x40F786: testQR() (testQR.cpp:17)
==3560139==    by 0x40F8C2: main (testQR.cpp:27)
==3560139==  Address 0x8991890 is 0 bytes after a block of size 208 alloc'd
==3560139==    at 0x484615B: operator new(unsigned long, std::align_val_t, std::nothrow_t const&) (vg_replace_malloc.c:663)
==3560139==    by 0x5C8C0A9: Kokkos::HostSpace::impl_allocate(char const*, unsigned long, unsigned long, Kokkos_Profiling_SpaceHandle) const (Kokkos_HostSpace.cpp:79)
==3560139==    by 0x5C8C284: Kokkos::HostSpace::allocate(char const*, unsigned long, unsigned long) const (Kokkos_HostSpace.cpp:58)
==3560139==    by 0x5C8C767: checked_allocation_with_header<Kokkos::HostSpace> (Kokkos_SharedAlloc.hpp:203)
==3560139==    by 0x5C8C767: Kokkos::Impl::SharedAllocationRecordCommon<Kokkos::HostSpace>::SharedAllocationRecordCommon(Kokkos::HostSpace const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, void (*)(Kokkos::Impl::SharedAllocationRecord<void, void>*)) (Kokkos_SharedAlloc_timpl.hpp:62)
==3560139==    by 0x4150F8: Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, void>::SharedAllocationRecordCommon(Kokkos::HostSpace const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, void (*)(Kokkos::Impl::SharedAllocationRecord<void, void>*)) (Kokkos_HostSpace.hpp:178)
==3560139==    by 0x41513E: Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, double> >::SharedAllocationRecord(Kokkos::HostSpace const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long) (Kokkos_SharedAlloc.hpp:419)
==3560139==    by 0x4144C9: Kokkos::Impl::SharedAllocationRecord<Kokkos::HostSpace, Kokkos::Impl::ViewValueFunctor<Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, double> >::allocate(Kokkos::HostSpace const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long) (Kokkos_SharedAlloc.hpp:434)
==3560139==    by 0x413584: Kokkos::Impl::SharedAllocationRecord<void, void>* Kokkos::Impl::ViewMapping<Kokkos::ViewTraits<double [10]>, void>::allocate_shared<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::HostSpace, Kokkos::Serial>(Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
==3560139==    by 0x4126C6: Kokkos::View<double [10]>::View<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, std::enable_if<!Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
==3560139==    by 0x4117B8: Kokkos::View<double [10]>::View<char [2]>(char const (&) [2], std::enable_if<Kokkos::Impl::is_view_label<char [2]>::value, unsigned long const>::type, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) (Kokkos_ViewLegacy.hpp:1158)
==3560139==    by 0x40F6A2: testQR() (testQR.cpp:11)
==3560139==    by 0x40F8C2: main (testQR.cpp:27)

... snip
Contributor

lucbv commented Sep 6, 2024

Thanks for reporting, we will have a look at this

lucbv self-assigned this Sep 6, 2024
lucbv added the bug label Sep 6, 2024
Author

cwsmith commented Sep 6, 2024

Hi @lucbv . Thanks for looking into this.

It seems like the problem is related to handling rectangular matrices. If the MatrixViewType is defined to be 10x10:

typedef Kokkos::View<double[10][10]> MatrixViewType;

there are no errors under valgrind.

Digging into the code a bit, but without a full understanding of it, I see that this loop over matrix rows:

for (int m_atl = 0; m_atl < m; ++m_atl) {
  // part 2x2 into 3x3
  A_part3x3.partWithABR(A_part2x2, 1, 1);
  const int m_A22 = m - m_atl - 1;
  const int n_A22 = n - m_atl - 1;

successively removes one row and one column to form the 3x3 partitioned matrix A_part3x3 via the call A_part3x3.partWithABR(A_part2x2, 1, 1).

In the original case of the 16x10 matrix, running valgrind with the gdbserver I see that the first invalid read occurs in SerialLeftHouseholderInternal::invoke(...) when m_atl=10 in SerialQR_Internal::invoke(...). This seems to make sense as all 10 columns have been removed.
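
To make the loop-bound observation concrete, here is a small standalone sketch (plain C++, no Kokkos) that replays only the size bookkeeping from the loop excerpt quoted above. The function names are hypothetical and this is not the library's code. It assumes only that m_atl runs over [0, m) and that n_A22 = n - m_atl - 1, as in the excerpt. The min(m, n) bound in the second function is an assumed remedy for illustration, not necessarily what the maintainers will do.

```cpp
#include <cassert>

// Hypothetical helper, not part of Kokkos Kernels: replays the size
// bookkeeping of the quoted loop and returns the first m_atl at which the
// trailing block would have a negative column count, or -1 if none does.
int firstNegativeTrailingCols(int m, int n) {
  assert(m >= 0 && n >= 0);
  for (int m_atl = 0; m_atl < m; ++m_atl) {
    const int n_A22 = n - m_atl - 1;  // columns to the right of the current one
    if (n_A22 < 0) return m_atl;
  }
  return -1;
}

// Same bookkeeping with the loop bounded by min(m, n). This bound is an
// assumption made for illustration only.
int firstNegativeTrailingColsBounded(int m, int n) {
  assert(m >= 0 && n >= 0);
  const int k = (m < n) ? m : n;
  for (int m_atl = 0; m_atl < k; ++m_atl) {
    const int n_A22 = n - m_atl - 1;
    if (n_A22 < 0) return m_atl;
  }
  return -1;
}
```

For the 16x10 case, firstNegativeTrailingCols(16, 10) returns 10, the same m_atl value at which the first invalid read shows up under the gdbserver. For 10x10 it returns -1, which is consistent with the square case running clean.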

The test case in the repo for QR appears to only run with square matrices:

test without column pivoting:

MatrixViewType a("a", N, BlkSize, BlkSize);

test 'WithColumnPivoting':

Contributor

lucbv commented Sep 6, 2024

Okay, thanks for digging into this. I will run the code under valgrind / gdb as well, and hopefully I can reproduce the errors and report my observations. The algorithm does partition the matrix to perform some operations, but it should still work for rectangular matrices. Once I find something promising I will let you know about it : )

lucbv added a commit to lucbv/kokkos-kernels that referenced this issue Sep 19, 2024
The serial QR algorithms does not have unit-tests and is failing
for non square matrices. See issue kokkos#2328.
This first commit fixes the issue with rectangular matrices and
adds a basic test for that use case. Next will work on adding a
test that exercises the interfaces on multiple matrices of different
sizes within a parallel_for. Finally equivalent tests will be added
for the square case as well.
Contributor

lucbv commented Sep 19, 2024

The PR above, #2342, has a fix for the rectangular matrices and introduces more tests for the Serial QR feature. The tests are not fully implemented yet but the fix seems to be okay if you want to give it a try.

Author

cwsmith commented Sep 20, 2024

This is great. Thank you @lucbv.

Using the PR branch (9121f0a) I ran the reproducer under valgrind again and the SerialQR errors are gone.

Running the expanded version of the reproducer:
https://github.com/SCOREC/meshFields/blob/20a68919b338003ff8792ce7d2cc6c5df3f13613/test/testQR.cpp
under valgrind reports an invalid read in the call to SerialTrsv. Some additional details on the first invalid read and the valgrind log are below.

stack at first Trsv invalid read

Running under gdb reports the following values of variables at the point of the first reported invalid read.

Reading symbols from /opt/scorec/spack/rhel9/v0201_4/install/linux-rhel9-x86_64/gcc-12.3.0/libiconv-1.17-oylnknwv5m57zkfjde7op6ne3pqdkhxg/lib/libiconv.so.2...
0x0000000000427b8f in KokkosBatched::SerialTrsvInternalUpper<KokkosBlas::Algo::Level2::Unblocked>::invoke<double, double> (use_unit_diag=false, m=16, alpha=1, A=0x8991f00, as0=1, as1=16, b=0x8993940, bs0=1)
    at /space/cwsmith/meshFields/buildKokkosKernelsSerial/install/include/KokkosBatched_Trsv_Serial_Internal.hpp:161
161           if (!use_unit_diag) *beta1 = *beta1 / A[p * as0 + p * as1];
(ins)(gdb) where
#0  0x0000000000427b8f in KokkosBatched::SerialTrsvInternalUpper<KokkosBlas::Algo::Level2::Unblocked>::invoke<double, double> (use_unit_diag=false, m=16, alpha=1, A=0x8991f00, as0=1, as1=16, b=0x8993940, bs0=1)
    at /space/cwsmith/meshFields/buildKokkosKernelsSerial/install/include/KokkosBatched_Trsv_Serial_Internal.hpp:161
#1  0x0000000000423408 in KokkosBatched::SerialTrsv<KokkosBatched::Uplo::Upper, KokkosBlas::Trans::NoTranspose, KokkosBatched::Diag::NonUnit, KokkosBlas::Algo::Level2::Unblocked>::invoke<double, Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >, Kokkos::View<double [16], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > > (alpha=1, A=..., b=...)
    at /space/cwsmith/meshFields/buildKokkosKernelsSerial/install/include/KokkosBatched_Trsv_Serial_Impl.hpp:191
#2  0x000000000041a7c8 in operator() (__closure=0x1ffefef0d0)
    at /space/cwsmith/meshFields/meshFields/test/testQR.cpp:127
#3  0x000000000041bf36 in Kokkos::Impl::ParallelFor<testSolveQR()::<lambda(int)>, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::exec<void>(void) const (this=0x1ffefef0d0)
    at /space/cwsmith/meshFields/buildKokkosSerial/install/include/Serial/Kokkos_Serial_Parallel_Range.hpp:37
#4  0x000000000041bb86 in Kokkos::Impl::ParallelFor<testSolveQR()::<lambda(int)>, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::execute(void) const (this=0x1ffefef0d0)
    at /space/cwsmith/meshFields/buildKokkosSerial/install/include/Serial/Kokkos_Serial_Parallel_Range.hpp:56
#5  0x000000000041b71e in Kokkos::parallel_for<Kokkos::RangePolicy<Kokkos::Serial>, testSolveQR()::<lambda(int)> >(const std::string &, const Kokkos::RangePolicy<Kokkos::Serial> &, const struct {...} &) (str=..., policy=..., functor=...)
    at /space/cwsmith/meshFields/buildKokkosSerial/install/include/Kokkos_Parallel.hpp:146
#6  0x000000000041b4c3 in Kokkos::parallel_for<testSolveQR()::<lambda(int)> >(const std::string &, size_t, const struct {...} &) (str=..., work_count=1, functor=...)
    at /space/cwsmith/meshFields/buildKokkosSerial/install/include/Kokkos_Parallel.hpp:167
#7  0x000000000041afbc in testSolveQR () at /space/cwsmith/meshFields/meshFields/test/testQR.cpp:113
#8  0x000000000041b2ed in main (argc=1, argv=0x1ffefef608) at /space/cwsmith/meshFields/meshFields/test/testQR.cpp:140
(ins)(gdb) p p
$1 = 15
(ins)(gdb) p m
$2 = 16

Given this loop from p=m-1:0

and the use of p twice to compute the index into A here (where the invalid read occurs):
if (!use_unit_diag) *beta1 = *beta1 / A[p * as0 + p * as1];
it looks like the assumption that A is square is made here as well.
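
Plugging the gdb values into that index expression shows how far out of range the read is. The sketch below is plain C++ with hypothetical helper names. It assumes the 16x10 LayoutLeft view holds 16 * 10 = 160 contiguous doubles, which is consistent with as0=1 and as1=16 in the backtrace.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: linear offset of the diagonal entry (p, p) for
// row stride as0 and column stride as1, as in A[p * as0 + p * as1].
std::ptrdiff_t diagOffset(int p, int as0, int as1) {
  return static_cast<std::ptrdiff_t>(p) * as0 +
         static_cast<std::ptrdiff_t>(p) * as1;
}

// True if the diagonal entry (p, p) lies inside an m x n matrix.
bool diagInBounds(int p, int m, int n) {
  assert(m >= 0 && n >= 0);
  return p >= 0 && p < m && p < n;
}
```

With p=15, as0=1, as1=16, diagOffset gives 15 + 240 = 255, well past the 160 elements of the view. The last diagonal entry that exists in a 16x10 matrix is p=9, at offset 9 + 144 = 153.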

valgrind log

==265289== Memcheck, a memory error detector
==265289== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al. 
==265289== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==265289== Command: ./QRTests
==265289== 
==265289== Invalid read of size 8
==265289==    at 0x427B8F: int KokkosBatched::SerialTrsvInternalUpper<KokkosBlas::Algo::Level2::Unblocked>::invoke<double, double>(bool, int, double, double const*, int, int, double*, int) (KokkosBatched_Trsv_Serial_Internal.hpp:161)
==265289==    by 0x423407: int KokkosBatched::SerialTrsv<KokkosBatched::Uplo::Upper, KokkosBlas::Trans::NoTranspose, KokkosBatched::Diag::NonUnit, KokkosBlas::Algo::Level2::Unblocked>::invoke<double, Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >, Kokkos::View<double [16], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > >(double, Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&, Kokkos::View<double [16], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&) (KokkosBatched_Trsv_Serial_Impl.hpp:191)
==265289==    by 0x41A7C7: testSolveQR()::{lambda(int)#1}::operator()(int) const (testQR.cpp:127)
==265289==    by 0x41BF35: std::enable_if<is_void_v<void>, void>::type Kokkos::Impl::ParallelFor<testSolveQR()::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::exec<void>() const (Kokkos_Serial_Parallel_Range.hpp:37)
==265289==    by 0x41BB85: Kokkos::Impl::ParallelFor<testSolveQR()::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::execute() const (Kokkos_Serial_Parallel_Range.hpp:56)
==265289==    by 0x41B71D: void Kokkos::parallel_for<Kokkos::RangePolicy<Kokkos::Serial>, testSolveQR()::{lambda(int)#1}, void>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::RangePolicy<Kokkos::Serial> const&, testSolveQR()::{lambda(int)#1} const&) (Kokkos_Parallel.hpp:146)
==265289==    by 0x41B4C2: void Kokkos::parallel_for<testSolveQR()::{lambda(int)#1}>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, testSolveQR()::{lambda(int)#1} const&) (Kokkos_Parallel.hpp:167)
==265289==    by 0x41AFBB: testSolveQR() (testQR.cpp:113)
==265289==    by 0x41B2EC: main (testQR.cpp:140)
==265289==  Address 0x89926f8 is 16 bytes after a block of size 40 free'd
==265289==    at 0x484893D: operator delete(void*, unsigned long) (vg_replace_malloc.c:1101)
==265289==    by 0x423D0C: Kokkos::Impl::HostSharedPtr<Kokkos::Impl::SerialInternal>::cleanup() (Kokkos_HostSharedPtr.hpp:120)
==265289==    by 0x41DDA1: Kokkos::Impl::HostSharedPtr<Kokkos::Impl::SerialInternal>::~HostSharedPtr() (Kokkos_HostSharedPtr.hpp:92)
==265289==    by 0x41CC4F: Kokkos::Serial::~Serial() (Kokkos_Serial.hpp:95)
==265289==    by 0x423E87: Kokkos::Impl::ViewCtorProp<void, Kokkos::Serial>::~ViewCtorProp() (Kokkos_ViewCtor.hpp:122)
==265289==    by 0x423EA7: Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Kokkos::HostSpace, Kokkos::Serial>::~ViewCtorProp() (Kokkos_ViewCtor.hpp:182)
==265289==    by 0x425342: Kokkos::View<double [10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::View<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, std::enable_if<!Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::has_pointer, Kokkos::LayoutLeft>::type const&) (Kokkos_ViewLegacy.hpp:1048)
==265289==    by 0x41F9BC: Kokkos::View<double [10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::View<char [3]>(char const (&) [3], std::enable_if<Kokkos::Impl::is_view_label<char [3]>::value, unsigned long const>::type, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) (Kokkos_ViewLegacy.hpp:1158)
==265289==    by 0x41AC8F: testSolveQR() (testQR.cpp:89)
==265289==    by 0x41B2EC: main (testQR.cpp:140)
==265289==  Block was alloc'd at
==265289==    at 0x4844F95: operator new(unsigned long) (vg_replace_malloc.c:483)
==265289==    by 0x5C94902: HostSharedPtr<Kokkos::Serial::Serial()::<lambda(Kokkos::Impl::SerialInternal*)> > (Kokkos_HostSharedPtr.hpp:47)
==265289==    by 0x5C94902: Kokkos::Serial::Serial() (Kokkos_Serial.cpp:160)
==265289==    by 0x4251A9: Kokkos::View<double [10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::View<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, std::enable_if<!Kokkos::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::has_pointer, Kokkos::LayoutLeft>::type const&) (Kokkos_ViewLegacy.hpp:1002)
==265289==    by 0x41F9BC: Kokkos::View<double [10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::View<char [3]>(char const (&) [3], std::enable_if<Kokkos::Impl::is_view_label<char [3]>::value, unsigned long const>::type, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) (Kokkos_ViewLegacy.hpp:1158)
==265289==    by 0x41AC8F: testSolveQR() (testQR.cpp:89)
==265289==    by 0x41B2EC: main (testQR.cpp:140)
==265289== 
==265289== Invalid read of size 8
==265289==    at 0x427BDC: int KokkosBatched::SerialTrsvInternalUpper<KokkosBlas::Algo::Level2::Unblocked>::invoke<double, double>(bool, int, double, double const*, int, int, double*, int) (KokkosBatched_Trsv_Serial_Internal.hpp:163)
==265289==    by 0x423407: int KokkosBatched::SerialTrsv<KokkosBatched::Uplo::Upper, KokkosBlas::Trans::NoTranspose, KokkosBatched::Diag::NonUnit, KokkosBlas::Algo::Level2::Unblocked>::invoke<double, Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >, Kokkos::View<double [16], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > >(double, Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&, Kokkos::View<double [16], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&) (KokkosBatched_Trsv_Serial_Impl.hpp:191)
==265289==    by 0x41A7C7: testSolveQR()::{lambda(int)#1}::operator()(int) const (testQR.cpp:127)
==265289==    by 0x41BF35: std::enable_if<is_void_v<void>, void>::type Kokkos::Impl::ParallelFor<testSolveQR()::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::exec<void>() const (Kokkos_Serial_Parallel_Range.hpp:37)
==265289==    by 0x41BB85: Kokkos::Impl::ParallelFor<testSolveQR()::{lambda(int)#1}, Kokkos::RangePolicy<Kokkos::Serial>, Kokkos::Serial>::execute() const (Kokkos_Serial_Parallel_Range.hpp:56)
==265289==    by 0x41B71D: void Kokkos::parallel_for<Kokkos::RangePolicy<Kokkos::Serial>, testSolveQR()::{lambda(int)#1}, void>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Kokkos::RangePolicy<Kokkos::Serial> const&, testSolveQR()::{lambda(int)#1} const&) (Kokkos_Parallel.hpp:146)
==265289==    by 0x41B4C2: void Kokkos::parallel_for<testSolveQR()::{lambda(int)#1}>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, testSolveQR()::{lambda(int)#1} const&) (Kokkos_Parallel.hpp:167)
==265289==    by 0x41AFBB: testSolveQR() (testQR.cpp:113)
==265289==    by 0x41B2EC: main (testQR.cpp:140)
==265289==  Address 0x8992680 is 6 bytes after a block of size 74 free'd
==265289==    at 0x484893D: operator delete(void*, unsigned long) (vg_replace_malloc.c:1101)
==265289==    by 0x5C94C5D: deallocate (new_allocator.h:158)
==265289==    by 0x5C94C5D: deallocate (alloc_traits.h:496)
==265289==    by 0x5C94C5D: _M_destroy (basic_string.h:300)
==265289==    by 0x5C94C5D: _M_dispose (basic_string.h:294)
==265289==    by 0x5C94C5D: ~basic_string (basic_string.h:803)
==265289==    by 0x5C94C5D: profile_fence_event<Kokkos::Serial, Kokkos::Serial::impl_static_fence(const std::string&)::<lambda()> > (Kokkos_Profiling.hpp:219)
==265289==    by 0x5C94C5D: impl_static_fence (Kokkos_Serial.hpp:147)
==265289==    by 0x5C94C5D: Kokkos::Impl::ExecSpaceDerived<Kokkos::Serial>::static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (Kokkos_ExecSpaceManager.hpp:131)
==265289==    by 0x5C85684: Kokkos::Impl::ExecSpaceManager::static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (Kokkos_Core.cpp:243)
==265289==    by 0x41E8BB: void Kokkos::deep_copy<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks>(Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&, Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks> const&, std::enable_if<((is_void_v<Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::specialize>)&&(is_void_v<Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks>::specialize>))&&((((unsigned int)Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::rank)!=(0))||(((unsigned int)Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks>::rank)!=(0))), void>::type*) (Kokkos_CopyViews.hpp:1709)
==265289==    by 0x41AC57: testSolveQR() (testQR.cpp:87)
==265289==    by 0x41B2EC: main (testQR.cpp:140)
==265289==  Block was alloc'd at
==265289==    at 0x4844F95: operator new(unsigned long) (vg_replace_malloc.c:483)
==265289==    by 0x5C943FB: void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.0] (basic_string.tcc:225)
==265289==    by 0x5C94C36: basic_string (basic_string.h:552)
==265289==    by 0x5C94C36: profile_fence_event<Kokkos::Serial, Kokkos::Serial::impl_static_fence(const std::string&)::<lambda()> > (Kokkos_Profiling.hpp:219)
==265289==    by 0x5C94C36: impl_static_fence (Kokkos_Serial.hpp:147)
==265289==    by 0x5C94C36: Kokkos::Impl::ExecSpaceDerived<Kokkos::Serial>::static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (Kokkos_ExecSpaceManager.hpp:131)
==265289==    by 0x5C85684: Kokkos::Impl::ExecSpaceManager::static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (Kokkos_Core.cpp:243)
==265289==    by 0x41E8BB: void Kokkos::deep_copy<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks>(Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> > const&, Kokkos::View<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks> const&, std::enable_if<((is_void_v<Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::specialize>)&&(is_void_v<Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks>::specialize>))&&((((unsigned int)Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >::rank)!=(0))||(((unsigned int)Kokkos::ViewTraits<double [16][10], Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Experimental::EmptyViewHooks>::rank)!=(0))), void>::type*) (Kokkos_CopyViews.hpp:1709)
==265289==    by 0x41AC57: testSolveQR() (testQR.cpp:87)
==265289==    by 0x41B2EC: main (testQR.cpp:140)
==265289== 
==265289== 
==265289== HEAP SUMMARY:
==265289==     in use at exit: 176 bytes in 1 blocks
==265289==   total heap usage: 230 allocs, 229 frees, 115,919 bytes allocated
==265289== 
==265289== LEAK SUMMARY:
==265289==    definitely lost: 0 bytes in 0 blocks
==265289==    indirectly lost: 0 bytes in 0 blocks
==265289==      possibly lost: 0 bytes in 0 blocks
==265289==    still reachable: 176 bytes in 1 blocks
==265289==         suppressed: 0 bytes in 0 blocks
==265289== Rerun with --leak-check=full to see details of leaked memory
==265289== 
==265289== For lists of detected and suppressed errors, rerun with: -s
==265289== ERROR SUMMARY: 80 errors from 2 contexts (suppressed: 0 from 0)

@lucbv
Contributor

lucbv commented Sep 20, 2024

Okay, I will try to wrap up the PR and get it tested and merged; then I can move on to trsv. Hopefully it's not more complicated than the QR fix, but writing proper tests is what takes time!

@lucbv
Contributor

lucbv commented Sep 22, 2024

I have not looked at it in detail, but my guess is that we are assuming the triangular matrix is stored in a square matrix of size m×m. Since yours comes from the QR factorization of a rectangular matrix, we need to fix the code so that it works for an m×n input matrix; basically, we will ignore the non-square part of the input. Hopefully that should not be too bad. I will create a PR once I have confirmed that this is the issue and we have a fix for you...
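To make the idea of "ignoring the non-square part" concrete, here is a minimal sketch in plain C++. It is not the Kokkos Kernels implementation: the function name `solveUpperFromRectangular`, the row-major `std::vector` storage, and the assumption that m >= n are all made up for illustration. It solves R x = b by back substitution, reading only the upper triangle of the leading n×n block of an m×n matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Kokkos Kernels API): solve R x = b where R is the
// upper triangle of the leading n x n block of a row-major m x n matrix A.
// Assumes m >= n and a nonzero diagonal. Rows n..m-1 and all entries below
// the diagonal are never read.
std::vector<double> solveUpperFromRectangular(const std::vector<double>& A,
                                              std::size_t m, std::size_t n,
                                              const std::vector<double>& b) {
  assert(m >= n && A.size() == m * n && b.size() >= n);
  std::vector<double> x(n, 0.0);
  // Back substitution: walk the rows of R from the last one up to the first.
  for (std::size_t k = n; k-- > 0;) {
    double sum = b[k];
    for (std::size_t j = k + 1; j < n; ++j) {
      sum -= A[k * n + j] * x[j];  // entry (k, j), with row stride n
    }
    x[k] = sum / A[k * n + k];
  }
  return x;
}
```

The loop bounds depend only on n, so the extra m - n rows (which hold Householder data after a QR factorization) are skipped rather than treated as part of the triangular system. A caller would pass the full 16×10 storage with `m = 16`, `n = 10`, and a right-hand side whose first 10 entries are used.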

lucbv added a commit to lucbv/kokkos-kernels that referenced this issue Oct 8, 2024
The serial QR algorithm does not have unit tests and is failing
for non-square matrices. See issue kokkos#2328.
This first commit fixes the issue with rectangular matrices and
adds a basic test for that use case. Next, I will work on adding a
test that exercises the interfaces on multiple matrices of different
sizes within a parallel_for. Finally, equivalent tests will be added
for the square case as well.

Signed-off-by: Luc <[email protected]>
lucbv added a commit to lucbv/kokkos-kernels that referenced this issue Nov 4, 2024

Signed-off-by: Luc <[email protected]>
lucbv added a commit to lucbv/kokkos-kernels that referenced this issue Nov 6, 2024

Signed-off-by: Luc <[email protected]>
lucbv added a commit to lucbv/kokkos-kernels that referenced this issue Dec 4, 2024
Fixing unused variable error
It looks like the Left NoTranspose ApplyQ is not doing the correct
thing. Will have a look at that next.

Signed-off-by: Luc <[email protected]>
Signed-off-by: Luc Berger-Vergiat <[email protected]>