Belos: Belos_CustomSolverFactory failure, possibly due to failure to properly destroy memory; clang/10 serial build with c++17 and debugging #11137

Closed
ndellingwood opened this issue Oct 12, 2022 · 3 comments
Labels
pkg: Belos type: bug The primary issue is a bug in Trilinos code or tests

Comments

@ndellingwood

Bug Report

@trilinos/belos

Description

In a serial (no MPI) build of Trilinos with clang/10, C++17 support, and debugging enabled, the Belos_CustomSolverFactory test fails even though both of its unit tests pass. The failure occurs after the main routine completes and appears to be caused by created objects not being destroyed (possibly a static object not being deleted?).

Failure output:

780: Test command: /ascldap/users/ndellin/trilinos/Trilinos-pristine/Build/Blake-clang10-serial/packages/belos/tpetra/test/LinearSolverFactory/Belos_CustomSolverFactory.exe
780: Test timeout computed to be: 1500
780: Teuchos::GlobalMPISession::GlobalMPISession(): started serial run
780:
780: ***
780: *** Unit test suite ...
780: ***
780:
780:
780: Sorting tests by group name then by the order they were added ... (time = 2.6e-05)
780:
780: Running unit tests ...
780:
780: 0. CustomSolverFactory_double_int_longlong_Kokkos_Compat_KokkosSerialWrapperNode_AddFactory_UnitTest ... [Passed] (0.00176 sec)
780: 1. CustomSolverFactory_std_complex0double0_int_longlong_Kokkos_Compat_KokkosSerialWrapperNode_AddFactory_UnitTest ... [Passed] (0.000462 sec)
780:
780: Total Time: 0.00239 sec
780:
780: Summary: total = 2, run = 2, passed = 2, failed = 0
780:
780: End Result: TEST PASSED
780:
780: ***
780: *** Warning! The following Teuchos::RCPNode objects were created but have
780: *** not been destroyed yet.  A memory checking tool may complain that these
780: *** objects are not destroyed correctly.
780: ***
780: *** There can be many possible reasons that this might occur including:
780: ***
780: ***   a) The program called abort() or exit() before main() was finished.
780: ***      All of the objects that would have been freed through destructors
780: ***      are not freed but some compilers (e.g. GCC) will still call the
780: ***      destructors on static objects (which is what causes this message
780: ***      to be printed).
780: ***
780: ***   b) The program is using raw new/delete to manage some objects and
780: ***      delete was not called correctly and the objects not deleted hold
780: ***      other objects through reference-counted pointers.
780: ***
780: ***   c) This may be an indication that these objects may be involved in
780: ***      a circular dependency of reference-counted managed objects.
780: ***
780:
780:   0: RCPNode (map_key_void_ptr=0x302ffa0)
780:        Information = {T=(anonymous namespace)::FooSolverFactory<double, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >, Tpetra::Operator<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > >, ConcreteT=(anonymous namespace)::FooSolverFactory<double, Tpetra::MultiVector<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >, Tpetra::Operator<double, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > >, p=0x302ffa0, has_ownership=1}
780:        RCPNode address = 0x302b9b0
780:        insertionNumber = 120
780:   1: RCPNode (map_key_void_ptr=0x30330b0)
780:        Information = {T=(anonymous namespace)::FooSolverFactory<std::complex<double>, Tpetra::MultiVector<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >, Tpetra::Operator<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > >, ConcreteT=(anonymous namespace)::FooSolverFactory<std::complex<double>, Tpetra::MultiVector<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >, Tpetra::Operator<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > >, p=0x30330b0, has_ownership=1}
780:        RCPNode address = 0x3034a70
780:        insertionNumber = 144

gdb backtrace:

(gdb) bt
#0  0x00007ffff5945387 in raise () from /lib64/libc.so.6
#1  0x00007ffff5946a78 in abort () from /lib64/libc.so.6
#2  0x00007ffff5f96efc in __gnu_cxx::__verbose_terminate_handler () at ../../.././libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007ffff5fa2206 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:48
#4  0x00007ffff5fa2271 in std::terminate () at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:58
#5  0x0000000000420a4f in __clang_call_terminate ()
#6  0x000000000042098b in Teuchos::RCPNodeHandle::~RCPNodeHandle (this=0x1995f58)
    at /ascldap/users/ndellin/trilinos/Trilinos-pristine/packages/teuchos/core/src/Teuchos_RCPNode.hpp:874
#7  0x000000000045206e in Teuchos::RCP<Belos::CustomSolverFactory<std::complex<double>, Tpetra::MultiVector<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >, Tpetra::Operator<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > > >::~RCP (this=0x1995f50)
    at /ascldap/users/ndellin/trilinos/Trilinos-pristine/packages/teuchos/core/src/Teuchos_RCP.hpp:305
#8  0x0000000000462585 in std::_Destroy<Teuchos::RCP<Belos::CustomSolverFactory<std::complex<double>, Tpetra::MultiVector<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >, Tpetra::Operator<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > > > > (__pointer=0x1995f50)
    at /home/projects/x86-64/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/gcc-10.2.0-wvxdxxf7kjnswvpumksnco4adgvdldxf/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/stl_construct.h:140
#9  0x000000000046254f in std::_Destroy_aux<false>::__destroy<Teuchos::RCP<Belos::CustomSolverFactory<std::complex<double>, Tpetra::MultiVector<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >, Tpetra::Operator<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > > >*> (__first=0x1995f50, __last=0x1995f68)
    at /home/projects/x86-64/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/gcc-10.2.0-wvxdxxf7kjnswvpumksnco4adgvdldxf/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/stl_construct.h:152
#10 0x000000000046250d in std::_Destroy<Teuchos::RCP<Belos::CustomSolverFactory<std::complex<double>, Tpetra::MultiVector<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >, Tpetra::Operator<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > > >*> (__first=0x1995f50, __last=0x1995f68)
    at /home/projects/x86-64/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/gcc-10.2.0-wvxdxxf7kjnswvpumksnco4adgvdldxf/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/stl_construct.h:184
#11 0x0000000000462161 in std::_Destroy<Teuchos::RCP<Belos::CustomSolverFactory<std::complex<double>, Tpetra::MultiVector<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >, Tpetra::Operator<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > > >*, Teuchos::RCP<Belos::CustomSolverFactory<std::complex<double>, Tpetra::MultiVector<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >, Tpetra::Operator<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > > > > (__first=0x1995f50, __last=0x1995f68)
    at /home/projects/x86-64/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/gcc-10.2.0-wvxdxxf7kjnswvpumksnco4adgvdldxf/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/alloc_traits.h:738
#12 0x000000000042057b in std::vector<Teuchos::RCP<Belos::CustomSolverFactory<std::complex<double>, Tpetra::MultiVector<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >, Tpetra::Operator<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > > >, std::allocator<Teuchos::RCP<Belos::CustomSolverFactory<std::complex<double>, Tpetra::MultiVector<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >, Tpetra::Operator<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > > > > >::~vector (
    this=0x1918c88 <Belos::Impl::SolverFactoryParent<std::complex<double>, Tpetra::MultiVector<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >, Tpetra::Operator<std::complex<double>, int, long long, Kokkos::Compat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > >::factories_>)
    at /home/projects/x86-64/spack/opt/spack/linux-rhel7-x86_64/gcc-4.8.5/gcc-10.2.0-wvxdxxf7kjnswvpumksnco4adgvdldxf/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/bits/stl_vector.h:680
#13 0x00007ffff5948ce9 in __run_exit_handlers () from /lib64/libc.so.6
#14 0x00007ffff5948d37 in exit () from /lib64/libc.so.6
#15 0x00007ffff593155c in __libc_start_main () from /lib64/libc.so.6
#16 0x00000000004165b3 in _start ()
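
For readers unfamiliar with this class of failure: frames #12 to #14 show the abort being raised while exit() runs the destructor of the static factories_ vector, that is, after main() has returned and the unit tests have already reported success. The following is a minimal sketch of that situation and of one general mitigation (emptying a static registry explicitly while the program is still inside main()). It uses std::shared_ptr instead of Teuchos::RCP, and DemoFactory and DemoRegistry are hypothetical names that do not exist in Belos. It is not a description of the change made in the pull request that resolved this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a user-defined solver factory.
// It counts live instances so that destruction can be observed.
struct DemoFactory {
  static int& liveCount() {
    static int count = 0;  // constructed on first use
    return count;
  }
  DemoFactory() { ++liveCount(); }
  ~DemoFactory() { --liveCount(); }
};

// Hypothetical registry with static storage duration, loosely modeled on
// the static factories_ vector that appears in the backtrace.
class DemoRegistry {
public:
  static void add(std::shared_ptr<DemoFactory> f) {
    entries().push_back(std::move(f));
  }

  // Release every registered factory now, instead of leaving the work
  // to the static destructor that runs inside exit().
  static void clear() { entries().clear(); }

  static std::size_t size() { return entries().size(); }

private:
  static std::vector<std::shared_ptr<DemoFactory>>& entries() {
    // Function-local static: constructed on first call and destroyed
    // during exit(), after main() has returned.
    static std::vector<std::shared_ptr<DemoFactory>> v;
    return v;
  }
};
```

If DemoRegistry::clear() is never called, the two factories in a test like this one would only be destroyed during static destruction, at which point any other static state their destructors rely on may already be gone. Calling clear() before main() returns moves that destruction to a point where the rest of the program is still intact.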

Steps to Reproduce

  1. SHA1: a76c1c4
  2. Configure script: Blake testbed
module load cmake/3.22.2 clang/10.0.1 openblas/0.3.13/gcc/10.2.0

cmake \
 -D CMAKE_INSTALL_PREFIX="${TRILINOS_INSTALL_DIR}" \
 -D CMAKE_CXX_STANDARD="17" \
 -D CMAKE_CXX_FLAGS="-g" \
 -D CMAKE_BUILD_TYPE=DEBUG \
\
 -D TPL_ENABLE_MPI=OFF \
\
 -D TPL_ENABLE_BLAS:STRING=ON \
  -D BLAS_LIBRARY_DIRS:FILEPATH=${BLAS_ROOT}/lib \
  -D BLAS_LIBRARY_NAMES:STRING="openblas" \
 -D TPL_ENABLE_LAPACK:STRING=ON \
  -D LAPACK_INCLUDE_DIRS:FILEPATH="${LAPACK_ROOT}/include" \
  -D LAPACK_LIBRARY_DIRS:FILEPATH=${LAPACK_ROOT}/lib \
  -D LAPACK_LIBRARY_NAMES:STRING="openblas" \
\
 -D Trilinos_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_EXAMPLES=ON \
 -D Trilinos_ENABLE_COMPLEX=ON \
\
 -D Trilinos_ENABLE_Ifpack2=ON \
  -D Ifpack2_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Amesos2=ON \
  -D Amesos2_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Kokkos=ON \
 -D Kokkos_ENABLE_SERIAL=ON \
 -D Kokkos_ARCH_SKX=ON \
  -D Kokkos_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_KokkosKernels=ON \
  -D KokkosKernels_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Tpetra=ON \
  -D Tpetra_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Sacado=ON \
  -D Sacado_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Stokhos=ON \
  -D Stokhos_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Zoltan2=ON \
  -D Zoltan2_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Intrepid2=ON \
  -D Intrepid2_ENABLE_TESTS=OFF \
 -D Trilinos_ENABLE_Belos=ON \
  -D Belos_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Anasazi=ON \
  -D Anasazi_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Teuchos=ON \
  -D Teuchos_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_MueLu=ON \
  -D MueLu_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Panzer=ON \
  -D Panzer_ENABLE_TESTS=ON \
 -D Trilinos_ENABLE_Phalanx=ON \
  -D Phalanx_ENABLE_TESTS=ON \
$TRILINOS_DIR

Some of the packages can be disabled; this is just a copy+paste of the full configure line used.

@ndellingwood ndellingwood added type: bug The primary issue is a bug in Trilinos code or tests pkg: Belos labels Oct 12, 2022
@github-actions

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

@github-actions github-actions bot added the MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. label Oct 14, 2023
@hkthorn hkthorn removed the MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. label Oct 16, 2023
@hkthorn

hkthorn commented Oct 16, 2023

Found this issue locally while developing Belos.

@hkthorn

hkthorn commented Oct 17, 2023

This issue was resolved by #12406. Marking resolved.

@hkthorn hkthorn closed this as completed Oct 17, 2023