Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Belos: Fix Belos solver behavior to handle NaNs #12792

Merged
merged 2 commits into from
Mar 2, 2024

Conversation

hkthorn
Copy link
Contributor

@hkthorn hkthorn commented Mar 1, 2024

@trilinos/belos @trilinos/stratimikos

Motivation

In the event that a NaN is detected in the StatusTest residual norm classes,
an exception is thrown. This exception has not historically been handled
within the Belos solver managers and, if uncaught, will result in the exit
of any code using these solvers. In comparison, AztecOO will handle any
detected NaNs by outputting a warning, setting the solution vector to zeros,
and returning a code indicating that it ran into issues and is unconverged.
In practice, that approach is more robust in that it allows for the outer
numerical code to attempt to recover from the introduction of NaNs in the
linear solver.

This commit adds a special exception for NaNs, StatusTestNaNError, that
inherits off StatusTestError, which is thrown by any of the StatusTestResNorm
classes when a NaN is encountered. Each of the solver managers will handle
this special exception by outputting a warning, setting the solution vector
to zero, and returning Unconverged.

Stakeholder Feedback

This change will improve the robustness of the Trilinos solver stack to numerical challenges within the application codes it serves. In particular this work has been motivated by current efforts to improve the performance of Charon.

Testing

Mac OSX, GCC13

hkthorn added 2 commits March 1, 2024 12:34
The ThyraTpetraAdapters are an optional dependency, so they may not be
available.  Code that is dependent upon these adapters in Belos and
Ifpack2 cannot build without them, so it should be conditionally
compiled in.
In the event that a NaN is detected in the StatusTest residual norm classes,
an exception is thrown.  This exception has not historically been handled
within the Belos solver managers and, if uncaught, will result in the exit
of any code using these solvers. In comparison, AztecOO will handle any
detected NaNs by throwing a warning, setting the solution vector to zeros,
and returning a code indicating that it ran into issues and is unconverged.
In practice, that approach is more robust in that it allows for the outer
numerical code to attempt to recover from the introduction of NaNs in the
linear solver.

This commit adds a special exception for NaNs, StatusTestNaNError, that
inherits off StatusTestError, which is thrown by any of the StatusTestResNorm
classes when a NaN is encountered.  Each of the solver managers will handle
this special exception by outputting a warning, setting the solution vector
to zero, and returning Unconverged.
@hkthorn hkthorn self-assigned this Mar 1, 2024
@hkthorn hkthorn requested review from a team as code owners March 1, 2024 19:40
@hkthorn hkthorn requested review from jhux2, rppawlo and cgcgcg March 1, 2024 19:43
@trilinos-autotester
Copy link
Contributor

Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request.

@trilinos-autotester
Copy link
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: Trilinos_PR_gcc-8.3.0

  • Build Num: 3748
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-8.3.0-openmpi-1.10.1-openmp_release-debug_static_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12792
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any&&!sandybridge
TRILINOS_SOURCE_REPO https://github.com/hkthorn/Trilinos
TRILINOS_SOURCE_SHA 9977ba0
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a469209

Build Information

Test Name: Trilinos_PR_gcc-8.3.0-serial

  • Build Num: 2247
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-v2-gnu-8.3.0-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_no-mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12792
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_REPO https://github.com/hkthorn/Trilinos
TRILINOS_SOURCE_SHA 9977ba0
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a469209

Build Information

Test Name: Trilinos_PR_gcc-8.3.0-debug

  • Build Num: 2237
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-8.3.0-openmpi-1.10.1-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12792
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_REPO https://github.com/hkthorn/Trilinos
TRILINOS_SOURCE_SHA 9977ba0
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a469209

Build Information

Test Name: Trilinos_PR_clang-11.0.1

  • Build Num: 2236
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-clang-11.0.1-openmpi-1.10.1-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12792
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_REPO https://github.com/hkthorn/Trilinos
TRILINOS_SOURCE_SHA 9977ba0
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a469209

Build Information

Test Name: Trilinos_PR_python3

  • Build Num: 3411
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-7.2.0-anaconda3-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_no-mpi_no-pt_no-rdc_no-uvm_deprecated-on_pr-framework
PR_LABELS pkg: Belos
PULLREQUESTNUM 12792
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL ascic
TRILINOS_SOURCE_REPO https://github.com/hkthorn/Trilinos
TRILINOS_SOURCE_SHA 9977ba0
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a469209

Build Information

Test Name: Trilinos_PR_cuda-11.4.2-uvm-off

  • Build Num: 3239
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12792
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL GPU
TRILINOS_SOURCE_REPO https://github.com/hkthorn/Trilinos
TRILINOS_SOURCE_SHA 9977ba0
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a469209

Build Information

Test Name: Trilinos_PR_intel-2021.3

  • Build Num: 1878
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-intel-2021.3-sems-openmpi-4.0.5_release-debug_shared_no-kokkos-arch_no-asan_no-complex_fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-off_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12792
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_REPO https://github.com/hkthorn/Trilinos
TRILINOS_SOURCE_SHA 9977ba0
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a469209

Using Repos:

Repo: TRILINOS (hkthorn/Trilinos)
  • Branch: develop
  • SHA: 9977ba0
  • Mode: TEST_REPO

Pull Request Author: hkthorn

@trilinos-autotester
Copy link
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED (click to expand)

Build Information

Test Name: Trilinos_PR_gcc-8.3.0

  • Build Num: 3748
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-8.3.0-openmpi-1.10.1-openmp_release-debug_static_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12792
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any&&!sandybridge
TRILINOS_SOURCE_REPO https://github.com/hkthorn/Trilinos
TRILINOS_SOURCE_SHA 9977ba0
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a469209

Build Information

Test Name: Trilinos_PR_gcc-8.3.0-serial

  • Build Num: 2247
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-v2-gnu-8.3.0-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_no-mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12792
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_REPO https://github.com/hkthorn/Trilinos
TRILINOS_SOURCE_SHA 9977ba0
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a469209

Build Information

Test Name: Trilinos_PR_gcc-8.3.0-debug

  • Build Num: 2237
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-8.3.0-openmpi-1.10.1-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12792
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_REPO https://github.com/hkthorn/Trilinos
TRILINOS_SOURCE_SHA 9977ba0
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a469209

Build Information

Test Name: Trilinos_PR_clang-11.0.1

  • Build Num: 2236
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-clang-11.0.1-openmpi-1.10.1-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12792
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_REPO https://github.com/hkthorn/Trilinos
TRILINOS_SOURCE_SHA 9977ba0
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a469209

Build Information

Test Name: Trilinos_PR_python3

  • Build Num: 3411
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-7.2.0-anaconda3-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_no-mpi_no-pt_no-rdc_no-uvm_deprecated-on_pr-framework
PR_LABELS pkg: Belos
PULLREQUESTNUM 12792
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL ascic
TRILINOS_SOURCE_REPO https://github.com/hkthorn/Trilinos
TRILINOS_SOURCE_SHA 9977ba0
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a469209

Build Information

Test Name: Trilinos_PR_cuda-11.4.2-uvm-off

  • Build Num: 3239
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12792
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL GPU
TRILINOS_SOURCE_REPO https://github.com/hkthorn/Trilinos
TRILINOS_SOURCE_SHA 9977ba0
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a469209

Build Information

Test Name: Trilinos_PR_intel-2021.3

  • Build Num: 1878
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-intel-2021.3-sems-openmpi-4.0.5_release-debug_shared_no-kokkos-arch_no-asan_no-complex_fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-off_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12792
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_REPO https://github.com/hkthorn/Trilinos
TRILINOS_SOURCE_SHA 9977ba0
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA a469209


CDash Test Results for PR# 12792.

@trilinos-autotester
Copy link
Contributor

Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
NO REVIEWS HAVE BEEN PERFORMED ON THIS PULL REQUEST!

@trilinos-autotester
Copy link
Contributor

All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur...

@hkthorn hkthorn changed the title Fix Belos solver behavior to handle NaNs Belos: Fix Belos solver behavior to handle NaNs Mar 1, 2024
@trilinos-autotester
Copy link
Contributor

Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ csiefer2 ]!

@trilinos-autotester
Copy link
Contributor

Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR...

@hkthorn hkthorn merged commit 2c85546 into trilinos:develop Mar 2, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants