Commit
Intrepid2: Optimizations for pullbacks (among others) (#12607)
@trilinos/intrepid2

## Motivation

When moving from Intrepid to Intrepid2, Sierra noticed substantial performance degradation in one of their performance tests. The most egregious offender was `HGRADtransformGRAD()`, which ultimately calls `ArrayTools::Internal::matvecProduct()`. The functor called by the latter involved Kokkos subviews and a fair amount of run-time branching. This PR reimplements that functor to eliminate both the subviews and the run-time branching in favor of compile-time template specializations of the functor, combined with logic that selects the appropriate specialization before the parallel kernel is launched.

In our testing with parameters intended to reproduce Sierra's performance tests, this results in a roughly 3x speedup, and the new implementation outperforms the Intrepid implementation on a serial CPU by about 10%. (On parallel runs, we can expect larger performance gains, since Intrepid is a purely serial implementation.)

`matvecProduct()`, which is optimized here, is called by many methods in Intrepid2, including most of the pullbacks. We have not studied the performance of pullbacks beyond `HGRADtransformGRAD()` (which is an alias for `HCURLtransformVALUE()`), but given the nature of the optimizations, we do expect that all of these will see substantially improved performance.

This commit also includes a fix for an issue in which some Intrepid2 tests could not be built with Apple clang, because clang does not have an implementation of `std::beta()`.
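For readers unfamiliar with the pattern, the following is a minimal sketch of the general idea described above: a functor whose dimension and transpose flag are template parameters, plus dispatch logic that picks the specialization once, before the loop over points. It is not the Intrepid2 code. The names `MatVecFunctor`, `launch`, and `matvecProductSketch` are hypothetical, flat `std::vector` storage stands in for Kokkos views, and a plain serial loop stands in for the parallel kernel launch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical functor: for each point p, computes y = A * x or y = A^T * x.
// Dim and Transpose are compile-time parameters, so the inner loops have
// fixed bounds and the transpose choice is resolved by the compiler.
template <int Dim, bool Transpose>
struct MatVecFunctor {
  const double* mat;  // numPoints * Dim * Dim entries, row-major per point
  const double* vec;  // numPoints * Dim entries
  double* out;        // numPoints * Dim entries

  void operator()(std::size_t p) const {
    const double* A = mat + p * Dim * Dim;
    const double* x = vec + p * Dim;
    double* y = out + p * Dim;
    for (int i = 0; i < Dim; ++i) {
      double sum = 0.0;
      for (int j = 0; j < Dim; ++j) {
        sum += (Transpose ? A[j * Dim + i] : A[i * Dim + j]) * x[j];
      }
      y[i] = sum;
    }
  }
};

// Stand-in for a parallel kernel launch (assumption: serial loop here).
template <int Dim, bool Transpose>
void launch(const double* mat, const double* vec, double* out,
            std::size_t numPoints) {
  MatVecFunctor<Dim, Transpose> f{mat, vec, out};
  for (std::size_t p = 0; p < numPoints; ++p) f(p);
}

// The transpose flag is branched on once, outside the per-point loop.
template <int Dim>
void launchForDim(const double* mat, const double* vec, double* out,
                  std::size_t numPoints, bool transpose) {
  if (transpose) launch<Dim, true>(mat, vec, out, numPoints);
  else           launch<Dim, false>(mat, vec, out, numPoints);
}

// Run-time entry point: selects the specialization before the "launch".
std::vector<double> matvecProductSketch(const std::vector<double>& mat,
                                        const std::vector<double>& vec,
                                        int dim, bool transpose) {
  if (dim < 1 || dim > 3) throw std::invalid_argument("unsupported dimension");
  const std::size_t d = static_cast<std::size_t>(dim);
  assert(vec.size() % d == 0);
  const std::size_t numPoints = vec.size() / d;
  assert(mat.size() == numPoints * d * d);

  std::vector<double> out(vec.size());
  switch (dim) {
    case 1: launchForDim<1>(mat.data(), vec.data(), out.data(), numPoints, transpose); break;
    case 2: launchForDim<2>(mat.data(), vec.data(), out.data(), numPoints, transpose); break;
    default: launchForDim<3>(mat.data(), vec.data(), out.data(), numPoints, transpose); break;
  }
  return out;
}
```

In this sketch, calling `matvecProductSketch(mat, vec, 2, true)` runs the `MatVecFunctor<2, true>` specialization for every point. The run-time values `dim` and `transpose` are inspected only in the dispatch code, never inside the per-point loop.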