QUDA v1.0.0
Version 1.0.0 - 10 January 2020
- Added support for CUDA 10.2: QUDA 1.0.0 is supported on CUDA 7.5-10.2
  using either GCC or clang compilers. CUDA 10.x and either GCC >= 6.x
  or clang >= 6.x are highly recommended.
- Significant improvements to the CMake build system and removal of the
  legacy configure build.
- Added more targeted compilation options to constrain which precisions
  and reconstruct types are compiled. QUDA_PRECISION is a cmake
  parameter holding a 4-bit mask of the enabled precisions, with
  1 = quarter, 2 = half, 4 = single and 8 = double; the default is 14
  (8 + 4 + 2), which enables double, single and half precision.
  QUDA_RECONSTRUCT is a 3-bit mask of the enabled reconstruct types,
  with 1 = reconstruct-8/9, 2 = reconstruct-12/13 and
  4 = reconstruct-18; the default is 7, which enables all reconstruct
  types.
- Completely rewrote all dslash kernels using the accessor framework.
  This dramatically reduces code complexity and improves performance.
- New physics functionality added: gauge Laplace kernel, Gaussian quark
  smearing, topological charge density.
- QUDA can now be built to use either texture-memory reads or direct
  memory access (cmake option QUDA_TEX). Textures are enabled by
  default, though we note that since Pascal it can be advantageous to
  disable textures and use direct reads.
- QUDA is no longer supported on the Fermi generation of GPUs (sm_20
  and sm_21). Compiling and running should still be possible, but
  requires building with texture objects disabled.
- Added support for quarter precision (QUDA_QUARTER_PRECISION) for the
  linear operator and associated solvers.
- Implemented both CA-CG and CA-GCR communication-avoiding solvers, for
  use either as stand-alone solvers or as a means to accelerate
  multigrid.
- Continued evolution and optimization of the multigrid framework. Even
  so, we advise users to use the latest develop branch when using
  multigrid, since it continues to be a fast-moving target with
  continual focus on optimization and improvement.
- Added an implementation of the Thick Restarted Lanczos Method (TRLM)
  for computing eigenvectors of the normal operator.
- Lanczos-accelerated multigrid through the use of coarse-grid deflation
  and/or singular vectors to define the prolongator.
- Removal of the legacy contraction and covariant derivative algorithms,
  and replacement with accessor-based rewrites.
- Improved heavy-quark residual convergence, which ensures correct
  convergence for MILC heavy-quark observables.
- Experimental support for Just-In-Time (JIT) compilation using Jitify.
- Significantly improved unit testing framework using ctest.
- QUDA can now be built to target Google's address sanitizer (by setting
  CMAKE_BUILD_TYPE to SANITIZE) for improved debugging.
- QUDA can now download and install the USQCD libraries QMP and QIO
  automatically as part of the compilation process. To enable this,
  set the option QUDA_DOWNLOAD_USQCD=ON. As with the Eigen
  installation, this requires internet access.
- QUDA can now download and install the ARPACK library automatically
  if the QUDA_DOWNLOAD_ARPACK option is enabled.
- Updated to CUB 1.8.
- Multiple bug fixes and cleanup across the library. Many of these are
  listed here: https://github.com/lattice/quda/milestone/21?closed=1
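
The QUDA_PRECISION and QUDA_RECONSTRUCT values described above are sums of per-option bit values, so it can help to see how a given number decodes. The following is a minimal sketch in Python, not part of QUDA: the helper names `decode_mask` and `encode_mask` and the two lookup tables are hypothetical, and only the bit values themselves are taken from these notes.

```python
# Bit values as stated in the release notes; the table names are illustrative.
PRECISION_BITS = {1: "quarter", 2: "half", 4: "single", 8: "double"}
RECONSTRUCT_BITS = {
    1: "reconstruct-8/9",
    2: "reconstruct-12/13",
    4: "reconstruct-18",
}


def decode_mask(value, bits):
    """Return the option names whose bit is set in `value` (hypothetical helper)."""
    full = sum(bits)  # sum of the dict keys, i.e. the mask with every bit set
    if not 0 <= value <= full:
        raise ValueError(f"mask {value} is outside 0..{full}")
    return [name for bit, name in sorted(bits.items()) if value & bit]


def encode_mask(names, bits):
    """Return the integer mask that enables exactly `names` (hypothetical helper)."""
    lookup = {name: bit for bit, name in bits.items()}
    mask = 0
    for name in names:
        mask |= lookup[name]  # raises KeyError for an unknown option name
    return mask


if __name__ == "__main__":
    print(decode_mask(14, PRECISION_BITS))   # default QUDA_PRECISION
    print(decode_mask(7, RECONSTRUCT_BITS))  # default QUDA_RECONSTRUCT
    print(encode_mask(["single", "double"], PRECISION_BITS))
```

Under these assumptions, the default 14 decodes to half, single and double, and a build restricted to single and double precision would correspond to the value 12, which would be passed to cmake as the QUDA_PRECISION parameter.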