\section{Assessing Uncertainty in Enhanced Sampling Simulations}
\label{sec:enhanced}
While recent advances in computational hardware have allowed MD simulations of systems with biological relevance to routinely reach timescales ranging from hundreds of \si{\nano\second} to \si{\micro\second}, in many cases this is still not long enough to obtain equilibrated (i.e., Boltzmann-weighted) structural populations. Intrinsic timescales of the systems may be much longer.
Enhanced sampling methods can be used to obtain well-converged ensembles faster than conventional MD. In general, such methods modify the underlying energy landscape and/or thermodynamic parameters to increase the rate at which energy barriers are crossed, combined with some form of reweighting to recover the unbiased ensemble \cite{Zuckerman2011}. However, these methods do not guarantee a converged ensemble, and care must be taken when using and evaluating them.
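As a simple illustration of the reweighting step, for the case of a time-independent bias potential $\Delta U(x)$ added to the physical potential, the unbiased average of an observable $A$ can be recovered from the biased ensemble via the standard reweighting identity
\begin{equation}
\langle A \rangle = \frac{\left\langle A(x)\, e^{\beta \Delta U(x)} \right\rangle_{b}}{\left\langle e^{\beta \Delta U(x)} \right\rangle_{b}} ,
\end{equation}
where $\beta = 1/k_B T$ and $\langle \cdots \rangle_b$ denotes an average over the biased ensemble; individual methods differ in how the bias is constructed and how the reweighting is carried out in practice.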
Generally speaking, uncertainty analysis is more challenging for data generated by an enhanced sampling method.
Enhanced equilibrium sampling methods, which include replica exchange and its variants \cite{Swendsen-1986,Sugita1999,Okamoto-2000}, local elevation \cite{Huber1994}, conformational flooding \cite{Grubmueller1995}, metadynamics \cite{Bussi2006a,Laio2008}, and the adaptive biasing force method \cite{Darve2001,Darve2008,Comer2015}, to name a few, are complex, and the resulting data may have a highly non-trivial correlation structure.
In replica exchange, for example, the ensemble at a temperature of interest will be based on multiple return visits of different sequentially correlated trajectories.
Before performing an enhanced-sampling simulation, consider carefully whether the technique is needed, and consult the literature for best practices in setting up a simulation.
Even a straightforward MD simulation requires considerable planning, and the complexity is much greater for enhanced techniques.
Given the subtleties of these sampling approaches, when possible, consider taking a ``bottom line'' approach and assessing sampling based on multiple independent runs.
If the method is unbiased, the variance among these runs helps to quantify the overall sampling error.
Note that methods applicable to global assessment of multiple trajectories (Sec.\ \ref{sec:globalMultiTraj}) should be valid for analyzing multiple runs of an arbitrary method.
However, one caveat for the approach of Zhang et al.\ \cite{Zhang2010} is that some dynamical trajectory segments are required to construct states by kinetic clustering.
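As a minimal illustration (not specific to any particular enhanced sampling method), the following Python sketch computes the mean and standard uncertainty of an observable from a set of per-run estimates; the numerical values are hypothetical.
\begin{verbatim}
import numpy as np

# Hypothetical per-run estimates of an observable (e.g., a state
# population), one value from each independent enhanced-sampling run.
run_estimates = np.array([0.62, 0.55, 0.71, 0.58, 0.66])

n = len(run_estimates)
mean = run_estimates.mean()
# Standard uncertainty of the mean: sample standard deviation
# (ddof=1) divided by sqrt(n).
std_unc = run_estimates.std(ddof=1) / np.sqrt(n)
print(f"estimate = {mean:.3f} +/- {std_unc:.3f}")
\end{verbatim}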
\subsection{Replica Exchange Molecular Dynamics}
One of the most popular enhanced sampling methods is replica exchange MD (REMD) \citep{Sugita1999}; see also \cite{Swendsen-1986}. Broadly speaking, REMD consists of running parallel MD simulations on a number of non-interacting replicas of a system, each with a different Hamiltonian and/or thermodynamic parameters (e.g., temperature), and periodically exchanging system coordinates between replicas according to a Metropolis criterion which maintains Boltzmann-factor sampling for all replicas.
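For the common case of temperature REMD, exchanging configurations $x_i$ and $x_j$ between replicas at inverse temperatures $\beta_i$ and $\beta_j$ is accepted with probability
\begin{equation}
P_{\mathrm{acc}} = \min\left\{ 1,\; e^{(\beta_i - \beta_j)\left[ U(x_i) - U(x_j) \right]} \right\} ,
\end{equation}
where $U$ is the potential energy; Hamiltonian REMD variants employ analogous criteria.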
In order to assess the results of a REMD simulation, it is important to consider not just the overall convergence of the simulation to the correct Boltzmann-weighted ensemble of structures (via e.g., combined clustering, see Sec.~\ref{sec:quick}), but how efficiently the REMD simulation is doing so. These concepts are termed ``thermodynamic efficiency'' and ``mixing efficiency'' by Abraham and Gready \citep{Abraham2008}, and it is quite possible to achieve one without the other; both must be assessed. In order for sampling to be efficient, coordinates must be able to move freely in replica space.
In practical settings, several metrics are often used to assess these two efficiencies, a few of which we list below. In these definitions, note that we refer to both ``coordinate trajectories'' and ``replica trajectories''. A ``coordinate trajectory'' follows an individual system's continuous trajectory as it traverses replica space (e.g., a system experiencing multiple temperatures as it is exchanged during a temperature REMD simulation). A ``replica trajectory'' is the sequence of configurations corresponding to a single replica under fixed Hamiltonian and thermodynamic conditions (e.g., all structures at a temperature of 300 K in a temperature REMD simulation). Thus, a replica trajectory consists of concatenated coordinate-trajectory segments and \textit{vice versa}.
Below are several checks that should be applied to REMD simulation data.
\begin{itemize}
\item Exchange acceptance. The exchange acceptance rate (i.e., the number of accepted exchanges divided by the number of exchange attempts) between neighboring replicas should be roughly equal across replica pairs and close to the target acceptance rate. A low exchange acceptance rate between a pair of neighboring replicas, relative to the average, creates a bottleneck in replica space, which in turn can lead to poor sampling of the overall configuration space. In such cases, the replica spacing may need to be decreased or additional replicas used. Conversely, an exchange acceptance rate between neighboring replicas that is high relative to the average may indicate that more resources than necessary are being used, and that good sampling could be achieved with fewer replicas or larger replica spacing.
\item Replica round trips. The time taken for a coordinate trajectory to travel from the lowest replica to the highest and back is called the replica ``round trip'' time. Over the course of a REMD simulation, any given coordinate trajectory should make multiple round trips. The rationale is that every replica should contribute to enhancing the sampling of every set of starting coordinates.
One can examine the average, minimum, and maximum round trip times among the coordinate trajectories; these statistics should be broadly similar from one coordinate trajectory to another. See e.g., Fig.\ 6 in Ref.\ \citep{Roe2014}.
If they are not, the likely cause is one or more bottlenecks in replica space, which can be identified by a relatively low exchange acceptance rate (see the previous bullet point). A minimal sketch of this analysis appears after this list.
\item Replica residence time. The time a coordinate trajectory spends at a given replica is called the ``replica residence time''. For replica sampling to be efficient, the residence time of each set of starting coordinates at each replica should be roughly equal. If it is not (i.e., if a set of starting coordinates spends a much larger amount of time at certain replicas than the overall average), this can also indicate one or more bottlenecks in replica space. An example is shown in Fig.\ 7 of Ref.\ \citep{Roe2014}; see also the sketch after this list.
\item Distributions of quantities calculated from coordinate trajectories. If all coordinates are moving freely in replica space, they should eventually converge to the same ensemble of structures. Comparing distributions of various quantities from coordinate trajectories can provide a measure of how converged the simulation is. For example, one can compare the distribution of RMSD values of coordinate trajectories to a common reference structure; see e.g., Fig.\ 8 in Ref.\ \citep{Henriksen2013}. Poor overlap can be an indication that replica efficiency is poor or the simulation is not yet converged.
\end{itemize}
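As a minimal sketch of the round-trip and residence-time checks above, the Python function below assumes one has extracted, for a single coordinate trajectory, the index of the replica it occupies at each exchange step (a hypothetical input format; CPPTRAJ, mentioned below, provides equivalent analyses directly).
\begin{verbatim}
import numpy as np

def round_trips_and_residence(replica_index):
    """Round-trip times and per-replica residence times for one
    coordinate trajectory, given its replica index at each
    exchange step."""
    replica_index = np.asarray(replica_index)
    lo, hi = replica_index.min(), replica_index.max()

    # A round trip runs from the lowest replica up to the highest
    # and back down to the lowest.
    round_trips, start, seen_top = [], None, False
    for t, r in enumerate(replica_index):
        if r == lo:
            if seen_top and start is not None:
                round_trips.append(t - start)
                seen_top = False
            start = t
        elif r == hi:
            seen_top = True

    # Residence time: number of steps spent at each replica.
    residence = {r: int((replica_index == r).sum())
                 for r in range(lo, hi + 1)}
    return round_trips, residence
\end{verbatim}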
All of the above quantities (replica residence times, round trip times, lifetimes, etc.) can be calculated with CPPTRAJ \citep{Roe2013}, which is freely available from \url{https://github.com/Amber-MD/cpptraj} or as part of AmberTools (\url{http://ambermd.org}).
It may also be useful to perform multiple REMD runs. The standard uncertainty among runs can quantify the overall uncertainty and provide the basis for a confidence interval with an appropriate coverage factor (see definitions in Sec.\ \ref{sec:scope}). If the ensembles produced depend significantly on the set of starting configurations, that is a sign of incomplete sampling.
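Concretely, if $n$ independent runs yield estimates of an observable with mean $\bar{O}$ and sample standard deviation $s$, a confidence interval takes the form $\bar{O} \pm k\, s/\sqrt{n}$, where the coverage factor $k$ may be taken from the Student $t$ distribution with $n-1$ degrees of freedom at the desired confidence level.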
\subsection{Weighted Ensemble Simulations}
The weighted ensemble (WE) method orchestrates an ensemble of trajectories that are intermittently pruned or replicated in order to enhance sampling of difficult-to-access regions of configuration space \cite{Huber-1996}.
The final set of trajectories can be visualized as a tree structure based on the occasional replication and pruning events.
WE is an unbiased method that can be used to sample rare transient behavior \cite{Zhang2010a} as well as steady states \cite{Bhatt2010a} including equilibrium \cite{Suarez2014}.
Like other enhanced sampling methods, WE's tree of trajectories has a complex correlation structure requiring care for uncertainty analysis.
It is important to understand the basic theory and limitations of the WE method, as is discussed in a
\href{https://westpa.github.io/westpa/overview.html}{WE overview document}.
From a practical standpoint, the safest way to assess uncertainty in WE simulations is to run multiple instances (which can be seeded from identical or different starting structures depending on the desired calculation) from which a variance and standard uncertainty in any observable can be calculated.
Note particularly that WE tracks the time evolution of observables as the system relaxes (perhaps quite slowly) to equilibrium or another steady state \cite{Zhang2010a}; hence, the variance computed in an observable from multiple runs should be based on values at the same time point.
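A minimal sketch of this time-resolved analysis, assuming the observable from each WE run has been collected into a runs-by-timepoints array (the file name and layout are hypothetical):
\begin{verbatim}
import numpy as np

# Hypothetical layout: obs[i, t] is the observable from WE run i at
# time point t; all runs must share the same time points.
obs = np.loadtxt("we_observable_by_run.dat")  # shape (n_runs, n_times)

n_runs = obs.shape[0]
mean_t = obs.mean(axis=0)  # mean over runs at each time point
# Standard uncertainty at each time point from run-to-run variance.
unc_t = obs.std(axis=0, ddof=1) / np.sqrt(n_runs)
\end{verbatim}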
When it is necessary to estimate uncertainty based on a single WE run, the user should treat the (ensemble-weighted) value of an observable measured over time much like an observable in a standard single MD simulation; this is because the correlations in ensemble averages are sequential in time.
First, as discussed in Sec.\ \ref{sec:quick}, the time trace of the observable should be inspected for relaxation to a nearly constant value about which fluctuations occur.
A transient/equilibration period should be removed in analogy to MD (see Sec.\ \ref{sec:equil}), and then best practices for single-observable uncertainties should be followed, as described in Sec.\ \ref{sec:specific}.
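For instance, a simple block-averaging estimate of the standard uncertainty from the post-transient portion of a single WE time trace might look as follows (a sketch; \texttt{x} is the ensemble-weighted observable with the equilibration period already removed, and blocks must be longer than the correlation time for the estimate to be reliable).
\begin{verbatim}
import numpy as np

def block_average_uncertainty(x, n_blocks=10):
    """Standard uncertainty of the mean of a correlated time
    series, estimated from the scatter among contiguous blocks."""
    x = np.asarray(x)
    usable = (len(x) // n_blocks) * n_blocks  # trim the remainder
    block_means = x[:usable].reshape(n_blocks, -1).mean(axis=1)
    return block_means.std(ddof=1) / np.sqrt(n_blocks)
\end{verbatim}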
Despite this rather neat analogy to conventional MD, experience has shown that run-to-run variance in WE simulations of challenging systems can be large, so multiple runs are advised. In the future, variance-reduction techniques may alleviate the need for multiple runs.