
Quantitative comparison between autocorrelation function and block averaging #53

Open
ocmadin opened this issue Sep 6, 2018 · 16 comments


@ocmadin

ocmadin commented Sep 6, 2018

In section 7.3, there is a discussion of autocorrelation analysis and block averaging as methods for estimating the number of independent samples, but the discussion does not make any recommendation about which method is preferable in specific cases.

Have there been any studies that quantitatively compare these two measures? For example, testing the minimum number of samples below which each method becomes unreliable, or whether the extra information used in a block-averaging scheme makes a difference to the uncertainty estimate. We are working on a best-practices document for property calculation from MD and are interested in the effect of the choice of method.

@mrshirts

mrshirts commented Sep 6, 2018

I'll just echo this (Owen is a student in my lab)! We are willing to do some tests, but want to make sure that we aren't duplicating work that other people have done previously!

@agrossfield
Collaborator

agrossfield commented Sep 6, 2018 via email

@dmzuckerman
Owner

I'm not aware of a head-to-head comparison, but perhaps @mangiapasta may know. I would be quite interested in your results if you obtain any.

One advantage of block-averaging is that it gives you an uncertainty estimate directly, without the need to separately compute the number of samples.

@SeroNISTPI
Collaborator

I don't know offhand if there has been a comparison. I have a suspicion that under certain assumptions (e.g. finite correlation time) there may be simple relationships between estimates of the variances computed from each method, but don't quote me on that.

@dwsideriusNIST
Collaborator

dwsideriusNIST commented Sep 6, 2018 via email

@ajschult
Collaborator

ajschult commented Sep 6, 2018

We tried computing uncertainties using the autocorrelation function recently and found the results to be quite poor in comparison to block averaging, especially when the autocorrelation fluctuates out to long times where it cannot be computed precisely. Block averaging also has more trouble in such situations than when the correlations decay quickly, but not nearly as much as the autocorrelation approach.
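A minimal sketch of the ACF-based approach under discussion, illustrating why the noisy tail must be truncated. The function name, the windowing rule (Sokal-style self-consistent truncation), and the cutoff factor are illustrative choices, not code from any of the discussants:

```python
import numpy as np

def integrated_autocorr_time(x, cutoff_factor=5.0):
    """Estimate the integrated autocorrelation time of a 1-D series.

    Sums the normalized autocorrelation function with Sokal-style
    self-consistent windowing: the sum is truncated at the first lag m
    where m >= cutoff_factor * tau_int(m), suppressing the noisy tail
    that makes the naive full sum unreliable.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    dx = x - x.mean()
    # autocovariance by direct products (O(n^2); fine for a sketch)
    acf = np.array([np.dot(dx[:n - k], dx[k:]) / (n - k)
                    for k in range(n // 2)])
    acf /= acf[0]  # normalize so acf[0] == 1
    tau = 0.5
    for m in range(1, len(acf)):
        tau += acf[m]
        if m >= cutoff_factor * tau:
            break
    return max(tau, 0.5)
```

The effective number of independent samples then follows as `n_eff = n / (2 * tau)`.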

@dmzuckerman
Owner

Regarding @dwsideriusNIST's comment "block averaging became popular because it was cheap and could be done on the fly if the block size was preselected":

The proper/informative way to do block averaging explicitly requires checking all possible block sizes, not pre-selecting a single block size. This is key. Is it 'rigorous'? I think it's close enough for physicists. Certainly the ACF tail challenge noted above and in prior discussions is quite tricky.
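The block-size scan described here can be sketched as follows; this is an illustrative implementation (function name and block-size grid are my own choices), in which one looks for a plateau in the blocked standard error as the block size grows past the correlation time:

```python
import numpy as np

def block_std_errors(x, n_sizes=20):
    """Blocked standard error of the mean over a range of block sizes.

    For each block size b, the series is cut into len(x)//b
    non-overlapping blocks and the standard error is computed from the
    scatter of the block means.  A plateau in bse vs. b indicates blocks
    longer than the correlation time; the plateau value is the estimated
    uncertainty of the mean.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    sizes = np.unique(np.geomspace(1, n // 4, n_sizes).astype(int))
    bse = []
    for b in sizes:
        nb = n // b
        means = x[: nb * b].reshape(nb, b).mean(axis=1)
        bse.append(means.std(ddof=1) / np.sqrt(nb))
    return sizes, np.array(bse)
```

For uncorrelated data the curve is flat from the start; for correlated data it rises before leveling off.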

@mrshirts

mrshirts commented Sep 6, 2018

Glad to get all the discussion! This just SEEMS like a question that the statistics community must have answered at some point, doesn't it? Understanding what you can conclude from time series is their bread and butter, right?

If not, some clear guidance seems badly needed, especially on how to determine the number of independent samples (either explicitly, via autocorrelation, or implicitly, via block averaging) as reliably as possible under standard conditions. When one transitions from "we have collected thousands of correlation times of data, everything works" to "we have just a few correlation times," presumably one method breaks before the other. Below "we have just a few correlation times" I would imagine everything fails. But the most interesting question is at which point before that each method fails (where "fails" needs to be defined a bit better).

@mrshirts

mrshirts commented Sep 6, 2018

Also, block averaging vs. block bootstrapping is of interest . . .

@dmzuckerman
Owner

I think block bootstrapping implicitly assumes you have multiple independent samples ... which is (part of) what one is trying to figure out.

FYI, a caution on bootstrapping: http://statisticalbiophysicsblog.org/?p=213
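To make the circularity concrete, here is a sketch of a moving-block bootstrap of the standard error of the mean. The function name is illustrative; note that the user must supply `block_len`, which only gives a meaningful answer if it already exceeds the correlation time, which is exactly the quantity in question:

```python
import numpy as np

def block_bootstrap_sem(x, block_len, n_boot=1000, rng=None):
    """Moving-block bootstrap estimate of the SEM of a correlated series.

    Resamples whole contiguous blocks (with replacement) so that
    within-block correlation is preserved.  The result is only reliable
    when block_len exceeds the correlation time -- a quantity one is
    usually still trying to determine.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    n_blocks = n // block_len
    starts = rng.integers(0, n - block_len + 1, size=(n_boot, n_blocks))
    # gather blocks: index array of shape (n_boot, n_blocks, block_len)
    idx = starts[..., None] + np.arange(block_len)
    boot_means = x[idx].mean(axis=(1, 2))
    return boot_means.std(ddof=1)
```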

@dwsideriusNIST
Collaborator

in reply to @dmzuckerman "The proper/informative way to do block averaging explicitly requires checking all possible block sizes and not pre-selecting the block size. This is key. Is it 'rigorous'? I think it's close enough for physicists. Certainly the tail challenge in ACF noted above and in prior discussions is quite tricky."

I think this gets back to one of our purposes in writing the paper: education about the underlying assumptions / statistical foundation of uncertainty-estimation techniques. I'm entirely on board with the need to "check your block sizes" in post-processing, hence the inclusion of that exact point in our paper. But I'll also echo something @mangiapasta said earlier: "... you'd be surprised what people do ..." Particularly when using codes traceable to old editions of Allen & Tildesley or Frenkel & Smit that used pre-set block sizes.

@richardjgowers

It's not a full investigation of the two approaches, but we briefly compared them in Figs 2&3 here:

https://www.tandfonline.com/doi/pdf/10.1080/08927022.2017.1375492?needAccess=true

We found that autocorrelation plots were much easier to read (and to automate reading) than blocking plots. It's also a little strange to measure the autocorrelation by searching for a block size at which you no longer see its effects (i.e., Eq. 25 in the Flyvbjerg paper).

I think @dwsideriusNIST is correct about why block averaging became popular. When I was scanning multiple block sizes to find the smallest adequate one, I found that just directly calculating the autocorrelation (with FFTs) was faster anyway.
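The FFT route mentioned here can be sketched as follows (an illustrative implementation, not @richardjgowers' actual code): zero-padding avoids the circular wrap-around of a plain FFT convolution, bringing the cost down to O(n log n) versus O(n^2) for direct products:

```python
import numpy as np

def acf_fft(x):
    """Normalized autocorrelation function via FFT (O(n log n)).

    Zero-pads to twice the series length to avoid circular wrap-around,
    applies the unbiased 1/(n-k) normalization per lag, and scales so
    that acf[0] == 1.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    dx = x - x.mean()
    f = np.fft.rfft(dx, 2 * n)              # zero-padded transform
    acov = np.fft.irfft(f * np.conj(f))[:n]  # autocovariance, lags 0..n-1
    acov /= np.arange(n, 0, -1)              # unbiased 1/(n-k) factor
    return acov / acov[0]
```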

@dmzuckerman
Owner

There are two goals in this type of error analysis: (1) determine the autocorrelation time and (2) determine the uncertainty. For (1) I guess the autocorrelation plot is better. But for (2), which arguably is more bottom-line, I would put my money on blocking. In Fig. 3 of @richardjgowers' paper linked above, the BSE seems to be convincingly estimated. I'm sure both approaches struggle with insufficient data - no surprise there.

@mrshirts

So I guess I'd leave this for the next version of this document in 1-2 years; it would be super useful to the field as a whole to have a more quantitative answer to this question.

@davidlmobley

I think Mike Gilson's group has done some work trying to look at some of this. Someone may want to rope him in for a discussion.

With respect to Michael's comment:

So I guess I'd leave this for the next version of this document in 1-2 years; it would be super useful to the field as a whole to have a more quantitative answer to this question.

Note that this can be addressed in the repo as soon as anyone wants to, and then those changes just naturally roll into the next peer-reviewed version when they are ready. :)

@ppernot

ppernot commented Dec 5, 2018

There is yet another method, based on time-series analysis, which is commonly used in statistics for analyzing Markov chain Monte Carlo samples: fitting the sample with an autoregressive (AR) process. See e.g. Thompson 2010. This might be an interesting addition to the discussion in Sect. 7.3...
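A sketch of the simplest version of this AR-based idea, using an AR(1) fit (the function name and the restriction to first order are illustrative; real data with slower-than-exponential correlations would need a higher-order AR model, e.g. via statsmodels):

```python
import numpy as np

def ar1_effective_samples(x):
    """Effective sample size from an AR(1) fit.

    Estimates the lag-1 coefficient phi by least squares, then applies
    the AR(1) statistical inefficiency g = (1 + phi) / (1 - phi), so
    that n_eff = n / g.
    """
    x = np.asarray(x, dtype=float)
    dx = x - x.mean()
    phi = np.dot(dx[:-1], dx[1:]) / np.dot(dx[:-1], dx[:-1])
    g = (1 + phi) / (1 - phi)
    return len(x) / g
```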


10 participants