Quantitative comparison between autocorrelation function and block averaging #53
Comments
I'll just echo this (Owen is a student in my lab)! We are willing to do some tests, but want to make sure that we aren't duplicating work that other people have done previously! |
I can’t speak for the whole team, but I was under the impression we left it vague because we don’t know if there’s a systematic answer. I tend to be more comfortable with block averaging, but I don’t think there’s a known “best practice” there yet.
Alan
On Sep 6, 2018, at 4:11 PM, Michael Shirts wrote:
I'll just echo this (Owen is a student in my lab)! We are willing to do some tests, but want to make sure that we aren't reduplicating work that other people have done previously!
Dr. Alan Grossfield
Associate Professor
Department of Biochemistry and Biophysics
University of Rochester Medical Center
610 Elmwood Ave, Box 712
Rochester, NY 14642
Phone: 585 276 4193
http://membrane.urmc.rochester.edu
https://orcid.org/0000-0002-5877-2789
|
I'm not aware of a head-to-head comparison, but perhaps @mangiapasta may know. I would be quite interested in your results if you obtain any. One advantage of block-averaging is that it gives you an uncertainty estimate directly, without the need to separately compute the number of samples. |
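To make the "uncertainty directly" point concrete, here is a minimal Python/NumPy sketch of the standard error of the mean from non-overlapping blocks at a single block size. The function name `block_standard_error` is hypothetical, and the sketch assumes the chosen block is much longer than the correlation time; picking that size is the hard part this thread is about.

```python
import numpy as np


def block_standard_error(x, block_size):
    """Standard error of the mean of x from non-overlapping blocks.

    Trailing samples that do not fill a complete block are discarded.
    """
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks")
    # Mean of each block; if block_size is well beyond the correlation time,
    # these block means are approximately independent samples.
    block_means = x[: n_blocks * block_size].reshape(n_blocks, block_size).mean(axis=1)
    # Standard error = sample std of the block means / sqrt(number of blocks)
    return block_means.std(ddof=1) / np.sqrt(n_blocks)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=10_000)  # uncorrelated test data
    print(x.mean(), block_standard_error(x, block_size=100))
```

If a number of independent samples is wanted anyway, it is implied by the result as var(x) / SE², so no separate estimate is needed.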
I don't know offhand if there has been a comparison. I have a suspicion that under certain assumptions (e.g. finite correlation time) there may be simple relationships between estimates of the variances computed from each method, but don't quote me on that. |
From a "this is purer" POV, autocorrelation analysis seems to be superior to block averaging. There's even a note in the original Flyvbjerg paper pointing out that the block averaging technique is a balance between mathematical rigor and computational effort. I've had the feeling that block averaging became popular because it was cheap and could be done on the fly if the block size was preselected, whereas ACF analysis should be done in post-processing, especially if you use FFTs to speed up the calculation.
On 9/6/18 4:26 PM, mangiapasta wrote:
I don't know offhand if there has been a comparison. I have a
suspicion that under certain assumptions (e.g. finite correlation
time) there may be simple relationships between estimates of the
variances computed from each method, but don't quote me on that.
|
We tried computing uncertainties using the autocorrelation function recently and found the results to be quite poor in comparison to block averaging, especially when the autocorrelation fluctuates out to long times where it cannot be computed precisely. Block averaging has more trouble with such situations than with those where the autocorrelation decays quickly, but not nearly as much trouble as the autocorrelation approach. |
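One standard way to limit the damage from a noisy ACF tail is to truncate the sum that defines the integrated autocorrelation time with a self-consistent window (Sokal's criterion: use the smallest lag M with M >= c·τ(M)). The sketch below assumes that criterion with c = 5 (a common default, e.g. in emcee), the biased autocovariance estimator, and the convention τ = 1 + 2 Σ_{t>=1} ρ(t), so that N_eff = N/τ. `integrated_autocorr_time` is a hypothetical name, and the direct O(N²) correlation is used only for clarity.

```python
import numpy as np


def integrated_autocorr_time(x, c=5.0):
    """Integrated autocorrelation time tau = 1 + 2 * sum_{t>=1} rho(t),
    truncated at Sokal's self-consistent window (smallest M with M >= c * tau(M))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dx = x - x.mean()
    # Biased autocovariance for lags 0..n-1 (direct O(N^2) sum)
    acov = np.correlate(dx, dx, mode="full")[n - 1:] / n
    rho = acov / acov[0]
    # Running estimate tau(M) = 1 + 2 * sum_{t=1}^{M} rho(t)
    taus = 2.0 * np.cumsum(rho) - 1.0
    # First lag at which the window is at least c times the running estimate
    window_ok = np.arange(len(taus)) >= c * taus
    if np.any(window_ok):
        m = int(np.argmax(window_ok))
    else:
        # No self-consistent window: the series is too short relative to tau,
        # so the returned value should not be trusted.
        m = len(taus) - 1
    return taus[m]
```

Larger c reduces the bias from cutting off a slowly decaying ACF at the cost of summing more of the noisy tail, which is exactly the trade-off described in this comment.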
In regard to @dwsideriusNIST's comment that "block averaging became popular because it was cheap and could be done on the fly if the block size was preselected": the proper/informative way to do block averaging explicitly requires checking all possible block sizes and not pre-selecting the block size. This is key. Is it 'rigorous'? I think it's close enough for physicists. Certainly the tail challenge in ACF noted above and in prior discussions is quite tricky. |
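A minimal sketch of checking a whole range of block sizes in post-processing instead of a single preselected one. It assumes the Flyvbjerg-Petersen style of repeatedly averaging adjacent pairs, so block sizes are powers of two; `blocking_analysis` and `min_blocks` are hypothetical names. One looks for the standard error to plateau as the block size grows; at the largest sizes only a few blocks remain and the estimate itself becomes noisy.

```python
import numpy as np


def blocking_analysis(x, min_blocks=4):
    """Standard error of the mean versus block size.

    Block sizes are powers of two: at each level, adjacent pairs of block
    means are averaged (Flyvbjerg-Petersen style). Returns a list of
    (block_size, n_blocks, standard_error) tuples.
    """
    x = np.asarray(x, dtype=float)
    results = []
    block_size = 1
    while len(x) >= min_blocks:
        n = len(x)
        # Naive standard error, treating the current block means as independent
        results.append((block_size, n, x.std(ddof=1) / np.sqrt(n)))
        # Drop an unpaired trailing value, then average adjacent pairs
        x = x[: 2 * (n // 2)].reshape(-1, 2).mean(axis=1)
        block_size *= 2
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Moving average of white noise: correlated over roughly 50 steps
    data = np.convolve(rng.normal(size=4146), np.ones(50) / 50, mode="valid")
    for b, nb, se in blocking_analysis(data):
        print(f"block size {b:5d}  blocks {nb:5d}  SE {se:.4g}")
```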
Glad to get all the discussion! This just SEEMS like a question that the statistics community must have answered at some point, hasn't it? Understanding what you can conclude from time series is their bread and butter, right? If not, some recognition of what is what seems to be highly needed . . . Especially how to determine the number of independent samples (either explicitly (autocorrelation) or implicitly (block averaging)) in the most reliable way possible under standard conditions. When one transitions from "we have collected thousands of correlation times of data, everything works" to "we have just a few correlation times", presumably one method breaks before the other. Below "we have just a few correlation times" I would imagine everything fails. But what's most interesting is the point, before that, at which each method starts to fail (where "fails" needs to be defined a bit better). |
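One way to make "presumably one method breaks before the other" testable is a synthetic benchmark with a known answer. For an AR(1) process x_t = φ x_{t-1} + ε_t the statistical inefficiency is exactly g = 1 + 2 Σ_{t>=1} φ^t = (1 + φ)/(1 - φ), so the true N_eff = N/g is known for any trajectory length. The sketch below compares the N_eff implied by block averaging with that exact value as the trajectory shrinks from thousands to a handful of multiples of g. All names are hypothetical, and the AR(1) model and fixed 10-block estimator are assumptions chosen only for illustration; an ACF-based estimator could be dropped in place of `block_neff`.

```python
import numpy as np


def ar1_series(n, phi, rng):
    """Stationary AR(1) series x_t = phi * x_{t-1} + eps_t with unit-variance eps_t."""
    x = np.empty(n)
    x[0] = rng.normal() / np.sqrt(1.0 - phi**2)  # start in the stationary distribution
    eps = rng.normal(size=n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x


def exact_inefficiency(phi):
    """Statistical inefficiency g = 1 + 2 * sum_{t>=1} phi**t = (1 + phi) / (1 - phi)."""
    return (1.0 + phi) / (1.0 - phi)


def block_neff(x, n_blocks=10):
    """Number of independent samples implied by block averaging: var(x) / SE**2."""
    b = len(x) // n_blocks
    means = x[: n_blocks * b].reshape(n_blocks, b).mean(axis=1)
    se2 = means.var(ddof=1) / n_blocks
    return x.var(ddof=1) / se2


def compare(phi=0.9, lengths_in_g=(1000, 100, 10), n_trials=50, seed=0):
    """Print exact vs. block-estimated N_eff for trajectories of decreasing length."""
    rng = np.random.default_rng(seed)
    g = exact_inefficiency(phi)
    for m in lengths_in_g:
        n = int(m * g)  # trajectory length: m multiples of the inefficiency g
        est = [block_neff(ar1_series(n, phi, rng)) for _ in range(n_trials)]
        print(f"N = {n:6d}: exact N_eff = {n / g:8.1f}, "
              f"block median = {np.median(est):8.1f}, "
              f"5-95% range = {np.percentile(est, 5):.1f}-{np.percentile(est, 95):.1f}")


if __name__ == "__main__":
    compare()
```

With this harness, "fails" could be given a concrete meaning, for example a median or spread of the estimated N_eff that departs from the exact value by more than a chosen tolerance.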
Also, block averaging vs. block bootstrapping is of interest . . . |
I think block bootstrapping implicitly assumes you have multiple samples ... which is (part of) what one is trying to figure out. FYI, a caution on bootstrapping: http://statisticalbiophysicsblog.org/?p=213 |
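For reference on the block-bootstrap alternative raised above, a minimal moving-block bootstrap sketch for the standard error of the mean. It assumes overlapping blocks of a user-chosen `block_size` drawn uniformly with replacement and resamples truncated to the original length; `moving_block_bootstrap_se` is a hypothetical name. Note that it still requires a block size longer than the correlation time, which is the circularity pointed out in this comment.

```python
import numpy as np


def moving_block_bootstrap_se(x, block_size, n_boot=1000, seed=None):
    """Standard error of the mean via the moving-block bootstrap.

    Overlapping blocks of length block_size are drawn with replacement and
    concatenated; each resample is truncated to len(x) points.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and len(x)")
    rng = np.random.default_rng(seed)
    n_starts = n - block_size + 1            # number of overlapping blocks
    k = -(-n // block_size)                  # blocks per resample (ceiling division)
    offsets = np.arange(block_size)
    boot_means = np.empty(n_boot)
    for i in range(n_boot):
        starts = rng.integers(0, n_starts, size=k)
        idx = (starts[:, None] + offsets[None, :]).ravel()[:n]
        boot_means[i] = x[idx].mean()
    # Spread of the resampled means estimates the standard error of the mean
    return boot_means.std(ddof=1)
```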
In reply to @dmzuckerman: "The proper/informative way to do block averaging explicitly requires checking all possible block sizes and not pre-selecting the block size. This is key. Is it 'rigorous'? I think it's close enough for physicists. Certainly the tail challenge in ACF noted above and in prior discussions is quite tricky." I think this gets back to one of our purposes in writing the paper: education about the underlying assumptions and statistical foundations of uncertainty estimation techniques. I'm entirely on board with the need to "check your block sizes" in post-processing, hence the inclusion of that exact point in our paper. But I'll also echo something @mangiapasta said earlier: "... you'd be surprised what people do ..." This is particularly true when people use codes traceable to old editions of Allen & Tildesley or Frenkel & Smit that used pre-set block sizes. |
It's not a full investigation of the two approaches, but we briefly compared them in Figs. 2 and 3 here: https://www.tandfonline.com/doi/pdf/10.1080/08927022.2017.1375492?needAccess=true We found that autocorrelation plots were much easier to read (and to automate reading of) than block-averaging plots. It's also a little strange to measure the autocorrelation by trying to find a block size where you don't see its effects (i.e., Eq. 25 in the Flyvbjerg paper). I think @dwsideriusNIST is correct about why block averaging became a thing. When I was trying multiple block sizes to find the smallest suitable one, I found that directly calculating the autocorrelation (with FFTs) was faster anyway. |
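The FFT route mentioned here can be sketched in a few lines of NumPy (`autocorrelation_fft` is a hypothetical name). It computes the biased autocovariance at every lag in O(N log N) time by zero-padding to at least 2N - 1 points, so that the circular correlation the FFT computes equals the ordinary linear one.

```python
import numpy as np


def autocorrelation_fft(x):
    """Normalized autocorrelation rho(t) = C(t) / C(0) for t = 0..N-1 via FFT.

    Uses the biased estimator C(t) = (1/N) * sum_i dx_i * dx_{i+t}.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    dx = x - x.mean()
    # Zero-pad to a power of two >= 2n - 1 to avoid circular wrap-around
    nfft = 1 << (2 * n - 1).bit_length()
    f = np.fft.rfft(dx, n=nfft)
    # Inverse FFT of the power spectrum gives the autocovariance (Wiener-Khinchin)
    acov = np.fft.irfft(f * np.conj(f), n=nfft)[:n] / n
    return acov / acov[0]
```

After normalization the result matches the direct O(N²) sum `np.correlate(dx, dx, mode="full")[n - 1:]`, which is a convenient correctness check before switching to the faster version.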
There are two goals in this type of error analysis: (1) determining the autocorrelation time and (2) determining the uncertainty. For (1) I guess the autocorrelation plot is better. But for (2), which is arguably more bottom-line, I would put my money on blocking. In Fig. 3 of @richardjgowers's paper linked above, the BSE seems to be convincingly estimated. I'm sure both approaches struggle with insufficient data; no surprise there. |
So I guess I'd leave this for the next version of this document in 1-2 years; it would be super useful to the field as a whole to have a more quantitative answer to this question. |
I think Mike Gilson's group has done some work trying to look at some of this. Someone may want to rope him in for a discussion. With respect to Michael's comment:
Note that this can be addressed in the repo as soon as anyone wants to, and then those changes just naturally roll into the next peer-reviewed version when they are ready. :) |
There is yet another method, based on time-series analysis, which is commonly used in statistics for the analysis of Markov chain Monte Carlo samples. It is based on fitting an autoregressive process to the sample. See, e.g., Thompson 2010. This might be an interesting addition to the discussion in Sect. 7.3... |
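A minimal sketch of the autoregressive idea, under our own assumptions rather than necessarily the procedure in Thompson 2010: fit AR(p) models with the Yule-Walker equations, choose the order p by AIC (approximated as N log σ²_ε + 2p), and use the fitted model's spectral density at zero frequency, S(0) = σ²_ε / (1 - Σ_k φ_k)², which equals the sum of all autocovariances, so that Var(mean) ≈ S(0)/N. `ar_standard_error` and `max_order` are hypothetical names.

```python
import numpy as np


def ar_standard_error(x, max_order=20):
    """Standard error of the mean from an AR(p) fit (Yule-Walker, order chosen by AIC)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dx = x - x.mean()
    # Biased autocovariance estimates for lags 0..max_order
    acov = np.array([dx[: n - k] @ dx[k:] / n for k in range(max_order + 1)])
    best = None
    for p in range(1, max_order + 1):
        # Toeplitz matrix R[i, j] = acov[|i - j|] for the Yule-Walker system R phi = r
        lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
        phi = np.linalg.solve(acov[lags], acov[1 : p + 1])
        sigma2 = acov[0] - phi @ acov[1 : p + 1]  # innovation (noise) variance
        aic = n * np.log(sigma2) + 2 * p
        if best is None or aic < best[0]:
            best = (aic, phi, sigma2)
    _, phi, sigma2 = best
    # Spectral density at zero frequency = sum of all autocovariances
    s0 = sigma2 / (1.0 - phi.sum()) ** 2
    return np.sqrt(s0 / n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(ar_standard_error(rng.normal(size=5000)))
```

Because the whole autocovariance sum comes from a few fitted coefficients, this approach sidesteps summing a noisy ACF tail directly, at the price of assuming the AR model describes the data adequately.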
In Section 7.3, there is a discussion of autocorrelation analysis and block averaging as methods for estimating the number of independent samples, but the discussion does not make any recommendation on which method is better to use in specific cases.
Have there been any studies that quantitatively compare these two methods? For example, testing the minimum number of samples below which each method becomes unreliable, or whether the extra information in a block averaging scheme makes a difference in the uncertainty estimate. We are working on a best practices document for property calculation from MD and are interested in the effect of the choice of method.