Resuming a calculation with hot chains (-nc) #54
How about leaving the default as False (because hot chains take up a lot of disk space), and instead throwing an error if resume=True and we are sampling with resume=False? The only issue I see with that is that the runs may initially have been done with resume=False (with writeHotChains=False), and then you cannot abort and resume them. But I think it's probably not fun for people who now suddenly have jobs that use up a lot of memory. Perhaps we can also set a more useful error message than the one that sparked this adjustment?
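To make the proposal above concrete, here is a minimal sketch of a resume check with a more descriptive error message. It is not the sampler's actual code: the helper names `missing_hot_chains` and `check_resume` are hypothetical, and the `chain_<T>.txt` file naming is an assumption made for illustration.

```python
import os


def missing_hot_chains(out_dir, temperatures):
    """Return the hot chain files (temperature != 1) that are absent."""
    missing = []
    for temp in temperatures:
        if temp == 1:
            continue  # the cold chain is checked elsewhere
        # Assumed naming scheme, for illustration only.
        path = os.path.join(out_dir, f"chain_{temp}.txt")
        if not os.path.isfile(path):
            missing.append(path)
    return missing


def check_resume(out_dir, temperatures, resume):
    """Raise a descriptive error when resuming without hot chain files."""
    if not resume:
        return
    missing = missing_hot_chains(out_dir, temperatures)
    if missing:
        raise RuntimeError(
            f"Cannot resume: {len(missing)} hot chain file(s) not found, "
            f"e.g. {missing[0]}. Was the original run made with "
            "writeHotChains=False? Rerun with writeHotChains=True or "
            "start a new run."
        )
```

The point of the sketch is only that the error names the missing file and the likely cause, rather than failing later with an unrelated message.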
I agree with @vhaasteren that just changing the default
I'm not sure about interpreting
Excellent suggestions @kdolum, I agree with all of it. I like the warnings you suggest.
Thanks, @vhaasteren and @kdolum, this sounds good. I was initially thinking in terms of 4 temperatures, but if people use >10, saving extra chains by default might indeed cause problems. In case we are avoiding drastic changes, I have one more suggestion: show a warning instead of an error when attempting to resume the run without the old chains. Perhaps it can also avoid practical issues for users. I incorporated it in my latest commit. What do you think?
I think it should be an error if you try to resume a parallel-tempering run with no hot chain information. What are we supposed to use as the initial samples for the hot chains? I guess it currently uses the initial sample given to the sampler (which I imagine is otherwise ignored on resuming). That seems unfortunate. What is the purpose of resuming in this case, instead of just doing a new run to gather more samples?
Adding a "special request" variable, at least for my case, will not make a difference for me because it's not in
I like the extra variable that suppresses the error that @kdolum suggests. And we can totally add it to enterprise_extensions. I'd approve such a PR. It would have been neater to not write the hot chain files at all, but the sampler already has this behavior. Let's not change it unless we have to. Perhaps add one more check for
Hi @vhaasteren, @kdolum, I got distracted by other work, but I have now introduced the new variable that we discussed. Shall we close the PR now?
I'm confused by the logic here. It looks like what you did is that if the old chain file is the wrong length but ignoreHotChainsWhenResuming is set, and this is not the cold chain, then we ignore the error and start the chain file anew. But it looks like it will still print something about how it is resuming.
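The branch described above can be sketched as a small decision function, so that a hot chain started anew is never reported as "resuming". This is an illustration under stated assumptions, not the code in the PR: the function `plan_chain_start` is hypothetical, and "wrong length" is taken here to mean that a hot chain file holds a different number of samples than the cold chain file.

```python
def plan_chain_start(n_saved, n_cold, is_cold,
                     ignore_hot_chains_when_resuming=False):
    """Return (action, message) for one chain when a run is resumed.

    n_saved: number of samples found in this chain's file
    n_cold:  number of samples found in the cold chain's file
    """
    # Assumption: the cold chain defines the reference length.
    if is_cold or n_saved == n_cold:
        return "resume", f"Resuming from sample {n_saved}"
    if ignore_hot_chains_when_resuming:
        # Hot chain is inconsistent: discard it and say so explicitly.
        return "restart", (
            f"Hot chain has {n_saved} samples, expected {n_cold}; "
            "starting this chain anew"
        )
    raise ValueError(
        f"Hot chain has {n_saved} samples but the cold chain has {n_cold}; "
        "cannot resume"
    )
```

Returning the message together with the action keeps the printed status tied to what actually happens to each chain.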
@kdolum, I can rename the variable, but the discussion of (1, 3) resuming with zero/non-zero length cold chains and (4) selecting a sample still all boils down to the same case of starting the burn-in in the hot chains again. So this seems like micro-managing something that is not important, and/or discriminating between incorrect use cases. One either cares about hot chains and resumes correctly, or not, so it should not make any difference to users. @vhaasteren, what do you think?
Because writeHotChains=False by default, and it is not set as a keyword argument in setup_sampler() in enterprise extensions, I suggest resetting the default argument for now, to avoid problems with the results.
Tagging @vhaasteren , @astrolamb