question about the biggest N #37
Comments
Thanks for your interest! When bigKRLS runs out of RAM, it should switch the
computation to disk (i.e., use swap). However, those calculations are
considerably slower, and it is unlikely you would find the speed trade-off
tolerable with 16 GB of RAM. There aren't many hyperparameters, but they
matter somewhat; I recommend fitting at N = 3,000 to benchmark your machine
and then increasing N, keeping the quadratic memory footprint in mind.
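As a back-of-envelope illustration of that quadratic footprint (a sketch, not bigKRLS internals; the 8 GB at N = 14,000 figure comes from the paper quoted below):

```python
import math

# Back-of-envelope extrapolation: if memory use grows as N^2, one
# benchmark point is enough to estimate other sizes. The anchor point
# (8 GB at N = 14,000) is the figure reported in the paper.

def scaled_memory_gb(benchmark_n, benchmark_gb, target_n):
    """Extrapolate memory use, assuming it grows as N^2."""
    return benchmark_gb * (target_n / benchmark_n) ** 2

def max_n(benchmark_n, benchmark_gb, ram_gb):
    """Largest N whose extrapolated footprint fits in ram_gb."""
    return int(benchmark_n * math.sqrt(ram_gb / benchmark_gb))

print(scaled_memory_gb(14_000, 8.0, 26_000))  # ~27.6 GB for the full sample
print(max_n(14_000, 8.0, 16.0))               # ~19,798 with 16 GB of RAM
```

By this rough scaling, the full N = 26,000 sample would exceed 16 GB of RAM in-core, consistent with the advice above.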
On Mon, Jan 20, 2020 at 3:11 PM StevenLi-DS ***@***.***> wrote:
Thanks for the package! I'm trying to use it in my study with a sample
size around 26,000 (10 covariates). However, the following paper, "Messy
Data, Robust Inference? Navigating Obstacles to Inference with bigKRLS,"
states:
"bigKRLS can handle datasets up to approximately N = 14,000 on a personal
machine before reaching the 8 GB cutoff."
Thus, I'm concerned about whether I should continue. Does this mean the
program will stop running if I fit a dataset with N > 14,000? I have a
laptop with 16 GB RAM. Will it be OK?
--
Pete Mohanty, PhD
Data Scientist at Google
Close but not quite. Saving the full model output involves several N × N
matrices (such as the variance-covariance matrix). I am away from my laptop,
but I anticipate the output is proportional to 5N^2.
On Mon, Jan 20, 2020 at 8:36 PM StevenLi-DS ***@***.***> wrote:
I tried with N = 5,000, and save.bigKRLS() generated a folder of files
that takes 1.5 GB.
Assume my total N = 25,000; then I would need 1.5*(25000/5000)^2 = 37.5
GB of RAM to run the model, right?
@rdrr1990 any hint?
Hi Steven,
I'm not quite certain about the source of the error, but it doesn't look like an issue with the bigKRLS package. I spent a little while digging through LAPACK's documentation (http://www.netlib.org/lapack/explore-html/d2/d8a/group__double_s_yeigen_gaeed8a131adf56eaa2a9e5b1e0cce5718.html), and it looks like the error is raised by LAPACK's eigenvalue solver. The error code specifically seems to come from this check:
IF( n.GT.0 .AND. vu.LE.vl )
info = -8
The conditions mean that the order of the matrix is greater than zero and the upper bound on the eigenvalue solver is less than the lower bound. Those bounds aren't something that we set or even have the option to set (via Armadillo or Rcpp), so this is either a bug with Armadillo or a numerical/memory problem caused by an oversized input matrix. My guess is the latter, but it's difficult to be sure without a reproducible example. As an experiment, you might try running the bigKRLS estimation routine on a random subset of your data at a size that will clearly run (say, n=10,000 or so). If the estimation routine runs, then the issue is likely related to the size of the input matrix. If you see an error, though, then there's probably some other issue going on.
- Robert
--
Postdoctoral Fellow, Perry World House, University of Pennsylvania
Website: https://rbshaffer.github.io/
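The subset experiment suggested above can be sketched as follows (the index-drawing step only; `random_subset_indices` is a hypothetical helper, and the actual fit would be R's bigKRLS() on the subsetted rows):

```python
import random

# Draw a reproducible random subset of row indices at a size known to
# run (n = 10,000 in the suggestion above), to test whether the error
# reproduces at a smaller, clearly feasible size.
def random_subset_indices(n_total, n_subset, seed=42):
    """Return sorted, unique row indices for a random subset."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), n_subset))

idx = random_subset_indices(26_000, 10_000)
print(len(idx))  # 10000 rows to pass to the estimation routine
```

If the fit succeeds on the subset, size is the likely culprit; if the same error appears, something else is going on.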
|
Hi @rbshaffer. Thanks for the reply. I thought it was due to my sample size. I've tried with a sample of my dataset with more than N = 13,000, which had no issue at all. I will try it again and see if there is anyway I can share the dataset. |