This issue was moved to a discussion.

futuremice scales poorly on 64/128 cores machines #566

Closed
vkhodygo opened this issue Jul 8, 2023 · 10 comments

Comments

@vkhodygo
Contributor

vkhodygo commented Jul 8, 2023

Describe the bug
futuremice runs just fine when I do 10 or 20 imputations at the same time. When I increase the number to, say, 50 or 100 while keeping all other parameters the same, it just sits there indefinitely.

To Reproduce
Not sure that any code would be suitable here.

Expected behavior
Running 10, 20, 50, 100 imputations at once should take roughly the same amount of time.

@vkhodygo vkhodygo added the bug label Jul 8, 2023
@stefvanbuuren
Member

This sounds more like a resource problem than a bug.

@vkhodygo
Contributor Author

@stefvanbuuren How is that so?

@stefvanbuuren
Member

It would be useful if we can have a reprex somehow, otherwise it is very hard for us to chase. Did you try setting m = 200 with a small problem?

@vkhodygo
Contributor Author

I can give it a go, but that might take some time.

@vkhodygo
Contributor Author

@stefvanbuuren I tried to come up with an example that closely matches my data, but that's cumbersome. Instead, I used the code from the futuremice vignette:

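library(mice)
version()  # mice::version(): prints the installed mice version

set.seed(123)

# Simulate 10,000 rows of equicorrelated (0.5) standard-normal features
n_features <- 20
small_covmat <- diag(n_features)
small_covmat[small_covmat == 0] <- 0.5
small_data <- MASS::mvrnorm(10000,
                            mu = rep(0, n_features),
                            Sigma = small_covmat)

# Make 80% of the rows incomplete, missing completely at random
small_data_with_missings <- ampute(small_data, prop = 0.8, mech = "MCAR")$amp

n_streams <- 5
start_time <- Sys.time()

# One imputation per core, a single iteration, random-forest imputation
imp <- futuremice(small_data_with_missings,
                  parallelseed = 123,
                  n.core = n_streams,
                  m = n_streams,
                  maxit = 1,
                  method = "rf",
                  ntree = 10)  # mice.impute.rf() names this argument `ntree`

end_time <- Sys.time()
end_time - start_time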
$ Rscript main.R

Attaching package: ‘mice’

The following object is masked from ‘package:stats’:

    filter

The following objects are masked from ‘package:base’:

    cbind, rbind

[1] "mice 3.16.0 2023-05-24 /home/software/.local/easybuild/software/R/4.2.0-foss-2021b/lib/R/library"
Time difference of 24.37422 secs

This is what I get when n_streams == 50:

$ Rscript main.R

Attaching package: ‘mice’

The following object is masked from ‘package:stats’:

    filter

The following objects are masked from ‘package:base’:

    cbind, rbind

[1] "mice 3.16.0 2023-05-24 /home/software/.local/easybuild/software/R/4.2.0-foss-2021b/lib/R/library"
Time difference of 2.550763 mins

and when n_features = 100:

$ Rscript main.R

Attaching package: ‘mice’

The following object is masked from ‘package:stats’:

    filter

The following objects are masked from ‘package:base’:

    cbind, rbind

[1] "mice 3.16.0 2023-05-24 /home/software/.local/easybuild/software/R/4.2.0-foss-2021b/lib/R/library"
Time difference of 2.003762 mins

Real-life numbers are much, much worse since I work mostly with categories, and their number is significantly higher. As this number goes up, literally every process starts spawning threads like there is no tomorrow. I understand that there is some overhead, but that's a bit too much. This automatically results in 100% load even when n_streams is low.

Just to show what I have to deal with: the same code with the actual data and n_streams == 5 needs about 25 minutes to finish the very first iteration on my laptop. On the cluster with two CPUs, 32 cores each, the code with n_streams == 25 has been running for an hour, and I have no idea when it'll be done.

@vkhodygo
Contributor Author

@stefvanbuuren Got it done, at least something:
The same code with n_streams == 5 on the cluster:

Time difference of 2.386232 hours

and with n_streams == 25:

Time difference of 12.78157 hours

I'd blame Intel MKL or something like futureverse/future#405, but those are AMD machines.

@stefvanbuuren
Member

> Real-life numbers are much-much worse since I work mostly with categories, and their number is significantly higher.

Perhaps the problem is not with futuremice() but is caused by a large number of categories. If you have 1001 categories, then mice tries to create 1000 dummy variables...

What happens if you specify method = "pmm"?
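
To make the dummy-variable arithmetic above concrete, here is a small base-R sketch. It only illustrates treatment coding in general and is not a claim about how mice builds its design matrix internally; the factor `f` is made-up data.

```r
# Hypothetical factor with 1001 distinct categories (illustrative data only).
f <- factor(seq_len(1001))

# Default treatment contrasts: one intercept column plus one
# indicator column per non-reference level.
mm <- model.matrix(~ f)

n_dummies <- ncol(mm) - 1  # drop the intercept column
print(n_dummies)
```

Each such column becomes a predictor in every conditional model that uses the variable, so the cost grows with the number of categories regardless of how many imputations run in parallel.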

@vkhodygo
Contributor Author

> caused by a large number of categories. If you have 1001 categories, then mice tries to create a 1000 dummy variables...

The number of categories is high, that's true. However, this should not affect parallel and independent imputations.
I can run a handful of them just fine, but any further than that and it takes too much time.

Anyway, real data with pmm:

Error in (function (.x, .f, ..., .progress = FALSE)  :
  In index: 1.
Caused by error in `chol.default()`:
! the leading minor of order 1 is not positive
Calls: futuremice ... resolve.list -> signalConditionsASAP -> signalConditions
Execution halted

The reprex works just fine.

@stefvanbuuren
Member

stefvanbuuren commented Jul 19, 2023

Thanks. On my desktop with 9 free cores, I found your reprex takes 15 seconds (n = 5), 14 seconds (n = 9), 21 seconds (n = 18), 47 seconds (n = 50) and 1.37 minutes (n = 100). I think this is as it should be.
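
One rough way to read these timings, assuming each imputation occupies one core and any imputations beyond the core count have to wait for a free worker (a simplification, not a description of how futuremice schedules work): the wall time should grow with ceiling(m / cores) instead of staying flat. The name `rounds` is illustrative.

```r
cores <- 9                                     # free cores reported above
m <- c(5, 9, 18, 50, 100)                      # numbers of imputations tried
observed_sec <- c(15, 14, 21, 47, 1.37 * 60)   # timings reported above

# Sequential "rounds" needed if each core handles one imputation at a time.
rounds <- ceiling(m / cores)

print(data.frame(m, rounds, observed_sec))
```

Under that assumption, runs that fit within the 9 cores need one round, while m = 100 needs twelve, so longer run times for larger m are expected on a machine with fewer cores than imputations.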

I am not sure what causes the "leading minor of order 1 is not positive" error, but I have seen this error appear when there are a lot of collinear variables. mice() tries very hard to remove these during the iteration by means of the internal remove.lindep() function. This checking process is, however, inefficient and can sometimes take >99% of the processor time.

Random forests (method = "rf") are quite robust against collinear variables, so remove.lindep() may be overkill for your setup. It is possible to bypass remove.lindep() by adapting your call to mice(..., eps = 0). Could you give this a try on the real data?
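
A minimal sketch of the eps = 0 suggestion, using the small `nhanes` data set that ships with mice as a stand-in for the real data. The imputation method is left at its default here to avoid extra package dependencies; for the setup in this thread one would add method = "rf" and, presumably, forward the same argument through futuremice()'s `...`, which is an assumption not tested here.

```r
library(mice)

# eps = 0 is passed down through `...`, as suggested above, to bypass
# the remove.lindep() collinearity screening.
# m, maxit and seed are arbitrary illustrative values.
imp <- mice(nhanes, m = 2, maxit = 1, eps = 0,
            seed = 123, printFlag = FALSE)

print(class(imp))
```

Comparing run time with and without eps = 0 on the real data would show how much of the cost comes from the collinearity check.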

#225 #306

@stefvanbuuren
Member

stefvanbuuren commented Jul 19, 2023

It could also help to simplify your model, e.g., by using quickpred() to select the most important predictors for each variable that you want to impute.
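
A minimal sketch of the quickpred() suggestion, again on the built-in `nhanes` data as a stand-in. The threshold shown is simply quickpred()'s documented default, not a recommendation for the real data.

```r
library(mice)

# Build a sparser predictor matrix: keep only predictors whose absolute
# correlation with the target (or with its missingness indicator)
# reaches mincor.
pred <- quickpred(nhanes, mincor = 0.1)

# Use that predictor matrix in the imputation call.
imp <- mice(nhanes, predictorMatrix = pred, m = 2, maxit = 1,
            seed = 123, printFlag = FALSE)

print(rowSums(pred))  # number of predictors selected per variable
```

With many categorical variables, raising mincor shrinks each conditional model, which should reduce the time per iteration at the cost of using less information.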

@amices amices locked and limited conversation to collaborators Jul 19, 2023
@stefvanbuuren stefvanbuuren converted this issue into discussion #570 Jul 19, 2023
