
Creating a Matrix with more than 2,147,483,647 elements #71

Open
amedeedaboville opened this issue Sep 4, 2016 · 2 comments

Comments

@amedeedaboville

Hi,
The matrix dimensions in BIDMat use the Int type, which limits the number of elements I can fit in a matrix. Beyond 2,147,483,647 elements, the calculations of the total element count in the Matrix class overflow, and I get a java.lang.NegativeArraySizeException.
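A minimal Java sketch of this failure mode, assuming the element count is computed as `nrows * ncols` in 32-bit Int arithmetic (the class and method names are illustrative, not BIDMat code). Multiplying two ints wraps around modulo 2^32, so a product just past 2,147,483,647 comes out negative and the array allocation throws:

```java
// Illustrative only (not BIDMat code): how an Int-typed element count overflows.
final class IntOverflowDemo {
    // Element count computed in 32-bit int arithmetic; wraps around past Integer.MAX_VALUE.
    static int lengthAsInt(int nrows, int ncols) {
        return nrows * ncols;
    }

    // The same product computed in 64-bit arithmetic, which cannot overflow for int inputs.
    static long lengthAsLong(int nrows, int ncols) {
        return (long) nrows * ncols;
    }

    public static void main(String[] args) {
        int nrows = 100_000;
        int ncols = 25_000; // 2.5 billion elements in total
        System.out.println(lengthAsLong(nrows, ncols)); // 2500000000
        System.out.println(lengthAsInt(nrows, ncols));  // -1794967296
        try {
            float[] data = new float[lengthAsInt(nrows, ncols)];
            System.out.println("allocated " + data.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException");
        }
    }
}
```

The wrapped value is not always negative: 1,000,000 × 15,000 wraps to 2,115,098,112, so under this assumption an overflow could also yield a wrong positive length instead of an exception.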

For background, my dataset has up to 15 billion numbers, as I am doing a PCA of ~1 million rows × 15,000 columns. My use case is comparing the speed of this implementation with randomized SVD in BIDMach/BIDMat and randomized SVD in numpy.
With 1.5 billion elements, BIDMach performed excellently (10 s for the BIDMach SVD with dim=50 on a beefy machine), but I cannot go further because I can't fit any more values into my matrices.

Cheers,
Amedee

@jcanny
Contributor

jcanny commented Oct 6, 2016

Hi, the 2B limit for basic matrices is pretty fundamental. It's not really a BIDMat decision: since we build on Java, we inherit Java's limit of 2^31 − 1 (about 2 billion) elements per array.
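To make the Java limit concrete: array lengths are `int`s, so one array holds at most `Integer.MAX_VALUE` (2,147,483,647) elements, and some JVMs refuse sizes a few elements below that. The sketch below uses hypothetical helper names (not part of BIDMat); it computes the element count in `long` and fails fast with a clear message instead of overflowing:

```java
// Hypothetical fail-fast guard for a dense matrix's backing array (not BIDMat code).
final class ArrayLimitCheck {
    // Java arrays are indexed by int, so this is the hard upper bound on length.
    static final long MAX_ARRAY_ELEMENTS = Integer.MAX_VALUE;

    // True if nrows x ncols elements could, in principle, live in a single Java array.
    static boolean fitsInOneArray(int nrows, int ncols) {
        return (long) nrows * ncols <= MAX_ARRAY_ELEMENTS;
    }

    // Allocates a backing array, rejecting oversized requests with a clear message
    // instead of letting an int overflow surface as NegativeArraySizeException.
    static float[] allocate(int nrows, int ncols) {
        long n = (long) nrows * ncols; // computed in 64 bits, so it reports the true count
        if (nrows < 0 || ncols < 0 || n > MAX_ARRAY_ELEMENTS) {
            throw new IllegalArgumentException(
                    nrows + " x " + ncols + " = " + n + " elements does not fit in one Java array");
        }
        return new float[(int) n];
    }

    public static void main(String[] args) {
        System.out.println(fitsInOneArray(100_000, 15_000));   // true (1.5 billion elements)
        System.out.println(fitsInOneArray(1_000_000, 15_000)); // false (15 billion elements)
        System.out.println(allocate(1_000, 50).length);        // 50000
        try {
            allocate(1_000_000, 15_000);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Passing this check only means the length is legal; the heap still has to be large enough for the allocation to succeed.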

But you can use TMats. TMat is a new matrix type in BIDMat that is made up of tiles. Each tile is limited to 2B elements, but the overall matrix can be larger. TMats support many of the standard array ops, so you may not need to change anything in your code to use them. If something you need is not implemented, let us know.
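The tiling idea can be sketched in plain Java. `TiledFloatMatrix` below is a hypothetical illustration, not BIDMat's TMat API: it splits the columns into blocks whose backing arrays each stay under `maxTileElements`, while the overall element count is tracked as a `long`.

```java
// Hypothetical tiled matrix (NOT BIDMat's TMat API): columns are split into blocks,
// each backed by its own float[] of at most maxTileElements entries.
final class TiledFloatMatrix {
    final int nrows;
    final int ncols;
    final int colsPerTile;  // every tile except possibly the last has this many columns
    final float[][] tiles;  // tiles[t] stores its columns in column-major order

    TiledFloatMatrix(int nrows, int ncols, long maxTileElements) {
        if (nrows <= 0 || ncols <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        if (maxTileElements > Integer.MAX_VALUE || maxTileElements < nrows) {
            throw new IllegalArgumentException("a tile must fit in one array and hold a full column");
        }
        this.nrows = nrows;
        this.ncols = ncols;
        this.colsPerTile = (int) Math.min(ncols, maxTileElements / nrows);
        int ntiles = ncols / colsPerTile + (ncols % colsPerTile == 0 ? 0 : 1);
        this.tiles = new float[ntiles][];
        for (int t = 0; t < ntiles; t++) {
            int cols = Math.min(colsPerTile, ncols - t * colsPerTile);
            tiles[t] = new float[nrows * cols]; // <= maxTileElements, so no int overflow
        }
    }

    // Total element count, which may exceed Integer.MAX_VALUE.
    long length() {
        return (long) nrows * ncols;
    }

    // Validates (row, col) and returns the element's position inside its tile.
    private int offset(int row, int col) {
        if (row < 0 || row >= nrows || col < 0 || col >= ncols) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
        }
        return (col % colsPerTile) * nrows + row;
    }

    float get(int row, int col) {
        int off = offset(row, col);
        return tiles[col / colsPerTile][off];
    }

    void set(int row, int col, float value) {
        int off = offset(row, col);
        tiles[col / colsPerTile][off] = value;
    }

    // A whole-matrix reduction computed tile by tile.
    double sum() {
        double total = 0;
        for (float[] tile : tiles) {
            for (float v : tile) {
                total += v;
            }
        }
        return total;
    }
}
```

For the 1,000,000 × 15,000 case in this issue, `new TiledFloatMatrix(1_000_000, 15_000, Integer.MAX_VALUE)` would create 7 tiles (six of 2,147 columns and one of 2,118), about 60 GB of floats, so it needs a correspondingly large heap. Splitting by columns keeps each column contiguous inside one tile.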

@several27

@jcanny So what would be the best way to convert a matrix into a TMat (or load it from a file)? Is there an automated way to do that, such as a function that divides a huge matrix into smaller ones and puts them into a TMat? I looked into it, but there seems to be no documentation for this matrix type, and it looks like one has to specify all of the parameters (number of matrices, sizes, etc.) manually to create a TMat.
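As for doing the split automatically, here is a hedged sketch of a hypothetical loader (not a BIDMat function) that chooses the tile size itself. It assumes the file stores the matrix as big-endian 32-bit floats in row-major order, reads it in row blocks that each fit under `maxTileElements`, and stores each block column-major. The resulting `float[]` tiles would still have to be wrapped in whatever tiled matrix type you use.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader (not a BIDMat function): splits a row-major float stream into row-block tiles.
final class RowBlockLoader {
    // Largest number of whole rows whose elements fit in one tile (and one Java array).
    static int rowsPerTile(int ncols, long maxTileElements) {
        if (ncols <= 0) throw new IllegalArgumentException("ncols must be positive");
        long rows = Math.min(maxTileElements, Integer.MAX_VALUE) / ncols;
        if (rows < 1) throw new IllegalArgumentException("one row does not fit in a tile");
        return (int) rows;
    }

    // Number of tiles needed to hold nrows rows.
    static long tileCount(long nrows, int ncols, long maxTileElements) {
        long perTile = rowsPerTile(ncols, maxTileElements);
        return (nrows + perTile - 1) / perTile;
    }

    // Reads nrows x ncols big-endian floats (row-major) into tiles stored column-major.
    static List<float[]> load(InputStream in, long nrows, int ncols, long maxTileElements)
            throws IOException {
        int perTile = rowsPerTile(ncols, maxTileElements);
        DataInputStream data = new DataInputStream(new BufferedInputStream(in));
        List<float[]> tiles = new ArrayList<>();
        for (long first = 0; first < nrows; first += perTile) {
            int rows = (int) Math.min(perTile, nrows - first);
            float[] tile = new float[rows * ncols]; // rows * ncols <= maxTileElements
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < ncols; c++) {
                    tile[c * rows + r] = data.readFloat(); // row-major input, column-major tile
                }
            }
            tiles.add(tile);
        }
        return tiles;
    }

    public static void main(String[] args) throws IOException {
        // The data set from this issue: 1,000,000 x 15,000, tiles capped at Integer.MAX_VALUE.
        System.out.println(rowsPerTile(15_000, Integer.MAX_VALUE));          // 143165
        System.out.println(tileCount(1_000_000, 15_000, Integer.MAX_VALUE)); // 7

        // A tiny in-memory example: a 5 x 3 matrix with at most 6 elements per tile.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (int r = 0; r < 5; r++) {
            for (int c = 0; c < 3; c++) {
                out.writeFloat(10 * r + c);
            }
        }
        out.flush();
        List<float[]> tiles = load(new ByteArrayInputStream(bytes.toByteArray()), 5, 3, 6);
        System.out.println(tiles.size()); // 3 (tiles of 2, 2 and 1 rows)
    }
}
```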
