use arrow::read_parquet instead of nanoparquet #462
@BenoitLondon Thank you for the benchmark. Since you asked why: the long story short of #315 is that we wanted Parquet support by default. We are somewhat reluctant to introduce options for choosing which package to use; we are still cleaning those up from the pre-1.0 era. I don't mind switching back to arrow.
Can you share the code for the benchmark? Some notes:
Not really a good benchmark, but I just ran arrow and nanoparquet on the mentioned 33 million row data set (10x …). It would be great to have a proper benchmark, but nevertheless I'll update the note in the nanoparquet README, because it is actually competitive in terms of speed. I suspect that it is also competitive in terms of memory, but we'd need a better way to measure that.
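For what it's worth, a minimal sketch of this kind of comparison with `bench::mark`; the data set size and iteration count here are illustrative stand-ins, not the ones used in the runs above:

```r
library(bench)

# Synthetic data, a stand-in for the real benchmark data set.
n <- 1e6
df <- data.frame(
  id  = seq_len(n),
  x   = rnorm(n),
  y   = runif(n),
  grp = sample(letters, n, replace = TRUE)
)

tmp <- tempfile(fileext = ".parquet")
nanoparquet::write_parquet(df, tmp)

# check = FALSE: the two readers return differently classed objects
# (plain data frame vs. tibble) even when the values agree.
bench::mark(
  nanoparquet = nanoparquet::read_parquet(tmp),
  arrow       = as.data.frame(arrow::read_parquet(tmp)),
  iterations  = 3,
  check       = FALSE
)
```

Note that `bench::mark` tracks memory through R's allocation profiler, so allocations made inside arrow's C++ layer are not counted; that is likely part of why a better way to measure memory is needed.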
Oh, thanks guys for the explanations, very much appreciated! Median is the median time of 3 iterations, so yeah, in the small dataset case nano is 8 times faster than arrow. I'm very happy to use nanoparquet if there's no downside (my use case is basically writing/reading biggish files (1-5 GB) in R, and also reading them in Python or Julia, so I wanted compatibility, speed, and low RAM usage if possible). Thanks again.
I've found in my benchmarks that nanoparquet is much less efficient than arrow in terms of speed and RAM usage.
On the nanoparquet repo they say that speed and RAM usage when reading big files are not very good.
rio already uses arrow for feather, so I'm not sure why we rely on nanoparquet for parquet.
If you keep nanoparquet as the default, maybe we could have an option to use arrow instead?
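For illustration, a sketch of what such an opt-in could look like. The option name `rio.parquet` and the helper below are hypothetical, not part of rio's actual API:

```r
# Hypothetical: dispatch on a user-set option, defaulting to nanoparquet.
read_parquet_file <- function(file) {
  engine <- getOption("rio.parquet", default = "nanoparquet")
  switch(engine,
    arrow       = as.data.frame(arrow::read_parquet(file)),
    nanoparquet = nanoparquet::read_parquet(file),
    stop("unknown parquet engine: ", engine)
  )
}

# Usage: opt in to arrow for the session, then read as usual.
options(rio.parquet = "arrow")
df <- read_parquet_file("big_file.parquet")
```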