Any plan to make a pigzlib like zlib? #50
Comments
I have considered adding multi-threaded compression to zlib. However, I'm not sure what sort of interface people would be looking for. What do you imagine the interface would look like?
It would be great if zlib were multi-threaded. It would be even better if the multi-threaded APIs were similar to the current zlib APIs.
Similar how? More importantly, different how? How would the multi-threading be controlled? I would like to have a specific design for the interface with some level of consensus from potential users before implementing it.
I mostly access zlib as is through gzgetc and gzgets, which don’t have terribly intuitive liftovers. More intuitive would be a pointer to read from like a file, but at that point you might as well just use popen with a shell call to pigz.
"liftover"?
I meant conversion or adaptation (i.e., making a pigz version of gzwrite/gzread/gzgetc/gzgets). I imagine that making those successive calls wouldn’t often benefit from parallelization unless you were filling a sufficiently large buffer and then dispatching its compression in parallel as needed. I guess it mostly depends on how the implementation works. I do imagine I’d want to set the number of threads at file handle creation and leave the arguments to the functions the same as their serial counterparts.
Parallel compression needs much more memory than single-thread compression, both for large data buffers and for the multiple compression engines themselves. gzwrite does not need to do anything right away. You could send it small amounts of data and it could accumulate it in a buffer until it has enough to send to a compression engine in a thread. The user would need to say how many threads they want, and how much memory to use, implying an acceptable latency on accumulating data for chunks of compression.
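The buffering scheme described above can be sketched roughly as follows. This is only an illustration in Python, not a proposal for the actual zlib interface: the class `ParallelGzipWriter` and its parameters `threads`, `chunk_size`, and `max_pending` are hypothetical names. It also swaps in a simpler technique than a real implementation would likely use: each chunk is compressed as an independent gzip member and the members are concatenated, which is still a valid gzip stream but loses some compression at chunk boundaries.

```python
import gzip
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class ParallelGzipWriter:
    """Hypothetical gzwrite-like writer; not part of zlib or pigz."""

    def __init__(self, fileobj, threads=4, chunk_size=128 * 1024,
                 max_pending=None, level=6):
        self._out = fileobj
        self._chunk_size = chunk_size
        self._level = level
        self._buf = bytearray()      # accumulates small writes
        self._pending = deque()      # futures, kept in submission order
        # Rough memory bound: about max_pending * chunk_size bytes in flight.
        self._max_pending = max_pending or 2 * threads
        self._pool = ThreadPoolExecutor(max_workers=threads)

    def write(self, data):
        """Buffer data; dispatch full chunks to the worker threads."""
        self._buf += data
        while len(self._buf) >= self._chunk_size:
            chunk = bytes(self._buf[:self._chunk_size])
            del self._buf[:self._chunk_size]
            self._submit(chunk)
        return len(data)

    def _submit(self, chunk):
        # If too many chunks are in flight, wait for the oldest one first.
        if len(self._pending) >= self._max_pending:
            self._out.write(self._pending.popleft().result())
        self._pending.append(
            self._pool.submit(gzip.compress, chunk, self._level))

    def close(self):
        """Compress any remaining data and write all results in order."""
        if self._buf:
            self._submit(bytes(self._buf))
            self._buf.clear()
        while self._pending:
            self._out.write(self._pending.popleft().result())
        self._pool.shutdown()
```

The thread count and memory budget are fixed when the writer is created, and `write` keeps the same shape as a serial write call. Small writes return immediately until a full chunk has accumulated, which is the latency trade-off mentioned above. The output can be read back by any single-threaded gzip reader, for example `gzip.decompress`.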
By the way, this would only be for compression. Decompression would be single-thread.
I'd expect at least an API that would be a drop-in replacement for zlib, function by function. So, instead of But I'm not entirely sure full compatibility is possible, like for low level primitives. Then on top of that you could add some extra APIs to control/monitor resource usage, but that can be a second feature. This way, adoption of pigzlib would be quite easy and straightforward.
It might make sense to have the library offer both a drop-in replacement and more customizable functions that would let developers specify things such as thread count. It would be pretty awesome to let OpenSSH use multi-threaded compression with a simple addition to the compilation process, for example. I think this is a must.
Or think about rsync with pigz compression. That would be awesome!
A zlib-like library for pigz is wanted.