Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use System lz4, zstd, and blosc #569

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
Open

Conversation

BwL1289
Copy link

@BwL1289 BwL1289 commented Aug 27, 2024

This PR adds the capability to build Python bindings against system-provided blosc, zstd, lz4, and blosc linked zlib-ng instead of the vendored libraries included in numcodecs.

This PR addresses #464, #264, and by extension the comment in #314.

@BwL1289
Copy link
Author

BwL1289 commented Aug 30, 2024

@martindurant bumping this if it's not too inconvenient for you. Let me know how I can help support to get it merged.

@martindurant
Copy link
Member

I may not be the one you want to review this one. Actually I don't understand: does this build wheels which can then be used without the given system libraries?

Previously, I have been trying to argue, yes, that vendoring is unnecessary, but because we can get the same libraries via conda or wheels (e.g., cramjam, which now supports blosc2 also, all in static libs; here is a discussion on zlib-ng).

@BwL1289
Copy link
Author

BwL1289 commented Aug 30, 2024

This allows to build numcodecs against already installed system libraries instead of the vendored ones.

Using vendored libs when you're compiling & using those libraries yourself in other places may lead to strange and hard to debug bugs due to binary incompatibilities as not only API and ABI have to be the same in such cases and objects must be fully identical in RAM which includes hidden fields (which are affected by defines).

Looping in @joshmoore if that's OK.

Thanks so much.

@martindurant
Copy link
Member

Using vendored libs when you're compiling ...

So using the statically linked cramjam or similar is a solution, no?

I still don't understand whether the proposed workflow here can be used for distribution. The comment above suggests "no" to me.

@martindurant
Copy link
Member

By "statically inked", I mean this is what cramjam does internally, so for numcodecs it would be a simple dependency, no compile step at all.

@BwL1289
Copy link
Author

BwL1289 commented Aug 30, 2024

@martindurant do you know when numcodecs will switch to cramjam? Until then, I think this PR can help fill the gap for people needing an interim solution.

It may not be good for distribution, but it's necessary for people who are building numcodecs, its dependencies, and libraries in general from sources. This PR doesn't remove the ability to build with vendored dependency sources, so it's purely additive.

@martindurant
Copy link
Member

I don't know, and I can't say for sure it will happen - I just think it should. This is further complicate by the changes required by zarr3.

@joshmoore
Copy link
Member

Thanks, @BwL1289. 👍 for having this as a new flag that leaves everything untouched unless someone invokes it. The only possible downside I can imagine is the (slight) increase in complexity of the build logic, but it looks generally well done. From my side, no objections. Pinging @jakirkham and @dstansby for any potential concerns.

(This should also not dissuade anyone from picking up the cramjam flag, of course 😄)

@BwL1289
Copy link
Author

BwL1289 commented Aug 31, 2024

Thank you @joshmoore, much appreciated.

And agreed re: cramjam.

I think the slight increase in build complexity is a worthwhile tradeoff in the interim for folks who need the functionality.

Happy to make improvements wherever possible.

@dstansby
Copy link
Contributor

Could you expand a bit on what the use case is of this? I think all the use cases described in #464 would be fixed by depending on Python packages such that we don't need to build the libraries ourselves.

On a project level I'm reluctant about this because it removes the tight coupling between numcodecs versions and compressor library versions, and opens the possibility of numcodecs binaries existing in the wild that have arbiratry versions of compression libraries included, which then makes it harder for us to debug what loreoks like a 'standard' numcodecs package.

On a practical level I'm reluctant about this because it is untested, and adds yet more custom code to setup.py to maintain (and we are short on maintenance time!)... the second worry would be much less of a worry if this was tested, so I think a condition to merging this should be having a test run in GitHub actions that checks the new code in setup.py works.

FWIW I've also been playing around with un-bundling sources, starting with blosc: #569. I've just opened that as a proof of concept draft, but if there's demand for that work to continue then I'm happy to push it forward (I guess using cranjam + blosc as the two external libraries numcodecs depends on?).

@QuLogic
Copy link
Contributor

QuLogic commented Oct 12, 2024

FWIW, I've been running with unbundled libraries on Fedora for several years, and have not had any failures in tests. Being able to do this without patches would be welcome, but I can continue to rebase those patches for my own use.

As for custom code in setup.py, I know it's a big change, but the move to Meson in Matplotlib made this sort of thing easier, as distributed wheels would use bundled subprojects, while distributors could disable them and link against existing libraries.

But of course, moving to not needing the external libraries would be even simpler.

@doronbehar
Copy link

On a practical level I'm reluctant about this because it is untested, and adds yet more custom code to setup.py to maintain (and we are short on maintenance time!)... the second worry would be much less of a worry if this was tested, so I think a condition to merging this should be having a test run in GitHub actions that checks the new code in setup.py works.

Indeed a GitHub action would have been great. I can say as a downstream maintainer of numcodecs on NixOS that I'd love to see this get merged, as it saves disk space for users of our distribution and opens many other customization possibilities for us. Even without a GitHub action, we should be able to report to you issues with this patch.

On 0.13.1 this is what works for us:
diff --git c/setup.py w/setup.py
index b6db079..d468b10 100644
--- c/setup.py
+++ w/setup.py
@@ -9,6 +9,7 @@ import cpuinfo
 from Cython.Distutils.build_ext import new_build_ext as build_ext
 from setuptools import Extension, setup
 from setuptools.errors import CCompilerError, ExecError, PlatformError
+import pkgconfig
 
 # determine CPU support for SSE2 and AVX2
 cpu_info = cpuinfo.get_cpu_info()
@@ -17,6 +18,7 @@ have_sse2 = 'sse2' in flags
 have_avx2 = 'avx2' in flags
 disable_sse2 = 'DISABLE_NUMCODECS_SSE2' in os.environ
 disable_avx2 = 'DISABLE_NUMCODECS_AVX2' in os.environ
+use_system_libraries = 'NUMCODECS_USE_SYSTEM_LIBS' in os.environ
 
 # setup common compile arguments
 have_cflags = 'CFLAGS' in os.environ
@@ -49,8 +51,8 @@ def error(*msg):
     print('[numcodecs]', *msg, **kwargs)
 
 
-def blosc_extension():
-    info('setting up Blosc extension')
+def _blosc_extension_with_vendored_libs():
+    info('setting up Blosc extension from vendored sources')
 
     extra_compile_args = base_compile_args.copy()
     define_macros = []
@@ -125,8 +127,68 @@ def blosc_extension():
     return extensions
 
 
-def zstd_extension():
-    info('setting up Zstandard extension')
+def _blosc_extension_with_system_libs():
+    info('setting up Blosc extension with system libraries')
+
+    extra_compile_args = base_compile_args.copy()
+
+    blosc_package_configuration = pkgconfig.parse("blosc")
+
+    define_macros = blosc_package_configuration["define_macros"]
+    include_dirs = blosc_package_configuration["include_dirs"]
+    libraries = blosc_package_configuration["libraries"]
+    library_dirs = blosc_package_configuration["library_dirs"]
+
+    # remove minizip because Python.h 3.8 tries to include crypt.h
+    include_dirs = [d for d in include_dirs if 'minizip' not in d]
+
+    # define_macros += [('CYTHON_TRACE', '1')]
+
+    # SSE2
+    if have_sse2 and not disable_sse2:
+        info('compiling Blosc extension with SSE2 support')
+        extra_compile_args.append('-DSHUFFLE_SSE2_ENABLED')
+        if os.name == 'nt':
+            define_macros += [('__SSE2__', 1)]
+    else:
+        info('compiling Blosc extension without SSE2 support')
+
+    # AVX2
+    if have_avx2 and not disable_avx2:
+        info('compiling Blosc extension with AVX2 support')
+        extra_compile_args.append('-DSHUFFLE_AVX2_ENABLED')
+        if os.name == 'nt':
+            define_macros += [('__AVX2__', 1)]
+    else:
+        info('compiling Blosc extension without AVX2 support')
+
+    sources = ['numcodecs/blosc.pyx']
+
+    # define extension module
+    extensions = [
+        Extension(
+            'numcodecs.blosc',
+            sources=sources,
+            include_dirs=include_dirs,
+            define_macros=define_macros,
+            extra_compile_args=extra_compile_args,
+            libraries=libraries,
+            library_dirs=library_dirs,
+        ),
+    ]
+
+    return extensions
+
+
+def blosc_extension():
+    if use_system_libraries:
+        return _blosc_extension_with_system_libs()
+    else:
+        return _blosc_extension_with_vendored_libs()
+
+
+def _zstd_extension_with_vendored_sources():
+    info('setting up Zstandard extension from vendored sources')
 
     zstd_sources = []
     extra_compile_args = base_compile_args.copy()
@@ -167,8 +229,46 @@ def zstd_extension():
     return extensions
 
 
-def lz4_extension():
-    info('setting up LZ4 extension')
+def _zstd_extension_with_system_libs():
+    info('setting up Zstandard extension with system libraries')
+
+    extra_compile_args = base_compile_args.copy()
+
+    zstd_package_configuration = pkgconfig.parse("libzstd")
+    include_dirs = zstd_package_configuration["include_dirs"]
+    define_macros = zstd_package_configuration["define_macros"]
+    libraries = zstd_package_configuration["libraries"]
+    library_dirs = zstd_package_configuration["library_dirs"]
+
+    # define_macros += [('CYTHON_TRACE', '1')]
+
+    sources = ['numcodecs/zstd.pyx']
+
+    # define extension module
+    extensions = [
+        Extension(
+            'numcodecs.zstd',
+            sources=sources,
+            include_dirs=include_dirs,
+            define_macros=define_macros,
+            extra_compile_args=extra_compile_args,
+            libraries=libraries,
+            library_dirs=library_dirs,
+        ),
+    ]
+
+    return extensions
+
+
+def zstd_extension():
+    if use_system_libraries:
+        return _zstd_extension_with_system_libs()
+    else:
+        return _zstd_extension_with_vendored_sources()
+
+
+def _lz4_extension_with_vendored_sources():
+    info('setting up LZ4 extension from vendored sources')
 
     extra_compile_args = base_compile_args.copy()
     define_macros = []
@@ -195,6 +295,45 @@ def lz4_extension():
     return extensions
 
 
+def _lz4_extension_with_system_libs():
+    info('setting up LZ4 extension with system libraries')
+
+    extra_compile_args = base_compile_args.copy()
+
+    lz4_package_configuration = pkgconfig.parse("liblz4")
+    include_dirs = lz4_package_configuration["include_dirs"]
+    define_macros = lz4_package_configuration["define_macros"]
+    libraries = lz4_package_configuration["libraries"]
+    library_dirs = lz4_package_configuration["library_dirs"]
+
+    include_dirs += ['numcodecs']
+    # define_macros += [('CYTHON_TRACE', '1')]
+
+    sources = ['numcodecs/lz4.pyx']
+
+    # define extension module
+    extensions = [
+        Extension(
+            'numcodecs.lz4',
+            sources=sources,
+            include_dirs=include_dirs,
+            define_macros=define_macros,
+            extra_compile_args=extra_compile_args,
+            libraries=libraries,
+            library_dirs=library_dirs,
+        ),
+    ]
+
+    return extensions
+
+
+def lz4_extension():
+    if use_system_libraries:
+        return _lz4_extension_with_system_libs()
+    else:
+        return _lz4_extension_with_vendored_sources()
+
+
 def vlen_extension():
     info('setting up vlen extension')
     import numpy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants