-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use System lz4, zstd, and blosc #569
base: main
Are you sure you want to change the base?
Conversation
…st system libraries
@martindurant bumping this if it's not too inconvenient for you. Let me know how I can help support to get it merged. |
I may not be the one you want to review this one. Actually I don't understand: does this build wheels which can then be used without the given system libraries? Previously, I have been trying to argue, yes, that vendoring is unnecessary, but because we can get the same libraries via conda or wheels (e.g., cramjam, which now supports blosc2 also, all in static libs; here is a discussion on zlib-ng). |
This allows to build numcodecs against already installed system libraries instead of the vendored ones. Using vendored libs when you're compiling & using those libraries yourself in other places may lead to strange and hard to debug bugs due to binary incompatibilities as not only API and ABI have to be the same in such cases and objects must be fully identical in RAM which includes hidden fields (which are affected by defines). Looping in @joshmoore if that's OK. Thanks so much. |
So using the statically linked cramjam or similar is a solution, no? I still don't understand whether the proposed workflow here can be used for distribution. The comment above suggests "no" to me. |
By "statically inked", I mean this is what cramjam does internally, so for numcodecs it would be a simple dependency, no compile step at all. |
@martindurant do you know when numcodecs will switch to cramjam? Until then, I think this PR can help fill the gap for people needing an interim solution. It may not be good for distribution, but it's necessary for people who are building numcodecs, its dependencies, and libraries in general from sources. This PR doesn't remove the ability to build with vendored dependency sources, so it's purely additive. |
I don't know, and I can't say for sure it will happen - I just think it should. This is further complicate by the changes required by zarr3. |
Thanks, @BwL1289. 👍 for having this as a new flag that leaves everything untouched unless someone invokes it. The only possible downside I can imagine is the (slight) increase in complexity of the build logic, but it looks generally well done. From my side, no objections. Pinging @jakirkham and @dstansby for any potential concerns. (This should also not dissuade anyone from picking up the cramjam flag, of course 😄) |
Thank you @joshmoore, much appreciated. And agreed re: cramjam. I think the slight increase in build complexity is a worthwhile tradeoff in the interim for folks who need the functionality. Happy to make improvements wherever possible. |
Could you expand a bit on what the use case is of this? I think all the use cases described in #464 would be fixed by depending on Python packages such that we don't need to build the libraries ourselves. On a project level I'm reluctant about this because it removes the tight coupling between numcodecs versions and compressor library versions, and opens the possibility of numcodecs binaries existing in the wild that have arbiratry versions of compression libraries included, which then makes it harder for us to debug what loreoks like a 'standard' numcodecs package. On a practical level I'm reluctant about this because it is untested, and adds yet more custom code to setup.py to maintain (and we are short on maintenance time!)... the second worry would be much less of a worry if this was tested, so I think a condition to merging this should be having a test run in GitHub actions that checks the new code in setup.py works. FWIW I've also been playing around with un-bundling sources, starting with blosc: #569. I've just opened that as a proof of concept draft, but if there's demand for that work to continue then I'm happy to push it forward (I guess using cranjam + blosc as the two external libraries numcodecs depends on?). |
FWIW, I've been running with unbundled libraries on Fedora for several years, and have not had any failures in tests. Being able to do this without patches would be welcome, but I can continue to rebase those patches for my own use. As for custom code in But of course, moving to not needing the external libraries would be even simpler. |
Indeed a GitHub action would have been great. I can say as a downstream maintainer of On 0.13.1 this is what works for us:diff --git c/setup.py w/setup.py
index b6db079..d468b10 100644
--- c/setup.py
+++ w/setup.py
@@ -9,6 +9,7 @@ import cpuinfo
from Cython.Distutils.build_ext import new_build_ext as build_ext
from setuptools import Extension, setup
from setuptools.errors import CCompilerError, ExecError, PlatformError
+import pkgconfig
# determine CPU support for SSE2 and AVX2
cpu_info = cpuinfo.get_cpu_info()
@@ -17,6 +18,7 @@ have_sse2 = 'sse2' in flags
have_avx2 = 'avx2' in flags
disable_sse2 = 'DISABLE_NUMCODECS_SSE2' in os.environ
disable_avx2 = 'DISABLE_NUMCODECS_AVX2' in os.environ
+use_system_libraries = 'NUMCODECS_USE_SYSTEM_LIBS' in os.environ
# setup common compile arguments
have_cflags = 'CFLAGS' in os.environ
@@ -49,8 +51,8 @@ def error(*msg):
print('[numcodecs]', *msg, **kwargs)
-def blosc_extension():
- info('setting up Blosc extension')
+def _blosc_extension_with_vendored_libs():
+ info('setting up Blosc extension from vendored sources')
extra_compile_args = base_compile_args.copy()
define_macros = []
@@ -125,8 +127,68 @@ def blosc_extension():
return extensions
-def zstd_extension():
- info('setting up Zstandard extension')
+def _blosc_extension_with_system_libs():
+ info('setting up Blosc extension with system libraries')
+
+ extra_compile_args = base_compile_args.copy()
+
+ blosc_package_configuration = pkgconfig.parse("blosc")
+
+ define_macros = blosc_package_configuration["define_macros"]
+ include_dirs = blosc_package_configuration["include_dirs"]
+ libraries = blosc_package_configuration["libraries"]
+ library_dirs = blosc_package_configuration["library_dirs"]
+
+ # remove minizip because Python.h 3.8 tries to include crypt.h
+ include_dirs = [d for d in include_dirs if 'minizip' not in d]
+
+ # define_macros += [('CYTHON_TRACE', '1')]
+
+ # SSE2
+ if have_sse2 and not disable_sse2:
+ info('compiling Blosc extension with SSE2 support')
+ extra_compile_args.append('-DSHUFFLE_SSE2_ENABLED')
+ if os.name == 'nt':
+ define_macros += [('__SSE2__', 1)]
+ else:
+ info('compiling Blosc extension without SSE2 support')
+
+ # AVX2
+ if have_avx2 and not disable_avx2:
+ info('compiling Blosc extension with AVX2 support')
+ extra_compile_args.append('-DSHUFFLE_AVX2_ENABLED')
+ if os.name == 'nt':
+ define_macros += [('__AVX2__', 1)]
+ else:
+ info('compiling Blosc extension without AVX2 support')
+
+ sources = ['numcodecs/blosc.pyx']
+
+ # define extension module
+ extensions = [
+ Extension(
+ 'numcodecs.blosc',
+ sources=sources,
+ include_dirs=include_dirs,
+ define_macros=define_macros,
+ extra_compile_args=extra_compile_args,
+ libraries=libraries,
+ library_dirs=library_dirs,
+ ),
+ ]
+
+ return extensions
+
+
+def blosc_extension():
+ if use_system_libraries:
+ return _blosc_extension_with_system_libs()
+ else:
+ return _blosc_extension_with_vendored_libs()
+
+
+def _zstd_extension_with_vendored_sources():
+ info('setting up Zstandard extension from vendored sources')
zstd_sources = []
extra_compile_args = base_compile_args.copy()
@@ -167,8 +229,46 @@ def zstd_extension():
return extensions
-def lz4_extension():
- info('setting up LZ4 extension')
+def _zstd_extension_with_system_libs():
+ info('setting up Zstandard extension with system libraries')
+
+ extra_compile_args = base_compile_args.copy()
+
+ zstd_package_configuration = pkgconfig.parse("libzstd")
+ include_dirs = zstd_package_configuration["include_dirs"]
+ define_macros = zstd_package_configuration["define_macros"]
+ libraries = zstd_package_configuration["libraries"]
+ library_dirs = zstd_package_configuration["library_dirs"]
+
+ # define_macros += [('CYTHON_TRACE', '1')]
+
+ sources = ['numcodecs/zstd.pyx']
+
+ # define extension module
+ extensions = [
+ Extension(
+ 'numcodecs.zstd',
+ sources=sources,
+ include_dirs=include_dirs,
+ define_macros=define_macros,
+ extra_compile_args=extra_compile_args,
+ libraries=libraries,
+ library_dirs=library_dirs,
+ ),
+ ]
+
+ return extensions
+
+
+def zstd_extension():
+ if use_system_libraries:
+ return _zstd_extension_with_system_libs()
+ else:
+ return _zstd_extension_with_vendored_sources()
+
+
+def _lz4_extension_with_vendored_sources():
+ info('setting up LZ4 extension from vendored sources')
extra_compile_args = base_compile_args.copy()
define_macros = []
@@ -195,6 +295,45 @@ def lz4_extension():
return extensions
+def _lz4_extension_with_system_libs():
+ info('setting up LZ4 extension with system libraries')
+
+ extra_compile_args = base_compile_args.copy()
+
+ lz4_package_configuration = pkgconfig.parse("liblz4")
+ include_dirs = lz4_package_configuration["include_dirs"]
+ define_macros = lz4_package_configuration["define_macros"]
+ libraries = lz4_package_configuration["libraries"]
+ library_dirs = lz4_package_configuration["library_dirs"]
+
+ include_dirs += ['numcodecs']
+ # define_macros += [('CYTHON_TRACE', '1')]
+
+ sources = ['numcodecs/lz4.pyx']
+
+ # define extension module
+ extensions = [
+ Extension(
+ 'numcodecs.lz4',
+ sources=sources,
+ include_dirs=include_dirs,
+ define_macros=define_macros,
+ extra_compile_args=extra_compile_args,
+ libraries=libraries,
+ library_dirs=library_dirs,
+ ),
+ ]
+
+ return extensions
+
+
+def lz4_extension():
+ if use_system_libraries:
+ return _lz4_extension_with_system_libs()
+ else:
+ return _lz4_extension_with_vendored_sources()
+
+
def vlen_extension():
info('setting up vlen extension')
import numpy |
This PR adds the capability to build Python bindings against system-provided
blosc
,zstd
,lz4
, and blosc linkedzlib-ng
instead of the vendored libraries included innumcodecs
.This PR addresses #464, #264, and by extension the comment in #314.