implement fletcher32 #412
The errors showing up are for non-numpy inputs to JSON and msgpack - nothing to do with this PR. I'll fill in the docs and such shortly. |
This looks amazing Martin! 🚀 Thanks so much for doing it. One question: have you verified that this implementation is interoperable with the hdf5 and netcdf4 implementations? Like, if netcdf4 writes a chunk with fletcher32, does this codec successfully decode it? |
No, not yet. The tests just come from examples on Wikipedia. Do you think we should bundle a small hdf file, or perhaps just extract a bytes buffer into a test? |
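For reference, the Wikipedia-style Fletcher-32 behind those test vectors can be sketched in a few lines of Python. This is a textbook version (little-endian 16-bit words, sums modulo 65535, odd-length input zero-padded), not the Cython code in this PR; the name `fletcher32_textbook` is only illustrative, and whether HDF5 computes the same value is exactly what the interoperability question is probing.

```python
import struct


def fletcher32_textbook(data: bytes) -> int:
    """Textbook Fletcher-32 over little-endian 16-bit words (illustrative only)."""
    if len(data) % 2:
        # Pad odd-length input with a zero byte so it splits into 16-bit words.
        data += b"\x00"
    sum1 = sum2 = 0
    for (word,) in struct.iter_unpack("<H", data):
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    # High 16 bits hold the running sum of sums, low 16 bits the simple sum.
    return (sum2 << 16) | sum1


print(hex(fletcher32_textbook(b"abcde")))   # 0xf04fc729
print(hex(fletcher32_textbook(b"abcdef")))  # 0x56502d2a
```

The two printed values match the commonly quoted test vectors for these inputs.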
Co-authored-by: Ryan Abernathey <[email protected]>
I would try to get a short snippet of actual bytes. I'd probably use the approach from fsspec/kerchunk#274: generate a tiny netcdf file using xarray with fletcher32 on, extract a chunk manually using kerchunk, print the bytes to the terminal, and then copy-paste that into this PR.
Here is an example of such a string of bytes:
b'x\xda\xb3\xf1\x8b\xd3gdb\x00\x02\xf1\xa3\x00'
This was generated from data written with this encoding:
encoding = {
    'zlib': True,
    'compression': 'zlib',
    'shuffle': True,
    'complevel': 8,
    'fletcher32': True,
    'contiguous': False,
    'chunksizes': (1, 4)
}
Question: in what order is the fletcher checksum applied? Before or after zlib? |
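One way to probe the ordering question from a raw chunk, sketched with only the standard library: decompress with `zlib.decompressobj` and see whether any bytes are left over once the zlib stream ends. Four leftover bytes would suggest a checksum appended after compression; none would suggest that any checksum sits inside the compressed payload. The helper name is hypothetical, and the demo uses synthetic data rather than the chunk quoted in the comment.

```python
import zlib


def trailing_bytes_after_zlib(chunk: bytes) -> bytes:
    """Return whatever follows the end of the zlib stream in `chunk`."""
    d = zlib.decompressobj()
    d.decompress(chunk)
    # Bytes past the end of the compressed stream are collected here.
    return d.unused_data


# Synthetic demonstration: a zlib stream with and without a 4-byte trailer.
payload = zlib.compress(b"\x00\x01\x02\x03" * 8)
fake_checksum = b"\xde\xad\xbe\xef"  # stand-in value, not a real checksum

print(len(trailing_bytes_after_zlib(payload)))                  # 0
print(len(trailing_bytes_after_zlib(payload + fake_checksum)))  # 4
```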
Well it was a good idea to check! Their implementation is not the same, so I thought it best to just embed it directly. This ought to be faster too, if that's important. |
Actually RuntimeError would be more consistent with what other codecs do when decompression fails.
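A minimal sketch of that behaviour, assuming the checksum is stored as a 4-byte little-endian trailer and using a textbook Fletcher-32 rather than the PR's implementation. The function names are illustrative, not the numcodecs API.

```python
import struct


def _checksum(data: bytes) -> int:
    """Textbook Fletcher-32 (assumption: not necessarily HDF5's variant)."""
    if len(data) % 2:
        data += b"\x00"
    sum1 = sum2 = 0
    for (word,) in struct.iter_unpack("<H", data):
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


def encode(buf: bytes) -> bytes:
    """Append the checksum of `buf` as a 4-byte little-endian trailer."""
    return buf + struct.pack("<I", _checksum(buf))


def decode(buf: bytes) -> bytes:
    """Strip and verify the trailer, raising RuntimeError on a mismatch."""
    if len(buf) < 4:
        raise RuntimeError("buffer too short to contain a checksum")
    payload, trailer = buf[:-4], buf[-4:]
    (stored,) = struct.unpack("<I", trailer)
    found = _checksum(payload)
    if found != stored:
        raise RuntimeError(
            f"checksum mismatch: stored {stored:#010x}, computed {found:#010x}"
        )
    return payload
```

Raising the same exception type as the decompression codecs lets callers handle a corrupted chunk in one place, whichever filter detected it.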
OK, @jakirkham, it didn't turn out to be too bad. The algorithm is obviously the same as the original. I didn't understand why the class should be moved to a different pure-python module, though. lz4, blosc, zstd and vlen all have Codecs in their respective pyx files. (I must say, they are surprisingly complex!) |
Thanks Martin! 🙏
Yeah we have some technical debt to work through for sure. Since these are now in the same file, agree this matches the existing pattern. Though at some point we might want to split these apart to simplify things. We need not do that here. |
Co-authored-by: Ryan Abernathey <[email protected]>
Do we have an existing issue to track the test failures we are seeing in this PR? As @martindurant says, they are not related to the new codec. But they will need to be fixed asap. |
This is a very useful contribution to numcodecs. Thanks a lot Martin! Provided we understand why the tests are failing and have a plan to fix that elsewhere, I'm happy to see this merged.
This has lingered for a while, but I think it looks good. We should get this in. I have no idea what's going on with the tests. Maybe they are fixed by #417? Martin, do you want to try to rebase? |
Codecov Report
@@            Coverage Diff            @@
##              main      #412   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           54        55    +1
  Lines         2095      2121   +26
=========================================
+ Hits          2095      2121   +26
|
@martindurant @rabernat @jakirkham: did a release of this get discussed at any point while I was off gallivanting? |
No discussion I am aware of |
Raised an issue ( #437 ) to discuss |
Fixes #410 cc @rabernat
TODO: