Parametrized tests for netcdf encoding options #274
Thanks for making sure this got tested.
Awesome! Great to hear that you can see a solution. Feel free to push to my branch if you want to just fix it in this PR.
Already did :)
😆 you're too fast Martin! A great next step would be to actually implement the real checksum filter in numcodecs, rather than a dummy passthrough. I found implementations here: https://github.com/njaladan/hashpy/blob/master/hashpy/fletcherNbit.py
Agree - but kerchunk is all about fast and functional! https://en.wikipedia.org/wiki/Fletcher%27s_checksum#Fletcher-32 https://gist.github.com/AJPoulter-Soton/9d0d2505af64f0719bdee59b9a4533ba ?
I can't imagine that the computational cost of fletcher32 would be comparable to the actual decompression step (not to mention the I/O itself).
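For reference, the textbook Fletcher-32 from the Wikipedia article linked above can be sketched in a few lines of pure Python. This is only an illustration of the algorithm: the function name is made up here, the little-endian word order and zero padding are assumptions, and the result is not guaranteed to be bit-identical to what the HDF5 filter or any numcodecs codec produces.

```python
def fletcher32(data: bytes) -> int:
    """Textbook Fletcher-32 over 16-bit little-endian words (illustrative only)."""
    # Assumption: odd-length input is padded with a single zero byte.
    if len(data) % 2:
        data += b"\x00"
    sum1 = 0
    sum2 = 0
    for i in range(0, len(data), 2):
        word = int.from_bytes(data[i:i + 2], "little")
        sum1 = (sum1 + word) % 65535  # running sum of the words
        sum2 = (sum2 + sum1) % 65535  # running sum of the running sums
    return (sum2 << 16) | sum1


# Test vectors from the Wikipedia article
print(hex(fletcher32(b"abcde")))   # 0xf04fc729
print(hex(fletcher32(b"abcdef")))  # 0x56502d2a
```

The loop does two additions and two reductions per 16-bit word, which is consistent with the expectation that the checksum is cheap next to decompression. A compiled implementation would be needed for any realistic timing.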
        return buff[:-4]

    def encode(self, buf):
        pass
Wouldn't you want to append 4 empty bytes here, just in case someone tried to use it in a round-trip context?
Hm, I'm not sure. Maybe raising NotImplementedError is the correct thing to do, as the output can't be used by anything that really does the check.
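A minimal sketch of that option, assuming a passthrough codec shaped like the snippet under review. The class name and `codec_id` are hypothetical, and the class is a plain Python stand-in rather than a registered numcodecs codec:

```python
class FletcherPassthrough:
    """Hypothetical read-only codec: drops the checksum without verifying it."""

    codec_id = "fletcher32_passthrough"  # hypothetical identifier

    def decode(self, buf, out=None):
        # Assumption: the last 4 bytes of each chunk hold the Fletcher-32 checksum.
        return bytes(buf)[:-4]

    def encode(self, buf):
        # Appending 4 empty bytes would yield data that any real checker rejects,
        # so refuse to encode instead of silently writing an invalid checksum.
        raise NotImplementedError("passthrough codec is decode-only")


codec = FletcherPassthrough()
print(codec.decode(b"payload\x01\x02\x03\x04"))  # b'payload'
```

Raising makes a round-trip attempt fail loudly at write time, whereas padding with zeros would only fail later, in whichever reader actually verifies the checksum.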
Incoming new codec: zarr-developers/numcodecs#412 (so there would be no need for one here once that is in)
I can change this to point to the new fletcher codec in numcodecs. Is the null version, skipping the CPU time to do the comparison, useful? I suppose fletcher ought to be pretty cheap to compute.
I personally consider it quite dangerous to pass through the data without actually checking the checksum. Since we now have a fast fletcher32 codec in numcodecs, I see no reason to keep the passthrough codec. Profiling would be easy to do if we want some actual numbers.
We'll need a numcodecs release anyway |
This PR adds a test to verify that kerchunk can decode all possible flavors of netcdf4.
This reveals that fletcher32 decoding is broken, despite #34 and #35. Attempting to decode data written with zlib=True and fletcher32=True always fails.
xref pydata/xarray#7388
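As a rough sketch of how such a parametrized test could enumerate the encoding flavors (not the code in this PR): the option grid below is an assumption, with `shuffle` added alongside the `zlib` and `fletcher32` options mentioned above, and the helper name is made up.

```python
import itertools

# Assumed option grid; the PR's actual parameter set may differ.
OPTIONS = {
    "zlib": [False, True],
    "shuffle": [False, True],
    "fletcher32": [False, True],
}


def encoding_grid(options):
    """Return one encoding dict per combination of option values."""
    keys = list(options)
    combos = itertools.product(*(options[k] for k in keys))
    return [dict(zip(keys, values)) for values in combos]


GRID = encoding_grid(OPTIONS)
print(len(GRID))  # 8
```

Each entry in `GRID` could then be passed to `pytest.mark.parametrize` so that every combination, including zlib=True with fletcher32=True, gets its own test case.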