-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug with storing raw arrays #76
Comments
ooof, that is annoying. couple of things: |
The immediate part was just to sort out a possible corruption at the storage level, but we discovered the issue on an array stored for a long time. The example above is made to reproduce with synthetic data, but on the original array where we discovered the bug, we got a pretty big difference in the data: Compressed to 0.06% the size
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-11-53e61b81e11e> in <module>()
6 cci.upload_raw_array("20200409_cc_test", array_in)
7 array_out = cci.download_raw_array("20200409_cc_test")
----> 8 np.testing.assert_array_equal(array_in, array_out)
9 print('success !!')
/home/jlg/tomdlt/miniconda3/envs/py27/lib/python2.7/site-packages/numpy/testing/_private/utils.pyc in assert_array_equal(x, y, err_msg, verbose)
902 __tracebackhide__ = True # Hide traceback for py.test
903 assert_array_compare(operator.__eq__, x, y, err_msg=err_msg,
--> 904 verbose=verbose, header='Arrays are not equal')
905
906
/home/jlg/tomdlt/miniconda3/envs/py27/lib/python2.7/site-packages/numpy/testing/_private/utils.pyc in assert_array_compare(comparison, x, y, err_msg, verbose, header, precision, equal_nan, equal_inf)
825 verbose=verbose, header=header,
826 names=('x', 'y'), precision=precision)
--> 827 raise AssertionError(msg)
828 except ValueError:
829 import traceback
AssertionError:
Arrays are not equal
Mismatch: 18%
Max absolute difference: 60.51625653
Max relative difference: 36.06609662
x: array([[[64.188949, 61.733371, 62.085814, ..., 62.085814, 62.085814,
63.139827],
[60.672661, 57.098978, 57.459062, ..., 57.459062, 57.098978,...
y: array([[[64.188949, 64.188949, 64.188949, ..., 64.188949, 64.188949,
64.188949],
[64.188949, 64.188949, 64.188949, ..., 64.188949, 64.188949,... |
Interestingly, the histograms of values are identical, which means that it is probably just a reordering of values, coming from an incorrect handling of non-contiguous arrays. hist_in = np.histogram(array_in.ravel(), bins=100)
hist_out = np.histogram(array_out.ravel(), bins=100)
np.testing.assert_array_equal(hist_in[0], hist_out[0])
np.testing.assert_array_equal(hist_in[1], hist_out[1]) |
This is confirmed by the following test, which passes (assuming that np.testing.assert_array_equal(array_in, array_out.reshape(array_out.shape[::-1]).T) It seems that the array is stored with the correct data and shape, but the strides/ordering are lost in the process. An easy fix would be to use |
just to follow up from our conversation. this can be fixed in this code block: A more solid solution is to remove PY2 support. We also need to add a test for large arrays. |
With @gxlilyBerkeley, we identified a bug when storing raw arrays, which was quite annoying.
If we upload some arrays and download them immediately, they might be changed in the process. We narrowed down the bug to the following conditions:
Here is a reproducing example:
The text was updated successfully, but these errors were encountered: