SNOW-1025489: inconsistent timestamp downscaling #1868
Comments
Thanks @jwyang-qraft for reaching out! We will look into the issue.
As others have pointed out, the real problem is that the data you are requesting from Snowflake cannot be represented in Arrow at nanosecond precision. I got around the issue by explicitly casting the column down to microsecond (us) precision in the query (see the sketch at the end of this comment). I agree that us automatically trying to fit the data into a lower precision only leads to issues in the long run, as it boxes us into 2 options:
1. Keep silently downscaling timestamps to a lower precision, losing detail without warning.
2. Stop downscaling and raise an error instead, requiring users to cast the column explicitly before fetching.
I much prefer option number 2, as it makes the need for precision loss explicit and allows the users to evaluate if this is okay, or if some computation needs to be moved into Snowflake before the data is pulled out. However, it's important to note that both of these options are technically backwards incompatible, so a major version bump will be necessary either way.
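A minimal sketch of that workaround, assuming hypothetical names (`my_table`, `ts_col`) and placeholder credentials that are not part of this issue; it simply casts the TIMESTAMP_NTZ(9) column down to microsecond precision before fetching:

```python
# Hedged sketch; my_table, ts_col and the credentials are illustrative assumptions.
import snowflake.connector

conn = snowflake.connector.connect(account="...", user="...", password="...")
cur = conn.cursor()

# Casting to TIMESTAMP_NTZ(6) keeps every value representable as a 64-bit
# microsecond timestamp, so no per-batch downscaling is needed on the Arrow side.
cur.execute("SELECT ts_col::TIMESTAMP_NTZ(6) AS ts_col FROM my_table")
table = cur.fetch_arrow_all()
print(table.schema)
```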
What is the maximum supported range of the timestamp value? Is there an acceptable range?
@sfc-gh-areddy please see https://arrow.apache.org/docs/python/timestamps.html for an explanation.
For nanosecond precision the limits follow from the 64-bit two's-complement min and max values, -9223372036854775808 and 9223372036854775807.
Plugging these numbers into a datetime object (only approximately, because datetime objects have no nanosecond support) gives a representable range of roughly 1677-09-21 to 2262-04-11.
Doing the same calculation for microseconds, the datetime overflows, because the representable range is far wider than the years 1 through 9999 that datetime supports.
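A minimal sketch of that calculation (the constants are the standard 64-bit two's-complement bounds; the exact snippet from the original comment is not reproduced above):

```python
import datetime

INT64_MIN = -(2**63)      # -9223372036854775808
INT64_MAX = 2**63 - 1     #  9223372036854775807

epoch = datetime.datetime(1970, 1, 1)

# Nanosecond precision: divide by 1e9 to get seconds. datetime has no
# nanosecond field, so the result is only approximate.
print(epoch + datetime.timedelta(seconds=INT64_MIN / 1_000_000_000))  # ~1677-09-21
print(epoch + datetime.timedelta(seconds=INT64_MAX / 1_000_000_000))  # ~2262-04-11

# Microsecond precision: the same range spans roughly +/- 292,000 years,
# which overflows datetime (it only supports years 1 through 9999).
try:
    print(epoch + datetime.timedelta(seconds=INT64_MAX / 1_000_000))
except OverflowError as exc:
    print("overflow:", exc)
```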
@sfc-gh-mkeller thanks for the detailed explanation of the bug. We experienced the bug ourselves, and I think none of your proposed solutions is a good fit for us.
With your solution number 2 this would mean:
Both solutions seem quite inconvenient, and I think there are many use cases in which people don't care so much about timestamp resolution. So I would suggest two different kinds of solutions:
I like your option 2, Dennis, it provides true convenience to the users without confusion. I just want to catch up anyone new to this issue: we will stop doing this magic of automatically downscaling timestamps.
Python version
Python 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0]
Operating system and processor architecture
Linux-5.4.0-165-generic-x86_64-with-glibc2.31
Installed packages
What did you do?
What did you expect to see?
I tried to fetch a column with TIMESTAMP_NTZ(9) dtype whose maximum datetime is '9999-12-31 00:00:00.000' and minimum is '1987-01-30 23:59:59.000'. I get the following error when I select from that column.
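A hedged reproduction sketch, assuming a hypothetical table `events` with a TIMESTAMP_NTZ(9) column `ts` spanning the values above (none of these names or credentials come from the report):

```python
# Illustrative only; events, ts and the credentials are assumptions.
import snowflake.connector

conn = snowflake.connector.connect(account="...", user="...", password="...")
cur = conn.cursor()
cur.execute("SELECT ts FROM events")

# '9999-12-31' cannot be represented as int64 nanoseconds, so some result
# batches end up downscaled while others do not, and the fetch fails.
table = cur.fetch_arrow_all()
```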
Because '9999-12-31 00:00:00.000' doesn't fit in int64 with ns precision, it seems like it is downcast to us precision on a per-batch basis in
snowflake-connector-python/src/snowflake/connector/nanoarrow_cpp/ArrowIterator/CArrowTableIterator.cpp (line 562 at commit 6a2a5b6).
I am guessing the downcasting is not applied to all batches, so it results in different data types between batches, which pyarrow does not allow.
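A minimal pyarrow sketch of that suspicion (connector internals aside): two batches whose timestamp units disagree cannot be combined into one table:

```python
import pyarrow as pa

# One batch downscaled to microseconds, another left at nanoseconds.
batch_us = pa.table({"ts": pa.array([0], type=pa.timestamp("us"))})
batch_ns = pa.table({"ts": pa.array([0], type=pa.timestamp("ns"))})

try:
    pa.concat_tables([batch_us, batch_ns])
except pa.ArrowInvalid as exc:
    print("schema mismatch:", exc)
```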
Can you set logging to DEBUG and collect the logs?