-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
.experimental.concat_on_disk
fails to properly infer indptr
dtype for the final object
#1709
Comments
What do you mean by hardcoding? We do have a test for this feature, so I'm definitely curious what could be going on here. I see you're on an AWS machine but if you could start a debugger and just look at that line with your example, would be amazing. |
hardcoding
I forked the repo and changed these lines to *always* set int64 dtypes.
debugger
Happy to take a look, will follow up this week!
…On Wed, Oct 16, 2024 at 06:41 Ilan Gold ***@***.***> wrote:
Honestly, I can't tell why the existing code doesn't work from first
principles, but I can confirm that when I hardcoded int64 for the output
object, everything completed successfuly.
What do you mean by hardcoding? We do have a test for this feature, so I'm
definitely curious what could be going on here. I see you're on an AWS
machine but if you could start a debugger and just look at that line with
your example, would be amazing.
—
Reply to this email directly, view it on GitHub
<#1709 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AASZ3RL2JQ62PEDKVWU63TTZ3ZUIFAVCNFSM6AAAAABPV3IC5CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMJWHA3TONBRGA>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Thanks so much @jacobkimmel. I would be opening to adding something to force this usage, but I'm hesitant since I don't understand what is going on. We have a settings object coming out which would make this super easy. BTW so you don't think I'm stonewalling, the reason we don't do int64 by default isn't performance or something (indptr is relatively small), but more than CUDA doesn't handle int64 at all :( So we try our best to keep it int32 |
This issue has been automatically marked as stale because it has not had recent activity. |
We can add a setting for this now that the settings object is out. |
Please make sure these conditions are met
Report
See:
anndata/src/anndata/experimental/merge.py
Line 223 in 8e9eb88
.experimental.concat_on_disk
is an awesome feature. Thanks for writing it!I found in practice that the dtype inference for
indptr
in the final object can fail in curious ways. It seems to be overly aggressive with casting toint32
, which leads to a traceback when merging objects large enough to requireint64
.Honestly, I can't tell why the existing code doesn't work from first principles, but I can confirm that when I hardcoded
int64
for the output object, everything completed successfuly.Code:
Traceback:
Versions
IPython 8.28.0
anndata 0.11.0rc3.dev3+g8e9eb88.d20241010
session_info 1.0.0
asttokens NA
cython_runtime NA
dateutil 2.9.0.post0
decorator 5.1.1
executing 2.1.0
h5py 3.12.1
jedi 0.19.1
natsort 8.4.0
numpy 2.1.2
packaging 24.1
pandas 2.2.3
parso 0.8.4
prompt_toolkit 3.0.48
pure_eval 0.2.3
pygments 2.18.0
pytz 2024.2
scipy 1.14.1
six 1.16.0
stack_data 0.6.3
traitlets 5.14.3
wcwidth 0.2.13
Python 3.12.7 | packaged by conda-forge | (main, Oct 4 2024, 16:05:46) [GCC 13.3.0]
Linux-6.2.0-1018-aws-x86_64-with-glibc2.35
Session information updated at 2024-10-10 03:56
The text was updated successfully, but these errors were encountered: