anndata write or anndata read seem to be losing uns["log1p"]["base"] when value for key base is None #865
Comments
I noticed this problem too and agree it should be addressed. That would spare users an extra explicit step to avoid errors when log1p is detected in uns but has no base entry.
This issue has been automatically marked as stale because it has not had recent activity.
I'm still affected by this issue. Without a setting for base:

import scanpy as sc
adata = sc.datasets.blobs()
sc.pp.log1p(adata)
adata.uns['log1p']

Output: {'base': None}

Now, writing adata to file and reading it back:

adata.write('adata.h5ad')
adata = sc.read('adata.h5ad')
adata.uns['log1p']

Output: {}

So the entry for base is gone. Now sc.tl.rank_genes_groups(adata, 'blobs') throws this error:

WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[16], line 1
----> 1 sc.tl.rank_genes_groups(adata, 'blobs')
File /opt/conda/lib/python3.9/site-packages/scanpy/tools/_rank_genes_groups.py:590, in rank_genes_groups(adata, groupby, use_raw, groups, reference, n_genes, rankby_abs, pts, key_added, copy, method, corr_method, tie_correct, layer, **kwds)
580 adata.uns[key_added] = {}
581 adata.uns[key_added]['params'] = dict(
582 groupby=groupby,
583 reference=reference,
(...)
587 corr_method=corr_method,
588 )
--> 590 test_obj = _RankGenes(adata, groups_order, groupby, reference, use_raw, layer, pts)
592 if check_nonnegative_integers(test_obj.X) and method != 'logreg':
593 logg.warning(
594 "It seems you use rank_genes_groups on the raw count data. "
595 "Please logarithmize your data before calling rank_genes_groups."
596 )
File /opt/conda/lib/python3.9/site-packages/scanpy/tools/_rank_genes_groups.py:93, in _RankGenes.__init__(self, adata, groups, groupby, reference, use_raw, layer, comp_pts)
82 def __init__(
83 self,
84 adata,
(...)
90 comp_pts=False,
91 ):
---> 93 if 'log1p' in adata.uns_keys() and adata.uns['log1p']['base'] is not None:
94 self.expm1_func = lambda x: np.expm1(x * np.log(adata.uns['log1p']['base']))
95 else:
KeyError: 'base'

Versions:
-----
anndata 0.9.1
scanpy 1.9.3
-----
PIL 9.2.0
anyio NA
arrow 1.2.3
asciitree NA
asttokens NA
astunparse 1.6.3
attr 23.1.0
babel 2.12.1
backcall 0.2.0
bottleneck 1.3.7
brotli NA
certifi 2023.05.07
cffi 1.15.1
chardet 5.1.0
charset_normalizer 2.1.1
cloudpickle 2.2.1
colorama 0.4.6
comm 0.1.3
cycler 0.10.0
cython_runtime NA
cytoolz 0.12.0
dask 2023.4.1
dateutil 2.8.2
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.6
entrypoints 0.4
executing 1.2.0
fasteners 0.17.3
fastjsonschema NA
fqdn NA
gmpy2 2.1.2
google NA
h5py 3.8.0
idna 3.4
igraph 0.10.4
importlib_resources NA
ipykernel 6.23.0
ipython_genutils 0.2.0
isoduration NA
jedi 0.18.2
jinja2 3.0.3
joblib 1.2.0
json5 NA
jsonpointer 2.0
jsonschema 4.17.3
jupyter_events 0.6.3
jupyter_server 2.5.0
jupyterlab_server 2.22.1
kiwisolver 1.4.4
leidenalg 0.9.1
llvmlite 0.39.1
louvain 0.8.0
lz4 4.3.2
markupsafe 2.1.2
matplotlib 3.7.1
mpl_toolkits NA
mpmath 1.3.0
msgpack 1.0.5
natsort 8.3.1
nbformat 5.8.0
numba 0.56.4
numcodecs 0.11.0
numexpr 2.7.3
numpy 1.23.5
opt_einsum v3.3.0
packaging 23.1
pandas 1.5.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 3.5.0
plotly 5.14.1
prometheus_client NA
prompt_toolkit 3.0.38
psutil 5.9.5
ptyprocess 0.7.0
pure_eval 0.2.2
pvectorc NA
pyarrow 10.0.1
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.15.1
pyparsing 3.0.9
pyrsistent NA
pythonjsonlogger NA
pytz 2023.3
requests 2.29.0
rfc3339_validator 0.1.4
rfc3986_validator 0.1.1
scipy 1.10.1
send2trash NA
session_info 1.0.0
setuptools 67.7.2
six 1.16.0
sklearn 1.2.2
sniffio 1.3.0
socks 1.7.1
sparse 0.14.0
sphinxcontrib NA
stack_data 0.6.2
sympy 1.11.1
tblib 1.7.0
texttable 1.6.7
threadpoolctl 3.1.0
tlz 0.12.0
toolz 0.12.0
torch 2.0.0
tornado 6.3
tqdm 4.65.0
traitlets 5.9.0
typing_extensions NA
unicodedata2 NA
uri_template NA
urllib3 1.26.15
wcwidth 0.2.6
webcolors 1.13
websocket 1.5.1
yaml 5.4.1
zarr 2.14.2
zipp NA
zmq 25.0.2
zoneinfo NA
zope NA
-----
IPython 8.13.2
jupyter_client 8.2.0
jupyter_core 5.3.0
jupyterlab 3.6.3
notebook 6.5.4
-----
Python 3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:45:29) [GCC 10.4.0]
Linux-5.15.0-71-generic-x86_64-with-glibc2.31
-----
Session information updated at 2023-06-14 10:31
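A possible stopgap for the round trip shown in this comment is to put the missing key back right after reading the file. The sketch below is only an illustration: `restore_log1p_base` is a hypothetical helper, not part of scanpy or anndata, and a plain dict stands in for `adata.uns`. It assumes that `None` is the right value to restore, which is what `sc.pp.log1p` stored in the example above.

```python
def restore_log1p_base(uns):
    """Re-add a missing 'base' entry under uns['log1p'], in place.

    `uns` is any dict-like mapping; here a plain dict stands in for adata.uns.
    Hypothetical helper, not a scanpy or anndata function.
    """
    log1p = uns.get("log1p")
    # Only touch the mapping if log1p was recorded but its 'base' key was lost.
    if log1p is not None and "base" not in log1p:
        log1p["base"] = None  # assumption: None was the value before writing
    return uns


# State of uns after the write/read round trip reported above.
uns = {"log1p": {}}
restore_log1p_base(uns)
print(uns)
```

With a real object, the same call would be made on `adata.uns` after `sc.read(...)` and before `sc.tl.rank_genes_groups(...)`. An explicitly stored base is left unchanged.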
Wait, so why won't this be fixed?
I have noticed that in Scanpy, when setting
adata.uns["log1p"]["base"] = None
and the object is then written to disk and read back, base is no longer a key in adata.uns["log1p"]. This has implications for a number of downstream Scanpy methods when writing to disk in the middle of a workflow and then reading back, since parts of Scanpy appear to look up this key directly in various places. Maybe the underlying problem is that sc.pp.log1p(adata) is not marking base as math.e in uns. I wonder if the write/read process might be pruning other keys that have None values?