Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix view behavior for AwkwardArrays #1070

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Fix view behavior for AwkwardArrays #1070

wants to merge 12 commits into from

Conversation

grst
Copy link
Contributor

@grst grst commented Jul 24, 2023

@grst grst force-pushed the ak-array-views branch from 830a1bc to b23eda3 Compare July 24, 2023 18:33
@grst grst marked this pull request as ready for review July 24, 2023 18:33
@grst grst requested review from ivirshup and ilan-gold July 24, 2023 18:34
@grst
Copy link
Contributor Author

grst commented Jul 24, 2023

I think this does the trick

@codecov
Copy link

codecov bot commented Jul 24, 2023

Codecov Report

Merging #1070 (fdeff90) into main (2c8759d) will decrease coverage by 2.20%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1070      +/-   ##
==========================================
- Coverage   84.75%   82.56%   -2.20%     
==========================================
  Files          36       36              
  Lines        5149     5132      -17     
==========================================
- Hits         4364     4237     -127     
- Misses        785      895     +110     
Flag Coverage Δ
gpu-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
anndata/_io/specs/methods.py 87.40% <ø> (-0.31%) ⬇️
anndata/_core/aligned_mapping.py 93.99% <100.00%> (+0.32%) ⬆️
anndata/_core/views.py 82.58% <100.00%> (-7.18%) ⬇️

... and 5 files with indirect coverage changes

📢 Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scverse).

Copy link
Member

@flying-sheep flying-sheep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn’t this have some tests?

anndata/_core/views.py Outdated Show resolved Hide resolved
@grst
Copy link
Contributor Author

grst commented Jul 25, 2023

Shouldn’t this have some tests?

This is already covered by existing tests in test_views and test_awkward. The current version is not broken, but will break at some point in the future.

Co-authored-by: Angus Hollands <[email protected]>
array = ak.with_parameter(self, _PARAM_NAME, None)
array = ak.with_parameter(array, "__list__", None)
array = ak.with_parameter(array, _PARAM_NAME, None)
array = ak.with_parameter(array, "__record__", None)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use with_name, so that we can traverse into the layout to find the record-node

Suggested change
array = ak.with_parameter(array, "__record__", None)
array = ak.with_name(array, None)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine that after our dicsussion @grst could not get that working (see #1040 (comment))

Is there a reason traversing is better than what we have (assuming @grst could not get the with_name working)? We only ever deal with this at a "top level."

@agoose77
Copy link

@grst I get the sense that it might be helpful to have a meeting to discuss this. Would you be able to find time for a zoom at some point?

@grst
Copy link
Contributor Author

grst commented Jul 27, 2023

Agreed, I sent an email with a poll to find a timeslot.

@grst grst changed the title AwkwardArrayViews: Use __record__ instead of __list__ to store behavior Fix view behavior for AwkwardArrays Sep 1, 2023
@grst
Copy link
Contributor Author

grst commented Sep 1, 2023

@ivirshup, @ilan-gold, @agoose77, ready for re-review. Feels good to get rid of all that code 💥

Currently the tests fail for uns, but that seems to be a more general problem, see #1118.

@agoose77
Copy link

agoose77 commented Sep 1, 2023

@grst it is possible to modify the original array if you don't perform any row-based slices, i.e array["x"] = x, then this would modify array.

I can't recall to what extent we touched on this on the call. If the process of pulling out the Awkward Array always returns a shallow copy, this wouldn't be a problem.

@grst
Copy link
Contributor Author

grst commented Sep 4, 2023

Hi @agoose77,

returning the shallow copy (or slice) works as expected. The remaining problem is that there's no handler implemented for the uns slot while subsetting AnnData -- regardless of whether it's an awkward array or a numpy array. I don't think we want to always return a shallow copy on __getitem__ -- I think there are legitimate use-cases where one wants to modify records.

@ivirshup
Copy link
Member

ivirshup commented Sep 5, 2023

The remaining problem is that there's no handler implemented for the uns slot while subsetting AnnData

Is that blocking for this PR, or tangential? I'd like to make a release candidate by the end of the week, so would like to merge soon if possible.

@grst
Copy link
Contributor Author

grst commented Sep 6, 2023

If you don't mind, I can mark the respective tests as xfail. I think it's ok as

@ivirshup ivirshup added this to the 0.10.0 milestone Sep 7, 2023
@ivirshup
Copy link
Member

ivirshup commented Sep 7, 2023

I can mark the respective tests as xfail

Which tests are these?

@ivirshup
Copy link
Member

ivirshup commented Sep 7, 2023

So just to clarify what this PR does. For this block:

import anndata as ad, awkward as ak, numpy as np

a = ad.AnnData(
    np.ones((3, 3)),
    obsm={"awk": ak.Array([{"a": 1}, {"a": 2}, {"a": 3}])}
)
v = a[:2]
v.obsm["awk"]["b"] = [5, 6]

display(v)
display(v.obsm["awk"])

On main:

<ipython-input-10-0f89ad3994c2>:8: ImplicitModificationWarning: Trying to modify attribute `.obsm` of view, initializing view as actual.
  v.obsm["awk"]["b"] = [5, 6]
AnnData object with n_obs × n_vars = 2 × 3
    obsm: 'awk'
<Array [{a: 1, b: 5}, {a: 2, b: 6}] type='2 * {a: int64, b: int64}'>

On this PR:

View of AnnData object with n_obs × n_vars = 2 × 3
    obsm: 'awk'
<Array [{a: 1}, {a: 2}] type='2 * {a: int64}'>

This is the desired behavior?


Is uns failed for other view tests? I don't know that I really see the need to specially include, then xfail these cases.

@ivirshup ivirshup removed this from the 0.10.0 milestone Sep 7, 2023
grst added 2 commits September 8, 2023 08:51
The test didn't catch that when modifying an awkward array in a
AnnDataView, the changes did not persist within that view.

The issue was that I had been retreiving the awkward array
from the view once, and then did tests on it. Instead, to properly
test this, I need to retreive the Awkward array from the AnnData view
every time. It turns out doing so gives me a fresh copy of the origina
awkward array and modifications only affect that copy - not the original data.
I solved the issue described in the previous commit by adding
logic to `AlignedViewMixin` that can copy objects of a certain type
(for now only awkward arrays) upon view creation.
@grst
Copy link
Contributor Author

grst commented Sep 8, 2023

Thanks for catching this! I thought my tests cover this, but it turns out I retrieved the awkward array from the view only once and did all tests on that copy. Instead, retreiving v.obsm["awk"] multiple times always returns a fresh copy of the sliced awkward array which makes it impossible to modify it.

To solve this, it is necessary to perform a shallow copy of the awkward array on view creation (which is a cheap operation). I added this behavior here:
https://github.com/scverse/anndata/pull/1070/files#diff-af8b8ed3c983f1ed77b81030eb9d80df8136e0640324f180354e6e0d3b49e7e7R139-R159

Maybe this could also be useful for pandas data frames with copy-on-write behavior in the future?

@ivirshup
Copy link
Member

ivirshup commented Sep 8, 2023

I think caching the "view"s could be a good idea. I may need to think for a bit on how this effects my current thoughts on how to fix the garbage collection problem:

Currently, this won't work with the solution proposed there, since the AlignedMappings are created on the fly to prevent circular references from being formed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Future changes to Awkward Array behavior class resolution
5 participants