
Fix shape for datasets of references to iterators #1238

Draft: wants to merge 5 commits into base: dev


@h-mayorquin (Contributor) commented Jan 21, 2025

Motivation

Fix #1237

This is a draft that @rly started working on. I'm sharing it here for further reference.

@h-mayorquin h-mayorquin changed the title Fix shape for datasets of references Fix shape for datasets of references to iterators Jan 21, 2025

codecov bot commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 37.50000% with 5 lines in your changes missing coverage. Please review.

Project coverage is 90.82%. Comparing base (ff4a0aa) to head (aab88ab).

Files with missing lines:
  src/hdmf/build/objectmapper.py: patch coverage 37.50%, 4 lines missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1238      +/-   ##
==========================================
- Coverage   90.87%   90.82%   -0.05%     
==========================================
  Files          42       42              
  Lines        9524     9529       +5     
  Branches     1921     1923       +2     
==========================================
  Hits         8655     8655              
- Misses        576      580       +4     
- Partials      293      294       +1     


@h-mayorquin (Contributor, Author) commented

Judging from the errors, it seems we were too quick to dismiss the multi-dimensional case, @rly.

@rly (Contributor) commented Jan 22, 2025

I haven't looked deeply, but I suspect the following. The VectorData shape allows 1D, 2D, 3D, or 4D data (ref). NWB contains datasets of references, such as the electrode_group column in the Units table, that extend VectorData but do NOT specify that the dataset should be 1-dimensional (ref). Their spec_shape is therefore [[None], [None, None], [None, None, None], [None, None, None, None]], inherited from VectorData, and that triggers the new check.
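The following is a rough illustration of why a spec-based check fires for these columns. The Python sketch below treats a spec shape either as a single allowed shape (e.g. [None]) or as a list of allowed shapes, like the spec_shape value quoted above. The names shape_options, spec_allows_multidim, and VECTORDATA_SHAPE are hypothetical, chosen for this example; they are not hdmf API.

```python
from typing import List, Optional, Sequence, Union

# Hypothetical type aliases for this sketch (not hdmf types).
Dim = Optional[int]
SpecShape = Union[Sequence[Dim], Sequence[Sequence[Dim]], None]

# Shape inherited from VectorData, as quoted in the comment above.
VECTORDATA_SHAPE = [[None], [None, None], [None, None, None], [None, None, None, None]]


def shape_options(spec_shape: SpecShape) -> Optional[List[List[Dim]]]:
    """Hypothetical helper: return the allowed shapes as a list of options, or None if unconstrained."""
    if spec_shape is None:
        return None  # the spec places no constraint on shape
    if len(spec_shape) > 0 and isinstance(spec_shape[0], (list, tuple)):
        return [list(option) for option in spec_shape]  # several allowed shapes
    return [list(spec_shape)]  # a single allowed shape, e.g. [None]


def spec_allows_multidim(spec_shape: SpecShape) -> bool:
    """Hypothetical helper: True if any allowed shape has more than one dimension."""
    options = shape_options(spec_shape)
    if options is None:
        return True  # unconstrained, so any dimensionality is permitted
    return any(len(option) > 1 for option in options)
```

With the shape inherited from VectorData, spec_allows_multidim(VECTORDATA_SHAPE) returns True, so any check keyed on the spec treats the column as possibly multi-dimensional. A column restricted to [None] returns False.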

  1. We should amend the NWB schema to restrict certain table columns like Units.electrode_group to be 1-D.
  2. Instead of checking the spec, we should check whether the dataset of references being written has more than 1 dimension. We should probably put that here: https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/backends/hdf5/h5tools.py#L1252. Note that the dataset being created has shape (len(data),), which means we already assume datasets of references are 1-D.
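Below is a minimal sketch of the data-based check in option 2. It assumes the data to be written is an in-memory list or tuple. An iterator would need separate handling, because its full shape may not be known in advance. The names FakeRef, data_ndim, and reference_dataset_shape are hypothetical stand-ins, not the code at the linked line in h5tools.py.

```python
from typing import Any, Sequence


class FakeRef:
    """Hypothetical stand-in for an object reference (not an hdmf or h5py class)."""

    def __init__(self, name: str) -> None:
        self.name = name


def data_ndim(data: Any) -> int:
    """Hypothetical helper: count list/tuple nesting depth by following the first element."""
    ndim = 0
    while isinstance(data, (list, tuple)):
        ndim += 1
        if len(data) == 0:
            break  # an empty sequence still counts as one dimension
        data = data[0]
    return ndim


def reference_dataset_shape(data: Sequence[Any]) -> tuple:
    """Hypothetical helper: return the 1-D shape (len(data),) used for a dataset
    of references, and reject data with more than one dimension."""
    ndim = data_ndim(data)
    if ndim > 1:
        raise ValueError(f"Datasets of references must be 1-D, got {ndim}-D data")
    return (len(data),)
```

For example, reference_dataset_shape([FakeRef("a"), FakeRef("b")]) gives (2,), and a nested list such as [[FakeRef("a")], [FakeRef("b")]] raises ValueError. Checking the data rather than the spec avoids false positives from inherited multi-dimensional shape options.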

Related TODO items:

  1. Warn if creating a spec with a dataset of references that allows more than one dimension.
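The TODO item above could look roughly like the following sketch, which uses Python's standard warnings module. It assumes the caller already knows whether the dataset's dtype is a reference type and has the spec's shape. The function names allowed_ndims and warn_if_multidim_reference_spec are hypothetical and not part of hdmf.

```python
import warnings
from typing import Optional, Set


def allowed_ndims(spec_shape) -> Optional[Set[int]]:
    """Hypothetical helper: dimensionalities a spec shape permits; None means unconstrained."""
    if spec_shape is None:
        return None
    if len(spec_shape) > 0 and isinstance(spec_shape[0], (list, tuple)):
        return {len(option) for option in spec_shape}  # several allowed shapes
    return {len(spec_shape)}  # a single allowed shape


def warn_if_multidim_reference_spec(name: str, is_reference: bool, spec_shape) -> bool:
    """Hypothetical check: emit a UserWarning if a dataset-of-references spec
    allows more than one dimension. Returns True if a warning was emitted."""
    if not is_reference:
        return False
    ndims = allowed_ndims(spec_shape)
    if ndims is None or max(ndims) > 1:
        warnings.warn(
            f"Dataset spec '{name}' holds references but allows more than one dimension; "
            "datasets of references are written as 1-D.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

A spec such as electrode_group that inherits [[None], [None, None], ...] would trigger the warning, and restricting it to [None] (option 1 above) would silence it.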

I'll take a look at this in the next couple of days.

Development

Successfully merging this pull request may close these issues.

[Bug] Image iterator reveals that list of references gets an incorrect data shape
2 participants