Bump Xarray to 2025.1.1 and icechunk to 0.1.0a10 in upstream #375

norlandrhagen · 2025-01-09T21:10:12Z

Tests passing

norlandrhagen · 2025-01-09T21:17:31Z

Hey there @mpiannucci, @TomNicholas and I just bumped the icechunk and Xarray versions and we're seeing some failures on the upstream CI. Wondering if you have any insight!

mpiannucci · 2025-01-09T22:19:16Z

zarr 3 comes with a number of changes:

- chunk_shape is now chunks
- codecs is no longer an argument to create_array; instead you now specify filters (array to array), compressors (bytes to bytes) and optionally a single serializer (that converts from bytes to array and vice versa). So the whole codec pipeline needs to be redone, but it should make everything simpler.
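To illustrate the split described above, here is a minimal sketch of how an ordered, old-style codec list might be sorted into the three new roles. The helper `split_codecs` and the lookup sets are hypothetical names invented for this example; they are not part of zarr-python or VirtualiZarr. The sketch assumes codecs are given as Zarr v3 metadata dictionaries with a `name` key, and it only knows the handful of codec names listed in the sets.

```python
# Hypothetical helper for illustration only; not zarr-python or VirtualiZarr API.

# Roles of a few well-known Zarr v3 codecs (assumed lookup table, not exhaustive).
ARRAY_TO_ARRAY = {"transpose"}                       # would become "filters"
ARRAY_TO_BYTES = {"bytes", "sharding_indexed"}       # would become "serializer"
BYTES_TO_BYTES = {"gzip", "blosc", "zstd", "crc32c"}  # would become "compressors"


def split_codecs(codecs):
    """Split an ordered codec list into filters, serializer and compressors."""
    filters, serializers, compressors = [], [], []
    for codec in codecs:
        name = codec["name"]
        if name in ARRAY_TO_ARRAY:
            filters.append(codec)
        elif name in ARRAY_TO_BYTES:
            serializers.append(codec)
        elif name in BYTES_TO_BYTES:
            compressors.append(codec)
        else:
            raise ValueError(f"unknown codec: {name!r}")
    if len(serializers) > 1:
        raise ValueError("at most one serializer is allowed")
    return {
        "filters": filters,
        "serializer": serializers[0] if serializers else None,
        "compressors": compressors,
    }


# A flat list in the style of the old `codecs` argument.
old_style = [
    {"name": "transpose", "configuration": {"order": [1, 0]}},
    {"name": "bytes", "configuration": {"endian": "little"}},
    {"name": "zstd", "configuration": {"level": 3, "checksum": False}},
]
new_kwargs = split_codecs(old_style)
```

The resulting dictionary uses the three keyword names mentioned in the comment. Whether these values can be passed straight to create_array depends on the installed zarr-python version and is not verified here.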

redcliff · 2025-01-15T19:48:57Z

Hey, can you bump icechunk version to 0.1.0a11, which has added azure blob support? I'm interested in trying out writing virtual zarr store into azure blob but got blocked by the same issue causing the CI failures. Is there an estimate on when we can expect this PR to be merged?

abarciauskas-bgse · 2025-01-15T21:31:50Z

@norlandrhagen just FYI: This is the branch I am using for icechunk until we can come up with a more complete design for refactoring for zarr-python 3.0

TomNicholas · 2025-01-15T22:05:12Z

Thanks @abarciauskas-bgse!

@redcliff we're in a tricky spot right now, with a bunch of backwards-incompatible and mutually-exclusive changes in our dependencies: Zarr-python v3, Icechunk, and Kerchunk. Whilst it should be possible to make things work with some hacky branches right now, if you would rather avoid that rabbit hole then it will likely take us a few weeks to get everything working again on up-to-date released versions.

Can I ask what file format you're hoping to Virtualize? That can affect which dependencies you need which affects how easy it is to get things working right now.

ghidalgo3 · 2025-01-15T23:04:23Z

@TomNicholas, @redcliff and I would like to virtualize NetCDF4 and HDF5 file types. They are the primary n-dimensional array datatypes we host on Planetary Computer.

We can wait until the dependencies stabilize. My end goal is for us to be able to do the following:

1. Bring archival data onto Planetary Computer's Azure Blob Storage as NetCDF and HDF5 files.
2. For each file, create an adjacent virtualized icechunk store by calling virtualizarr.dataset_to_icechunk (requires icechunk>=0.1.0a11 for Azure Blob Storage support). That's the store we would encourage our users to read from.
3. Document for our users how to open the data using xarray.open_zarr through the icechunk reader.

If I understand the sequence of events correctly:

1. zarr==3.0.0 (no prerelease!) was released days ago and it introduced breaking changes from zarr==3.0.0b3.
2. The upstream environment for VirtualiZarr indirectly depended on zarr==3.0.0b3 through IceChunk==0.1.0a8.
3. That indirect dependency from IceChunk has version constraint zarr>=3 which means that on January 9 when zarr==3.0.0 was released it immediately started being consumed by VirtualiZarr, but you ran into the breaking changes and now need to adapt to them.

Can we help in any way?

TomNicholas · 2025-01-16T00:04:59Z

Hey @ghidalgo3!

Yes that's all correct, but with the additional complication that we can't even pin any Zarr-python version >=3.0.0 in main yet (not even the pre-release) because some of our readers (and tests) are still coupled to kerchunk, an optional dependency that currently requires zarr-python<3.0.0.

However in your case if you use the newer HDFReader instead of the kerchunk-reliant HDF5Reader, you should be able to use @abarciauskas-bgse 's branch to get things mostly working again with Icechunk (as that's presumably what she's doing already).

> Can we help in any way?

Someone needs to rewrite the codec pipeline code to work with the released version of Zarr-python, and as you guys so kindly wrote the v2-compatible version of that then you would be great people to update it 😄

For the purposes of testing that, you could just pin Zarr-python>=3.0.0 in that branch, even though that will currently break kerchunk-reliant tests, as depending on >=3.0.0 is the end state we're aiming for anyway.

abarciauskas-bgse · 2025-01-16T00:13:18Z

I'm using the dmrpp reader so I'm not 100% sure my branch will work with the HDFVirtualBackend reader

bump xarray to 2025.1.1 and icechunk to 0.1.0a10 in upstream

72ae8b0

norlandrhagen added the CI (Continuous Integration) and dependencies (Updates a dependency) labels Jan 9, 2025

norlandrhagen temporarily deployed to test-release January 9, 2025 21:10 — with GitHub Actions Inactive

norlandrhagen mentioned this pull request Jan 10, 2025

Zarr reader #271

Open

22 tasks
