Better support for Multiscale Zarr (Groups) #1071

vincentsarago · 2025-01-15T11:26:49Z

ref: https://github.com/zarr-developers/geozarr-spec/blob/main/geozarr-spec.md#multiscales

Problem

Right now, to read a different scale we need to set the group= parameter, but that has to happen before opening the Zarr file, meaning we cannot dynamically select the corresponding scale based on the requested zoom level.

titiler/src/titiler/xarray/titiler/xarray/io.py

Lines 126 to 128 in 5fab604

    # Argument if we're opening a datatree
    if group is not None:
        xr_open_args["group"] = group

Solution

We could create a new endpoint, similar to the tilejson endpoint, which returns a list of (group number, (minzoom, maxzoom)) for a specific TileMatrixSet. A frontend map client could then use it to construct the correct tile URL with the correct group value.

fetch('/WebMercatorQuad/metadata.json?url=myzarURL.zarr&variable=yo')

{
  "scheme": "xyz",
  "tiles": [
    "/tiles/WebMercatorQuad/{z}/{x}/{y}@1x?url=myzarURL.zarr&variable=yo&group={group}"
  ],
  "minzoom": 0,  // Zoom of the low resolution group
  "maxzoom": 9,  // Zoom of the high resolution group
  "bounds": [-180, -90, 180, 90],
  "center": [0, 0, 0],
  "groups": [
    // [group_name, [minzoom, maxzoom]]
    [0, [0, 2]],
    [1, [2, 4]],
    [2, [4, 6]],
    [3, [6, 8]],
    [4, [8, 10]]
  ]
}
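A client could consume the proposed response roughly as follows. This is a minimal sketch, not part of TiTiler: it assumes the `groups` layout shown above, sorted from coarsest to finest with contiguous ranges, and `pick_group` and `build_tile_url` are hypothetical helper names. Because the example ranges overlap at their edges (zoom 2 falls in both group 0 and group 1), the sketch arbitrarily lets the last (finest) matching group win.

```python
from typing import List, Tuple

# Shape of one entry in the proposed "groups" list: (group_name, (minzoom, maxzoom)).
Group = Tuple[int, Tuple[int, int]]


def pick_group(groups: List[Group], zoom: int) -> int:
    """Return the name of the group whose zoom range contains `zoom`.

    Assumes `groups` is sorted from coarsest to finest. When several ranges
    match, the last one wins; zooms outside every range are clamped to the
    first or the last group.
    """
    matches = [name for name, (zmin, zmax) in groups if zmin <= zoom <= zmax]
    if matches:
        return matches[-1]
    first_name, (first_min, _) = groups[0]
    last_name = groups[-1][0]
    return first_name if zoom < first_min else last_name


def build_tile_url(template: str, group: int, z: int, x: int, y: int) -> str:
    """Fill the {group}, {z}, {x} and {y} placeholders of a tile URL template."""
    return (
        template.replace("{group}", str(group))
        .replace("{z}", str(z))
        .replace("{x}", str(x))
        .replace("{y}", str(y))
    )


if __name__ == "__main__":
    groups = [(0, (0, 2)), (1, (2, 4)), (2, (4, 6)), (3, (6, 8)), (4, (8, 10))]
    template = "/tiles/WebMercatorQuad/{z}/{x}/{y}@1x?url=myzarURL.zarr&variable=yo&group={group}"
    z = 5
    print(build_tile_url(template, pick_group(groups, z), z, 3, 7))
```

A map client would still need a hook to run this per tile request, which is the burden on the frontend that the rest of the thread discusses.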

cc @maxrjones @hrodmn @abarciauskas-bgse @sharkinsspatial


j08lue · 2025-01-15T12:24:53Z

Ideally, we would want to provide the same user experience as with GDAL and COGs, where the reader automatically selects the appropriate (overview) scale, no?

How would the proposed solution work for a client like a web map app or desktop GIS software? Would they add all scales to the map and let the mapping library handle turning layers on and off depending on the zoom level?

If there is any way to take this burden off the user, I think that would be great.

If it is too much automagic for a generic TiTiler app, perhaps we would need a custom GeoZarrReader that implements this logic? 🤔

vincentsarago · 2025-01-15T12:56:23Z

Ideally, we would want to provide the same user experience as with GDAL and COGs, where the reader automatically selects the appropriate (overview) scale, no?

I would love that too but from what I understand the group needs to be decided before opening the file, while with GDAL you can access the overview once the file is opened.

How would the proposed solution work for a client like a web map app or desktop GIS software? Would they add all scales to the map and let the mapping library handle turning layers on and off depending on the zoom level?

No 🙁

If there is any way to take this burden off the user, I think that would be great.

Agreed

If it is too much automagic for a generic TiTiler app, perhaps we would need a custom GeoZarrReader that implements this logic? 🤔

It would be nice if we didn't have to know in advance whether a Zarr is a GeoZarr or not!

maxrjones · 2025-01-15T13:57:04Z

Ideally, we would want to provide the same user experience as with GDAL and COGs, where the reader automatically selects the appropriate (overview) scale, no?

I would love that too but from what I understand the group needs to be decided before opening the file, while with GDAL you can access the overview once the file is opened.

We can use xarray.open_datatree to open a Zarr hierarchy rather than specifying a group in xarray.open_dataset.
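As a rough illustration of that suggestion, the sketch below opens the whole hierarchy first and selects the group afterwards. It assumes an xarray version that ships `open_datatree` together with the zarr backend; the helper name `open_level` and the group names ("0", "1", ...) are hypothetical and depend on how the store is laid out.

```python
def open_level(url: str, level: str, variable: str):
    """Open a Zarr hierarchy once, then pick a group and variable afterwards."""
    # Imported lazily so this sketch can be loaded without xarray installed.
    import xarray as xr

    # Opens every group of the store as one tree instead of a single group.
    tree = xr.open_datatree(url, engine="zarr")

    # Nodes are addressed by path; the level is chosen after opening,
    # which is what the group= argument of open_dataset does not allow.
    return tree[level].to_dataset()[variable]
```

For example, `open_level("dataset.zarr", "0", "var")` would return the DataArray for variable `var` in group `0`, assuming both exist in the store.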

abarciauskas-bgse · 2025-01-15T16:25:02Z

Just to clarify: when we say scale, we don't mean scale in terms of the standard titiler parameter that indicates the size of the image tile being returned, right? In other words, not the scale in /cog/tiles/{tileMatrixSetId}/{z}/{x}/{y}@{scale}x.{format}. Rather, by scale here we mean the group of a dataset associated with a "level" of resolution of the data. There's no standard or spec yet to clarify this 😅

In the original implementation, this was meant to work automagically WHEN there was a multiscale parameter set to True. In this case, the zoom level of the request would be associated with the level of the dataset opened. If the zoom level was higher than the available levels the highest level was chosen (i.e. the finest resolution). For example, if a dataset had 4 levels, zoom 0 level 0 was opened, zoom 1 level 1 was opened and so on, and zoom>=4, level 4 was opened.
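The mapping described here amounts to clamping the zoom to the finest available level. A minimal sketch, assuming levels are numbered from 0 (coarsest) to `n_levels - 1` (finest); `level_for_zoom` is a hypothetical name, not the original implementation:

```python
def level_for_zoom(zoom: int, n_levels: int) -> int:
    """Map a requested zoom to a dataset level, capped at the finest level."""
    if n_levels < 1:
        raise ValueError("the dataset needs at least one level")
    if zoom < 0:
        raise ValueError("zoom must be non-negative")
    # Zoom N opens level N until the levels run out, then the finest level is reused.
    return min(zoom, n_levels - 1)


if __name__ == "__main__":
    # Levels 0 through 4: every zoom >= 4 opens level 4.
    for zoom in range(7):
        print(zoom, level_for_zoom(zoom, n_levels=5))
```

This only gives sensible tiles when each level really corresponds to one zoom of the TileMatrixSet.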

I think the multiscale parameter has gone away, is that right? Would this solution make sense? A tilejson endpoint may also make sense for some use cases where the available levels (if any) need to be known in advance 🤔

vincentsarago · 2025-01-15T16:36:43Z

Just to clarify: when we say scale, we don't mean scale in terms of the standard titiler parameter that indicates the size of the image tile being returned, right? In other words, not the scale in /cog/tiles/{tileMatrixSetId}/{z}/{x}/{y}@{scale}x.{format}. Rather, by scale here we mean the group of a dataset associated with a "level" of resolution of the data. There's no standard or spec yet to clarify this 😅

Scale just relates to the output tile size:
1: 256x256
2: 512x512
3: 1024x1024
...

In the original implementation, this was meant to work automagically WHEN there was a multiscale parameter set to True. In this case, the zoom level of the request would be associated with the level of the dataset opened. If the zoom level was higher than the available levels the highest level was chosen (i.e. the finest resolution). For example, if a dataset had 4 levels, zoom 0 level 0 was opened, zoom 1 level 1 was opened and so on, and zoom>=4, level 4 was opened.

Yeah, but this assumed that the Zarr was Web Optimized and had groups that correspond to zoom levels.

I think the multiscale parameter has gone away, is that right?

Yes

Would this solution make sense? A tilejson endpoint may also make sense for some use cases where the available levels (if any) need to be known in advance 🤔

We could reintroduce the multiscale option in the /tiles endpoint and then try to determine which group to open. But in theory we could also just try to detect whether a Zarr has multiscale groups!
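Detecting this could be as simple as inspecting the root group's attributes. The sketch below assumes that a multiscale store advertises itself through a `multiscales` attribute on the parent group, as in the GeoZarr draft section linked at the top of this issue; it makes no assumption about what that attribute contains, and `has_multiscales` is a hypothetical helper.

```python
from typing import Any, Mapping


def has_multiscales(group_attrs: Mapping[str, Any]) -> bool:
    """Return True if the group attributes carry a non-empty `multiscales` entry."""
    return bool(group_attrs.get("multiscales"))


if __name__ == "__main__":
    # Made-up attribute dictionaries, for illustration only.
    print(has_multiscales({"multiscales": {"tile_matrix_set": "WebMercatorQuad"}}))
    print(has_multiscales({"title": "plain Zarr store"}))
```

The attributes could come from whatever opened the store, for instance the `.attrs` of the root node of a tree returned by `xarray.open_datatree`.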

abarciauskas-bgse · 2025-01-15T16:54:20Z

Scale just relates to the output tile size:
1: 256x256
2: 512x512
3: 1024x1024

That is my understanding as well, but the scale of the original comment in this thread

Right now to read a different scale we need to set the group= parameter

I don't think is referring to the output tile size? That seems like a completely separate parameter than what we are discussing (which I would call level for now).

Yeah, but this assumed that the Zarr was Web Optimized and had groups that correspond to zoom levels.

This was intentional: only requests for datasets with multiple levels would have the multiscale parameter set to true. In other words, the client should know if a dataset has multiple levels or not.

To be clear though, I'm not proposing we go with this solution, just explaining how it worked originally.

maxrjones · 2025-01-16T00:29:19Z

Consider this a straw-man proposal. By Web-Optimized Zarr (WOZ) I mean Zarr that has a TileMatrixSet with overviews corresponding to TMS zoom resolutions, along with consolidated metadata so that one can find all this out with a single GET request. I think our goal for better supporting WOZ should be a dedicated extension (e.g., a woz/ route) that provides all the same endpoints as the cog/ route with the same or better performance. If a user wants to render Zarr that isn't WOZ, they can use the xarray extension. But the xarray code path will always be slower because it needs to guess formats and special-case non-optimized file structures.

vincentsarago · 2025-01-16T08:50:46Z

That is my understanding as well, but the scale of the original comment in this thread

@abarciauskas-bgse oh sorry, I missed that you were talking about that 😓.

I don't think is referring to the output tile size? That seems like a completely separate parameter than what we are discussing (which I would call level for now).

Yeah

To be clear though, I'm not proposing we go with this solution, just explaining how it worked originally

yeah yeah 🙏

should be a dedicated extension (e.g., a woz/ route) that provides all the same endpoints as the cog/ route with the same or better performance.

@maxrjones, I would prefer not! Ideally the user shouldn't have to know if a zarr is Optimized or not.

Maybe we should re-think the XarrayReader in titiler to not work with Xarray DataArray:

# now 
with XarrayReader("dataset.zarr", variable="var", group="0", time=0) as da:
    img = da.tile(0, 0, 0)

# maybe
with ZarrReader("dataset.zarr") as ds:  # using `xarray.open_datatree`
    img = ds.tile(0, 0, 0, variable="var", group="0", time=0)

Not having to define the variable, group and time slice when opening the dataset would enable more dynamic configuration. I'm not quite sure of the performance implications: will this result in more data transfer?

maxrjones · 2025-01-16T13:27:53Z

@maxrjones, I would prefer not! Ideally the user shouldn't have to know if a zarr is Optimized or not.

How does this work for titiler with GeoTIFFs/COGs? I.e., what happens if a user provides a non-cloud-optimized GeoTIFF to a cog endpoint?

vincentsarago · 2025-01-16T13:31:21Z

How does this work for titiler with GeoTIFFs/COGs? I.e., what happens if a user provides a non-cloud-optimized GeoTIFF to a cog endpoint?

@maxrjones, GDAL automatically selects the overview or the raw data based on the output tile resolution. If there are neither overviews nor internal chunking, it will still fetch the data it needs, but it will be slow because you might load the whole dataset in memory.
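The same idea could in principle be applied to Zarr groups: compare the resolution of the requested tile with the resolution of each level and read the coarsest level that is still fine enough. The sketch below is a simplified picture of resolution-based selection, not GDAL's actual algorithm; `tile_resolution` and `pick_overview` are hypothetical names, and the per-level resolutions are assumed to be known (for example from the store's metadata).

```python
from typing import Sequence

# Pixel size in metres at WebMercatorQuad zoom 0 for 256x256 tiles.
WEBMERCATOR_ZOOM0_RES = 156543.03392804097


def tile_resolution(zoom: int, tilesize: int = 256) -> float:
    """Pixel size (metres) of a WebMercatorQuad tile rendered at `tilesize` pixels."""
    return WEBMERCATOR_ZOOM0_RES * 256 / tilesize / 2 ** zoom


def pick_overview(resolutions: Sequence[float], target_res: float) -> int:
    """Index of the coarsest level whose pixel size does not exceed `target_res`.

    Falls back to the finest level when every level is coarser than requested.
    """
    candidates = [i for i, res in enumerate(resolutions) if res <= target_res]
    if not candidates:
        return min(range(len(resolutions)), key=lambda i: resolutions[i])
    return max(candidates, key=lambda i: resolutions[i])


if __name__ == "__main__":
    # Made-up pixel sizes (metres) for four levels, finest first.
    levels = [10.0, 20.0, 40.0, 80.0]
    print(pick_overview(levels, tile_resolution(12)))
```

Unlike the zoom-equals-level mapping, this does not require the levels to line up with the zooms of a particular TileMatrixSet.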

Technically speaking we prefixed the endpoint with /cog but it should maybe be renamed /raster because it can read a lot of data formats (not just COGs)

maxrjones · 2025-01-16T13:38:03Z

Thanks for the explanation, I incorrectly interpreted /cog as a fast-path for COGs. I understand your reluctance to have route-based fast-paths. I'd like to explore some ideas for speeding up tiling of Zarrs with overviews within the existing xarray namespace during the next sprint, if that'd be alright with you.

vincentsarago added the enhancement New feature or request label Jan 15, 2025
