Better support for Multiscale Zarr (Groups) #1071

Open
vincentsarago opened this issue Jan 15, 2025 · 11 comments
Labels
enhancement New feature or request

Comments

@vincentsarago
Member

ref: https://github.com/zarr-developers/geozarr-spec/blob/main/geozarr-spec.md#multiscales

Problem

Right now to read a different scale we need to set the group= parameter, but that needs to happen before opening the Zarr file, meaning we cannot dynamically select the corresponding scale based on the requested zoom level.

# Argument if we're opening a datatree
if group is not None:
    xr_open_args["group"] = group

Solution

We could create a new endpoint, similar to the tilejson endpoint, that returns a list of (group number, (minzoom, maxzoom)) for a specific TileMatrixSet. A frontend map client could then use it to construct the tile URL with the correct group value.

fetch('/WebMercatorQuad/metadata.json?url=myzarURL.zarr&variable=yo')

{
  "scheme": "xyz",
  "tiles": [
    "/tiles/WebMercatorQuad/{z}/{x}/{y}@1x?url=myzarURL.zarr&variable=yo&group={group}"
  ],
  "minzoom": 0,  // Zoom of the low resolution group
  "maxzoom": 9,  // Zoom of the high resolution group
  "bounds": [-180, -90, 180, 90],
  "center": [0, 0, 0],
  "groups": [
    // [group_name, [minzoom, maxzoom]]
    [0, [0, 2]],
    [1, [2, 4]],
    [2, [4, 6]],
    [3, [6, 8]],
    [4, [8, 10]]
  ]
}
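
A minimal sketch of how a client might consume such a response, assuming the `groups` list has the shape shown above and is ordered from low to high resolution. The names `group_for_zoom` and `tile_url` are hypothetical, and preferring the higher-resolution group when two ranges share a boundary zoom is an assumption, not something the proposal specifies.

```python
from typing import Optional, Sequence, Tuple

# Each entry is (group_name, (minzoom, maxzoom)), as in the proposed response.
Groups = Sequence[Tuple[int, Tuple[int, int]]]

groups: Groups = [(0, (0, 2)), (1, (2, 4)), (2, (4, 6)), (3, (6, 8)), (4, (8, 10))]
template = (
    "/tiles/WebMercatorQuad/{z}/{x}/{y}@1x"
    "?url=myzarURL.zarr&variable=yo&group={group}"
)


def group_for_zoom(groups: Groups, zoom: int) -> Optional[int]:
    """Return the group whose [minzoom, maxzoom] range contains `zoom`."""
    matches = [name for name, (zmin, zmax) in groups if zmin <= zoom <= zmax]
    # The example ranges overlap at their boundaries (e.g. zoom 2).
    # Assumption: take the last match, i.e. the higher-resolution group.
    return matches[-1] if matches else None


def tile_url(z: int, x: int, y: int) -> str:
    """Fill the tile URL template with the group matching zoom `z`."""
    group = group_for_zoom(groups, z)
    if group is None:
        raise ValueError(f"no group covers zoom {z}")
    return template.format(z=z, x=x, y=y, group=group)
```

A map client would call something like `tile_url(z, x, y)` for every tile request instead of using a single static URL template.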

cc @maxrjones @hrodmn @abarciauskas-bgse @sharkinsspatial

@vincentsarago vincentsarago added the enhancement New feature or request label Jan 15, 2025
@j08lue
Member

j08lue commented Jan 15, 2025

Ideally, we would want to provide the same user experience as with GDAL and COGs, where the reader automatically selects the appropriate (overview) scale, no?

How would the proposed solution work for a client like a web map app or desktop GIS software? Would they add all scales to the map and let the mapping library handle turning layers on and off depending on the zoom level?

If there is any way to take this burden off the user, I think that would be great.

If it is too much automagic for generic TiTiler app, perhaps we would need a custom GeoZarrReader that implements this logic? 🤔

@vincentsarago
Member Author

Ideally, we would want to provide the same user experience as with GDAL and COGs, where the reader automatically selects the appropriate (overview) scale, no?

I would love that too but from what I understand the group needs to be decided before opening the file, while with GDAL you can access the overview once the file is opened.

How would the proposed solution work for a client like a web map app or desktop GIS software? Would they add all scales to the map and let the mapping library handle turning layers on and off depending on the zoom level?

No 🙁

If there is any way to take this burden off the user, I think that would be great.

Agreed

If it is too much automagic for generic TiTiler app, perhaps we would need a custom GeoZarrReader that implements this logic? 🤔

It would be nice if we didn't have to know in advance whether a Zarr is a GeoZarr or not!

@maxrjones
Member

Ideally, we would want to provide the same user experience as with GDAL and COGs, where the reader automatically selects the appropriate (overview) scale, no?

I would love that too but from what I understand the group needs to be decided before opening the file, while with GDAL you can access the overview once the file is opened.

We can use xarray.open_datatree to open a Zarr hierarchy rather than specifying a group in xarray.open_dataset.

@abarciauskas-bgse
Contributor

Just to clarify, when we say scale we don't mean scale in terms of the standard titiler parameter that indicates the size of the image tile being returned right? In other words, not the scale in /cog/tiles/{tileMatrixSetId}/{z}/{x}/{y}@{scale}x.{format}. But rather scale here we mean to be the group of a dataset associated with a "level" of resolution of the data. There's no standard or spec yet to clarify this 😅

In the original implementation, this was meant to work automagically WHEN there was a multiscale parameter set to True. In this case, the zoom level of the request would be associated with the level of the dataset opened. If the zoom level was higher than the available levels the highest level was chosen (i.e. the finest resolution). For example, if a dataset had 4 levels, zoom 0 level 0 was opened, zoom 1 level 1 was opened and so on, and zoom>=4, level 4 was opened.
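
The mapping described here amounts to clamping the requested zoom to the finest available level. A minimal sketch, assuming levels are numbered from 0 (coarsest) upward; `level_for_zoom` and `max_level` are illustrative names and not part of titiler:

```python
def level_for_zoom(zoom: int, max_level: int) -> int:
    """Map a tile zoom to a dataset level, clamping at the finest level."""
    if zoom < 0:
        raise ValueError("zoom must be non-negative")
    # Zooms beyond the last level all fall back to the finest resolution.
    return min(zoom, max_level)
```

With `max_level=4`, zooms 0 and 1 map to levels 0 and 1, and any zoom of 4 or more maps to level 4.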

I think the multiscale parameter has gone away, is that right? Would this solution make sense? A tilejson endpoint may also make sense for some use cases where the available levels (if any) need to be known in advance 🤔

@vincentsarago
Member Author

Just to clarify, when we say scale we don't mean scale in terms of the standard titiler parameter that indicates the size of the image tile being returned right? In other words, not the scale in /cog/tiles/{tileMatrixSetId}/{z}/{x}/{y}@{scale}x.{format}. But rather scale here we mean to be the group of a dataset associated with a "level" of resolution of the data. There's no standard or spec yet to clarify this 😅

Scale is just related to the output tile size
1: 256x256
2: 512x512
3: 768x768
...

In the original implementation, this was meant to work automagically WHEN there was a multiscale parameter set to True. In this case, the zoom level of the request would be associated with the level of the dataset opened. If the zoom level was higher than the available levels the highest level was chosen (i.e. the finest resolution). For example, if a dataset had 4 levels, zoom 0 level 0 was opened, zoom 1 level 1 was opened and so on, and zoom>=4, level 4 was opened.

Yeah but this assumed that the Zarr was Web Optimized and had groups that correspond to zoom levels.

I think the multiscale parameter has gone away, is that right?

Yes

Would this solution make sense? A tilejson endpoint may also make sense for some use cases where the available levels (if any) need to be known in advance 🤔

We could reintroduce the Multiscale option in the /tiles endpoint and then try to determine which group to open. But in theory we could also just try to find out whether a Zarr has Multiscale groups!
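
As a rough illustration of "finding out", one could inspect the root group attributes for a `multiscales` entry. This is only a sketch: the attribute layout below (a list of entries, each with a `datasets` list of `{"path": ...}` items) is an assumption about how some pyramid-writing tools store this metadata, real datasets may differ, and `multiscale_paths` is a hypothetical helper.

```python
from typing import Any, Dict, List


def multiscale_paths(attrs: Dict[str, Any]) -> List[str]:
    """Return the group paths listed under a `multiscales` attribute, if any."""
    multiscales = attrs.get("multiscales")
    if not isinstance(multiscales, list):
        # No multiscales metadata (or an unexpected layout): treat as single-scale.
        return []
    paths: List[str] = []
    for entry in multiscales:
        if not isinstance(entry, dict):
            continue
        for dataset in entry.get("datasets", []):
            if isinstance(dataset, dict) and "path" in dataset:
                paths.append(str(dataset["path"]))
    return paths


# Hypothetical root attributes, written by hand for illustration.
root_attrs = {"multiscales": [{"datasets": [{"path": "0"}, {"path": "1"}, {"path": "2"}]}]}
```

An empty result would mean the reader falls back to opening the store as a single-scale dataset.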

@abarciauskas-bgse
Contributor

abarciauskas-bgse commented Jan 15, 2025

Scale is just related to the output tile size
1: 256x256
2: 512x512
3: 768x768

That is my understanding as well, but the scale of the original comment in this thread

Right now to read a different scale we need to set the group= parameter

I don't think is referring to the output tile size? That seems like a completely separate parameter than what we are discussing (which I would call level for now).

Yeah but this assumed that the Zarr was Web Optimized and had groups that correspond to zoom levels.

This was intentional: only requests for datasets with multiple levels would have the multiscale parameter set to true. In other words, the client should know if a dataset has multiple levels or not.

To be clear though, I'm not proposing we go with this solution, just explaining how it worked originally.

@maxrjones
Member

maxrjones commented Jan 16, 2025

Consider this a straw-man proposal, but I think our goal for better supporting Zarr that has a TileMatrixSet with overviews corresponding to TMS zoom resolutions, along with consolidated metadata so that one can find all this out with a single GET request (hereafter called Web-Optimized Zarr / WOZ), should be a dedicated extension (e.g., a woz/ route) that provides all the same endpoints as the cog/ route with the same or better performance. If a user wants to render Zarr that isn't WOZ, they can use the xarray extension. But the xarray code path will always be slower because it needs to guess formats and special-case non-optimized file structures.

@vincentsarago
Member Author

That is my understanding as well, but the scale of the original comment in this thread

@abarciauskas-bgse oh sorry, I missed that you were talking about that 😓.

I don't think is referring to the output tile size? That seems like a completely separate parameter than what we are discussing (which I would call level for now).

Yeah

To be clear though, I'm not proposing we go with this solution, just explaining how it worked originally

yeah yeah 🙏


should be a dedicated extension (e.g., a woz/ route) that provides all the same endpoints as the cog/ route with the same or better performance.

@maxrjones, I would prefer not! Ideally the user shouldn't have to know if a zarr is Optimized or not.

Maybe we should re-think the XarrayReader in titiler to not work with Xarray DataArray:

# now 
with XarrayReader("dataset.zarr", variable="var", group="0", time=0) as da:
    img = da.tile(0, 0, 0)

# maybe
with ZarrReader("dataset.zarr") as ds:  # using `xarray.open_datatree`
    img = ds.tile(0, 0, 0, variable="var", group="0", time=0)

Not having to define the variable, group and time slice when opening the dataset would enable more dynamic configuration. I'm not quite sure of the performance implications: would this result in more data transfer?

@maxrjones
Member

@maxrjones, I would prefer not! Ideally the user shouldn't have to know if a zarr is Optimized or not.

How does this work for titiler with GeoTIFFs/COGs? I.e., what happens if a user provides a non-cloud-optimized GeoTIFF to a cog endpoint?

@vincentsarago
Member Author

How does this work for titiler with GeoTIFFs/COGs? I.e., what happens if a user provides a non-cloud-optimized GeoTIFF to a cog endpoint?

@maxrjones, GDAL automatically selects the overview or raw data based on the output tile resolution. If there are no overviews and no internal chunking, it will still fetch the data it needs, but it will be slow because you might load the whole dataset into memory.
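
For Zarr, a comparable choice could be made from per-group resolutions. A rough sketch, assuming each group's pixel size in metres is known (the `resolutions` values below are made up) and using a simplified rule: pick the coarsest group that is still at least as fine as the requested resolution. This is not GDAL's exact heuristic, and both function names are hypothetical.

```python
import math
from typing import Dict


def webmercator_resolution(zoom: int, tilesize: int = 256) -> float:
    """Pixel size in metres at the equator for a WebMercatorQuad zoom level."""
    circumference = 2 * math.pi * 6378137.0
    return circumference / tilesize / (2 ** zoom)


def select_group(resolutions: Dict[str, float], target_res: float) -> str:
    """Pick the coarsest group whose pixel size is <= `target_res`."""
    if not resolutions:
        raise ValueError("no groups available")
    # Groups that are fine enough for the requested output resolution.
    candidates = {g: r for g, r in resolutions.items() if r <= target_res}
    if candidates:
        # Among those, the largest pixel size means the least data to read.
        return max(candidates, key=lambda g: candidates[g])
    # Nothing is fine enough: fall back to the finest available group.
    return min(resolutions, key=lambda g: resolutions[g])


# Hypothetical pixel sizes (metres) for three groups.
resolutions = {"0": 40000.0, "1": 10000.0, "2": 2500.0}
```

With these made-up values, a zoom 3 tile would read from group "1", while anything finer than group "2" falls back to group "2".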

Technically speaking we prefixed the endpoint with /cog but it should maybe be renamed /raster because it can read a lot of data formats (not just COGs)

@maxrjones
Member

Thanks for the explanation, I incorrectly interpreted /cog as a fast-path for COGs. I understand your reluctance to have route-based fast-paths. I'd like to explore some ideas for speeding up tiling of Zarrs with overviews within the existing xarray namespace during the next sprint if that'd be alright with you.

4 participants