Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify Draft 'pymd' Box #100

Open
dukesook opened this issue Jun 3, 2024 · 8 comments
Open

Simplify Draft 'pymd' Box #100

dukesook opened this issue Jun 3, 2024 · 8 comments
Labels

Comments

@dukesook
Copy link

dukesook commented Jun 3, 2024

Concerning the Image Pyramid Entity Group Box 'pymd' in WG03N1157_23524 HEIF Ed3 Amd2 Prelim WD:

The following four variables seem redundant and could be removed to greatly simplify the 'pymd' box:

  1. tile_size_x
  2. tile_size_y
  3. tiles_in_layer_row_minus1
  4. tiles_in_layer_column_minus1
pymd ImageGrid
tiles_in_layer_row_minus1 rows_minus_one
tiles_in_layer_column_minus1 columns_minus_one
tile_size_x (output_width / columns_minus_one)
tile_size_y (output_height / rows_minus_one)

If the gridding is handled within the codec, then the codec specific boxes would be used instead. For example, the uncompressed codec would use the uncC box:

pymd uncC
tiles_in_layer_row_minus1 num_tile_rows_minus_one
tiles_in_layer_column_minus1 num_tile_cols_minus_one
tile_size_x (ispe image_width / num_tile_cols_minus_one)
tile_size_y (ispe image_height / num_tile_rows_minus_one)

I imagine these variables were originally placed here for convenience. However, there are already well defined mechanisms for accessing them. In my opinion, duplicating them in the pymd box adds complexity for very little gain.

@farindk
Copy link

farindk commented Jul 18, 2024

We could even leave away the layer_binning entry. It appears to be equivalent to ceil(base width/overview width) and it is thus redundant. That would eliminate all custom data from the pymd box, leaving just the list of pyramid image item IDs, sorted according to increasing resolution.

@farindk
Copy link

farindk commented Jul 23, 2024

Another reason for leaving away layer_binning is that it restricts us to power-of-two integer scaling factors only, while I think it would be perfectly fine to also have finer resolution granularity, like a factor $\sqrt 2$ per layer. For example, layers with image widths 1000, 707, 500.

@farindk
Copy link

farindk commented Aug 4, 2024

Is there any important reason for the restriction that every pyramid level should have the same tile size? I think this is overly restrictive. There are good reasons why tile sizes would vary for different layers. For example, the base layer might have the original uncompressed data, while overview layers use h265 with a tile size dictated by the encoder hardware). Or building the lower resolutions is much easier if this can be done for each base-level tile separately.

If it is important for an application to have the same tile size in each level, the encoder is free to do so even without enforcing this, and having similar tile sizes could be signaled as a flag (but it's also obvious when looking at the individual pyramid levels).

@PileofBones86
Copy link

FYI, the layer_binning isn't restricted to power-of-two scaling. It is restricted to integer factors. For instance, the examples in the document show 2x2 binning as a layer_binning of 2 and 4x4 binning as a layer_binning of 4. One can also choose to implement a 3x3 binning as a layer_binning of 3, etc.

@farindk
Copy link

farindk commented Sep 27, 2024

the layer_binning isn't restricted to power-of-two scaling

Right. This was a semantic typo. I mean integer scale factor. I've corrected it above.

@fmaze
Copy link

fmaze commented Nov 6, 2024

I imagine these variables were originally placed here for convenience

Indeed, the purpose was that the reader has all the information it needs in a single place. Depending on which kinds of overviews are composing the pyramid, the information may be more or less complicated to gather (grid, tiled coded image with constrained extents, uncompressed image). For instance, for a grid image the example above is not totally correct due to possible implicit cropping, To be accurate you would have to get the width/height of the input images.

I agree it is worth simplifying in some cases, what about using a flag so that this information would be conditionally present or not?

@farindk
Copy link

farindk commented Nov 6, 2024

A conditional flag is one option, but I think that the "difficulty" to get the information is no strong argument. Any decoder that is able to decode the images obviously also knows how the get the image sizes. And from there, all parameters can be trivially computed. Furthermore, it is not so clear why anyone other than the decoder would be interested in the number of tiles in each layer.

I can see two use-cases:

  • a multiplexing/demultiplexing application might want to know the size of each layer without being able to decode it. For example, a network server might want to know which resolution layer to stream. This could be solved by making the ispe property mandatory when an image is used in a pymd. The ispe is usually present anyway so that this will not be a big change and double (and thus possibly inconsistent) information is avoided.
  • it might be useful information whether each resolution layer uses the same tile size. This would be a good case for your conditional flag. If the flag is set, the pymd is restricted to use the same tile size in every layer and the tile size is indicated in the pymd header.

Thus, the specification would be:

  • ispe is mandatory for each layer image
  • the header is defined like this:
aligned(8) class ImagePyramidEntityGroup
extends EntityToGroupBox ('pymd', version = 0, flags) {
        if (flags & 1) {
  	  unsigned int(16) tile_size_x;
	  unsigned int(16) tile_size_y;
        }
}

@cconcolato
Copy link
Contributor

AMD1 of HEIF has been revised. We invite experts to have a new look at the text.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants