Change points to shapes for stereo-seq #253

LucaMarconato · 2025-01-03T15:16:13Z

Hi,
We are using Stereo-seq a lot. It's more common to operate on the bin but not the cell_circles for most of the stereoseq users, because the cell segmentation performed badly as a result of both the experiment technology and algorithm limitations, could you change the Points to Shapes by default ?

Originally posted by @wangjiawen2013 in #97

The text was updated successfully, but these errors were encountered:

LucaMarconato · 2025-01-03T15:24:18Z

@wangjiawen2013 thanks for the feedback! The change you suggest is quite straightforward (basically it amounts to replace this code here

spatialdata-io/src/spatialdata_io/readers/stereoseq.py

Line 279 in 92dbde1

points_element = PointsModel.parse(

with this one

spatialdata-io/src/spatialdata_io/readers/visium_hd.py

Line 252 in 92dbde1

else:

). The problem is that currently we do not support lazy loading for shapes as we do for points, and this would affect performance. There could be a new function argument to allow the parsing of bins as shapes, while keeping the default as points.

We have limited bandwidth as we are working on some parts of the core library at the moment and won't be able to work on this specific task soon, but if you would like to try making a PR, we will be happy to review it!

LucaMarconato · 2025-01-03T15:25:31Z

Also, please check out this feature scverse/spatialdata#578, and this new one (merged yesterday) scverse/spatialdata#811, as you could find them useful for performant handling of Stereo-seq data.

wangjiawen2013 · 2025-01-06T09:37:33Z

I tested and find that rasterize_bins is not a good idea for two reasons:

There are no col_key and row_key in stereoseq object, which corresponds to array_col and array_row in visium object. The x and y columns in stereoseq points element have different meanings with array_col and array_row in visium, they are not equivalence.
rasterize_bin cannot merge table.obs into an image (especially for discrete annotations), but we want render both table.obs and gene expression.

I'll define a new function argument to allow the parsing of bins as shapes, while keeping the default as points. As only bin 10-200 are used for most stereoseq users, the preformance should not be affected seriously by setting the appropriate bin size.

LucaMarconato · 2025-01-06T10:27:10Z

Thanks for sharing. If performance is not an issue, I'd indeed proceed as you described. Please notice that the approach shown here scverse/spatialdata#811 (i.e. calling rasterize(return_region_as_labels=True) would not be affected by the challenges you mentioned.

Finally a comment on the challenges you faced.

scverse/spatialdata#578 (comment)

Not having the row/col ready is definitely inconvenient, but these could be reconstructed. If performance starts being an issue in case you need to show the smallest bins you could consider this.

rasterize_bin cannot merge table.obs into an image (especially for discrete annotations), but we want render both table.obs and gene expression.

True, rasterize_bins() currently doesn't support these cases as the focus when we developed was on being used with a sparse matrix. Nevertheless these are important cases so we will take this feedback into account. A quick workaround for now is to put the obs into a new .X in a new table, or obs.cat.codes in case of a categorical column. Btw, we should add support for adata.layers too here (we recently added it for get_values() scverse/spatialdata#818 after your feedback in spatialdata-plot scverse/spatialdata-plot#326).

wangjiawen2013 · 2025-01-07T05:24:04Z

Glad to hear that we can use layers !

We have been cooperating with BGI (the inventor of stereoseq) for many years. bin50-bin200 are used frequently for production purpose, and bin1 is rarely used (it is used occasionally for testing purpose). The memory and running time are acceptable for bin 50-bin200 (or even bin10) when using scanpy/squidpy/seurat. So don't worry a lot about the performance. When we want to use bin1, we can use rasterize as you said. But at current stage, the convinence and functionality are more important than the performance.

The following is the envisaged reader function signature:

def stereoseq(
    path: str | Path,
    dataset_id: str | None = None,
    read_square_bin: bool = True,
    bin_to_shape:  list[int] | int | None = None,
    optional_tif: bool = False,
    imread_kwargs: Mapping[str, Any] = MappingProxyType({}),
    image_models_kwargs: Mapping[str, Any] = MappingProxyType({}),
) -> SpatialData

When bin_to_shape is None, bin_size>=10 will be treated as shapes. when it is a int or list[int] (such as 20 or [20, 50, 100]), bin_size 20, 50, 100 will be treated as shapes.
As you're more familiar with spatialdata, hope you can modify the reader. Though you have indicated the code needed to be changed, I find it's hard for me to understand it. I am still an ordinary user and still have a long way to be a developer.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change points to shapes for stereo-seq #253

Change points to shapes for stereo-seq #253

LucaMarconato commented Jan 3, 2025

LucaMarconato commented Jan 3, 2025

LucaMarconato commented Jan 3, 2025

wangjiawen2013 commented Jan 6, 2025 •

edited

Loading

LucaMarconato commented Jan 6, 2025

wangjiawen2013 commented Jan 7, 2025 •

edited

Loading

Change points to shapes for stereo-seq #253

Change points to shapes for stereo-seq #253

Comments

LucaMarconato commented Jan 3, 2025

LucaMarconato commented Jan 3, 2025

LucaMarconato commented Jan 3, 2025

wangjiawen2013 commented Jan 6, 2025 • edited Loading

LucaMarconato commented Jan 6, 2025

wangjiawen2013 commented Jan 7, 2025 • edited Loading

wangjiawen2013 commented Jan 6, 2025 •

edited

Loading

wangjiawen2013 commented Jan 7, 2025 •

edited

Loading