Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change points to shapes for stereo-seq #253

Open
LucaMarconato opened this issue Jan 3, 2025 · 5 comments
Open

Change points to shapes for stereo-seq #253

LucaMarconato opened this issue Jan 3, 2025 · 5 comments

Comments

@LucaMarconato
Copy link
Member

Hi,
We are using Stereo-seq a lot. It's more common to operate on the bin but not the cell_circles for most of the stereoseq users, because the cell segmentation performed badly as a result of both the experiment technology and algorithm limitations, could you change the Points to Shapes by default ?

Originally posted by @wangjiawen2013 in #97

@LucaMarconato
Copy link
Member Author

@wangjiawen2013 thanks for the feedback! The change you suggest is quite straightforward (basically it amounts to replace this code here

points_element = PointsModel.parse(
with this one ). The problem is that currently we do not support lazy loading for shapes as we do for points, and this would affect performance. There could be a new function argument to allow the parsing of bins as shapes, while keeping the default as points.

We have limited bandwidth as we are working on some parts of the core library at the moment and won't be able to work on this specific task soon, but if you would like to try making a PR, we will be happy to review it!

@LucaMarconato
Copy link
Member Author

Also, please check out this feature scverse/spatialdata#578, and this new one (merged yesterday) scverse/spatialdata#811, as you could find them useful for performant handling of Stereo-seq data.

@wangjiawen2013
Copy link

wangjiawen2013 commented Jan 6, 2025

I tested and find that rasterize_bins is not a good idea for two reasons:

  1. There are no col_key and row_key in stereoseq object, which corresponds to array_col and array_row in visium object. The x and y columns in stereoseq points element have different meanings with array_col and array_row in visium, they are not equivalence.
  2. rasterize_bin cannot merge table.obs into an image (especially for discrete annotations), but we want render both table.obs and gene expression.

I'll define a new function argument to allow the parsing of bins as shapes, while keeping the default as points. As only bin 10-200 are used for most stereoseq users, the preformance should not be affected seriously by setting the appropriate bin size.

@LucaMarconato
Copy link
Member Author

Thanks for sharing. If performance is not an issue, I'd indeed proceed as you described. Please notice that the approach shown here scverse/spatialdata#811 (i.e. calling rasterize(return_region_as_labels=True) would not be affected by the challenges you mentioned.

Finally a comment on the challenges you faced.

scverse/spatialdata#578 (comment)

Not having the row/col ready is definitely inconvenient, but these could be reconstructed. If performance starts being an issue in case you need to show the smallest bins you could consider this.

rasterize_bin cannot merge table.obs into an image (especially for discrete annotations), but we want render both table.obs and gene expression.

True, rasterize_bins() currently doesn't support these cases as the focus when we developed was on being used with a sparse matrix. Nevertheless these are important cases so we will take this feedback into account. A quick workaround for now is to put the obs into a new .X in a new table, or obs.cat.codes in case of a categorical column. Btw, we should add support for adata.layers too here (we recently added it for get_values() scverse/spatialdata#818 after your feedback in spatialdata-plot scverse/spatialdata-plot#326).

@wangjiawen2013
Copy link

wangjiawen2013 commented Jan 7, 2025

Glad to hear that we can use layers !

We have been cooperating with BGI (the inventor of stereoseq) for many years. bin50-bin200 are used frequently for production purpose, and bin1 is rarely used (it is used occasionally for testing purpose). The memory and running time are acceptable for bin 50-bin200 (or even bin10) when using scanpy/squidpy/seurat. So don't worry a lot about the performance. When we want to use bin1, we can use rasterize as you said. But at current stage, the convinence and functionality are more important than the performance.

The following is the envisaged reader function signature:

def stereoseq(
    path: str | Path,
    dataset_id: str | None = None,
    read_square_bin: bool = True,
    bin_to_shape:  list[int] | int | None = None,
    optional_tif: bool = False,
    imread_kwargs: Mapping[str, Any] = MappingProxyType({}),
    image_models_kwargs: Mapping[str, Any] = MappingProxyType({}),
) -> SpatialData

When bin_to_shape is None, bin_size>=10 will be treated as shapes. when it is a int or list[int] (such as 20 or [20, 50, 100]), bin_size 20, 50, 100 will be treated as shapes.
As you're more familiar with spatialdata, hope you can modify the reader. Though you have indicated the code needed to be changed, I find it's hard for me to understand it. I am still an ordinary user and still have a long way to be a developer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants