Merge branch 'main' into feature121

MannLabs · Dec 10, 2024 · d0e53fc
2 parents 32d7c36 + 7409b41
commit d0e53fc
Showing 12 changed files with 69 additions and 65 deletions.
diff --git a/docs/index.rst b/docs/index.rst
@@ -1,4 +1,4 @@
-scPortrait - image-based single cell analysis at scale in Python
+scPortrait – image-based single cell analysis at scale in Python
 ================================================================
 
 scPortrait is a scalable toolkit to analyse single-cell image datasets. This Python implementation efficiently segments individual cells, generates single-cell datasets and provides tools for the efficient deep learning classification of their phenotypes for downstream applications.

diff --git a/docs/pages/notebooks/example_scPortrait_project.ipynb b/docs/pages/notebooks/example_scPortrait_project.ipynb
@@ -5,17 +5,17 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# A walk through the scPortrait Ecosystem\n",
+    "# A Walk Through The scPortrait Ecosystem\n",
     "\n",
     "This notebook will introduce you to the scPortrait ecosystem and give you a complete working example of how to use scPortrait. It will walk you through the following steps of the scPortrait workflow:\n",
     "\n",
-    "1. **segmentation**: Generates masks for the segmentation of input images into individual cells\n",
+    "1. **Segmentation**: Generates masks for the segmentation of input images into individual cells\n",
     "\n",
-    "2. **extraction**: The segmentation masks are applied to extract single-cell images for all cells in the input images. Images of individual cells are rescaled to [0, 1] per channel.\n",
+    "2. **Extraction**: The segmentation masks are applied to extract single-cell images for all cells in the input images. Images of individual cells are rescaled to [0, 1] per channel.\n",
     "\n",
-    "3. **featurization**: The image-based phenotype of each individual cell in the extracted single-cell dataset is featurized using the specified featurization method. Multiple featurization runs can be performed on the same dataset using different methods. Here we utilize the pretrained binary classifier from the original [SPARCS manuscript](https://doi.org/10.1101/2023.06.01.542416) that identifies individual cells defective in a biological process called \"autophagy\". \n",
+    "3. **Featurization**: The image-based phenotype of each individual cell in the extracted single-cell dataset is featurized using the specified featurization method. Multiple featurization runs can be performed on the same dataset using different methods. Here we utilize the pretrained binary classifier from the original [SPARCS manuscript](https://doi.org/10.1101/2023.06.01.542416) that identifies individual cells defective in a biological process called \"autophagy\". \n",
     "\n",
-    "4. **selection**: Cutting instructions for the isolation of selected individual cells by laser microdissection are generated. The cutting shapes are written to an ``.xml`` file which can be loaded on a leica LMD microscope for automated cell excision.\n",
+    "4. **Selection**: Cutting instructions for the isolation of selected individual cells by laser microdissection are generated. The cutting shapes are written to an ``.xml`` file which can be loaded on a Leica LMD7 microscope for automated cell excision.\n",
     "\n",
     "The data used in this notebook was previously stitched using the stitching workflow in [SPARCStools](https://github.com/MannLabs/SPARCStools). Please see the notebook [here](https://mannlabs.github.io/SPARCStools/html/pages/notebooks/example_stitching_notebook.html)."
    ]
@@ -768,14 +768,19 @@
     "fig.tight_layout()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Alternatively, you can also visualize the input images as well as all other objects saved in a spatialdata object"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# alternatively you can also visualize the input images as well as all other objects saved in spatialdata object\n",
-    "\n",
     "project.view_sdata()"
    ]
   },
@@ -1048,7 +1053,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Looking at Segmentation Results\n",
+    "### Inspecting Segmentation Results\n",
     "\n",
     "The Segmentation Results are written to a hdf5 file called `segmentation.h5` located in the segmentation directory of our scPortrait project.\n",
     "\n",
@@ -1177,7 +1182,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Extracting single-cell images\n",
+    "## Extracting single cell images\n",
     "\n",
     "Once we have generated a segmentation mask, the next step is to extract single-cell images of segmented cells in the dataset.\n",
     "\n",
@@ -1289,7 +1294,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Look at extracted single-cell images\n",
+    "### Look at extracted single cell images\n",
     "\n",
     "The extracted single-cell images are written to a h5py file `single_cells.h5` located under `extraction\\data` within the project folder.\n",
     "\n",
@@ -1377,7 +1382,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Classification of extracted single-cells\n",
+    "## Classification of extracted single cells\n",
     "\n",
     "Next we can apply a pretrained model to classify our cells within the scPortrait project. \n",
     "\n",
@@ -1458,7 +1463,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### looking at the generated results\n",
+    "### Looking at the generated results\n",
     "\n",
     "The results are written to a csv file which we can load with pandas.\n",
     "\n",
@@ -1651,7 +1656,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Exporting Cutting contours for excision on the LMD7\n",
+    "## Exporting Cutting contours for excision on a Leica LMD7\n",
     "\n",
     "scPortrait directly interfaces with our other open-source python library [py-lmd](https://github.com/MannLabs/py-lmd) to easily select and export cells for excision on a Leica LMD microscope. \n",
     "\n",

diff --git a/docs/pages/tools/parsing/example_parsing_notebook.ipynb b/docs/pages/tools/parsing/example_parsing_notebook.ipynb
@@ -5,7 +5,7 @@
    "id": "6fc618c5",
    "metadata": {},
    "source": [
-    "# Example Parsing Notebook to rename phenix experiments"
+    "# Example Notebook to parse and rename files from experiments imaged on an Opera Phenix microscope"
    ]
   },
   {

diff --git a/docs/pages/tools/stitching/example_stitching_notebook.ipynb b/docs/pages/tools/stitching/example_stitching_notebook.ipynb
@@ -8,6 +8,13 @@
     "# Example Stitching Notebook"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "scPortrait uses [Ashlar](https://labsyspharm.github.io/ashlar/) for stitching images. When stitching from `.tif` files, Ashlar reads channel and tile position information from filenames according to a predefined `pattern`. Hence, filenames matter when stitching from `.tif` files."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -43,12 +50,12 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### initializing the stitcher object"
+    "### Initializing the `Stitcher` object"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -64,12 +71,12 @@
     "slidename = \"stitching_test\"\n",
     "outdir = os.path.join(input_dir.replace(\"stitching_example\", \"example_projects/stitching\"), slidename)\n",
     "\n",
-    "row = str(2).zfill(2)  # specify the row of the well you want to stitch\n",
-    "well = str(4).zfill(2)  # specifc the well number you wish to stitch\n",
+    "row = str(2).zfill(2)  # specify the row of the well you want to stitch, here = 2\n",
+    "well = str(4).zfill(2)  # specify the well number you wish to stitch, here = 4\n",
     "zstack_value = str(1).zfill(\n",
     "    3\n",
     ")  # specify the zstack you want to stitch. for multiple zstacks please make a loop and iterate through each of them.\n",
-    "timepoint = str(1).zfill(3)  # specifz the timepoint you wish to stitch\n",
+    "timepoint = str(1).zfill(3)  # specify the timepoint you wish to stitch\n",
     "\n",
     "pattern = f\"Timepoint{timepoint}_Row{row}_Well{well}_{{channel}}_zstack{zstack_value}_r{{row:03}}_c{{col:03}}.tif\"\n",
     "\n",
@@ -460,19 +467,19 @@
    "source": [
     "## Multi-threaded Stitching\n",
     "\n",
-    "Using the ParallelStitcher class stitching can be speed up by using multiple threads. The code to perform stitching remains more or less the same."
+    "The `ParallelStitcher` class can speed up stitching by using multiple threads. The code to start stitching remains the same, but `ParallelStitcher` takes an additional argument `threads`, specifying the number of parallel threads to use."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### initializing the stitcher object"
+    "### Initializing the `ParallelStitcher` object"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": null,
    "metadata": {},
    "outputs": [
     {
@@ -489,12 +496,12 @@
     "outdir_parallel = os.path.join(input_dir.replace(\"stitching_example\", \"example_projects/stitching\"), slidename)\n",
     "\n",
     "\n",
-    "row = str(2).zfill(2)  # specify the row of the well you want to stitch\n",
-    "well = str(4).zfill(2)  # specifc the well number you wish to stitch\n",
+    "row = str(2).zfill(2)  # specify the row of the well you want to stitch, here = 2\n",
+    "well = str(4).zfill(2)  # specify the well number you wish to stitch, here = 4\n",
     "zstack_value = str(1).zfill(\n",
     "    3\n",
     ")  # specify the zstack you want to stitch. for multiple zstacks please make a loop and iterate through each of them.\n",
-    "timepoint = str(1).zfill(3)  # specifz the timepoint you wish to stitch\n",
+    "timepoint = str(1).zfill(3)  # specify the timepoint you wish to stitch\n",
     "\n",
     "pattern = f\"Timepoint{timepoint}_Row{row}_Well{well}_{{channel}}_zstack{zstack_value}_r{{row:03}}_c{{col:03}}.tif\"\n",
     "\n",

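Note on the `pattern` string in the two stitching cells above: it mixes two levels of formatting. The f-string fills in the well-specific values immediately, while the doubled braces survive as literal `{channel}`, `{row:03}` and `{col:03}` placeholders that are resolved per tile later. The sketch below illustrates this with plain `str.format` as a stand-in; how Ashlar consumes the pattern internally is not shown in this diff, and the helper `build_pattern` and the channel name `DAPI` are hypothetical.

```python
def build_pattern(timepoint: int, row: int, well: int, zstack: int) -> str:
    """Build a filename pattern following the naming scheme used in the notebook."""
    timepoint_str = str(timepoint).zfill(3)  # 1 -> "001"
    row_str = str(row).zfill(2)              # 2 -> "02"
    well_str = str(well).zfill(2)            # 4 -> "04"
    zstack_str = str(zstack).zfill(3)        # 1 -> "001"
    # Doubled braces are escaped in an f-string, so they remain as
    # single-brace placeholders in the returned string.
    return (
        f"Timepoint{timepoint_str}_Row{row_str}_Well{well_str}"
        f"_{{channel}}_zstack{zstack_str}_r{{row:03}}_c{{col:03}}.tif"
    )


pattern = build_pattern(timepoint=1, row=2, well=4, zstack=1)
print(pattern)

# Filling the remaining placeholders for one tile (illustration only).
example_file = pattern.format(channel="DAPI", row=1, col=2)
print(example_file)

# One pattern per z-plane, as the notebook comment suggests for multiple zstacks.
patterns_per_zstack = [build_pattern(1, 2, 4, z) for z in (1, 2, 3)]
```

Each entry of `patterns_per_zstack` could then be passed to a separate stitching run, which matches the notebook's advice to loop over z-stacks.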
diff --git a/requirements.txt b/requirements.txt
@@ -18,8 +18,7 @@ shapely
 rasterio
 imagesize
 imagecodecs
-xarray
-xarray-datatree
+xarray>=2024.10.0
 opencv-python
 scikit-image>=0.22.0
 scikit-fmm

diff --git a/requirements_dev.txt b/requirements_dev.txt
@@ -18,8 +18,7 @@ shapely
 rasterio
 imagesize
 imagecodecs
-xarray
-xarray-datatree
+xarray>=2024.10.0
 opencv-python
 scikit-image>=0.22.0
 scikit-fmm
@@ -34,7 +33,7 @@ spatialdata
 napari-spatialdata
 pyqt5
 lxml_html_clean
-ashlar
+ashlar>=1.19.0
 networkx
 py-lmd @ git+https://github.com/MannLabs/py-lmd.git@refs/pull/11/head#egg=py-lmd
 

diff --git a/src/scportrait/pipeline/_utils/sdata_io.py b/src/scportrait/pipeline/_utils/sdata_io.py
@@ -5,7 +5,6 @@
 from pathlib import Path
 from typing import Any, Literal, TypeAlias
 
-import datatree
 import numpy as np
 import xarray
 from alphabase.io import tempmmap
@@ -134,7 +133,7 @@ def _get_input_image(self, sdata: SpatialData) -> xarray.DataArray:
             ValueError: If input image not found
         """
         if self.input_image_status:
-            if isinstance(sdata.images[self.input_image_name], datatree.DataTree):
+            if isinstance(sdata.images[self.input_image_name], xarray.DataTree):
                 input_image = sdata.images[self.input_image_name]["scale0"].image
             elif isinstance(sdata.images[self.input_image_name], xarray.DataArray):
                 input_image = sdata.images[self.input_image_name].image

diff --git a/src/scportrait/pipeline/_utils/spatialdata_classes.py b/src/scportrait/pipeline/_utils/spatialdata_classes.py
@@ -4,10 +4,9 @@
 from typing import Any
 
 from dask.array import unique as DaskUnique
-from datatree import DataTree
 from spatialdata.models import C, Labels2DModel, X, Y, Z, get_axes_names
 from spatialdata.transformations.transformations import BaseTransformation
-from xarray import DataArray
+from xarray import DataArray, DataTree
 from xarray_schema.components import (
     AttrSchema,
     AttrsSchema,

diff --git a/src/scportrait/pipeline/_utils/spatialdata_helper.py b/src/scportrait/pipeline/_utils/spatialdata_helper.py
@@ -2,7 +2,6 @@
 
 from typing import TypeAlias, Union
 
-import datatree
 import numpy as np
 import pandas as pd
 import psutil
@@ -17,7 +16,7 @@
 from scportrait.pipeline._utils.segmentation import numba_mask_centroid
 
 # Type aliases
-DataElement: TypeAlias = datatree.DataTree | xarray.DataArray
+DataElement: TypeAlias = xarray.DataTree | xarray.DataArray
 ChunkSize: TypeAlias = tuple[int, ...]
 ChunkSizes: TypeAlias = list[tuple[int, ...]]
 
@@ -105,7 +104,7 @@ def get_chunk_size(element: DataElement) -> ChunkSize | ChunkSizes:
             x = x[0] if isinstance(x, tuple | list) or len(x) > 1 else x
             return (int(c), int(y), int(x))
 
-    elif isinstance(element, datatree.DataTree):
+    elif isinstance(element, xarray.DataTree):
         chunk_sizes: ChunkSizes = []
         for scale in element:
             if len(element[scale]["image"].shape) == 2:
@@ -140,7 +139,7 @@ def rechunk_image(element: DataElement, chunk_size: ChunkSize) -> DataElement:
     if isinstance(element, xarray.DataArray):
         element["image"].data = element["image"].data.rechunk(chunk_size)
         return element
-    elif isinstance(element, datatree.DataTree):
+    elif isinstance(element, xarray.DataTree):
         for scale in element:
             element[scale]["image"].data = element[scale]["image"].data.rechunk(chunk_size)
         return element
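
Note on the `datatree.DataTree` to `xarray.DataTree` change in the hunks above: this commit drops the standalone `xarray-datatree` dependency and pins `xarray>=2024.10.0`, the release in which `DataTree` became part of xarray's public API. Code that must still run against older environments could, as an alternative not taken by this commit, fall back to the legacy package. The sketch below shows such a shim; the helper name `is_multiscale` is hypothetical and not part of scPortrait.

```python
# Compatibility shim (illustrative only; the commit itself simply requires
# xarray>=2024.10.0 and imports DataTree from xarray directly).
try:
    from xarray import DataTree  # public API in xarray >= 2024.10.0
except ImportError:
    try:
        from datatree import DataTree  # legacy xarray-datatree package
    except ImportError:
        DataTree = None  # neither package provides DataTree


def is_multiscale(element: object) -> bool:
    """Return True if `element` is a DataTree (e.g. a multiscale image)."""
    return DataTree is not None and isinstance(element, DataTree)


# A plain Python object is never a DataTree, whichever import succeeded.
print(is_multiscale(object()))
```

Pinning the minimum xarray version, as the commit does, avoids carrying this kind of branching in every `isinstance` check.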

diff --git a/src/scportrait/pipeline/extraction.py b/src/scportrait/pipeline/extraction.py
@@ -6,7 +6,6 @@
 import timeit
 from functools import partial as func_partial
 
-import datatree
 import h5py
 import matplotlib.pyplot as plt
 import numpy as np
@@ -224,7 +223,6 @@ def _set_up_extraction(self):
             output_folder_name = self.DEFAULT_DATA_DIR
 
         self._setup_output(folder_name=output_folder_name)
-
         self._get_segmentation_info()
         self._get_input_image_info()
 
@@ -413,18 +411,6 @@ def _get_label_info(self, arg):
         # no additional labelling required
         return (index, save_index, cell_id, None, None)
 
-    def _get_sdata(self):
-        path = os.path.join(self.project_location, self.DEFAULT_SDATA_FILE)
-
-        self.sdata = SpatialData.read(path)
-
-        if isinstance(self.sdata.images[self.DEFAULT_INPUT_IMAGE_NAME], datatree.DataTree):
-            self.input_image = self.sdata.images[self.DEFAULT_INPUT_IMAGE_NAME]["scale0"].image
-        elif isinstance(self.sdata.images[self.DEFAULT_INPUT_IMAGE_NAME], xarray.DataArray):
-            self.input_image = self.sdata.images[self.DEFAULT_INPUT_IMAGE_NAME].image
-        else:
-            raise ValueError("Input image could not be found. Cannot proceed with extraction.")
-
     def _save_removed_classes(self, classes):
         # define path where classes should be saved
         filtered_path = os.path.join(
@@ -541,10 +527,8 @@ def _extract_classes(self, px_center, arg, return_failed_ids=False):
 
             # get the image data
             if image_index is None:
-                # image_data = self.input_image[:, window_y, window_x].compute()
                 image_data = self.image_data[:, window_y, window_x]
             else:
-                # image_data = self.input_image[image_index, :, window_y, window_x].compute()
                 image_data = self.image_data[image_index, :, window_y, window_x]
 
             image_data = (

diff --git a/src/scportrait/pipeline/project.py b/src/scportrait/pipeline/project.py
@@ -7,7 +7,6 @@
 from typing import Literal
 
 import dask.array as darray
-import datatree
 import numpy as np
 import psutil
 import xarray
@@ -382,7 +381,7 @@ def _check_sdata_status(self, print_status=False):
             self.centers_status = self.filehandler.centers_status
 
             if self.input_image_status:
-                if isinstance(self.sdata.images[self.DEFAULT_INPUT_IMAGE_NAME], datatree.DataTree):
+                if isinstance(self.sdata.images[self.DEFAULT_INPUT_IMAGE_NAME], xarray.DataTree):
                     self.input_image = self.sdata.images[self.DEFAULT_INPUT_IMAGE_NAME]["scale0"].image
                 elif isinstance(self.sdata.images[self.DEFAULT_INPUT_IMAGE_NAME], xarray.DataArray):
                     self.input_image = self.sdata.images[self.DEFAULT_INPUT_IMAGE_NAME].image