
Commit

update docs
Karl5766 committed Dec 13, 2024
1 parent cfd4229 commit 2b1bdda
Showing 10 changed files with 131 additions and 275 deletions.
16 changes: 16 additions & 0 deletions docs/GettingStarted/beta_amyloid_plaque_detection.rst
@@ -152,3 +152,19 @@ this. In a local test (without coiled, but still using gcs), we may set :code:`l

Here, the MAX_THRESHOLD variable specifies the threshold above which objects will be counted as plaques. This
threshold is applied over the corrected ome zarr image, which is computed from the original image and the bias image provided.

After running the above code, a numpy file will be written to :code:`f'{subject.COILED_CACHE_DIR_PATH}/final_lc.npy'`,
which contains the results in an N * 5 matrix of type np.float64. The first three columns are the Z, Y, X locations of
the plaques, and the fourth and fifth columns are the size (in voxels) and a unique ID of each plaque. To retrieve the results:

.. code-block:: Python

    import numpy as np
    from cvpl_tools.fsspec import RDirFileSystem

    cdir_fs = RDirFileSystem(subject.COILED_CACHE_DIR_PATH)
    with cdir_fs.open('final_lc.npy', mode='rb') as fd:
        lc = np.load(fd)
    print('First 10 rows of lc:')
    print(lc[:10])
Alternatively, use the :code:`cvpl_tools.fsspec.copyfile` function to copy the .npy file to a local path.
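As a rough sketch of doing this by hand (assuming :code:`subject.COILED_CACHE_DIR_PATH` is a gs:// url, as in the coiled
setup above), the file can also be downloaded with plain gcsfs and loaded locally:

.. code-block:: Python

    import gcsfs
    import numpy as np

    gfs = gcsfs.GCSFileSystem()
    # download the result table into the current working directory, then load it
    gfs.get(f'{subject.COILED_CACHE_DIR_PATH}/final_lc.npy', 'final_lc.npy')
    lc = np.load('final_lc.npy')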

8 changes: 4 additions & 4 deletions docs/GettingStarted/nnunet.rst
@@ -80,16 +80,16 @@ results in the following segmentation workflow we use:

2. Then select one of N scans, say M1

3. Downsample M1 and use a GUI to paint a binary mask, which contains 1 on regions of edges and 0 on plaques and
elsewhere

4. Split the M1 volume and its binary mask annotation vertically to Z slices, and train an nnUNet model on these slices

5. The above produces a model that can predict negative masks on any mouse brain scan of the same format; the remaining N-1
mouse brains are down-sampled, and we use this model to predict on them to obtain their corresponding negative
masks

6. These masks are used to remove edge areas of the image before we apply thresholding to find plaque objects.
Algorithmically, we compute M' where :code:`M'[z, y, x] = M[z, y, x] * (1 - NEG_MASK[z, y, x])` for each
voxel location (z, y, x); then, we apply a threshold on M' and take each connected component of value 1 as an individual
plaque object; their centroid locations and sizes (in number of voxels) are summarized in a numpy table and
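A minimal NumPy sketch of the masking and thresholding in step 6 (variable names here are illustrative; the actual
pipeline runs the equivalent computation over distributed dask arrays rather than a single in-memory volume):

.. code-block:: Python

    import numpy as np
    import scipy.ndimage as ndimage

    # im: 3d brightness volume (M); neg_mask: binary edge mask predicted by the nnUNet model
    masked = im * (1 - neg_mask)        # M'[z, y, x] = M[z, y, x] * (1 - NEG_MASK[z, y, x])
    binary = masked > MAX_THRESHOLD     # threshold to obtain candidate plaque voxels
    lbl, ncomp = ndimage.label(binary)  # connected components, one label per plaque
    idx = np.arange(1, ncomp + 1)
    centroids = ndimage.center_of_mass(binary, lbl, idx)  # (z, y, x) centroid per plaque
    sizes = ndimage.sum(binary, lbl, idx)                 # plaque sizes in voxels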
35 changes: 12 additions & 23 deletions docs/GettingStarted/ome_zarr.rst
@@ -18,34 +18,24 @@ To view an ome-zarr file this way with :code:`cvpl_tools`, use the command
::

    import cvpl_tools.ome_zarr.napari.add as napari_add_ome_zarr
    napari_add_ome_zarr.subarray_from_path(viewer, "/absolute/path/to/your/ome.zarr", kwargs=dict(name="displayed_name_in_ui"))

- This will create a new layer named **displayed_name_in_ui** which displays your ome zarr array.
:code:`subarray` uses Napari's :code:`add_image` function's :code:`multiscale` argument for this,
which loads different resolution levels based on the viewer's current zoom factor
- To open a **.zip** file, specify ome.zarr.zip in the path (and optionally set use_zip=True to
open the file as zip regardless of its suffix)
- To open both the **ome.zarr** file and any label files located at **ome.zarr/labels/label_name**
as specified by the ome zarr standard, use cvpl_zarr.add_ome_zarr_group_from_path instead.
- To specify additional arguments passed to Napari's add_image function, pass your arguments in
the kwargs as a dictionary.
- To open an ome zarr image on Google cloud storage, use the following code instead

.. code-block:: Python

    import cvpl_tools.ome_zarr.napari.add as napari_add_ome_zarr
    import gcsfs
    import zarr

    gfs = gcsfs.GCSFileSystem(token=None)
    store = gfs.get_mapper('gs://path_to_your.ome.zarr')
    zarr_group = zarr.open(store, mode='r')
    napari_add_ome_zarr.group(viewer, zarr_group, kwargs=dict(name="displayed_name_in_ui"))
- An extra argument :code:`is_label` can be passed into the function via the :code:`kwargs` dictionary.
This is a boolean value that specifies whether to use the :code:`viewer.add_labels`
(if :code:`True`) or :code:`viewer.add_image` (if :code:`False`) function. This is useful for
displaying instance segmentation masks, where each segmented object has a distinct color.

Similarly, you can open a zip file, or an image with multiple labels, this way.
The zip loading feature does not work with remote locations on gcs because of the limitations of
ZipStore.
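For example, a minimal sketch (path and layer name are illustrative) of displaying an instance segmentation mask
stored as ome zarr using the :code:`is_label` flag described above:

.. code-block:: Python

    import cvpl_tools.ome_zarr.napari.add as napari_add_ome_zarr

    # is_label=True routes the layer through viewer.add_labels(), so each object gets a distinct color
    napari_add_ome_zarr.subarray_from_path(
        viewer,
        "/absolute/path/to/your/instance_mask.ome.zarr",
        kwargs=dict(name="instance_mask", is_label=True),
    )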


Reading and Writing ome zarr files
@@ -59,7 +49,7 @@ the following:
- image.OME.ZARR/ # The ome zarr image, which is a directory in Windows/Linux
- 0/ # The original image, in ZARR format
+ 0/ # Slice of image at first axis Z=0
+ 1/
...
.zarray # .zarray is meta attribute for the ZARR image
@@ -73,11 +63,11 @@ the following:
Above, + denotes a collapsed folder and - denotes an expanded folder. A few things to note here:

- An image does not have to end with the .OME.ZARR suffix
- The image is not multiscaled if the maximum downsample level is 0 instead of 4, in which case there
will only be one 0/ folder
- An ome zarr image is easily confused with a ZARR image. An ome zarr image
is not a standard ZARR directory and contains no **.zarray** meta file. Loading an ome zarr
image as ZARR will crash if you forget to specify the **0/** subfolder as the path to load (see the sketch after this list)
- When saved as a zip file instead of a directory, the directory structure is the same except that
the root is zipped. When loading a zipped ome zarr, cvpl_tools uses :code:`ZipStore`'s features to
directly read individual chunks without having to unpack the whole archive
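As a minimal sketch of the point about the **0/** subfolder (the path is illustrative), reading the
highest-resolution level with the :code:`zarr` library directly:

.. code-block:: Python

    import zarr

    # This would fail: the ome zarr root is a group and contains no .zarray meta file
    # arr = zarr.open_array('/path/to/image.OME.ZARR', mode='r')

    # This works: downsample level 0 is a standard ZARR array
    arr = zarr.open_array('/path/to/image.OME.ZARR/0', mode='r')
    print(arr.shape, arr.dtype)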
@@ -105,10 +95,9 @@ Specifying slices in path
:code:`cvpl_tools` allows specifying the channel, or x/y/z slices to use in the path string when
reading or viewing an ome zarr file for convenience.

The functions :code:`cvpl_tools.ome_zarr.io.load_dask_array_from_path` and
:code:`cvpl_tools.ome_zarr.napari.add.group_from_path/subarray_from_path`
support specifying the slices in the following syntax, similar to torch or numpy array slicing:

.. code-block:: Python
32 changes: 16 additions & 16 deletions docs/GettingStarted/result_caching.rst
@@ -9,9 +9,9 @@ In many cases it's useful to cache some of the intermediate results instead of d
all at once. Think of the following cases where you may have encountered when writing a long-running image processing
workflow:

1. The cell density for each region in the scan is computed, but the number does not match up with what's expected,
so you want to debug by displaying a heatmap in a graphical viewer showing cell density. However,
the intermediate result is lost so you need to recompute the density in order to draw the heatmap.

2. Some error occurs and you need to find out why a step in the computation causes the issue, but it's rather
difficult to understand what went wrong without displaying some intermediate results to aid debugging.
@@ -21,7 +21,7 @@ issues, but requires saving all the results onto disk and chunked in a viewer-fr

In all cases above, caching all the intermediate results helps reduce headaches and the risk of unknown errors coming
from the difficulty of debugging in an image processing and distributed computing environment. The basic strategy
is to cache all the results inside a directory tree. Each step saves all its
intermediate and final results onto a node in the tree. The node's children are directories saved by its
sub-steps.

@@ -30,27 +30,24 @@ Here, the outputs of a processing step (function) may contain intermediate image

We describe the CacheDirectory interface in detail below.

cache directory
***************
Every cache directory tree starts with a root directory. In order to create a cache directory tree you
need a url to the root directory location:

.. code-block:: Python

    import cvpl_tools.tools.fs as tlfs

    loc = f'path/to/root'  # a remote url will work as well
    query = tlfs.cdir_init(loc)
    # Now a directory is created at the given path, so you can start writing cache files to it
    # ...
    tlfs.cidr_commit(loc)
This creates a directory 'path/to/root' on the first run.
The next time the program is run, it will not create new folders but return a query object with
:code:`query.commit=True`. A cache directory can be a child directory of a cache root directory
or other cache directories.

.. code-block:: Python
@@ -63,6 +60,9 @@ A cache directory can be a child directory of a cache root directory or other ca
print('subdirectory is already created')
else:
print('subdirectory created successfully')
# put your code here that writes to the subdir...
tlfs.cidr_commit(f'{loc}/subdir')
tlfs.cidr_commit(loc)
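Putting the pieces together, a sketch of the typical skip-if-cached pattern (here :code:`expensive_step` and
:code:`save_result` are hypothetical placeholders for your own computation and serialization code):

.. code-block:: Python

    subdir = f'{loc}/expensive_step'
    query = tlfs.cdir_init(subdir)
    if not query.commit:
        # only recompute when the cache was not committed by a previous run
        result = expensive_step()
        save_result(subdir, result)
        tlfs.cidr_commit(subdir)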
Tips
****
47 changes: 24 additions & 23 deletions docs/GettingStarted/segmentation_pipeline.rst
@@ -3,26 +3,24 @@
Segmentation Pipeline
#####################

Motivation
**********

In our use case, lightsheet microscopy of mouse brain produces several hundreds of GBs of
data, represented as 3-dimensional arrays with an extra channel axis. This array is stored as an OME
ZARR file. Distributed computation is necessary to make the analysis time
tractable, and we choose to use Dask as the distributed computing library for our project.

As part of our research, we need to use automated methods to count objects in
the image. Afterwards, we use a map from pixel locations of
the image to a brain region segmentation (this is the **atlas map**) to obtain the density of cells in each region of
the mouse brain. We need to test several methods and find one that would give the most
accurate counts, and for new incoming data volumes, we want to be able to quickly find a set of parameters
that works on the new data.

On the algorithm side, object counting is easy on a Numpy array with a small dataset size.
On larger datasets, we need long-running distributed computation
that is hard to debug and requires hours to run.

SegProcess
**********
@@ -43,10 +41,10 @@ Consider a function that counts the number of cells in a 3d-block of brightness
cell_cnt = count_inst(inst) # count each contour as a cell
return cell_cnt
This code is complete, but notice that:

1. Lack of interpretability. Often some bug shows up on the first run.
Debugging this becomes a problem without some way to look into intermediate results, since we don't know
whether one of the three steps in the cell_count function did not work as expected, or whether the algorithm does
not work well on the input data for some reason. In either case, if we want to find the root cause of
the problem we very often end up adding code and rerunning the pipeline to display the output of each step
@@ -55,10 +53,11 @@ to see if they match with our expectations.
2. Result is not cached. Ideally the pipeline is run once and we get the result, but more often than
not the result may need to be used in different places (visualization, analysis etc.). Caching these
results makes sure computation is done only once, which is necessary when we work with costly algorithms
on hundreds of GBs of data (of course it's still best to first test on a slice of <1GB of data).

The basic idea to address 1) is to put visualization as part of the cell_count function, and to address
2) is to cache the result of each step into a file in a :code:`CacheDirectory`. In more detail, we
want an image processing pipeline that provides:

1. dask-support. Inputs are expected to be numpy arrays, dask arrays, or
:code:`cvpl.im.ndblock.NDBlock` objects. In particular, dask.Array and NDBlock are suitable for
@@ -111,14 +110,16 @@ which takes arbitrary inputs and one parameter: :code:`context_args`, which will
provided, then the image will be cached via dask's persist() and its loaded copy will be returned

- storage_option (dict, optional): If provided, specifies the compression method to use for image chunks

- preferred_chunksize (tuple, optional): Re-chunk before save; this rechunking will be undone in load
- multiscale (int, optional): Specifies the number of downsampling levels on OME ZARR
- compressor (numcodecs.abc.Codec, optional): Compressor used to compress the chunks

- viewer_args (dict, optional): If provided, an image will be displayed as a new layer in Napari viewer

- viewer (napari.Viewer, optional): Only display if a viewer is provided
- is_label (bool, optional): defaults to False; if True, use viewer's add_labels() instead of
add_image() to display the array

- layer_args (dict, optional): If provided, used along with viewer_args to specify add_image() kwargs
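For illustration, a sketch of assembling a :code:`context_args` dictionary from the options listed above (only keys
visible in this excerpt are shown; :code:`process` and :code:`im` stand in for your own process function and input image):

.. code-block:: Python

    import napari

    viewer = napari.Viewer(ndisplay=2)
    context_args = dict(
        viewer_args=dict(
            viewer=viewer,  # only display if a viewer is provided
            is_label=True,  # use add_labels() so each object gets a distinct color
        ),
        layer_args=dict(name='cell_instance_mask'),
    )
    result = process(im, context_args=context_args)  # hypothetical call into your pipeline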

@@ -139,7 +140,7 @@ which takes arbitrary inputs and one parameter: :code:`context_args`, which will
- The :code:`viewer_args` parameter specifies the napari viewer to display the intermediate results. If not provided
(:code:`viewer_args=None`), then no computation will be done to visualize the image. Within the forward() method, you
should use :code:`viewer.add_labels()` or :code:`tlfs.cache_im()`
while passing in the :code:`viewer_args` argument to display your results:

.. code-block:: Python
@@ -174,9 +175,9 @@ Running the Pipeline
********************

See `Setting Up the Script <GettingStarted/setting_up_the_script>`_ to understand boilerplate code used below,
required to understand the following example.

With a :code:`process` function defined, the next step is to write our script that uses the pipeline
to segment an input dataset. Note we need a dask cluster and a temporary directory setup before running the
:code:`forward()` method.

@@ -202,8 +203,8 @@ to segment an input dataset. Note we need a dask cluster and a temporary directo
client.close()
viewer.show(block=True)
If instead :code:`viewer_args=None` is passed, the :code:`example_seg_process()` function will process
the image and cache it, but display nothing.
- A process function has signature :code:`process(arg1, ..., argn, context_args)`, where
arg1 to n are arbitrary arguments and :code:`context_args` is a dictionary
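A sketch of a function following this signature convention (the body and names are illustrative, not the library's
own example):

.. code-block:: Python

    def my_process(im, threshold, context_args):
        """Threshold an image and display the mask if a viewer is given in context_args."""
        viewer_args = context_args.get('viewer_args') or {}
        mask = im > threshold
        viewer = viewer_args.get('viewer')
        if viewer is not None:
            viewer.add_labels(mask.astype(int), name='mask')
        return mask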