Default pipeline
Refer to the pipeline module for a general overview of the pipeline concept (involving different stages, inputs, and outputs).
Theory
This is an overview of the fundamental concepts described in Kostrykin and Rohr (TPAMI 2023).
Deformable shape models
Let \(\omega \subseteq \Omega\) be any image region, that is, a non-empty subset of the image points \(\Omega\) in an arbitrary but fixed order \(\omega = \left\{ x_1, \dots, x_{\#\omega} \right\}\). Then, a deformable shape model within this image region is defined as the zero-level set of the deformable surface
where
\(f_x\) is a second-order polynomial basis function expansion of the image point \(x\), and \(G_\omega\) is a block Toeplitz matrix where each row corresponds to a Gaussian function with standard deviation \(\sigma_G\) centered at the image points \(x_1, \dots, x_{\#\omega}\). The vectors \(\theta\) and \(\xi\) are the polynomial parameters and the deformation parameters, respectively. See Section 2.1 of the paper for more details.
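The display equation was lost in this rendering. From the symbol definitions above, the deformable surface presumably has the form

\[ s_\omega(x \mid \theta, \xi) = f_x^\top \theta + \left( G_\omega \xi \right)_x, \quad x \in \omega, \]

so that the shape model corresponds to \(\left\{ x \in \omega : s_\omega(x \mid \theta, \xi) = 0 \right\}\). Note that this is a reconstruction from the surrounding text, not a verbatim quote of the paper.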
Convex energy minimization
Model fitting within any image region \(\omega\) is performed by minimization of the convex energy function
where \(\ell(\theta, \xi)\) is a convex loss function defined by
and \(\alpha\) is a regularization parameter which governs the strength of the regularization of the deformations. This is implemented in the superdsm.dsm module. See Section 2.2 of the paper for more details.
The vector \(Y_\omega\) corresponds to the image intensities, shifted by the intensity offsets \(\tau_{x^{(1)}}, \dots, \tau_{x^{(\#\omega)}}\). These offsets are chosen so that they roughly separate image foreground and image background, in the sense that image foreground tends to correspond to positive components of the vector,
whereas image background tends to correspond to negative components. The computation of the intensity offsets is based on the Gaussian filter \(\mathcal G_\sigma\) and described in Supplemental Material 1 of the paper.
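As a rough illustration (not the exact computation from Supplemental Material 1), offset intensities with this foreground/background behavior can be obtained by subtracting a Gaussian-smoothed background estimate from the raw intensities:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def offset_intensities(g_raw, sigma):
    """Offset the intensities so that foreground tends to be positive.

    Subtracts a Gaussian-smoothed background estimate from the raw
    intensities; the actual scheme in the paper may differ in detail.
    """
    return g_raw - gaussian_filter(g_raw, sigma)
```

A bright spot on a dark background then yields positive components near the spot, whereas a constant image yields offsets close to zero everywhere.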
Coarse-to-fine region analysis
Let \(U\) be a universe of atomic image regions, so that no atomic image region contains more than a single object (but any object can be split into multiple atomic regions). The atomic image regions are generated by recursively splitting image regions until certain criteria are met (the procedure is hence referred to as coarse-to-fine region analysis). Image regions are split by choosing two seed points, which correspond to local intensity peaks, and performing a seeded watershed transform of the image intensities. Details are given in Supplemental Material 5.
Splitting of image regions is performed according to the normalized energy
see the C2F_RegionAnalysis stage for details.
Two atomic image regions \(u, v \in U\) are called adjacent if and only if there exists a path \(\pi \subset \Omega\) between \(u\) and \(v\) such that \(Y_\omega|_{\omega=\pi} > 0\). Let \(\Pi \subseteq U \times U\) be the set of all connected atomic image regions, i.e. \((u,v) \in \Pi\) if and only if the adjacency graph \(\mathcal G = (U, \mathcal E)\), whose edges \(\mathcal E\) connect adjacent atomic image regions, contains a path between \(u\) and \(v\). Details are given in Section 2.3.1 of the paper.
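The relation \(\Pi\) is thus the transitive closure of the adjacency relation. A minimal sketch, assuming the adjacency graph is given as a plain dictionary of neighbor lists (a stand-in for the AtomAdjacencyGraph type, not its actual API):

```python
from collections import deque

def connected(adjacency, u, v):
    """Return True iff (u, v) is in Pi, i.e. the adjacency graph
    contains a path between the atomic image regions u and v."""
    seen, queue = {u}, deque([u])
    while queue:
        w = queue.popleft()
        if w == v:
            return True
        for n in adjacency.get(w, ()):
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return False
```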
Joint segmentation and cluster splitting
Global energy minimization is performed by solving \(\operatorname{MSC}(\mathbb P(U))\), where
and
is an instance of the min-weight set-cover problem, and
is the set energy function. The constant term \(\beta\) governs the sparsity of the solution. It is also the maximum allowed energy difference of merging two deformable shape models (two image regions). See Section 2.3.2 of the paper for details.
Instead of solving \(\operatorname{MSC}(\mathbb P(U))\) directly, a sequence \(\mathscr U_1, \dots, \mathscr U_{\# U} \subseteq \mathbb P(U)\) is computed so that
If, however, \(c(U) \leq \beta + \sum_{u \in U} c(\{u\})\), then the closed-form solution
holds and the sequential computation is not required. Regions of possibly clustered objects are processed separately from each other, so, in fact, there are multiple disjoint universes of atomic image regions per image. Thus, the closed-form solution corresponds to cases of non-clustered objects. See Sections 2.3.3, 3.1, and 3.3 of the paper for details.
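The shortcut condition can be sketched as follows, assuming the set energies \(c\) are available as a dictionary keyed by frozen sets of atomic regions (a simplified stand-in for the actual implementation):

```python
def use_closed_form(c, U, beta):
    """Return True iff c(U) <= beta + sum of c({u}) over all u in U,
    i.e. the universe U can be treated as a single non-clustered object
    and the sequential computation is not required."""
    return c[frozenset(U)] <= beta + sum(c[frozenset({u})] for u in U)
```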
Pipeline stages
The function pipeline.create_default_pipeline() employs the following stages:
Preprocessing
— Implements the computation of the intensity offsets.
DSM_Config
— Provides the hyperparameters from the dsm namespace as an output.
C2F_RegionAnalysis
— Implements the coarse-to-fine region analysis scheme.
GlobalEnergyMinimization
— Implements the global energy minimization.
Postprocessing
— Discards spurious objects and refines the segmentation masks.
Inputs and outputs
Pipeline stages require different inputs and produce different outputs. These are intermediate results which are shared or passed between the stages. The pipeline maintains their state, which is kept inside the pipeline data object. Below is an overview of all inputs and outputs available within the default pipeline:
g_raw
The raw image intensities \(g_{x^{(1)}}, \dots, g_{x^{(\#\Omega)}}\), normalized so that the intensities range from 0 to 1. Up to the normalization, this corresponds to the original input image, unless histological image data is being processed (i.e. the hyperparameter histological is set to True). Provided by the pipeline via the init() method; refer to its documentation for details.
g_rgb
The original image, if histological image data is being processed (i.e. the hyperparameter histological is set to True). Otherwise, g_rgb is not available as an input. Provided by the pipeline via the init() method; refer to its documentation for details.
y
The offset image intensities \(Y_\omega|_{\omega = \Omega}\), represented as an object of type numpy.ndarray of the same shape as the g_raw image. Provided by the Preprocessing stage.
dsm_cfg
A dictionary corresponding to the hyperparameters which reside in the dsm namespace. Provided by the DSM_Config stage.
y_mask
Binary image corresponding to a mask of “empty” image regions (False), which are discarded from consideration, and regions which possibly contain objects and are considered for segmentation (True). This is described in Section 3.1 of the paper. Provided by the C2F_RegionAnalysis stage.
atoms
Integer-valued image representing the universe of atomic image regions. Each atomic image region has a unique label, which is the integer value. Provided by the C2F_RegionAnalysis stage.
adjacencies
The adjacency graph \(\mathcal G\), represented as an object of the type AtomAdjacencyGraph. Provided by the C2F_RegionAnalysis stage.
seeds
The seed points which were used to determine the atomic image regions, represented by a list of tuples of coordinates. Provided by the C2F_RegionAnalysis stage.
clusters
Integer-valued image representing the regions of possibly clustered objects. Each region has a unique label, which is the integer value. Provided by the C2F_RegionAnalysis stage.
y_img
An Image object corresponding to a joint representation of the offset image intensities y and the mask y_mask. Provided by the GlobalEnergyMinimization stage.
cover
A MinSetCover object corresponding to \(\operatorname{MSC}(\mathscr U_{\# U})\). The optimal family \(\mathscr X \subseteq \mathbb P(U)\) is accessible via its solution property. Provided by the GlobalEnergyMinimization stage.
objects
List of all computed objects \(\mathscr U_{\# U}\), each represented by the Object class. Provided by the GlobalEnergyMinimization stage.
performance
An object of the PerformanceReport class which carries values indicating the performance of the algorithms used by the GlobalEnergyMinimization stage. Provided by the GlobalEnergyMinimization stage.
postprocessed_objects
List of post-processed objects, each represented by the PostprocessedObject class. Provided by the Postprocessing stage.
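The sharing mechanism can be sketched as follows. This is a simplified illustration of how stages read from and write to one mutable data object, not the actual SuperDSM classes:

```python
# Minimal sketch of the pipeline data-object mechanism (illustrative only).
class Stage:
    inputs  = ()   # keys this stage reads from the data object
    outputs = ()   # keys this stage writes to the data object

    def process(self, data):
        raise NotImplementedError

class Preprocessing(Stage):
    outputs = ('y',)

    def process(self, data):
        # Placeholder offsets; the real computation is described above.
        data['y'] = [g - 0.5 for g in data['g_raw']]

def run_pipeline(stages, data):
    for stage in stages:
        missing = [key for key in stage.inputs if key not in data]
        assert not missing, f'missing inputs: {missing}'
        stage.process(data)
    return data
```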
Batch system
Task specification
To perform batch processing of a dataset, you first need to create a task. To do that, create an empty directory and put a task.json file in it. This file will contain the specification of the segmentation task. Below is an example specification:
{
"runnable": true,
"num_cpus": 16,
"environ": {
"MKL_NUM_THREADS": 2,
"OPENBLAS_NUM_THREADS": 2
},
"img_pathpattern": "/data/dataset/img-%d.tiff",
"seg_pathpattern": "seg/dna-%d.png",
"adj_pathpattern": "adj/dna-%d.png",
"log_pathpattern": "log/dna-%d",
"cfg_pathpattern": "cfg/dna-%d.json",
"overlay_pathpattern": "overlays/dna-%d.png",
"file_ids": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
"config": {
}
}
The meaning of the different fields is as follows:
runnable
Marks this task as runnable (or not runnable). If set to false, the specification will be treated as a template for derived tasks. Derived tasks are placed in sub-folders and inherit the specification of the parent task. This is useful, for example, if you want to try out different hyperparameters. The batch system automatically picks up intermediate results of parent tasks to speed up the completion of derived tasks.
num_cpus
The number of processes to be used simultaneously (in parallel).
environ
Defines environment variables which are to be set. In the example above, the MKL and OpenBLAS numpy backends are both instructed to use two threads for parallel computations.
img_pathpattern
Defines the path to the input images of the dataset, using placeholders like %d for decimals and %s for strings (decimals can also be padded with zeros to a fixed length using, e.g., %02d for a length of 2).
seg_pathpattern
Relative path of the files which the segmentation masks are to be written to, using placeholders as described above.
adj_pathpattern
Relative path of the files which the images of the atomic image regions and adjacency graphs are to be written to, using placeholders as described above (see Coarse-to-fine region analysis).
log_pathpattern
Relative path of the files which the logs are to be written to, using placeholders as described above (mainly for debugging purposes).
cfg_pathpattern
Relative path of the files which the hyperparameters are to be written to, using placeholders as described above (mainly for reviewing the automatically generated hyperparameters).
file_ids
List of file IDs which are used to resolve the pattern-based fields described above. In the considered example, the list of input images resolves to /data/dataset/img-1.tiff, …, /data/dataset/img-10.tiff. File IDs are allowed to be strings, and they may contain / to encode paths which involve sub-directories.
last_stage
If specified, the pipeline processing ends at the specified stage.
dilate
Performs morphological dilation of all final segmentation masks, using the given number of pixels. For negative values, morphological erosion is performed instead.
merge_overlap_threshold
If specified, any pair of objects (final segmentation masks) with an overlap larger than this threshold is merged into a single object.
config
Defines the hyperparameters to be used. The available hyperparameters are described in the documentation of the respective stages of the default pipeline (see Pipeline stages). Note that namespaces must be specified as nested JSON objects.
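For example, setting a value for the regularization parameter \(\alpha\) from the dsm namespace would be nested like this (the hyperparameter name alpha is taken from the Theory section above, and the value shown is purely illustrative):

```json
"config": {
    "dsm": {
        "alpha": 0.1
    }
}
```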
Instead of specifying the hyperparameters directly in the task specification, it is also possible to include them from a separate JSON file using the base_config_path field. The path must be either absolute or relative to the task.json file. It is also possible to use {DIRNAME} as a substitute for the name of the directory in which the task.json file resides. The placeholder {ROOTDIR} in the path specification resolves to the root directory passed to the batch system (see below).
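For instance, a task specification could include a shared hyperparameter file like this (the file name shared-config.json is made up for illustration):

```json
{
    "runnable": true,
    "base_config_path": "{ROOTDIR}/shared-config.json"
}
```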
Examples can be found in the examples
sub-directory of the SuperDSM repository.
Batch processing
To perform batch processing of all tasks specified in the current working directory, including all sub-directories and so on:
python -m 'superdsm.batch' .
This will run the batch system in dry mode, so nothing will actually be processed. Instead, each task which is going to be processed will be printed, along with some additional information. To actually start the processing, re-run the command and include the --run
argument.
In this example, the current working directory will correspond to the root directory when it comes to resolving the {ROOTDIR}
placeholder in the path specification.
Note that the batch system will automatically skip tasks which have already been completed in a previous run, unless the --force argument is used. On the other hand, tasks will not be marked as completed if the --oneshot argument is used. To run only a single task from the root directory, use the --task argument, or --task-dir if you want to automatically include the derived tasks. Note that, in both cases, the tasks must be specified relative to the root directory.
Refer to python -m 'superdsm.batch' --help
for further information.