Skip to main content

Joint level-set and spatio-temporal motion detection for cell segmentation



Cell segmentation is a critical step for quantification and monitoring of cell cycle progression, cell migration, and growth control to investigate cellular immune response, embryonic development, tumorigenesis, and drug effects on live cells in time-lapse microscopy images.


In this study, we propose a joint spatio-temporal diffusion and region-based level-set optimization approach for moving cell segmentation. Moving regions are initially detected in each set of three consecutive sequence images by numerically solving a system of coupled spatio-temporal partial differential equations. In order to standardize intensities of each frame, we apply a histogram transformation approach to match the pixel intensities of each processed frame with an intensity distribution model learned from all frames of the sequence during the training stage. After the spatio-temporal diffusion stage is completed, we compute the edge map by nonparametric density estimation using Parzen kernels. This process is followed by watershed-based segmentation and moving cell detection. We use this result as an initial level-set function to evolve the cell boundaries, refine the delineation, and optimize the final segmentation result.


We applied this method to several datasets of fluorescence microscopy images with varying levels of difficulty with respect to cell density, resolution, contrast, and signal-to-noise ratio. We compared the results with those produced by Chan and Vese segmentation, a temporally linked level-set technique, and nonlinear diffusion-based segmentation. We validated all segmentation techniques against reference masks provided by the international Cell Tracking Challenge consortium. The proposed approach delineated cells with an average Dice similarity coefficient of 89 % over a variety of simulated and real fluorescent image sequences. It yielded average improvements of 11 % in segmentation accuracy compared to both strictly spatial and temporally linked Chan-Vese techniques, and 4 % compared to the nonlinear spatio-temporal diffusion method.


Despite the wide variation in cell shape, density, mitotic events, and image quality among the datasets, our proposed method produced promising segmentation results. These results indicate the efficiency and robustness of this method especially for mitotic events and low SNR imaging, enabling the application of subsequent quantification tasks.


Cell identification, quantification and characterization using imaging techniques are emerging research areas that are systematically integrated in biological and medical studies [1]. Recent developments in time-lapse microscopy enable the observation and quantification of cell-cycle progression, cell migration, and growth control [2]. The tasks of detecting and tracking individual cells or particles in a time series of images are key elements in this process. More importantly, the large volume of data produced by fluorescence microscopy and imaging modalities emphasizes the need for automated and robust techniques that can address the challenges in accurate detection and segmentation as well as tracking.

Cell tracking methodologies involve the tasks of preprocessing, cell segmentation and motion tracking [39]. In this context, segmentation of cells is a particularly challenging task that has a direct impact on the overall quantification process. Image segmentation is a popular field in the domain of image analysis. More specifically, parametric [10] and nonparametric active contour models [1114] have been widely used in development of bio-imaging and biomedical image analysis techniques. An interesting aspect in cell analysis methods is the relation between image quality and segmentation accuracy. Many segmentation methods address certain types of datasets; however, for low-quality images and different cell types and shapes, the same methods may yield varying levels of performance.

Earlier published works propose to use partial differential equation (PDE) models for heat diffusion to detect motion with applications to moving edge detection [15], and human assistive technologies [16]. Building upon previous ideas for estimating motion activity using spatio-temporal diffusion [16], in this work we develop and utilize a heat flow analogy model in the joint spatio-temporal domain and combine this process with a region-based level-set optimization approach for cell segmentation of images obtained by fluorescence microscopy. Spatial and temporal motion parameters of our model are estimated for each dataset and an optimal Parzen bandwidth parameter is experimentally determined for density estimation of edges and outliers in each dataset. High activity regions are initially detected by solving numerically a system of coupled spatio-temporal nonlinear partial differential diffusion equations on three consecutive frames. In order to obtain more stability in parameter choice, we apply a histogram transformation approach to match the reference background threshold from the intensity distribution of each three consecutive frames to an intensity distribution model learned from all frames of the sequence during the training stage. After this step, each video sequence frame is scaled and transformed into a sequence with background with same order of magnitude for a more robust and a less sensitive parameter choice method. Then spatial and temporal motion parameters of our model are estimated for each dataset and an optimal Parzen bandwidth parameter is experimentally estimated for density estimation for edges and outliers for each dataset. After the spatio-temporal diffusion stage is completed, we compute the edge map by nonparametric density estimation using Parzen kernels. This process is followed by watershed-based segmentation to detect the moving cells. Next, adjacent regions with motion are merged to form a moving cell by mean intensity thresholding of these regions. Thresholding determines the boundary of each cell. Finally, the moving delineation curve is used as an initial level-set to be refined using a region-based process for final segmentation. We validated the joint approach denoted by ST-Diff-TCV over a set of sequences against reference data and compared the segmentation accuracy of the joint spatio-temporal and level-set technique with results derived from Chan-Vese (CV) segmentation [17], a temporally linked level set method that we have recently presented [18] denoted by TCV, and spatio-temporal diffusion based segmentation only (ST-Diff). Our method can accurately detect fluorescent cells at an average Dice coefficient rate of 89 % showing a clear improvement over region-based level set segmentation with and without temporal linking, and nonlinear diffusion-based segmentation. In addition, it can detect and segment newly appearing cells. Another feature of this method is it can detect cells hardly detectable by means of mean intensity and produces accurate results for high or low cell density images. This method allows to detect cells that were impossible to detect using the region based CV segmentation because the optimization criterion is defined by the mean intensity inside and outside the level set defined moving curve. Hence, cell regions with low intensity values are considered as part of the background, and the region competition process fails to delineate these cells. However, these regions are detected by the spatio-temporal motion detection method because they are rather detected by their high activity process than by their intensity value, then refined by CV model to detect the cell boundaries more accurately.

The structure of this paper is organized as follows: In Section ‘Methods’ we introduce the frame intensity standardization approach by histogram transformation, the spatio-temporal diffusion-based technique, followed by the detection of spatio-temporal discontinuities by Parzen density estimation, cell delineation and identification from background, followed by the region-based level-set model with temporal linking, and finally the joint spatio-temporal diffusion and temporal level-set region-based method. Section ‘Results and discussion’ describes the dataset properties, image quality assessment, performance validation with comparisons between CV, TCV, ST-Diff, and ST-Diff-TCV, followed by discussion of results. Finally, in the conclusion we summarize the main observations about the advantages of this method and perspectives for future work.


Frame intensity standardization by histogram transformation

PDE-based techniques calculate differential approximations; therefore they are sensitive to variations in pixel intensity ranges. The main objective of this stage is first to reduce intensity variations between frames of each sequence, and second, to obtain a robust intensity prior for the cell delineation process. We apply a histogram transformation approach to match the intensity distribution of each three consecutive frames defined by (1) to an intensity distribution model learned from all frames of the sequence during the training stage.

$$ P_{3F}(I) = {\lim}_{N_{total} \to \infty} \frac{N(I)}{N_{total}}, \quad F_{3F}(k) = {\int_{0}^{k}} P_{3F}(I) dI $$

The general idea is to transform the frame intensities so that the reference cell/background threshold I ref determined from the F AF as expressed in (2) matches the global CDF reference F AF (I ref ) corresponding to the I ref value (2) indicating the tail of background intensity distribution of the complete sequence as displayed in Fig. 1 (top).

$$ P_{AF}(I) = {\lim}_{N_{total} \to \infty} \frac{N(I)}{N_{total}}, \quad F_{AF}(k) = {\int_{0}^{k}} P_{AF}(I) dI $$
Fig. 1
figure 1

(Training stage) Probability density function of 48 frames of C2DL −MSC02 dataset and cumulative distribution function. (Testing stage) Normalized PDF and CDF of three consecutive frames of C2DL −MSC02 dataset

We aim to find a transformation so that the output image is a similar image that has a background value with the same order of brightness of the input image. Figure 1 (bottom) displays how we can determine the I test value from the PDF of each three consecutive frames at the training stage using the prior value F AF (I ref ) as expressed by (3).

$$ I_{test} = arg \min_{i} \vert F_{3F}(I(\omega)) - F_{AF}(I_{ref}) \vert, $$

where ωΩ 3F , \(I : \mathbb {Z}^{2} \rightarrow \mathbb {R}^{+}\). Using these values, the resulting images are scaled and defined over the [ 0−255] range and with respect to the global minimum and global maximum intensities of all the frames of the dataset sequence after applying Eqs. 4, (5) and (6).

$$ T_{1}(I) = \frac{I_{ref} - G_{Min}}{I_{test} - L_{Min}} (I - G_{Min}) + G_{Min} $$
$$ T_{2}(I) = \frac{255 \cdot (I - G_{Min})}{G_{Max} - G_{Min}} $$
$$ I_{S}(\omega) = T(I(\omega)) = (T_{2} \circ T_{1}) (I(\omega)), \quad \omega \in \Omega_{3F} $$

We experimentally found that the matched frames are less sensitive to the temporal, spatial diffusion parameters and Parzen kernel bandwidth values than the raw frames. In our experiments we used 256 bins for all datasets.

The following steps define the two algorithms that learn the CDF reference value for background intensity F AF (I ref ) at the training stage (Algorithm 1) and transform every source image at the testing stage (Algorithm 2) so as to make its testing background as close as possible to the reference intensity.

Figure 2 displays an intensity standardization example applied on three consecutive frames. The top row displays the histogram of the complete dataset and transformations T 1 and T 2 given by (4) and (5) respectively. The bottom row shows the histogram of 3 frames used to determine I test , the original histogram of currently processed frame and transformed histogram after applying (6).

Fig. 2
figure 2

Histogram of all concatenated images of the C2DL-MSC02 dataset, and linear transformations and scaling of each pixel of the image (top row, left to right). The histogram of three consecutive frames that will be matched to the training dataset, and the histograms of the current frame before and after the transformation T(I(ω))=(T 2T 1)(I(ω)), ωΩ 3F (bottom row, left to right)

We note that after the histogram transformation and scaling, all frames of the same sequence are going to have similar pixel intensity ranges.

Spatio-temporal diffusion

Perona-Malik anisotropic diffusion

Diffusion algorithms perform image restoration by finding numerical solutions of the heat diffusion PDE [19, 20]. In this framework, the linear diffusion model is equivalent to applying Gaussian filtering to the image. To avoid the blurring and localization problems of linear diffusion filtering, Perona and Malik [21] proposed to replace the classic isotropic diffusion equation with the nonlinear diffusion model, which is based on the following PDE:

$$ \frac{\partial I }{\partial s} = div \left[ g \left({\vert \nabla I(x,y,s)} \vert \right) \cdot \nabla I(x,y,s) \right] $$

where I is the image intensity, s the scale variable for 2D case, and g(·) a function that determines the amount of diffusion, also known as diffusivity function. This function is chosen to satisfy \({\lim }_{x \to \infty } g(x) \to 0\) so that diffusion is attenuated across edges. This function controls the amount of diffusion according to the edgestrength. Common options for g(·) are the sigmoid and exponential functions also reported by Perona and Malik in [21]:

$$ g(x) = \frac{1}{\left(1 + \frac{x^{2}}{k^{2}} \right)} $$
$$ g(x) = e^{\left(- \frac{x^{2}}{k^{2}} \right)} $$

where k denotes the conductance parameter that is a positive constant. The anisotropic diffusion method has been extensively used for image restoration as it largely preserves edge image features.

Moving regions are initially detected in each three consecutive frames by numerically solving the spatio-temporal partial-differential diffusion equation [18] where the diffusivity function is applied to the gradient magnitude of the image I. In this work we used the function (9) that is more suitable for region oriented applications [22]. This nonlinear diffusion is bound to the gradient magnitude [23]. It applies more diffusion in uniform regions and slows down at edges, therefore preserves high contrast edges over low contrast ones.

Spatio-temporal nonlinear diffusion

Partial differential equation model

Here we propose to simulate nonlinear heat flow through the processed frames in both spatial and temporal dimensions. This operation smooths-out the background regions and simultaneously preserves the spatio-temporal discontinuities corresponding to cells. More specifically, given 3 consecutive frames of the sequence at times {t−1,t,t+1}, we define a system of three coupled PDEs for each frame.

At time points τ={t−1,t,t+1}

$$ \begin{aligned} \frac{\partial I(i,j,\tau,s)}{\partial s} = & g(\vert \nabla I(i,j,\tau,s) \vert) \cdot \Delta I(i,j,\tau,s) \\ & + \nabla g(\vert \nabla I(i,j,\tau,s) \vert) \cdot \nabla I(i,j,\tau,s) \end{aligned} $$

Initial condition

$$ I(i,j,\tau,0) = I_{0}(i,j,\tau) $$

Boundary condition

$$ \frac{\partial I}{ \partial \vec{n}} = 0 \qquad \text{on} \partial \Omega \times \partial T \times (0, S). $$
Numerical solution

We used the Finite Difference method to solve the system of (10) on a 2D square grid lattice. We applied padding by replicating the pixel intensities at the image borders to obtain a zero gradient at the boundaries. This will enable the detection and localization of motion within each 3 consecutive frames.

At t

$$ {}\begin{aligned} I_{i,j,t}^{s+1} = I_{i,j,t}^{s} + \lambda_{s} &\left[ g\left(\vert \nabla I_{i+1,j,t}^{s} \vert\right) \cdot N_{t} + g\left(\vert \nabla I_{i-1,j,t}^{s} \vert\right) \cdot S_{t}\right.\\ & +\left. g\left(\vert \nabla I_{i,j+1,t}^{s} \vert\right) \cdot E_{t} + g\left(\vert \nabla I_{i,j-1,t}^{s} \vert\right) \cdot W_{t}\right]\\ & + \lambda_{t} \left[\nabla I_{i,j,t-1}^{s} \cdot PF + \nabla I_{i,j,t+1}^{s} \cdot NF\right] \end{aligned} $$

At t−1

$$ \begin{aligned} I_{i,j,t-1}^{s+1} = I_{i,j,t-1}^{s} + \lambda_{s} &\left[g \left(\vert \nabla I_{i+1,j,t-1}^{s} \vert\right) \cdot N_{t-1}\right.\\ & + g\left(\vert \nabla I_{i-1,j,t-1}^{s} \vert\right) \cdot S_{t-1}\\ & + g\left(\vert \nabla I_{i,j+1,t-1}^{s} \vert\right) \cdot E_{t-1} \\ & + \left.g\left(\vert \nabla I_{i,j-1,t-1}^{s} \vert\right) \cdot W_{t-1}\right]\\ & - 2 \lambda_{tPF} \cdot \nabla I_{i,j,t-1}^{s} \cdot PF \end{aligned} $$

At t+1

$$ \begin{aligned} I_{i,j,t+1}^{s+1} = I_{i,j,t+1}^{s} + \lambda_{s} & \left[g\left(\vert \nabla I_{i+1,j,t+1}^{s} \vert\right) \cdot N_{t+1}\right. \\ & + g\left(\vert \nabla I_{i-1,j,t+1}^{s} \vert\right) \cdot S_{t+1}\\ & + g\left(\vert \nabla I_{i,j+1,t+1}^{s} \vert\right) \cdot E_{t+1}\\ & + \left. g\left(\vert \nabla I_{i,j-1,t+1}^{s} \vert\right) \cdot W_{t+1}\right]\\ & - 2 \lambda_{tNF} \cdot \nabla I_{i,j,t+1}^{s} \cdot NF \end{aligned} $$


$$ N_{t} = I_{i-1,j,t}^{s} - I_{i,j,t}^{s}, \quad S_{t} = I_{i+1,j,t}^{s} - I_{i,j,t}^{s} $$
$$ W_{t} = I_{i,j-1,t}^{s} - I_{i,j,t}^{s}, \quad E_{t} = I_{i,j+1,t}^{s} - I_{i,j,t}^{s} $$
$$ PF = I_{i,j,t-1}^{s} - I_{i,j,t}^{s}, \quad NF = I_{i,j,t+1}^{s} - I_{i,j,t}^{s} $$

In (13), (14), and (15) λ s , λ t , λ tPF , λ tNF denote the numerical “time” steps for spatial, temporal, next frame temporal, and previous frame temporal terms respectively. In our implementation we set λ t =T S Ratio ·λ s and λ tPF =λ tNF , where T S Ratio is a fixed parameter for the ratio of temporal to spatial diffusion. The diffusivity function is applied to the gradient magnitude of the image I.

Detection of spatio-temporal discontinuities by Parzen density estimation

The idea is to estimate the likelihood of mean intensity in the neighborhood of each pixel in the diffused frame. Assuming a model of unimodal probability density function (PDF) for region interiors and bimodal PDF for edges, we use the likelihood of mean intensity as an index of edge occurrence. Low values of this index correspond to a bimodal PDF indicating an edge. We estimate this likelihood by the nonparametric technique of Parzen kernels [2426].

The Parzen density estimation belongs to the nonparametric density methods [23] i.e. methods to estimate the probability density function of a random variable that do not impose any initial assumptions about the shape of the probability density functions. Its operation is based on placing at each observation sample a probability mass and producing a potential according to a Gaussian kernel. The contributions of all the sample points are averaged to estimate the density value at every point of the image [25].

$$ f_{h} (x) = 1/(n \cdot h^{p}) \sum\limits_{i=1}^{n} K((x-x_{i})/h) $$

where (x1,x2,,x n) is an independent and identically distributed sample drawn from some distribution with an unknown density P, K(·) is the kernel and h>0 is a smoothing parameter called the bandwidth. We can see in (19) that the kernel-bandwidth h can strongly affect the PDF estimate, especially when the number of observations n is finite. Very small h values will produce a ragged density estimate, while very large values will smooth the structure of the PDF. An optimal h value is usually experimentally determined to find a compromise between the variability and accuracy and converge towards the true PDF. Figure 3 shows three density estimates: the green solid line corresponds to a small bandwidth, the black line corresponds to a large bandwidth, while the blue line represents a bandwidth selection that produces a more accurate estimate of the underlying bimodal distribution.

Fig. 3
figure 3

An example of a noisy edge detection using b nonparametric density estimation. Comparison of the Parzen density estimate for different bandwidth values of h on the same image intensity samples plotted on the horizontal axis. The optimal h value estimates the bimodality of the local intensity distribution. Use of smaller h is susceptible to statistical variability, while larger h will reduce the estimation accuracy

Cell delineation and identification

The edge map can be interpreted as a topographic surface consisting of valleys corresponding to spatio-temporally homogeneous areas and peaks denoting spatio-temporal discontinuities. The next step is to apply watershed segmentation. Watershed analysis has emerged from mathematical morphology and was implemented by a series of morphology operators in its early versions. Since then, several implementations have appeared, proposing iterative, sequential, arrowing, flow line oriented and flooding techniques [27]. Regional minima of the topographic relief are selected and flooded to form the moving regions. We obtain a watershed region in the resulting segmentation for each minimum. We first find the watershed ridges of the stochastic map of spatio-temporal discontinuities. The watershed transform divides a multivalued image into separate regions by identifying the regional minima and applying flooding operations to each minimum to fill the watershed basins. Each basin corresponds to a region. We first invert the stochastic map produced by Parzen density estimation to form regions separated by spatio-temporal discontinuities.

To separate the cells we calculate intensities and areas of watershed regions and classify them into cells or background using area and intensity prior information and likelihoods p(a r e a|c i ), p(I|c i ) in Gaussian form, where c i ={b a c k g r o u n d,c e l l}. Adjacent watershed regions with coherent motion should be merged together to form a moving object. We compute mean intensity over the watershed regions and classify into foreground or background using as threshold value the standardized reference value T(I ref ) calculated by (6).

Region-based level-set model with temporal linking

In contrast to edge based methods like classical snakes [10] and early level-set methods [11], where an edge detector is used to stop the evolving curve, region-based methods tend to be less sensitive to noise. The use of region-based statistics may prove advantageous for images characterized by edge discontinuity and higher level of noise. Chan-Vese (CV) method [17] is a region-based active contour model for energy minimization. Here, we shortly describe the theoretical background of Chan-Vese model and its minimization framework. This model is a special case of the Mumford-Shah functional [28] for segmentation using piecewise constant approximation.

This model segments an input scalar image I(x,y) with \(I:\Omega \to \mathbb {R}\) and \((x,y) \in \Omega \subset \mathbb {R}^{2}\) into two disconnected regions Ω 1 and Ω 2 representing the foreground and background respectively of low intra-region variance and separated by a smooth closed contour C such that Ω=Ω 1Ω 2C. Chan and Vese proposed to use level-set functions to solve this optimization problem. In the level-set method, the contour is represented as the zero level-set of a Lipschitz function \(\phi : \Omega \to \mathbb {R}\), where ϕ is positive inside C and negative outside C. Segmentation is obtained by minimizing the following energy functional in terms of level-set:

$$ \begin{aligned} F(\phi,c_{1},c_{2}) = & \mu \cdot length\{\phi=0\} + v \cdot area \{ \phi \geq 0 \}\\ & + \lambda_{1} \int_{\phi \geq 0} {\vert I - c_{1} \vert}^{2} dx dy\\ & + \lambda_{2} \int_{\phi < 0} {\vert I - c_{2} \vert}^{2} dx dy \end{aligned} $$

where C is the evolving curve, c 1 and c 2 are the average intensity levels inside and outside the contour C, and μ,ν,λ 1,λ 2≥0 are energy weights. The length and area of C are regularizing terms are formulated using the Heaviside H and Dirac δ functions. In [17] the Euler-Lagrange equations and the gradient-descent method were used to derive the following evolution equation for the level-set function ϕ that minimizes the fitting energy using time to parametrize the gradient descent:

$$\begin{array}{*{20}l} {}\frac{\partial \phi (t,x,y)}{\partial t} = & \delta(\phi(x,y)) \cdot \left[ \mu \cdot div \left(\frac{\nabla \phi(x,y)}{\vert \nabla \phi(x,y) \vert} \right) - v \right. \\ & -\left. \lambda_{1} {(I-c_{1})}^{2} + \lambda_{2} {(I-c_{2})}^{2}{\vphantom{\sum\limits{1}}} \right] \in (0, \infty) \times \Omega \end{array} $$

with initial and Neumann boundary conditions

$$ \phi(0,x,y) = \phi_{0}(x,y) \in \Omega $$
$$ \frac{\delta(\phi)}{\vert \nabla \phi \vert} \cdot \frac{\partial \phi}{\partial \vec{n}} = 0 \in \partial \Omega $$

Temporally linked level-set segmentation

This approach makes use of temporal connection between consecutive level-set results [17]. That is, when segmenting an image, which is a part of a temporal sequence, we make use of the level-set results reached from minimization of the global energy associated with the contours of the segmented cells found in the previous time point

$$ \phi_{n+1} (x,y;0) = \phi_{n} (x,y; i_{final}), \forall (x,y) \in \Omega, $$

where n is the frame number in the time-lapse sequence, and i final is the number of iterations required to converge for frame n. We take the contour result of each frame as the initial contour for the following one. These results are utilized to minimize the energy functional of the next image. If the segmentation in frame n is accurate, then this initialization will correspond to a point close to the global optimum of the energy functional in frame n+1. The main steps of this technique are summarized in Algorithm 3.

Joint spatio-temporal diffusion and temporally linked level-set approach (ST-Diff-TCV)

We propose a joint method combining the S-T differential information with the high delineation accuracy that characterizes level set-based segmentation [15, 17]. More specifically, we use S-T Diffusion to delineate the cells first, then initialize TCV with the S-T Diffusion result to refine the cell segmentation. We apply the S-T Diffusion technique on each modulo k frame to address cell events that may not be handled by TCV such as cell mitosis, cell division, new cells entering the field of view, and other cases.

This strategy may also reduce the computational cost by applying the S-T Diffusion technique to a limited number of frames. We apply these methods on several datasets of fluorescence microscopy images with varying levels of difficulty with respect to cell density, resolution, contrast, and signal-to-noise ratio. The flowchart in Fig. 4 outlines the main stages of our proposed technique. Furthermore, in Fig. 5 we display intermediate results from each stage on a test frame and its temporal neighbors for the C2DL-MSC02 and N2DH-SIM04 sequences.

Fig. 4
figure 4

Outline of the proposed joint spatio-temporal nonlinear diffusion algorithm and temporally linked level sets methods (ST-Diff-TCV)

Fig. 5
figure 5

Intermediate results produced by ST-Diff-TCV on sample frames of a C2DL-MSC02 and b N2DH-SIM04 data sequences. First row: center, previous and next frames in the temporal space (left to right) Second row: S-T diffused frame, kernel density estimation of edge-moving regions then the inverted probability density map. Third row: watershed result, cell identification after foreground/background separation, and the reference segmentation mask (left to right)

Results and discussion

Data description

The datasets consist of 2D fluorescent microscope time-lapse image sequences. We used 12 time-lapse video sequences; 6 real microscopy time-lapse sequences and 6 computer simulated videos with various cell densities and noise levels. We obtained the training and challenge data sets from the cell tracking challenge website [29]. Simulated videos: The 6 simulated videos displayed fluorescently labeled nuclei of the HL60 (human promyelocytic leukemia) cell line migrating on a flat 2D surface (N2DH-SIM01, N2DH-SIM02, N2DH-SIM03, N2DH-SIM04, N2DH-SIM05, N2DH-SIM06). They differ in the level of noise, cell density of the initial population, the number of cells leaving and entering the field of view and the number of simulated mitotic events, yielding up to 70 cells in the field of view [29]. Real videos: We used 3 datasets each containing 2 time-lapse sequences. Two video sequences named Fluo-C2DL-MSC01 and Fluo-C2DL-MSC02 with rat mesenchymal stem cells, 2 video sequences named N2DH-GOWT101 and N2DH-GOWT102 of mouse embryonic stem cells and N2DL-HeLa01 and N2DL-HeLa02 expressing HeLa cells. These datasets are considered to have high level of difficulty [29] because of the high cell density and low resolution and intensity. Summarized information on our test data is listed in Table 1, including the image matrix size, number of frames and level of difficulty. Furthermore, a sample frame of each dataset is displayed in Fig. 6.

Fig. 6
figure 6

The 6 real and 6 simulated time-lapse sequences used for validation

Table 1 Image sequence properties and quality using Signal-to-Noise Ratio (SNR) and Contrast-to-Noise Ratio (CNR)

Image quality assessment of the datasets

In our first experiment, we measured the image quality of our datasets and then evaluated the segmentation accuracy. We utilized the available reference data for this purpose. The reference data consist of manually annotated videos for segmentation and tracking along with a short description and links to the raw datasets obtained from [29]. We first used the reference data to estimate the average Signal-to-Noise Ratio (SNR) and Contrast-to-Noise Ratio (CNR) of each dataset. The SNR and CNR measures are defined as follows:

$$ SNR = 20 \log_{10} \frac{\bar{u}_{C}}{\bar{u}_{B}} $$
$$ CNR = \frac{\vert \bar{u}_{C} - \bar{u}_{B} \vert}{\sigma_{B}} $$

where \(\bar {u}_{C}\) is the average image intensity over the cell regions, \(\bar {u}_{B}\) is the average intensity over the background and σ B is the standard deviation of the background pixels. In Table 1 we list the average SNR (in dB) and average CNR that are means over all frames in each sequence using (25) and (26) and corresponding standard deviations of each dataset over cell regions. A comparison between the qualitative level of difficulty and the image quality metrics in Table 1 shows that the simulated sequences have higher SNR and CNR, therefore being more amenable to segmentation than the real sequences.

Comparison of CV, TCV, ST-Diff, and ST-Diff-TCV methods

We applied the standard CV, TCV, ST-Diff, and ST-Diff-TCV methods on 12 time-lapse fluorescent microscopy datasets listed in Table 1. Fluorescent microscopy imaging is often times subjected to a mixture of different types of noise. The main goal of a preprocessing step is to reduce the corruption caused by noise and to improve the image quality [28]. To facilitate data analysis, a combination of filters and histogram enhancement is applied to the datasets to obtain better delineation accuracy.

We segmented each dataset using each method and evaluated the segmentation performance against reference masks. The main purpose is to evaluate how well the segmented cells match the cell regions of the reference mask. We quantify the accuracy of the segmentation performance by computing the DICE similarity coefficient denoted by DSC. This is defined as:

$$ DSC = 2 \times \frac{\vert R_{S} \cap R_{Ref} \vert}{\vert R_{S} \vert + \vert R_{Ref} \vert} \in\ [\!0,1], $$

where R Ref is the set of all pixels that belong to cell regions in the reference image, R S is the set of all binary regions delineated by the tested segmentation technique. The DICE coefficient measures the relative similarity between two binary images over their cardinalities. It is frequently used for image segmentation validation. The value of 1 indicates perfect matching.

We computed the DICE coefficient between the automated and reference segmentations for each method and for each dataset. Further, we computed the means and the standard deviations of the DICE similarity coefficients over all frames for each dataset sequence. Figure 7 and Table 2 report the DSC estimates and their variations for each sequence. In addition, the last row in Table 2 lists the overall DSC values for all datasets. In Fig. 7 and Table 2 we observe that ST-Diff-TCV yields higher DSC values for 11 out of the 12 test sequences. ST-Diff-TCV yields an average Dice coefficient of 0.89 over all datasets, while both CV and TCV yield 0.78, and ST-Diff yields 0.85 (Table 2). Furthermore, the standard deviation values in Table 2 show more robustness and stability. That is, the standard deviations obtained from ST-Diff and ST-Diff-TCV (0.01-0.03) are significantly smaller than those derived from the CV method (0.01-0.4) and even TCV (0.02-0.08) indicating better convergence and stability.

Fig. 7
figure 7

Dice similarity coefficients (DSC) produced by standard Chan −Vese model (CV), temporally linked Chan −Vese technique (TCV), spatio −temporal diffusion (ST −Diff), and the joint ST −Diff−TCV methods over all 12 datasets

Table 2 The mean DICE coefficient obtained from segmentation of each sequence by CV, TCV, ST-Diff, and the joint ST-Diff-TCV method

To illustrate the performance comparison among the three tested methods in more detail, we show in Fig. 8 the results derived from CV, TCV, ST-Diff, and ST-Diff-TCV methods on N2DH-SIM02 and N2DH-SIM04 datasets. In the N2DH-SIM02 sequence (Fig. 8(a)) we observe that because of the non-convexity of the energy functional (allowing therefore many local minima), the CV method reached several local minima of energy. In contrast, the TCV method led to a global minimum of the energy. ST-Diff-TCV method yields accurate delineation of the cells with fewer fluctuations in the Dice coefficient than the other methods. We note that ST-Diff-TCV yields an average Dice coefficient of 0.94, while CV yields 0.78, TCV yields 0.86, and ST-Diff yields 0.88. In the N2DH-SIM04 dataset as displayed in Fig. 8(b) we observe that ST-Diff-TCV produces the highest accuracy at a DSC value of 0.93, followed by ST-Diff, CV and TCV with Dice coefficients of 0.91, 0.88 and 0.86 respectively.

Fig. 8
figure 8

Dice similarity coefficients (DSC) produced by standard Chan-Vese segmentation (CV), temporally linked Chan-Vese technique (TCV), spatio-temporal diffusion (ST-Diff), and the joint ST-Diff-TCV methods for each frame of a N2DH-SIM02 and b N2DH-SIM04 datasets

Furthermore, Fig. 9 displays cell delineations represented by yellow contour maps for one frame of the sequence N2DL-Hela2 including the manual reference, and automated segmentation produced by all tested methods. This sequence has an increased level of difficulty because of the high cell density and low contrast between some cells and the background. Because CV and TCV methods use piecewise constant approximations for object and background as can be seen in (21), the low contrast cells are likely to be falsely identified as background therefore reducing DSC (CV: 0.84, TCV: 0.67). On the other hand, both ST-Diff and ST-Diff-TCV identify the spatio-temporal discontinuities and detect the cells that are missed by CT and TCV as outlined by white rectangles in Fig. 9. In the magnified local regions of the test image we note that ST-Diff-TCV yields more accurate cell separation for adjacent cells than ST-Diff.

Fig. 9
figure 9

Cell boundaries produced by the 4 tested methods on N2DL-HeLa02 sequence frame. The spatio-temporal analysis enables the identification of more moving cells than the level-set models. Furthermore, ST-Diff-TCV produces more accurate cell separation than ST-Diff (magnified regions)

In some of our datasets, the fluorescent images contain low intensity nuclei where there is absorption rather than fluorescence in local parts of the same image resulting in heterogeneous cell intensity levels. These darker regions have low intensity that can be mis-detected as background. As a result the region competition process would fail to delineate them. However, these cells are identified by the spatio-temporal motion analysis method because of their high temporal activity, which is proven to be more efficient in these cases. For example, we observed that CV was not able to detect some cells with very low intensity in both N2DL-Hela and N2DH-GOWT sequences. Conversely, those cells were very well delineated using the temporal differences between frames, i.e., by ST-Diff-TCV, thus significantly improving the segmentation accuracy from 0.72 to 0.82 for N2DL-Hela01, and from 0.68 to 0.92 for N2DH-GOWT102 leading to DSC improvements up to 24 %.

On the other hand, the proposed technique involves some motion diffusion – i.e., T S Ratio , λ t , and Parzen kernel parameters – which are experimentally determined for each sequence. ST-Diff-TCV performance exhibits moderate sensitivity to the parameter values. In this work we performed exhaustive grid search in the parameter space to identify the optimal settings. Alternate parameter optimization techniques may be required to achieve more accurate segmentation in sequences with significantly different quality levels and cell types. In summary, our experiments suggest that the joint ST-Diff-TCV method improves the segmentation accuracy compared to CV, TCV, and ST-Diff, especially when applied to simulated and real microscopy images with cells characterized by wide intensity variations and undergoing mitotic events, changes in density, and low SNR.


In this work, we introduced a local-global co-operative approach to dynamic cell segmentation. One component of this approach performs nonlinear spatio-temporal diffusion-based motion analysis, Parzen kernel-based detection of discontinuities, and watershed-based foreground-background separation. This local-based segmentation part generates a delineation that we use as the initial level-set in a region-based temporally linked level-set model. The improvement in segmentation accuracy is mainly achieved by using both the local motion and the global statistical information for segmenting cells with heterogeneous intensity levels. We evaluated the performance of our approach denoted by ST-Diff-TCV, two level-set based methods denoted by CV and TCV, and ST-Diff methods on datasets obtained from the online Cell Tracking challenge [29]. Every dataset addresses a different type of challenge for segmentation.

In comparison to CV and TCV, both ST-Diff and ST-Diff-TCV perform more robust cell segmentation, especially for cells undergoing mitosis, leaving and entering the field of view, and cells with lower mean intensity than the background intensity level. ST-Diff-TCV further improves the segmentation accuracy compared to ST-Diff by refining the cell delineation. Still, this method is dependent on some parameter retuning to optimize segmentation accuracy for different types of imaging sequences. This approach is beneficial for quantification of a wide range of types of image sequences. Future goals are to compute cell features representing cell morphology for classification and tracking.


  1. Eils R, Athale C. Computational imaging in cell biology. J Cell Biol. 2003; 161(3):477–81. doi:10.1083/jcb.200302097.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. Stephens DJ, Allan VJ. Light microscopy techniques for live cell imaging. Science. 2003; 300(5616):82–6. doi:10.1126/science.1082160.

    Article  CAS  PubMed  Google Scholar 

  3. Yang X, Li H, Zhou X. Nuclei segmentation using marker-controlled watershed, tracking using mean-shift, and kalman filter in time-lapse microscopy. Circ. Syst. I: Regular Papers IEEE Trans. 2006; 53(11):2405–14. doi:10.1109/TCSI.2006.884469.

    Google Scholar 

  4. Zhou X, Li F, Yan J, Wong STC. A novel cell segmentation method and cell phase identification using markov model. IEEE Trans Inf Technol Biomed. 2009; 13(2):152–7. doi:10.1109/TITB.2008.2007098.

    Article  PubMed  PubMed Central  Google Scholar 

  5. Li F, Zhou X, Ma J, Wong STC. Multiple nuclei tracking using integer programming for quantitative cancer cell cycle analysis. IEEE Trans Med Imaging. 2010; 29(1):96–105. doi:10.1109/TMI.2009.2027813.

    Article  PubMed  Google Scholar 

  6. Meijering E, Dzyubachyk O, Smal I. Methods for cell and particle tracking. Methods Enzymol. 2012; 504:183–200. doi:10.1016/B978-0-12-391857-4.00009-4.

    Article  PubMed  Google Scholar 

  7. Chen C, Wang W, Ozolek JA, Rohde GK. A flexible and robust approach for segmenting cell nuclei from 2d microscopy images using supervised learning and template matching. Cytom Part A. 2013; 83A(5):495–507. doi:10.1002/cyto.a.22280.

    Article  CAS  Google Scholar 

  8. Maska M, Ulman V, Svoboda D, Matula P, Matula P, Ederra C, Urbiola A, Espana T, Venkatesan S, Balak DMW, Karas P, Bolckova T, Streitova M, Carthel C, Coraluppi S, Harder N, Rohr K, Magnusson KEG, Jalden J, Blau HM, Dzyubachyk O, Kizek P, Hagen GM, Pastor-Escuredo D, Jimenez-Carretero D, Ledesma-Carbayo MJ, Munoz-Barrutia A, Meijering E, Kozubek M, Ortiz-de-Solorzano C. A benchmark for comparison of cell tracking algorithms. Bioinformatics. 2014; 30(11):1609–17. doi:10.1093/bioinformatics/btu080.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Faure E, Savy T, Rizzi B, Melani C, Stasova O, Fabreges D, Spir R, Hammons M, Cunderlik R, Recher G, Lombardot B, Duloquin L, Colin I, Kollar J, Desnoulez S, Affaticati P, Maury B, Boyreau A, Nief JY, Calvat P, Vernier P, Frain M, Lutfalla G, Kergosien Y, Suret P, Remesikova M, Doursat R, Sarti A, Mikula K, Peyrieras N, Bourgine P. A workflow to process 3d+time microscopy images of developing organisms and reconstruct their cell lineage. Nat Commun. 2016;7. doi:10.1038/ncomms9674.

  10. Kass M, Witkin A, Terzopoulos D. Snakes: Active contour models. Int J Comput Vis. 1988; 1(4):321–31. doi:10.1007/BF00133570.

    Article  Google Scholar 

  11. Malladi R, Sethian JA, Vemuri BC. Shape modeling with front propagation: a level set approach. IEEE Trans Pattern Anal Mach Intell. 1995; 17:158–75. doi:10.1109/34.368173.

    Article  Google Scholar 

  12. Cremers D, Rousson M, Deriche R. A review of statistical approaches to level set segmentation: Integrating color, texture, motion and shape. Int J Comput Vis. 2006; 72(2):195–215.

    Article  Google Scholar 

  13. Dzyubachyk O, van Cappellen WA, Essers J, Niessen WJ, Meijering EHW. Advanced level-set-based cell tracking in time-lapse fluorescence microscopy. IEEE Trans Med Imaging. 2010; 29(3):852–67. doi:10.1109/TMI.2009.2038693.

    Article  PubMed  Google Scholar 

  14. Nath SK, Palaniappan K, Bunyak F. Cell Segmentation Using Coupled Level Sets and Graph-Vertex Coloring In: Larsen R, Nielsen M, Sporring J, editors. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2006: 9th International Conference, Copenhagen, Denmark, October 1-6, 2006. Proceedings, Part I. Berlin, Heidelberg: Springer: 2006. p. 101–8.

    Google Scholar 

  15. Direkoglu C, Nixon MS. Moving-edge detection via heat flow analogy. Pattern Recogn Lett. 2011; 32(2):270–9. doi:10.1016/j.patrec.2010.08.012.

    Article  Google Scholar 

  16. Makrogiannis SK, Bourbakis NG. Motion analysis with application to assistive vision technology. In: 16th IEEE International Conference On Tools with Artificial Intelligence, 2004. Piscataway, New Jersey: IEEE: 2004. p. 344–52, doi:10.1109/ICTAI.2004.89.

    Google Scholar 

  17. Chan TF, Vese LA. Active contours without edges. Image Process IEEE Trans Image Process. 2001; 10(2):266–77. doi:10.1109/83.902291.

    Article  CAS  PubMed  Google Scholar 

  18. Boukari F, Makrogiannis S. Spatio-temporal level-set based cell segmentation in time-lapse image sequences In: Bebis G, Boyle R, Parvin B, Koracin D, McMahan R, Jerald J, Zhang H, Drucker S, Kambhamettu C, El Choubassi M, Deng Z, Carlson M, editors. Advances in Visual Computing. Lecture Notes in Computer Science, vol. 8888. Cham, Switzerland: Springer International: 2014. p. 41–50.

    Google Scholar 

  19. Weickert J. Anisotropic Diffusion in Image Processing. ECMI Series. Stuttgart: Teubner; 1998.

  20. Black MJ, Sapiro G, Marimont DH, Heeger D. Robust anisotropic diffusion. IEEE Trans Image Process. 1998; 7(3):421–32. doi:10.1109/83.661192.

    Article  CAS  PubMed  Google Scholar 

  21. Perona P, Malik J. Scale-space and edge detection using anisotropic diffusion. Pattern Anal Mach Intell IEEE Trans. 1990; 12(7):629–39. doi:10.1109/34.56205.

    Article  Google Scholar 

  22. You YL, Xu W, Tannenbaum A, Kaveh M. Behavioral analysis of anisotropic diffusion in image processing. IEEE Trans. Image Process. 1996; 5(11):1539–53. doi:10.1109/83.541424.

    Article  CAS  PubMed  Google Scholar 

  23. Tsiotsios C, Petrou M. On the choice of the parameters for anisotropic diffusion in image processing. Pattern Recognit. 2013; 46(5):1369–81. doi:10.1016/j.patcog.2012.11.012.

    Article  Google Scholar 

  24. Parzen E. On estimation of a probability density function and mode. Ann Math Stat. 1962; 33(3):1065–76.

    Article  Google Scholar 

  25. Izenman A. Recent developments in nonparametric density estimation. J Am Stat Assoc. 1991; 86:205–24.

    Google Scholar 

  26. Comaniciu D, Meer P. Mean shift: A robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell. 2002; 24(5):603–19. doi:10.1109/34.1000236.

    Article  Google Scholar 

  27. Soille P. Morphological image analysis: principles and applications, 2nd edn. Secaucus, NJ, USA: Springer; 2003.

    Google Scholar 

  28. Mumford D, Shah J. Optimal approximations by piecewise smooth functions and associated variational problems. Comm Pure Appl Math. 1989; 42(5):577–685. doi:10.1002/cpa.3160420503.

    Article  Google Scholar 

  29. Cell Tracking Challenge. 2013.

Download references


We acknowledge the support by the Center for Research and Education in Optical Sciences and Applications (CREOSA) in Delaware State University funded by NSF CREST HRD-1242067. Research reported in this publication was also supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number SC3GM113754. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.


This article has been published as part of BMC Medical Genomics Vol 9 Suppl 2 2016: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2015: medical genomics. The full contents of the supplement are available online at


Publication charges for this article have been funded by the National Institutes of Health.

Availability of data and materials

The time-lapse microscopy videos and the reference datasets used in this paper are publicly available by the organizers of the cell tracking challenge at

Authors’ contributions

FB: Designed and implemented the algorithm, designed and performed validation experiments, interpreted the results and wrote the manuscript. SM: Contributed to designing and implementing the algorithm, designing the validation experiments, interpreting the results, and writing the manuscript. Both authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

The experimental procedures involving human subjects and animal models described in this paper were approved by the Institutional Review Board and the Institutional Animal Care and Ethics Committee of the institutions that provided the data.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Sokratis Makrogiannis.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Boukari, F., Makrogiannis, S. Joint level-set and spatio-temporal motion detection for cell segmentation. BMC Med Genomics 9 (Suppl 2), 49 (2016).

Download citation

  • Published:

  • DOI: