- Research
- Open Access

# Reverse-engineering of gene networks for regulating early blood development from single-cell measurements

- Jiangyong Wei
^{1}, - Xiaohua Hu
^{2}, - Xiufen Zou
^{3}and - Tianhai Tian
^{4}Email author

**10 (Suppl 5)**:72

https://doi.org/10.1186/s12920-017-0312-z

© The Author(s) 2017

**Published:**28 December 2017

## Abstract

### Background

Recent advances in omics technologies have raised great opportunities to study large-scale regulatory networks inside the cell. In addition, single-cell experiments have measured the gene and protein activities in a large number of cells under the same experimental conditions. However, a significant challenge in computational biology and bioinformatics is how to derive quantitative information from the single-cell observations and how to develop sophisticated mathematical models to describe the dynamic properties of regulatory networks using the derived quantitative information.

### Methods

This work designs an integrated approach to reverse-engineer gene networks for regulating early blood development based on singel-cell experimental observations. The wanderlust algorithm is initially used to develop the pseudo-trajectory for the activities of a number of genes. Since the gene expression data in the developed pseudo-trajectory show large fluctuations, we then use Gaussian process regression methods to smooth the gene express data in order to obtain pseudo-trajectories with much less fluctuations. The proposed integrated framework consists of both bioinformatics algorithms to reconstruct the regulatory network and mathematical models using differential equations to describe the dynamics of gene expression.

### Results

The developed approach is applied to study the network regulating early blood cell development. A graphic model is constructed for a regulatory network with forty genes and a dynamic model using differential equations is developed for a network of nine genes. Numerical results suggests that the proposed model is able to match experimental data very well. We also examine the networks with more regulatory relations and numerical results show that more regulations may exist. We test the possibility of auto-regulation but numerical simulations do not support the positive auto-regulation. In addition, robustness is used as an importantly additional criterion to select candidate networks.

### Conclusion

The research results in this work shows that the developed approach is an efficient and effective method to reverse-engineer gene networks using single-cell experimental observations.

## Keywords

- Genetic regulatory network
- Blood stem cell
- Single-cell experiment
- Graphic model
- Dynamic model

## Background

The advances in omics technologies have generated huge amount of information regarding gene expression levels and protein kinase activities. The availability of the large datasets provides unprecidental opportunities to study large-scale regulatory networks inside the cell by using various types of omics datasets [1, 2]. The majority of the generated datasets are based on the measurements using a population of cells. However, biological experiments and theoretical studies have suggested that noise is a very import factor to determine the dynamics of biological systems [3–5]. In recent years, a number of experimental tools including single-cell qPCR, single-cell RNA sequencing, and multiplex single-cell proteomic methods, have been used for lineage tracing of cellular phenotypes, understanding cellular functionality, and high-throughput drug screening [6–8]. A centre theme in single-cell study is the highly heterogeneity at virtually all molecular levels beyond the genome [9]. The availability of large amount single-cell data has stimulated great interests in bioinformatics studies for analysing, understanding and visualizing single-cell data. A particular interesting research problem is the development of regulatory network models using single-cell observation data [10–12].

Mathematical methods for the analysis of single-cell observation data is mainly for normalization of experimental data, identification of variable genes, sub-population identification, differentiation detection and pseudo-temporal ordering [13]. These top-down approaches are mainly based on statistical analysis and machine learning techniques, and thus are able to deal with large-scale single-cell datasets [14]. For example, the algorithm Wanderlust represents each cell as a node and then ensemble cells into k-nearest neighbour graphs [15]. For each graph, it computes iteratively the shortest-path distance between cells. Another important type of approaches is the graphic model that represents the connection and/or regulations between genes and proteins. Different methods, such as the probabilistic graphic model, linear regression model, Bayesian network and Boolean network, have been applied to develop the regulatory networks [16–21].

One of the major challenges in computational biology is the development of dynamic models, such as differential equation models, to study the dynamic properties of genetic regulatory networks [17, 22, 23]. There are two major steps in designing a dynamic model, namely to determine the structure of network by specifying the connection and regulation between genes and proteins [19], and to quantify the strength of regulations [24]. In the last decade, a number of approaches have been applied to design dynamic models, including differential equation model, neural network model, petri-network model, and chemical reaction systems [25–29]. Recently we have proposed an integrated framework that combines both the probabilistic graphic model and differential equation model to infer the p53 gene networks that regulating the apoptosis process [30]. Regarding single-cell data, an additional step is to develop the pseudo-temporal trajectory by assuming each cell is uniformly distributed at different time points of the evolutionary process. A number of methods have been developed to analyse single-cell data [15, 31–35]. For example, The dimensionality reduction methods and cellular trajectory learning technique have been used to reverse-engineer the regulatory network by using differential equation based models [31]. In addition, the Single Cell Network Synthesis(SCNS) toolkit has been designed to develop regulatory network using single-cell experimental observation [36, 37]. Although these methods use either logistic models or dynamic models to infer genetic networks, a number of issues still remains, such as inference of network structure, development of appropriate dynamic models, and estimation of model parameters in the dynamic model.

To address these issues, this work propose a novel approach to reverse-engineer gene networks using single-cell observations. To get pseudo-temporal ordering of single cells, we first use a method of dimensionality reduction, namely diffusion maps [36, 37], to get the lower dimensional structure of gene expression data, and then use the wanderlust algorithm [15] to order single cells according to their relative position in the cell cycle. Since there is substantial noise in the generated pseudo-trajectory data, the Gaussian processes regression method is used to smooth the expression data [38]. Then the GENIE3 algorithm is employed to infer the structure of gene regulatory network [39]. Using single-cell quantitative real-time reverse transcription-PCR analysis of 33 transcription factors and additional marker genes in 3934 cells with blood-forming potential, we develop a graphic model for the network of 40 genes and dynamic model for a network of nine genes.

## Methods

### Experimental data

A recent experimental study used the single-cell qPCR technique to identify the expression levels of 46 genes in 3934 single stem cells that were isolated from the mouse embryo [37]. The Flkl expression in combination with a Runx1-ires-GFP report mouse was used to measure cells with blood potential at distinct anatomical stages across a time course of mouse development. Single Flkl ^{+} cells were flow sorted at primitive streak (PS), neural plate (NP) and head fold (HF) stages. In addition, the E8.25 cells were subdivided into putative blood and endothelial populations by isolating GFP ^{+} cells (four somite, 4SG) and Flkl ^{+} GFP ^{−} cells (4SFG ^{−}), respectively. Cells were sorted from multiple embryos at each time point, with 3934 cells going to subsequent analysis. Experimental study quantified the expression of 33 transcriptional factors involved in endothelial and hematopoietic development, nine marker genes, including the embryonic globin *Hbb-bH1* and cell surface marker such as *Cdh5* and *Itga2b* (CD41), as well as four reference housekeeping genes (i.e. *Eif2b1, Mrpl19, Polr2a and UBC*). We select 40 genes from this dataset but exclude four house-keeper genes and two other genes (i.e. *HoxB2* and *HoxD8*) because the variations in the expression levels of these fix genes are relative small.

The dCt values in Supplementary Table 3 [37] represent the relative gene expression levels. These values will be employed as experimental data to reverse-engineer the regulatory network of the remaining 40 genes in this work. Note that the majority of these dCt values are negative. It would be difficult to use a dynamic model to describe the data with negative values. Therefore we conduct a shift computation by assuming that the minimal dCt value is zero in “Mathematical modelling” subsection.

### Pseudo-temporal ordering

The process of pseudo-temporal ordering can be divided two major steps. The first step uses Diffusion Maps for lower-dimensional visualization of high-dimensional gene expression data, then the Wanderlust algorithm is employed to order the individual cells to get the trends of different genes.

*i*,

*D*is the number of single-cell, ∥·∥ is the Euclidean norm, and

*ε*is an appropriately chosen kernel bandwidth parameter which determines the neighbourhood size of points. In addition, an

*N*×

*N*Markov matrix normalizing the rows of the kernel matrix is constructed as follows:

where *P*(*x*
_{
i
}) is an normalization constant given by \(P(x_{i})=\sum _{j} W_{\varepsilon }(x_{i},x_{j})\). *M*
_{
ij
} represents the connectivity between two data points *x*
_{
i
} and *x*
_{
j
}. It is also a measure of similarity between data points within a certain neighbourhood. Finally we compute eigenvalues and eigenvectors of this Markov matrix, and choose the largest *d* eigenvalues. The corresponding *d* eigenvectors are the output as the lower dimensional dataset *Y*
_{
i
},*i*=1,…*d*.

Using the generated low dimensional dataset, we use the Wanderlust algorithm to get a one-dimensional developmental trajectory. There are several assumptions about the application Wanderlust to sort gene expression data from single cells. Firstly, the data sample includes cells of entire developmental process. In addition, the developmental trajectory is linear and non-branching, it means that the developmental processes is only one-dimensional. Furthermore, the changes of gene expression values is gradual during the developmental process, and thus the transitions between different stages are gradual. Based on these assumptions, we can infer the ordering of single cells and identify different stages in the cell development by using the Wanderlust algorithm.

The Wanderlust algorithm also consists of two major stages, namely an initiation step and an iterative step for trajectory detection. In the first stage, we select a set of cells as landmarks uniformly at random. Each cell will have landmarks nearby. Then we construct a *k*-nearest-neighbours graph that every cell connect to *k* cells that are most similar to it, then we randomly pick *l* neighbours out of the *k*-nearest-neighbours for each cell and generate a *l*-out-of- *k*-nearest-neighbours graph (l-k-NNG). Then the second stage begins for the trajectory detection. One early cell point *s* should be selected first in this algorithm, which serves as the starting point of psuedo-trajectory. The point *s* can be determined by the Diffusion Maps method in the first. For every single cell, the initial trajectory score can be calculated as the distance from the starting-point cell *s* to that cell.

*t*with early cell

*s*and landmark cell

*l*, if

*d*(

*s,t*)<

*d*(

*s,l*), then

*t*precedes

*l*, otherwise

*t*follows

*l*in the pseudo-trajectory. For each landmark cell, we define a weight as

*t*is defined as

where *n*
_{
l
} is the number of landmark cells and the summation is over all landmarks *l*. The scores also include the beginning cell and landmarks. Then we use trajectory score as a new orientation trajectory and repeat the orientation step until landmark positions converge.

### Data smooth

The pseudo-trajectory gene expression data determined by the Wanderlust algorithm have a large variations in the gene expression levels. Thus we use the Gaussian processes regression method for smoothing the noisy data. The Gaussian processes regression is a generative non-parametric approach for modelling probability distributions over functions. It begins with a prior distribution and updates this distribution when data points are observed, and finally produces the posterior distribution over functions [38].

**y**can be regarded as samples of random variables with the underlying distribution function

*f*(

**t**), which is described by a Gaussian noise model:

*y*

^{∗}at a point

*t*

^{∗}based on the above model. The joint distribution of

**y**and

*y*

^{∗}is:

*K*is a kernel trick to connect two observations. One popular choice for the kernel is the squared exponential covariance function, defined by

*σ*

^{2}is a signal variance parameter and

*l*is a length scale parameter. If

*t*≈

*t*

^{′}, it means that

*f*(

*t*) is highly correlated with

*f*(

*t*

^{′}). However, if

*t*is distant from

*t*

^{′},

*K*(

*t,t*

^{′})≈0, the two points is largely uncorrelated with each other. Finally the posterior distribution is given by

*y*

^{∗}|

**y**∼

*N*(μ,Σ) with:

### Networks construction

*N*genes separately by using the regression model. In this regression model, the expression level of a particular gene is described by the regulatory function that is determined by the expression of the other

*N*−1 genes, given by

where **x**
^{(−j)}=(*x*
^{(1)},…,*x*
^{(j−1)},*x*
^{(j+1)},…,*x*
^{(N)})^{
T
}, *ε* is a random noise with zero mean, and function *f*
_{
j
} is designed to search for genes that regulate the expression of gene *x*
^{(j)} using a random forest method. The function only exploits the expression in **x**
^{−j
} of the genes that are direct regulators of gene *j*, i.e. genes that are directly connected to gene *j* in the targeted network. For each gene, a learning sample is generated with the expression levels of that gene as the output by using and expression levels of all other genes as input. A function is learned from the data and a local ranking of all genes except *j* is computed. The *N* local rankings are then aggregated to get a global ranking of all regulatory links.

*f*

_{ j }is unknown but they are expected to involve the expression of several genes (combinatorial regulation) and to be non-linear. Each subproblem, defined by a learning sample, is a supervised (non-parametric) regression problem. Using square error loss, each problem amounts at finding a function that minimizes the following error:

Note that, depending of the interpretation of the weight, the aggregation to get a global ranking of regulatory links is not trivial. It requires to normalize each expression vector appropriately in the context of this tree-based method.

### Mathematical modelling

*i*-th gene is denoted as

*x*

_{ i }(

*t*) at time

*t*. The evolution of gene expression levels is described by the following dynamic model using differential equations, given by

*c*

_{ i }and

*k*

_{ i }are the basal and maximal synthesis rate of gene

*i*in gene expression, respectively,

*d*

_{ i }is the decay rate of transcript. The key point is the selection of the regulatory function, which should include both positive and negative regulations appropriately. Based on the Shea-Ackers’ formula [41], this work uses the following function

Coefficients *a*
_{
ij
} represents regulation from gene *j* to the expression of gene *i*. This regulation may be positive (*a*
_{
ij
}>0) or negative (*a*
_{
ij
}=0) if the corresponding coefficient (*b*
_{
ij
}>0). For example, if *a*
_{
ii
}>0, a larger value in the expression level of gene *j* will promote the expression level of gene *i*. However, if *a*
_{
ij
}=0, a higher expression level of gene *j* will only increase the value of denominator of the regulatory function and thus decrease its value. This model assumes that, if *b*
_{
ij
}=0, then the coefficient *a*
_{
ij
} must be zero. In addition, it may be no regulatory relationship from gene *j* to gene *i* if (*a*
_{
ij
}=*b*
_{
ij
}=0). Since there is no time scale for experiment, is is assumed that each cell in the pseudo-trajectory is a unit time. Thus the time period of model is [ 0,3933] since there are 3934 single cells.

*x*

_{ i }(

*t*)≥0. However, the majority of the experimental data are negative. To address this issue, we conduct a linear transformation in order to change the negative values of dCt to positive values, denoted as

*dCt*

_{1}. For each gene, we assume the minimal value of the dCt values is zero and the shift computation is

It is clear that the transformed value *dCt*
_{1} is always non-negative.

*dCt*

_{1}may be quite large, we use the relative absolute error the measure the difference between simulations and experimental data, given by

*x*

_{ ij }are the simulated and experimentally measured gene expression levels at time point

*t*

_{ j }for gene

*i*, respectively. Due to the large number of single cells, which leads to a long range of time period for numerical simulation of model (7), the proposed model is stiff and simulation frequently breaks down when the search space is not very small. Thus instead of obtaining simulation of the whole time interval, we consider the numerical solution over

*k*pseudo-time points and calculate the transfer probability

*k*=100 and assume that

*f*

_{0}[(

*t*

_{0},

*x*

_{0})|

*θ*]=1 and the transitional probability is calculated by using the absolute error kernel, given by

*E*

_{ i }is the simulation error (9) using (

*t*

_{ k(i−1)+1},

*x*

_{ k(i−1)+1}) as the initial condition.

Since different sets of estimated model parameters may generate simulations with similar simulation error, we use the robustness property of the mathematical model as an additional criterion to select the estimated model parameters [44, 45]. The detailed computing process of robustness analysis can be found in [30]. For each module of gene regulation, we use the ABC algorithm to generate a number of sets of model parameters, and then select the top 5 sets that have the minimal estimation error for robustness analysis. In this way, we are able to exclude the influence of simulation error on the robustness property of the model.

## Results and discussion

### Diffusion map visualization

### Wanderlust ordering

*s*in the previous section, the Wanderlust algorithm is then employed to obtain the pseudo-temporal trajectory for the expression dynamics of genes. In this program, the widely used Euclidean measure is used to calculate the distance between different cells. Figure 2 provides the determined pseudo-temporal trajectory of four genes based on the raw dCt data in [31]. Based on the generated pseudo-temporal trajectory, we find that the expression levels of the four housekeeping genes are barely changed. Similar observations can also be applied to the expression levels of genes

*HoxB2*and

*HoxD8*. Therefore we will not consider these six genes in the network development.

The pseudo-temporal trajectory has large variations in the expression levels of every gene. Thus it is very difficult to use differential equation model to realize such data with large variation. To address issue, the Gauss process regression method is used to remove the variations in the data and produce more smooth trajectory. Figure 2 also provides the pseudo-temporal trajectory after the smooth process for the four genes. Compared with the raw dCt data with large variations, the smoothed data for the same gene have much smaller fluctuations in the expression levels.

One characteristics of the expression levels is that all genes (excluding the six genes) have both high and low expression levels. Genetic switching exists in the expression levels. Based on the time interval with high or low expression levels, the processed data can be classified into a number of patterns. For example, gene *Cdh1* has high expression levels in the pseudo-time interval [ 0,500] and low expression levels in the following time interval. However, genes *Gfi1b* and *Tal1* have low expression level in the pseudo-time interval [ 0,2500] and [ 0,800], respectively, but the expression level is switched to high levels in the following time intervals. A few of genes, such as gene *Kdr* in Fig. 2, have two genetic switchings in the pseudo-temporal trajectory. Since our proposed technique has been used to realize the similar observations for genetic switching [45], this modelling approach will be used in this work to realized the pseudo-temporal trajectories.

### Networks construction

*GATA1, Gfi1b, Hhex, Ikaros, Myb, Nfe2, Notch1, Sox7*and

*Sox17*. The regulation relationship between these nine genes in these two networks are consistent. However, the regulation between these nine genes in our graphic model in Fig. 3 is not a fully connected network, thus the regulation between the gene pair (it Sox7, Sox17), which exists in the network in [37], was added to Fig. 4 in order to form a complete network. The developed network model is presented in Fig. 4.

### Mathematical model

*GATA1, Gfi1b, Ikaros, Myb, Nfe2*, whose expression are activated at the pseudo-time point

*t*=∼2300 and their expression activities are promoted a high level with different speeds for different genes. However, the expression of the remaining four genes, namely

*Hhex, Notch1, Sox7 and Sox17*, is first inhibited and their expression levels go down to a low level at he pseudo-time point

*t*=∼700; but their expression is activated at

*t*=∼2500 and the expressions return to the high levels again. The observed gene expression changes are consistent with other experimental observations showing that genes

*GATA1, Gfi1b*and

*Ikaros*do have substantial changes of the expression levels over time [45]. The genetic switching in the expression levels of these genes is important to maintain the functional activities of blood stem cells. Using the proposed modelling method in [45], it is assumed that some key model parameters are variables of time rather than a constant. For the five genes in the first group, we use the following synthesis rate for gene expression, given by

*k*

_{ i0}is the basal synthesis rate of gene

*i*, and

*A*

_{ i }>1. Regarding the four genes in the second group, it is assumed that both synthesis rate and degradation rate are variables of time, since we need to realize the first switching from high expression level to low expression level,

*t*=50 and

*t*=200. Numerical results are consistent with those showing Fig. 5. In addition, the ABC algorithm is used to infer the unknown model parameters using the data shown in Fig. 2. Figure 5 suggests that the proposed model is able to match experimental data very well. Certainly the simulation error is dependent on the length of subinterval. The simulation error is larger if the length of subinterval is larger.

### Inference of network with more regulations

The proposed network in Fig. 4 includes nine one-way or mutual regulations. It is fully consistent with the network predicted in [37]. To make predictions about the potential regulations among these nine genes, we extend the network by including more regulations. We apply the GENIE3 algorithm to the raw dCt data of these nine genes only. According to the calculated weight of the target edges, we select the highest weight of 27 one-way regulations. Since there are a few two-way regulations in the selected 27 regulation edges, the generated network includes 17 un-directional regulations (see Additional file 1). Compared with the network in Fig. 4, the number of potential regulation edges has been doubled. The structure of the extended network is shown in Additional file 1: Figure S1 in Supplementary Information. Note that the regulations in Additional file 1: Figure S1 do not include all the regulations in Fig. 4. The added regulation (*Sox 7, Sox 17*) in Fig. 4 does not appear in Additional file 1: Figure S1. However, the other eight regulations in Fig. 4 also appears in Additional file 1: Figure S1.

### Inference of network with auto-regulation

In the graphic model generated by the GENIE3 algorithm, the auto-regulation, namely the positive or negative regulation of a gene to the expression of itself, is not considered. To find the potential auto-regulations in these genes, we test the network by adding positive auto-regulation to a particular gene. For the *i*-th gene, we set *b*
_{
ii
}>0; and the value of *a*
_{
ii
} is *a*
_{
ii
}>0 for positive auto-regulation. We first test the gene network shown in Fig. 4 with positive auto-regulation for only one gene. Numerical results in Fig. 7
a suggest that no module with positive auto-regulation (model index 2 ∼ 10) has smaller simulation error than that of the network without any auto-regulation (index 1 in Fig. 7
a). We have also tested the network in which each gene has positive auto-regulation. Results in (index 11 in Fig. 7
a) shows that simulation error of this module is larger than that of any other module in Fig. 7
a. Thus it is unlikely that all these genes have positive auto-regulation.

We have also conducted the positive auto-regulation test for the model based on the network in Additional file 1: Figure S1. An interesting result is that, compared with the network without auto-regulation (index 1 in Fig. 7 b), the model with positive auto-regulation to any one gene (model index 2 ∼ 10) has better accuracy than the network model without auto-regulation. Note that, compared with the network model without auto-regulation, the network model with positive auto-regulation to one gene has only two additional model parameters. The small change in the number of unknown parameters would not bring much flexibility to match experimental data. Thus this result suggests that there may be positive auto-regulation for some genes in this network. However, when we add auto-regulation to all genes (index 11 in Fig. 7 b), similar to the result in (index 11 in Fig. 7 a), the simulation error of this module is larger than any other module. This gives further evidence that it is unlikely that all these genes have positive auto-regulation.

### Robustness analysis

*Gata1, Gfi1b, Hhex*have better robustness property than the model without any perturbation. However, for the gene network with 17 regulations in Additional file 1: Figure S1, the networks with auto-regulation for genes

*Sox7, Sox17*have better robustness property than the network without positive auto-regulation. These simulation results do not provide strong evidence to support the positive auto-regulation in the nine genes in Fig. 4.

## Conclusion

In this work we have designed an integrated approach to reverse-engineer gene networks for regulating early blood development based on singel-cell experimental observations. The diffusion map method is firstly used to obtain the visualization of gene expression data derived from 3934 stem blood cells. The wanderlust algorithm is then employed to develop the pseudo-trajectory for the activities of a number of genes. Since the gene expression levels in the developed pseudo-trajectory show large fluctuations, we then use Gaussian process regression method to smooth the gene express data in order to obtain pseudo-trajectory with much less fluctuations. The proposed integrated framework consist of both the GENIE3 algorithm to reconstruct the regulatory network and a mathematical model using differential equations to describe the dynamics of gene expression. The developed approach is applied to study the network regulating early blood cell development, and we designed a graphic model for a regulatory network with forty genes and a differential equations model for a network of nine genes. The research results in this work shows that the developed approach is an efficient and effective method to reverse-engineer gene networks using single-cell experimental observations.

In this work we use simulation error as the key criterion to select the model parameters and infer the regulation between genes. However, because of the complex searching space of model parameters and noise in experimental data, it may be difficult to judge which model is really better than others if the difference between simulation errors is small. In fact, simulation errors of various models for the network of nine genes are quite close to each other. Therefore, in addition to using simulation error as the unique criterion to select a model, other measurements, such as AIC value, parameter identifiability and robustness property of a network, are also needed as important criteria. All of these issues are potential topics for future research.

## Declarations

### Acknowledgements

This research work is supported by the National Natural Science Foundation of China (11571368), and Australian Research Council Discovery Project (DP120104460).

### Funding

T.T. is supported by the Australian Research Council (ARC) Discovery Projects (DP120104460) which supports the publication cost of this paper.

### Availability of data and materials

Not applicable.

### About this supplement

This article has been published as part of *BMC Medical Genomics* Volume 10 Supplement 5, 2017: Selected articles from the IEEE BIBM International Conference on Bioinformatics & Biomedicine (BIBM) 2016: medical genomics. The full contents of the supplement are available online at https://bmcmedgenomics.biomedcentral.com/articles/supplements/volume-10-supplement-5.

### Software

Not applicable.

### Authors’ contributions

TT conceived the research. JW and TT conducted the research. JW, XH, XZ and TT interpreted the results and wrote the paper. All authors edited and approved the final version of the manuscript.

### Ethics approval and consent to participate

Not applicable.

### Consent to publication

Not applicable.

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

## Authors’ Affiliations

## References

- Taniguchi Y, Choi PJ, Li GW, Chen H, Babu M, Hearn J, Emili A, Xie XS. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 2010; 329:533–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Dumitriu A, Golji J, Labadorf AT, Gao B, Beach TG, Myers RH, Longo KA, Latourelle JC. Integrative analyses of proteomics and RNA transcriptomics implicate mitochondrial processes, protein folding pathways and GWAS loci in Parkinson disease. BMC Med Genomics. 2016; 9:5.View ArticlePubMedPubMed CentralGoogle Scholar
- Munsky B, Neuert G, van Oudenaarden A. Using gene expression noise to understand gene regulation. Science. 2012; 336:183–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Espinosa Angarica V, Del Sol A. Modeling heterogeneity in the pluripotent state: A promising strategy for improving the efficiency and fidelity of stem cell differentiation. Bioessays. 2016; 38:758–68.View ArticlePubMedPubMed CentralGoogle Scholar
- Waldron D. Gene expression: Environmental noise control. Nat Rev Genet. 2015; 16:624–5.View ArticlePubMedGoogle Scholar
- Junker JP, van Oudenaarden A. Every cell is special: genome-wide studies add a new dimension to single-cell biology. Cell. 2014; 157:8–11.View ArticlePubMedGoogle Scholar
- Deng Q, Ramskold D, Reinius B, Sandberg R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science. 2014; 343:193–6.View ArticlePubMedGoogle Scholar
- Wei W, Shin YS, Ma C, Wang J, Elitas M, Fan R, Heath JR. Microchip platforms for multiplex single-cell functional proteomics with applications to immunology and cancer research. Genome Med. 2013; 5:75.View ArticlePubMedPubMed CentralGoogle Scholar
- Buganim Y, Faddah D, Cheng A, Itskovich E, Markoulaki S, Ganz K, Klemm S, Vanoudenaarden A, Jaenisch R. Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell. 2012; 150(6):1209–22.View ArticlePubMedPubMed CentralGoogle Scholar
- Stegle O, Teichmann SA, Marioni JC. Computational and analytical challenges in single-cell transcriptomics. Nat Rev Genet. 2015; 16:133–45.View ArticlePubMedGoogle Scholar
- Poirion OB, Zhu X, Ching T, Garmire L. Single-Cell Transcriptomics Bioinformatics and Computational Challenges. Front Genet. 2016; 7:163.View ArticlePubMedPubMed CentralGoogle Scholar
- Woodhouse S, Moignard V, Göttgens B, Fisher J. Processing, visualising and reconstructing network models from single-cell data. Immunol Cell Biol. 2015; 94:256–65.View ArticlePubMedGoogle Scholar
- Marr C, Zhou JX, Huang S. Single-cell gene expression profiling and cell state dynamics: collecting data, correlating data points and connecting the dots. Curr Opin Biotechnol. 2016; 39:207–14.View ArticlePubMedPubMed CentralGoogle Scholar
- Moignard V, Macaulay IC, Swiers G, Buettner F, Schütte J, Calero-Nieto FJ, et al. Characterization of transcriptional networks in blood stem and progenitor cells using high-throughput single-cell gene expression analysis. Nat Cell Biol. 2013; 15:363–72.View ArticlePubMedPubMed CentralGoogle Scholar
- Bendall SC, Davis KL, Amir el-AD, Tadmor MD, Simonds EF, Chen TJ, Shenfeld DK, Nolan GP, Pe’er D. Single-cell trajectory detection uncovers progression and regulatory coordination in human b cell development. Cell. 2014; 157(3):714–25.View ArticlePubMedPubMed CentralGoogle Scholar
- Penfold CA, Wild DL. How to infer gene networks from expression profiles, revisited. Interface Focus. 2011; 1:857–70.View ArticlePubMedPubMed CentralGoogle Scholar
- Bar-Joseph Z, Gitter A, Simon I. Studying and modelling dynamic biological processes using time-series gene expression data. Nat Rev Genet. 2012; 13:552–64.View ArticlePubMedGoogle Scholar
- Maetschke SR, Madhamshettiwar PB, Davis MJ, Ragan MA. Supervised, semi-supervised and unsupervised inference of gene regulatory networks. Brief Bioinform. 2012; 15:195–211.View ArticleGoogle Scholar
- Wang J, Cheung LW, Delabie J. New probabilistic graphical models for genetic regulatory networks studies. J Biomed Inform. 2005; 38:443–55.View ArticlePubMedGoogle Scholar
- Sachs K, Perez O, Pe’Er D, Lauffenburger DA, Nolan GP. Causal protein-signaling networks derived from multiparameter single-cell data. Science. 2005; 308:523–9.View ArticlePubMedGoogle Scholar
- Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M, Haussler D. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci. 2000; 97:262–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Kim Y, Han S, Choi S, Hwang D. Inference of dynamic networks using time-course data. Brief Bioinformatics. 2014; 15:212–28.View ArticlePubMedGoogle Scholar
- Chasman D, Siahpirani AF, Roy S. Network-based approaches for analysis of complex biological systems. Curr Opin Biotechnol. 2016; 39:157–66.View ArticlePubMedGoogle Scholar
- Wang J, Tian T. Quantitative model for inferring dynamic regulation of the tumour suppressor gene p53. BMC Bioinformatics. 2010; 11(1):36.View ArticlePubMedPubMed CentralGoogle Scholar
- Ocone A, Sanguinetti G. Reconstructing transcription factor activities in hierarchical transcription network motifs. Bioinformatics. 2011; 27:2873–9.View ArticlePubMedGoogle Scholar
- Maraziotis IA, Dragomir A, Thanos D. Gene regulatory networks modelling using a dynamic evolutionary hybrid. BMC Bioinformatics. 2010; 11(1):1.View ArticleGoogle Scholar
- Petralia F, Wang P, Yang J, Tu Z. Integrative random forest for gene regulatory network inference. Bioinformatics. 2015; 31(12):i197–i205.View ArticlePubMedPubMed CentralGoogle Scholar
- Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Kellis M, Collins JJ, Stolovitzky G. Wisdom of crowds for robust gene network inference. Nat Methods. 2012; 9:796–804.View ArticlePubMedPubMed CentralGoogle Scholar
- Kim D, Kang M, Biswas A, Liu C, Gao J. Integrative approach for inference of gene regulatory networks using lasso-based random featuring and application to psychiatric disorders. BMC Med Genomics. 2016; 9(Suppl 2):50.View ArticlePubMedPubMed CentralGoogle Scholar
- Wang J, Wu Q, Hu X, Tian T. An integrated approach to infer dynamic protein-gene interactions – a case study of the human p53 protein. Methods. 2016; 110:3–13.View ArticlePubMedGoogle Scholar
- Ocone A, Haghverdi L, Mueller NS, Theis FJ. Reconstructing gene regulatory dynamics from high-dimensional single-cell snapshot data. Bioinformatics. 2015; 31:i89–i96.View ArticlePubMedPubMed CentralGoogle Scholar
- Krishnaswamy S, Spitzer MH, Mingueneau M, Bendall SC, Litvin O, Stone E, Pe’er D, Nolan GP. Systems biology. Conditional density-based analysis of T cell signaling in single-cell data. Science. 2014; 346:1250689.View ArticlePubMedPubMed CentralGoogle Scholar
- Ji Z, Ji H. Tscan: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 2016; 44:e177.View ArticleGoogle Scholar
- Pina C, Teles J, Fugazza C, May G, Wang D, Guo Y, Soneji S, Brown J, Edén P, Ohlsson M. Single-cell network analysis identifies ddit3 as a nodal lineage regulator in hematopoiesis. Cell Reports. 2015; 24:1503–10.View ArticleGoogle Scholar
- Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, Lennon NJ, Livak KJ, Mikk elsen TS, Rinn JL. Pseudo-temporal ordering of individual cells reveals dynamics and regulators of cell fate decisions. Nat Biotechnol. 2014; 32:381.View ArticlePubMedPubMed CentralGoogle Scholar
- Haghverdi L, Buettner F, Theis FJ. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics. 2015; 31(18):2989–98.View ArticlePubMedGoogle Scholar
- Moignard V, Woodhouse S, Haghverdi L, Lilly AJ, Tanaka Y, Wilkinson AC, Buettner F, Macaulay IC, Jawaid W, Diamanti E, et al.Decoding the regulatory network of early blood development from single-cell gene expression measurements. Nat biotechnol. 2015; 33:269–76.View ArticlePubMedPubMed CentralGoogle Scholar
- Williams CKI, Rasmussen CE. Gaussian processes for machine learning. Cambridge: MIT Press; 2006, pp. 7–32.Google Scholar
- Irrthum A, Wehenkel L, Geurts P, et al.Inferring regulatory networks from expression data using tree-based methods. PloS ONE. 2010; 5(9):e12776.View ArticlePubMedPubMed CentralGoogle Scholar
- Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P. Inferring regulatory networks from expression data using tree-based methods. Plos ONE. 2009; 5(9):e12776.View ArticleGoogle Scholar
- Shea MA, Ackers GK. The OR control system of bacteriophage lambda. A physical-chemical model for gene regulation. J Mol Biol. 1985; 181(2):211–30.View ArticlePubMedGoogle Scholar
- Turner BM, Van Zandt T. A tutorial on approximate Bayesian computation. J Math Psych. 2012; 56:69–85.View ArticleGoogle Scholar
- Wu Q, Smith-Miles K, Tian T. Approximate bayesian computation schemes for parameter inference of discrete stochastic models using simulated likelihood density. BMC Bioinfomatics. 2014; 15(S12):S3.View ArticleGoogle Scholar
- Kitano H. Towards a theory of biological robustness. Mol Syst Biol. 2007; 3:137.View ArticlePubMedPubMed CentralGoogle Scholar
- Tian T, Smith-Miles K. Mathematical modeling of gata-switching for regulating the differentiation of hematopoietic stem cell. BMC Systems Biol. 2014; 8(S1):S8.View ArticleGoogle Scholar