This section introduces the steps of PMN: network construction and the strategies for evaluating nodes.
The PCC between two genes X and Y is defined as follows:
$$ {\mathrm{PCC}}_{XY}=\frac{\sum_{i=1}^n\left({X}_i-\overline{X}\right)\left({Y}_i-\overline{Y}\right)}{\sqrt{\sum_{i=1}^n{\left({X}_i-\overline{X}\right)}^2\sum_{i=1}^n{\left({Y}_i-\overline{Y}\right)}^2}}, $$
(1)
where \( X_i \) and \( Y_i \) are the observations of genes X and Y in the i-th sample, n is the number of samples, and \( \overline{X} \) and \( \overline{Y} \) are the mean observations of genes X and Y over all samples.
The entropy of a discrete random variable X is defined as follows:
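As an illustration of Eq. (1), the following is a minimal sketch in plain Python. The function name `pearson_correlation`, the toy profiles, and the choice to return 0 for a constant profile (where the denominator vanishes) are assumptions made for this example and are not taken from the original method.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two expression profiles, following Eq. (1)."""
    if len(x) != len(y):
        raise ValueError("x and y must hold one value per sample")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator of Eq. (1): sum of products of deviations from the means.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator of Eq. (1): square root of the product of squared deviations.
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        # Assumption: a constant profile is treated as uncorrelated.
        return 0.0
    return cov / denom


# Hypothetical expression values of two genes over four samples.
gene_x = [1.0, 2.0, 3.0, 4.0]
gene_y = [2.0, 4.0, 6.0, 8.0]
pcc_xy = pearson_correlation(gene_x, gene_y)
```

Because `gene_y` is an exact linear function of `gene_x`, the two toy profiles are perfectly correlated.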
$$ H(X)=-\sum \limits_xp(x)\log p(x), $$
(2)
where \( p(x)=\Pr\{X=x\} \) is the probability mass function of X. The mutual information between two genes X and Y is then defined as:
$$ MI\left(X,Y\right)=H(X)+H(Y)-H\left(X,Y\right), $$
(3)
where \( H(X,Y)=-\sum_x\sum_y p(x,y)\log p(x,y) \) is the joint entropy. Many studies have shown that MI is better suited to discrete values [8], whereas gene expression data are continuous. To make better use of the MI measure, we follow the treatment in the Minet method [8] and discretize the continuous values.
The relationship between genes is not unique, and a single existing measure can hardly describe it accurately. We therefore consider different kinds of relationships between genes, so that the stronger ones are retained in the network. The adjacency matrix \( A=[a_{ij}] \) is defined from the two measures so as to retain both linear and non-linear relationships between genes:
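Eqs. (2) and (3) can be sketched on discretized profiles as follows. The text does not state which discretization settings are used, so the equal-width binning and the default of roughly the square root of the number of samples as the bin count are assumptions of this example, as are all function names.

```python
import math
from collections import Counter


def discretize_equal_width(values, n_bins):
    """Map continuous values to bin indices 0..n_bins-1 (assumed equal-width binning)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Empirical entropy of a sequence of discrete labels, following Eq. (2)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x, y, n_bins=None):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y) on discretized profiles, following Eq. (3)."""
    if n_bins is None:
        # Assumption: about sqrt(n) bins; the source does not fix this value.
        n_bins = max(1, round(math.sqrt(len(x))))
    dx = discretize_equal_width(x, n_bins)
    dy = discretize_equal_width(y, n_bins)
    # The joint entropy is the entropy of the paired bin labels.
    return entropy(dx) + entropy(dy) - entropy(list(zip(dx, dy)))


# Hypothetical expression values of two genes over four samples.
gene_x = [0.1, 0.2, 0.9, 1.0]
gene_y = [1.1, 1.3, 2.8, 3.0]
mi_xy = mutual_information(gene_x, gene_y)
```

The natural logarithm is used here, so the result is in nats; a different base only rescales the values and would require rescaling the threshold accordingly.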
$$ {a}_{ij}=\left\{\begin{array}{ll}1 & \mathrm{if}\ \mathrm{PCC}>{t}_1\ \mathrm{or}\ \mathrm{MI}>{t}_2,\\ 0 & \mathrm{else},\end{array}\right. $$
(4)
where \( t_1 \) and \( t_2 \) are the filter thresholds applied to the PCC and MI matrices, respectively. In theory, there are no strict requirements for choosing \( t_1 \) and \( t_2 \).
For node mining, we first consider the degree centrality \( D_x \) of a node, the most basic topological property describing a single node in the network. The degree centrality of node x is the number of edges connected directly to x in the network, and it reflects the local characteristics of the node. We also use the betweenness centrality of a node. The betweenness of a node is the sum, over pairs of other nodes, of the proportion of shortest paths between them that pass through it. The betweenness of node x is defined as follows:
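The thresholding rule of Eq. (4) can be sketched as below. The matrices and the threshold values are hypothetical, and leaving the diagonal at zero (no self-loops) is an assumption of this example that the text does not state.

```python
def build_adjacency(pcc, mi, t1, t2):
    """Binary adjacency matrix: a_ij = 1 if PCC > t1 or MI > t2, following Eq. (4)."""
    n = len(pcc)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Assumption: self-loops are excluded, so the diagonal stays 0.
            if i != j and (pcc[i][j] > t1 or mi[i][j] > t2):
                adj[i][j] = 1
    return adj


# Hypothetical symmetric PCC and MI matrices for three genes.
pcc = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
mi = [[0.70, 0.10, 0.60],
      [0.10, 0.70, 0.05],
      [0.60, 0.05, 0.70]]
adjacency = build_adjacency(pcc, mi, t1=0.8, t2=0.5)
```

In this toy case genes 0 and 1 are linked through the PCC criterion, genes 0 and 2 through the MI criterion, and genes 1 and 2 remain unconnected.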
$$ {B}_x={\sum}_{i\ne x\ne j\in X}\frac{\sigma_{ixj}}{\sigma_{ij}}, $$
(5)
where \( \sigma_{ij} \) is the number of shortest paths between nodes i and j, and \( \sigma_{ixj} \) is the number of those shortest paths that pass through node x. The betweenness indicates the role a node plays in connecting other nodes to each other. The higher the betweenness, the more important the node is in maintaining tight connectivity in the network, which reflects the global characteristics of the node.
In this paper, we define a weighted value \( W_x \) for each gene node in the network by combining the above two aspects, and use it to select the abnormally expressed genes:
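One common way to evaluate Eq. (5) on an unweighted, undirected network is Brandes' algorithm, sketched below. The source does not say which algorithm or normalization is used, so both are assumptions here: each unordered node pair is counted once, and the optional `normalized` flag divides by the number of node pairs that exclude x, which keeps the values in the range 0 to 1.

```python
from collections import deque


def betweenness_centrality(adj, normalized=True):
    """Betweenness of every node of an undirected, unweighted graph (Brandes' algorithm)."""
    n = len(adj)
    neighbors = [[j for j in range(n) if adj[i][j]] for i in range(n)]
    bc = [0.0] * n
    for s in range(n):
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = [0] * n
        dist = [-1] * n
        preds = [[] for _ in range(n)]
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    bc = [b / 2.0 for b in bc]
    if normalized and n > 2:
        # Assumption: scale by the number of node pairs that exclude x.
        scale = 2.0 / ((n - 1) * (n - 2))
        bc = [b * scale for b in bc]
    return bc


# Hypothetical path network 0 - 1 - 2 - 3.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
raw = betweenness_centrality(path, normalized=False)
scaled = betweenness_centrality(path)
```

On this path network the two inner nodes each lie on the shortest paths of two node pairs, while the end nodes lie on none.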
$$ {W}_x={D}_x\times {B}_x. $$
(6)
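Eq. (6) can be sketched as follows, given an adjacency matrix and betweenness values that have already been computed. The function names, gene labels, and betweenness values are hypothetical, and ranking nodes by descending weight is an assumption about how the weights would be used for selection.

```python
def node_weights(adj, betweenness):
    """W_x = D_x * B_x for every node, following Eq. (6)."""
    # Degree D_x: number of edges incident to node x (row sum of the adjacency matrix).
    degrees = [sum(row) for row in adj]
    return [d * b for d, b in zip(degrees, betweenness)]


def rank_nodes(weights, names):
    """Pair each node name with its weight, sorted from highest to lowest weight."""
    return sorted(zip(names, weights), key=lambda item: item[1], reverse=True)


# Hypothetical path network g0 - g1 - g2 - g3 with precomputed betweenness values.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
betweenness = [0.0, 2.0 / 3.0, 2.0 / 3.0, 0.0]
names = ["g0", "g1", "g2", "g3"]

weights = node_weights(adj, betweenness)
ranking = rank_nodes(weights, names)
```

A node with zero betweenness receives zero weight regardless of its degree, which is one consequence of combining the two measures by multiplication.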
The degree values are greater than 0, and the betweenness values are less than 1. Considering these value types, and based on extensive experimental tests, multiplication better reflects the effect of the two characteristics. It also combines the local and global features of a node. A detailed schematic diagram of the overall process flow is shown in Fig. 1; the detailed procedure is described below.