iOPTICSGSO for identifying protein complexes from dynamic PPI networks
BMC Medical Genomics volume 10, Article number: 80 (2017)
Abstract
Background
Identifying protein complexes plays an important role in understanding cellular organization and functional mechanisms. Because much evidence indicates that dense subnetworks in a dynamic protein-protein interaction network (DPIN) usually correspond to protein complexes, the identification of protein complexes can be formulated as a density-based clustering problem.
Methods
In this paper, we develop a new approach named iOPTICSGSO, an improved Ordering Points to Identify the Clustering Structure (OPTICS) algorithm that uses the glowworm swarm optimization (GSO) algorithm to optimize the parameters of OPTICS when finding dense subnetworks. In iOPTICSGSO, the concept of a core node is redefined, the Euclidean distance in OPTICS is replaced with an improved similarity between nodes in the PPI network based on their interaction strength, and dense subnetworks are considered to be protein complexes.
Results
The experimental results show that iOPTICSGSO outperforms algorithms such as DBSCAN, CFinder, MCODE, CMC, COACH, ClusterOne, MCL and OPTICS_PSO in terms of f-measure and p-value on four DPINs, which are constructed from the DIP, Krogan, MIPS and Gavin datasets. In addition, our predicted protein complexes have small p-values and thus are highly likely to be true protein complexes.
Conclusion
The proposed iOPTICSGSO obtains optimized clustering results by adopting the GSO algorithm to optimize the parameters in OPTICS, and the results on four datasets show superior performance. Moreover, the results provide clues for biologists to verify and find new protein complexes.
Background
Proteins are indispensable components of all types of cells and tissues and are the executors of biological functions. A protein in the cell does not act in isolation, and every life process involves more than one protein [1]. Protein complexes are not only the basis of normal biological processes but also play an important role in pathological processes [2]. Therefore, identifying protein complexes plays an important role in understanding cellular organization and functional mechanisms [3]. As a variety of protein interaction databases have been produced, it has become possible to identify protein complexes from protein-protein interaction (PPI) networks. Living organisms are always changing, and so are the PPIs in living cells [4]. The interactions between proteins change over time not only with the presence and degradation of proteins but also with the environment. In [5], the authors incorporated the “time” factor for proteins, in the form of cell-cycle phases, into the analysis of complexes and studied the dynamic assembly and disassembly of complexes across cell cycles. To express these dynamics, many kinds of dynamic data, including gene expression profiles [6], have been used to construct dynamic PPI networks (DPINs).
The discovery of protein complexes is equivalent to finding subsets of function-related proteins in a dataset. Clustering is an effective method for finding subsets that share common attributes [7]. Therefore, the development of improved clustering algorithms has received a lot of attention in the last few years. Density-based clustering is an important type of cluster analysis, and one of its main advantages is that it can detect clusters of any shape while being insensitive to noise [8]. The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm [9], proposed by Ester et al., is a density-based clustering algorithm. DBSCAN is applicable to datasets of any shape and size. It is noise-tolerant and independent of the ordering of the data objects. However, it has two initial parameters, the neighborhood radius and the minimum number of points within that radius. DBSCAN requires the user to input these two parameters manually, and the clustering results are very sensitive to their values. To overcome these shortcomings of DBSCAN, Ankerst et al. [10] proposed a new algorithm called Ordering Points to Identify the Clustering Structure (OPTICS). Its basic idea for identifying clusters is similar to that of DBSCAN: both search for high-density regions.
In real life, many optimization problems require finding not only an extremum but also the parameter values at which it is attained. This kind of problem poses a serious challenge to traditional algorithms. Consequently, a growing number of swarm intelligence algorithms have been put forward, such as the Genetic Algorithm (GA) [11] and Particle Swarm Optimization (PSO) [12]. The glowworm swarm optimization (GSO) algorithm [13], proposed by Krishnanand and Ghose in 2005, is a bionic swarm intelligence algorithm. GSO simulates a group of glowworms whose fluorescence attracts other glowworms as they move or forage: the greater the fluorescein value of a glowworm, the brighter and more attractive it is.
The OPTICS algorithm does not produce clusters for a dataset explicitly; instead, it creates an augmented ordering queue that represents the density-based clustering structure. The cluster-ordering must then be processed to obtain the clustering results. For each network, different parameter settings produce different results. In this study, we put forward an algorithm named iOPTICSGSO, an improved OPTICS algorithm that uses GSO to optimize the parameters in OPTICS. To investigate its performance, iOPTICSGSO is compared with eight other computational methods: DBSCAN [9], CFinder [14], MCODE [15], CMC [16], COACH [17], ClusterOne [18], MCL [19] and OPTICS_PSO [20]. We also use the p-value for functional enrichment analysis. The experimental results illustrate that iOPTICSGSO achieves better performance than the competing algorithms.
The outline of this paper is as follows. The Methods section reviews the GSO algorithm and then presents basic OPTICS and our iOPTICSGSO. The Results and discussion section describes and discusses the experimental results, and the Conclusions section concludes the paper.
Methods
GSO algorithm
In the GSO algorithm, glowworms with higher fluorescein are more attractive to other glowworms, and thus a group of glowworms moves towards the glowworms with high fluorescein. Within its dynamic decision domain radius, each glowworm identifies the glowworms whose fluorescein values are higher than its own. It then selects one of them according to a probability and moves towards it. Finally, the dynamic decision domain is updated. The GSO algorithm has two important phases, as follows.
The phase for updating the fluorescein.
The fluorescein value of each glowworm depends on its fluorescein value in the previous generation and on the current value of the fitness function. Let x_{i}(t) represent the location of the i-th glowworm in the t-th generation, and let J(x_{i}(t)) represent the fitness function of the i-th glowworm in the t-th generation. The fluorescein value l_{i}(t) of the i-th glowworm in the t-th generation is calculated as follows:
where ρ and γ are two parameters with the values between 0 and 1.
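As an illustration of this phase, the following minimal sketch assumes the standard GSO update rule of Krishnanand and Ghose, l_{i}(t) = (1 − ρ) l_{i}(t − 1) + γ J(x_{i}(t)). The function name and the default values of ρ and γ are hypothetical and are not taken from this paper.

```python
def update_fluorescein(prev_l, fitness, rho=0.4, gamma=0.6):
    """Assumed standard GSO rule: l_i(t) = (1 - rho) * l_i(t-1) + gamma * J(x_i(t))."""
    if not (0.0 < rho < 1.0 and 0.0 < gamma < 1.0):
        raise ValueError("rho and gamma are expected to lie between 0 and 1")
    # Old fluorescein decays by the factor (1 - rho); the current fitness adds to it.
    return (1.0 - rho) * prev_l + gamma * fitness


# Three glowworms with the same previous fluorescein but different fitness values.
levels = [update_fluorescein(5.0, j) for j in (0.2, 0.5, 0.9)]
```

Under this rule, a glowworm with a better fitness value ends up brighter than an otherwise identical glowworm with a worse one.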
The phase of updating the position.
Each new position of the glowworms is a small movement from the original position, which is calculated as follows:
where S is the update step length of the glowworms, S_{0} is the initial step length, and t_{max} is the maximum number of iterations. Here, we adopt a linearly decreasing step length instead of a fixed step length [21] in order to improve the optimization ability of the algorithm when updating the population.
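The position update can be sketched as follows. The exact step schedule is not reproduced here, so the sketch assumes a step length that shrinks linearly from S_{0} to 0 over t_{max} iterations, and the usual GSO move along the unit vector towards the selected neighbor. Both function names are hypothetical.

```python
import math


def step_length(t, t_max, s0):
    """Assumed linear schedule: s0 at t = 0, shrinking to 0 at t = t_max."""
    return s0 * (t_max - t) / t_max


def move_towards(x_i, x_j, step):
    """Move glowworm i a distance `step` along the unit vector pointing to j."""
    diff = [b - a for a, b in zip(x_i, x_j)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:  # both glowworms share a position: nothing to do
        return list(x_i)
    return [a + step * d / norm for a, d in zip(x_i, diff)]
```

A decreasing step allows coarse exploration in early iterations and finer adjustments near the end of the run.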
In GSO, each glowworm looks for neighbors within its field of vision and then moves towards a brighter glowworm. The moving direction at each step depends on which neighbor is selected. In addition, the decision domain radius of a glowworm is influenced by the number of glowworms in its neighborhood: when the number of neighbors is too small, the glowworm increases its decision radius in order to find more glowworms; otherwise, it reduces its decision radius. In the end, GSO gathers most of the glowworms at better positions.
OPTICS
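The neighborhood search and the radius adaptation described above might look like the sketch below for one-dimensional positions. It assumes the radius rule commonly used in the GSO literature, r(t + 1) = min(r_s, max(0, r(t) + β (n_t − |N(t)|))). The names `sensor_range`, `beta` and `desired`, and their default values, are illustrative assumptions.

```python
def neighbours(i, positions, fluorescein, radius):
    """Indices of glowworms inside `radius` of i that are brighter than i."""
    return [j for j in range(len(positions))
            if j != i
            and abs(positions[j] - positions[i]) < radius
            and fluorescein[j] > fluorescein[i]]


def update_radius(radius, n_neighbours, sensor_range, beta=0.08, desired=5):
    """Grow the radius when neighbours are scarce, shrink it when they are plentiful."""
    return min(sensor_range, max(0.0, radius + beta * (desired - n_neighbours)))
```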
The key idea of density-based clustering such as OPTICS is that, for each object in a cluster, the neighborhood within a given radius ε has to contain at least a minimum number of objects (MinPts); that is, the cardinality of the neighborhood must reach MinPts. The condition Card(N_{ε}(p)) ≥ MinPts is called the “core object condition”. If this condition holds for an object p, then we call p a “core object”. Other objects can be directly density-reachable only from core objects.
In PPI networks, the node degrees follow a power-law distribution, so we select all nodes as core nodes in order that nodes with a small degree can also be considered. Accordingly, we redefine two definitions as follows.
Definition 1: (Distance_{core} of node p).
Let p be a protein in a PPI network, and let Distance_{MinPts}(p) be the MinPts-th maximum distance from node p to all the other nodes. Then, the core distance of p is defined as follows:
Definition 2: (Distance_{reachability} of node p).
Let nodes p and o be two proteins in a PPI network, and let N(o) be the set of neighbors of node o. Then, Distance_{reachability} is defined as follows:
where d_{op} is the distance from node p to node o. As can be seen above, the reachability distance of a node p with respect to node o cannot be smaller than the core distance of node o. OPTICS thus creates an ordering queue of all nodes and stores the core distance as well as a suitable reachability distance for each node.
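To make the two definitions concrete, the sketch below follows the classic OPTICS convention, in which the core distance is the MinPts-th smallest neighbor distance and the reachability distance is max(core distance of o, d_{op}). The fallback for nodes with fewer than MinPts neighbors is an assumption, motivated by treating every node as a core node, and may differ from the definitions used in this paper.

```python
def core_distance(dist_to_neighbours, min_pts):
    """MinPts-th smallest neighbour distance.

    Assumed fallback: a node with fewer than min_pts neighbours uses its
    largest neighbour distance, so that every node can act as a core node.
    """
    ordered = sorted(dist_to_neighbours)
    if not ordered:
        return float("inf")
    return ordered[min(min_pts, len(ordered)) - 1]


def reachability_distance(core_dist_o, d_op):
    """Reachability of p from o: never smaller than the core distance of o."""
    return max(core_dist_o, d_op)
```

For example, with neighbor distances 0.3, 0.5, 0.64, 0.7 and 0.9 and MinPts = 4, the core distance is 0.7, and a neighbor at distance 0.4 still receives a reachability distance of 0.7.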
The proposed iOPTICSGSO
In this section, we describe how the proposed iOPTICSGSO identifies protein complexes. The following four subsections describe the calculation of the distance between proteins, the clustering of PPI networks, the iOPTICSGSO algorithm, and its time complexity analysis, respectively.

1. Calculating the distance in a PPI network
In a PPI network, we use the similarity between two proteins to measure their distance. The fewer neighbors two proteins share, the lower their similarity and the smaller the probability that they belong to the same protein complex. Conversely, the higher the similarity of two proteins, the more likely they are to belong to the same protein complex [22]. Therefore, the similarity is determined by the number of neighbors that the two nodes share in the PPI network. Consider a PPI network PN with adjacency matrix A, in which the binary vector X_{i} = (A_{i1}, A_{i2}, …, A_{in}) indicates the interactions between protein i and the other proteins. The number of common neighbors (CN) of proteins i and j is calculated as CN_{ij} = |N_{i} ∩ N_{j}|, where N_{i} and N_{j} denote the neighbor sets of proteins i and j, respectively. If CN_{ij} ≠ 0, the similarity between proteins i and j is calculated as follows [23]:
In a PPI network, however, two nodes that have no common neighbor may still be connected, and the standard complexes include multiple protein complexes that contain only two proteins. We therefore redefine the similarity S as follows:
The greater the similarity between two proteins, the smaller the distance between them. The distance is then calculated as follows:
We use D_{ij} to replace the Euclidean distance in OPTICS for measuring the distance between two proteins in a PPI network.
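The following sketch illustrates the idea of a neighbor-based distance on a toy network. It does not reproduce the similarity used in this paper. As a stand-in, it assumes a Jaccard index over closed neighborhoods (each node counted together with its neighbors), so that directly connected pairs without common neighbors still receive a non-zero similarity, and it assumes the conversion D_{ij} = 1 − S_{ij}. All function names and the toy network are hypothetical.

```python
def common_neighbours(adj, i, j):
    """CN_ij = |N_i & N_j| for an adjacency dict {node: set of neighbours}."""
    return len(adj[i] & adj[j])


def similarity(adj, i, j):
    """Stand-in similarity (assumption): Jaccard index of closed neighbourhoods."""
    a, b = adj[i] | {i}, adj[j] | {j}
    return len(a & b) / len(a | b)


def distance(adj, i, j):
    """Assumed conversion from similarity to distance: D_ij = 1 - S_ij."""
    return 1.0 - similarity(adj, i, j)


# Toy network: a triangle A-B-C with a pendant node D, plus an isolated pair E-F.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"},
       "D": {"A"}, "E": {"F"}, "F": {"E"}}
```

In this toy network, the isolated interacting pair E-F has no common neighbor yet obtains distance 0 under the stand-in, which mirrors the motivation for redefining the similarity so that two-protein complexes remain detectable.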

2. Clustering PPI network
Fig. 1 shows a PPI network with the distances between node o and other nodes. In this study, we set MinPts to 4 and, in Fig. 1, first select node o as the core. To obtain the core distance of o, we calculate the distances between core o and all of its neighbors according to Eq. (8). From the definition, we get Distance_{reachability}(d, o) = 0.64. In the same manner, we obtain the sequence of values for all nodes.
The algorithm keeps track of all reachability distance values so that expensive distance computations need not be repeated. We obtain an augmented ordering queue from OPTICS and convert it into a reachability-plot. Fig. 2 shows such a reachability-plot and an example of a cluster. Each dent in Fig. 2a can be viewed as a cluster; that is, a new cluster starts at a steep downward region and ends at the next steep downward region. As a result, the algorithm can find all clusters from the reachability-plot.
For example, in Fig. 2b we can see a cluster starting at object #1 and ending at object #15. Note that object #1, which is the last object with a high reachability value, is part of the cluster; its high reachability indicates only that it is far away from the previous cluster. It has to be close to object #2, however, because object #3 has a low reachability value, indicating that it is close to one of the objects #1 or #2. Because of the way OPTICS chooses the next object in the cluster-ordering, it has to be close to #2 (if it were close to object #1 it would have been assigned index 1 and not index 2). A similar argument holds for object #15, which is the last object with a low reachability value and is therefore also a member of the cluster.
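Reading clusters off a reachability-plot can be sketched as below. This is a simplified extraction that cuts the plot at a single height `eps`; it is an illustrative assumption and not necessarily the procedure used in this paper. An object whose reachability exceeds `eps` opens a new cluster and belongs to it, in line with the remark above about the last object with a high reachability value.

```python
def extract_clusters(ordering, reachability, eps):
    """Split an OPTICS cluster-ordering by cutting the reachability-plot at eps."""
    clusters, current = [], []
    for node, reach in zip(ordering, reachability):
        if reach > eps and current:
            # A high reachability closes the running cluster ...
            clusters.append(current)
            current = []
        # ... and the object carrying it becomes the first member of the next one.
        current.append(node)
    if current:
        clusters.append(current)
    return clusters


order = ["a", "b", "c", "d", "e", "f", "g"]
reach = [float("inf"), 0.2, 0.3, 0.9, 0.25, 0.2, 0.8]
groups = extract_clusters(order, reach, eps=0.5)
# groups == [["a", "b", "c"], ["d", "e", "f"], ["g"]]
```

Groups containing a single node, such as the last one here, would typically be discarded as noise in a later filtering step.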

3. iOPTICSGSO algorithm
Although the OPTICS algorithm can find all clusters, a dynamic PPI network consists of more than one subnetwork, and the sizes and topological structures of these subnetworks are quite different. For example, when we apply OPTICS to a dynamic PPI network with 12 subnetworks, 12 reachability-plots are obtained, each different from the others. The optimal parameters and the corresponding performance for each subnetwork are shown in Table 1. It is evident that each subnetwork has its own optimal parameters and that the performance of the clustering results differs. It can also be seen that OPTICS with global density parameters is not suitable for datasets with different densities.
The GSO algorithm has few parameters, is simple to operate and has good stability. It simulates the way glowworms in nature communicate through their glow: by comparing fluorescein values, the swarm solves the optimization problem. We therefore introduce the GSO algorithm to optimize the parameters of OPTICS in order to obtain optimal results. Algorithm 2 describes the details of iOPTICSGSO. Over a number of iterations, each glowworm repeatedly updates its position and approaches the best position until it finds it.
The correspondence between GSO and OPTICS is shown in Fig. 3. When we adopt the GSO algorithm to optimize the parameter ε in OPTICS, the position of a glowworm in GSO corresponds to a value of ε. A glowworm that updates its dynamic decision domain radius and moves its position thus corresponds to a search for the optimal value of ε. When the fitness function reaches its maximum value in GSO after a number of position updates, OPTICS finds the best clustering result.
In the iOPTICSGSO algorithm, the fluorescein values, decision domain radii and positions of the glowworms are first initialized. Second, the GSO algorithm is used to optimize the parameter ε in OPTICS. In this step, each position of a glowworm is one parameter value, and OPTICS is run with this value. For each value (position), a corresponding clustering result is obtained and its clustering performance is evaluated. Next, the fluorescein values are updated and the glowworms move accordingly. After the iterations, the new positions of the glowworms are found, and the position with the maximum fitness value is selected as the optimal position.
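The overall loop can be sketched as a one-dimensional glowworm swarm that searches for ε. This is a minimal sketch under several assumptions: the standard GSO update rules, an evenly spaced initial population, a linearly decreasing step, and a caller-supplied `fitness` function that would, in the setting described above, run OPTICS with the given ε and return the clustering quality. The function name `gso_optimise` and every default value are hypothetical.

```python
import random


def gso_optimise(fitness, lo, hi, pop_size=11, max_iter=30, rho=0.4, gamma=0.6,
                 s0=0.05, beta=0.08, desired=3, seed=0):
    """Maximise fitness(eps) for eps in [lo, hi] with a 1-D glowworm swarm."""
    rng = random.Random(seed)
    # Glowworm positions are candidate eps values, spread evenly over the range.
    pos = [lo + (hi - lo) * k / (pop_size - 1) for k in range(pop_size)]
    lum = [0.0] * pop_size
    sensor = hi - lo
    radius = [sensor / 4.0] * pop_size
    best_eps, best_fit = pos[0], fitness(pos[0])
    for t in range(max_iter):
        fits = [fitness(p) for p in pos]  # one clustering run per glowworm
        for p, f in zip(pos, fits):
            if f > best_fit:
                best_eps, best_fit = p, f
        # Fluorescein update (assumed standard rule).
        lum = [(1 - rho) * l + gamma * f for l, f in zip(lum, fits)]
        step = s0 * (max_iter - t) / max_iter  # assumed linear step schedule
        new_pos = list(pos)
        for i in range(pop_size):
            nbrs = [j for j in range(pop_size)
                    if j != i and pos[j] != pos[i]
                    and abs(pos[j] - pos[i]) < radius[i] and lum[j] > lum[i]]
            if nbrs:
                # Brighter neighbours are chosen with probability proportional
                # to their fluorescein advantage.
                weights = [lum[j] - lum[i] for j in nbrs]
                j = rng.choices(nbrs, weights=weights)[0]
                direction = 1.0 if pos[j] > pos[i] else -1.0
                new_pos[i] = min(hi, max(lo, pos[i] + step * direction))
            radius[i] = min(sensor, max(0.0, radius[i] + beta * (desired - len(nbrs))))
        pos = new_pos
    return best_eps, best_fit
```

The sketch returns the best position evaluated during the run rather than the final position of any single glowworm, which keeps the result from degrading if the swarm drifts in late iterations.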

4. Time complexity analysis of iOPTICSGSO algorithm
The time complexity is used to estimate the efficiency of the iOPTICSGSO algorithm. Let maxiter be the maximum number of iterations of the outer loop in iOPTICSGSO, num be the number of proteins in a subnetwork and PopSize be the number of glowworms. The time complexity is analyzed below:

The time complexity of the OPTICS algorithm is O(num^{2}).

The time complexity of computing the fitness of the glowworms is O(PopSize * num^{2}).

The time complexity of the glowworm moving process is O(PopSize^{2}).

The time complexity of updating the position is O(PopSize).
In summary, the time complexity of iOPTICSGSO is O(maxiter * (num^{2} + PopSize * num^{2} + PopSize^{2} + PopSize)), which simplifies to O(maxiter * PopSize * num^{2}).
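The final simplification can be spelled out as follows, under the assumption (not stated explicitly above) that PopSize ≥ 1 and PopSize ≤ num^{2}, so that the fitness term PopSize * num^{2} dominates the other three terms:

```latex
\begin{aligned}
T &= O\bigl(maxiter \cdot (num^{2} + PopSize \cdot num^{2} + PopSize^{2} + PopSize)\bigr) \\
  &\le O\bigl(maxiter \cdot 4 \cdot PopSize \cdot num^{2}\bigr)
     \quad \text{since } num^{2},\; PopSize^{2},\; PopSize \;\le\; PopSize \cdot num^{2} \\
  &= O\bigl(maxiter \cdot PopSize \cdot num^{2}\bigr).
\end{aligned}
```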
Results and discussion
Experimental datasets
In this study, we used four static yeast PPI networks, DIP [24], Krogan [25], MIPS [26] and Gavin [27], to evaluate the proposed iOPTICSGSO. The DIP data consist of 4995 proteins and 21,554 interactions, the Krogan data of 2674 proteins and 7075 interactions, the MIPS data of 4546 proteins and 12,319 interactions, and the Gavin data of 1430 proteins and 6531 interactions. To verify the protein complexes identified by our method, the set of protein complexes derived from CYC2008 [28] is used as the gold standard dataset, which includes 408 protein complexes and covers 1492 proteins.
In this study, we construct DPINs in a manner similar to Ref. [29] by integrating gene expression profiles. The gene expression data were obtained from GEO (Gene Expression Omnibus) [30] under accession number GSE3431. The data contain 9336 genes at 36 time points across 3 cell cycles. The DPINs are constructed from the static PPI networks and the gene expression data, and we use the three-sigma principle to judge whether a gene is expressed at a particular timestamp. Specifically, we preset a threshold; if the expression value of a protein is greater than the threshold at a timestamp t, the protein is judged to be active at timestamp t. Each subnetwork consists of these active proteins and the interactions between them, and the subnetworks together form the DPIN. As a result, we obtain four DPINs from DIP, Krogan, MIPS and Gavin, respectively. Table 2 shows the scales of the subnetworks derived from these four static PPI networks.
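The construction of the subnetworks can be sketched as follows. The threshold formula is not given above, so this sketch uses a hypothetical per-gene threshold of the mean plus k standard deviations of the expression profile as a simplified stand-in for the three-sigma rule. The function names, the parameter k and the toy data are all illustrative assumptions.

```python
from statistics import mean, pstdev


def active_threshold(profile, k=0.5):
    """Hypothetical per-gene threshold: mean + k * standard deviation."""
    return mean(profile) + k * pstdev(profile)


def build_dpin(edges, expression, k=0.5):
    """Return one edge list per timestamp, keeping edges whose two proteins are both active."""
    n_times = len(next(iter(expression.values())))
    thresholds = {g: active_threshold(p, k) for g, p in expression.items()}
    snapshots = []
    for t in range(n_times):
        active = {g for g, p in expression.items() if p[t] > thresholds[g]}
        snapshots.append([(u, v) for u, v in edges if u in active and v in active])
    return snapshots


# Toy data: three proteins, four timestamps, two static interactions.
expression = {"A": [1, 1, 9, 9], "B": [1, 9, 9, 1], "C": [9, 1, 1, 1]}
edges = [("A", "B"), ("B", "C")]
dpin = build_dpin(edges, expression)
# Only at the third timestamp are both A and B active, so dpin[2] == [("A", "B")].
```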
Performance evaluation
To evaluate the clustering results, we adopt three commonly used statistical metrics: precision, recall and f-measure [31]. Precision measures how well the protein complexes identified by an algorithm match the known protein complexes in the standard dataset, and recall measures how well the known protein complexes are matched by the identified protein complexes. The f-measure evaluates the closeness between the known and the identified protein complexes. Precision, recall and f-measure are calculated as follows:
where X is the set of proteins in the identified protein complexes and F is the set of known complexes in the standard dataset; pc is the number of proteins in the identified protein complex and kc is the number of proteins in the known protein complex. The overlapping score (OS) evaluates how many proteins in the true protein complexes can be recovered by the identified protein complexes [32, 33]. Usually, an identified protein complex is considered to match a known protein complex when the OS is equal to or larger than 0.2 [5]. We also use the p-value to evaluate the statistical and biological significance of the identified protein complexes [34]. In detail, given k proteins in a true protein complex C with a biological function shared by an identified protein complex F from a total set V of proteins, the p-value is defined as:
which is the probability that an identified protein complex is enriched by a true protein complex purely by chance [35]. A low p-value of an identified protein complex means that the co-occurrence of these proteins in the same complex is unlikely to be due to chance and is therefore statistically significant. That is, the lower the p-value of a protein complex, the stronger its biological significance, while a protein complex with a p-value greater than 0.01 is considered insignificant. In the experiments, the p-value was calculated on biological process ontologies.
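The evaluation measures can be sketched as below. The formulas are the ones commonly used in the protein complex literature and are assumptions here: OS(pc, kc) = |pc ∩ kc|^{2} / (|pc| |kc|), precision and recall as the fractions of matched identified and known complexes, the f-measure as their harmonic mean, and the p-value as a hypergeometric upper tail. All function and argument names are hypothetical.

```python
from math import comb


def overlap_score(pc, kc):
    """Assumed OS(pc, kc) = |pc & kc|^2 / (|pc| * |kc|) for two sets of proteins."""
    return len(pc & kc) ** 2 / (len(pc) * len(kc))


def evaluate(predicted, known, threshold=0.2):
    """Precision, recall and f-measure under the OS >= threshold matching rule."""
    hit_pred = sum(any(overlap_score(p, k) >= threshold for k in known) for p in predicted)
    hit_known = sum(any(overlap_score(p, k) >= threshold for p in predicted) for k in known)
    precision = hit_pred / len(predicted)
    recall = hit_known / len(known)
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def enrichment_p_value(total, annotated, cluster_size, shared):
    """Hypergeometric tail: chance of at least `shared` annotated proteins in the cluster."""
    upper = min(annotated, cluster_size)
    tail = sum(comb(annotated, i) * comb(total - annotated, cluster_size - i)
               for i in range(shared, upper + 1))
    return tail / comb(total, cluster_size)


predicted = [{"a", "b", "c"}, {"x", "y"}]
known = [{"a", "b", "c", "d"}, {"m", "n"}]
scores = evaluate(predicted, known)  # (0.5, 0.5, 0.5)
```

In the toy example, only the first predicted complex matches a known complex (OS = 0.75), so precision, recall and f-measure are all 0.5.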
The effect of parameter
In the iOPTICSGSO algorithm, one parameter must be preset, namely the value of MinPts. Given the topological properties of PPI networks, if the value of MinPts is too large, no meaningful cluster can be identified by the algorithm. For example, when we set MinPts to 10, no meaningful cluster can be identified from the DPIN. On the contrary, if the value of MinPts is too small, too many proteins fall into the same cluster and few protein complexes are identified. In this study, the value of MinPts is set according to Fig. 4 for the four datasets. The x-axis represents the values of MinPts, which range from 2 to 8, and the y-axis represents the values of f-measure. Each value of the parameter corresponds to a value of f-measure, and the set of values forms the line chart shown in Fig. 4. The blue line represents the result on DIP data, the orange line represents the result on Krogan data, the green line represents the result on MIPS data, and the yellow line represents the result on Gavin data.
In Fig. 4, the effect of different values of MinPts on the f-measure is small, which confirms that the reachability-plot is rather insensitive to this input parameter. We observe that the f-measure initially increases as MinPts increases and decreases after reaching its maximum. We therefore choose the value of MinPts at which the f-measure reaches its maximum in iOPTICSGSO. As a result, the optimal values of MinPts are 3, 2, 2 and 4 for DIP, Krogan, MIPS and Gavin, respectively.
Clustering comparisons
To validate its performance directly, iOPTICSGSO is compared with eight competing algorithms: DBSCAN [9], CFinder [14], MCODE [15], CMC [16], COACH [17], ClusterOne [18], MCL [19] and OPTICS_PSO [20]. iOPTICSGSO is also compared with the basic OPTICS. All comparisons are carried out on the DIP, Krogan, MIPS and Gavin datasets. Each algorithm is run with its best parameters; we found that these algorithms obtain their best results under the default parameter settings. The performance of all clustering algorithms is reported in Table 3, which contains the category of each algorithm, the number of identified protein complexes, and the average size of the protein complexes.
From Table 3, we can see that the numbers of clusters obtained by the proposed algorithm on the four datasets are smaller than those of the compared methods. The reason is that the interactions in most subnetworks are sparse, so the distance between such nodes calculated by Eq. (7) can be as large as 1, and each of these nodes is regarded as a class of its own. In the final phase, we filtered the results from each subnetwork clustering and deleted clustering modules whose density was small or that contained only one node.
Fig. 5 depicts the precision, recall and f-measure of each algorithm on the four datasets. From Fig. 5, we can see that the proposed algorithm obtains higher precision and f-measure than the other competing algorithms. By combining OPTICS with the GSO algorithm, iOPTICSGSO produces clustering results based on the optimal parameters and therefore performs much better than the OPTICS algorithm. This can be seen clearly from the last green and blue columns in Fig. 5.
To evaluate the biological significance and functional enrichment of the complexes identified by our algorithm, we calculated the p-value of the identified protein complexes on Biological Process ontologies for the four datasets by using SGD’s GO Term Finder tool (http://www.yeastgenome.org/cgibin/GO/goTermFinder.pl). We also calculated the p-values of the protein complexes of size greater than or equal to 3 identified by six other algorithms: COACH, MCL, MCODE, ClusterOne, OPTICS and OPTICS_PSO. The comparison results are shown in Table 4. From Table 4, it is clear that the proposed algorithm achieves the better performance on the DIP, Krogan, MIPS and Gavin data, while MCL and ClusterOne perform poorly on the four datasets. Few of the protein complexes identified by iOPTICSGSO are insignificant. On the Krogan data in particular, all protein complexes identified by iOPTICSGSO are significant. In detail, on the DIP, Krogan and Gavin data, the percentages of complexes with p-value < E-15 among the complexes predicted by iOPTICSGSO were the highest, accounting for 8.70%, 12.22% and 26.25%, respectively. On the MIPS data, the percentage of complexes with p-value < E-15 among the protein complexes identified by iOPTICSGSO was the highest, accounting for 20.00%. Compared with OPTICS_PSO, the percentage of significant complexes identified by iOPTICSGSO was higher on the DIP and Krogan data, and on the MIPS and Gavin data the percentage of complexes with p-value < E-10 identified by iOPTICSGSO was higher. In general, the statistical results in Table 4 indicate that the complexes identified by iOPTICSGSO are more biologically meaningful than those identified by the other algorithms.
Table 5 lists some protein complexes identified in the Gavin data. These protein complexes are not well matched with the benchmark dataset (their OS values are low), but all have low p-values for GO terms. The p-values of these protein complexes are calculated on Molecular Function. In each row, the proteins in bold match a known protein complex in the benchmark dataset, and the additional proteins probably share similar functions with the other proteins. For example, in the first predicted protein complex, 5 proteins do not match the known protein complex, but 4 of them (namely YNL248C, YJR063W, YOR340C and YIL021W) share a similar annotation, DNA-directed 5′-3′ RNA polymerase activity, with the true protein complex. This protein complex is visualized in Fig. 6. Fig. 6a shows the interactions among the 16 proteins, and Fig. 6b shows the common GO slim terms between every two proteins. The interactions in (a) are clearly far fewer than the links in (b). This shows that even proteins with no interaction between them can still share a GO slim term, which means that they are likely to carry out some function together as a complex. Given the incompleteness of the known protein complex set, predicted protein complexes that have low OS values but small p-values are highly likely to be true protein complexes. Therefore, the results provide clues for biologists to verify and find new protein complexes.
Conclusions
Protein complexes are not only the basis of normal biological processes but also play an important role in pathological processes. Therefore, identifying protein complexes plays an important role in understanding cellular organization and functional mechanisms. In this study, we have put forward an algorithm named iOPTICSGSO, an improved OPTICS algorithm that uses GSO to optimize the parameter in OPTICS. We changed the concept of the core node and redefined the similarity so that it better reflects the actual situation of PPI networks. Because different parameter settings give different results on each subnetwork of a DPIN, we used the GSO algorithm to optimize these parameters, and finally checked the quality of every cluster to obtain the optimal clustering results. The experimental results show that iOPTICSGSO outperforms the competing algorithms in terms of f-measure and p-value, which means that its results are more biologically meaningful for identifying significant protein complexes. However, we also found that the number of clustering modules produced by iOPTICSGSO is relatively small and that the recall of its clustering results is lower than that of other algorithms. The reason may be that each protein can belong to only one cluster in iOPTICSGSO, which leaves the other clustering modules small. Therefore, our future work will focus on finding an effective strategy to improve the results and detect more protein complexes.
References
Gavin AC, Bösche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick M, Michon AM, Cruciat CM, Remor M, Höfert C, Schelder M, Brajenovic M, Ruffner H, Merino A, Klein K, Hudak M, Dickson D, Rudi T, Gnau V, Bauch A, Bastuck S, Huhse B, Leutwein C, Heurtier MA, Copley RR, Edelmann A, Querfurth E, Rybin V, Drewes G, Raida M, Bouwmeester T, Bork P, Seraphin B, Kuster B, Neubauer G, Superti-Furga G. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002;415(6868):141–7.
Kazemipour A, Goliaei B, Pezeshk H. Protein complex discovery by interaction filtering from protein interaction networks using mutual rank Coexpression and sequence similarity. Biomed Res Int. 2015;2015. Article ID 165186:1–7.
Lage K, Karlberg EO, Størling ZM, Ólason PÍ, Pedersen AG, Rigina O, Hinsby AM, Tümer Z, Pociot F, Tommerup N, Moreau Y, Brunak S. A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat Biotechnol. 2007;25(3):309–16.
Yang ZH, Yu FY, Lin HF, Wang J. Integrating PPI datasets with the PPI data from biomedical literature for protein complex detection. BMC Med Genet. 2014;7(2):S3.
Srihari S, Leong HW. Temporal dynamics of protein complexes in PPI networks: a case study using yeast cell cycle dynamics. BMC Bioinform. 2012;13(17):824–34.
Li M, Zheng RQ, Zhang HH, Wang JX, Pan Y. Effective identification of essential proteins based on priori knowledge, network topology and gene expressions. Methods. 2014;67:325–33.
Kaufman L, Rousseeuw PJ. Finding Groups in Data: An Introduction to Cluster Analysis. [M] DBLP, 1990.
Pilevar AH, Sukumar M. A grid-clustering algorithm for high-dimensional very large spatial data bases. Pattern Recogn Lett. 2005;26(7):999–1010.
Ester M, Kriegel HP, Sander J, Xu XW. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd international conference on knowledge discovery and data mining. Menlo Park: The AAAI Press; 1996. p. 226–31.
Ankerst M, Breunig M, Kriegel H, Sander J. OPTICS: ordering points to identify the clustering structure. ACM SIGMOD Rec. 1999;28(2):49–60.
Holland JH. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. Quarterly Review of Biology. 1975;6(2):126–137.
Kennedy J, Eberhart R. Particle swarm optimization. In: Proceeding of the IEEE international conference on neural networks; 1995. p. 1942–8.
Krishnanand KN, Ghose D. Detection of multiple source locations using a glowworm metaphor with applications to collective robotics. Pasadena: IEEE Swarm Intelligence Symposium; 2005. p. 84–91.
Adamcsek B, Palla G, Farkas IJ, Derényi I, Vicsek T. CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics. 2006;22(8):1021–3.
Bader GD, Hogue CW. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinform. 2003;4:1–27.
Liu G, Wong L, Chua H. Complex discovery from weighted PPI networks. Bioinformatics. 2009;25(15):1891–7.
Wu M, Li X, Kwoh C, Ng SK. A core-attachment based method to detect protein complexes in PPI networks. BMC Bioinform. 2009;10(1):1–16.
Nepusz T, Yu H, Paccanaro A. Detecting overlapping protein complexes in protein-protein interaction networks. Nat Methods. 2012;9(5):471–2.
Dongen BSV. Graph clustering by flow simulation. Dissertation for doctoral degree, Center for Math and Computer Science (CWI). Utrecht: University of Utrecht; 2000.
Lei XJ, Li H, Wu FX. Detecting protein complexes from DPINs by OPTICS based on particle swarm optimization. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine. Shenzhen, China; 2016. p. 1814–21.
Shi BY, Eberhart R. A modified particle swarm optimizer. In: Proceedings of the IEEE Congress on Evolutionary Computation. Anchorage: IEEE; 1998. p. 303–8.
Yedidia J, Freeman WT, Weiss Y. Understanding belief propagation and its generalizations. Int Joint Conf Artif Intell (IJCAI). 2001;54(1):276–86.
Letovsky S, Kasif S. Predicting protein function from protein-protein interaction data: a probabilistic approach. BMC Bioinform. 2003;19(6):197–204.
Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, Eisenberg D. DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 2002;30(1):303–5.
Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006;440(7084):637–43.
Güldener U, Münsterkötter M, Oesterheld M, Pagel P, Ruepp A, Mewes HW, Stümpflen V. MPact: the MIPS protein interaction resource on yeast. Nucleic Acids Res. 2006;34:D436–41.
Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dümpelfeld B, Edelmann A, Heurtier MA, Hoffman V, Hoefert C, Klein K, Hudak M, Michon AM, Schelder M, Schirle M, Remor M, Rudi T, Hooper S, Bauer A, Bouwmeester T, Casari G, Drewes G, Neubauer G, Rick JM, Kuster B, Bork P, Russell RB, Furga GS. Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006;440(7084):631–6.
Pu S, Wong J, Turner B, Cho E, Wodak SJ. Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res. 2009;37(3):825–31.
Lei XJ, Wang F, Wu FX, Zhang AD, Pedrycz W. Protein complex identification through Markov clustering with firefly algorithm on dynamic protein-protein interaction networks. Inf Sci. 2016;329:303–16.
Tu BP, Kudlicki A, Rowicka M, McKnight SL. Logic of the yeast metabolic cycle: temporal compartmentalization of cellular processes. Science. 2005;310:1152–8.
Zhang AD. Protein interaction networks: computational analysis. New York: Cambridge University Press; 2009.
Brohée S, Helden JV. Evaluation of clustering algorithms for protein–protein interaction network. BMC Bioinform. 2006;7(1):1–19.
Friedel CC, Krumsiek J, Zimmer R. Bootstrapping the interactome: unsupervised identification of protein complexes in yeast. In: Vingron M, Wong L, editors. Proceedings of the 12th annual conference on research in computational molecular biology (RECOMB); 2008. p. 3–16.
Sadeque A, Serão NV, Southey BR, Delfino KR, Rodriguez-Zas SL. Identification and characterization of alternative exon usage linked glioblastoma multiforme survival. BMC Med Genet. 2012;5(1):59.
Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S. Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinformatics. 2006;7:207–19.
Acknowledgements
We are grateful for the support of the National Natural Science Foundation of China and for the experimental facilities provided by our college. In particular, we thank our laboratory members for their useful discussions and comments.
Funding
This work was supported by the National Natural Science Foundation of China (61672334, 61502290, and 61401263).
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
About this supplement
This article has been published as part of BMC Medical Genomics Volume 10 Supplement 5, 2017: Selected articles from the IEEE BIBM International Conference on Bioinformatics & Biomedicine (BIBM) 2016: medical genomics. The full contents of the supplement are available online at https://bmcmedgenomics.biomedcentral.com/articles/supplements/volume10supplement5.
Author information
Contributions
X.L. conceived the study and guided the design of the method and the algorithm. H.L. designed and performed the experiments and analyzed the data. X.L. and H.L. drafted the manuscript. A.ZH. and F.X.W. revised the manuscript and polished the English. All authors read and approved the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Lei, X., Li, H., Zhang, A. et al. iOPTICSGSO for identifying protein complexes from dynamic PPI networks. BMC Med Genomics 10 (Suppl 5), 80 (2017). https://doi.org/10.1186/s129200170314x
DOI: https://doi.org/10.1186/s129200170314x
Keywords
 Ordering points to identify the clustering structure algorithm (OPTICS)
 Glowworm swarm optimization algorithm (GSO)
 Protein complex
 Density-based clustering