Integrative subspace clustering by common and specific decomposition for applications on cancer subtype identification

Background: Recent high-throughput technologies have been applied to collect heterogeneous biomedical omics datasets. Computational analysis of multi-omics datasets could potentially reveal deep insights for a given disease. Most existing methods for clustering multi-omics data assume strong consistency among the different sources of data, and thus may lose efficacy when the consistency is relatively weak. Furthermore, they cannot identify the conflicting parts of each view, which might be important in applications such as cancer subtype identification.
Methods: In this work, we propose an integrative subspace clustering method (ISC) by common and specific decomposition to identify clustering structures with multi-omics datasets. The main idea of our ISC method is that the original representations of the samples in each view can be reconstructed by the concatenation of a common part and a view-specific part lying in orthogonal subspaces. The problem can be formulated as a matrix decomposition problem and solved efficiently by our proposed algorithm.
Results: The experiments on simulation and text datasets show that our method outperforms other state-of-the-art methods. Our method is further evaluated by identifying cancer types using a colorectal dataset. We finally apply our method to cancer subtype identification for five cancers using TCGA datasets, and the survival analysis shows that the subtypes we found are significantly better than those of the compared methods.
Conclusion: We conclude that our ISC model can not only discover the weak common information across views but also identify the view-specific information.


Background
*Correspondence: liminli@mail.xjtu.edu.cn. School of Mathematics and Statistics, Xi'an Jiaotong University, Xianning West 28, Xi'an, China
With the advancement of biological technologies, many kinds of data are available, such as genomic DNA copy number arrays, DNA methylation, exome sequencing, messenger RNA arrays, microRNA sequencing and reverse-phase protein arrays. By analyzing the multiple data types generated from cancer patients, it is now possible to classify cancer patients into different subgroups, and thus improve diagnosis and treatment. For example, breast cancer is one of the most common cancers worldwide, and it is clinically categorized into four basic therapeutic subgroups: (1) the Luminal A, oestrogen receptor (ER) positive group; (2) the Luminal B, oestrogen receptor (ER) positive group; (3) the HER2-amplified group; (4) triple-negative breast cancers (TNBCs, also called basal-like, lacking expression of ER, progesterone receptor (PR) and HER2). The ER-positive group (including Luminal A and B) is the most common and diverse, and several genomic tests can be used to predict outcomes for ER+ patients receiving endocrine therapy. Treatment of the HER2-amplified subtype has had great success due to the effective therapeutic targeting of HER2. The basal-like breast cancers, often with BRCA1 mutations or of African ancestry, have chemotherapy as the only option. Therefore, subtype identification for breast cancers can surely assist the treatment of patients.
Most molecular studies of subtype identification for breast cancer integrate genomic, epigenomic, and transcriptomic profiling including mRNA expression profiling, miRNA expression, DNA methylation and DNA copy number analysis, and so on. It is assumed in these studies that integrative clustering of multi-omics data can capture clearer structure that can not be discovered by only exploring a single omic data. In fact, in many other applications, a single object often can be represented by multiple features or views. For example, an image can be represented by its pixels and its captions, an Internet webpage can be represented by its text contents and the hyperlinks to other webpages, and a scientific publication can be represented by its text contents and its citations. In all these applications, multi-view clustering takes information from all views into account such that better clustering structures could be discovered.
The difficulty in multi-view learning mainly lies in that the similarity measurement, geometric distribution, clustering structure, noise level and so on are often diverse across views. Samples represented in different views may have their own clustering structures, or subspaces in which they lie. These differences hamper clustering significantly, and it is challenging to efficiently reconcile the conflicting information among views.
Most existing multi-view clustering approaches follow three directions. The first class of methods [1][2][3][4][5][6][7] attempts to determine new representations by minimizing the differences or maximizing the correlations between different views. The second class of approaches propagates information across views to construct graphs or similarities in slightly different ways, including multi-view EM [8], multi-view spectral clustering [9,10], multi-view clustering with unsupervised feature selection [11,12], nonnegative matrix factorization [13], pattern fusion [14], and similarity network fusion (SNF) [4]. For example, SNF [4] fuses multiple networks into one network by iteratively updating a sequence of nonnegative status matrices. The third class of methods aims to learn an optimal linear combination of multiple kernels or similarities [15][16][17][18][19][20]. For example, optimized kernel k-means [16] obtains the optimal linear combination of multiple kernels and the cluster assignment matrix simultaneously by minimizing a trace clustering loss.
However, almost all existing methods assume strong consistency among the different views or omics, and thus capture the clustering structure using the hidden shared information. This may be problematic when the different views share only a relatively weak common clustering structure. For instance, different views may have different noise levels. Furthermore, different views may have conflicting clustering structures, or a single view may have a clustering structure different from all the others. All of these factors may make it difficult to identify the shared information among views. A biological example is that analyses of different omics for glioblastoma multiforme (GBM), an aggressive adult brain tumor, obtain different results. One work [21], based on expression and copy-number-variant data, identifies two subtypes, which is inconsistent with the results in [22], which identifies four subtypes primarily by expression data alone. Therefore, when the consistent information is weaker than the conflicting information, which is highly likely in subtype identification, it is challenging to discover the hidden clustering structures. A natural idea to overcome this challenge is to decompose the information in each view into a part shared across all views and a view-specific part. A kernel-based method [23] is developed following this idea, which attempts to construct a consensus kernel using multi-omics data. However, it focuses more on the common part and ignores the view-specific clustering structure. Furthermore, the semi-definite programming required for its optimization problem is computationally complex.
In this work, we propose a novel integrative subspace clustering method that assumes the common structure information across views may be weak. The main idea is to find a specific subspace for each view, so that the new representation of each sample in each view in this subspace is a concatenation of two vectors: a common representation shared among all views, and a specific representation for this view. This ensures that the common parts and the specific parts lie in two orthogonal subspaces for each view. Furthermore, the representations of the common part are expected to be independent of those of each specific part, where the dependence is measured by the Hilbert-Schmidt Independence Criterion (HSIC). Our main contributions in this work are summarized as follows.
1. We propose a novel subspace learning model to discover the common and specific representations for each sample, especially for the case when the common information is relatively weaker than the specific information. We propose an algorithm to solve the corresponding optimization problem efficiently. 2. We test our method on simulation datasets, multi-view text datasets and cancer type identification, and it works the best in most cases. In particular, our model works even when the common information across views is very weak.
3. We apply the proposed clustering method to subtype identification, assuming that the subtype information may also come from the view-specific part of a single omics dataset. We apply our approach to identify subtypes for five cancers using TCGA datasets. The survival analysis of the clustering results shows that our method works the best in most cases.

Methods
In this section, we present the proposed integrative subspace clustering method based on multi-view matrix decomposition. We first give a problem statement, and then propose a subspace learning method by multi-view matrix decomposition. We then introduce the Hilbert-Schmidt Independence Criterion, and finally propose our integrative subspace clustering model ISC and the corresponding optimization algorithm.

Problem statement
Suppose we are given n samples with V views, where the samples in view v are represented by a data matrix X_v ∈ R^{p_v×n}, v = 1, …, V. The aim is to cluster the n samples with a given cluster number based on the integrative information from the V views. In cancer subtype identification, the views can be different data sources, omics or platforms.

Subspace learning for common and specific decomposition
We consider the samples X_v ∈ R^{p_v×n} from view v to be approximately lying in a d-dimensional subspace of R^{p_v} (d < p_v), which is spanned by the columns of an orthonormal matrix P_v ∈ R^{p_v×d}, P_v^T P_v = I_d. This means that

x_i^v ≈ P_v z_i^v, i = 1, …, n. (1)

We assume that the samples X_v from view v have both common and specific clustering structures, which means that z_i^v can be further represented as

z_i^v = [c_i; s_i^v],

where c_i ∈ R^{d_0} is the common representation of x_i across all views, and s_i^v ∈ R^{d_v} is the specific representation of x_i in the v-th view. Note that d = d_0 + d_v. In other words, x_i^v can be approximately represented as

x_i^v ≈ P_v^c c_i + P_v^s s_i^v,

where P_v = [P_v^c, P_v^s], and the column spaces of P_v^c and P_v^s are orthogonal subspaces to each other. We can rewrite the above equations in matrix form as

X_v = P_v [C; S_v] + E_v, (2)

where C = [c_1, …, c_n] ∈ R^{d_0×n}, S_v = [s_1^v, …, s_n^v] ∈ R^{d_v×n}, and E_v is the error matrix for view v. We demonstrate the decomposition idea in Fig. 1. For each view, we attempt to find the two orthogonal subspaces spanned by P_v^c and P_v^s. Hopefully, the common clustering structure is hidden in C, and the specific clustering structure for view v is hidden in S_v.
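The decomposition above can be illustrated numerically. The following minimal numpy sketch (all matrices random, all dimensions hypothetical) builds an orthonormal basis P_v = [P_v^c, P_v^s], reconstructs X_v from a common part C and a view-specific part S_v, and shows that orthogonality lets each part be recovered by projection:

```python
import numpy as np

rng = np.random.default_rng(0)
p_v, n, d0, dv = 8, 50, 2, 2          # hypothetical dimensions
d = d0 + dv

# Orthonormal basis for the view subspace, split into common / specific blocks.
P_v, _ = np.linalg.qr(rng.standard_normal((p_v, d)))
P_c, P_s = P_v[:, :d0], P_v[:, d0:]   # column spaces of P_c and P_s are orthogonal

C = rng.standard_normal((d0, n))      # common representation, shared by all views
S_v = rng.standard_normal((dv, n))    # view-specific representation

# Reconstruction: X_v = P_c C + P_s S_v = P_v [C; S_v] (noise-free case, E_v = 0)
X_v = P_c @ C + P_s @ S_v
Z_v = np.vstack([C, S_v])
assert np.allclose(X_v, P_v @ Z_v)

# Orthonormality of P_v lets each part be recovered by projection.
assert np.allclose(P_c.T @ X_v, C)
assert np.allclose(P_s.T @ X_v, S_v)
```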

Hilbert-Schmidt Independence criterion (HSIC)
To better decompose each view into a common and a view-specific part, such that each view-specific clustering structure in S_v is independent of the common part C across all views, a measure of independence is required. We measure independence using the Hilbert-Schmidt Independence Criterion (HSIC), a measure of statistical independence [24]. Intuitively, HSIC can be considered a squared correlation coefficient between two random variables c and s computed in feature spaces F and G.
Let c and s be two random variables from the domains C and S, respectively. Let F and G be feature spaces on C and S with associated kernels k_c : C × C → R and k_s : S × S → R, respectively. Denote the joint probability distribution of c and s by p_(c,s), and let (c, s) and (c', s') be drawn according to p_(c,s). Then the Hilbert-Schmidt Independence Criterion can be computed in terms of kernel functions via

HSIC(p_(c,s), F, G) = E_{c,s,c',s'}[k_c(c, c') k_s(s, s')] + E_{c,c'}[k_c(c, c')] E_{s,s'}[k_s(s, s')] − 2 E_{c,s}[E_{c'}[k_c(c, c')] E_{s'}[k_s(s, s')]],

where E is the expectation operator. The empirical estimator of HSIC for a finite sample of points C and S from c and s with p_(c,s) was given in [24] to be

HSIC((C, S), F, G) ∝ tr(K_c H K_s H),

where tr is the trace operator of a matrix, H = I_n − ee^T/n is the centering matrix (e is a column vector of all ones of proper dimension), and K_c, K_s ∈ R^{n×n} are kernel matrices. The smaller the HSIC value, the more likely C and S are independent of each other.

Fig. 1 Demonstration of the main idea for the common and specific decomposition in our ISC model. a shows the plots for X_1 and X_2, respectively. b shows how the original X_v is decomposed into two parts C and S_v in two subspaces. c shows the plots for the reconstructed Z_v, respectively. Note that the two axes of Z_v represent the two subspaces. We can see that in the two subspaces, the samples are clustered in different ways.
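With linear kernels K_c = C^T C and K_s = S^T S, the empirical estimator is simply tr(K_c H K_s H). A minimal numpy sketch (random data, all values hypothetical) showing that a part strongly dependent on C yields a much larger HSIC value than an independent one:

```python
import numpy as np

def empirical_hsic(C, S):
    """Empirical HSIC (up to a constant) with linear kernels:
    tr(K_c H K_s H), where K_c = C^T C, K_s = S^T S, and H centers."""
    n = C.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n
    K_c, K_s = C.T @ C, S.T @ S
    return np.trace(K_c @ H @ K_s @ H)

rng = np.random.default_rng(0)
C = rng.standard_normal((2, 200))
S_dep = 2.0 * C + 0.01 * rng.standard_normal((2, 200))  # strongly dependent on C
S_ind = rng.standard_normal((2, 200))                   # independent of C

# With linear kernels, tr(K_c H K_s H) = ||S H C^T||_F^2 >= 0, and the
# dependent part gives a far larger value than the independent one.
print(empirical_hsic(C, S_dep) > empirical_hsic(C, S_ind))  # True
```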

Integrative subspace clustering (ISC) model
Based on the above considerations, we propose our integrative subspace clustering model as follows:

min_{C, {S_v}, {P_v}} Σ_{v=1}^{V} ||X_v − P_v [C; S_v]||_F^2 + β Σ_{v=1}^{V} tr(C^T C H S_v^T S_v H), s.t. P_v^T P_v = I_d, v = 1, …, V, (3)

where C^T C and S_v^T S_v are the linear kernels of C and S_v, respectively, and β is a parameter. Note that the first term is the decomposition term, which tries to find the orthogonal subspaces in which the corresponding common and view-specific representations lie, and the second, independence term minimizes the dependence between the common part and the view-specific parts. We use the linear kernels of C and S_v to simplify the computation. After C and the S_v's for all views are obtained, k-means clustering is applied to cluster the samples represented by C and by each S_v, respectively. The clustering results using the common part C and the specific parts S_v are called ISC-C, ISC-S1, ISC-S2, …, respectively.
Based on the resulting C and the S_v's, we define a consensus score (C-score), similar to [23], which measures the weight of the consensus part in the v-th view. The C-score ranges from 0 to 1, and a higher C-score implies stronger consistent information in the corresponding view.

Optimization algorithm
We propose an alternative updating approach to solve the optimization problem (3).
Step 1. We first fix the P_v's and C in (3), and solve for the optimal S_1, …, S_V one by one. The v-th optimization subproblem can be written as

min_{S_v} ||X_v − P_v [C; S_v]||_F^2 + β tr(C^T C H S_v^T S_v H). (5)

Since P_v can be represented as P_v = [P_v^c, P_v^s], the subproblem (5) for S_v can be simplified to

min_{S_v} f(S_v) = ||X_v − P_v^c C − P_v^s S_v||_F^2 + β tr(C^T C H S_v^T S_v H). (6)

By setting the derivative of the objective function f(S_v) in (6) with respect to S_v to zero, we obtain

S_v + β S_v H C^T C H = (P_v^s)^T (X_v − P_v^c C). (7)

The matrix equation for S_v in (7) is a standard Sylvester equation and can be solved efficiently using the method in [25].
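Since the update for S_v is a Sylvester equation of the form A S_v + S_v B = Q, it can be handed to any generic Sylvester solver. A sketch using scipy's `solve_sylvester` (a stand-in for the method of [25]), assuming the update takes the form S_v + β S_v (H C^T C H) = (P_v^s)^T (X_v − P_v^c C) obtained by differentiating (6); all dimensions and data are hypothetical:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(0)
p, n, d0, dv, beta = 8, 50, 2, 2, 0.1        # hypothetical sizes
P, _ = np.linalg.qr(rng.standard_normal((p, d0 + dv)))
P_c, P_s = P[:, :d0], P[:, d0:]
X = rng.standard_normal((p, n))
C = rng.standard_normal((d0, n))

H = np.eye(n) - np.ones((n, n)) / n
A = np.eye(dv)                   # left coefficient of S_v (identity here)
B = beta * (H @ C.T @ C @ H)     # right coefficient of S_v
Q = P_s.T @ (X - P_c @ C)        # right-hand side

S_v = solve_sylvester(A, B, Q)   # solves A S_v + S_v B = Q
assert np.allclose(A @ S_v + S_v @ B, Q)
```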
Step 2. We then fix C, S_1, …, S_V, and solve the optimization problem (3) for the optimal P_1, …, P_V one by one. The corresponding v-th optimization subproblem can be written as

min_{P_v} ||X_v − P_v Z_v||_F^2, s.t. P_v^T P_v = I_d, (8)

where Z_v = [C; S_v]. The optimization problem (8) is a least squares problem on the Grassmann manifold, and is solved by Algorithm 2 in [26].
Step 3. We fix P_1, …, P_V and S_1, …, S_V, and then solve the optimization problem (3) for C. The corresponding subproblem can be written as

min_C Σ_{v=1}^{V} ||X_v − P_v^c C − P_v^s S_v||_F^2 + β Σ_{v=1}^{V} tr(C^T C H S_v^T S_v H). (9)

Similarly, we set the derivative of the objective function of subproblem (9) with respect to C to zero, and obtain

V C + β C Σ_{v=1}^{V} H S_v^T S_v H = Σ_{v=1}^{V} (P_v^c)^T (X_v − P_v^s S_v). (10)

The matrix equation for C in (10) is also a standard Sylvester equation, and the same algorithm used for solving (7) can be applied.
The overall algorithm for solving (3) is shown in the algorithm box ISC. In each iteration, we solve three subproblems to alternately update S_v, P_v and C. Since the objective function of the ISC model in (3) has a lower bound of zero, and the objective value decreases at each step when solving the three subproblems, the convergence of the objective values in our algorithm is assured. We also experimentally show the convergence of the objective values on four text datasets in Fig. 2, which further confirms the analysis above.
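For illustration, the alternating scheme can be sketched compactly in numpy. This is a sketch rather than the paper's implementation: the P_v update uses the closed-form orthogonal Procrustes solution instead of the Grassmann algorithm of [26], the two Sylvester systems are solved directly since their left coefficients are multiples of the identity, and the dimensions and random initialization are hypothetical:

```python
import numpy as np

def isc(Xs, d0, dv, beta=0.1, iters=30, seed=0):
    """Alternating updates for the ISC objective
    sum_v ||X_v - P_v [C; S_v]||_F^2 + beta * sum_v tr(C^T C H S_v^T S_v H)."""
    rng = np.random.default_rng(seed)
    V, n = len(Xs), Xs[0].shape[1]
    H = np.eye(n) - np.ones((n, n)) / n
    C = rng.standard_normal((d0, n))
    Ss = [rng.standard_normal((dv, n)) for _ in Xs]
    Ps = [np.linalg.qr(rng.standard_normal((X.shape[0], d0 + dv)))[0] for X in Xs]

    def objective():
        val = 0.0
        for X, P, S in zip(Xs, Ps, Ss):
            val += np.linalg.norm(X - P @ np.vstack([C, S])) ** 2
            val += beta * np.trace(C.T @ C @ H @ S.T @ S @ H)
        return val

    history = [objective()]
    for _ in range(iters):
        M = H @ C.T @ C @ H
        for v, X in enumerate(Xs):
            Pc, Psv = Ps[v][:, :d0], Ps[v][:, d0:]
            # S_v update: S_v (I + beta*M) = Psv^T (X - Pc C)
            rhs = Psv.T @ (X - Pc @ C)
            Ss[v] = np.linalg.solve((np.eye(n) + beta * M).T, rhs.T).T
            # P_v update: orthogonal Procrustes for min ||X - P Z||, P^T P = I
            Z = np.vstack([C, Ss[v]])
            U, _, Vt = np.linalg.svd(X @ Z.T, full_matrices=False)
            Ps[v] = U @ Vt
        # C update: V*C + beta*C*sum_v(H S_v^T S_v H) = sum_v Pc^T (X - Psv S_v)
        Msum = sum(H @ S.T @ S @ H for S in Ss)
        R = sum(Ps[v][:, :d0].T @ (Xs[v] - Ps[v][:, d0:] @ Ss[v]) for v in range(V))
        C = np.linalg.solve((V * np.eye(n) + beta * Msum).T, R.T).T
        history.append(objective())
    return C, Ss, Ps, history
```

Because each block update exactly minimizes its subproblem, the objective values in `history` are non-increasing; k-means on the returned C and S_v's then yields the ISC-C and ISC-Sv clusterings.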

Comparative methods
We compare our ISC model with the following comparative methods.
Algorithm ISC
1. Input: X_1, …, X_V, parameter β, dimensions d_0 and d_v
2. Initialize P_v, C and S_v for v = 1, …, V
3. repeat until convergence
4.   for v = 1, …, V
5.     Fix the others and update S_v by solving Eq. (7)
6.     Fix the others and update P_v by solving Eq. (8)
7.   end
8.   Fix the others and update C by solving Eq. (10)
9. end

• Co-regularized spectral clustering (Coreg) [3]. The Coreg method extends single-view spectral clustering by adding a co-regularization term that forces the low-dimensional embeddings from multiple views to be close.
• Similarity network fusion (SNF) [4]. The SNF method integrates the sample similarity networks constructed from each data type into a single similarity network by a nonlinear combination approach. The converged network can then be used to cluster multi-view datasets.
• Enhanced consensus multi-view clustering model (ECMC) [23]. The ECMC method attempts to find the consensus kernels of multiple views by dividing the kernel of each view into a consensus kernel and a disagreement kernel. The method can achieve relatively good clustering even when the correlation between views is weak.

Measurements of clustering performance
We use the following three measurements to evaluate the clustering results when the ground truth clustering is given.
• Normalized mutual information (NMI). The normalized mutual information of a clustering result C = {C_k} is defined as

NMI(C, C*) = Σ_{i,j} p(C_i, C*_j) log [p(C_i, C*_j) / (p(C_i) p(C*_j))] / max(H(C), H(C*)),

where C* = {C*_l} is the ground truth clustering, p(C_k) := |C_k|/n, p(C_i, C*_j) is the joint probability of the two classes C_i and C*_j, and H(·) denotes the entropy of a clustering.
• Average clustering accuracy (ACC). With the clustering labels {l_j} of C in a suitable ordering that matches the ground truth labels {l*_j} of C*, the average clustering accuracy is defined as

ACC = (1/n) Σ_{j=1}^{n} δ(l_j, l*_j),

where δ(a, b) = 1 if a = b and 0 otherwise.
• Adjusted Rand index (ARI). For a computed cluster C_i and a ground truth cluster C*_j, let n_i. = |C_i|, n_.j = |C*_j|, and n_ij = |C_i ∩ C*_j|. The adjusted Rand index is defined as

ARI = [Σ_{ij} C(n_ij, 2) − Σ_i C(n_i., 2) Σ_j C(n_.j, 2) / C(n, 2)] / [½ (Σ_i C(n_i., 2) + Σ_j C(n_.j, 2)) − Σ_i C(n_i., 2) Σ_j C(n_.j, 2) / C(n, 2)],

where C(·, ·) is the combination (binomial coefficient) operator. The range of ARI is from -1 to 1. A larger value of ARI means that the clustering result is more consistent with the ground truth clustering.
• Silhouette score (S-score) [27]. When the ground truth clustering is unknown, the above criteria cannot be computed, and thus the silhouette score, defined as follows, can be used:

S-score = (1/n) Σ_{i=1}^{n} (b_i − a_i) / max(a_i, b_i),

where a_i is the average Euclidean distance from sample i to the other samples within the same cluster as sample i, and b_i is the minimum over the other clusters of the average Euclidean distance from sample i to all samples in that cluster. The range of the silhouette score is from -1 to 1. The larger the silhouette score, the better the clustering structure.
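The ARI and silhouette definitions above translate directly into code. A minimal numpy sketch (variable names hypothetical; the silhouette assumes every cluster has at least two samples):

```python
import numpy as np
from math import comb

def adjusted_rand_index(labels, truth):
    """ARI from the contingency table, using C(n, 2) combination counts."""
    labels, truth = np.asarray(labels), np.asarray(truth)
    n = len(labels)
    ks, ls = np.unique(labels), np.unique(truth)
    # contingency counts n_ij, row sums n_i., column sums n_.j
    table = np.array([[np.sum((labels == k) & (truth == l)) for l in ls] for k in ks])
    sum_ij = sum(comb(int(x), 2) for x in table.ravel())
    sum_i = sum(comb(int(x), 2) for x in table.sum(axis=1))
    sum_j = sum(comb(int(x), 2) for x in table.sum(axis=0))
    expected = sum_i * sum_j / comb(n, 2)
    max_index = (sum_i + sum_j) / 2
    return (sum_ij - expected) / (max_index - expected)

def silhouette_score(X, labels):
    """Average silhouette: s_i = (b_i - a_i) / max(a_i, b_i)."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # pairwise distances
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        a = D[i, same].mean()                    # mean intra-cluster distance
        b = min(D[i, labels == c].mean()         # nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# A perfect clustering (up to relabeling) has ARI 1.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
# Two well-separated clusters give a silhouette close to 1.
X = np.array([[0., 0.], [0.1, 0.], [10., 10.], [10.1, 10.]])
print(silhouette_score(X, np.array([0, 0, 1, 1])))
```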

Simulation experiments
In this section, we use synthetic datasets to evaluate our ISC model. The synthetic datasets are generated in the following way. We first sample 200 two-dimensional points from two Gaussian distributions with means μ_1 and μ_2 = [3, −10] and a common covariance matrix Σ = [10 0; 0 6], and thus obtain a matrix Y ∈ R^{2×200}. By adding white noise to Y, we get two data matrices Y_1 ∈ R^{2×200} and Y_2 ∈ R^{2×200}, which can be considered the common parts of the two views. We then construct two specific matrices T_1 and T_2 by randomly permuting the columns of Y_1 and Y_2, respectively. Finally, we randomly construct two matrices P_v ∈ R^{8×4} and construct the two-view matrices X_v = P_v [Y_v; t T_v], where t is a parameter that controls the degree of inconsistency between the views. Note that the ground truth clustering labels for the common part and the two specific parts are all known, denoted y, y_1 and y_2, respectively. We construct 10 corresponding datasets by taking t ∈ {0.1, 0.9, 1, 2, 5, 6, 10, 15, 20, 30}. We report the consensus scores for the two views on the simulation datasets in Table 1. From the table, we can see that simulation datasets with small t have high consensus scores and those with large t have low consensus scores.

We first compare the three clustering results obtained by our method and show their performance as t changes. We apply our ISC model to compute the corresponding common part C and the specific parts S_1 and S_2. k-means clustering is then applied to C, S_1 and S_2, and three corresponding clustering results, ISC-C, ISC-S1 and ISC-S2, are obtained. Since the k-means method may be sensitive to the initialization, we run k-means 100 times and report the average of the results. We choose the parameter β from {0, 1e−6, 1e−5, …, 1e+5, 1e+6}. We report the average silhouette scores for the three clustering results in Table 1, and the average NMIs, ACCs and ARIs obtained by our ISC method and the other comparison partners on the simulation datasets in Table 2.
As we can see, the clustering result of ISC-C achieves a higher silhouette score than the clustering results of ISC-S1 and ISC-S2 for every t, which indicates that the common part may have the better clustering structure in the simulation datasets. We also compute the NMI, ACC and ARI by comparing the three clustering results with the ground truth labels y, y_1 and y_2, respectively. The average values are reported in Table 2. We make two observations from the results. First, ISC-C performs perfectly as t changes, and the results of ISC-S1 and ISC-S2 get better as t increases. This means that ISC-C can always capture the common structure even when the consistency is very weak, and ISC-S1 and ISC-S2 capture the specific structures better as the consistency gets weaker. Second, ISC-C achieves higher NMI, ACC and ARI values than ISC-S1 and ISC-S2, which is consistent with the results obtained by the silhouette scores. This implies that silhouette scores may be used to select the best clustering result.
We then compare our ISC-C clustering result with the comparison methods, which all assume strong consistency across views except ECMC, by computing the NMI, ACC and ARI of each method. The average values of all the methods are reported in Table 2. When t is relatively small, almost all the methods perform well. When the degree of inconsistency increases with t, our ISC-C method outperforms the other methods. That is because, when the consistency signal is very weak, the existing methods can no longer capture the common clustering structure, but ISC-C can still discover it. We also plot the clustering results for all multi-view methods with t = 0.1 and t = 10 in Fig. 3. In the figure, since the common result of the SNF method is in the form of a kernel, we present all the data in the form of kernels. Specifically, for the simulation datasets, the linear kernels of X_v, Y_v and T_v are denoted K_v, K_v^c and K_v^s, respectively. We can see that in Fig. 3a, where t is small and the consensus score is high, all methods discover the latent common clustering structure with high accuracy. However, in Fig. 3b, where t is large and the consensus score is low, all baseline methods fail to discover the best clustering structure, but our ISC-C method can still capture the common structure across views. This further shows the power of our method even when the common information is very weak.
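The simulation recipe above can be sketched as follows. The value of μ_1, the noise level, and the mixing X_v = P_v [Y_v; t T_v] are assumptions where the original description is ambiguous:

```python
import numpy as np

rng = np.random.default_rng(0)
n_per, t = 100, 5.0
mu1 = np.array([-3., 10.])                    # mu1 is hypothetical
mu2 = np.array([3., -10.])
cov = np.array([[10., 0.], [0., 6.]])

# Common 2-D structure: two Gaussian clusters, 100 samples each -> Y is 2 x 200.
Y = np.hstack([rng.multivariate_normal(mu1, cov, n_per).T,
               rng.multivariate_normal(mu2, cov, n_per).T])
y = np.repeat([0, 1], n_per)                  # common ground-truth labels

# Add white noise to get the common part of each view (noise level assumed).
Y1 = Y + 0.1 * rng.standard_normal(Y.shape)
Y2 = Y + 0.1 * rng.standard_normal(Y.shape)

# Specific parts: column permutations destroy the shared cluster alignment.
perm1, perm2 = rng.permutation(2 * n_per), rng.permutation(2 * n_per)
T1, T2 = Y1[:, perm1], Y2[:, perm2]
y1, y2 = y[perm1], y[perm2]                   # view-specific labels

# Mix common and specific parts; t controls the degree of inconsistency.
P1, P2 = rng.standard_normal((8, 4)), rng.standard_normal((8, 4))
X1 = P1 @ np.vstack([Y1, t * T1])             # 8 x 200, view 1
X2 = P2 @ np.vstack([Y2, t * T2])             # 8 x 200, view 2
```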

Experiments on multi-view text datasets
In this section, we evaluate our ISC method on multi-view text datasets. Since only the ground truth labels for the common part are known, we compare the ISC-C results with the other methods. In one of the corpora, the papers are categorized into classes including probabilistic methods, reinforcement learning, rule learning, and theory. There are 2,708 papers in the entire corpus. The dataset consists of two views. One view is represented by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The other view is the citation relationship between each publication and the other publications. (In Table 3, the highest NMIs, ACCs and ARIs are marked in bold.)
By using the ISC model, we obtain the common part C and then apply k-means clustering to C. We compare the results of ISC-C with the other methods, and the results are shown in Table 3. We can see from the table that our ISC model works the best in most cases.

Identifying cancer types by colorectal cancer dataset
Tumors may not be diagnosed pathologically, and thus it is meaningful to determine whether a patient's specific symptoms indicate colon cancer or colorectal cancer. We further evaluate our method by distinguishing colon cancer and colorectal cancer on a colorectal cancer dataset [28], which consists of exome sequences, DNA copy number, promoter methylation, and messenger RNA and microRNA expression for 276 patients. We select three types of expression data: DNA methylation, mRNA expression and miRNA expression. Specifically, the DNA methylation profiles are obtained with Illumina Infinium HumanMethylation27 arrays, the mRNA expression profiles are generated with Agilent microarrays, and miRNA quantification is via Illumina sequencing. After screening, we obtain 85 cancer patients with colon cancer or colorectal cancer.
We apply our ISC model to identify the cancer type (colon cancer or colorectal cancer) for these patients with two or three views, and obtain the corresponding common part C and three specific parts S1, S2 and S3. Since we assume that the cancer type or subtype structure may be specifically shown in a single omics, we check the clustering results for both the common and specific parts to see whether they capture the clustering information for cancer types. Since the ground truth for cancer types is known, we can also calculate the NMI, ACC and ARI using the common part (ISC-C) and the specific parts (ISC-S1, ISC-S2, ISC-S3). The results are reported in Table 4. Our method performs better than the baseline methods in most of the cases. Overall, our method ISC-C with the common part of DNA methylation and miRNA expression data performs the best among all the obtained clustering results. While SNF works the best for miRNA and mRNA expression, our ISC method with the specific part of DNA methylation (ISC-S1) works the best among all methods on the view combinations involving DNA methylation. This may imply that DNA methylation plays an important role in the identification of the cancer type, and confirms our hypothesis that information about the type of cancer may be hidden in a particular omics.

Applications on cancer subtype identification using TCGA datasets
We finally apply our ISC model to data from The Cancer Genome Atlas (TCGA) Research Network [29] to identify subtypes for five cancers. TCGA is currently the largest database of cancer genetic information, and includes 33 types of cancer, among them 10 rare cancer types. In addition, each cancer dataset in the database contains gene expression data, miRNA expression data, copy number variation, DNA methylation, SNP data, etc., and has sufficient clinical data.

Data sets
The datasets for five cancers from TCGA were collected by Wang et al. [4]. The datasets cover five cancer types: glioblastoma multiforme (GBM), kidney renal clear cell carcinoma (KRCCC), breast invasive carcinoma (BIC), colon adenocarcinoma (COAD) and lung squamous cell carcinoma (LSCC). There are three types of expression data (DNA methylation, mRNA expression and miRNA expression), as well as clinical information including survival data for the patients. Since we do not have ground truth labels for the subtypes of these datasets, survival analysis is mainly used to evaluate our model. For each of the five datasets, we apply the ISC model to compute the common part and the specific parts, and then apply k-means to obtain the clustering results. The procedure for obtaining the cancer subtypes is the same as for the colorectal cancer dataset. The numbers of subtypes are chosen as 3, 3, 4, 3 and 4 for GBM, KRCCC, BIC, COAD and LSCC [4], respectively. We also report the consensus scores for the three views of the five cancers in Table 5. As we can see, the consensus scores for the first two views are both very low. This implies that the consistent information across views is relatively weak compared to the inconsistent information, and thus the traditional multi-view methods may not work.

Survival analysis
Since survival times in months are given for each sample in the TCGA datasets, we apply the log-rank test to measure whether the different subtypes obtained by clustering are meaningful. The log-rank test is a commonly used non-parametric test for comparing survival processes, and can be used to test whether two or more sets of survival curves are identical. In general, the smaller the p-value obtained, the more different the survival curves of the two or more groups are. (In Table 7, the highest silhouette scores are marked in bold.)
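A two-group log-rank test as described above can be sketched in a few lines. This is a simple reimplementation for illustration (not the statistical package used in the paper), with the usual per-event-time hypergeometric variance:

```python
import numpy as np
from scipy.stats import chi2

def logrank_test(time, event, group):
    """Two-group log-rank test; returns the p-value.
    time: survival times; event: 1 = death observed, 0 = censored; group: 0 or 1."""
    time, event, group = map(np.asarray, (time, event, group))
    obs1 = 0.0   # observed events in group 1
    exp1 = 0.0   # expected events in group 1 under H0
    var = 0.0
    for tt in np.unique(time[event == 1]):
        at_risk = time >= tt
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = ((time == tt) & (event == 1)).sum()
        d1 = ((time == tt) & (event == 1) & (group == 1)).sum()
        obs1 += d1
        exp1 += d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = (obs1 - exp1) ** 2 / var
    return float(chi2.sf(stat, df=1))

# Completely separated survival times give a small p-value.
t = [1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
g = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(logrank_test(t, [1] * 10, g))
```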
The log-rank p-values for all the methods are reported in Table 6. We can see from the table that, for four cancers (GBM, BIC, KRCCC and LSCC), our ISC method obtains the most significant p-values. For COAD, our method with ISC-S2 obtains a p-value similarly good to that of the ECMC method. Furthermore, the subtypes for GBM and KRCCC found by the common part across the three views obtain the most significant p-values, the BIC subtypes found by miRNA expression are the most significant, and the subtypes for LSCC found by DNA methylation are the most significant. We also report the silhouette scores for the clustering results of ISC-C, ISC-S1, ISC-S2 and ISC-S3 in Table 7. By comparing Tables 6 and 7, for four of the five datasets (all except GBM), the best clustering result with the best Cox p-value among our four clustering results corresponds to the highest silhouette score. This implies that our selection scheme for the clustering results is effective in this application.
We also plot the Kaplan-Meier survival curves for the ISC clustering results with the most significant p-values for all five cancer types. Figure 4 shows the curves for GBM, BIC, COAD and LSCC (with the corresponding p-values in Table 6), and Fig. 5 shows the curves obtained by SNF. We can see that the survival curves from our ISC method are more significantly separated than those obtained by the other compared methods.

Subtype visualization
We further analyze the breast cancer subtypes obtained by our ISC model with S3, since S3 by miRNA expression generates the most significantly different survival profiles across subtypes. Figure 6 shows the visualization of the four breast cancer subtypes identified by the specific part of miRNA (S3). It can be seen that, with this clustering, the samples in the other two views (mRNA expression and DNA methylation) are not separated, and some subtypes are even very similar. However, the characteristics of miRNA expression for the four subtypes appear significantly different. This implies that the best subtypes identified by ISC-S3 are specifically shown by miRNA expression, but not by the other views.

Drug treatment analysis on cancer subtypes
We finally validate the obtained subtypes by comparing the survival profiles of different treatment groups within each subtype. We choose two drug treatments, Cytoxan and Adriamycin, for breast cancer, and the drug treatment temozolomide for GBM. For each subtype, we check whether the survival profiles are significantly different between the treated and the untreated patients. The Cox p-values for all three treatments in all subtypes are reported in Table 8. Interestingly, we can see that for breast cancer, the patients in Subtype 2 are sensitive to the two drug treatments Cytoxan and Adriamycin. The Kaplan-Meier survival curves of these two treatment groups are shown in Fig. 7. In Subtype 1 of GBM, the patients treated with temozolomide have significantly different survival profiles from the untreated patients in this subtype; the Kaplan-Meier survival curves of the GBM patients in Subtype 1 are shown in Fig. 8. These results further validate that the subtypes we found are biologically meaningful.

Discussion on breast subtypes
We further discuss the subtypes we found for breast cancer. Breast cancer is a heterogeneous and polygenic disease, and one of the most common malignancies in women. Based on histological and genomic features, breast cancer can be roughly separated into four subtypes (luminal A, luminal B, HER2-amplified, and basal-like) [30].
To date, researchers have reported many genes related to subtypes of breast cancer. We first collect genes associated with these subtypes, and then check the matching between our resulting four subtypes and the four known subtypes. BUB1, CDCA4, CHEK1, FOXM1 and HDAC2 are probably key genes of the basal-like subtype, because alterations in these genes are deletion events in basal cancers: the basal-like enriched subgroup harbours chromosome 5q deletions affecting several signaling molecules, transcription factors and cell division genes [31]. Besides, the basal-like subtype may also correlate with the gene EGFR, which is supported by the fact that alterations of EGFR, p53 and PTEN are cooperative and likely play an important role in basal-like breast cancer pathogenesis [32]. For the luminal B subtype, PPP2R2A is an associated gene due to the dysregulation of specific PPP2R2A functions in luminal B breast cancers [31]. The genes ZNF703 and DHRS2 are likely to correlate with luminal B, since [33] suggests that ZNF703 is a luminal-B-specific driver and that tumors with elevated ZNF703 levels are characterized by alterations in a lipid metabolism and detoxification pathway that includes DHRS2 as a key signaling component. For the HER2 subtype, [34] confirms that agents targeting GAB2 or GAB2-dependent pathways may be useful for treating breast tumors that overexpress HER2, and thus we include GAB2 as a correlated gene for HER2-type breast cancer. Besides, trastuzumab blocks the HER2-HER3 (ERBB3) interaction and is used to treat breast cancers with HER2 overexpression, although some of these cancers develop trastuzumab resistance. Using small interfering RNA (siRNA) to identify genes involved in trastuzumab resistance, [35] identified several kinases and phosphatases that were upregulated in trastuzumab-resistant cancers, including PPM1H. This suggests that PPM1H and ERBB3 may have some link with HER2-type breast cancer.
For each subtype computed by our ISC algorithm, we first calculate t-test p-values for each of these correlated genes to determine whether the gene expression levels are significantly different between the subtype and the other subtypes. We then apply Fisher's combined probability test [36] to compute a group p-value for these genes, which tests whether the group of selected genes as a whole differs significantly between the subtype and the other subtypes. We report the group p-values for each resulting subtype in Table 9 (a subtype with its p-value in boldface may correspond to a true breast cancer subtype). The results show that our computed Subtype 2 very likely corresponds to the basal-like breast cancer subtype, with a group p-value of 3.83e-08. Our computed Subtype 4 may also contain the basal-like breast cancer subtype, with a group p-value of 4.79e-07. Subtype 4 probably corresponds to the HER2 breast cancer subtype, with a group p-value of 4.17e-07, and Subtype 3 is likely to correspond to the luminal B breast cancer subtype.
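Fisher's combined probability test aggregates the k per-gene p-values via the statistic X = −2 Σ ln p_i, which follows a χ² distribution with 2k degrees of freedom under the null. A sketch with hypothetical p-values, cross-checked against scipy's built-in implementation:

```python
import numpy as np
from scipy.stats import chi2, combine_pvalues

# Per-gene t-test p-values for one candidate subtype (hypothetical values).
pvals = [0.03, 0.2, 0.008, 0.6, 0.01]

# Fisher's method: X = -2 * sum(ln p_i) ~ chi2 with 2k degrees of freedom.
stat = -2.0 * np.sum(np.log(pvals))
p_group = chi2.sf(stat, df=2 * len(pvals))

# scipy provides the same computation directly.
stat2, p2 = combine_pvalues(pvals, method='fisher')
assert np.isclose(stat, stat2) and np.isclose(p_group, p2)
print(p_group)
```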

Conclusion
Our goal in this work is to discover common and specific information simultaneously from multiple views when the consistency across views is relatively weak and the specific signal is strong. We propose an integrative subspace clustering method (ISC) based on common and specific decomposition, which finds two orthogonal subspaces for each view.
To better distinguish the common and view-specific parts, we also make the common part and each view-specific part as independent as possible using the HSIC measure. Our simulation experiments, real-world benchmark experiments, cancer type identification with colorectal data, and subtype identification for five cancers with TCGA datasets all show that the ISC model outperforms other state-of-the-art multi-view clustering algorithms. In particular, we find some interesting subtypes of breast cancer and GBM, and the survival analysis shows that these subtypes are biologically meaningful.