- Open Access
An isomiR expression panel based novel breast cancer classification approach using improved mutual information
© The Author(s) 2018
- Published: 31 December 2018
Gene expression-based profiling has been used to identify biomarkers for different breast cancer subtypes. However, this technique has many limitations. IsomiRs are isoforms of miRNAs that have critical roles in many biological processes and have been successfully used to distinguish various cancer types. Biomarker isomiRs for identifying different breast cancer subtypes have not yet been investigated. Here, for the first time, we aim to show that isomiRs are better-performing biomarkers, and we use them to explain molecular differences between breast cancer subtypes.
In this study, a novel method is proposed to identify specific isomiRs that faithfully classify breast cancer subtypes. First, we used a null hypothesis-based method to remove the lowly expressed isomiRs from small RNA sequencing data generated from diverse breast cancer types. Second, we developed an improved mutual information-based feature selection method to calculate the weight of each isomiR's expression. The weight of an isomiR measures its importance in classifying breast cancer subtypes. The improved mutual information can be applied to datasets in which the feature is continuous and the label is discrete, whereas the traditional mutual information cannot. Finally, a support vector machine (SVM) classifier is applied to find isomiR biomarkers for subtyping.
Here we demonstrate that isomiRs can be used as biomarkers in the identification of different breast cancer subtypes and that, in addition, they may provide new insights into the diverse molecular mechanisms of breast cancers. We have also shown that the classification of different subtypes of breast cancer based on isomiR expression is more effective than classification based on published gene expression profiles. The proposed method performs better than the Fisher method and the Hellinger method in discovering biomarkers that distinguish different breast cancer subtypes. This novel technique could be directly applied to identify biomarkers in other diseases.
- Improved mutual information
- Breast cancer subtype
MicroRNAs (miRNAs) are short RNA molecules that play vital regulatory roles in a variety of biological processes. Mature miRNAs are generated from longer transcripts via several sequential processing steps. First, the primary miRNA transcripts (pri-miRNA) are cleaved by the Microprocessor complex, which contains Drosha, an RNase III enzyme. The cleaved precursor miRNAs (pre-miRNA) are further processed by another RNase III enzyme, Dicer, to produce small miRNA duplexes. Alterations in miRNA maturation, such as alternative and imprecise cleavage by Drosha and Dicer, or the turnover of miRNAs, could result in miRNAs that are heterogeneous in length and/or sequence [5, 6]. These variants are called isomiRs (isoforms of miRNA) and can be divided into three main categories: 3′ isomiRs (trimming or addition of one or more nucleotides at the 3′ position), 5′ isomiRs (trimming or addition of one or more nucleotides at the 5′ position), and polymorphic isomiRs (some nucleotides within the sequence differ from the wild type mature miRNA sequence).
It could be envisioned that the increased expression of miRNA variants, or individual isomiRs, leads to the loss or weakening of the function of the corresponding wild type mature miRNA, or results in the regulation of a different transcriptome. Recent studies suggest that isomiRs probably play vital roles in a variety of cancers, tissues, and cell types. For example, Juzenas and colleagues reported that isomiRs are differentially expressed in different human blood cell types. Telonis and colleagues showed that specific isomiRs could be superior cancer biomarkers compared to mature miRNAs when they used isomiRs to classify 32 different cancers. Specifically, Telonis and colleagues demonstrated that miRNA-based analysis was unable to differentiate two specific subtypes of breast cancer whereas isomiRs were able to make clear distinctions between the two subtypes. These findings suggest that isomiRs may play critical roles in differentiating subtypes of breast cancer and, furthermore, may provide novel insights into the molecular mechanisms leading to the development of breast cancers.
Breast cancer is the most common cancer and the second leading cause of cancer-related deaths among women worldwide. In routine clinical evaluation and diagnosis, breast cancer is categorised into three major molecular subtypes based on receptor status: estrogen receptor (ER α) and progesterone receptor (PR) positive, human epidermal growth factor receptor 2 positive (HER2+), and triple negative (ER/PR/HER2 negative) [12–14]. However, the link between the molecular mechanisms that define the breast cancer subtypes and disease prognosis is unclear. Understanding the mechanisms underlying breast cancer subtypes is clinically useful with respect to prognosis, prediction, and informed therapeutic choices. Within the major breast cancer subtypes, gene expression profiling has been used to further classify these molecular subtypes, with the potential to design more specific targeted therapies. In addition, gene expression profiling has been found to be more predictive of treatment response. For example, Finn and colleagues showed that reclassification of breast cancer subtypes using an unbiased gene expression profiling technique predicted treatment outcome better than conventional breast cancer subtyping (ER/HER2 status). In that study, a subset of three genes expressed in breast cancer was more likely to predict responsiveness to dasatinib, a small-molecule kinase inhibitor. Dasatinib has been used in clinical trials for hard-to-treat metastatic breast cancer. However, most breast cancer clinical trials using dasatinib have been inconclusive, and these studies could potentially benefit from gene expression profiling to understand the lack of responsiveness.
Complex genetic diseases, such as breast cancer, are inherently difficult to characterise with a few biomarkers that faithfully distinguish the subtypes of the disease. MiRNAs and isomiRs provide a potentially better alternative to mRNA-based biomarkers for classifying complex diseases, since they are regulatory “hubs” of gene expression. Changes in their expression could therefore influence multiple downstream mRNAs and, in turn, diverse biological pathways.
In this paper, we present a novel method that applies isomiR expression profiles for improved classification of breast cancer types using small RNA sequencing data available in the TCGA database. Firstly, since the TCGA dataset contains many lowly expressed isomiRs that have a significant negative influence on the identification of biomarkers, these lowly expressed isomiRs should be removed. The traditional method for removing lowly expressed isomiRs is to select a ‘hard’ threshold [8, 9]. If the expression level of an isomiR is lower than this ‘hard’ threshold, the isomiR is viewed as lowly expressed and is removed. However, a ‘hard’ threshold may lead to a loss of information. To tackle this disadvantage, we applied a ‘soft’ method, based on a null hypothesis, that was designed to remove these lowly expressed isomiRs. Secondly, we utilized an improved mutual information method to calculate the weight of each isomiR, which measures the significance of the isomiR for classifying different subtypes of breast cancer. The higher the weight of an isomiR, the more suitable the isomiR is for classifying different subtypes of breast cancer. The traditional mutual information can only be used if the feature and the label are both continuous or both discrete. The improved mutual information can be applied when the feature is continuous and the label is discrete. Finally, a few isomiRs with high weights were able to classify different breast cancer subtypes. In order to identify these key isomiRs, the SVM classification method was used.
Many methods have been designed for biomarker discovery, and they can be divided into two major categories. The first category selects a set of biomarkers that can classify the data, using techniques such as the support vector machine (SVM), mutual information, and swarm optimizers. These methods do not calculate the weight of each biomarker, and therefore the importance of each biomarker for breast cancer subtype classification is not known. The weight of a biomarker may reflect its regulatory importance in the molecular mechanism of the disease; therefore, it may be worth studying the potential gene-regulatory role of highly weighted biomarkers. The second category of methods views each gene or isomiR as a feature and calculates the weight of each feature. The weight of a feature measures its importance in the classification, and the top N features are viewed as biomarkers. Information gain, t-test, and fold change methods are widely applied to identify biomarkers. However, t-tests and fold change methods are not suitable for identifying biomarkers from data that have more than two categories. Although information gain can be applied to find biomarkers across multiple categories, this method is very time consuming. Other methods, such as the Fisher method and the correlation coefficient method, can calculate the weight of each feature for data comprising more than two categories and are less time consuming than information gain. However, these methods also have their limitations. The Fisher method is based on the mean and standard deviation of the dataset; therefore, in small datasets, outliers will negatively influence the results. If the weights of the features are calculated by the correlation coefficient method, it is challenging to rank the features based on their weights. In short, all of these methods for identifying biomarkers have limitations.
Therefore, a novel method is needed to identify unique, more discrete and effective biomarkers.
Data source and definitions
The expression profiles of isomiRs in breast cancer patients can be downloaded from the TCGA GDC data portal (https://portal.gdc.cancer.gov). However, the website does not provide the name of each isomiR. The nomenclature used in this study for an individual isomiR is derived from its mature miRNA: the name of the isomiR comprises the name of the corresponding wild type (reference) miRNA followed by a variant symbol, e.g. hsa-miR-21-5p | 3′t-2. The sign | separates the isomiR name into the miRNA name and the variant symbol. The variant symbol is divided into two parts by the sign (−). The first part indicates the variant type of the isomiR: 3′t (5′a) implies that the isomiR is a 3′ trimming (5′ addition) isomiR. The second part denotes the number of nucleotides that are trimmed or added. In addition, raw read counts are not suitable for analysis. Thus, we calculated the RPM (reads per million mapped reads) of each isomiR. The clinical information of the breast cancer patients was obtained from the website (https://www.nature.com/articles/nature11412#supplementary-information). Since the TCGA website does not provide the expression levels of polymorphic isomiRs, this kind of isomiR was not taken into consideration in this paper. Although the clinical information covered 824 breast cancer patients, only 698 patients had valid clinical information. In this paper, we used the isomiR expression levels of these 698 patients to identify biomarkers that classify breast cancer subtypes.
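The naming scheme and the RPM normalisation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the names `IsomiRName`, `parse_isomir`, and `reads_per_million` are hypothetical, and the sketch assumes that the total number of mapped reads in a sample equals the sum of the counts passed in.

```python
import re
from typing import Dict, NamedTuple


class IsomiRName(NamedTuple):
    mirna: str        # wild type (reference) miRNA name, e.g. "hsa-miR-21-5p"
    end: str          # "3" or "5": which end of the miRNA is modified
    change: str       # "t" (trimmed) or "a" (added)
    nucleotides: int  # number of nucleotides trimmed or added


# Accept the prime sign or an ASCII apostrophe, and a hyphen or a minus sign.
_VARIANT = re.compile(r"^([35])[′']([ta])[-−](\d+)$")


def parse_isomir(name: str) -> IsomiRName:
    """Split 'miRNA | variant symbol' into its parts."""
    mirna, variant = (part.strip() for part in name.split("|"))
    match = _VARIANT.match(variant)
    if match is None:
        raise ValueError(f"unrecognised variant symbol: {variant!r}")
    end, change, count = match.groups()
    return IsomiRName(mirna, end, change, int(count))


def reads_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale one sample's raw read counts to reads per million mapped reads.

    Assumption: the sum of `counts` is the sample's total mapped reads.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {name: count * 1_000_000 / total for name, count in counts.items()}
```

For example, `parse_isomir("hsa-miR-21-5p | 3′t-2")` yields a record for a 3′ isomiR of hsa-miR-21-5p with two nucleotides trimmed.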
Breast cancer subtype reclassification for isomiR identification
Table 1. Number of patients in each breast cancer subtype (table body not recovered)
Removal of lowly expressed isomiR
Calculating the weight of isomiR by improved mutual information
Mutual information is a powerful tool for feature selection. Many mutual information-based feature selection methods have been developed, and their performance has proven to be very good. However, these methods have some limitations. Although some methods select a set of features that are very important for classification, they do not provide the weight of each feature. Other methods can only be applied to data in which the feature and the label are both discrete or both continuous. These methods were therefore not deemed suitable for this type of research, and an improved mutual information was developed to calculate the weight of each isomiR.
The improved mutual information measures the relationship between features and labels. If the feature and the label are strongly related, the weight of the isomiR is large, which implies that this isomiR is more important for breast cancer subtype classification.
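To make the idea of a mutual information weight concrete, the sketch below estimates the mutual information between one continuous feature and a discrete label by first grouping the feature into equal-frequency bins. This simple binned estimator is a stand-in chosen for illustration; it is not the improved mutual information defined in this paper, whose formula is not reproduced here. The function names and the `n_bins` parameter are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def quantile_bins(values: Sequence[float], n_bins: int = 4) -> List[int]:
    """Assign each value to an equal-frequency bin according to its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(values)
    return bins


def mutual_information(expression: Sequence[float],
                       labels: Sequence[Hashable],
                       n_bins: int = 4) -> float:
    """Estimate I(X; Y) in bits for a binned continuous X and a discrete Y."""
    if len(expression) != len(labels):
        raise ValueError("expression and labels must have the same length")
    n = len(labels)
    bins = quantile_bins(expression, n_bins)
    joint = Counter(zip(bins, labels))
    bin_counts = Counter(bins)
    label_counts = Counter(labels)
    mi = 0.0
    for (b, y), count in joint.items():
        # p(b, y) * log2( p(b, y) / (p(b) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (bin_counts[b] * label_counts[y]))
    return mi


def rank_features(features: Dict[str, Sequence[float]],
                  labels: Sequence[Hashable],
                  n_bins: int = 4) -> List[Tuple[str, float]]:
    """Return (feature name, weight) pairs sorted from largest to smallest weight."""
    weights = {name: mutual_information(values, labels, n_bins)
               for name, values in features.items()}
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)
```

A feature whose expression separates the subtypes receives a high weight, while a feature that is unrelated to the labels receives a weight near zero. Tied values are split across bins by rank in this sketch, which a more careful estimator would handle explicitly.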
Identification of isomiR biomarkers that classify breast cancer subtypes
A few key isomiRs, which have the highest weights, can distinguish between the different subtypes of breast cancer. These key isomiRs can then be used as breast cancer biomarkers, and they can be identified as follows: sort the isomiRs by weight from largest to smallest, then use the top N isomiRs, for increasing N, to evaluate the performance in the classification of breast cancer subtypes. Classification performance generally rises as the number of selected isomiRs increases. If the performance obtained with the top N+1 isomiRs is not significantly higher than the performance obtained with the top N isomiRs, these N isomiRs are the key isomiRs and can be viewed as biomarkers.
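The selection procedure can be sketched as a loop over growing prefixes of the ranked list. The code below is illustrative only: the text does not state how a "significant" rise in performance is judged, so the fixed `min_gain` threshold is an assumption, and `evaluate` is a hypothetical callback that returns a classification score (for example a cross-validated AUC) for a given list of isomiRs.

```python
from typing import Callable, List, Optional, Sequence


def select_key_features(ranked: Sequence[str],
                        evaluate: Callable[[Sequence[str]], float],
                        min_gain: float = 0.01,
                        max_features: Optional[int] = None) -> List[str]:
    """Return the top-N prefix after which adding one more feature gains < min_gain.

    `ranked` must already be sorted from largest to smallest weight.
    """
    limit = len(ranked) if max_features is None else min(max_features, len(ranked))
    if limit == 0:
        return []
    previous = evaluate(ranked[:1])
    for n in range(1, limit):
        current = evaluate(ranked[: n + 1])  # score with the top N+1 features
        if current - previous < min_gain:
            return list(ranked[:n])          # top N is enough
        previous = current
    return list(ranked[:limit])
```

A plain difference threshold is used here for brevity; a statistical test on per-fold scores would be another way to decide whether the gain from the top N+1 isomiRs is significant.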
In this paper, the SVM classifier was applied to classify different subtypes of breast cancer. According to Table 1, the subtypes of breast cancer have very different numbers of patients. Around 68% of breast cancer patients are ER α+HER2-, while only about 4.4% of breast cancer patients are ER α-HER2+. Because the dataset is imbalanced, the SMOTE method was used to balance the data. The receiver operating characteristic (ROC) curve is widely used to judge the discrimination ability of statistical methods, and the area under the ROC curve (AUC) measures the performance of the classifier. Since this research is a multiclass learning problem, the macro-AUC of the ROC was used to validate the performance of the classification. Further, 5-fold cross-validation was applied to evaluate the results.
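As an illustration of the evaluation metric only, the macro-AUC can be computed as the unweighted mean of one-vs-rest AUCs, one per subtype. The sketch below uses the pairwise-comparison definition of the AUC and plain Python; the function names and the subtype labels in the usage note are hypothetical, and the SVM training, SMOTE resampling, and cross-validation steps are not shown.

```python
from typing import Dict, Hashable, Sequence


def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    positives = [s for s, p in zip(scores, is_positive) if p]
    negatives = [s for s, p in zip(scores, is_positive) if not p]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_auc(labels: Sequence[Hashable],
              class_scores: Dict[Hashable, Sequence[float]]) -> float:
    """Unweighted mean of the one-vs-rest AUC of every class.

    `class_scores[c][i]` is the classifier's score that sample i belongs to class c.
    """
    aucs = [binary_auc(scores, [y == cls for y in labels])
            for cls, scores in class_scores.items()]
    return sum(aucs) / len(aucs)
```

Calling `macro_auc(["A", "B", "A", "B"], {"A": [0.9, 0.8, 0.3, 0.1], "B": [0.1, 0.2, 0.7, 0.9]})` averages the two per-class AUCs. Because every class contributes equally to the mean, a rare subtype counts as much as a common one, which is why macro averaging suits an imbalanced dataset.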
Characterization of isomiRs identified in different subtypes of breast cancer
Identification of isomiRs that classify breast cancer subtypes
After the characterization of isomiRs in breast cancer, we calculated the weight of these isomiRs by using improved mutual information. Finally, we selected different numbers of isomiRs to compute their performance in the classification of breast cancer subtypes. The results and the Python source code of our algorithm can be downloaded from the website https://github.com/ChaowangLan/isomiRbreastsubtype.
Table 2. The 20 isomiR biomarkers, their weights, and their ratios (table body not recovered)
Among the isomiRs that faithfully characterize breast cancer subtypes, 7 were identified as 5′ variant isomiRs and the others were identified as 3′ variant isomiRs. Most of these isomiRs were highly expressed compared to their corresponding wild type miRNAs. We calculated the ratio of the expression level of each isomiR to that of its corresponding wild type miRNA in the different subtypes of breast cancer. These ratios are listed in Table 2. If the expression level of an isomiR was larger than the expression level of its corresponding wild type miRNA, the ratio was larger than 1. Among these 20 isomiRs, only hsa-mir-28-3p | 3′a-2 and hsa-mir-22-3p | 5′t-1 were lowly expressed compared to their corresponding wild type miRNAs; the other isomiRs were more abundant. These results indicate that most of these isomiR biomarkers are more highly expressed than their corresponding wild type miRNAs.
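The ratio described above can be illustrated with a short sketch. It assumes that the ratio is taken between the mean RPM of the isomiR and the mean RPM of its wild type miRNA within each subtype; the text does not state whether means or per-patient ratios were used, so this choice, the function name, and the subtype labels in the usage note are assumptions.

```python
from statistics import mean
from typing import Dict, Sequence


def expression_ratio_by_subtype(isomir_rpm: Sequence[float],
                                wild_type_rpm: Sequence[float],
                                subtypes: Sequence[str]) -> Dict[str, float]:
    """Ratio of mean isomiR RPM to mean wild type RPM within each subtype.

    A ratio above 1 means the isomiR is more abundant than its wild type miRNA.
    """
    ratios = {}
    for subtype in sorted(set(subtypes)):
        idx = [i for i, s in enumerate(subtypes) if s == subtype]
        wild = mean(wild_type_rpm[i] for i in idx)
        iso = mean(isomir_rpm[i] for i in idx)
        # The ratio is undefined when the wild type miRNA is not expressed.
        ratios[subtype] = iso / wild if wild > 0 else float("nan")
    return ratios
```

For instance, with isomiR RPM values `[200, 400, 10, 30]`, wild type RPM values `[100, 200, 40, 40]`, and subtype labels `["S1", "S1", "S2", "S2"]`, the isomiR is more abundant than the wild type in the first subtype and less abundant in the second.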
Comparing the performance of improved mutual information to other feature selection methods
IsomiRs are superior biomarkers compared to protein coding gene expression-based approaches for the classification of different subtypes of breast cancer
Over the past decade, many studies have found that protein coding gene expression data can be used to classify breast cancer subtypes. For instance, Van De Vijver and colleagues proposed that a 70-gene expression profile can be used to identify different subtypes of breast cancer; Parker and colleagues defined the PAM50 genes, which are the best-known biomarkers for breast cancer subtype classification; and Neve and colleagues also applied gene expression data to the classification of different subtypes of breast cancer. Their research indicated that differentially expressed mRNAs can be used as breast cancer subtype biomarkers.
IsomiRs may play important regulatory roles in different subtypes of breast cancer
Many studies have found that different categories of isomiRs have different functions in regulating biological processes. For example, 3′ isomiRs have low 3′ untranslated region stability and therefore exert weaker regulation of mRNAs. 5′ isomiRs have slightly altered seed sequences compared to the corresponding wild type miRNAs; therefore, besides weakening the regulatory effect of the wild type miRNAs, they can target mRNAs that differ significantly from the transcriptome targeted by the wild type miRNA. Based on sequence similarities, it is possible to predict the mRNAs that are potentially regulated by certain miRNAs [41, 42] and, therefore, the biological pathways that are influenced by miRNAs and their isomiRs. The elevated levels of isomiRs relative to their corresponding wild type miRNAs can also be used to predict changes in the regulation of gene expression in breast cancers, which may provide insight into the molecular mechanisms leading to breast cancer. We predicted that the presence of abundant 3′ isomiRs weakens the regulation of transcripts that are regulated by the corresponding wild type miRNAs. Thus, mRNAs that are regulated by the wild type miRNAs should show elevated expression levels when the expression levels of the isomiRs are significantly elevated. The targets that may be affected by the accumulation of 3′ isomiRs can be obtained from the miRWalk2.0 website (http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2/holistic.html). To predict potential targets for abundant 5′ isomiRs with modified seed sequences, we used the miRDB website (http://www.mirdb.org/). In order to obtain the most likely targeted mRNAs, the score of a predicted target gene had to be higher than 95 (the maximum score is 100).
Table 3. Five KEGG pathways related to breast cancer progression and subtype classification (columns: KEGG pathway, number of genes; gene counts not recovered)
Pathways in cancer
p53 signalling pathway
MAPK signalling pathway
Insulin signalling pathway
Estrogen signalling pathway
The first two KEGG pathways in Table 3 are very important for the analysis of breast cancer outcome. These data suggest that isomiRs also play a vital role in breast cancer development. The clinical classification of breast cancer is based on hormone receptor status, and some of these KEGG pathways are involved in regulating that status. For example, Neve's research highlights that up-regulation of genes involved in insulin/MAPK signaling predicts response to Herceptin, which implies that these two signaling pathways regulate the HER2 status. According to the third and fourth lines of Table 3, isomiRs were shown to influence 56 and 29 genes in the MAPK and insulin signalling pathways, respectively. Therefore, isomiRs could affect the HER2 status through these two pathways and contribute to the development of different subtypes of breast cancer. We also identified the estrogen signalling pathway, represented by 20 genes, as potentially affected by the isomiRs (Table 3). This implies that isomiRs could affect the expression of these genes and thereby influence the estrogen receptor status. Overall, isomiRs may regulate the hormone receptor status via different KEGG pathways and thereby affect different breast cancer subtypes.
Assessing the role of individual isomiRs in the regulation of breast cancer specific pathways
Table: The average expression level of isomiRs and miRNAs in each breast cancer subtype (table body not recovered)
Table: Predicted target mRNAs of the 5′ variant isomiRs (table body not recovered)
In this paper, we propose a novel method for identifying isomiR biomarkers for breast cancer subtyping from small RNA sequencing data. We first removed the lowly expressed isomiRs from the datasets. Then we calculated the weight of each isomiR using the improved mutual information, which measures the relationship between the expression level of an isomiR and the breast cancer subtypes. The stronger the relationship between an isomiR's expression and the breast cancer subtypes, the more important the isomiR is for breast cancer subtype classification. Further, the improved mutual information can be applied to datasets in which the feature is continuous and the label is discrete, whereas the traditional mutual information cannot. Finally, the SVM classifier was applied to find specific isomiR biomarkers for the classification of the different breast cancer subtypes. This method proved to be more effective and efficient than the Fisher and Hellinger methods, identifying a smaller set of key isomiRs needed for breast cancer subtyping. Importantly, in this study we describe the enhanced identification of isomiR biomarkers for the classification of breast cancer subtypes and, in addition, isomiRs were found to be superior biomarkers for this classification compared to mRNA gene expression. Further, applying this improved methodology, we identified individual isomiRs that may be key in the regulation of specific breast cancer pathways. There is great potential in exploiting these novel isomiR regulatory mechanisms as drug targets for more personalized, subtype-specific breast cancer therapies.
The discovery of unique biomarkers for different breast cancer subtypes is a challenge in research, especially since the regulatory mechanisms of the different breast cancer subtypes are not yet fully understood. Our research provides a new way to explore the mechanisms of breast cancer subtypes.
Publication of this article was funded by China Scholarship Council.
Availability of data and materials
The results and the Python source code of our algorithm can be downloaded from the website https://github.com/ChaowangLan/isomiRbreastsubtype.
About this supplement
This article has been published as part of BMC Medical Genomics Volume 11 Supplement 6, 2018: Proceedings of the 29th International Conference on Genome Informatics (GIW 2018): medical genomics. The full contents of the supplement are available online at https://bmcmedgenomics.biomedcentral.com/articles/supplements/volume-11-supplement-6.
EMM, GH, and JL supervised the study. CL processed the data, designed the methods, and performed the experiments. HP participated in the analysis and performed some experiments. EMM commented on breast cancer subtype biology and helped to revise the manuscript. GH suggested and commented on miRNA and isomiR biology and helped to write the manuscript. JL revised the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Lan C, Chen Q, Li J. Grouping miRNAs of similar functions via weighted information content of gene ontology. BMC Bioinformatics. 2016; 17(19):507.
- Li S-C, Liao Y-L, Ho M-R, Tsai K-W, Lai C-H, Lin W-C. miRNA arm selection and isomiR distribution in gastric cancer. In: BMC Genomics, vol. 13. London: BioMed Central: 2012. p. 13.
- Maher C, Timmermans M, Stein L, Ware D. Identifying microRNAs in plant genomes. In: Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE. Stanford: IEEE: 2004. p. 718–723.
- Hutvágner G, McLachlan J, Pasquinelli AE, Bálint É, Tuschl T, Zamore PD. A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science. 2001; 293(5531):834–8.
- Swierniak M, Wojcicka A, Czetwertynska M, Stachlewska E, Maciag M, Wiechno W, Gornicka B, Bogdanska M, Koperski L, de la Chapelle A, et al. In-depth characterization of the microRNA transcriptome in normal thyroid and papillary thyroid carcinoma. J Clin Endocrinol Metab. 2013; 98(8):1401–9.
- Neilsen CT, Goodall GJ, Bracken CP. IsomiRs–the overlooked repertoire in the dynamic microRNAome. Trends Genet. 2012; 28(11):544–9.
- Chen L, Wong G. Novel tumor biomarker based on isomiR expression profiles. In: Bioinformatics and Biomedicine (BIBM), 2017 IEEE International Conference On. Kansas City: IEEE: 2017. p. 2328–9.
- Juzenas S, Venkatesh G, Hübenthal M, Hoeppner MP, Du ZG, Paulsen M, Rosenstiel P, Senger P, Hofmann-Apitius M, Keller A, et al. A comprehensive, cell specific microRNA catalogue of human peripheral blood. Nucleic Acids Res. 2017; 45(16):9290–301.
- Telonis AG, Magee R, Loher P, Chervoneva I, Londin E, Rigoutsos I. Knowledge about the presence or absence of miRNA isoforms (isomiRs) can successfully discriminate amongst 32 TCGA cancer types. Nucleic Acids Res. 2017; 45(6):2973–85.
- Telonis AG, Loher P, Jing Y, Londin E, Rigoutsos I. Beyond the one-locus-one-miRNA paradigm: microRNA isoforms enable deeper insights into breast cancer heterogeneity. Nucleic Acids Res. 2015; 43(19):9158–75.
- Lynce F, Blackburn MJ, Cai L, Wang H, Rubinstein L, Harris P, Isaacs C, Pohlmann PR. Characteristics and outcomes of breast cancer patients enrolled in the National Cancer Institute Cancer Therapy Evaluation Program sponsored phase I clinical trials. Breast Cancer Res Treat. 2018; 168(1):35–41.
- Patani N, Martin L-A, Dowsett M. Biomarkers for the clinical management of breast cancer: international perspective. Int J Cancer. 2013; 133(1):1–13.
- Goldhirsch A, Wood WC, Coates AS, Gelber RD, Thürlimann B, Senn H-J, Panel members. Strategies for subtypes—dealing with the diversity of breast cancer: highlights of the St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2011. Ann Oncol. 2011; 22(8):1736–47.
- Ellsworth RE, Blackburn HL, Shriver CD, Soon-Shiong P, Ellsworth DL. Molecular heterogeneity in breast cancer: state of the science and implications for patient care. In: Seminars in Cell & Developmental Biology, vol. 64. Amsterdam: Elsevier: 2017. p. 65–72.
- Taherian-Fard A, Srihari S, Ragan MA. Breast cancer classification: linking molecular mechanisms to disease prognosis. Brief Bioinform. 2014; 16(3):461–74.
- Santagata S, Thakkar A, Ergonul A, Wang B, Woo T, Hu R, Harrell JC, McNamara G, Schwede M, Culhane AC, et al. Taxonomy of breast cancer based on normal cell phenotype predicts outcome. J Clin Investig. 2014; 124(2):859–70.
- Lehmann BD, Bauer JA, Chen X, Sanders ME, Chakravarthy AB, Shyr Y, Pietenpol JA. Identification of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies. J Clin Investig. 2011; 121(7):2750–67.
- Finn RS, Dering J, Ginther C, Wilson CA, Glaspy P, Tchekmedyian N, Slamon DJ. Dasatinib, an orally active small molecule inhibitor of both the src and abl kinases, selectively inhibits growth of basal-type/“triple-negative” breast cancer cell lines growing in vitro. Breast Cancer Res Treat. 2007; 105(3):319–26.
- Herold CI, Chadaram V, Peterson BL, Marcom PK, Hopkins J, Kimmick GG, Favaro J, Hamilton E, Welch RA, Bacus S, et al. Phase II trial of dasatinib in patients with metastatic breast cancer using real-time pharmacodynamic tissue biomarkers of Src inhibition to escalate dosing. Clin Cancer Res. 2011; 17(18):6061–70.
- Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol. 2005; 4(1).
- Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H. Feature selection: A data perspective. ACM Comput Surv (CSUR). 2017; 50(6):94.
- Zhang S, Mo Y-Y, Ghoshal T, Wilkins D, Chen Y, Zhou Y. Novel gene selection method for breast cancer intrinsic subtypes from two large cohort study. In: Bioinformatics and Biomedicine (BIBM), 2017 IEEE International Conference On. Kansas City: IEEE: 2017. p. 2198–2203.
- Zheng K, Wang X. Feature selection method with joint maximal information entropy between features and class. Pattern Recog. 2018; 77:20–9.
- Gu S, Cheng R, Jin Y. Feature selection for high-dimensional classification using a competitive swarm optimizer. Soft Comput. 2018; 22(3):811–22.
- Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics. 2007; 23(19):2507–17.
- Gu Q, Li Z, Han J. Generalized fisher score for feature selection. In: Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. 2011. p. 266–273.
- Weston J, Elisseeff A, Schölkopf B, Tipping M. Use of the zero-norm with linear models and kernel methods. J Mach Learn Res. 2003; 3(Mar):1439–61.
- Yin L, Ge Y, Xiao K, Wang X, Quan X. Feature selection for high-dimensional imbalanced data. Neurocomputing. 2013; 105:3–11.
- Pearson K. Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. Philos Trans R Soc Lond. 1895; 186(Part I):343–424.
- Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res. 2011; 12(Oct):2825–30.
- Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002; 16:321–57.
- Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982; 143(1):29–36.
- Ferri C, Hernández-Orallo J, Flach PA. A coherent interpretation of AUC as a measure of aggregated classification performance. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11). Bellevue: Omnipress: 2011. p. 657–664.
- Zhang M-L, Zhou Z-H. A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng. 2014; 26(8):1819–37.
- Cieslak DA, Chawla NV. Learning decision trees for unbalanced data. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Berlin: Springer: 2008. p. 241–56.
- Van De Vijver MJ, He YD, Van’t Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002; 347(25):1999–2009.
- Parker JS, Mullins M, Cheang MC, Leung S, Voduc D, Vickery T, Davies S, Fauron C, He X, Hu Z, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009; 27(8):1160.
- Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T, Clark L, Bayani N, Coppe J-P, Tong F, et al. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell. 2006; 10(6):515–27.
- Burroughs AM, Ando Y, de Hoon MJ, Tomaru Y, Nishibu T, Ukekawa R, Funakoshi T, Kurokawa T, Suzuki H, Hayashizaki Y, et al. A comprehensive survey of 3′ animal miRNA modification events and a possible role for 3′ adenylation in modulating miRNA targeting effectiveness. Genome Res. 2010; 20(10):1398–410.
- Tan GC, Chan E, Molnar A, Sarkar R, Alexieva D, Isa IM, Robinson S, Zhang S, Ellis P, Langford CF, et al. 5′ isomiR variation is of functional and evolutionary importance. Nucleic Acids Res. 2014; 42(14):9424–35.
- Agarwal V, Bell GW, Nam J-W, Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs. Elife. 2015; 4:05005.
- Betel D, Koppal A, Agius P, Sander C, Leslie C. Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites. Genome Biol. 2010; 11(8):90.
- Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, Koplev S, Jenkins SL, Jagodnik KM, Lachmann A, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016; 44(W1):90–7.
- Gasco M, Shami S, Crook T. The p53 pathway in breast cancer. Breast Cancer Res. 2002; 4(2):70.
- Dressman M, Walz T, Lavedan C, Barnes L, Buchholtz S, Kwon I, Ellis M, Polymeropoulos M. Genes that co-cluster with estrogen receptor alpha in microarray analysis of breast biopsies. Pharmacogenomics J. 2001; 1(2):135.