In silico pathway analysis based on chromosomal instability in breast cancer patients

Background Complex genomic changes that arise in tumors are a consequence of chromosomal instability. In tumor cells, genomic aberrations disrupt core signaling pathways involving various genes; delineating these signaling pathways can therefore help in understanding the pathogenesis of cancer. Bioinformatics tools can further help in identifying networks of interactions between genes, giving a greater biological context for all genes affected by chromosomal instability. Methods Karyotypic analysis was performed in 150 clinically confirmed breast cancer patients and 150 age- and gender-matched healthy controls after 72 h peripheral lymphocyte culturing and GTG-banding. The Reactome database, accessed through Cytoscape software version 3.7.1, was used to perform in-silico analysis (functional interaction and gene enrichment). Results The frequency of chromosomal aberrations (structural and numerical) was found to be significantly higher in patients than in controls. The genes harbored by chromosomal regions showing increased aberration frequency in patients were further analyzed in-silico. Pathway analysis on a set of genes that were not linked together revealed that the genes HDAC3, NCOA1, NLRC4, COL1A1, RARA, WWTR1, and BRCA1 were enriched in the RNA Polymerase II Transcription pathway, which is involved in recruitment, initiation, elongation and dissociation during transcription. Conclusion The current study employs the information inferred from chromosomal instability analysis in a non-target tissue to determine the genes and pathways associated with breast cancer. These results can be extended by performing either mutation analysis of the genes/pathways deduced or expression analysis, which can pinpoint the relevant functional impact of chromosomal instability.


Background
Complex genomic changes that arise in tumors are a consequence of chromosomal instability (CIN), which leads to numerical [(N)-CIN] as well as structural chromosomal instability [(S)-CIN] [1]. The increased levels of aneuploidy and structural complexity in these tumors indicate errors in DNA repair, mitotic segregation and cell cycle checkpoints [2,3] and may cause (N)-CIN. Structural rearrangements emerge through anomalous DNA repair pathways that cause abnormalities in both homologous and non-homologous end-joining of double-stranded DNA [4,5]. (S)-CIN may also appear through telomere-mediated events, in which critically short telomeres are recognized as DNA breaks capable of recombining (either homologously or non-homologously) when DNA repair pathways are compromised, leading to activation of telomerase [6]. The mechanism leading to aneuploidy is distinct from that of structural changes: aneuploidy arises through disruptions in cell cycle checkpoints and errors in mitotic segregation [2,7].
CIN is clinically important as it is associated with poor outcome in patients with cancers of lung, breast and colon [8][9][10] leading to loss or gain of chromosome segments, deletions, translocations, and DNA amplifications [11]. Various studies have reported the correlation between chromosomal aberration and tumor grade and prognosis [8,12]. Cytogenetic studies in cancer cells have recognized the complexity of genomic rearrangements in cancer cells [13] and have reported recurrent abnormalities in a broad range of tumors [14,15].
A link between aneuploidy and/or CIN and poor clinical outcome has been identified by several studies [16]. Cancer cells can be targeted based on the whole chromosome instability (W-CIN) phenotype they carry. The National Cancer Institute (NCI), USA, screened compounds for anticancer activity by examining the data-rich drug discovery panel of NCI-60 cancer cell lines and listed potential agents with anticancer activity that targeted chromosomally unstable and aneuploid cancer cells [17][18][19]. The NCI also confirmed the possibility of discovering potential anticancer agents based on the link between their activity and the karyotypic state. An association of aneuploidy and chromosomal instability with distinctive clinical and histopathological features and poor prognosis has also been reported in various cancers [20][21][22]. Thus, the need to target CIN with new combinatorial strategies has been suggested [23].
Data from large-scale genome-wide projects have unveiled common core signaling pathways which lead to the development of various cancers [24][25][26][27][28]. Studies delineating the pathways involved in the pathogenesis of cancers such as colon cancer and glioblastoma multiforme [29] have characterized the genes involved in the pathogenesis of the disease, thus making it significant to focus on pathways which involve various genes [30,31]. Genomic aberrations disrupt signaling cascades or pathways in tumor cells, thereby causing the tumor to proliferate or dedifferentiate uncontrollably [32]. For instance, deletion of any of the components of the TGFβ pathway paves the way for some breast cancers [33][34][35][36][37]. Therapeutic targeting of pathways that are directly involved in the initiation of CIN has also gained clinical interest [20,38]. Pathway-based analysis has gained much importance in the past decade because it, firstly, identifies the actual genes associated with the phenotype and separates them from false positive hits [39] and, secondly, marks the biological pathways affected by these genes [40].
The bioinformatics approach can further help in identifying networks of interactions between the genes of interest as well as in simultaneously identifying biologically informative "linker" genes, so as to get a greater biological context of all genes affected by chromosomal instability. This can help to stratify breast cancer patients for choosing optimal treatments and therapies.
Karyotyping aids in efficient single cell screening and identifies important genomic aberrations in normal or diseased samples [41]. A copy number alteration (CNA) is represented by any alteration in banding pattern [42]. This has been indicated by studies which have reported a relation between chromosomal anomalies in peripheral blood lymphocytes (PBLs) and risk prediction in cancers [43][44][45][46]. Blood-test screening is considered a non-invasive, cost effective technique [41]. Also, genetic aberrations in a non-target tissue like PBLs may display related events in target tissue [47].
The present study therefore aimed to identify chromosomal anomalies in PBLs of breast cancer patients in order to: a) identify the recurring aberrant chromosomal lesions and chromosomal loci that are frequently involved in breast cancer; b) determine the genes harboured by these regions and delineate, using bioinformatic tools, the biological pathway enriched by them.

Methods
In the present study 150 patients with confirmed malignant breast cancer were included. The patients were clinically investigated at Sri Guru Ram Das Institute of Medical Sciences and Research, Vallah, Amritsar, Punjab, India. This study was conducted after approval by the institutional ethical committee of Guru Nanak Dev University, Amritsar, Punjab, India. Patients with confirmed malignant breast cancer and no history of any other cancer were included in the study, whereas patients who had received any kind of therapy (chemotherapy, hormone therapy, radiotherapy or surgery) or blood transfusion prior to sampling were excluded. After informed consent, relevant information including age, gender, occupation, personal history, habitat, habits and disease history was recorded in a pre-designed questionnaire. The blood samples of 150 patients and 150 age- and gender-matched healthy controls (with no family history of cancer) were collected in heparinized vials. Peripheral lymphocyte culturing was performed by the standard 72 h culture method using phytohemagglutinin as mitogen. GTG banding was performed and karyotyping was done following ISCN 2016 [48]. Chromosomal anomalies were assessed in 50-100 metaphases for each subject.
The genes (Table 4) present on the chromosomes involved in anomalies were retrieved from Atlas of Genetics and Cytogenetics in Oncology and Hematology [49] and Genatlas database [50]. On the homepage of Atlas of Genetics and Cytogenetics in Oncology and Hematology, the chromosome number was selected from 'ENTITIES: by chromosomal band' and then the genes present on the particular location/band were identified. On the homepage of Genatlas Database, the list of genes present on a particular chromosomal location/band was retrieved by entering the chromosome number and band in "SEARCH in GENATLAS GENES" search field.
The Reactome database, accessed through Cytoscape software version 3.7.1, was used to perform functional interaction and gene enrichment analysis on the genes (query genes) present on the chromosomal regions that were frequently involved in cytogenetic anomalies in the current study. In the Apps menu of the Cytoscape software, 'Reactome FI' was selected. Clicking on this menu displayed six sub-menus, of which 'Gene Set/Mutation Analysis' was selected for performing FI (functional interaction) analysis on a set of genes. Functional interaction analysis revealed the involvement of various genes (linker genes) that were linked to the query genes through different networks (Fig. 1). Pathway enrichment analysis was then performed on the set of genes that are not linked together, by checking 'show genes not linked to others' in FI Network Construction Parameters. Linkers were not used for pathway enrichment analysis, as their inclusion biases the results. Right-clicking on the empty space in the network view panel opened a pop-up menu, from which the following options were chosen in sequence: Reactome FI - Analyze Network Functions - Pathway Enrichment.
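The idea behind pathway enrichment can be illustrated with a minimal over-representation sketch. It assumes a one-sided hypergeometric test, which is a common formulation but not necessarily the exact statistic computed by the Reactome FI application; the function name and all numbers in the usage note are hypothetical and are not values from this study.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_query, n_overlap):
    """One-sided over-representation p-value (illustrative sketch only).

    Returns P(X >= n_overlap), where X is the number of pathway genes
    found when n_query genes are drawn without replacement from a
    background of n_background genes, n_pathway of which are annotated
    to the pathway of interest.
    """
    if not (0 <= n_overlap <= min(n_pathway, n_query) <= n_background):
        raise ValueError("inconsistent gene counts")
    total = comb(n_background, n_query)
    # Sum the probabilities of observing n_overlap or more pathway genes.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_query - k)
        for k in range(n_overlap, min(n_pathway, n_query) + 1)
    )
    return tail / total
```

As a usage example with purely hypothetical counts, `hypergeom_enrichment_p(20000, 1200, 60, 7)` gives the probability of seeing at least 7 pathway genes among 60 query genes by chance; a small value suggests the pathway is over-represented in the query set.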

Results
The stage-wise comparison of the cytogenetic profile of breast cancer patients with controls is shown in Table 2. The chromatid-type aberrations observed in patients included premature centromeric division, chromatid break and gap, while the chromosome-type aberrations included polyploidy, chromosomal gap, pulverization, telomeric associations, chromosomal break, endoreduplication, Robertsonian translocations, acentric fragments, ring chromosomes and deletions. Associations between the acrocentric chromosomes 13, 14, 15, 21 and 22 were scored separately in all metaphases. Acrocentric associations and telomeric bridges were also scored but not counted in the total aberrations. Telomeric associations were commonly seen in acrocentric chromosomes. Apart from acrocentric chromosomes, chromosomes 1, 2, 16, 18, 20 and X were also frequently involved in telomeric associations. Breaks and gaps were the most frequent structural chromosomal aberrations observed in various regions of different chromosomes. The chromosomes frequently involved in aberrations such as loss, gain, deletion, addition and translocation are shown in Table 3.
Chromosomal aberrations present in 2% or more of the metaphases in an individual were considered clonal anomalies. Both structural and numerical clonal chromosomal anomalies were observed in 28 breast cancer patients (Additional file 1: figure S8, S9). Clonal structural chromosomal anomalies observed in 5 cases were: [46,XX,add(1)(pter → q21::?::q21 → qter)], [45,XX,del (2)
The control subjects had a predominantly normal karyotype, and the chromosomal aberrations found were less frequent than in cases. Moreover, no specific or recurring anomaly was observed in controls. Frequencies of non-clonal chromosomal aberrations observed in controls were: telomeric association 26.3%; Robertsonian translocation 14.3%; premature centromeric division
To identify the genes harbored by the chromosomal regions showing increased aberration frequency in the present study sample, data were retrieved from the Atlas of Genetics and Cytogenetics in Oncology and Hematology [49] and the Genatlas database [50] (Table 4).
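The clonality criterion stated above (an aberration present in 2% or more of an individual's metaphases) can be sketched as a simple filter. This is only an illustration under that stated threshold; the function name, the input layout and the aberration labels in the usage note are hypothetical and are not data from the study.

```python
def clonal_aberrations(aberration_counts, metaphases_scored, threshold_percent=2):
    """Return aberrations meeting the clonality threshold for one subject.

    aberration_counts maps an aberration label to the number of
    metaphases in which it was seen; metaphases_scored is the total
    number of metaphases analysed for that subject (50-100 in the
    study). Integer arithmetic avoids floating-point edge cases at
    exactly 2%.
    """
    if metaphases_scored <= 0:
        raise ValueError("metaphases_scored must be positive")
    return {
        label: count
        for label, count in aberration_counts.items()
        if count * 100 >= threshold_percent * metaphases_scored
    }
```

For example, with 100 metaphases scored and hypothetical counts `{"aberration A": 2, "aberration B": 1}`, only "aberration A" meets the 2% threshold.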

In-silico analysis
Functional interaction analysis revealed the involvement of various genes (linker genes) that are linked to the query genes (those harboured by the chromosomal regions frequently involved in anomalies in the present study) through different networks (Fig. 1). Pathway enrichment for invasive ductal breast carcinoma (IDC) was performed to identify the genes involved in IDC, as the majority of the patients in the present study sample (89.3%) had IDC of the breast (Fig. 2). Linker genes that were involved in IDC were SMAD4, EP300, PIK3CA, TP53, HIF1A and AKT1.
We analyzed pathways on a set of genes that are not linked together by checking 'show genes not linked to others' in FI Network Construction Parameters. Pathway Enrichment analysis revealed that genes HDAC3, NCOA1, NLRC4, COL1A1, RARA, WWTR1, and BRCA1 are enriched in the RNA Polymerase II Transcription pathway (Fig. 3).

Discussion
Genomes with CIN are characterized by various forms of structural genomic aberrations such as amplifications, insertions, reciprocal and non-reciprocal translocations and deletions [5]. In the present study the frequency of various structural (both chromatid-type and chromosome-type) and numerical chromosomal aberrations was significantly higher in patients than in controls.
Chromosomes that were observed to be frequently involved in aberrations in patients in the present study were 1, 2, 3, 4, 5, 8, 9, 17 and X. Similar aberrations in these chromosomes have been associated with invasive ductal carcinoma of breast and other subtypes [67][68][69]. Among these, chromosomes 8, 14, 4, 18, X, 3, 10, 20, 9 and 1 have also been observed to contain aberrant regions in breast cancer patients [70]. Large retrospective and prospective studies have provided evidence that patients whose tumors show high aneuploidy have a recurrence-free survival that is half as long as that of patients with a diploid distribution [71,72]. Apart from describing the ploidy of DNA content, i.e. diploid or aneuploid, the ploidy-based classification has also been used to understand the degree of genomic instability, which reveals the inconsistency of the DNA content in the tumor cell population [73,74]. In patients with mosaic variegated aneuploidy, premature sister chromatid separation is observed in more than 50% of lymphocytes. In various tissues aneuploidy is seen in more than 25% of cells, and this enhanced level of aneuploidy leads to higher chances of cancer in these patients [59,75].
Table 4 Genes harboured by the chromosomal regions recurring in anomalies in the present study sample. Source: Atlas of Genetics and Cytogenetics in Oncology and Hematology [49] and Genatlas database [50]
The pathway analysis was performed with Reactome FI to find the linker genes. Pathway enrichment was then performed to narrow these down to the linker genes specifically involved in IDC of the breast; the genes identified were SMAD4, EP300, PIK3CA, TP53, HIF1A and AKT1. SMAD4 is known to be mainly involved in pancreatic and colorectal cancer [76]. Mutations in EP300 have been frequently found in skin squamous cell carcinoma and various types of lymphomas [77]. PIK3CA alterations have been reported at higher frequency in endometrial, breast and bladder cancers [78]. TP53 is a tumor suppressor gene and has been found to be mutated in a variety of cancers [79]. As a result of loss of function of various tumor suppressors, the levels of HIF1A increase, indicating that higher HIF1 activity is a common pathway in the pathogenesis of various human cancers [80]. Mutations in regulators of the AKT1 signalling pathway are known to induce oncogenic transformation in human cells. These have been observed mainly in glioma and endometrial cancer but infrequently in cancers such as prostate cancer, melanoma, non-small cell lung cancer, breast cancer and hepatocellular carcinoma [81].
Finally, pathway analysis was performed without taking linker genes into account. Pathway Enrichment in Analyze Network Functions was performed in the Reactome FI application of Cytoscape to find which cellular pathway is enriched by our query genes. The analysis narrowed to 7 genes, HDAC3, NCOA1, NLRC4, COL1A1, RARA, WWTR1, and BRCA1, which were significantly enriched in the RNA Polymerase II transcription pathway (p = 0.002, FDR = 0.01). RNA Pol II is involved in gene transcription, playing a significant role in recruitment, initiation, elongation and dissociation [82,83]. The role of RNA polymerase II transcription in tumorigenesis has been elucidated in previous studies [84]. In mouse lymphoma models, tumor cells were observed to be more sensitive to apoptosis than wild-type cells after treatment with RNA polymerase II transcription inhibitors [85][86][87]. Enhanced transcription of oncogenes and various transcription factors is associated with transformation in cancer cells [88]. Components of the transcriptional apparatus, various oncogenes and ribosomal genes are overexpressed in tumor cells in order to maintain proliferation [89][90][91]. RNA Pol II transcription is additionally required to meet the high demand for transcripts such as those of oncogenes and anti-apoptotic factors, which support fast growth and resistance to apoptosis [92].
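Because enrichment is tested across many pathways at once, a false discovery rate (FDR) is reported alongside the p-value. The sketch below shows the Benjamini-Hochberg adjustment as one widely used way of obtaining such values; whether the Reactome FI application computes its FDR in exactly this way is an assumption here, and the function name and the p-values in the usage note are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (illustrative sketch only).

    Each p-value is scaled by (number of tests / its rank) and then
    made monotone by taking running minima from the largest p-value
    downwards. Results are returned in the original input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Calling `benjamini_hochberg` on a list of per-pathway p-values (for instance the hypothetical list `[0.01, 0.04, 0.03, 0.005]`) returns one adjusted value per pathway, each at least as large as the raw p-value it corresponds to.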
The majority of the subjects in the present study, both patients (67.3%) and controls (84%), were obese with increased central obesity. In the context of obesity, the tumor microenvironment induces an enhanced level of tumor-infiltrating myeloid cells with an activated NLRC4 inflammasome, which further activates IL-1β, thus driving progression of disease through adipocyte-mediated VEGFA expression and angiogenesis [99]. Obesity might aid the progression of cancer through the pathways linked with NLRC4 and VEGFA. Thus, the prevalence of obesity can also have implications for breast cancer risk in the present study sample. Cellular expression of COL1A1 has been reported to possibly promote breast cancer metastasis. This became evident from a study which reported that high levels of COL1A1 were associated with poor survival, while a better response to cisplatin-based chemotherapy was observed in ER+ breast cancer patients who had increased COL1A1 levels [100]. Breast cancers displaying RARA amplifications show sensitivity to retinoic acid [101], and thus these subtypes of breast cancer can be treated with targeted therapies [102]. WWTR1 also plays a significant role in migration, invasion and carcinogenesis of breast cancer cells [103]. BRCA1 interacts with a variety of other proteins to carry out multiple cellular functions such as cell cycle control, DNA damage repair, regulation of transcription, replication, recombination and chromatin hierarchical control [104]. In breast cancer patients from the same geographical region of north India, no association of breast cancer risk with the BRCA1 variants c.190T>C, 1307delT, g.5331G>A and c.2612C>T was observed [105].
Previous reports have also highlighted the significance of integrative analysis of copy number variations and gene expression profiles in breast cancer [106,107]. The current study employs the information inferred from chromosomal instability to determine the genes and pathways associated with breast cancer. The genes/pathways deduced can be followed up by looking for potential mutations that act as key players in breast carcinogenesis. Following up on a lead from the present study, gene expression profiling of the same individuals can be performed. This expression profiling can pinpoint the relevant functional impact of chromosomal instability.

Conclusion
Breast cancer is a heterogeneous disease in which mutations in various genes can lead to disease progression. It is therefore important to mark out the cellular pathways involving multiple genes in order to gain deeper insight into cancer causation. The present study is the first of its kind in which the results of conventional cytogenetics have been exploited to perform gene enrichment analysis. The in silico pathway analysis based on chromosomal instability in PBLs of breast cancer patients hinted towards the RNA polymerase II transcription pathway. Association with breast cancer risk of variants in some of the genes (p53, HIF, BRCA1 and VEGF) involved in this cellular pathway has been reported from the same population of North India. Further experimental work can help in identifying mutated genes in the pathway and sub-networks and in finding their relation with breast cancer progression and metastasis.

Supplementary information
Supplementary information accompanies this paper at https://doi.org/10.1186/s12920-020-00811-z.
Research, Vallah, Amritsar, Punjab for allowing access to patients and other facilities helpful in carrying out the present study.

Authors' contributions
VS and KG conceptualized and designed the experiment. AK performed the experiments. AK and VS analyzed the results and prepared the manuscript. All the clinicians NRS, MSU, MM, and MS were involved in the acquisition, analysis, correlation and interpretation of the clinical data of patients. NRS and MSU diagnosed the cancer patients and carried out the relevant clinical testing and surgery. The tumor sample excised surgically was used to classify the patients into different stages by MM who carried out the histopathology and hormone receptor related tests. MS correlated the clinical data, the epidemiological data and coordinated the therapy regime of patients. All authors read and approved the final manuscript.

Availability of data and materials
The raw datasets generated and/or analyzed during the current study are not publicly available in order to protect participant confidentiality. The genes present on the chromosomes involved in anomalies can be accessed from Atlas of Genetics and Cytogenetics in Oncology and Hematology [49] https://atlasgeneticsoncology.org/ and Genatlas Database [50] https://genatlas.medecine.univ-paris5.fr/. On the homepage of Atlas of Genetics and Cytogenetics in Oncology and Hematology, the chromosome number was selected from 'ENTITIES: by chromosomal band' and then the genes present on the particular location/band were identified. These genes have the following atlas ID(s) on the Atlas of Genetics and Cytogenetics in Oncology and Hematology-1p32:

Ethics approval and consent to participate
This study was conducted after approval by the institutional ethical committee of Guru Nanak Dev University, Amritsar, Punjab, India. All the subjects gave their written consent to participate in the study and volunteered to provide 5 ml of their blood sample and their personal information. All the study subjects were more than 20 years of age; no children were included in the study. Therefore, parental consent was not obtained.