Comprehensive analyses to identify LINC02310 as an enhancer in lung adenocarcinoma

Lung adenocarcinoma (LADC) is a major subtype of non-small cell lung cancer (NSCLC) and has one of the highest mortality rates. An increasing number of long non-coding RNAs (lncRNAs) were reported to be associated with the occurrence and progression of LADC. In order to find potential biomarkers for LADC therapies among lncRNAs, 2431 differentially expressed lncRNAs (DElncRNAs) over LADC and adjacent normal samples were identified from The Cancer Genome Atlas (TCGA) expression data. Lnc-YARS2-5, lnc-NPR3-2 and LINC02310, which were the top-3 significant DElncRNAs related to overall survival, were selected for further analysis. Their overexpression indicated poor prognostic. PCR experimental results also showed that they were mainly up-regulated in the LADC tissues from Xiangya Hospital. Clinical analysis of these cases suggested that LINC02310 was significantly correlated with TNM-stage and T-stage, which was further validated by the clinical analysis based on TCGA. Reasonably, we assumed that LINC02310 acted as an enhancer in LADC, which was demonstrated by CCK-8 assay and colony formation assay. Moreover, 3 targeted miRNAs of LINC02310 and 414 downstream differentially expressed mRNAs (DEmRNAs) were predicted. The competing endogenous RNA (ceRNA) regulatory network was constructed with LINC02310, 3 miRNAs and 113 DEmRNAs involved. The downstream mRNAs were then enriched in 405 GO terms and 11 KEGG pathways, which revealed their potential functions and mechanisms. The protein-protein interaction (PPI) network showed the relationships between the downstream mRNAs. In conclusion, this study verified LINC02310 as an enhancer in LADC and conducted comprehensive analyses on its target miRNAs and downstream mRNAs.


Introduction
NSCLC is a major cause of cancer-derived death worldwide, accounting for 85% of all lung cancers [1].
It can be classified into four subtypes: Lung adenocarcinoma (LADC), lung large cell carcinoma (LCLC), lung squamous cell carcinoma (LSCC) and lung neuroendocrine tumor (Lung NET) [2]. LADC, as a major type of NSCLC, has one of the highest mortality rates. Although LADC treatments including surgery, chemotherapy, radiotherapy and other molecular target therapies have been improved, the long-term survival remains dim. Therefore, it is quite necessary to find new biomarkers for precise LADC treatments and predicting prognostic.
LncRNAs, which were previously considered as transcriptional noise, have been found to participate in regulatory biological processes such as transcriptional regulation, RNA processing and chromatin interaction. LncRNAs also participate in the stability and translation of mRNA and influence cell signaling [3,4]. In addition, lncRNAs have been reported to promote LADC cell proliferation, invasion, metastasis and tumor progression, such as lncRNA NEAT1 [5], DGCR5 [6], H19 [7]. Moreover, previous studies showed that lncRNAs play an important role in cancer drug resistance [8][9][10][11]. Therefore, it is reasonable to find certain new prognostic biomarkers for LADC in lncRNAs.
In this research, differentially analysis over lncRNA expression data from TCGA were conducted to screen DElncRNAs. Next, the prognostic value of DElncRNAs in LADC patients was estimated by survival analysis. The top-3 significant DElncRNAs associated with overall survival were selected for further analysis. Then, PCR experiments and the relevant clinical analyses were conducted to validate the part these three DElncRNAs play in LADC, among which LINC02310 was the most meaningful.
Moreover, CCK-8 assays and colony formation assays were performed to demonstrate that LINC02310 acted as an enhancer in LADC. Furthermore, ceRNA network was constructed to find the targeted miRNAs and downstream mRNAs. In addition, GO and KEGG pathway analyses of the downstream mRNAs provided a comprehension of their functions and mechanisms. Finally, PPI network was constructed to visualize the interactions between these downstream mRNAs. Overall, this study identified LINC02310 as an enhancer in LADC by bioinformatic analyses and experimental validation and conducted a comprehensive research on its targeted miRNAs and downstream mRNAs, which might benefit LADC diagnoses and therapies.

LADC data sets and preprocessing
The LADC data sets, including RNA expression quantification data and related clinical data were downloaded from TCGA database (https://portal.gdc.cancer.gov/). The expression of mRNAs and lncRNAs were obtained from HTSeq-counts files on Illumina platform. A total of 533 LADC samples and 59 adjacent normal samples were collected. The IDs of mRNAs and lncRNAs were converted using GENCODE Human Release 31 (GRCh38.p12; https://www.gencodegenes.org/human/).

Survival analysis
The R package survival (version 2.44-1.1) was used to evaluate the prognostic of DElncRNAs in LADC patients from TCGA. The expression data were normalized by the R package DESeq2. The LADC samples were divided into two groups: high expression (normalized counts > median) and low expression (normalized counts < median) in terms of each DElncRNA. The Log-Rank tests of survival probability in the two groups were conducted with P-value < 0.05 considered to be significant. The

PCR experimental validation
LADC and adjacent normal tissues were obtained from 22 patients who suffered from LADC and treated in 2015 in Xiangya Hospital, Central South University. All subjects had written consents, and the protocol was approved by the ethics committee of Xiangya Hospital (No. 201503303). All procedures conducted in the study were in accordance with the Ethics Standards Institutions Research Committee and the 1964 Helsinki Declaration and its subsequent amendments or similar ethical standards. All patients had no history of other malignancies and never received radiotherapy or chemotherapy. All tissue samples were immediately frozen in liquid nitrogen and stored at -80 °C.
All tumors and matched normal tissues were confirmed by two pathologists. Total RNA was extracted using the Trizol Reagent kit (Invitrogen) and was reversely transcribed into cDNAs by using HiScript II Q Select RT SuperMix for qPCR (Vazyme). Real-time PCR was performed on the cDNA templates using specific primers (TSINGKE) and the GeneCopoeia BlazeTaq™ SYBR® Green qPCR mix (GeneCopoeia).
The lncRNAs relative expression levels were calculated as a ratio normalized to U6 expression.

Clinical analysis
The clinical analyses of the top-3 significant DElncRNAs were conducted based on the expression levels obtained from the above PCR experiments and the clinical tissues of the 22 patients of LADC.
Moreover, LINC02310 expression and related LADC clinical data from TCGA were combined to verify the clinical relevance. Fisher's accurate test method was used to calculate the P-value.

Colony formation assay
For the colony formation assays, the cell line H1299 (1×103) transfected with OE, NC, SI and SI-NC were transferred into 6-well plates and then incubated at 37°C for 10 days. Colonies were dyed with dyeing solution containing 0.1% crystal violet and 20% methanol. Cell colonies were then counted and analyzed.

GO functional annotation and KEGG pathway analysis
The R packages clusterProfiler (version 3.10.1) [34] and org.Hs.eg.db (version 3.7.0) were used for statistical analysis and visualization of functional profiles for the predicted downstream DEmRNAs.
The Gene Ontology (GO) functional annotation, including molecular function (MF), biological process (BP) and cellular component (CC) [35], and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were conducted with P-value < 0.05. The mRNA IDs were converted for the preparation of KEGG according to HGNC database (https://www.genenames.org/).

PPI network
Interactions of the downstream DEmRNAs were predicted and organized according to the STRING database (version 11.0; https://string-db.org/). The interactions were screened with combined score > 0.7 and then divided into different modules using MCODE (version 1.5, an app for Cytoscape, http://apps.cytoscape.org/ apps/mcode). The module of highest combined score was selected to construct the PPI network, which was visualized using the software Cytoscape.

DElncRNAs in LADC
A total of 12459 lncRNAs were extracted from TCGA database. Figure 1 shows the volcano plots of RNAs in LADC and adjacent normal samples with a cutoff |LFC| > 1.5 and FDR < 0.05. As a result, 2431 DElncRNAs (Additional file 1: Table SI, 1931 up-regulated and 500 down-regulated) were identified. The top-50 DElncRNAs were visualized in the heatmaps (Figure 2).

Survival analysis of the DElncRNAs
The Log-Rank tests of survival probability between high expression and low expression groups in terms of each DElncRNA were conducted. Setting P-value < 0.05 as a cutoff, 389 DElncRNAs were identified as significantly related to overall survival (Additional file 2: Table SII). Lnc-YARS2-5, lnc-NPR3-2 and LINC02310 were the top-3 significant DElncRNAs related to survival. The KM survival plots of these three significant DElncRNAs were shown in Figure 3. It can be observed that high expression of the three DElncRNAs lead to poor survival rates.

PCR experiments and clinical analyses
Lnc-YARS2-5, lnc-NPR3-2 and LINC02310, which were the top-3 significant DElncRNAs related to overall survival, were further investigated by PCR experiments based on the expression level in LADC tissues and its related clinical features. The expression quantification ( Figure 4) were obtained with the 2 −ΔΔCt method and showed that the three DElncRNAs were mainly up-regulated in the LADC tissues, which was consistent with the survival analysis results. The expression level was classified as high expression group (LFC > 0.5) and low expression group (LFC < -0.5). The clinical analysis (Table   1) showed the three lncRNAs were correlated with clinical T-stage and TNM-stage.  It can be assumed that LINC02310 promotes cell proliferation in LADC since the expression of LINC02310 is positively associated with LADC progression. To verify this assumption, the linear transcript expression vector was constructed. Next, OE, NC, SI and SI-NC were transfected into H1299 and PC-9 cell lines, respectively. Subsequently, the effect of LINC02310 on cell proliferation was examined. CCK-8 assays showed that forced expression of LINC02310 enhanced proliferation of H1299 and PC-9 ( Figure 5). Similarly, colony formation assay indicated that H1299 cell proliferation was significantly increased by forced expression of LINC02310 ( Figure 6). It can be concluded that LINC02310 overexpression substantially promoted cell proliferation and depressing LINC02310 expression conspicuously inhibited cell proliferation.

CeRNA regulatory network construction
Binding sites of miRNAs with LINC02310 were illustrated in Figure 7 according to lncRNASNP2 database. Binding miRNAs with reads of exon model per million mapped reads (RPM ≥ 1) were selected to interact with 2227 DEmRNAs (|LFC| > 2 and FDR < 0.01, Additional file 3: Table SIII) via starBase. As a result, has-miR-1270, has-miR-506-3p and has-miR-330-3p were involved in 561 interactions with 414 DEmRNAs (Additional file 4: Table SIV). The downstream DEmRNAs with |LFC| > 3.5 and FDR < 0.01 were selected from these interactions to construct the ceRNA network ( Figure 8) using Cytoscape. The ceRNA network contained 117 nodes and 149 edges with LINC02310, 3 miRNAs and 113 DEmRNAs involved.

GO functional annotation and KEGG pathway analysis
GO analysis result (Additional file 5~7: Table SV~SVII)  with DNA binding activities. In addition, the most enriched CC was spindle ( Figure 9C). KEGG pathway analysis ( Figure 9D, Additional file 8: Table SVIII) showed that 11 pathways were significantly enriched with P-value < 0.05, which indicated that the downstream DEmRNAs were conspicuously associated with metabolism activities.

Protein-protein interaction network analysis
A total of 1081 pairs of DEmRNA interactions with 226 downstream DEmRNAs involved were identified by the STRING database (Additional file 9: Table SIX). These interactions were analyzed and divided into different modules using MCODE. 35 DEmRNAs involved in the module with the highest combined score were selected to construct the PPI network ( Figure 10). It can be observed that CDK1 has the highest degree (degree = 57).

Discussion
LADC is a major subtype of NSCLC and has a very high mortality rate [36]. It is quite necessary to provide diagnostic biomarkers for LADC treatments. This study identified LINC02310 as an enhancer were visualized, most of which were reported to associate with cancers. For example, nuclear division, which was the most significant BP term, is associated with promoting malignant progression in colorectal cancer [38]. Transcription factors play important roles in various cancers [39][40][41]. More potential biomarkers for LADC may be found by investigating these downstream DEmRNAs.
Furthermore, the PPI network of the downstream DEmRNAs was constructed and the genes of the highest degrees were obtained, including CDK1, TOP2A and KIF11, which were reported to take a remarkable part in proliferation and drug resistance [42][43][44][45][46][47].

Conclusions
In this study, LINC02310, which has rarely been investigated previously, was identified as an

Conflicts of Interest
The authors declare no conflicts of interest.       Binding sites of miRNAs with LINC02310.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.