It is well accepted that the accumulation of multiple genetic events in different genes and molecular pathways is the main cause of OSCC evolution [3, 32]. Previous studies have identified various types of genetic aberrations in OSCCs and oral dysplasias, the precursors of OSCCs, including somatic mutations in the D-loop of mtDNA sequence  and in exons nine and 20 of Phosphatidylinositol 3-kinase gene (PI3K) , common deletions on chromosome 3p such as the 3p14 locus that harbors FHIT (Fragile Histidine Triad) [35–37], as well as gene copy number increases in certain oncogenes such as EGFR and CCND1[31, 38, 39]. Evidence from microarray studies has also revealed differentially expressed genes in oral cavity tumors [40–44], suggesting multiple dimensions of genetic aberrations contributing to OSCC development. Here, we presented a whole transcriptome analysis to identify exonic somatic mutations in two OSCC samples. To overcome the small sample size, we have developed a stringent bioinformatic pipeline with multiple filters to reduce false positives. In total, we have identified 515 SMGs which were significantly mutated, and 156 TDGs with disruptive mutations in both tumor samples. We also measured gene expression and found SMGs were enriched in differentially expressed genes, implying that the accumulation of genetic aberrations may regulate corresponding gene expression and further affect tumor evolution.
Five of six genes identified in both SMGs and TDGs are known driver genes in COSMIC database, and the remaining gene GTF2H5 stimulates the ATPase activity of ERCC3, a nucleotide excision repair gene, to trigger DNA opening during DNA repair. Since genes involved in DNA repair functions are commonly associated with oral cancer [45–47], it is very likely that GTF2H5 is also related to carcinogenesis. Collectively, these observations indicate our bioinformatic pipeline has substantial power to identify tumor-related genes.
Of 515 SMGs, several membrane-related GO terms were enriched, including intrinsic to membrane, integral to membrane, intrinsic to plasma membrane and integral to plasma membrane. Interestingly, the original study found that the term of intrinsic to plasma membrane was enriched in mis-regulated genes in tumor samples , suggesting that disruption or mis-expression of genes related to plasma membrane may be involved in tumor development. We also found six tumor related genes, TUSC2, TP53I3, TSSC4, RAB23, RAB39A, and ERG, which function as either tumor suppressor genes or oncogenes. Additionally, we identified FGF2, a fibroblast growth factor in the FGF signaling pathway, which was reported to be important in OSCCs . Another pathway potentially associated with OSCC is the cell adhesion molecules (CAMs), which is also enriched in SMGs. CAMs are essential component to maintain the structure of stratified squamous epithelium and a critical mediator of tumor progression in OSCC [48, 49], and mis-expression or dysfunction of CAMs are shown to contribute to malignant tumors . The excess of CAMs in SMGs further suggests the critical role of the CAM pathway in OSCCs. Again, cell adhesion was found to be enriched in mis-regulated genes in the original study , confirming that tumor development may involve both mis-expression and dysfunction of CAMs.
Tumor driver genes are normally considered as with high somatic mutation rate, thus 156 TDGs identified without information from mutation rate are intriguing. Besides six genes also identified as SMGs, we also found that 57 (37%) TDGs significantly mutated in one tumor sample but not in the other tumor sample. Considering that only two patients were used in this study and a large proportion of TDGs were significantly mutated in only one sample, it is possible that some TDGs are in fact SMGs, but failed to be identified here due to the small sample size. We thus suggest that screening TDGs may be an alternative way to identify candidate cancer driver genes when sample size is limited.
Although a few pioneer studies demonstrated that RNA-Seq is suitable for identifying somatic mutations [14–17, 50, 51], there is a concern that RNA-Seq is prone to error  and may generate a high false discovery rate due to incorrect alignment of reads, sequencing errors or extremely high or low read coverage. To minimize the false positive rate, we have applied a series of stringent filters. First, we only used reads in which each base has a Q-score ≥ 20, which reduces the influence of sequencing errors. Next we filtered out read alignments with a mapping quality lower than 30, which avoids reads mapped to multiple locations alignments with low similarity. Then we required each qualified variant must have a read depth between three and 500. Our strategy to identify somatic mutations also automatically removed the effect of systematically incorrect alignments which present in both tumor and matched samples. Hence we believe that somatic mutations identified in this study provide a substantial list of candidates for biomarker development. However, it should also be noted that we only focused on exonic regions captured by RNA-Seq, somatic mutations in regulatory regions will not be identified here, therefore out list also presents a portion of somatic mutations in OSCCs.