We use the following three metrics to measure NMD activity in a biological sample: (1) the mRNA expression level of NMD target genes, (2) the usage (i.e., percentage) of NMD-inducing splicing isoform in NMD target genes which have both NMD-inducing and NMD-free isoforms, and (3) the abundance ratio of mRNAs derived from the NMD-inducing allele to the NMD-free one of the same gene. We name these three metrics as R
mRNA
, R
isoform
, and R
allele
, respectively. In principle, if NMD activity is strong in a sample, these metrics will have small values, because the NMD-inducing forms are more effectively degraded. To calculate these metrics, we collected RNA-seq data from a large-scale study [9] which produced data for 72 patients with each patient having both tumor and adjacent normal tissue samples sequenced (Additional file 1: Table S1).
We first describe the procedure to calculate the metric R
mRNA
, i.e., the mRNA abundances of NMD target genes. The first step was to identify NMD target genes. For that, we collected NMD target genes from four studies [15,16,17,18]. These studies measured gene expression changes after inhibiting NMD. We classify all genes whose expression was upregulated upon NMD inhibition as NMD targets. Note the targets by this approach may include direct as well as indirect NMD targets, but our purpose is to identify genes which can be used as an indicator of NMD activity, so any genes with upregulated expression after NMD inhibition are informative. In total, 951 genes are classified as NMD targets by at least one study, and 50 genes are supported by at least three studies (see Methods section). For accuracy, we use these 50 genes as our NMD target gene set (Additional file 2). To eliminate gene expression variation across samples due to systematic bias, we normalize the expression of NMD targets by dividing it with the median expression value of the 4074 non-target genes (Additional file 2; also see Methods). Then R
mRNA
is calculated for each gene using the following Eq. (1):
$$ {R}_{mRNA}={mE}_{NMD}/ median\_{mE}_{nonNMD} $$
(1)
where mE
NMD
is the mRNA expression of an NMD target gene and median_mE
nonNMD
is the median value of all 4074 non-target genes from the same sample. To infer NMD activity in tumor relative to in a normal tissue, R
mRNA
(tumor)/R
mRNA
(normal) is calculated, i.e., the ratio of R
mRNA
between a tumor and the corresponding normal tissue. As shown in Fig. 1, the ratio R
mRNA
(tumor)/R
mRNA
(normal) varies both among genes and among patients, and in a few patients it deviates from unity significantly (Fig. 1; Additional file 1: Table S2). This result suggests that most tumor samples have NMD activity comparable to their normal baselines and that some tumor samples experienced dramatic changes in NMD activity.
Next, we calculated R
isoform
, the usage of NMD-inducing alternative splicing (AS) isoforms in a given gene, as follows:
$$ {R}_{isoform}={sE}_{NMD}/\left({sE}_{NMD}+{sE}_{nonNMD}\right) $$
(2)
where sE
NMD
and sE
nonNMD
are the abundances of NMD-inducing and NMD-free splicing isoforms, respectively. Compared to the metric R
mRNA
, R
isoform
is supposed to be more sensitive for two reasons: (1) it can detect NMD activity changes when the changes only affect the relative abundance of NMD-inducing splicing isoform but not the total mRNA abundance of a gene; (2) R
isoform
uses the abundance of NMD-free isoforms of the same gene as a normalizing factor which is better than using the expression of other genes as this factor (as in R
mRNA
), because it is possible that the expression dynamics across samples may vary among genes. For calculating R
isoform
, we identified 776 AS events from 734 genes. Five hundred and twenty-eight and 248 cases trigger NMD when the alternative exon is included and excluded, respectively (see Methods for details). Among them, 14 AS events occur in all the samples (Additional file 3). To calculate R
isoform
in each sample, we use either all informative AS events occurring in the sample or only the 14 shared AS events. Similar to calculating R
mRNA
, we use the ratio of R
isoform
(tumor) to R
isoform
(normal) to account for the baseline difference among patients. As shown in Fig. 2 and Additional file 1: Table S3, when using all informative AS events in each sample, the ratio R
isoform
(tumor)/R
isoform
(normal) significantly deviates from unity in dozens of patients. The number of significant deviations is much larger than that when using R
mRNA
, verifying the inference of higher sensitivity by R
isoform
. When using the 14 shared AS events, we basically observe the same pattern (Additional file 1: Figure S1, Spearman’s Rho = 0.6127131, P = 1.063e-08). Although the metric R
isoform
is more sensitive than R
mRNA
, the ratios of tumor to normal samples from the two metrics are positively correlated (Additional file 1: Figure S2A, Rho = 0.3690621, P = 0.001422). The relationship remains when using the 14 shared events only (Additional file 1: Figure S2B, Rho = 0.2581838, P = 0.02882).
Finally, we calculate R
allele
as follows:
$$ {R}_{allele}={aE}_{NMD}/{aE}_{nonNMD}, $$
where aE
NMD
and aE
nonNMD
are the abundances of mRNAs derived from the NMD-inducing and NMD-free alleles of the same gene, respectively. For this analysis, all detected heterozygous nonsense mutations are included (Additional file 4). And again, the ratio of R
allele
(tumor) to R
allele
(normal) is calculated for each patient to infer NMD activity change. This approach has been used in multiple studies to check NMD efficiency [20,21,22]. However, in our analysis we do not detect significant difference between tumor and normal samples for any patient, probably due to a limited number of sites (Fig. 3; Additional file 1: Table S4). Nevertheless, a positive correlation between the median ratios of R
allele
(tumor)/R
allele
(normal) and R
mRNA
(tumor)/R
mRNA
(normal) across patients is observed (Additional file 1: Figure S3A, Rho = 0.2134399, P = 0.07183). And similarly, a slightly better correlation between (tumor)/R
allele
(normal) and R
isoform
(tumor)/R
isoform
(normal) is observed (Additional file 1: Figure S3B, Rho = 0.3099928, P = 0.00805). These results indicate that the metric R
allele
can capture the NMD activity change in tumors, but more sites are needed for making the metric effective enough.
Next we examine how these metrics are correlated with the expression of nine NMD effectors, including Upf1, Upf2, Upf3a, Upf3b, Smg1, Smg5, Smg6, Smg7 and Pnrc2 [23, 24]. Theoretically, we expect a negative correlation between the metrics of NMD activity and the expression of these factors, because a smaller metric value means stronger NMD activity conferred by higher expression of NMD effectors. We found that R
mRNA
is negatively correlated with the expression of smg1, smg7, upf2 and pnrc2, though not always statistically significant (Additional file 1: Table S5). Surprisingly, R
mRNA
is positively correlated with the expression of upf1, upf3a, and smg6 in some cases. These results suggest that NMD effectors may have contributed to NMD activity in a discordant way, with some effectors being more important than other in affecting NMD activity. However, it is unclear why some effectors such as upf1 are expressed at higher level where the NMD activity appears weaker. We also examine similar correlations by using R
isoform
, and none of these is significant and the signs of correlation are not always consistent with expectation (Additional file 1: Table S5). These together suggest that the change of NMD activity may not be consistent with the expression change of all NMD effectors, and thus a better understanding of NMD effectors in a regulatory network is needed for inferring NMD activity from the effectors’ expression.