Identification of biomarkers for amyotrophic lateral sclerosis by comprehensive analysis of exosomal mRNAs in human cerebrospinal fluid.

BACKGROUND
Exosomes are a subset of extracellular vesicles 30-200 nm in diameter secreted from cells, which contain functional mRNAs and microRNAs. Cerebrospinal fluid (CSF) is the primary source for liquid biopsy to examine diseases in central nervous system. To date, there is no available method to analyze exosomal mRNAs comprehensively in human CSF.


METHODS
The main purpose of this study is to established the methodology of comprehensive analysis of exosomal mRNAs in CSF by a highly sensitive next-generation sequencing. The signatures of CSF exosomal mRNAs were then compared between four normal healthy donors and four sporadic amyotrophic lateral sclerosis patients to identify disease-related biomarkers. Differentially expressed genes were identified by DESeq2.


RESULTS
RNA sequencing from CSF exosomes was successfully performed, that was demonstrated by the high pearson's product-moment correlation coefficient (r = 0.993) in the technical replicates. Also, position coverage analysis revealed that most detected mRNAs retained their integrity throughout their full-length in CSF exosomes. In CSF exosomes from normal healthy donors, an average of 14,807 genes were detected, of which 4580 genes were commonly detected among four individuals, including neuron-enriched genes such as TUBB3 and CAMK2A. In comparison with exosomal mRNAs in CSF from four patients with amyotrophic lateral sclerosis, 543 genes were significantly changed, as represented by CUEDC2. Gene Ontology analysis and pathway analysis with these genes revealed functional enrichment of ubiquitin-proteasome pathway, oxidative stress response, and unfolded protein response. These pathways are related to pathomechanisms of amyotrophic lateral sclerosis.


CONCLUSION
We successfully established the methodology of comprehensive analysis of exosomal mRNAs in human CSF. It was shown to be useful to identify disease biomarkers for central nervous system. Several genes, such as CUEDC2, in CSF exosomes were suggested to be candidate disease biomarkers for amyotrophic lateral sclerosis.


Background
Extracellular vesicles (EVs) are secreted from many types of cells to extracellular space [1]. Exosomes are known as a representative subset of EVs and are seen as cup-like shaped nanovesicles (30-200 nm) by transmission electron microscope, characterized by the presence of exosome-enriched protein markers (e.g., CD9, Flotillin-1) [2,3]. Secretion of exosomes from the cells is considered to be dependent on the formation of multivesicular bodies preceded by sorting of cargo contents [4]. Rab GTPase and ESCRT proteins may play roles in these processes [5]. At first, exosomes were considered as a "trash box", in which to throw away unnecessary proteins from the cells, particularly in the maturation of erythrocytes [6]. Lötvall's group reported the presence of mRNAs and microRNAs in exosomes secreted from mast cell lines. These exosomal RNAs were transferrable and functional in recipient cells [7]. That report shed light on the exosomes as a novel mechanism of cell-to-cell communication. Exosomal mRNAs were stable against RNase and freeze-thaw process [8,9]. These findings led to the hypothesis that exosomes play some important physiological roles. Therefore, perturbation in exosomal cargos in any disease state can be biomarkers for diagnosis, prognosis, and patient stratification. In addition, their elucidation will lead to better understanding of disease mechanisms. To date, many studies have reported specific signatures of microRNAs in plasma or serum as candidate diagnostic biomarkers [10].
In studies or examination of central nervous system, the biofluid primarily focused on is cerebrospinal fluid (CSF), as it comes into direct contact with brain tissue within the blood-brain barrier. Liquid biopsy of CSF is particularly important, as it is unfeasible to collect brain tissue from living individuals [11]. Exosomes are present in CSF and considered to reflect biological information of their source cells, especially brain tissue.
Amyotrophic lateral sclerosis (ALS) is an adult onset, fatal neurodegenerative disease characterized by the loss of motor neurons [12]. There are large unmet medical needs of disease-modifying therapy (DMT) for ALS. Indeed, nearly 50 randomized controlled trials for DMT have failed to show positive results in the past half-century. A lack of biomarkers related to the disease mechanisms is a barrier to the DMT development [13]. It is therefore highly desirable that there exist biomarkers to reflect the rate of disease progression, particularly towards implementation in clinical trials. For this reason, many efforts to identify molecular biomarkers for ALS have been made [14], for example, the protein levels of CSF neurofilament light chain and phosphorylated neurofilament heavy chain were proposed [15,16], and recently, NAIP protein level in peripheral mononuclear cells was also reported to be correlated with the rate of disease progression [17]. Nonetheless, no specific molecular biomarker has been established for ALS. Molecular biomarkers include not only proteins, but also RNAs. Although there have been several reports on micro-RNA signatures in CSF as biomarkers for ALS, the studies showed inconsistent results [18][19][20]. Furthermore, there is no report regarding mRNAs in CSF from ALS patients. This is, at least in part, due to an insufficient detection sensitivity for exosomal mRNAs in CSF, and consequent low reproducibility. Aberrant expressions of protein-coding mRNAs are more closely linked to disease mechanisms compared with microRNAs that have post-transcriptional gene silencing effects on wide range of coding genes [21]. Therefore, an analysis of mRNAs in CSF will offer the opportunity of biomarker identification based on the disease mechanisms. However, there is no available method to analyze exosomal mRNAs comprehensively by next-generation sequencing approach in CSF exosomes.
The purpose of the current study was to establish the methodology for comprehensive analysis of exosomal mRNAs in CSF to identify disease-related biomarkers for ALS. Although ultracentrifugation is the most frequently used means to isolate EVs, there may be large technical variability in the process and it is difficult to apply for large numbers of specimens. In addition, simple and robust procedures are required for clinical settings. Therefore, we adopted the affinity spin column-based exoEasy system for the isolation of EVs. In addition, a highly sensitive RNA-seq technology was applied to those exosomal mRNAs. Here, we demonstrate the feasibility of next-generation sequencing for exosomal mRNAs (exoR-NA-seq) in CSF from four normal healthy (NH) donors and sporadic ALS patients. The resultant mRNAs differentially expressed between NH and ALS enable us to speculate ALS pathomechanisms.

Characterization of isolated EVs as exosomes
EVs from human CSF were analyzed to determine their diameters and exosome-enriched protein markers, CD9 and Flotillin-1. Nanoparticle tracking analysis (NTA) revealed that isolated EVs had a mean diameter of 186 nm ± 70.4 nm, with the highest peak, representing the most frequently measured value, at 165 nm (Fig. 1a). Furthermore, these EVs were positive for CD9, CD81, and Flotillin-1 on western blotting (Fig. 1b). CD9 and CD81 are tetraspanin proteins, whereas Flotillin-1 is an endosome-associated protein. As the EVs were concentrated by ultrafiltration using 100 kDa MWCO filters, CD9, CD81 (~25 kDa each), and Flotillin-1 (48 kDa) were suggested to be trapped on the filter after ultrafiltration, not as free proteins but within EVs. The presence of these proteins of different classes strongly suggested that the isolated EVs were characteristic of exosomes.

Preparation of Illumina library for exoRNA-seq
Generally, cellular total RNA or universal human reference RNA (UHR) gives prominent peaks of 18S and 28S ribosomal RNAs in the quality check test by Agilent Bioanalyzer, and RNA integrity number is calculated from this ratio. However, RNA peaks were hardly detected in exosomal RNAs purified from 930 μL of human CSF from NH donors (Fig. 2a) and ALS patients. We did not follow the RNA purification procedure provided in the exoR-Neasy manufacturer's instructions, because that method also purifies small RNAs that interfere with library preparation with ultra low input of RNA, resulting in a compromised sequencing performance. Concentrations of RNA were below or as low as the detection limit of Agilent RNA6000 pico kit (50 pg/μL) in almost all samples. To overcome this limitation, switching mechanism at 5′ end of RNA template (SMART), one of the most sensitive preparation methods for whole transcriptome amplification, was applied for the first-strand cDNA synthesis and amplification of cDNA [22]. At a glance, this scheme appeared not to succeed because there was no detectable peak in Agilent High Sensitivity DNA assay (Fig. 2b), while positive control UHR with input of 100 pg generated an amplified peak at 2500 bp (Fig. 2c). As expected, preparation of Illumina library did not turn out well with 100 pg of DNA input to Nextera XT kit, which was described in the manufacturer's instructions. Alternatively, increase of DNA input to 1 ng successfully generated the Illumina Fig. 1 Characterization of EVs in human CSF. EVs were isolated from human CSF by exoEasy. a After buffer exchange to PBS, nanoparticle tracking analysis (NTA) was performed using Nanosight NS500. Mode diameter and mean diameter are described with standard deviation. b Presence of exosome-enriched protein markers (CD9, CD81, and Flotillin-1) were analyzed by western blotting. CD9 and CD81 corresponded to~25 kDa protein band. Flotillin-1 corresponded to 48 kDa protein band Fig. 2 Library preparation for exoRNA-seq with ultra low input of RNA. a Exosomal mRNAs were purified and investigated. b After first-strand synthesis of cDNA from purified exosomal RNA, amplified cDNA by SMART-seq v4 was investigated. c As a positive control, Universal Human Reference RNA (UHR) was used. d Successfully prepared Illumina library for sequencing is shown library. The average size was approximately 500 bp, which is suitable for next-generation sequencing.

Sequencing for exosomal mRNAs in CSF from NH donors
First, we assessed the analytical variability of our exoRNA-seq. Generated reads from sequencing on MiSeq were trimmed for the Illumina adaptor sequences and mapped to the human genome reference B37.3. In the analysis of technical replicates of mRNAs from CSF exosomes, they showed the high reproducibility demonstrated by the pearson's product-moment correlation coefficient (r = 0.993) in the plot by count per million mapped reads (CPM) (Fig. 3a). Next, we compared exosomal mRNAs in CSF between NH and ALS. An average of reads mapped to the human genome reference B37.3 was 37.2 million reads. An average of 14,807 genes were detected, ranging from 9814 genes to 20,598 genes (data not shown, see Additional file 1). This large variation might be partially attributed to the limited volume of CSF and extremely low input of RNA. Position coverage analysis revealed that most detected genes were covered from 5′ to 3′ end. At the same time, genes with longer transcripts showed biased coverage, possibly because of reduced efficiency in long distance-polymerase chain reaction (LD-PCR) (Fig. 3b). Considering the first-strand cDNA was primed by oligo dT primers complementary to poly-A tail, a lot of mRNAs retained their integrity throughout their full lengths. To investigate variation of contents among individuals, exosomal mRNAs in CSF exosomes from four NH donors were compared. As a result, 4580 genes were commonly detected in CSF exosomes from all four donors (Fig. 3c), accounting for 22-46% of genes detected. The abundance of these 4580 genes in each donor was compared with other three donors (Fig. 3d). Although there still existed the large variability in the genes with low expression, all pairs of comparison showed the high pearson's product-moment correlation coefficients (r > 0.9). As shown in Table 1, total reads in NH4 specimens were much less than others. It might have biased effect on genes with low expression in normalization to CPM. As a result, the correlation coefficients of NH4 with other 3 NH specimens were smaller than other pairs of comparison. Commonly detected genes included cellular housekeeping genes, such as ACTB and GAPDH. These genes were abundant in the exosomes as is the case with cellular RNA (Fig. 3e). Neuron-enriched TUBB3 and CAMK2A were also detected in all samples, although in much lower amounts than ACTB. This result suggested that CSF contained exosome populations having neuronal origin. However, GFAP that is specifically expressed in astrocytes was also detected in CSF exosomes (data not shown, see Additional file 2). This meant our exoRNA-seq detected mRNA signatures not only from neuronal cells but also from other cell types in brain tissues. Not only protein-coding mRNAs, but also functional long non-coding RNAs, NEAT1 and MALAT1 [23,24] were observed. Probably, protein-coding mRNAs and these long non-coding RNAs have different cellular trafficking from each other. Protein-coding mRNAs move to cytosol to be translated into protein, while NEAT1 and MALAT1 stay within the nuclear envelope after transcription. The mechanisms by which they are loaded to exosomes remains to be elucidated.

Identification of differentially expressed exosomal RNAs between NH and ALS
Expression profile of exosomal mRNAs in CSF from ALS patients was compared with those from NH donors, as described in the previous section. Four age-and sex-matched CSF specimens from sporadic ALS patients were chosen. These patients scored 41 to 45 on ALSFRS-R, and duration of disease ranged from six months to five years. ALS group showed significantly less total reads than NH group ( Table 1). Numbers of commonly detected genes in ALS group were 3098 (cf. 4580 in NH group). Numbers of undetected genes in all four specimens were 11,719 and 13,727 in NH and ALS group, respectively. Thus, there might be some biases in the comparison between NH and ALS group. Based on the hypothesis that disease conditions in ALS would be different from healthy conditions, we did not exclude the uncommon genes in NH group in the following DESeq2 analysis. In false discovery rate (FDR) correction by Benjamini-Hochberg method, 5006 genes could be calculated for adjusted p-values (data not shown, see Additional file 3). Thus, the genes with no detection in any specimens or with outliers in some specimens were not calculated for adjusted p-values. DESeq2 analysis resulted in 543 genes that were significantly changed between NH donors and ALS patients groups, with adjusted p-values less than 0.05 (Fig. 4a). Among these genes, 133 genes were upregulated and 410 genes were downregulated in ALS patients group. The top 10 statistically significant DEGs are listed in Table 2. Several genes were dramatically changed between NH donors and ALS patients groups. Particularly, CUEDC2, CUE domain-containing protein-2, was only detected in ALS patients group (Fig. 4b). Interestingly, CUEDC2 has not been reported so far regarding relationship with ALS. CUEDC2 has ubiquitin-binding motif, as represented in its name, and regulates ubiquitin-proteasome pathway [25]. It is also known to be related to inflammatory response such as SOCS3 [26]. In ALS patients group, all samples covered all exon sequences of CUEDC2, suggesting that there were full-length CUEDC2 mRNAs without fragmentation in this group. However, there was no sequence mapped to this gene in NH donors group. Again, since reverse transcription relied on the presence of poly-A tail, there were two possibilities. One is that there were CUEDC2 mRNAs without in 3′ end region in NH group. Exosomes have also been considered to contain fragmented mRNAs. The other is that there was no detectable amount of the gene transcript. Next, representative DEGs were compared between the two groups. In contrast, ACTB and GAPDH, described in Fig. 3, were again depicted. ACTB and GAPDH resulted in comparable CPM in both groups ( Fig. 4 and Additional file 2). As with CUEDC2, RAB11A was highly presented in ALS patients group, while low CPM was also detected in NH group. RAB11A is a member of the small GTPase superfamily, and plays roles in the transport of proteins, lipids, and vesicles. In contrast, CCT7 and TMEM222 were only detected in NH donors group. CCT7 encodes a molecular chaperone that plays roles in protein folding. The result of RNA-seq should be validated by other detection method. The abundance of ACTB and CUEDC2 in  exosomal mRNAs was examined by probe-based quantitative real time polymerase chain reaction (qRT-PCR). As observed in exoRNA-seq, ACTB showed comparative level among samples although Ct values of ACTB were slightly larger in ALS group than those of NH group (Fig. 4d). It might be correlated with the reduced total reads in RNA-seq in ALS group compared with NH group. CUEDC2 could be detected in all ALS specimens and it could not be detected in NH group with exception of one specimens with detectable CUEDC2 (Fig. 4e). The difference of Ct values in CUEDC2 between NH and ALS groups gave p-value of 0.0569 by Welch's t-test after confirmation of the Gaussian distribution by Kolmogorov-Smirnov test. Thus, although the statistically significant difference in CUDEC2 was not observed between NH and ALS groups, a similar expression pattern was observed in qRT-PCR. To investigate whether there was functional enrichment in DEGs, GO analysis was performed. Decrease in ligase activity was ranked at the top (Table 3). Subsequently, decreases in response to oxidative stress and ubiquitin-protein ligase activity were included. Next, pathway analysis was also performed, by IPA. As in GO analysis, protein ubiquitination pathway and superoxide radical degradation, which are related to ALS pathomechanisms, were listed (Table 4). In regards to oxidative stress, mutation in SOD1 gene is the second major cause of familial ALS next to C9orf72 repeat sequence in untranslated region [27]. Although the amount of SOD1 transcript was not different between NH donors and ALS patients groups, SOD2 and SOD3 were detected as DEGs. Additionally, unfolded protein response was also specified.

Discussion
In this study, we achieved to develop exoRNA-seq that could identify DEGs in CSF between ALS and NH groups. Also, this is the first report to show the integrity of exosomal mRNAs in CSF. EVs in CSF were isolated using spin column-based exoRNeasy technology instead of ultracentrifugation. The validity of exoRNeasy was previously reported in comparison with ultracentrifugation for RNA analysis in serum and plasma EVs [28]. We demonstrated that EVs isolated from human CSF were also characterized as exosomes. In addition to our finding here that neuron-enriched genes were present in CSF exosomes, there was a report that repertoire of exosomal microRNAs in human CSF reflected those in brain tissues [29]. Although it is difficult to prove physiological roles of these exosomal mRNAs and microRNAs in adult humans, exosomes secreted from neurons are considered to play a role in synaptic activity [30]. Thus, exosomes in CSF are thought to be a useful source for investigation of alteration in neuronal cells as a surrogate material of brain. Besides traditional proteomic or microRNA profiling analysis, our exoRNA-seq provides the unique and novel approach to methodologies of biomarker identification. We reported that more than 4500 genes were commonly detected in exosomes in CSF from NH donors, and also showed that housekeeping genes such as ACTB and GAPDH in the exosomes were also abundant and consistent among four individuals. DESeq2 can be applied to RNA-seq with normalization by logarithm of geometric mean of all genes. Upon further validation in RT-PCR, how the data is normalized should be considered because there is no information in housekeeping genes in exosomes. If ACTB  and GAPDH were to show stable amounts among individuals in a large cohort, they could be used as internal controls, as is the case with cellular RNAs. We demonstrated that exosomal mRNAs can feasibly be biomarkers for diagnosis of ALS. CUEDC2 was the most dramatically increased exosomal mRNA in CSF from ALS patients. At least, inflammatory cytokines such as IL-6 were elevated in CSF from ALS patients compared with control donors and patients with multiple sclerosis [31]. The increase in CUEDC2 might reflect neuroinflammation related to the regulation of SOCS3. Also, ubiquitin-proteasome pathway is related to pathomechanisms of ALS. In lesions in ALS, ubiquitin-immunoreactive inclusion bodies were observed [32]. TDP-43 protein coded by TARDBP gene was found to co-localize in these inclusion bodies [33]. In pathologic condition, TDP-43 is considered to be ubiquitinated, truncated, phosphorylated, and misfolded [34]. The exact mechanisms of increase in exosomal CUEDC2 remains to be elucidated. Molecular chaperone-related CCT7 was also related to protein folding in cellular model of ALS [35]. Ubiquitin-proteasome pathway and unfolded protein response in pathway analysis may reflect these aspects of pathomechanisms. Oxidative stress response is also related to ALS pathomechanisms. Top-ranked sirtuin signaling pathway is also related to oxidative stress response. It was also reported that SIRT1 was upregulated in mouse model of ALS [36].
As this study focused on the technical development of exoRNA-seq and feasibility of biomarker identification by exoRNA-seq, it was performed with small sample size of four NH donors and four ALS patients. There existed two limitations in this study. One is low statistical power due to small sample size. It was considered as a main reason why so many genes were determined as DEGs. In DESeq2, FDR correction can be applied to the genes without no outlier identified cook's distance. Small sample size might not be enough to identify true outliers. The other is extremely low abundance of exosomal mRNAs that might cause the large variation of gene expression among four NH specimens in Fig. 3d. However, the reproducibility of our method was assured as shown in Fig. 3a. Thus, the large variation among four NH samples was not caused by technical instability but by another factors such as biological variation or artificial characteristics of specimens (i.e. Preservation period of the commercially available CSF after collection was different from each other.). The low abundance of exosomal mRNAs was observed especially in ALS. In comparison between NH and ALS group, there were more downregulated genes than upregulated genes. This is possibly due to some biases in genes with low expression upon normalization process. However, it was notable that upregulated genes were also detected in exoRNA-seq and same tendency was confirmed by qRT-PCR in respect to CUEDC2. Consequently, we identified biomarker candidates that were linked to pathomechanisms of ALS. The significance of these candidate genes should be verified with large sample size that ensure enough statistical power. Specificity among other neurodegenerative diseases is also important and should be verified.  PPARG,POLR1D,ATG12,TIMM44,ATG10, TIMM17B,MT-ND4,GLUD1,HIF1A,ESRRA,  STAT3,MT-ND4L,AGTRAP,XPA,SOD3, SOD2,MT-ND3,

Conclusions
We successfully developed and implemented exoRNA-seq in CSF. We found exosomal mRNAs in CSF as a promising biomarker to reflect pathomechanisms of ALS. Thus, we conclude that comprehensive analysis of exosomal mRNAs in CSF is useful to identify candidates of disease biomarkers for central nervous system with speculation of disease pathomechnism.

NTA
For NTA, purified EVs were concentrated and dispersion medium was exchanged with phosphate-buffered saline by Vivacon-500,100 kDa molecular weight cutoff (MWCO) spin column (Sartorius, Gottingen, Germany). Presence of nanoparticles and their Brownian motion were observed using a Nanosight NS500 (Malvern Instruments, Malvern, UK). Particle diameter was calculated from velocity of Brownian motion.

Western blotting
Intact EVs were isolated and concentrated as described in the procedure for NTA. They were mixed with 4x NuPAGE LDS sample buffer and 10x Sample Reducing Agent (Thermo Fisher Scientific, MA, USA). The mixture was incubated at 70°C for 10 min for reduction of disulfide bond. Proteins were separated by electrophoresis with NuPAGE SDS-PAGE system (Thermo Fisher Scientific) with protein size standard (Bio-Rad Laboratories, CA, USA) and blotted to PVDF membrane (Bio-Rad Laboratories). The blot was blocked with PVDF Blocking Reagent for Can Get Signal (TOYOBO, Osaka, Japan). To identify isolated EVs as exosomes, primary antibodies against exosome-enriched protein markers CD9, CD81 (mouse mAb from Novus Biologicals, CO, USA), and Flotillin-1 (mouse mAb from BD Biosciences, NJ, USA) were used. All of them were diluted 1000-fold in Can Get Signal solution 1. HRP-conjugated anti-mouse secondary antibodies (Cell Signaling Technology, MA, USA) were used with dilution of 5000-fold in Can Get Signal solution 2.

Library preparation for exo-mRNA-seq
To prepare a library for exo-mRNA-seq from extremely low amount of RNA, SMART-seq v4 Ultra Low Input RNA Kit for Sequencing was used (Takara Bio, Shiga, Japan). UHR was used as a positive control (Agilent Technologies). Input volume of RNA was adjusted to 9.5 μL, which is the maximal volume described in the instructions. Amplification of cDNA was performed for 15 cycles for LD-PCR. The concentration of purified cDNA was measured with Qubit dsDNA HS Kit (Thermo Fisher Scientific). Illumina sequencing adaptors and indices were ligated to tagmented DNA by Nextera XT DNA Library Prep Kit (Illumina, CA, USA) with input DNA amount of 1 ng. Size distribution and concentration of prepared library was evaluated using an Agilent High Sensitivity DNA Kit (Agilent Technologies) and Qubit dsDNA HS Kit, respectively.

Next-generation sequencing
Paired-end sequencing for 76 bp was run on MiSeq and NextSeq 500 system with MiSeq Reagent Kit v3 and NextSeq 500/550 High Output Kit, respectively (Illumina). The run for technical replicates of mRNAs from CSF exosomes was performed on MiSeq. ExoRNA-seq for the comparison between NH and ALS was run on NextSeq 500. Each read in generated FASTQ file was mapped to human genome reference B37.3 with the aid of RNA-seq pipeline implemented in OmicSoft ArrayStudio (QIAGEN). Relative expression of exosomal mRNAs was calculated as CPM. Venn diagram was constructed using VennDiagram package implemented in R (distributed by CRAN) under operation of its integrated development environment RStudio.

Statistical analysis
To identify differentially expressed genes (DEGs) in exosomal mRNAs, row count data was normalized by logarithm of geometric mean and size factor was calculated. DEGs were detected using DESeq2 function in Array Studio in which adjusted p-values for FDR were calculated by Benjamini-Hochberg method with alpha level of 0.05. The presence of the Gaussian distribution in Ct values generated from qRT-PCR was examined by Kolmogorov-Smirnov test. The difference of an average of Ct values generated from qRT-PCR was examined by Welch's t-test. Typical DEG was confirmed in genome browser for the mapped reads. Gene Ontology (GO) analysis was performed using BaceSpace Correlation Engine database (Illumina). Pathway analysis was performed using Ingenuity Pathway Analysis (IPA) (QIAGEN).