Study Population and Clinical Screening
Children aged 3-18 years who lived in the area of Oss, the Netherlands, and were registered at the Youth Dental Care clinic (Jeugdtandverzorging Noordoost Noord-Brabant) participated in the study. Ethical approval was given by the independent Medical Ethics Committee of TNO. Inclusion criteria were good general health, no use of antibiotics in the last 6 months, no braces, and written consent of the parents/caretakers of the child. The children were clinically examined in the dental clinic by one calibrated dental epidemiologist as part of their regular dental check-up. The oral examination comprised a visual inspection of the oral mucosa and an assessment of caries experience and plaque. Caries experience was expressed by the dmfs-index, calculated as the total number of decayed, missing and filled surfaces. Surfaces with early enamel lesions (white spots) were registered as sound. A tooth surface was registered as carious if the caries lesion had reached the dentin or if the enamel was undermined by an underlying lesion, resulting in a surface defect at least 0.25 mm deep. The amount of dental plaque (plaque index) was assessed using the criteria of Greene and Vermillion. In brief, six dental surfaces - buccal surface of both upper permanent or deciduous molars, labial surface of 11 and 31 (permanent dentition) or 51 and 71 (deciduous dentition), and lingual surface of 36 and 46 (permanent dentition) or 75 and 85 (deciduous dentition) - were inspected. If the selected tooth was missing, the contralateral tooth or the neighbouring tooth was assessed instead. The amount of plaque was determined visually by moving the probe along the tooth surface. The surface received score 0 if no plaque was visible, score 1 if plaque was present only at the cervical third of the surface, score 2 if plaque covered the cervical half of the surface, and score 3 if plaque reached the incisal or occlusal surface of the tooth.
A mean plaque index (average score per number of surfaces scored) was calculated per child.
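The two clinical indices described above reduce to simple arithmetic. The following Python sketch is illustrative only: the function names and the use of None for a surface that could not be scored are assumptions, not part of the study protocol.

```python
def dmfs_index(decayed, missing, filled):
    """Sum of decayed, missing and filled surfaces for one child."""
    counts = (decayed, missing, filled)
    if any(c < 0 for c in counts):
        raise ValueError("surface counts must be non-negative")
    return sum(counts)


def mean_plaque_index(surface_scores):
    """Average plaque score (0-3) over the surfaces that were actually scored.

    A value of None marks a surface that could not be scored (assumption).
    """
    scored = [s for s in surface_scores if s is not None]
    if not scored:
        raise ValueError("no surfaces were scored")
    if any(s not in (0, 1, 2, 3) for s in scored):
        raise ValueError("plaque scores must be 0, 1, 2 or 3")
    return sum(scored) / len(scored)
```

For a child with five scored surfaces and one unscored surface, the mean is taken over the five scored surfaces only, in line with "average score per number of surfaces scored".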
Unstimulated saliva, which represents an average sample of the whole oral ecosystem [3, 4], was collected at home on the day of the dental check-up, before breakfast and toothbrushing, by drooling into a DNA-free, sterile vial for 5 minutes. The parents were instructed to store the saliva refrigerated (4-7°C) and to bring the vial to the dental clinic within 6 h after collection.
A 0.1-ml quantity of saliva sample was transferred to a sterile screw-cap Eppendorf tube with 0.25 ml of lysis buffer (AGOWA mag Mini DNA Isolation Kit, AGOWA, Berlin, Germany). Then 0.3 g zirconium beads (diameter, 0.1 mm; Biospec Products, Bartlesville, OK, USA) and 0.2 ml phenol were added to each sample. The samples were homogenized with a Mini-beadbeater (Biospec Products) for 2 min. DNA was extracted with the AGOWA mag Mini DNA Isolation Kit and quantified (Nanodrop ND-1000; NanoDrop Technologies, Montchanin, DE, USA).
PCR Amplification, Sample Pooling and Pyrosequencing
PCR amplicon libraries of the small subunit ribosomal RNA gene V5-V6 hypervariable region were generated for the individual samples. PCR was performed using the forward primer 785F (GGA TTA GAT ACC CBR GTA GTC) and the reverse primer 1061R (TCA CGR CAC GAG CTG ACG AC). The primers included the 454 Life Sciences Adapter A (forward primer) and B (reverse primer) fused to the 5' end of the 16S rRNA bacterial primer sequence and a unique trinucleotide sample identification key for each sample group. This resulted in 12 distinctly labelled pools of samples - deciduous, early mixed, late mixed, and permanent dentition at healthy, treated or caries state. The amplification mix contained 2 units of Pfu Ultra II Fusion HS DNA polymerase and 1× PfuUltra II reaction buffer (Stratagene), 200 μM dNTP PurePeak DNA polymerase Mix (Pierce Nucleic Acid Technologies, Milwaukee, WI), and 0.2 μM of each primer. After denaturation (94°C; 2 min), 30 cycles were performed, each consisting of denaturation (94°C; 30 sec), annealing (50°C; 40 sec), and extension (72°C; 80 sec). DNA was isolated by means of the MinElute kit (Qiagen, Hilden, Germany). The quality and size of the amplicons were analyzed on the Agilent 2100 Bioanalyser with the DNA 1000 Chip kit (Agilent Technologies, Santa Clara, CA, USA), and the amplicons were quantified using the Nanodrop ND-1000 spectrophotometer. The amplicon libraries were pooled in equimolar amounts and sequenced unidirectionally in the reverse direction (B-adaptor) by means of the Genome Sequencer FLX (GS-FLX) system (Roche, Basel, Switzerland).
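Equimolar pooling requires converting each library's mass concentration into a molar concentration. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA. The library names, concentrations, amplicon lengths and target amount in the usage note are hypothetical, and the calculation actually used in the study is not described in the text.

```python
def molarity_nM(conc_ng_per_ul, length_bp, bp_mass=660.0):
    """Molar concentration (nM) of a double-stranded amplicon library.

    Assumes an average mass of 660 g/mol per base pair.
    """
    return conc_ng_per_ul * 1e6 / (bp_mass * length_bp)


def equimolar_volumes(libraries, target_fmol):
    """Volume (in microlitres) of each library that supplies target_fmol.

    libraries maps a library name to (concentration in ng/ul, length in bp).
    1 nM equals 1 fmol/ul, so volume = amount / molarity.
    """
    return {
        name: target_fmol / molarity_nM(conc, length)
        for name, (conc, length) in libraries.items()
    }
```

For example, `equimolar_volumes({"pool_A": (33.0, 500), "pool_B": (16.5, 500)}, target_fmol=100.0)` asks for twice the volume of the more dilute library, so that both contribute the same number of molecules.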
Processing of the Pyrosequencing Data
GS-FLX sequencing data were processed as previously described. In brief, we trimmed sequences by removing primer sequences and low-quality data (sequences that did not have an exact match to the reverse primer, that had an ambiguous base call (N) in the sequence, or that were shorter than 50 nt after trimming). We then used the GAST algorithm to calculate the percent difference between each unique sequence and its closest match in a database of 69816 unique eubacterial and 2779 unique archaeal V5-V6 sequences, representing 323499 SSU rRNA sequences from the SILVA database. Taxa were assigned to each full-length reference sequence using several sources including Entrez Genome entries, cultured strain identities, SILVA, and the Ribosomal Database Project Classifier. In cases where reads were equidistant to multiple V5-V6 reference sequences, and/or where identical V5-V6 sequences were derived from longer sequences mapping to different taxa, reads were assigned to the lowest common taxon of at least two-thirds of the sequences. Only sequences that were found at least 5 times were included in the analyses. This strict and conservative approach was chosen to preclude inclusion of sequences from potential contamination or sequencing artefacts.
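The filtering rules and the two-thirds consensus rule can be sketched as follows. This is a minimal Python illustration, not the pipeline used in the study. It assumes that the sample key has already been removed so that each read starts with the reverse primer, and that a taxonomic lineage is represented as a tuple of ranks. All function names are hypothetical.

```python
from collections import Counter

REVERSE_PRIMER = "TCACGRCACGAGCTGACGAC"  # 1061R, as given in the text
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "B": "CGT", "H": "ACT"}


def starts_with_primer(read, primer=REVERSE_PRIMER):
    """True if the read begins with the primer, honouring degenerate codes."""
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for base, code in zip(read, primer))


def trim_and_filter(reads, primer=REVERSE_PRIMER, min_len=50, min_count=5):
    """Trim the primer, drop low-quality reads, keep sequences seen >= min_count times."""
    kept = []
    for read in reads:
        read = read.upper()
        if not starts_with_primer(read, primer):
            continue  # no exact primer match
        trimmed = read[len(primer):]
        if "N" in trimmed or len(trimmed) < min_len:
            continue  # ambiguous base call or too short after trimming
        kept.append(trimmed)
    return {seq: n for seq, n in Counter(kept).items() if n >= min_count}


def consensus_taxon(lineages):
    """Deepest lineage prefix shared by at least two-thirds of the lineages."""
    best, depth = (), 1
    while True:
        prefixes = Counter(l[:depth] for l in lineages if len(l) >= depth)
        if not prefixes:
            break
        prefix, n = prefixes.most_common(1)[0]
        if 3 * n < 2 * len(lineages):  # fewer than two-thirds agree
            break
        best, depth = prefix, depth + 1
    return best
```

With three equally close reference sequences, two of which belong to the same genus, the read is assigned to that genus. If all three genera differ, the assignment falls back to the deepest rank that two-thirds still share.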
Probe Design for the 16S rDNA Microarray
For the design of the taxonomic microarray, an inventory was made of the literature describing the normal oral microbiota [2, 3, 39–41]. This literature-based inventory was compared to the 454 pyrosequencing data obtained in this study. While the majority of taxa observed by 454 sequencing were already included in the literature inventory, a number of taxonomic groups were found to be missing. These were added to the list. Based on the list, probes were designed using ARB. The probes were 20-22 nucleotides in size, with a predicted melting temperature of 60 ± 5°C and a GC content between 40 and 60%. The full list of oligonucleotides is provided in Additional file 4.
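The length and GC criteria for the probes are easy to check programmatically. The sketch below is an illustration in Python, not the ARB probe design procedure. It deliberately omits the melting temperature criterion, because the text does not state which prediction model was used. The function names are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_design_rules(probe, min_len=20, max_len=22, min_gc=40.0, max_gc=60.0):
    """Check the length (20-22 nt) and GC (40-60%) criteria from the text.

    The predicted melting temperature (60 +/- 5 degrees C) is not checked here.
    """
    return (min_len <= len(probe) <= max_len
            and min_gc <= gc_percent(probe) <= max_gc)
```

A candidate that passes this screen would still need its melting temperature and specificity evaluated before being placed on the array.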
Microarrays for this study were produced in house, using the ArrayIt Nanoprint60 Microarray Robot. The 5'-amino-modifier-C6-linked oligonucleotides were diluted to a final concentration of 25 μM in a 50 mM phosphate buffer (pH 7), and printed onto CodeLink Activated Slides. Following incubation at 80% humidity under ambient conditions, slides were blocked in a buffer of 0.1 M Tris base and 50 mM ethanolamine (pH 9) for 45 minutes at 50°C while shaking, in accordance with the manufacturer's recommendations. The slides were then rinsed twice in MilliQ purified water (MilliQ, Millipore), washed in 4 × SSC (600 mM sodium chloride/60 mM sodium citrate), 0.1% sodium dodecyl sulfate (SDS) for 30 minutes at 50°C, and washed in pre-warmed (50°C) MilliQ water. Finally, the slides were washed twice in MilliQ water at room temperature and dried under a stream of nitrogen. Slides were stored under nitrogen until use.
DNA Labelling and Hybridization
For taxonomic microarray analysis, 16S rDNA was amplified by PCR as described above, using forward primers 8F (AGA GTT TGA TCH TGG YTC AG) and 8F-bif (TGG CTC AGG ATG AAC GCT G) and reverse primer 1061R (TCA CGR CAC GAG CTG ACG AC). Following PCR amplification, DNA was amplified by random Klenow amplification with the BioPrime DNA Amplification kit (Invitrogen) according to the manufacturer's recommendations. Klenow-amplified DNA was passed through an Illustra AutoSeq G-50 column (GE Healthcare G50) for purification and concentrated in a speedvac. Next, the DNA was labelled by Terminal Transferase coupling of Cy3-dUTP (Promega). Following Illustra AutoSeq G-50 column purification and vacuum concentration, the labelled DNA was dissolved in 40 μl Easyhyb hybridization buffer (Roche) and denatured for 2 min at 95°C. Printed slides were pre-hybridized in 0.45 μm-filtered pre-hybridization buffer [1% BSA, 5 × SSC, and 0.1% SDS] at 42°C for 45 min with rotation, then washed twice with MilliQ purified water, dried with nitrogen, and pre-warmed at 42°C. The microarrays were placed in the ProPlate multi-array system (Grace Bio-Labs). The hybridization mixture was then pipetted into the individual wells and incubated in a hybridization chamber for four hours. Following hybridization, slides were thoroughly washed sequentially in 1 × SSC, 0.2% SDS for 10 sec at 37°C, 0.5 × SSC for 10 sec at 37°C, and twice in 0.2 × SSC for 10 min at room temperature. Slides were dried with nitrogen and scanned using a Scanarray Express 680013 Microarray Analysis System (PerkinElmer Life and Analytical Sciences Inc.). Images were obtained and quantified with ImaGene 4.2 software (Biodiscovery).
To validate microarray performance, a number of tests were performed. These included: 1) Replicated analysis of samples to test robustness of microarray data; 2) Spiking DNA of underrepresented species to a complex mixture followed by microarray analysis; 3) performing a direct comparison between qPCR and microarray data, and 4) performing a direct comparison between sequencing and microarray data.
Technical replicates were included to verify robustness of data. In the examples described below, three replicates were compared. Two were analyzed on slides of the same printing series - sMF02-11, and one was analyzed on an older slide series (sMF02-9). The Pearson correlation (r) between the three microarray analyses was high, and varied between 0.91 and 0.93, and was independent of the slide series. For comparison, correlation between different saliva samples ranged from 0.4 to 0.7.
To test the performance of the microarray with the complex oral microbial community, we performed a spiking experiment. For this, DNA was isolated from saliva samples of a healthy adult donor as well as from pure cultures of Lactobacillus casei (ATCC 334). The DNA was then quantified, and pure culture DNA was added to saliva DNA at levels ranging between 0.01 and 1% of the total community. The DNA was subsequently labelled and hybridized to the microarray. A clear, dose-responsive signal was detected for the L. casei spot on the microarray in all spiked samples. A significant dose-responsive increase in microarray spot intensity was also detected at the family level, although with lower sensitivity.
Direct Comparison between qPCR and Microarray Data
A third method for validating the microarray data was to cross-compare species levels established by quantitative PCR with the results obtained using the microarray. In the current study, this was performed for Porphyromonas catoniae in saliva samples obtained from the children. During the study, 74 saliva samples were analyzed. The qPCR data showed that the saliva samples contained between 10^5 and 10^8 P. catoniae cells/ml saliva. The fluorescence intensity for the P. catoniae probes on the microarray ranged between 0 and 110. The number of P. catoniae cells correlated significantly with the signal of both Porphyromonas probes - o1503 and o1506. The Spearman's correlations were r = 0.618 (p < 0.001) and r = 0.580 (p < 0.001), respectively.
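Spearman's correlation is the Pearson correlation of the ranked values, which makes it suitable for comparing cell counts that span several orders of magnitude with fluorescence intensities on a different scale. The following pure-Python sketch illustrates the calculation. The study itself used SPSS, and the numbers in the usage note are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

Because only ranks are used, `spearman_rho([1e5, 1e6, 1e7, 1e8], [0, 5, 40, 110])` gives the same result whether or not the cell counts are log-transformed first.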
Direct Comparison between Sequencing and Microarray Data
To enable a direct comparison between microarray and sequencing data, both data sets were coupled by using the probe and sequence data. For this, a string search was used to identify matching probe sequences within the 454 pyrosequencing data. For 26 probes a direct match was identified. This relatively low number of matching probe sequences is due to the fact that the pyrosequencing data span the V5-V6 region, while the majority of probes were defined in the V2, V3 and V4 regions of the 16S ribosomal gene. We then calculated the Pearson correlation between the microarray and 454 sequencing data. Good (r = 0.85) to excellent (r > 0.9) correlation was found for probes corresponding to relatively abundant taxa (>0.1% of all reads; signal/background fluorescence value >20). The correlation for less abundant species was weak. No false positive probes were found. Only two probes (o1402 and o1445) were classified as false negatives.
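The string search that couples probes to reads can be sketched as below. This is an illustrative Python version, not the script used in the study. Searching both the probe and its reverse complement is an assumption made here because the reads were sequenced from the reverse primer. The probe identifiers and sequences in the usage note are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_read_counts(probes, read_counts):
    """Total read count per probe, counting reads that contain the probe.

    probes maps a probe id to its sequence; read_counts maps each unique
    read sequence to the number of times it was observed. A read is counted
    once if it contains the probe on either strand (assumption).
    """
    totals = {}
    for probe_id, seq in probes.items():
        fwd = seq.upper()
        rev = reverse_complement(fwd)
        totals[probe_id] = sum(
            n for read, n in read_counts.items() if fwd in read or rev in read
        )
    return totals
```

Calling `probe_read_counts({"p1": "AACCGGTTAC"}, reads)` on a dictionary of unique reads yields one total per probe, which can then be correlated with the corresponding spot intensity. Probes with a total of zero have no direct match in the sequenced region.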
Targeting of Specific Microorganisms by Quantitative PCR
Quantitative PCR was performed on the Applied Biosystems 7500 Fast Real-Time PCR System. The total microbial load was determined using the universal primer set described by Nadkarni et al. The P. catoniae and P. gingivalis primer-probe sets were designed using Primer Express (Applied Biosystems). To quantify P. catoniae, we used the primers P. catoniae-16S-F (CGG TTG CCA TCAG GTA ATG C) and P. catoniae-16S-R (CAC CTT CCT CAC GCC TTA CG) together with P. catoniae-16S-probe (TCC GTA GAG ACT GCC G), a minor-groove binding (MGB) probe labeled with 6-carboxy-fluorescein (FAM). The P. gingivalis primer-probe set was composed of P. gingivalis-16S-F (GCG CTCA ACG TTC AGC C), P. gingivalis-16S-R (CAC GAA TTC CGC CTG C) and the FAM-labeled MGB probe P. gingivalis-16S-probe (CAC TGA ACT CAA GCC CGG CAG TTT CAA). Quantitative PCR was performed using the Diagenode Universal Mastermix, in accordance with the manufacturer's recommendations. Standard controls included a serial dilution series of purified genomic DNA of P. gingivalis ATCC BAA-308 and P. catoniae ATCC 51270.
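A serial dilution series of genomic DNA is typically converted into a standard curve by regressing the threshold cycle (Ct) on the log10 of the known quantity, and unknown samples are then read off that line. The sketch below shows this general approach in Python. It is not taken from the instrument software, and the dilution quantities and Ct values in the usage note are hypothetical.

```python
from math import log10


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [log10(q) for q in quantities]
    n = len(xs)
    if n != len(ct_values) or n < 2:
        raise ValueError("need at least two standards with matching Ct values")
    mx, my = sum(xs) / n, sum(ct_values) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ct_values))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def quantify(ct, slope, intercept):
    """Quantity corresponding to a measured Ct on the fitted curve."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0
```

For instance, `fit_standard_curve([1e3, 1e4, 1e5, 1e6], [31.0, 27.5, 24.0, 20.5])` returns a slope and intercept that `quantify` can apply to the Ct of a saliva sample. A slope near -3.32 corresponds to an efficiency close to 100%.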
To identify the probes that contribute significantly to the different response variables (oral health status, the stage of dentition), we performed Significance Analysis of Microarrays (SAM) - a non-parametric statistical technique for finding significant differences between microarray data that are grouped based on experimental conditions. To reduce the dimensions of the array data we performed principal component analysis (PCA). Similarities among individual sample profiles were calculated using the sample distance matrix. The SAM and PCA analyses were performed using the MeV software package, as part of the TM4 microarray software suite. The sample distance matrix was calculated using the Pearson correlation coefficient and visualized using hierarchical clustering with the average linkage method. Independent-samples t-tests and Pearson and Spearman correlations were calculated using SPSS (Version 17.0). Abundances of probes that showed significant differences in SAM and contributed most to the PCA were tested by ANOVA with the Games-Howell post-hoc test using SPSS (Version 17.0).
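The sample distance matrix and average-linkage clustering can be illustrated with a small pure-Python sketch. It is not the MeV implementation. The use of 1 - r as the distance derived from the Pearson correlation is an assumption, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering of samples with average linkage.

    profiles maps a sample name to its list of probe intensities.
    Returns the merges in order as (cluster_a, cluster_b, distance),
    where clusters are tuples of sample names.
    """
    names = sorted(profiles)
    dist = {frozenset(pair): 1.0 - pearson_r(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}

    def cluster_distance(c1, c2):
        # Average linkage: mean distance over all between-cluster sample pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges
```

The returned merge list contains the same information as a dendrogram: samples with highly correlated profiles join at small distances, and dissimilar groups join last.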