In the present study, the ComBat method was highly effective in removing batch effects due to different acquiry dates of the microarrays; this was demonstrated by i) unequivocal clustering of the RA and OA patients in the batch-corrected data for almost all time points investigated (shown by hierarchical clustering dendrograms and PCA); ii) integration of the outlier EB87, resulting in reduced standardized variances for a number of genes; iii) identification of a large number of genes differentially expressed between RA and OA with highly significant p-values (up to 5.31E-25); iv) identification of numerous overrepresented GO terms (with an increased number of members and strongly improved p-values).
The 7 different batches (aquiry dates) were clearly grouped into the two main clusters representing microarray analyses from the years 2006 and 2009. These 2 main clusters are a clear example of inevitable batch effects since the supplier of the hybridization agents changed the chemical components used in the production process in the year 2007 (personal communication; Affymetrix). The only possibility to avoid such batch effects would have been to collect all the samples for simultaneous analysis, an approach technically impossible for a total of 120 samples at the time point of analysis (and possibly even today). In addition, unequivocal clustering of RA and OA patients was only achieved when correcting for 7 batches and not only for 2 batches (2006 and 2009), indicating that both the technical change of the supplier and a basic variance among the acquiry dates contributed to the dissimilarity of the arrays. This should be taken into consideration for future microarray analyses.
The batch correction had no major influence on the underlying data, as demonstrated by i) almost unaffected means (maximal change 4.087%) in the variance/mean plots for a number of different gene sets; ii) marginally changed, but mostly reduced standardized variances for the same gene sets; iii) substantial overlap (> 65%) of the genes differentially expressed between RA and OA at time point 0 hours before and after batch correction; iv) preservation of 5/7 of the top-ranked GO categories after batch correction. These findings show that ComBat is highly suitable for batch correction of data derived from small sample sizes and does not lead to inappropriate modification of the underlying data as previously published .
The genes up-regulated at time point 0 hours (identified following batch correction) reflected a key feature of RA, i.e., SFB-driven fibrosis of the affected joints . In particular, the enhanced expression of collagens type I and III (COL1A1, COL1A2, COL3A1), as well as biglycan (BGN) by RA-SFB has been previously published [27–29]. In the context of RA, collagens types V and XI are less well known as mediators of fibrosis, but are targets of matrix metalloproteinase-mediated proteolytic activity [30, 31]. However, together with collagens type I, II, and III, these collagens are defined as (fibrillar) interstitial collagens  and represent major components of the extracellular matrix , thus suggesting a potential role in fibrosis also for these proteins. The fibrillar type XXVII collagen (COL27A1) and lipocalin-7 (also known as tubulointerstitial nephritis antigen-like 1; TINAGL1), which were identified as up-regulated molecules in RA-SFB in this study for the first time, are not intrinsic extracellular matrix molecules, but both exhibit differentiation potential. Type XXVII collagen, for example, is predominantly expressed in cartilaginous tissues and generally involved in skeletogenesis , whereas matricellular lipocalin-7 appears to be a (positive) regulator of angiogenesis , potentially influencing enhanced angiogenesis in the synovial membrane . Taken together, the enhanced formation of these matrix components may contribute to joint fibrosis in an attempt to counteract the progressive destruction of cartilage and bone.
Strikingly, differential expression of the 8 genes of interest from the top-ranked GO term “extracellular matrix structural constituent” (DEG_interest) was not only observed at the time point 0 hours, but also at all time points following stimulation with either TNF-α or TGF-β1. This suggests that RA-SFB show a constitutively altered “rheumatic phenotype”, which is preserved upon stimulation with TNF-α and TGF-β1 (“spacer effect”). Possible reasons for such an “imprinted” RA phenotype may include both genomic changes, e.g., mutations or chromosome aberrations, or epigenetic modifications, e.g. methylation or histone acetylation status [37–39].