Analysis of a gene co-expression network establishes robust association between Col5a2 and ischemic heart disease

Background
This study aims to expand knowledge of the complex process of myocardial infarction (MI) through the application of a systems-based approach.

Methods
We generated a gene co-expression network from microarray data originating from a mouse model of MI. We characterized it on the basis of connectivity patterns and independent biological information. The potential clinical novelty and relevance of top predictions were assessed in the context of disease classification models. Models were validated using independent gene expression data from mouse and human samples.

Results
The gene co-expression network consisted of 178 genes and 7298 associations. The network was dissected into statistically and biologically meaningful communities of highly interconnected and co-expressed genes. Among the most significant communities, one was distinctly associated with molecular events underlying heart repair after MI (P < 0.05). Col5a2, a gene previously not specifically linked to MI response but responsible for the classic type of Ehlers-Danlos syndrome, was found to have many strong co-expression associations within this community (11 connections with ρ > 0.85). To validate the potential clinical application of this discovery, we tested its disease discriminatory capacity on independently generated MI datasets from mice and humans. High classification accuracy and concordance were achieved across these evaluations, with areas under the receiver operating characteristic curve above 0.8.

Conclusion
Network-based approaches can enable the discovery of clinically interesting predictive insights that are accurate and robust. Col5a2 shows predictive potential in MI, and in principle may represent a novel candidate marker for the identification and treatment of ischemic cardiovascular disease.

Occlusion of the LAD was confirmed microscopically by discoloration of the ischemic area below the ligation node. Sham-operated animals underwent the same procedure without occlusion of the LAD. After the surgical intervention, mice were returned to usual care for 4 weeks.

Determination of collagen content
For histological analysis, transverse mid-ventricular tissue slices were fixed with 4% formaldehyde and paraffin-embedded. Five-µm paraffin sections were stained with Sirius Red to detect connective tissue. LUCIA software (Nikon) was used to determine the degree of fibrosis in the infarction area. The animals were categorized into groups with a collagen content of 15-40% (small infarction), 41-55% (mid-sized infarction) and 56-85% (big infarction) in the anterolateral wall.

RNA samples
Total RNA was extracted from frozen tissue samples with a TRIzol (Invitrogen, Carlsbad, CA) isolation protocol. Homogenisation of samples was performed with a Polytron® (Bohemia, USA), and insoluble material was removed from the homogenate by centrifugation at 800 rcf for 10 min at 4°C. Total RNA was purified with an RNeasy mini kit combined with an on-column DNase treatment following the manufacturer's instructions (Qiagen, Valencia, CA). RNA quantity was assessed with a Nanodrop (Thermo Scientific, Wilmington, USA) and quality was evaluated using the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA). RNAs used in the present study were of good quality and undegraded. See Supplementary File 2 for the quality of the RNA samples. All nucleic acid samples were stored at -80°C until use.

Reverse transcription
One μg of RNA was reverse transcribed into cDNA using the SuperScript II reverse transcriptase with the following protocol: RNAs were mixed with the 5X RT buffer, random hexamers, dNTPs and DTT in a total volume of 19 μl. Samples were then heated to 42°C for 2 min, and 1 μl of SuperScript II was added to give a total volume of 20 μl. Reverse transcription proceeded for 50 min at 42°C and was followed by enzyme inactivation at 70°C for 15 min.

A-CODE algorithm for network community detection
This approach is based on the notion that strong communities are built around strong edges in the community. Moreover, candidate communities should also represent tightly interconnected webs of neighboring relationships (Figure S1). Thus, A-CODE searches for strong, highly interconnected communities around each edge in the network (Figure S2). Candidate communities are characterized by their co-expression compactness, which is here based on the mean co-expression value observed in the candidate community. To reduce possible bias towards highly variable co-expression patterns, compactness is computed as the mean co-expression value divided by the standard deviation of the values found in a candidate community. The expected rate of false discoveries, q, for each observed compactness value is computed with a statistical test based on random permutations. Thus, strong candidate communities are those displaying high co-expression compactness with correspondingly low q values. At each search step, A-CODE adds a new edge to the candidate community. Each new edge is drawn from the direct neighborhood of the current candidate community: the neighboring edge with the highest co-expression value, ρ, is selected for inclusion. This process continues until either a minimum q (min_q) cannot be obtained or a maximum number of edges in the candidate community has been reached. Experiments reported here are based on min_q = 1E-4 and a maximum of 20 edges in each candidate community. The latter was suitable to assist expert visualization and interpretation. The selected min_q value is also stringent enough to filter out communities for which more than 1 permutation experiment (out of 10000 implemented) reported compactness values equal to or higher than that observed in the candidate community.
At the end of this process, each network edge gives rise to a candidate community.
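
The search described above can be illustrated with a minimal Python sketch. It is not the A-CODE implementation, and several details are assumptions made only for illustration: the edge dictionary and gene names (g1 to g5) are hypothetical toy data, the population standard deviation is used in the compactness ratio, and the permutation test is approximated by drawing random sets of edge weights of the same size from the whole network. Because the text does not specify how q is evaluated during growth, the sketch grows each community up to the edge limit and leaves the min_q criterion as a check on the finished community.

```python
import random
import statistics


def compactness(rhos):
    """Mean co-expression value divided by its standard deviation.

    Assumption: population standard deviation; a zero spread maps to infinity.
    """
    sd = statistics.pstdev(rhos)
    return statistics.mean(rhos) / sd if sd > 0 else float("inf")


def grow_community(edges, seed, max_edges=20):
    """Greedily grow a candidate community around the edge `seed`.

    `edges` maps (gene_a, gene_b) tuples to co-expression values (rho).
    """
    community = [seed]
    nodes = set(seed)
    while len(community) < max_edges:
        # Direct neighborhood: unused edges touching a gene already included.
        neighbors = [e for e in edges if e not in community and nodes & set(e)]
        if not neighbors:
            break
        best = max(neighbors, key=lambda e: edges[e])  # highest rho first
        community.append(best)
        nodes.update(best)
    return community


def permutation_q(edges, community, n_perm=10000, rng=None):
    """Fraction of random edge sets with compactness >= the observed value.

    Assumption: a simple empirical stand-in for the permutation-based q.
    """
    rng = rng or random.Random(0)
    all_rhos = list(edges.values())
    observed = compactness([edges[e] for e in community])
    hits = sum(
        compactness(rng.sample(all_rhos, len(community))) >= observed
        for _ in range(n_perm)
    )
    return hits / n_perm


# Hypothetical toy network (not data from this study).
edges = {
    ("g1", "g2"): 0.95,
    ("g1", "g3"): 0.90,
    ("g2", "g3"): 0.88,
    ("g3", "g4"): 0.40,
    ("g4", "g5"): 0.35,
}
community = grow_community(edges, ("g1", "g2"), max_edges=3)
q = permutation_q(edges, community, n_perm=1000)
keep = q <= 1e-4  # min_q applied as a final filter in this sketch
```

In a full run, grow_community would be called once per network edge, so that each edge yields one candidate community. With a network as small as this toy example, the permutation estimate is too coarse to reach q values near 1E-4; the threshold only becomes meaningful with many edges and many permutations.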