Muscle Research and Gene Ontology: New standards for improved data integration
- Erika Feltrin1,
- Stefano Campanaro2,
- Alexander D Diehl3,
- Elisabeth Ehler4,
- Georgine Faulkner5,
- Jennifer Fordham4,
- Chiara Gardin1,
- Midori Harris6,
- David Hill3,
- Ralph Knoell7,
- Paolo Laveder2,
- Lorenza Mittempergher1,
- Alessandra Nori8,
- Carlo Reggiani9,
- Vincenzo Sorrentino10,
- Pompeo Volpe8,
- Ivano Zara1,
- Giorgio Valle1 and
- Jennifer Deegan née Clark6
© Feltrin et al; licensee BioMed Central Ltd. 2009
Received: 03 April 2008
Accepted: 29 January 2009
Published: 29 January 2009
The Gene Ontology Project provides structured controlled vocabularies for molecular biology that can be used for the functional annotation of genes and gene products. In a collaboration between the Gene Ontology (GO) Consortium and the muscle biology community, we have made large-scale additions to the GO biological process and cellular component ontologies. The main focus of this ontology development work concerns skeletal muscle, with specific consideration given to the processes of muscle contraction, plasticity, development, and regeneration, and to the sarcomere and membrane-delimited compartments. Our aims were to update the existing structure to reflect current knowledge, and to resolve, in an accommodating manner, the ambiguity in the language used by the community.
The updated muscle terminologies have been incorporated into the GO. There are now 159 new terms covering critical research areas, and 57 existing terms have been improved and reorganized to follow their usage in muscle literature.
The revised GO structure should improve the interpretation of data from high-throughput (e.g. microarray and proteomic) experiments in the area of muscle science and muscle disease. We actively encourage community feedback on, and gene product annotation with, these new terms. Please visit the Muscle Community Annotation Wiki at http://wiki.geneontology.org/index.php/Muscle_Biology.
Technical innovations in recent years have enabled the production of vast amounts of scientific research data, using a variety of methods, and covering many different species. These innovations provide the opportunity for evaluation of large datasets to generate and/or support novel hypotheses. In dealing with this wealth of data, scientists are held back by differences in technical language among research communities, compounded by the absence of computable information on the relationships between biological processes.
An example of linguistic ambiguity within the muscle biology community is seen in the use of the word 'plasticity'. This word could mean the quality of adaptability, but is often used to indicate the process of adaptation. In addition to complicating the work of research scientists, such ambiguity also presents real difficulties for those who wish to write data mining software. Such software attempts to automatically handle information about the relationships between biological processes and between gene products. There is a particular need for good data mining software in high-throughput work, which is a prominent part of current muscle biology research. The aim of the Gene Ontology (GO) project [1, 2] is to provide a standard language for the description of gene products, thus enabling scientists and software engineers to resolve language problems.
To provide this standard language, the GO project is developing ontologies and using them in annotation of gene products. There are three non-overlapping ontology domains, so that gene products may be categorized according to GO terms representing the molecular functions they carry out (using the Molecular Function ontology), the cellular locations where they act (using the Cellular Component ontology), and the biological processes in which they take part (using the Biological Process ontology). The three ontologies are separate, but within each ontology the GO terms are related to one another. These relationships indicate where one category is a part (part_of relationship) or type (is_a relationship) of another category, or where one category regulates (regulates, positively_regulates, or negatively_regulates relationships) another category. For a more comprehensive explanation see . Each ontology can be used as a standard terminology to facilitate a biologically meaningful description of the roles of genes and their products in any organism. Gene products can be annotated to any number of GO terms within one or more of the ontologies to capture information about their various roles within these given domains. The Gene Ontology has for several years included a number of terms describing muscle biology, and the GO has already been used extensively for statistical data analysis in muscle biology studies. For example, the GO was used in an analysis of the global transcriptional changes that take place in skeletal muscle in relation to estrogen status , and in an expression profiling study of the transcription factor MyoD during myogenic differentiation . However, to fully support the current needs of the muscle research community, especially with respect to the study of disease, a considerable expansion of the terms relevant to muscle biology is required. 
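The relationships described above can be pictured as a directed graph in which each term points to its parents. The following is a minimal sketch, not the GO's actual data model or file format: the term names are taken from this paper, but the table `RELATIONSHIPS`, the function `ancestors`, and the specific `part_of` edge are assumptions introduced purely for illustration.

```python
from collections import deque

# Hand-written, illustrative fragment (NOT loaded from the real GO files).
# Maps each child term to a list of (relationship, parent term) pairs.
RELATIONSHIPS = {
    "phasic smooth muscle contraction": [("is_a", "smooth muscle contraction")],
    "tonic smooth muscle contraction": [("is_a", "smooth muscle contraction")],
    "smooth muscle contraction": [("is_a", "muscle contraction")],
    "Z disc": [("part_of", "sarcomere")],  # assumed edge, for illustration only
}


def ancestors(term, relationships=RELATIONSHIPS):
    """Return every term reachable from `term` via is_a / part_of edges."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relationship, parent in relationships.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("tonic smooth muscle contraction")))
# ['muscle contraction', 'smooth muscle contraction']
```

Because the traversal follows every parent edge, a gene product annotated to a specific term can also be retrieved through any of that term's more general ancestors.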
We describe here an effort to improve the structure of muscle terms in the GO biological process and cellular component ontologies. The work was carried out as a collaborative project that brought together the GO Consortium, the Genomic Research Group of CRIBI Biotechnology Center at the University of Padua, and several research groups involved in muscle biology. We sought to improve GO terms that would specifically support muscle biology research in areas relevant to the investigation of muscle-related disease. By bringing together muscle and ontology experts, the GO structure was systematically improved in five areas: muscle contraction, plasticity, development and regeneration; and for cell regions in the sarcomere and membrane-delimited compartments.
Following the example of the Immunology Content Meeting  and of other GO ontology development meetings, the muscle-related GO Content Meeting brought together experimental biologists and ontology developers to define the terms relevant to this specific research field.
Content-oriented meetings facilitate large-scale changes in specific areas of the Gene Ontology. A content meeting is usually organized as a multi-step process. The first steps are normally carried out by the ontology developers, who gather information on the field of interest from books, reviews and scientific papers and organize it into an ontological format. The resulting ontology is then presented to experts in the field at the content meeting, where it is discussed and refined. The work described in this paper followed a different approach. The initial steps were carried out mainly by the muscle biology research community, while the final discussion and refinement took place during a meeting with invited ontology developers. This approach was possible because a member of the muscle community (EF) spent six months working in ontology development and annotation at the GO editorial office and gained further experience by taking part in a previous GO content meeting. She then rejoined her research community and led the ontology development effort. Throughout the initial community editing process, GO Consortium editors provided technical assistance and advice on the representation of language through frequent web-based ontology editing meetings. Following this editing phase, ontology developers from the GO Consortium were invited to meet with the group of domain experts for further discussion and revision of the structure of the GO. This two-day meeting was devoted entirely to live editing of the GO, during which further changes were made to its structure and content. The community-based ontology development model proved highly productive, as it gave the lead ontology developer access to a wide range of domain experts, most of them locally available, and all with cutting-edge knowledge of the field.
At the end of October 2007, the changes were incorporated into the GO, and are now available for all GO users.
In addition to creation and improvement of GO terms, cross-references were made with a number of other resources. The Adult Mouse Anatomical Dictionary  was used as a source of definitions and ontological structuring for muscle contraction anatomical terms, and the Cell Ontology  was consulted for cell type definitions. The resulting terms were cross-referenced to these other ontologies and, where appropriate, new definitions were contributed to the other ontologies. For example, a new definition of the satellite cell type was created and introduced in both the Cell Type and GO ontologies, and the resulting terms were cross-referenced. Gene Ontology term names are given in bold in this text.
The muscle research community, in collaboration with the GO Consortium, has completed an initiative to greatly expand the muscle biology representation in the GO biological process and cellular component ontologies. The work focused on improving and adding terms urgently needed for current priority areas in research.
The main focus of the work was skeletal muscle, with specific consideration given to the processes of muscle contraction, muscle plasticity, muscle development and regeneration, and to the sarcomere and membrane-delimited cell compartments. Our aims were to update the existing structure to reflect current knowledge, and to resolve, in an accommodating manner, the ambiguities in the language used by the muscle community. This collaborative effort drew on the knowledge of an extensive community of muscle experts, and resulted in the addition of 159 new terms and the improvement of 57 existing terms.
These different areas of muscle biology were addressed to support specific research needs. In the following text, the motivation for the changes and the details of each set of changes are described.
There are two different possible biological meanings of the commonly used phrase 'muscle plasticity', such that plasticity could be either the quality of adaptability or the process of adaptation. In ontology development it is essential to be clear about which term represents which process; and to ensure that the language is unambiguous, whilst still reflecting community usage. The existing muscle plasticity term was ambiguously named, risking incorrect use in annotation or text mining. However, as the term was clearly defined to describe the process of adaptation, we were able to resolve the problem by renaming the term muscle adaptation (leaving muscle plasticity as a related synonym, to help researchers find the term). This action resolved the ambiguity, but accommodated the common uses of the word 'plasticity' in domain literature by retaining the word as a searchable related synonym.
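The effect of keeping 'muscle plasticity' as a related synonym can be sketched as a lookup that matches either the primary name or any synonym. This is a minimal illustration under stated assumptions: the `Term` class, the `find_term` function and the placeholder definition string are hypothetical and do not reproduce the GO's own record format or definition text.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """Hypothetical, simplified term record (not the GO's real schema)."""
    name: str
    definition: str
    related_synonyms: list = field(default_factory=list)


TERMS = [
    Term(
        name="muscle adaptation",
        definition="(placeholder text, not the actual GO definition)",
        related_synonyms=["muscle plasticity"],
    ),
]


def find_term(query, terms=TERMS):
    """Return the term whose primary name or related synonym matches `query`."""
    wanted = query.strip().lower()
    for term in terms:
        names = [term.name] + term.related_synonyms
        if wanted in (n.lower() for n in names):
            return term
    return None


print(find_term("muscle plasticity").name)  # prints "muscle adaptation"
```

A researcher searching for the familiar phrase is thus led to the unambiguously named term, while annotations are stored under a single primary name.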
Using these new, more granular terms, biologists will be able to annotate gene products in more detail. Prior to our work, a user who believed a gene to be involved in muscle atrophy could annotate it only to the general term 'muscle plasticity'. As a result of our contribution, the ontology now includes child terms representing muscle atrophy, hypertrophy and hyperplasia. It also includes generic regulation terms under each of these processes, and the specific regulatory processes are grouped under these regulation terms. To illustrate the advantage of these new terms, the muscle experts have contributed some annotations that could previously be made only to the general muscle plasticity term (Figure 1A) and that can now be distributed among the more specific child terms, giving much greater reasoning power (Figure 1B). Even this small set of annotations shows how the enhanced structure helps to distinguish the sets of gene products involved in the various processes that contribute to the general process of muscle adaptation. Although we have shown only a handful of gene products, the benefit will be greater in automated analyses of the activity of thousands of gene products, as in a microarray experiment. For example, once the relevant gene products are fully annotated, it will be possible to detect by microarray experiment those stimuli that upregulate hundreds of genes involved in muscle hypertrophy whilst barely affecting the regulation of genes involved in muscle atrophy.
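The gain in resolution described above can be sketched with a toy annotation set. The term names come from this paper, but the gene product identifiers (`geneA` to `geneD`), the `PARENT` and `ANNOTATIONS` tables and both functions are hypothetical; they are not the annotations shown in Figure 1, and the child-to-parent links are assumed to be simple parent edges for the purpose of the example.

```python
# Child term -> parent term (assumed single-parent links, for illustration).
PARENT = {
    "muscle atrophy": "muscle adaptation",
    "muscle hypertrophy": "muscle adaptation",
    "muscle hyperplasia": "muscle adaptation",
}

# Hypothetical gene products and the term each is directly annotated to.
ANNOTATIONS = {
    "geneA": "muscle atrophy",
    "geneB": "muscle hypertrophy",
    "geneC": "muscle hypertrophy",
    "geneD": "muscle adaptation",
}


def is_under(term, ancestor):
    """True if `term` is `ancestor` itself or one of its descendants."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


def gene_products_for(term):
    """All gene products annotated to `term` or to any of its child terms."""
    return sorted(g for g, t in ANNOTATIONS.items() if is_under(t, term))


print(gene_products_for("muscle adaptation"))   # ['geneA', 'geneB', 'geneC', 'geneD']
print(gene_products_for("muscle hypertrophy"))  # ['geneB', 'geneC']
```

Querying the general term still returns every gene product, so nothing is lost by annotating to the more specific children, while queries on the child terms separate the hypertrophy set from the atrophy set.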
This new set of terms should assist in the annotation of gene products involved in the control of muscle fiber-type diversity, providing potential new targets for the treatment and prevention of disorders ranging from metabolic to neuromuscular diseases, for example Type 2 diabetes and muscular dystrophy . We have described this example of muscle plasticity in detail to illustrate the motivation behind our ontology development work. The work carried out on other areas of the ontology will bring similar benefits to other critical areas of research, and we describe it more briefly below with reference to the areas of research that it is intended to support.
The definition of the term muscle contraction, which previously existed in the GO, has been considerably improved and all of its descendants have been reorganized. The new structure represents several forms of muscle contraction and their relationships with the various types of muscle. To reflect this, there is also a greatly expanded set of terms describing the different contractile capacity of muscle. Striated muscle contracts and relaxes in short, intense bursts, whereas smooth muscle sustains longer or even near-permanent contractions. This difference was captured by the creation of is_a children, phasic smooth muscle contraction and tonic smooth muscle contraction, under the parent term smooth muscle contraction. Since the process of smooth muscle contraction varies with the anatomical location of muscles, terms such as vascular muscle contraction and gastro-intestinal muscle contraction were also created.
Muscle contraction is actively regulated by a series of events, for which appropriate regulation terms have been added. These include several processes such as cross-bridge formation, cross-bridge cycling, and filament sliding, which are necessary for force generation during muscle contraction. Multiple molecular components, such as sarcoplasmic proteins, have a role in regulating muscle contraction. For instance, mutations in several Z-disc proteins in the sarcomere that are important for the cross-linking of thin filaments and the transmission of force generated by the myofilaments have been shown to cause cardiomyopathies and/or muscular dystrophies . To accommodate this, a definition of the sarcomeric Z-disc has been added to the component ontology and extended to include recently discovered attributes associated with this structure, such as mechanosensation and mechanotransduction. This allows users to view the Z-disc not as a static structure but as a flexible one with important implications for signal transduction .
Muscles can be divided into striated and smooth types. Smooth muscle or 'involuntary muscle' is found within structures such as the oesophagus, stomach, intestines, bronchi, uterus, and blood vessels. Unlike skeletal muscle, smooth muscle is not under conscious control. Cardiac and skeletal muscles are striated in that they contain sarcomeres and are packed into highly regular arrangements of bundles.
Skeletal muscles are further divided into two subtypes, slow-twitch and fast-twitch muscle, depending on their contractile capacity. The biology of these two muscle types is key in current research, so we worked to represent it correctly as part of the biological process ontology. Improvements were made to the representation of these areas, to ensure that the usage of the words 'skeletal' and 'striated' was representative of that in the community. Importantly, these terms were also cross-checked by a cardiovascular physiology community group, whose ontology development effort took place at the same time, and which also touched on voluntary/involuntary muscle processes (David Hill, personal communication).
Muscle Development and Regeneration
Myofibers, the functional units of skeletal muscle, are long cylindrical multinucleated cells that vary in their morphological, biochemical, and physiological properties. They are derived from myoblasts, cells committed to the skeletal muscle lineage. Upon fusion, myoblasts form myotubes, which are further remodeled into myofibers . During our work, the skeletal muscle development subtree was enhanced with a new hierarchy of terms describing myoblast, myotube, and myofiber development, and the mechanisms of their regulation. To accommodate recent data, a distinction was introduced between head and trunk muscle development .
Many terms have been added to cover the process of cell regeneration and its regulation in skeletal muscle tissue. These include terms such as satellite cell activation involved in skeletal muscle regeneration and satellite cell compartment self renewal involved in skeletal muscle regeneration. Satellite cell processes are considered particularly important, since their activation is involved in muscle regeneration. Satellite cell proliferation, differentiation, and self-renewal are essential for proper myofiber turnover, an ongoing process that maintains muscle tissue viability . Moreover, in adult skeletal muscle, the self-renewing capacity of satellite cells contributes to muscle growth and adaptation . Skeletal muscle is capable of complete regeneration due to the presence of stem cells that reside in skeletal muscle and non-muscle stem cell populations. However, in severe myopathic diseases such as Duchenne Muscular Dystrophy, this regenerative capacity is exhausted . We have attempted to support research into these areas by adding the relevant terms.
We have described an ontology development effort that provides a valuable resource for functional annotation of gene products related to muscle biology. New terms supporting critical research areas are now available, and existing terms have been improved and reorganized to reflect their usage in muscle literature. There are a number of important advantages to a research community in having their field accurately represented in the GO. Our revised ontology structure should facilitate the interpretation of high-throughput experiments (e.g. gene expression microarrays) in the areas of muscle science and muscle disease. Such studies yield a very large number of data points, so that investigating how genes specifically contribute to a disease phenotype is challenging . However, the use of GO ontologies and annotations in statistical analysis should greatly simplify this .
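One common form of such statistical analysis is a term enrichment test, which asks whether a GO term is over-represented among the genes highlighted by an experiment. The sketch below uses a one-sided hypergeometric test as an example of the general idea; it is an assumption that this is representative, and it is not a description of any particular tool. The function name `hypergeom_enrichment_p` and all the counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided hypergeometric p-value for term over-representation.

    population: total number of genes measured (e.g. on the array)
    annotated:  genes in the population annotated to the GO term
    sample:     genes in the list of interest (e.g. upregulated genes)
    hits:       genes in the list of interest annotated to the GO term

    Returns P(X >= hits) when `sample` genes are drawn without
    replacement from the population.
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 10,000 genes measured, 100 annotated to a term such as
# muscle hypertrophy, 200 genes upregulated, 10 of which carry the annotation.
p_value = hypergeom_enrichment_p(10_000, 100, 200, 10)
print(f"p = {p_value:.3g}")
```

A small p-value suggests that the term is seen among the selected genes more often than chance alone would predict. In practice the test is repeated for many terms, so a multiple-testing correction is normally applied as well.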
Obviously, a critical component of such analysis is the comprehensive annotation of relevant gene products. To enable community annotation, we have provided a Muscle Biology Community Annotation Wiki http://wiki.geneontology.org/index.php/Muscle_Biology. The wiki contains editable annotation pages for 172 genes associated with muscle development and function.
Users can review existing Gene Ontology annotations for any gene of interest, and add information about any aspect of the biology of a gene from any species. They can also contribute annotations of gene products involved in muscle biology to all GO terms, thereby supporting the next step in research into these critical areas of muscle biology.
The Gene Ontology Consortium is supported by NIH – NHGRI grant HG02273 and by EMBL core funding. The Muscle Biology Content Meeting was supported by Italian Telethon Grant GSP042894B. We thank Judith Blake, Michael Ashburner and Chris Mungall for critical reading of the manuscript.
- Gene Ontology Consortium: The Gene Ontology (GO) project in 2006. Nucleic Acids Research. 2006, 34 (Database issue): D322-326. 10.1093/nar/gkj021.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics. 2000, 25: 25-29. 10.1038/75556.
- Gene Ontology Consortium: The Gene Ontology project in 2008. Nucleic Acids Research. 2008, 36 (Database issue): D440-444.
- Pollanen E, Ronkainen PH, Suominen H, Takala T, Koskinen S, Puolakka J, Sipila S, Kovanen V: Muscular transcriptome in postmenopausal women with or without hormone replacement. Rejuvenation Research. 2007, 10: 485-500. 10.1089/rej.2007.0536.
- Bean C, Salamon M, Raffaello A, Campanaro S, Pallavicini A, Lanfranchi G: The Ankrd2, Cdkn1c and calcyclin genes are under the control of MyoD during myogenic differentiation. Journal of Molecular Biology. 2005, 349: 349-366. 10.1016/j.jmb.2005.03.063.
- Diehl AD, Lee JA, Scheuermann RH, Blake JA: Ontology development for biological systems: immunology. Bioinformatics (Oxford, England). 2007, 23: 913-915. 10.1093/bioinformatics/btm029.
- Hayamizu TF, Mangan M, Corradi JP, Kadin JA, Ringwald M: The Adult Mouse Anatomical Dictionary: a tool for annotating and integrating data. Genome Biology. 2005, 6: R29. 10.1186/gb-2005-6-3-r29.
- Bard J, Rhee SY, Ashburner M: An ontology for cell types. Genome Biology. 2005, 6: R21. 10.1186/gb-2005-6-2-r21.
- Schiaffino S, Sandri M, Murgia M: Activity-dependent signaling pathways controlling muscle diversity and plasticity. Physiology (Bethesda, Md). 2007, 22: 269-278.
- Frank D, Kuhn C, Katus HA, Frey N: The sarcomeric Z-disc: a nodal point in signalling and disease. Journal of Molecular Medicine (Berlin, Germany). 2006, 84: 446-468.
- Knoll R, Hoshijima M, Chien K: Cardiac mechanotransduction and implications for heart disease. Journal of Molecular Medicine (Berlin, Germany). 2003, 81: 750-756.
- Berchtold MW, Brinkmeier H, Muntener M: Calcium ion in skeletal muscle: its crucial role for muscle function, plasticity, and disease. Physiological Reviews. 2000, 80: 1215-1265.
- Nori A, Valle G, Bortoloso E, Turcato F, Volpe P: Calsequestrin targeting to sarcoplasmic reticulum of skeletal muscle fibers. American Journal of Physiology. 2006, 291: C245-253. 10.1152/ajpcell.00370.2005.
- Bassel-Duby R, Olson EN: Signaling pathways in skeletal muscle remodeling. Annual Review of Biochemistry. 2006, 75: 19-37. 10.1146/annurev.biochem.75.103004.142622.
- Grifone R, Kelly RG: Heartening news for head muscle development. Trends Genet. 2007, 23: 365-369. 10.1016/j.tig.2007.05.002.
- Scime A, Rudnicki MA: Anabolic potential and regulation of the skeletal muscle satellite cell populations. Curr Opin Clin Nutr Metab Care. 2006, 9 (3): 214-219.
- Anderson JE: The satellite cell as a companion in skeletal muscle plasticity: currency, conveyance, clue, connector and colander. The Journal of Experimental Biology. 2006, 209 (Pt 12): 2276-2292. 10.1242/jeb.02088.
- Shi X, Garry DJ: Muscle stem cells in development, regeneration, and disease. Genes & Development. 2006, 20: 1692-1708. 10.1101/gad.1419406.
- Timmons JA, Larsson O, Jansson E, Fischer H, Gustafsson T, Greenhaff PL, Ridden J, Rachman J, Peyrard-Janvid M, Wahlestedt C, Sundberg CJ: Human muscle gene expression responses to endurance training provide a novel perspective on Duchenne muscular dystrophy. Faseb J. 2005, 19: 750-760. 10.1096/fj.04-1980com.
- GO Consortium Tool web page. [http://www.geneontology.org/GO.tools.shtml#micro]
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1755-8794/2/6/prepub