Ontology Development Progress Report December 2010

From GO Wiki
Revision as of 11:03, 24 November 2010 by Jl242

Ontology Development

Metrics

GO term statistics

December 1, 2009

Ontology    Current  Defined  Obsolete  Total
Function       8657     8474       798   9455
Process       17533    17465       508  18041
Component      2613     2613       117   2731
All           28803    29976      1424  30227

November 24, 2010

Ontology    Current  Defined  Obsolete  Total
Function          -        -         -      -
Process           -        -         -      -
Component         -        -         -      -
All               -        -         -      -


SourceForge statistics (Dec 1, 2008 - Nov. 30, 2009)

  • items opened: 1085
  • items closed: 1071

SourceForge reports (on SF site)

Ontology development

In 2008, we introduced three new relationship types -- regulates, negatively_regulates and positively_regulates -- to represent biological regulation in the Biological Process (BP) ontology. We have now extended the use of these relations, adding regulates relationships within the Molecular Function ontology and between the Biological Process and Molecular Function ontologies; the latter are the inter-ontology links in the GO vocabularies. The new links make previously implicit regulatory relationships explicit between 'regulation of molecular function' Biological Process terms and the corresponding Molecular Function terms [e.g. regulation of kinase activity (BP) regulates kinase activity (MF)], and between terms within the Molecular Function ontology [e.g. calcium channel regulator activity (MF) regulates calcium channel activity (MF)]. Like the original implementation of the regulates relations, this work used reasoning and quality control reports to ensure internal consistency between molecular functions and the molecular functions or biological processes that regulate them. For example, if a term 'regulation of function X' exists in the ontology, it must be a valid subtype of 'regulation of molecular function', and must have a regulates relationship with 'function X'. This project resulted in new terms for 402 processes that regulate molecular functions.
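The consistency rule described above can be sketched as a small quality control check. This is only an illustration under stated assumptions: the dictionary-based ontology model, the function names and the toy terms are hypothetical, and it is not the reasoner or report tooling actually used by GO curators.

```python
from typing import Dict, List, Set, Tuple

# Toy model (an assumption for illustration): term name -> set of (relation, target) edges.
Ontology = Dict[str, Set[Tuple[str, str]]]

ROOT = "regulation of molecular function"
PREFIX = "regulation of "


def is_a_ancestors(onto: Ontology, term: str) -> Set[str]:
    """Collect every term reachable from `term` by following is_a edges."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, target in onto.get(current, set()):
            if relation == "is_a" and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def check_regulates(onto: Ontology, functions: Set[str]) -> List[str]:
    """Report 'regulation of <function X>' terms that break the rule."""
    problems: List[str] = []
    for term in sorted(onto):
        if not term.startswith(PREFIX):
            continue
        target = term[len(PREFIX):]
        if target not in functions:
            continue  # not a 'regulation of <molecular function>' term
        if term != ROOT and ROOT not in is_a_ancestors(onto, term):
            problems.append(f"{term}: not a subtype of '{ROOT}'")
        if ("regulates", target) not in onto[term]:
            problems.append(f"{term}: missing regulates link to '{target}'")
    return problems


# Hypothetical toy data: the second regulation term lacks its regulates link.
toy: Ontology = {
    ROOT: set(),
    "regulation of kinase activity": {
        ("is_a", ROOT),
        ("regulates", "kinase activity"),
    },
    "regulation of calcium channel activity": {("is_a", ROOT)},
}
toy_functions = {"kinase activity", "calcium channel activity"}

for problem in check_regulates(toy, toy_functions):
    print(problem)
```

Matching terms by name prefix is a shortcut for the sketch; a real check would more plausibly work from term identifiers and logical definitions than from label text.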

Additional Molecular Function-Biological Process links have also been introduced, and are being added gradually. The first such links to be incorporated are part_of relationships based on molecular functions that are only involved in a single process (e.g. 'protein kinase' part_of 'protein phosphorylation'). Additional part_of links will be created based on mining external pathway resources such as Reactome and MetaCyc, and curated relationships for some metabolic pathways. To accommodate cases where one molecular function may be involved in any of several different biological processes, we will add new terms for biological process-specific molecular functions; each will have an is_a relationship to a more generic molecular function term, and a part_of link to a biological process term.

We have recently introduced another new relationship type, has_part, which represents a part-whole relationship from the perspective of the parent, and is thus the logical complement of the part_of relation. As with part_of, the GO relation has_part is used only where A always has B as a part, i.e. where A necessarily has part B. The has_part relation allows GO to accurately represent cases where different wholes may have parts in common (e.g. one biological process may always be a subprocess of two different larger processes).
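Because both relations carry an "always" condition, asserting one does not imply the other in the reverse direction. The following minimal sketch illustrates this at the instance level; the cell and nucleus classes, the instance names and the helper functions are hypothetical examples chosen for illustration, not GO content or tooling.

```python
# Toy instance data; every name here is hypothetical.
instance_type = {
    "cell_a": "cell",
    "cell_b": "cell",        # a cell with no nucleus recorded
    "nucleus_a": "nucleus",
}

# Instance-level parthood facts as (part, whole) pairs.
parthood = {("nucleus_a", "cell_a")}


def instances_of(cls):
    """Return all instances whose class is `cls`."""
    return [inst for inst, c in instance_type.items() if c == cls]


def holds_part_of(part_cls, whole_cls):
    """True if every instance of part_cls is part of some instance of whole_cls."""
    return all(
        any((p, w) in parthood for w in instances_of(whole_cls))
        for p in instances_of(part_cls)
    )


def holds_has_part(whole_cls, part_cls):
    """True if every instance of whole_cls has some instance of part_cls as a part."""
    return all(
        any((p, w) in parthood for p in instances_of(part_cls))
        for w in instances_of(whole_cls)
    )


print(holds_part_of("nucleus", "cell"))   # every nucleus here sits in a cell
print(holds_has_part("cell", "nucleus"))  # but cell_b has no nucleus
```

In this toy data the part_of statement holds while the has_part statement does not, which is why the two relations need to be asserted separately. Note that both checks are vacuously true for a class with no instances.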

GO curators have thoroughly reviewed several sets of "internal" cross-products (i.e. those that refer to terms that are represented by the combination of other terms within the three GO ontologies), and will begin releasing them into the authoritative version of the GO in January 2010. The first internal cross-products to be released will provide computable definitions for regulation terms and for some biological process terms that can be defined by referring to other terms within the same branch of GO. Additional internal cross-products defining cellular component terms will be released later in 2010. Cross-products between GO and external ontologies such as ChEBI and the OBO Cell ontology are pending; using a pilot set of GO-ChEBI cross-products, we have added missing links and corrected some inconsistencies within the Biological Process ontology.
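One way computable definitions can surface missing links is sketched below. It assumes each cross-product definition has the form "a genus term that stands in some relation to a filler term", and applies a standard subsumption pattern: if both the genus and the filler of one definition are subtypes of (or equal to) those of another, with the same relation, the first term should be a subtype of the second. The LogicalDef class, the function names and the toy hierarchy are all hypothetical; GO's actual reasoning is done with dedicated reasoners, not this code.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class LogicalDef:
    """Hypothetical cross-product definition: a `genus` that `relation` some `filler`."""
    genus: str
    relation: str
    filler: str


def is_subclass(term: str, ancestor: str, is_a: Dict[str, Set[str]]) -> bool:
    """Reflexive, transitive is_a test over asserted links."""
    if term == ancestor:
        return True
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, set()):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def missing_links(defs: Dict[str, LogicalDef],
                  is_a: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Find is_a links implied by the definitions but not yet asserted."""
    found = []
    for child, cdef in sorted(defs.items()):
        for parent, pdef in sorted(defs.items()):
            if child == parent or cdef == pdef:
                continue  # skip self-pairs and equivalent definitions
            implied = (
                cdef.relation == pdef.relation
                and is_subclass(cdef.genus, pdef.genus, is_a)
                and is_subclass(cdef.filler, pdef.filler, is_a)
            )
            if implied and not is_subclass(child, parent, is_a):
                found.append((child, parent))
    return found


# Toy data (simplified; the direct is_a link is an assumption for brevity).
toy_is_a = {"kinase activity": {"catalytic activity"}}
toy_defs = {
    "regulation of kinase activity":
        LogicalDef("biological regulation", "regulates", "kinase activity"),
    "regulation of catalytic activity":
        LogicalDef("biological regulation", "regulates", "catalytic activity"),
}

print(missing_links(toy_defs, toy_is_a))
```

Once the implied link is asserted in the hierarchy, the same call returns an empty list, so the check can be rerun after each round of edits.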

A content meeting on heart development was held in London, UK, on September 22, 2009. Participants added 250 new terms and modified 12 existing terms. The ontology now describes heart development beginning with the signalling events that specify the heart field and ending with the physiological hypertrophy that takes place after birth. Representatives from FlyBase and ZFIN attended the meeting to ensure that the new terms, definitions and relationships were consistent with usage in their respective organisms.

We also tried a new ontology development strategy this year: we sent GO ontology developers to the Society for Developmental Biology meeting specifically to develop the developmental biology portion of the ontology. Curators attended sessions at the meeting to ensure that this portion of the BP ontology was consistent with current views in the field. New terms were added to the ontology and older terms were updated or rearranged to fit current understanding. This effort resulted in the addition or modification of 150 terms.

This strategy is also being employed by ontology developers attending the American Society for Cell Biology meeting in mid-December 2009.

To support the recently funded Renal GO Annotation Initiative, we have begun adding terms describing kidney development. A content meeting is planned for January 2010.

Other major changes included reorganization of Biological Process terms describing cellular component organization and addition of Biological Process terms for branching organ development. We have continued to note correlations between GO terms and taxon information, and have begun using these correlations as a set of triggers for annotation quality control. Further work is in progress on a number of topics begun in 2008, including extensive revisions to the signal transduction branch of the Biological Process ontology, as well as work on transcription terms, terms representing viral biology, and new terms for the Plant-Associated Microbe Gene Ontology (PAMGO) project.

Ontology Quality Control

We have continued to use the ontology quality control procedures previously reported. These include built-in and custom OBO-Edit text and ontology structure checks, as well as reasoner-based checks that are run periodically, outside the ontology editing cycle, and that generate reports curators use to correct errors in the ontology. The release of internal cross-product definitions will allow some checks that are now run separately to be incorporated into the ontology editing and release cycle, making error correction more efficient and errors less persistent.

Ontology file management

We have introduced a release pipeline for the publication of the ontology files, allowing us to provide an "extended" version of GO (in OBO 1.2 format) to make innovative changes available to the public, while still providing versions that are compatible with existing parsers and other tools. At present, has_part relationships and all inter-ontology relationships (part_of and regulates links between Molecular Function and Biological Process) are filtered from the "standard" release of GO, as are a few OBO tags capturing term creation metadata. The standard release is available in OBO 1.2 and OBO 1.0 formats.
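The filtering step described above can be sketched as follows. This is a minimal illustration, assuming relationships are held as (subject, relation, object) triples and that each term's ontology is known; the function name, data layout and the "complex A" and "subunit B" terms are hypothetical, and the sketch does not cover the filtering of OBO metadata tags or reflect the actual release pipeline code.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (subject, relation, object)


def standard_release(edges: List[Edge], namespace: Dict[str, str]) -> List[Edge]:
    """Keep only edges allowed in the 'standard' release:
    drop has_part edges and any edge linking terms in different ontologies."""
    return [
        (subject, relation, obj)
        for subject, relation, obj in edges
        if relation != "has_part" and namespace[subject] == namespace[obj]
    ]


# Toy data; "complex A" and "subunit B" are made-up component terms.
toy_namespace = {
    "regulation of kinase activity": "biological_process",
    "regulation of molecular function": "biological_process",
    "kinase activity": "molecular_function",
    "complex A": "cellular_component",
    "subunit B": "cellular_component",
}
extended_edges = [
    ("regulation of kinase activity", "is_a", "regulation of molecular function"),
    ("regulation of kinase activity", "regulates", "kinase activity"),  # BP -> MF
    ("complex A", "has_part", "subunit B"),
    ("subunit B", "part_of", "complex A"),
]

for edge in standard_release(extended_edges, toy_namespace):
    print(edge)
```

Only the is_a edge and the within-ontology part_of edge survive the filter, so a parser that knows nothing about has_part or inter-ontology links can still read the result.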