Ontology Development Progress Report December 2014

GOC Meeting Dec 2014

Prepared and Submitted by Jane Lomax and David Hill

Personnel

  • David Hill (MGI)
  • Tanya Berardini (TAIR)
  • Heiko Dietze (LBL)
  • Harold Drabkin (MGI)
  • Becky Foulger (EBI) (left Jan 2014)
  • Jane Lomax (EBI)
  • Chris Mungall (LBL)
  • David Osumi-Sutherland (EBI)
  • Paola Roncaglia (EBI)

Ontology Editing

SourceForge Requests

SF items opened (SF items closed)

Jan 2014 39 (44)
Feb 2014 50 (81)
Mar 2014 59 (41)
Apr 2014 74 (52)
May 2014 71 (75)
Jun 2014 79 (70)
Jul 2014 66 (58)
Aug 2014 76 (115)
Sept 2014 110 (52)
Oct 2014 52 (73)
Nov 2014 83 (69)
Dec 2014 26 (5)
Total 2014 785 (735)

Term Statistics

Total number of GO terms added Jan 2014 to Dec 2, 2014: 1868
  Total number of GO terms added manually Jan 2014 to Dec 2, 2014: 549
  Total number of GO terms added via TermGenie template Jan 2014 to Dec 2, 2014: 1046
  Total number of GO terms added via TermGenie freeform Jan 2014 to Dec 2, 2014: 273
Total number of GO terms obsoleted Jan 2014 to Dec 2, 2014: 57 obsoleted, 34 merged
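
As a quick check on the figures above, the three routes for adding terms can be summed and expressed as shares of the total. This is a minimal sketch: the counts are copied from the statistics above, while the variable and function names are illustrative and not part of any GO tooling.

```python
# Counts copied from the term statistics above (Jan 2014 to Dec 2, 2014).
added_by_route = {
    "manual": 549,
    "TermGenie template": 1046,
    "TermGenie freeform": 273,
}

# The three routes should account for every term added in the period.
total_added = sum(added_by_route.values())


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {route: share(n, total_added) for route, n in added_by_route.items()}

if __name__ == "__main__":
    print(f"total added: {total_added}")
    for route, pct in shares.items():
        print(f"{route}: {pct}%")
```

The three counts sum to the reported total of 1868, and the two TermGenie routes together account for roughly 70% of the terms added.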

Major Projects

Transition to OWL

Ontology editors routinely use the OWL version of GO to check the ontology for logical consistency and to create terms with logical definitions. OWL is also the underlying format for creating new terms via the TermGenie tool.

TermGenie templates

Since Jan 2014 we have added 6 new templates to our template-based term addition tool, TermGenie. These are:

new templates:

  • cell_migration (CL)
  • biosynthesis_from (CHEBI)
  • biosynthesis_via (CHEBI)
  • catabolism_to (CHEBI)
  • catabolism_via (CHEBI)
  • metazoan_development (UBER)

templates with additional functionality:

  • add plant cells for cell_differentiation (PO)
  • add option to also create transmembrane transport for chemical transport from to (CHEBI)

These templates utilize both classes within GO and classes from external ontologies, shown in parentheses. PO = Plant Ontology, CL = Cell Ontology, CHEBI = Chemical Entities of Biological Interest, UBER = Uberon.
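
To illustrate the idea of template-based term addition, the sketch below fills a pattern with classes from an external ontology to produce a label and a logical definition. It is not TermGenie code: the label pattern, the relation names (has_output, has_input), and the example fillers are assumptions made for illustration, and the real biosynthesis_from template may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """A pattern that turns filler classes into a new term (illustrative)."""
    name: str
    label: str        # label pattern with named slots
    logical_def: str  # genus-differentia pattern in a Manchester-like syntax


# Hypothetical pattern; relation names and wording are assumptions.
BIOSYNTHESIS_FROM = Template(
    name="biosynthesis_from",
    label="{target} biosynthetic process from {source}",
    logical_def=(
        "'biosynthetic process' and (has_output some '{target}') "
        "and (has_input some '{source}')"
    ),
)


def instantiate(template: Template, **fillers: str) -> dict:
    """Fill every slot of the template with the given class labels."""
    return {
        "template": template.name,
        "label": template.label.format(**fillers),
        "logical_definition": template.logical_def.format(**fillers),
    }


# Example fillers standing in for CHEBI class labels.
term = instantiate(BIOSYNTHESIS_FROM, target="glycine", source="serine")

if __name__ == "__main__":
    for key, value in term.items():
        print(f"{key}: {value}")
```

Because the label and the logical definition come from the same fillers, a term generated this way stays consistent with its definition by construction.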


GO relations

The ontology group has been working with the annotation group to refine and clarify the use of relations in annotation extensions. This ongoing work requires identifying a set of well-defined, easy-to-use relations that integrate with the Relation Ontology (RO) and are used consistently both in the ontology and by gene annotators. These relations will be used in accordance with future annotation models using the common annotation tool and will permit the folding and unfolding of contextual data.

Improved Biological Representation

Transport

We developed design patterns for transport processes that specify the start and end locations, any barrier crossed, and the nature of the entity transported. Newly added OWL axioms allow start and end locations to be inferred over part relations in the GO. Eleven different TermGenie templates for transport processes are now available to users.
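
The inference over part relations can be pictured with a small sketch. It assumes a rule of the form "if a transport process ends in X, and X is part of Y, then the process also ends in Y"; the part_of links and function names below are illustrative, not taken from the GO axioms themselves.

```python
# Illustrative part_of links between cellular components (child -> parent).
PART_OF = {
    "mitochondrial matrix": "mitochondrion",
    "mitochondrion": "cytoplasm",
}


def ancestors_by_part_of(component: str):
    """Yield every component the given one is part of, nearest first."""
    seen = set()
    current = PART_OF.get(component)
    while current is not None and current not in seen:
        seen.add(current)  # guard against accidental cycles in the data
        yield current
        current = PART_OF.get(current)


def inferred_end_locations(asserted_end: str) -> list:
    """Return the asserted end location plus those implied by part_of."""
    return [asserted_end, *ancestors_by_part_of(asserted_end)]


if __name__ == "__main__":
    for location in inferred_end_locations("mitochondrial matrix"):
        print(location)
```

In OWL a rule like this would typically be written as a property chain rather than procedural code; the loop here only mimics what a reasoner would derive from such an axiom.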

Membrane proteins

We developed design patterns for classes used to record the relationship of proteins and protein complexes to membranes (integral, anchored, peripheral, etc.).


Apoptosis

Ontology work was carried out to clean up and revise the ‘execution of apoptosis’ node. This was accompanied by a re-annotation effort and by addition of documentation to the apoptosis curation manual (http://wiki.geneontology.org/index.php/Apoptosis_Curation_Manual). Further discussion was carried out about the general ‘cell death’ term.

Cilia

This ontology development project is run in collaboration with the SYSCILIA Consortium (http://syscilia.org). It aims at reviewing and enriching the GO to better represent ciliary substructures in the cell (CC branch) as well as processes that cilia are involved in (BP branch). This is especially relevant as research on ciliary functions and ciliopathies is an emerging area of biomedical study. 51 terms were added or revised, mostly in the CC branch; work on the BP branch will take place in 2015 depending on resources.

Giardia/Dinoflagellate components

This work aims at extending the GO CC branch to cover unicellular species that have specialized, yet-unrepresented subcellular structures. The Giardia project is run in collaboration with Scott Dawson's lab at UC Davis. 33 new terms have been added so far to aid annotation of protein localization to structures unique to Giardia species (unicellular protozoan parasites). Giardiasis is the most common pathogenic parasitic infection in humans worldwide. Another taxonomic group with distinct cellular substructures is that of Dinophyceae (Dinoflagellates), flagellate protists with >2,000 living species. 10 new GO terms have been added to represent dinoflagellate-specific structures, in collaboration with Anne Thessen, with a few more requested recently.

Extracellular vesicles

This project aims at revising and enriching the GO to include new extracellular RNA-related terms. This work is carried out in collaboration with the Extracellular RNA Communication Consortium (ERCC), with feedback from the International Society for Extracellular Vesicles (ISEV) and American Society for Exosomes and Microvesicles (ASEMV), as well as input from the wider extracellular vesicle (EV) community. Discussion is underway and we anticipate that the new terms will be added early in 2015.

Viruses

The ontology work on this project has reached completion. There are now a total of 344 classes under ‘viral process’ (GO:0016032) and 65 classes under ‘virion part’ (GO:0044423). A viral GO slim was also created. A paper, 'Representing microbe-host interactions in the Gene Ontology' (Foulger & Lomax et al.), has been submitted to BMC Microbiology.

Metabolic Pathways (Glycolysis)

The representation of glycolytic pathways is now complete in the ontology and we have begun working on the representation of glycolytic fermentation using the existing glycolysis framework.

Ubiquitin and other small conjugating proteins

As a result of a request from the BioGrid group, we have refactored the molecular function terms that represent the E1, E2 and E3 enzymatic activities for the enzymes that attach ubiquitin to proteins. This work was recently extended to include all other small conjugated proteins.

Infrastructure

Cell Type supplement

  • Continued work on integration between CL and GO; we hold biweekly meetings of CL editors (current attendees: LBNL, OHSU, ZFIN).
  • Integrated Uberon logical definitions and TermGenie templates.
  • Continued semi-automated alignment of Uberon with the implicit GO anatomy in various areas, e.g. renal [Alam-Faruque 2014], and performed additional integration with other ontologies [Haendel 2014].
  • Created a Cell Ontology TermGenie instance to support both OMICS consortia (in use by ENCODE) and to support GO editors and annotators.
  • Created a continuous integration job for the Cell Ontology as part of the Jenkins pipeline.
  • Performed link-filling and new term requests to support the FANTOM5 project [Andersson 2014].

Core GO

  • Published TermGenie paper [Dietze et al. 2014]
  • Created workflow for relation editing and relation constraint editing
  • Extensions to Relations Ontology
  • Provided support for the ontology SourceForge Jamboree
  • Worked closely with the ontology group on maintaining and refactoring various aspects of the ontology
  • Initiated a project to unify GO biological process branch and NCI Thesaurus
  • Restored email reports for active requests on SourceForge (migrated scripts to Jenkins, now using the current SourceForge API)
  • Refactored pipeline for different GO builds
  • Protege plugin for OBO annotations in OWL: improved usability for editing OBO-compliant OWL annotations (labels, references and similar) in Protege
  • Commenced work on persistent cached link ontology manager (Protege plugin of high priority for GO workflow)
  • Documented and published on use of OWL in GO [Mungall 2014owled]
  • TermGenie improvements: commit to OWL, recent submissions page, quick ontology state check, support for and use of SSH keys for SVN authentication, 7 new templates added to the GO TermGenie, tree-based view of available templates in the GO TermGenie

Publications

  • Alam-Faruque Y, Hill DP, Dimmer EC, Harris MA, Foulger RE, Tweedie S, Attrill H, Howe DG, Thomas SR, Davidson D, Woolf AS, Blake JA, Mungall CJ, O'Donovan C, Apweiler R, Huntley RP. Representing kidney development using the gene ontology. PLoS One. 2014 Jun 18;9(6):e99864. doi: 10.1371/journal.pone.0099864. eCollection 2014. PubMed PMID: 24941002; PubMed Central PMCID: PMC4062467.
  • Huntley RP, Harris MA, Alam-Faruque Y, Blake JA, Carbon S, Dietze H, Dimmer EC, Foulger RE, Hill DP, Khodiyar VK, Lock A, Lomax J, Lovering RC, Mutowo-Meullenet P, Sawford T, Van Auken K, Wood V, Mungall CJ. A method for increasing expressivity of Gene Ontology annotations using a compositional approach. BMC Bioinformatics. 2014 May 21;15:155. doi: 10.1186/1471-2105-15-155. PubMed PMID: 24885854; PubMed Central PMCID: PMC4039540.
  • Dietze H, Berardini TZ, Foulger RE, Hill DP, Lomax J, Osumi-Sutherland D, Roncaglia P, Mungall CJ. TermGenie – a web-application for pattern-based ontology class generation. Journal of Biomedical Semantics. [PMCID in progress]
  • Mungall, C. J., Dietze, H., & Osumi-Sutherland, D. (2014). Use of OWL within the Gene Ontology. In M. Keet & V. Tamma (Eds.), Proceedings of the 11th International Workshop on OWL: Experiences and Directions (OWLED 2014) (pp. 25–36). Riva del Garda, Italy, October 17-18, 2014. doi:10.1101/010090
  • The Gene Ontology Consortium. 2014. Gene Ontology Consortium: Going Forward. Nucleic Acids Res., In Press (doi: 10.1093/nar/gku1179)