Annotation Advocacy progress report for 2015
Rama Balakrishnan (SGD) - through mid-October 2015; David Hill (MGI) and Kimberly Van Auken (WB) beginning mid-October 2015
We continue to discuss annotation issues on our biweekly annotation calls. Topics range from new annotation guidelines, Annotation Extension guidelines, and quality control checks to ontology or evidence code discussions, tool development, and updates from annotating groups.
Discussions this year included:
- Mechanics of these annotation calls and how to make them productive and effective. We switched to using Bluejeans. We also decided to hold an annotation consistency exercise every month.
- ECO presentation by Marcus Chibucos
- Jenkins GAF reports
- Availability of RNAcentral IDs for annotating RNA gene products
- Behind the scenes of how data/GO annotations (GAF) are received, processed and disseminated, how to run the GAF filtering script
- Transferring col-16 data while doing an ISS annotation
- Ontology issues with ion channels (specific vs non-specific)
- new guidelines for creating complexes in GO to be in sync with IntAct
- moving the SourceForge ontology tracker to GitHub
- GAF 2.1 release (in this release, the with/from column can contain pipes or commas)
- changing the term string for transcription terms so they are more readable
- No more merges in the Ontology.
- report on how cell lines are handled by MGI
- review of Obsolete relations in col-16 and rehousing existing annotations
- New ECO term for inter-ontology inferencing pipeline
- update on new TermGenie templates
- miRNA guideline
- how to annotate regulation of activity (unresolved)
- review of col-16 documentation on GitHub (https://github.com/geneontology/annotation_extensions)
- What IDs should be allowed in col-16 and column-8 (with/from)?
- When to tag a term with the Do Not Automatically annotate tag
Collaboration with External Annotation Groups
- Synapse group from Broad Institute
- Please see Ontology Report on Synapse for more details.
- MaizeDB has submitted annotations
- Ongoing collaboration with a group at Peking University to annotate human lncRNAs
- Initiated collaboration with the SFLD group at UCSF to provide annotations for enzymes
Annotation to Macromolecular Complexes
A working group was formed to come up with guidelines for annotating complexes as objects. Minutes from these working group discussions are available:
GO Help desk
The Annotation Advocacy group manages the GO help desk with help from various consortium members. The GO help desk receives at least one email a day. For more statistics on the GO help desk, please see the Outreach Progress Report.
Project with Trey Ideker's group
Rama Balakrishnan worked with Trey Ideker's group (UCSD) on identifying missing yeast annotations and missing terms in the ontology.
The new version of the GAF format, GAF 2.1, was released. It allows both pipes (|) and commas (,) in column 8 (the with/from column), whereas GAF 2.0 allows pipes only. A pipe indicates 'OR' and a comma indicates 'AND'.
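To illustrate the pipe/comma semantics in column 8, here is a minimal Python sketch. It is not part of any GOC tool: the function names and the placeholder identifiers (DB:A, DB:B, DB:C) are hypothetical, and the sketch assumes that commas bind more tightly than pipes, so that each pipe-separated alternative is a group of comma-separated identifiers that apply together.

```python
from typing import List


def parse_with_from(field: str) -> List[List[str]]:
    """Parse a GAF 2.1 with/from (column 8) value.

    Returns a list of OR-alternatives. Each alternative is a list of
    identifiers that apply together (AND).
    Assumption: commas group more tightly than pipes.
    """
    field = field.strip()
    if not field:
        return []
    alternatives = []
    for alternative in field.split("|"):      # pipe = OR
        ids = [part.strip() for part in alternative.split(",") if part.strip()]  # comma = AND
        if ids:
            alternatives.append(ids)
    return alternatives


def is_gaf_2_0_compatible(field: str) -> bool:
    """GAF 2.0 allows pipes only, so any comma makes the value 2.1-only."""
    return "," not in field


# Hypothetical placeholder identifiers, for illustration only.
groups = parse_with_from("DB:A,DB:B|DB:C")
# groups reads as: (DB:A AND DB:B) OR (DB:C)
```

Representing the value as a list of lists keeps the OR and AND levels explicit, which makes it easy to check whether a file written for GAF 2.1 could still be read by a GAF 2.0 consumer.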
Technology support and development
- We have made major improvements to the gene association submission, validation and ingestion pipeline. All submissions are validated by a continuous integration system, which performs a number of validations on the association files. Many of these are ontology-driven, such as taxonomy constraint checks.
- We also perform automated inference on the association files, using OWL reasoning to ‘deepen’ annotations to more specific classes, making use of annotation extensions (Huntley et al.).
- Additionally, we implemented a new metadata system, in which all gene association files submitted to the Gene Ontology Consortium have a JSON file describing the contents of the file and submission details. This now drives the selection of association files available on the website, and helps automate many of the association file processing tasks.
Plans for 2016
- Continue with the annotation consistency exercises
- Consolidate QC checks (integrate QC checks from GOA and Mike's filtering script with Jenkins)
- Document guidelines for annotating complexes on the GOC website
- Move annotation submission pipeline to Berkeley
- Replace MySQL legacy database with graph database
- Improve our production statistics reporting (available on the public website now)
- Tighten and improve the infrastructure for responding to challenges to existing literature annotations
- Continue to support and extend our Integration System (using Jenkins) for monitoring, quality control, and publication of annotations. The monitoring site includes statistics and metrics of data quality.