Annotation Advocacy progress report for 2015

Revision as of 01:38, 8 December 2015

Management

Rama Balakrishnan (SGD) through mid-October 2015; David Hill (MGI) and Kimberly Van Auken (WB) beginning mid-October 2015

Annotation Consistency

Annotation calls

We continue to discuss annotation issues on our biweekly annotation calls. Subjects range from new annotation guidelines, Annotation Extension guidelines, quality control checks, and ontology or evidence code-related discussions to tool development and updates from annotating groups.

Some of the discussions we have had this year include:

  1. Mechanics of these annotation calls and how to make them productive and effective. We switched to using Bluejeans. We also decided to have an annotation consistency exercise every month.
  2. ECO presentation by Marcus Chibucos
  3. Jenkins GAF reports
  4. Availability of RNAcentral IDs for annotating RNA gene products
  5. Behind the scenes of how data/GO annotations (GAF) are received, processed and disseminated, how to run the GAF filtering script
  6. Transferring col-16 data while doing an ISS annotation
  7. Ontology issues with ion channels (specific vs non-specific)
  8. New guidelines for creating complexes in GO to be in sync with IntAct
  9. Moving the SourceForge ontology tracker to GitHub
  10. GAF 2.1 release (in this release, the with/from column can handle pipe or comma)
  11. Changing the term string for transcription terms so they are more readable
  12. No more merges in the Ontology
  13. Report on how cell lines are handled by MGI
  14. Review of obsolete relations in col-16 and rehousing existing annotations
  15. New ECO term for inter-ontology inferencing pipeline
  16. Update on new TermGenie templates
  17. miRNA guideline
  18. How to annotate regulation of activity (unresolved)
  19. Review of col-16 documentation on GitHub (https://github.com/geneontology/annotation_extensions)
  20. What IDs should be allowed in col-16 and column-8 (with/from)?
  21. When to tag a term with the Do Not Automatically annotate tag

Collaboration with External Annotation Groups

  • Synapse group from Broad Institute
  • MaizeDB has submitted annotations
  • Ongoing collaboration with a group at Peking University to annotate human lncRNAs
  • Initiated collaboration with the SFLD group at UCSF to provide annotations for enzymes

Annotation to Macromolecular Complexes

A working group was formed to come up with guidelines for annotating complexes as objects. Minutes from these working group discussions are available:
http://wiki.geneontology.org/index.php/Protein_Complex_Conference_Call_June19,_2015
http://wiki.geneontology.org/index.php/Protein_Complex_Conference_Call_July15,_2015

GO Help desk

The Annotation Advocacy group manages the GO help desk with help from various consortium members. The GO help desk receives at least one email a day. For more statistics on the GO help desk, please see the Outreach Progress Report.

Project with Trey Ideker's group

Rama Balakrishnan worked with Trey Ideker's group (UCSD) on identifying missing yeast annotations and missing terms in the ontology.

GAF 2.1

The new version of the GAF format was released. GAF 2.1 allows both pipes (|) and commas (,) in column 8 (the with/from column), whereas GAF 2.0 allows pipes only. A pipe indicates 'OR' and a comma indicates 'AND'.
http://geneontology.org/page/go-annotation-file-gaf-format-21
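
As an illustration of the pipe/comma semantics, the following is a minimal sketch of how a with/from value could be split into alternatives (OR) that each contain one or more identifiers that apply together (AND). The function name and the example identifiers are made up for illustration; this is not the parser used by any GO Consortium tool.

```python
def parse_with_from(value):
    """Split a GAF 2.1 with/from (column 8) value.

    Returns a list of OR-alternatives; each alternative is a list of
    identifiers that are AND-ed together. Hypothetical helper, for
    illustration only.
    """
    if not value.strip():
        return []
    groups = []
    for alternative in value.split("|"):      # pipe separates OR-alternatives
        ids = [part.strip() for part in alternative.split(",") if part.strip()]
        if ids:                               # comma separates AND-ed identifiers
            groups.append(ids)
    return groups


# Made-up identifiers: (P12345 AND Q67890) OR MGI:1234567
example = "UniProtKB:P12345,UniProtKB:Q67890|MGI:MGI:1234567"
print(parse_with_from(example))
```

Under this reading, a GAF 2.0 value (pipes only) parses to a list of single-identifier alternatives, so the same function would cover both versions.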

Technology support and development

  • We have made major improvements to the gene association submission, validation and ingestion pipeline. All submissions are validated by a continuous integration system, which performs a number of validations on the association files. Many of these are ontology-driven, such as taxonomy constraint checks.
  • We also perform automated inference on the association files, using OWL reasoning to ‘deepen’ annotations to more specific classes, making use of annotation extensions (Huntley et al.).
  • Additionally, we implemented a new metadata system, in which all gene association files submitted to the Gene Ontology Consortium have a JSON file describing the contents of the file and submission details. This now drives the selection of association files available on the website, and helps automate many of the association file processing tasks.
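
To give a flavour of the simplest kind of file-level validation, here is a minimal sketch of a per-line format check on a GAF file. It only assumes the published GAF 2.x layout (17 tab-separated columns, comment lines starting with "!", GO ID in column 5, aspect in column 9). The function name is hypothetical, and the checks actually run by the continuous integration system, including the ontology-driven ones such as taxonomy constraints, are not reproduced here.

```python
GAF_COLUMN_COUNT = 17  # GAF 2.x rows have 17 tab-separated columns


def check_gaf_line(line):
    """Return a list of problems found in one GAF line (empty if none).

    Hypothetical helper for illustration; not the consortium's validator.
    """
    line = line.rstrip("\n")
    if not line or line.startswith("!"):
        return []  # blank lines and header/comment lines are skipped
    columns = line.split("\t")
    if len(columns) != GAF_COLUMN_COUNT:
        return [f"expected {GAF_COLUMN_COUNT} columns, found {len(columns)}"]
    problems = []
    if not columns[4].startswith("GO:"):
        problems.append("column 5 is not a GO ID")
    if columns[8] not in {"P", "F", "C"}:
        problems.append("column 9 (aspect) must be P, F or C")
    return problems
```

A caller could run this over every line of a submitted file and report the line numbers of any non-empty results.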

Plans for 2016

  1. Continue with the annotation consistency exercises
  2. Consolidate QC checks (integrate QC checks from GOA and Mike's filtering script with Jenkins)
  3. Document guidelines for annotating complexes on the GOC website
  4. Technology
    1. Release AmiGO 2.4
    2. Develop additional improvements to Term Enrichment
    3. Move annotation submission pipeline to Berkeley
    4. Replace MySQL legacy database with graph database
    5. Improve our production statistics reporting (available on the public website now)
    6. Tighten and improve the infrastructure for responding to challenges to existing literature annotations
    7. Continue to support and extend our Integration System for monitoring, quality control, and publication of annotations (using Jenkins). The monitoring site includes statistics and metrics of data quality.