Annotation Advocacy Progress Report December 2010
An annotation jamboree was held in the first week of November. To coincide with the improvements to the transcription part of the ontology, curators annotated two papers on yeast transcription, and the resulting annotations were collated for comparison. The results were discussed in a conference call on 8th November.
The minutes of the jamboree are here.
A new monthly annotation call was initiated. Its purpose is to give all curators within the GO Consortium a forum for discussing issues that the curators themselves suggest. Suggested topics are posted on the GO wiki. The first call discussed the improvements to the transcription part of the ontology. Subsequent topics have been:
- the annotation jamboree on transcription as described above
- the has_part relationship
Future topics will include:
- the regulates relationship
- improvements to the signaling part of the ontology
- further annotation jamborees
New annotation datasets
Reference Genomes PAINT annotations
The GO Consortium's Reference Genomes project now provides gene association files of annotations propagated between orthologous genes in different species, based on PANTHER family data. The Annotation Advocacy and Coordination group have been assisting annotation groups in incorporating these annotation files into their databases.
Inferred annotation from the Molecular Function-Biological Process links
The GO Consortium has provided Gene Association Files containing annotations inferred from existing, experimentally supported GO annotations using the Molecular Function-Biological Process links present in GO. Annotating groups have been encouraged to incorporate these annotations into their own files.
The files of inferred annotations are here.
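As a rough illustration of how such inferences can be made, the Python sketch below infers Biological Process terms for a gene from its experimentally supported Molecular Function annotations. The term identifiers (`MF:A`, `BP:X`) and the `mf_to_bp` link table are hypothetical placeholders, not real GO terms or relationships; the pipeline the GO Consortium actually used to produce these files is not described here and may differ.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Experimental evidence codes; only annotations carrying one of these
# codes are used as a basis for inference.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def infer_bp_annotations(
    annotations: Iterable[Tuple[str, str, str]],
    mf_to_bp: Dict[str, Set[str]],
) -> List[Tuple[str, str, str]]:
    """Return sorted (gene, inferred BP term, source MF term) triples.

    annotations: (gene_id, go_id, evidence_code) triples.
    mf_to_bp: HYPOTHETICAL table mapping an MF term to the BP terms it links to.
    """
    inferred = set()
    for gene, go_id, evidence in annotations:
        if evidence not in EXPERIMENTAL_CODES:
            continue  # skip non-experimental annotations such as IEA
        for bp_term in mf_to_bp.get(go_id, set()):
            inferred.add((gene, bp_term, go_id))
    return sorted(inferred)


if __name__ == "__main__":
    links = {"MF:A": {"BP:X"}}  # placeholder IDs, not real GO terms
    data = [
        ("gene1", "MF:A", "IDA"),  # experimental: yields an inference
        ("gene2", "MF:A", "IEA"),  # electronic: ignored
    ]
    print(infer_bp_annotations(data, links))  # prints [('gene1', 'BP:X', 'MF:A')]
```

A production pipeline would probably also need to skip annotations carrying a NOT qualifier and drop inferences that duplicate an annotation already present in the file; both steps are left out to keep the sketch short.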
Manual annotations to Mycobacterium tuberculosis
In September, GOA was contacted by a researcher who asked that annotations he had made to M. tuberculosis proteins be included in the GOA database. The annotations were reviewed by Rachael Huntley (GOA) and Rama Balakrishnan (SGD) and were judged to be of high quality. Almost 5800 annotations to M. tuberculosis from MTBbase were released in the GOA-UniProt file as part of GOA's September release.
GO Camp
A GO Annotation Camp was held in Geneva in June. Before the Camp, working groups were formed in the areas where the Consortium needed to focus its efforts on improving the annotation guidelines. Each working group defined the problems to be addressed and met regularly by conference call to produce guidelines and quality control checks, which were then presented for approval at the GO Camp. The working groups were:
- Downstream processes
- Protein complexes
- Annotation to 'Response to' terms
- Use of regulation terms
- Annotation propagation
- Annotation of high-throughput data
The guidelines agreed as a result of the GO Camp can be viewed here.
Links to slides and discussions can be viewed here.
A call focusing on the transcription overhaul was held on 17th September. The slide presentation is here.
A call focusing on the changes to the signaling part of the ontology was held on 18th October. The minutes are here.
Improvements to the GO Consortium's annotation documentation
We are working on cleaning up the existing documentation and adding new guidelines to the GOC website.
Annotation Extension column (Column 16)
The aim of the Annotation Extension column is to allow curators to provide additional information within a single annotation. This information could take the form of gene products, GO IDs or terms from other OBO ontologies.
Since this column can hold information from disparate sources, there is potential for confusion over its proper use. It has therefore been decided to populate the Annotation Extension column incrementally, ensuring that the documentation for each data type is complete before that data type is incorporated into the GOC database.
The first data type to be included is cell type data using identifiers from the Cell Type Ontology.
The use of Column 16 with cell type data has been documented and approved, and the guidelines are currently available here.
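To show what a Column 16 value looks like to software, here is a minimal Python sketch that splits it into statements and pulls out the Cell Type Ontology (CL) identifiers. It assumes the GAF 2.0 convention of `relation(DB:ID)` units, with commas joining units that belong to one statement and pipes separating independent statements. The relation name in the example, `part_of`, is illustrative and is not taken from the approved cell type guidelines.

```python
import re
from typing import List, Tuple

# One extension unit: relation(DB:ID), e.g. part_of(CL:0000084).
_UNIT = re.compile(r"^([A-Za-z_]+)\(([A-Za-z]+:[^()\s]+)\)$")


def parse_extension(column16: str) -> List[List[Tuple[str, str]]]:
    """Parse a Column 16 value into independent statements.

    Each statement is a list of (relation, identifier) pairs: pipes
    separate statements, commas join units within one statement.
    """
    statements: List[List[Tuple[str, str]]] = []
    if not column16.strip():
        return statements  # the column is optional and may be empty
    for statement in column16.split("|"):
        units = []
        for unit in statement.split(","):
            match = _UNIT.match(unit.strip())
            if match is None:
                raise ValueError(f"malformed extension unit: {unit!r}")
            units.append((match.group(1), match.group(2)))
        statements.append(units)
    return statements


def cell_type_ids(column16: str) -> List[str]:
    """Return the CL identifiers mentioned anywhere in a Column 16 value."""
    return [
        term_id
        for statement in parse_extension(column16)
        for _, term_id in statement
        if term_id.startswith("CL:")
    ]


if __name__ == "__main__":
    print(cell_type_ids("part_of(CL:0000084)"))  # prints ['CL:0000084']
```

Rejecting malformed units outright, rather than skipping them, fits the incremental approach described above: a value that does not follow the documented form is surfaced instead of silently loaded.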
Annotation quality control
A number of new quality control checks have been put in place. There are two types of check:
- 'Soft' QC checks, which identify annotations that need to be reviewed for consistency. These annotations will not be filtered out of the MOD GAF file; instead, annotations caught by these checks will be dumped out and the annotating groups will be alerted to review them.
- 'Hard' QC checks, which identify annotations that are incorrect and should be removed. These checks have been added to the current filtering script and will be run on the GAF files in the GOC Submissions directory. Any offending annotations will be filtered out before the files are loaded into the database, and the filtered GAFs will be checked into the main gene_associations directory. An email will be sent to the submitting groups listing the annotations that were filtered.
For more details on the QC checks, please see Annotation Quality Control Checks.
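The difference between the two kinds of check can be sketched as a single pass over a GAF file. In the Python below, the two rules are hypothetical examples chosen for illustration: ND evidence used with a non-root term is treated as a hard failure, and an IPI annotation with an empty With/From column is treated as a soft failure. This is not a statement of which checks the GOC actually classes as hard or soft. The sketch assumes well-formed 17-column GAF 2.0 lines.

```python
from typing import List, Tuple

# Root terms of the three GO aspects (MF, BP, CC).
ROOT_TERMS = {"GO:0003674", "GO:0008150", "GO:0005575"}

# 0-based GAF 2.0 column positions used by the example rules.
GO_ID, EVIDENCE, WITH_FROM = 4, 6, 7


def hard_fail(cols: List[str]) -> bool:
    """ASSUMED hard rule: ND evidence used with a non-root term."""
    return cols[EVIDENCE] == "ND" and cols[GO_ID] not in ROOT_TERMS


def soft_fail(cols: List[str]) -> bool:
    """ASSUMED soft rule: IPI evidence with an empty With/From column."""
    return cols[EVIDENCE] == "IPI" and not cols[WITH_FROM].strip()


def run_qc(lines: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split GAF lines into (kept, flagged_for_review, removed)."""
    kept, flagged, removed = [], [], []
    for line in lines:
        if not line.strip():
            continue  # drop blank lines
        if line.startswith("!"):
            kept.append(line)  # header/comment lines pass through unchanged
            continue
        cols = line.rstrip("\n").split("\t")
        if hard_fail(cols):
            removed.append(line)  # filtered out; the submitter would be emailed
            continue
        kept.append(line)  # soft failures stay in the file...
        if soft_fail(cols):
            flagged.append(line)  # ...but are dumped out for review
    return kept, flagged, removed
```

Hard failures never reach the kept list, whereas soft failures remain in the file and are only copied to a review list, mirroring the distinction described above.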
Mailing list consolidation
The GO Consortium was hosting several mailing lists, many of which were dedicated to a particular area of biology or annotation, and many of which were no longer in active use. These lists have been merged into a single 'GO-discuss' list. Subscribers are strongly encouraged to use the subject line to indicate the topic being discussed, so that other subscribers can read, scan or ignore a posting. The topic should be given in square brackets in the subject line. For example, postings about:
- evidence codes should be tagged [evidence]
- annotation should be tagged [annotation]
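Because the topic tag sits in a predictable place, subscribers can filter postings mechanically, either with their mail client's rules or with a few lines of code. A minimal Python sketch, assuming a tag is any bracketed word or phrase as in the examples above; the function names are hypothetical:

```python
import re
from typing import Iterable, List

# A topic tag is any bracketed text in the subject, e.g. "[evidence]".
_TAG = re.compile(r"\[([^\[\]]+)\]")


def subject_tags(subject: str) -> List[str]:
    """Return the lower-cased topic tags found in a subject line."""
    return [tag.strip().lower() for tag in _TAG.findall(subject)]


def wants(subject: str, followed: Iterable[str]) -> bool:
    """True if the posting carries at least one tag the reader follows."""
    followed_set = {t.lower() for t in followed}
    return any(tag in followed_set for tag in subject_tags(subject))


if __name__ == "__main__":
    print(subject_tags("Re: [Evidence] IEP for expression data"))  # ['evidence']
    print(wants("[annotation] has_part usage", ["evidence"]))  # False
```

Lower-casing the tags means "[Evidence]" and "[evidence]" are treated alike, which matters if posters are not consistent about capitalisation.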
The use of a single mailing list has allowed the GOC to keep everyone informed of its annotation practices and guidelines, and will continue to give all subscribers the opportunity to see which annotation topics and issues are being discussed.
GAF 2.0 file format
The Gene Association File format was updated to version 2.0 on 1st June. The Annotation Advocacy and Coordination group have taken part in discussions on the file format and have been encouraging all annotating groups to provide their files in the new format.
Documentation on the GAF2.0 file format
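For groups moving to the new format, the sketch below reads GAF 2.0 annotation lines into dictionaries keyed by column name. It assumes that every annotation line carries all 17 tab-separated columns (columns 16 and 17 may be empty) and that the file declares itself with a `!gaf-version: 2.0` header; the dictionary keys are our own snake_case labels for the columns listed in the GAF 2.0 documentation.

```python
from typing import Dict, Iterable, Iterator

# snake_case labels for the 17 GAF 2.0 columns, in file order.
GAF2_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def read_gaf2(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per annotation line of a GAF 2.0 file."""
    saw_version = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("!"):
            # Header/comment line; note whether it declares version 2.0.
            if line.replace(" ", "").lower() == "!gaf-version:2.0":
                saw_version = True
            continue
        if not line.strip():
            continue  # ignore blank lines
        if not saw_version:
            raise ValueError("missing '!gaf-version: 2.0' header")
        cols = line.split("\t")
        if len(cols) != len(GAF2_COLUMNS):
            raise ValueError(f"expected 17 columns, got {len(cols)}")
        yield dict(zip(GAF2_COLUMNS, cols))
```

Because `read_gaf2` is a generator, a missing header or a short line is only reported when iteration reaches it, so wrapping the call in `list(...)` is a simple way to validate a whole file up front.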
Evidence Code Ontology
Michelle Gwinn-Giglio and Marcus Chibucos have sent their first proposal for large-scale changes to the Evidence Code Ontology to the GO-discuss mailing list. A second proposal, incorporating the comments received on the first, is due out before the end of 2010.
Plans for 2011
In the upcoming year we will:
- Continue to promote annotation consistency through regular annotation calls and jamborees and through up-to-date guideline documentation
- Enforce annotation guidelines by implementing further quality control checks as needed
- Complete the re-organization of the annotation documentation provided by the GO Consortium through its website
- Oversee the incremental incorporation of the various data types into the Annotation Extension column (Column 16) of the Gene Association File
- Continue to assist external annotating groups who wish to provide annotation sets to the GO Consortium by providing them with the GO Consortium's annotation policies and guidelines and reviewing their annotations as necessary
- Ensure that the new Evidence Code Ontology is adopted by all annotating groups within the GO Consortium, by promoting its use and providing up-to-date documentation
- Continue to keep curators informed of updates to the ontology by providing documentation and holding conference calls