Annotation Advocacy progress report for 2012
We continue to discuss annotation issues on our biweekly annotation calls. Topics range from new annotation guidelines and quality control checks to ontology- and evidence code-related discussions, tool development, and updates from annotating groups.
Some of the discussions we have had this year include:
New IKR (Inferred from Key Residues) evidence code: A type of manually curated evidence derived from sequence analysis, characterized by the lack of key sequence residues. All annotations that use this evidence code should carry the 'NOT' qualifier.
Cell fraction terms: Because these terms refer to artifactual components produced by disruptive biochemical techniques, it was agreed that they should be removed from the ontology. A mapping to standard cellular component terms was agreed so that the affected gene products can be re-annotated.
Annotation redundancy: We have discussed what makes an annotation redundant, focusing on the ideal contents of the annotation file (GAF) that represents all GO Consortium annotations for a given taxon (species-owner groups).
TermGenie: The new web-based GO term creation tool developed by the GO software group was demonstrated and the user manual was presented.
New Rule Engine: The software group presented the new automated rule engine, which verifies the consistency and correctness of submitted GO annotations.
New annotation datasets
Transcription factor annotation by NTNU
We are working with Prof. Astrid Lægreid's group in Norway to manually annotate about 750 mammalian transcription factors. We are supporting their manual curation process, which includes helping them choose terms of the right granularity and the right evidence code for the evidence presented in the literature.
Incorporation of annotation extension information in GO annotations
The Annotation Extension field of the Gene Association File (GAF) 2.0 format allows GO terms to be post-composed during manual curation, using gene product or chemical identifiers, or terms from GO or other OBO ontologies. Information added to the Annotation Extension field provides more context for the GO term identifier entered in the GO ID field (column 5) of the annotation file. During 2012 several groups, including UniProt, MGI and PomBase, created annotations that include annotation extension information. We currently have over 30,000 annotations with annotation extension information (this count comes from our own database, and the figure in the GOC database may differ slightly).
Cardiac Conduction and Apoptosis
The GOC worked with experts in the community to refine these two areas of the ontology. Annotations to gene products in these processes are currently being reviewed and updated to reflect the changes in the ontology.
Transcription annotation manual
The transcription annotation manual is now available from the GOC website: http://www.geneontology.org/GO.annotation.conventions.shtml#txn
Annotating to 'part' terms (nuclear_part, etc.)
Logically, for the purposes of standard GO annotation (i.e. without additional qualifiers), each 'x part' term is equivalent to its 'x' counterpart. The part terms will therefore be added to the 'high-level terms not for annotation' category, and groups will be expected to migrate their existing annotations to the corresponding 'x' terms.
Annotation Extension column (Column 16)
The aim of the Annotation Extension column is to allow curators to provide additional information within a single annotation. This information could take the form of gene products, GO IDs or terms from other OBO ontologies.
An introduction to the annotation extension column is now on the GOC website (http://www.geneontology.org/GO.annotation.extension.shtml). More detailed documentation about each relation used in annotation extension statements is available on the GO wiki (http://gocwiki.geneontology.org/index.php/Annotation_usage_examples_for_each_annotation_extension_relation).
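The sketch below shows one way to read column 16 values by splitting an extension string into its parts. It assumes the GAF 2.0 convention that a pipe separates independent extension sets and a comma joins elements that apply together, with each element written as relation(ID). The function name `parse_extension` and the example IDs are illustrative only.

```python
from typing import List, Tuple

# One extension element: (relation, filler ID), e.g. ("part_of", "CL:0000084").
Element = Tuple[str, str]


def parse_extension(column16: str) -> List[List[Element]]:
    """Split a GAF 2.0 Annotation Extension value into extension sets.

    Assumed layout: '|' separates independent sets, ',' separates elements
    that apply together within one set, and each element is relation(ID).
    """
    sets: List[List[Element]] = []
    if not column16.strip():
        return sets  # empty column: the annotation has no extension
    for group in column16.split("|"):
        elements: List[Element] = []
        for item in group.split(","):
            item = item.strip()
            # Each element must look like relation(ID).
            if "(" not in item or not item.endswith(")"):
                raise ValueError(f"malformed extension element: {item!r}")
            relation, filler = item[:-1].split("(", 1)
            elements.append((relation.strip(), filler.strip()))
        sets.append(elements)
    return sets
```

For example, `parse_extension("part_of(CL:0000084),occurs_in(UBERON:0002107)|has_input(UniProtKB:P12345)")` returns two sets. The first holds the `part_of` and `occurs_in` elements and the second holds the single `has_input` element. Raising on malformed elements, rather than skipping them, mirrors the report's approach of filtering out badly formed annotations.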
Improvements to the GO Consortium's annotation documentation
The GOC has put together documentation on the minimal requirements for submitting GO annotations to the GOC.
The GO evidence code documentation has been updated to expand the use of IC, allowing multiple references and evidence codes as its source. The IKR documentation was also updated.
Annotation quality control
New checks were implemented in 2012 as follows:
1) All IC annotations should include a GO ID in column 8 (with/from).
2) All IDA annotations should NOT include any ID in column 8 (with/from).
3) ND-evidenced annotations should be made to root nodes only (http://geneontology.org/GO.annotation_qc.shtml#GO_AR:0000011).
4) ND annotations should NOT have a PMID in the reference column.
5) References in GAF column 6 should be in DB:ID format, with multiple references separated by pipes (e.g. SGD_REF:S000047763|PMID:2676709). References such as PMID:PMID:14561399, PMID:unpublished or GOC:unpublished should be filtered out.
6) Annotations to GO:0005488 'binding' should be made with IPI, and the interacting partner should be in the 'with' column.
7) Annotations with the IPI evidence code made after Jan 1, 2012 that do not have an ID in the 'with' column should be filtered out (older annotations are grandfathered).
8) Annotations with the IKR evidence code should have the NOT qualifier in column 4; otherwise they will be filtered out.
9) Annotations with GO_REF:0000047 should have an ID in column 8 (with/from); otherwise they will be filtered out.
10) For groups that are not actively updating their annotations but are the authority for certain taxon IDs, taxon ownership has been removed so that the most current annotations from UniProt-GOA can be uploaded into the GO Database and AmiGO.
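Checks 1 to 9 apply to individual annotation lines and can be sketched as a single validation function. This is a minimal illustration, not the GOC's actual rule engine or filtering scripts. Several choices here are assumptions:
- the function name `qc_violations`;
- the message strings;
- the regular expression used for check 5;
- reading "after Jan 1, 2012" as "dated 20120101 or later".

The function takes one GAF 2.0 row already split on tabs into 17 fields. Check 10 concerns taxon ownership rather than individual rows, so it is not covered.

```python
import re
from typing import List

# 0-based indexes into a 17-column GAF 2.0 row (GAF column number minus 1).
QUALIFIER, GO_ID, REFERENCE, EVIDENCE, WITH_FROM, DATE = 3, 4, 5, 6, 7, 13

# GO root terms: biological_process, molecular_function, cellular_component.
ROOT_TERMS = {"GO:0008150", "GO:0003674", "GO:0005575"}

# Assumed shape of a single reference: DB:ID, with no second colon in the ID.
REF_PATTERN = re.compile(r"^[A-Za-z0-9_.-]+:[^:|]+$")


def qc_violations(row: List[str]) -> List[str]:
    """Return messages for the 2012 row-level checks (1-9) that this GAF row fails."""
    problems: List[str] = []
    evidence = row[EVIDENCE]
    with_from = row[WITH_FROM].strip()
    refs = [r for r in row[REFERENCE].split("|") if r]
    qualifiers = row[QUALIFIER].split("|")

    # Check 1: IC must cite a GO ID in with/from.
    if evidence == "IC" and not any(v.startswith("GO:") for v in with_from.split("|")):
        problems.append("IC without a GO ID in with/from")
    # Check 2: IDA must leave with/from empty.
    if evidence == "IDA" and with_from:
        problems.append("IDA with an ID in with/from")
    # Check 3: ND is only allowed on the three root terms.
    if evidence == "ND" and row[GO_ID] not in ROOT_TERMS:
        problems.append("ND annotation to a non-root term")
    # Check 4: ND must not cite a PMID.
    if evidence == "ND" and any(r.startswith("PMID:") for r in refs):
        problems.append("ND annotation citing a PMID")
    # Check 5: each reference must look like DB:ID and must not be 'unpublished'.
    for ref in refs:
        if not REF_PATTERN.match(ref) or ref.split(":", 1)[1].lower() == "unpublished":
            problems.append(f"malformed reference: {ref}")
    # Check 6: 'binding' (GO:0005488) needs IPI plus a partner in with/from.
    if row[GO_ID] == "GO:0005488" and (evidence != "IPI" or not with_from):
        problems.append("binding annotation without IPI and a with/from partner")
    # Check 7: IPI dated 2012 or later needs with/from; older rows are grandfathered.
    # GAF dates are YYYYMMDD strings, so plain string comparison orders them correctly.
    if evidence == "IPI" and not with_from and row[DATE] >= "20120101":
        problems.append("IPI since 2012 without an ID in with/from")
    # Check 8: IKR must carry the NOT qualifier (column 4).
    if evidence == "IKR" and "NOT" not in qualifiers:
        problems.append("IKR without the NOT qualifier")
    # Check 9: GO_REF:0000047 needs an ID in with/from.
    if "GO_REF:0000047" in refs and not with_from:
        problems.append("GO_REF:0000047 without an ID in with/from")
    return problems
```

The function returns every failed check rather than stopping at the first one. This makes it easier to report all problems in a submitted file at once. A row is filtered out whenever the returned list is non-empty.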
GO helpdesk staff
Rama Balakrishnan, Rachael Huntley, David Hill, Harold Drabkin, Jane Lomax, Kimberly Van Auken, Tanya Berardini, Susan Tweedie, Rebecca Foulger, Prudence Mutowo-Meullenet, Paola Roncaglia.
GO help messages now managed using the JIRA system
Since September 2012, all queries to GO have been directed to a single email list: firstname.lastname@example.org. Each message sent to this list automatically creates a JIRA issue. To divide up the work of answering gohelp queries, we use a gohelp rota in which each person is 'on duty' for one week at a time. The person on duty is responsible for making sure that all queries are answered, either by answering them personally or by assigning the issue to someone else.
The new system has been working well and makes categorizing and tracking queries much simpler.
GPAD/GPI: An alternative means of exchanging annotations. The GPAD 1.1 format is designed to be more normalized than GAF and is intended to work in conjunction with a separate format for exchanging gene product information (GPI 1.1). This separates data on the objects being annotated (genes and gene products) from the annotation data. The GPAD format also allows the use of Evidence Code Ontology (ECO) codes.
The GPAD/GPI format also:
- allows unannotated gene products to be submitted to the GO database
- reduces the amount of redundant gene product information in the GAF files
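The normalization idea behind GPAD/GPI can be illustrated with a simplified sketch. It splits GAF rows into one record per gene product plus slimmer annotation records that refer back to it by key. The sketch does not reproduce the actual GPAD 1.1 or GPI 1.1 column layouts. The function name `split_gaf` and the choice of which columns go where are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def split_gaf(rows: List[List[str]]) -> Tuple[Dict[str, List[str]], List[List[str]]]:
    """Split 17-column GAF 2.0 rows into gene product records and annotation records."""
    products: Dict[str, List[str]] = {}
    annotations: List[List[str]] = []
    for row in rows:
        key = f"{row[0]}:{row[1]}"  # DB plus DB Object ID identifies the gene product
        # GAF columns 3, 10, 11, 12, 13 (symbol, name, synonyms, type, taxon) are
        # repeated on every GAF line; here they are stored once per gene product.
        products.setdefault(key, [row[2], row[9], row[10], row[11], row[12]])
        # The annotation record refers to the product by key and keeps only the
        # annotation-specific columns 4-8 and 14-16 (qualifier, GO ID, reference,
        # evidence, with/from, date, assigned by, extension).
        annotations.append([key] + row[3:8] + row[13:16])
    return products, annotations
```

Because gene product records live in their own table, a gene product with no GO annotations can be added to `products` without any matching annotation record. This is the case the first bullet above describes.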
GP2RNA: A gp2rna file is a tab-delimited file that provides a mapping between MOD database object IDs and ncRNA gene/sequence IDs. Contributing this file has been a requirement since March 2012.
GP_unlocalized: If annotating groups have provided annotations to gene identifiers that have been manually curated from the literature, but where no sequence or genomic location is known (such genes have been variously described as 'unlocalised genes', 'single heritable traits' or 'phenotypic orphans'), then the group should also provide a gp_unlocalized file containing all the non-genome localized identifiers available in their database, including those not annotated to GO.
Towards the goal of integrating external annotations for a given taxon, we have come up with some guidelines to define when two annotations for the same gene can be called redundant annotations. These rules will be used when the annotations are all pooled in a central location (Common annotation framework).
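The report does not list the redundancy rules themselves, so the following is only a hypothetical sketch of how such a rule might be applied when annotations are pooled. Two GAF rows are treated as redundant when they share an identity key, and the first is kept. The key used here is an assumption, not the GOC's agreed definition. It consists of the gene product, qualifier, GO ID, reference, evidence, with/from and extension, and ignores date and assigned-by. The names `drop_redundant` and `_normalize` are likewise hypothetical.

```python
from typing import List, Set, Tuple


def _normalize(field: str) -> str:
    # Compare pipe-separated multi-value fields (e.g. with/from) order-insensitively.
    return "|".join(sorted(v for v in field.split("|") if v))


def drop_redundant(rows: List[List[str]]) -> List[List[str]]:
    """Keep the first of any GAF rows that share the (assumed) identity key."""
    seen: Set[Tuple[str, ...]] = set()
    kept: List[List[str]] = []
    for row in rows:
        # Hypothetical key: columns 1, 2, 4-8 and 16. Date and assigned-by
        # (columns 14-15) are ignored, so the same statement submitted by two
        # groups collapses into one.
        key = (row[0], row[1], _normalize(row[3]), row[4], _normalize(row[5]),
               row[6], _normalize(row[7]), row[15])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept
```

Normalizing pipe-separated fields keeps a trivial difference in value order from hiding a redundant pair. Any real rule set would need to decide which columns belong in the key.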
Plans for 2013
In the upcoming year we will:
- Continue to promote annotation consistency through regular annotation calls and jamborees and through up-to-date guideline documentation
- Enforce annotation guidelines by implementing further quality control checks as needed
- Complete the re-organization of the annotation documentation provided by the GO Consortium through its website
- Continue to assist external annotating groups who wish to provide annotation sets to the GO Consortium by providing them with the GO Consortium's annotation policies and guidelines and reviewing their annotations as necessary
- Develop methods to track the provenance of evidence and have them formulated correctly in the Evidence Code Ontology (ECO)
- Ensure the new Evidence Code Ontology is adopted by all annotating groups within the GO Consortium by promoting its use and providing up-to-date documentation
- Continue to keep curators informed of updates to the ontology by providing documentation and holding conference calls