08 MAR 2011 RefGen Phone Conference (Archived)


New Annotation Project

Apoptosis_Reference_Genome_Targets Project leaders: UniProtKB GOA team, Emily Dimmer

Tracking gene product annotation status from each database

http://wiki.geneontology.org/index.php/Annotation_File_Format_Proposal#Proposed_file_format (already implemented by GOA)


Reminder: this was an item brought up at the GO Consortium meeting in Stanford March 2010. Here are the minutes from the discussion:

GAF file

  1. GAF2 update
  2. Proposal for new gene product data file format for submitting gene/protein data independently of annotations
    • proposed GP data file format spec
  3. Proposal for new annotation file format containing only annotation data (complementary to the GP data file format above)
    • proposed annotation file spec
  4. Proposal to add a new column to the GAF to indicate if an annotation was reviewed by a curator or not.
    • Tony: xrefs should go in a separate file solely for mapping
    • Pascale: some reference genome stats could also go in this file
    • Judy: will take too long for all groups to catch up with this change - let's just go ahead with this change
    • Emily: GOA will start producing this file redundantly over the next couple of months
    • Tanya: is this a GAF 3.0 file? We've only just gone to 2.0!
    • Emily: no, it's the same format, the file is just split and gp information only provided once
    • It's a non-redundant GAF file (NAF) - easy to reconstitute a GAF 2.0 from the NAF + GAF
  • Action item: Amelia to write up documentation for split GAFs
  • Action item: We will roll out new split GAF for all groups
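As a rough illustration of the point that a GAF 2.0 can be reconstituted from the split files, here is a minimal Python sketch. The 17-column GAF 2.0 order is standard, but the division of columns between the two files (`GP_COLUMNS`, `ANNOTATION_COLUMNS`), the function names and the sample rows are assumptions made for this example only; the actual layouts are defined in the proposed file format specs linked above.

```python
# GAF 2.0 column order (17 tab-separated columns).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]

# ASSUMPTION: hypothetical split, not the official spec.
# Columns describing the gene product itself (one row per gene product).
GP_COLUMNS = ["db", "db_object_id", "db_object_symbol", "db_object_name",
              "db_object_synonym", "db_object_type", "taxon"]
# Columns describing one annotation, keyed back to the gene product by db + id.
ANNOTATION_COLUMNS = ["db", "db_object_id", "qualifier", "go_id", "reference",
                      "evidence_code", "with_from", "aspect", "date",
                      "assigned_by", "annotation_extension",
                      "gene_product_form_id"]


def read_tsv(text, columns):
    """Parse tab-separated text into dicts, skipping blank and '!' comment lines."""
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("!"):
            continue
        rows.append(dict(zip(columns, line.split("\t"))))
    return rows


def reconstitute_gaf(gp_text, annotation_text):
    """Join each annotation row with its gene product row to rebuild GAF 2.0 lines."""
    gene_products = {
        (gp["db"], gp["db_object_id"]): gp
        for gp in read_tsv(gp_text, GP_COLUMNS)
    }
    gaf_lines = []
    for ann in read_tsv(annotation_text, ANNOTATION_COLUMNS):
        gp = gene_products[(ann["db"], ann["db_object_id"])]
        merged = {**gp, **ann}
        gaf_lines.append("\t".join(merged[col] for col in GAF_COLUMNS))
    return gaf_lines


# Made-up sample data: one gene product, two annotations (all values are placeholders).
GP_TEXT = "\t".join(["EXAMPLE", "GP0001", "geneA", "example protein A",
                     "synA", "protein", "taxon:0000"])
ANNOTATION_TEXT = "\n".join([
    "! comment lines are ignored",
    "\t".join(["EXAMPLE", "GP0001", "", "GO:0000001", "PMID:0000000",
               "IDA", "", "P", "20110308", "EXAMPLE", "", ""]),
    "\t".join(["EXAMPLE", "GP0001", "", "GO:0000002", "PMID:0000000",
               "IMP", "", "P", "20110308", "EXAMPLE", "", ""]),
])

GAF_LINES = reconstitute_gaf(GP_TEXT, ANNOTATION_TEXT)
```

The gene product details are stored once and copied onto every annotation row during the join, which is the sense in which the split files are non-redundant while the rebuilt GAF is not.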

Proposal for definition of Comprehensively annotated: (Emily)

Comprehensively annotated - this tag must be accompanied by a timestamp. Definition: Indicates that a gene product has been the focus of a manual annotation effort whereby a curator has been able to review the current literature providing characterizing information for the gene product and has annotated to the principal function, process and component GO terms that describe its main activities and locations. It is not required that a curator has annotated every single paper published on a gene product if such additional annotations would only duplicate information already provided by the annotation set. The timestamp is an essential component of this label, as it indicates the date on which the curator last reviewed the entire annotation set and literature available for the gene product. It is therefore possible that further information published after this date that would be suitable for GO annotation has not been added. Similarly, GO annotations with a later timestamp may exist, created during routine update procedures or as a by-product of annotation efforts focusing on different gene products. Each time a curator re-reviews the annotation set for a previously 'comprehensively annotated'-tagged gene product, the associated timestamp should be updated.
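To illustrate how the timestamp relates to annotation dates, here is a small Python sketch. It assumes dates in the YYYYMMDD form used in the GAF date column; how the tag and its timestamp would actually be stored is not decided in these minutes, and the function names and sample dates are hypothetical.

```python
from datetime import date


def parse_gaf_date(text):
    """Convert a YYYYMMDD string into a date object."""
    return date(int(text[:4]), int(text[4:6]), int(text[6:8]))


def annotations_after_review(review_timestamp, annotation_dates):
    """Return, in sorted order, the annotation dates later than the review timestamp.

    A non-empty result does not invalidate the 'comprehensively annotated' tag;
    it only shows that annotations were added after the last full review.
    """
    reviewed = parse_gaf_date(review_timestamp)
    return sorted(d for d in map(parse_gaf_date, annotation_dates) if d > reviewed)


# Hypothetical example: last full review on 8 March 2011, three annotation dates.
LATER = annotations_after_review("20110308", ["20100115", "20110308", "20110620"])
```

Only annotations dated strictly after the review timestamp are returned, so an annotation made on the review date itself counts as covered by that review.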

PAINT and MF-BP inferred annotations

  • Are all the groups familiar with where the PAINT and MF-BP inferences are on the FTP site? Are there any technical questions? -Rama
  • GOA has started to provide mf2bp inferences in the UniProt file for all sources/species (as we need to ensure we have a complete and consistent data set) Edimmer
  • Are the annotating groups ready to load the MF-BP inferences?

  • SGD has looked at some of these annotations and we have some concerns.
    • For example, yeast ADK1 is an adenylate kinase and we already have an annotation to ADP biosynthetic process. The MF-BP inference is to the term nucleotide phosphorylation.
    • In another example, yeast AGE1 is annotated to GTPase activator activity and an inference has been made to 'regulation of catalytic activity'. This is a high-level term and not something we would capture in our manual curation pipeline.
    • But it is hard for us to review all these annotations to decide which to keep. I would like to know how other groups are dealing with these annotations.

  • GOA only includes RefGenome annotations from PAINT for human, chicken and other species. Should we take more files directly from GO - or wait for individual MODs to include the annotations and take them from the MOD GAFs? Edimmer
  • I believe GOA has already started to provide the PAINT annotations through the files, and they come with the ISS evidence code. In talking to Mike Cherry about this, he thinks that, apart from the fact that these annotations come from the Reference Genome project, the new evidence codes are the novel part of these annotations. If the GOC adds documentation on these evidence codes (they are available in the ECO), can the annotating groups use them?