Annotation Conf. Call, March 13, 2012
SGD: Rama, Karen, Jodi
UK: Ruth, Varsha (BHF-UCL), Emily, Yasmin, Prudence (UniProt-GOA), Susan (FlyBase), Alex Mitchell (InterPro), Andy Yates and Javier Herrero Sanchez (Ensembl)
USC: Paul Thomas
MGI: Li, Mary
New documentation for TermGenie Help (TermGenie tool) (Yasmin)
Comments for new ISC 'ISS from IC evidence' evidence code
- Michelle and Marcus's concerns regarding the need for a new code.
* Naming preferences:
Inferred from Sequence Similarity, using Curator Judgement (makes it seem as if the curator is judging an alignment)
Inferred by curator based on sequence similarity (could still be mistaken to mean a curator reviewed a sequence alignment)
A. Inferred from Sequence similarity from Curator inferred annotation
B. Curator inference transferred based on Sequence Similarity
Discussion on the appropriate mechanism for filtering out 'redundant annotations'
- specifically aimed at the authoritative GO Consortium annotation files for a specific species. Background available here
Developing new documentation for IKR, continued from the 24th Jan GOC annotation call
Yasmin presented the new TermGenie documentation. If any information is missing from this document, please email Yasmin.
Several felt that we are overloading the evidence code to indicate more than just the evidence.
Marcus (and Michelle) expressed concerns that the evidence code used in this fashion is trying to capture evidence from previous annotations (chain of evidence) and ECO is not the place to do it.
The issue is with the annotation system and we should document this to make sure we take care of it in the context of discussions on increased expressivity.
Decision on this evidence code is hence deferred and a working group should be set up to talk about the 'chain of evidence' issue, and present options back to the annotation group.
Emily will talk to Ruth about how to handle these in the meantime.
The goal for defining non-redundant annotation at this call was to generate a set of guidelines regarding the annotations that should be integrated or could be filtered out, from an annotation file that represents all GO Consortium annotations for a given taxon (species-owner groups).
Therefore we are talking about redundancy with respect to contents of the GAF file and not what the individual groups display at their sites. Website displays must often meet the needs of more naive users, who may want further filtering applied to the annotation display. It is important that this topic should be discussed (and guidelines offered to external databases integrating and displaying GO annotations), but at a different call.
- Attendees agreed that a unique 'GOID + genePID + evidence + with/from + reference' combination constitutes a non-redundant annotation.
- Two annotations that are the same in the above fields but differ in the Assigned_by column are considered redundant.
- Groups should work with each other to establish how to deal with such annotation redundancy (where more than one group has curated the same paper). Hopefully the arrival of the common annotation framework tool will mean that such duplication of effort is more often avoided in future.
- Two annotations that are the same in the other fields but differ in the 'with/from' column ID (for example, 2 InterPro IDs mapping to the same GO term) are considered non-redundant, as useful information is captured in this field. In the GAF file, some groups already provide these IDs in the same annotation row, separated by a pipe (|). However, this filtering activity is optional.
- There was some discussion about defining redundancy in terms of parent-child terms. Some groups keep only the granular annotations because annotations to the parent term contribute to the clutter. But most of us did not think this was a huge issue.
- Paul asked the InterPro and Ensembl groups to talk about their concerns:
- Andy Yates (EnsemblGenomes). When EnsemblGenomes integrates annotations, redundant annotations are obtained from different sources of data, and also where Compara transfers the same term from different orthologs. We do filter the redundant Compara annotations created. We would be interested in obtaining guidelines on how best to display a concise set of GO annotations.
- Alex (InterPro). It is useful to keep all of the InterPro identifiers that have predicted a GO annotation, as this provides provenance for the annotation. InterPro is also aware of the annotation redundancy that its mappings can produce, which can overwhelm some users. The InterPro group intends to move to a new mapping method which would reduce such redundancy. However, I have concerns that over-zealous parent-child filtering could remove important information. Will supply examples where such filtering might give an incomplete picture of a protein's functions.
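As an illustration only, the redundancy rule agreed above could be sketched in Python roughly as follows. This is not an agreed GOC implementation: the Annotation record is a simplified stand-in for a GAF row, the function names are hypothetical, and all identifiers in the example rows are placeholders rather than real annotations.

```python
from typing import Iterable, List, NamedTuple


class Annotation(NamedTuple):
    """Simplified stand-in for one GAF row (only the fields discussed at the call)."""
    gene_product_id: str
    go_id: str
    evidence: str
    with_from: str
    reference: str
    assigned_by: str


def redundancy_key(a: Annotation) -> tuple:
    # GOID + genePID + evidence + with/from + reference.
    # Assigned_by is deliberately left out, so the same annotation
    # submitted by two groups produces the same key.
    return (a.go_id, a.gene_product_id, a.evidence, a.with_from, a.reference)


def filter_redundant(annotations: Iterable[Annotation]) -> List[Annotation]:
    """Keep the first row seen for each key; later rows with the same key are dropped."""
    seen = set()
    kept = []
    for a in annotations:
        key = redundancy_key(a)
        if key not in seen:
            seen.add(key)
            kept.append(a)
    return kept


def merge_with_from(annotations: Iterable[Annotation]) -> List[Annotation]:
    """Optional step: collapse rows that differ only in with/from into one
    row whose with/from IDs are pipe-separated. The merged row keeps the
    Assigned_by value of the first row in its group (an assumption of this sketch)."""
    grouped = {}
    for a in annotations:
        key = (a.go_id, a.gene_product_id, a.evidence, a.reference)
        if key not in grouped:
            grouped[key] = (a, [a.with_from])
        elif a.with_from not in grouped[key][1]:
            grouped[key][1].append(a.with_from)
    return [first._replace(with_from="|".join(ids)) for first, ids in grouped.values()]


# Placeholder rows: the second differs from the first only in Assigned_by,
# the third differs from the first only in with/from.
rows = [
    Annotation("DB:GENE1", "GO:0000001", "IEA", "InterPro:IPR000001", "REF:1", "GroupA"),
    Annotation("DB:GENE1", "GO:0000001", "IEA", "InterPro:IPR000001", "REF:1", "GroupB"),
    Annotation("DB:GENE1", "GO:0000001", "IEA", "InterPro:IPR000002", "REF:1", "GroupA"),
]
unique_rows = filter_redundant(rows)
merged_rows = merge_with_from(unique_rows)
```

Here filter_redundant drops the GroupB row as redundant, while the two rows with different InterPro IDs are both kept. merge_with_from corresponds to the optional practice of listing the IDs in one row separated by a pipe. Parent-child filtering is intentionally not attempted, since the call did not settle on a rule for it.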
IKR evidence code
Not discussed, due to lack of time.