Annotation Conf. Call, March 27, 2012

Agenda

IKR discussion


Changes made since the last discussion:

1. Requirement for phylogenetic analysis removed.

  • Removed from the new IKR draft documentation, along with a definition change to refer to the "lack of" key residues rather than "loss of" key residues.
  • SourceForge ECO request asking for a mapping to an alternative ECO ID that does not require phylogenetic evidence as the basis for determining the loss of key residues (cf. ECO:0000320, phylogenetic determination of loss of key residues used in manual assertion).

2. Requirement for a 'with/from' identifier when using the GO_REF added to the draft documentation:

'Where an IKR annotation statement is made using the GO_REF, inclusion of an identifier in the 'with/from' column of the annotation format that highlights to the user the lacking residues (e.g. an alignment or rule identifier) is strongly recommended'

Continue Discussion on Redundant set of annotations

Recap on previous discussion

1. The discussion was focused around the ideal contents of the annotation file (GAF) that represents all GO Consortium annotations for a given taxon (species-owner groups). There was little discussion regarding annotation web display.

2. A unique combination of 'GOID + genePID + evidence + with/from + reference' constitutes a non-redundant annotation.

3. Two annotations that are the same in the above fields but differ in the Assigned_by column are considered redundant.
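
The redundancy rule in points 2 and 3 can be sketched in a few lines of Python. This is only an illustration, not an agreed implementation: the field names (`go_id`, `gene_product_id`, and so on) and the example values are assumptions made for this sketch, and the rule shown simply keeps the first annotation seen for each key.

```python
# Fields that together identify a non-redundant annotation, per the recap.
# The field names are hypothetical labels for this sketch, not GAF column headers.
KEY_FIELDS = ("go_id", "gene_product_id", "evidence", "with_from", "reference")


def collapse_redundant(annotations):
    """Return one annotation per unique key, keeping the first one encountered.

    Annotations that differ only in 'assigned_by' share a key and are
    therefore treated as redundant.
    """
    kept = {}
    for ann in annotations:
        key = tuple(ann[field] for field in KEY_FIELDS)
        kept.setdefault(key, ann)  # later duplicates are ignored
    return list(kept.values())


# Illustrative records only; the with/from, reference and group values are placeholders.
example = [
    {"go_id": "GO:0005215", "gene_product_id": "E5QVT2", "evidence": "IEA",
     "with_from": "RULE:example", "reference": "REF:example", "assigned_by": "GroupA"},
    {"go_id": "GO:0005215", "gene_product_id": "E5QVT2", "evidence": "IEA",
     "with_from": "RULE:example", "reference": "REF:example", "assigned_by": "GroupB"},
    {"go_id": "GO:0032217", "gene_product_id": "E5QVT2", "evidence": "IEA",
     "with_from": "RULE:example", "reference": "REF:example", "assigned_by": "GroupA"},
]

non_redundant = collapse_redundant(example)
```

Here the first two records differ only in `assigned_by`, so only one of them survives. Whether col-16 and col-17 should be added to `KEY_FIELDS` is exactly the open question listed below.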

Unresolved/new questions:

1. How do col-16 and col-17 fit into the definition of a redundant set?

2. How much filtering should be applied to IEA predictions that are made to both parent and child terms?

e.g. http://www.ebi.ac.uk/QuickGO/GProtein?ac=A0A000


InterPro perspective (Alex Mitchell):

1. There may be a good case for keeping annotation predictions to both parent and child terms in the GAFs, since returning multiple GO terms through matching several InterPro entries in a hierarchy is extra evidence for confidence in a match (i.e. if you're hitting a parent + child + grandchild signature and getting increasingly specific GO terms as a result, that's stronger evidence that the most specific GO term is correctly assigned than if you get a hit to a grandchild signature and a specific GO term alone).

2. My other concern regards protein families, and stems from the fact that our GO mappings aren't exhaustive. Take as an example the InterPro entry for the riboflavin ECF transporter S component RibU (typical family member: E5QVT2). The proteins mediate riboflavin uptake, so the InterPro entry is mapped to GO term GO:0032217 (riboflavin transporter activity). There is also a strong suggestion that they may transport FMN and roseoflavin too, but probably not enough evidence for an InterPro curator to give the entry GO mappings relating to those functions. At the same time, there is a more general family entry in InterPro that picks up ECF transporter S components as a whole (i.e. it doesn't discriminate between those that bind different substrates). This has the GO term GO:0005215 (transporter activity) mapped to it.

What this means is that if I put E5QVT2 through InterProScan, I'd get the GO terms GO:0005215 (transporter activity) + GO:0032217 (riboflavin transporter activity). If we remove parent terms because we consider them redundant, I'd just get GO:0032217 (riboflavin transporter activity).

The first result looks to me more in line with what the protein does (it has transporter activity, including riboflavin transporter activity), whereas the second result looks to me like this is a riboflavin transporter, full stop.
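
The two outcomes described above can be reproduced with a small Python sketch of parent-term filtering. It is an illustration under stated assumptions: the `ANCESTORS` table is a hand-written stand-in containing only the two terms from the RibU example, not the real GO graph, and `drop_parent_terms` is a hypothetical helper rather than part of any existing pipeline.

```python
# Hand-written stand-in for the ontology: each term maps to the set of its
# ancestors. Only the two terms from the RibU example are included (assumption).
ANCESTORS = {
    "GO:0032217": {"GO:0005215"},  # riboflavin transporter activity
    "GO:0005215": set(),           # transporter activity
}


def drop_parent_terms(terms, ancestors=ANCESTORS):
    """Remove any term that is an ancestor of another term in the same set."""
    terms = set(terms)
    implied = set()
    for term in terms:
        implied |= ancestors.get(term, set())
    return terms - implied


# Terms predicted for E5QVT2 when both InterPro entries are matched.
unfiltered = {"GO:0005215", "GO:0032217"}

# With parent filtering, only the most specific term is left.
filtered = drop_parent_terms(unfiltered)
```

With this toy table, `unfiltered` corresponds to the first result (both terms kept) and `filtered` to the second (riboflavin transporter activity only), which is the loss of information the InterPro perspective is concerned about.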