Annotation consistency: IEA, ISS, IC Usage Discussion

From GO Wiki
Revision as of 17:00, 6 December 2007 by Pascale (talk | contribs)

Email exchange

Tanya Berardini, Emily Dimmer, Pascale Gaudet, Chris Mungall, Kimberly VanAuken


Hello,

I am going through the action items for next week's reference genome conference call. At the reference genome meeting we all eagerly volunteered to come up with some guidelines as to how to use ISS, IEA and IC (too much coffee??). Here are the major points I remember:

  • IC: this came up with the example of a translation initiation factor (mouse Eif2b2) that had good function and component annotations but only a root annotation for process. It was suggested that the process could perhaps have received an IC annotation.
  • Related are ISS and IEA: David pointed out that this gene probably had a good IEA annotation from InterPro. The question was how to address this: dicty and pombe would make this an ISS annotation, to avoid the root annotation. The problem is that InterPro domains and mappings can change, so maybe that's not such a good practice after all. However, since IEAs are not displayed in AmiGO (and are perhaps excluded from certain studies as poorly reliable), some valuable information is not used.

Was that your recollection as well? Can we make a plan to discuss that at some point and come up with suggestions? (It doesn't have to be before the call next Tuesday; but I wanted to write down the important discussion points).

Cheers, Pascale


Hi,

My memory of the Reference Genome discussion for IEA was that, as many of the IEA methods used have increased in quality over the last 5 years, some groups are becoming more accepting of the data they provide. I thought that Judy's suggestion was that a group should review these methods and decide which sets of data should be displayed in AmiGO (e.g. if an annotation is supported by multiple independent IEA methods).

If that is a true memory it might be handy to draw up a list of the IEA methods that we want to discuss. From the GOA perspective, I would like to include:

  1. InterPro2GO
  2. SwissProt Keyword 2GO
  3. EC2GO
  4. HAMAP2GO
  5. UniProt subcellular location 2GO
  6. Ensembl Compara - projection of annotation from ortholog data.

Cheers, Emily


Hi,

Thanks for getting the ball rolling on this one, Pascale.

My recollection of this discussion was that it stemmed from a concern about annotation consistency and that Suzi was concerned that there seemed to be lots of 'missing' ISS annotations in the reference genome work. One response to this was that for many gene products, the IEA annotations were actually providing sufficient information and that perhaps IEAs suffer from past negative perceptions that are no longer accurate given that substantial feedback has greatly improved some of the mappings.

Just speaking for myself here, when curating I often make the very pragmatic decision to focus on getting as many experimental annotations in as possible and then, time permitting, go back and try to fill in ISS annotations where we don't have experimental data. In the meantime, though, I do look at our existing IEA annotations and find that many of them are just fine and, if included in AmiGO, would help plug some perceived annotation holes.

I agree with Emily that it would be worthwhile to look at the various IEA methods and come up with some metrics for evaluating their accuracy. I'm not sure what these could be, but (thinking out loud) perhaps there's a way to determine what percentage of these mappings are supported by experimental data in any organism.
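One way such a metric could look is sketched below. This is only an illustration under stated assumptions, not an agreed GO procedure: the function name `mapping_support`, the tuple layout, and the gene and mapping identifiers are all hypothetical, and a mapping counts as "supported" only when some gene carries an experimental annotation to exactly the same term (no reasoning over parent or child terms).

```python
from collections import defaultdict

# GO evidence codes that denote experimental evidence.
EXPERIMENTAL_CODES = {"IDA", "IMP", "IGI", "IPI", "IEP"}


def mapping_support(iea_annotations, manual_annotations):
    """Percentage of mappings per IEA method backed by experimental data.

    iea_annotations:    iterable of (gene, go_term, method, mapping_source)
    manual_annotations: iterable of (gene, go_term, evidence_code)
    Assumption: support means an exact (gene, go_term) match.
    """
    experimental = {
        (gene, term)
        for gene, term, code in manual_annotations
        if code in EXPERIMENTAL_CODES
    }
    mappings = defaultdict(set)   # method -> all (source, term) mappings seen
    supported = defaultdict(set)  # method -> mappings with experimental backing
    for gene, term, method, source in iea_annotations:
        key = (source, term)
        mappings[method].add(key)
        if (gene, term) in experimental:
            supported[method].add(key)
    return {
        method: 100.0 * len(supported[method]) / len(keys)
        for method, keys in mappings.items()
    }


# Hypothetical data: identifiers are placeholders, not real records.
iea = [
    ("geneA", "GO:1", "InterPro2GO", "IPR_x"),
    ("geneB", "GO:1", "InterPro2GO", "IPR_x"),
    ("geneC", "GO:2", "InterPro2GO", "IPR_y"),
    ("geneA", "GO:3", "EC2GO", "EC_z"),
]
manual = [
    ("geneB", "GO:1", "IDA"),  # experimental: supports IPR_x -> GO:1
    ("geneC", "GO:2", "ISS"),  # not experimental: no support for IPR_y -> GO:2
    ("geneA", "GO:3", "IMP"),  # experimental: supports EC_z -> GO:3
]
support = mapping_support(iea, manual)
```

Pooling genes from all organisms into the same input lists would give the "in any organism" reading; a stricter metric would also need to account for the ontology structure.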

Also, with the proposed changes to the ISS branch of the evidence codes, is it worth considering promoting some of these mappings to one of the new IS* evidence codes, if the method ultimately stems from sequence analysis?

Emily, does some of the BioCreAtIvE work speak to this subject as well? I seem to recall Evelyn talking about this issue at past GO meetings and commenting that the BioCreAtIvE work provided support for the idea that some electronic annotations were actually of high quality.

Cheers, Kimberly


I agree with everything Kimberly says, from the strategy of the curation to the visibility of IEA annotations. We also strive for experiments and only when we have nothing else do we search out ISS annotations. Finding experimental evidence in another organism to make an ISS with is very time-consuming. For many of the genes where the ISS seems 'obvious', we have IEA annotations to the appropriate terms.

David


I'll reply to points that Kimberly and David raised all in one email:

1. From Kimberly:

"Also, with the proposed changes to the ISS branch of the evidence codes, is it worth considering promoting some of these [IEA] mappings to one of the new IS* evidence codes, if the method ultimately stems from sequence analysis?"

Doesn't this sound eerily like the THMM (or whatever that acronym is) discussion that's been swirling around the mailing list for ages? Is it IEA? Is it ISS? From what Emily has said, it seems like _some_ of the (currently) IEA methods might be in this same type of situation - the mappings are heavily curated.


2. From David: "Finding experimental evidence in another organism to make an ISS with is very time-consuming."

While it might be difficult to find this type of information for many of the genes that we encounter day-to-day, for the RefGenomes genes in particular, being able to look at all of the ortholog annotations through Mary's graph shortcuts this time-consuming process. Therefore I think that Pascale's point of 'why not make the ISS annotation?' is well taken. I think we should recommend that this be done in cases where the alternative is to have a 'root' annotation or an IEA one.

3. My own point: It seemed like one solution would be to display the IEA annotations in AmiGO so that the 'holes that are not actually holes but IEA annotations that are invisible' would not be so obvious. I'm sure that the AmiGO working group has this in their sights so it may already be in the works.

Tanya


It would be nice to have some type of alert that let us know when a possible ISS annotation could be made to a gene where we have an annotation to the root.
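A minimal sketch of what such an alert might check, assuming ortholog sets and per-gene annotations are already available as simple Python dictionaries. The function name `iss_candidates`, the data layout, and the gene names are hypothetical; only the three root term IDs and the evidence codes are taken from GO itself.

```python
# Root terms of the three GO aspects: process, function, component.
ROOT_TERMS = {"GO:0008150": "P", "GO:0003674": "F", "GO:0005575": "C"}
EXPERIMENTAL_CODES = {"IDA", "IMP", "IGI", "IPI", "IEP"}


def iss_candidates(annotations, orthologs):
    """List (gene, aspect, ortholog, term) tuples worth a curator's look.

    annotations: dict gene -> list of (go_term, aspect, evidence_code)
    orthologs:   dict gene -> list of orthologous genes
    A candidate is reported when a gene is annotated to a root term and an
    ortholog has an experimental, non-root annotation in the same aspect.
    """
    alerts = []
    for gene, annots in annotations.items():
        for term, aspect, _code in annots:
            if term not in ROOT_TERMS:
                continue
            for orth in orthologs.get(gene, []):
                for o_term, o_aspect, o_code in annotations.get(orth, []):
                    if (
                        o_aspect == aspect
                        and o_code in EXPERIMENTAL_CODES
                        and o_term not in ROOT_TERMS
                    ):
                        alerts.append((gene, aspect, orth, o_term))
    return alerts


# Hypothetical example: gene names are placeholders.
annotations = {
    "mouse_gene": [("GO:0008150", "P", "ND")],     # root annotation for process
    "yeast_gene": [
        ("GO:0006413", "P", "IDA"),                # translational initiation
        ("GO:0005737", "C", "IDA"),                # cytoplasm (different aspect)
    ],
}
orthologs = {"mouse_gene": ["yeast_gene"]}
alerts = iss_candidates(annotations, orthologs)
```

The output is only a list of suggestions; whether an ISS annotation is actually justified would still be the curator's call.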

What about 'promoting' IEA annotations when there is no other info available? We do this now on our web site. If we have a manual annotation to the root and there is an IEA annotation, the display of the root annotation gets suppressed. Is there a way we can keep a zillion IEA annotations that are redundant with manual annotations from being displayed? This is just an idea I'm formulating. It would be nice if we could provide some type of best set of annotations to a user.
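The two display rules described here could be sketched roughly as follows. This is an assumption-laden illustration, not the code behind any existing web site: the function name `best_set` and the tuple layout are hypothetical, "redundant" is simplified to "same term as a manual annotation", and the root is hidden only when an IEA annotation exists in the same aspect.

```python
# Root terms of the three GO aspects: process, function, component.
ROOT_TERMS = {"GO:0008150": "P", "GO:0003674": "F", "GO:0005575": "C"}


def best_set(annotations):
    """Filter one gene's annotations for display.

    annotations: list of (go_term, aspect, evidence_code)
    Rule 1: hide a root annotation when a non-root IEA exists in that aspect.
    Rule 2: hide an IEA annotation whose term also has a manual annotation.
    """
    manual_terms = {
        term for term, _aspect, code in annotations
        if code != "IEA" and term not in ROOT_TERMS
    }
    iea_aspects = {
        aspect for term, aspect, code in annotations
        if code == "IEA" and term not in ROOT_TERMS
    }
    shown = []
    for term, aspect, code in annotations:
        if term in ROOT_TERMS and aspect in iea_aspects:
            continue  # rule 1: the IEA is 'promoted' over the root annotation
        if code == "IEA" and term in manual_terms:
            continue  # rule 2: IEA adds nothing beyond the manual annotation
        shown.append((term, aspect, code))
    return shown


# Hypothetical annotation set for a single gene.
gene_annotations = [
    ("GO:0008150", "P", "ND"),   # root annotation for process
    ("GO:0006413", "P", "IEA"),  # IEA that fills the process 'hole'
    ("GO:0005737", "C", "IDA"),  # manual component annotation
    ("GO:0005737", "C", "IEA"),  # IEA redundant with the manual one
]
displayed = best_set(gene_annotations)
```

A fuller version would probably also treat an IEA to a parent of a manually annotated term as redundant, which needs the ontology graph and is left out here.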

David