IC to/from ISS annotations

From GO Wiki

Issue:

Some curators would like to supply gene products from a non-model organism with a full set of high-quality annotations by transferring all appropriate knowledge from a thoroughly manually curated gene product.

Currently only EXP-evidenced annotations can be propagated to other gene products via the ISS evidence code.

What should curators do when they would like to propagate the information from an IC-evidenced annotation to a range of gene products in a related organism?

History

This item has reappeared in minuted GOC discussions over the years, so it would be good to reach a final decision.

November 2011 -- Raised by Ruth Lovering at the London GO Consortium meeting

April 2010 -- Raised as a possible item at the Geneva GO camp (Becky/Eleanor; UniProt), but does not appear to have been addressed.

December 2009 -- Managers' call discussions mention that more detailed evidence codes (e.g. ‘IC-derived-from-ISS’) may be needed.

March 2009 -- Minutes from the Oregon GO Consortium meeting state that ‘documentation currently says IC from ISS ok’


Discussion:

In support of the quality of IC-evidenced annotations:

IC-evidenced annotations are generated by a combination of:

domain knowledge + EXP-level evidence + conservative curator judgement.

These annotations can only be made to a protein which has also been annotated with EXP-level evidence for a closely related GO term.
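The constraint in the previous sentence can be expressed as a simple check. This is a minimal sketch, not part of any GO tool: the `Annotation` record and `ic_is_supported` function are hypothetical, GO term names stand in for GO IDs, the IC annotation's 'with' field is assumed to hold the supporting GO term, and "EXP-level" is taken to mean the experimental codes EXP, IDA, IPI, IMP, IGI and IEP. It only verifies that the supporting annotation exists; whether the two terms are closely related remains a curator judgement.

```python
from dataclasses import dataclass

# Assumed EXP-level codes: the GO experimental evidence codes.
EXP_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    term: str            # GO term name, standing in for a GO ID
    evidence: str        # GO evidence code, e.g. "IDA", "IC", "ISS"
    with_from: str = ""  # for IC: the supporting GO term


def ic_is_supported(ic, annotations):
    """Check that an IC annotation rests on an EXP-level annotation, to the
    GO term in its 'with' field, on the same gene product."""
    if ic.evidence != "IC":
        raise ValueError("expected an IC annotation")
    return any(
        a.gene_product == ic.gene_product
        and a.term == ic.with_from
        and a.evidence in EXP_CODES
        for a in annotations
    )


annotations = [Annotation("gpA", "DNA helicase activity", "IDA")]
ic = Annotation("gpA", "DNA duplex unwinding", "IC", with_from="DNA helicase activity")
print(ic_is_supported(ic, annotations))  # True
```

Under option 2 below, a check like this would be relaxed to accept ISS as well as EXP-level support.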

IC-evidenced annotations are usually made to ‘fill the gaps’ in a protein’s annotation record, adding functional information that cannot be obtained directly from literature curation but is very highly supported by other functional data.

An IC-evidenced annotation cannot be considered as high-quality as an EXP-evidenced one, but could it perhaps be seen as superior to annotations made with other non-EXP manual codes?

What impact does this current restriction have on our annotation sets?

An analysis of current annotations suggests that allowing curators to propagate information from IC-evidenced annotations could create a maximum of 5,037 additional annotations (see the end of this page for how this number was derived).


Options:

1. Enable IC-evidenced annotations to be transferred via the ISS code

(Most popular option so far)

Issues:

• Could be seen as reducing the quality of the ISS code. Allowing only EXP-evidenced annotations to be transferred by ISS means that every ISS annotation is just one step away from its supporting EXP evidence. Enabling IC would mean that some ISS annotations were two steps away from the supporting EXP evidence.

Rejoinder: Although these annotations would be two steps away from the EXP-level annotations, ICs are by their nature conservative and should be reliable.

It would be possible to trace the annotation path back using the gene product ID in the ISS annotation's ‘with’ field, and then the GO identifier and reference cited in the IC annotation. Once the GO Consortium moves to an annotation format that allows a more extensive evidence record, the whole annotation path could be described in the ISS annotation.

• If several non-model organism gene products needed to be supplied with the same set of annotations via the ISS code, being able to transfer IC-evidenced annotations along with EXP-evidenced annotations in the same step would save curation time.
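The trace-back described in the rejoinder can be sketched as a walk along 'with' fields that counts the steps back to EXP-level support. This is a minimal illustration, not an existing GO tool: the `Annotation` record, the `steps_to_exp` function and the gene products gpA and gpB are hypothetical, GO term names stand in for GO IDs, and an IC annotation is assumed to be supported by the same gene product's annotation to the term named in its 'with' field.

```python
from dataclasses import dataclass

# Assumed EXP-level codes: the GO experimental evidence codes.
EXP_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    term: str            # GO term name, standing in for a GO ID
    evidence: str        # GO evidence code
    with_from: str = ""  # ISS: source gene product; IC: supporting GO term


def steps_to_exp(ann, annotations):
    """Follow 'with' fields back to EXP-level support.

    Returns the number of inference steps, or None if the chain cannot be
    resolved (missing support, an unhandled code, or circular support).
    """
    steps, current, seen = 0, ann, set()
    while current.evidence not in EXP_CODES:
        if current in seen:
            return None
        seen.add(current)
        if current.evidence == "ISS":
            # Same GO term, annotated to the gene product named in 'with'.
            key = (current.with_from, current.term)
        elif current.evidence == "IC":
            # GO term named in 'with', annotated to the same gene product.
            key = (current.gene_product, current.with_from)
        else:
            return None
        matches = [a for a in annotations if (a.gene_product, a.term) == key]
        if not matches:
            return None
        # If several annotations match, prefer an EXP-level one.
        current = min(matches, key=lambda a: a.evidence not in EXP_CODES)
        steps += 1
    return steps


annotations = [
    Annotation("gpA", "DNA helicase activity", "IDA"),
    Annotation("gpA", "DNA duplex unwinding", "IC", "DNA helicase activity"),
    Annotation("gpB", "DNA helicase activity", "ISS", "gpA"),
    Annotation("gpB", "DNA duplex unwinding", "ISS", "gpA"),
]
print(steps_to_exp(annotations[2], annotations))  # 1
print(steps_to_exp(annotations[3], annotations))  # 2
```

An ISS transferred from an EXP annotation resolves in one step, while one transferred from an IC annotation resolves in two, which is the extra distance the first issue refers to.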

Annotation Example:

gpA is annotated to 'DNA helicase activity' by IDA, and to 'DNA duplex unwinding' by IC from 'DNA helicase activity'

gpB is similar to gpA

gpB gets 'DNA helicase activity' by ISS (with gpA) from the EXP-evidenced annotation

gpB gets 'DNA duplex unwinding' by ISS (with gpA) from the IC-evidenced annotation
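In effect, option 1 widens the set of evidence codes that an ISS transfer may copy from. A minimal sketch of the example, assuming a hypothetical `Annotation` record and `transfer_by_iss` helper (not part of any GO curation tool), with GO term names standing in for GO IDs and the experimental codes EXP, IDA, IPI, IMP, IGI and IEP treated as EXP-level:

```python
from dataclasses import dataclass

# Assumed EXP-level codes: the GO experimental evidence codes.
EXP_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    term: str            # GO term name, standing in for a GO ID
    evidence: str        # GO evidence code
    with_from: str = ""  # ISS: source gene product; IC: supporting GO term


def transfer_by_iss(source, target, annotations, allow_ic):
    """Copy the source's transferable annotations to the target as ISS,
    recording the source gene product in the 'with' field."""
    allowed = (EXP_CODES | {"IC"}) if allow_ic else EXP_CODES
    return [
        Annotation(target, a.term, "ISS", with_from=source)
        for a in annotations
        if a.gene_product == source and a.evidence in allowed
    ]


gpA = [
    Annotation("gpA", "DNA helicase activity", "IDA"),
    Annotation("gpA", "DNA duplex unwinding", "IC", with_from="DNA helicase activity"),
]
current_rules = transfer_by_iss("gpA", "gpB", gpA, allow_ic=False)
option_1 = transfer_by_iss("gpA", "gpB", gpA, allow_ic=True)
print([a.term for a in current_rules])  # ['DNA helicase activity']
print([a.term for a in option_1])  # ['DNA helicase activity', 'DNA duplex unwinding']
```

Because both annotations are copied in a single call, applying the same helper to a list of target gene products would give the one-step bulk transfer mentioned under the time-saving point.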


Optional Extra: create an ECO code that identifies the specific subset of ISS annotations created using IC-based evidence?

In this case, the GO curation community would need to agree that propagation of IC-evidenced annotations was desirable, but that users should be informed that the GO term was propagated using non-experimental evidence.

As the GO Consortium is moving towards using ECO codes, new ECO codes could be created under ECO:0000250 ‘Sequence similarity evidence that is used in a manual assertion.’: one child term to describe direct experimental support, and a separate one to indicate IC-evidence support.

However, ECO IDs are only included in the GPAD annotation format, which only a few groups supply. How would these data be presented in the GAF format?

2. Enable ISS annotations to be used as support for IC-evidenced annotations.

Issues:

• This option would cause some IC-evidenced annotations to be supported by a GO ID that was supplied by a non-experimental ISS annotation, so these ICs would again be two steps away from the original EXP-level support used to create the ISS annotation.

• Including just a GO identifier in the ‘with’ field of such an IC will not make it clear which ISS annotation (originating from which gene product and taxon) was chosen as the support.

• Would there be concerns where curators have created ISS annotations for a wide range of species with little thought to taxon constraints? Could making a further IC statement on top of these compound possible errors?

• Where a curator would like to supply ISS-evidenced statements to a range of proteins, going back individually into the target records to make IC annotations might be more time-consuming than making a bulk ISS transfer, as option 1 would allow (although this all depends on the facilities in your curation tool!).


Annotation Example:

gpX is annotated to 'Ub-ligase activity' by ISS from a gpY annotation (gpY is annotated to 'Ub-ligase activity' by IDA)

gpX is annotated to 'Ub-mediated catabolic process' by IC from 'Ub-ligase activity', using an ISS-evidenced (instead of an EXP-evidenced) annotation as the source of evidence.
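The concern that an IC ‘with’ field holding only a GO identifier does not say which ISS annotation was used can be illustrated by listing every annotation that could have supported the IC. A minimal sketch, assuming a hypothetical `Annotation` record and `candidate_supports` helper; gpZ is a hypothetical second source gene product added to the example, and GO term names stand in for GO IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    term: str            # GO term name, standing in for a GO ID
    evidence: str        # GO evidence code
    with_from: str = ""  # ISS: source gene product; IC: supporting GO term


def candidate_supports(ic, annotations):
    """Return every annotation that could underlie an IC annotation.

    The IC 'with' field holds only a GO term, so any annotation to that
    term on the same gene product is a candidate.
    """
    return [
        a for a in annotations
        if a.gene_product == ic.gene_product and a.term == ic.with_from
    ]


annotations = [
    Annotation("gpX", "Ub-ligase activity", "ISS", with_from="gpY"),
    # Hypothetical second ISS source for the same term, e.g. from another taxon.
    Annotation("gpX", "Ub-ligase activity", "ISS", with_from="gpZ"),
]
ic = Annotation("gpX", "Ub-mediated catabolic process", "IC",
                with_from="Ub-ligase activity")

for a in candidate_supports(ic, annotations):
    print(a.evidence, "with", a.with_from)
```

Both ISS annotations match, and nothing in the IC annotation itself records whether the curator relied on the gpY or the gpZ transfer.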

Optional Extra: create a new ECO code that identifies the specific subset of ICs created using ISS-based evidence?

However, ECO IDs are only included in the GPAD annotation format, which only a few groups supply. How would these data be presented in the GAF format?

3. Find EXP data, or create TAS/NAS annotations, for the same GO term that can be supplied directly to the non-MOD identifier. If this is not possible, do nothing.

• Accept that the target annotation record cannot be supplied with IC-evidenced data in just the same way that NAS/TAS/RCA/ISS annotations cannot be propagated between gene products.

Groups that may find transferring data from IC annotations of particular interest

This is of particular interest to groups that need to propagate annotations by ISS to supplement their EXP-evidenced annotation set for a species:

Group | ISS annotations | IC annotations
AgBase | 2,109 | 3
ASAP | 14 (ISA) | 4
ASPGD | 619 (ISA, ISO, ISS) | 5
BHF-UCL | 3,293 | 607
CGD | 2,441 | 80
dictyBase | 9,349 | 131
EcoCyc | 216 | -
EcoliWiki | 77 | 12
FlyBase | 13,297 | 926
GDB | - | 1
GeneDB_Pfalciparum | 21 | 3
GeneDB_Tbrucei | 5,615 | 1
Gramene | 296 | 1,799
HGNC | 1,690 | 117
JCVI | 29,811 | 55
MGI | 44,861 | 231
PAMGO_GAT/VMD/MGG | 54 + 1,058 + 2 | 2
PomBase | 10,120 | 1,439
RefGenome | 27,628 (all) | -
RGD | 95,130 (ISS & ISO) | 148
Roslin | 7 | 1
SGD | 2,491 | 602
SGN | 26 | 13
TAIR | 6,068 | 29
TIGR | 69,572 | 205
UniProtKB | 99,287 | 782
WormBase | 740 | 41
ZFIN | 526 | 72
TOTAL | 426,418 | 7,309
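The column totals can be checked against the per-group counts. The sketch below assumes that the single figures given for EcoCyc (216) and RefGenome (27,628) are ISS counts and that GDB's single figure (1) is an IC count; this is the only assignment under which both columns add up to the stated totals. Qualifiers such as "(ISA)" or "(ISS & ISO)" are ignored and each cell is read as a plain count.

```python
# Rows from the table: (group, ISS annotations, IC annotations).
# None marks a group that reports no figure in that column.
rows = [
    ("AgBase", 2109, 3),
    ("ASAP", 14, 4),
    ("ASPGD", 619, 5),
    ("BHF-UCL", 3293, 607),
    ("CGD", 2441, 80),
    ("dictyBase", 9349, 131),
    ("EcoCyc", 216, None),
    ("EcoliWiki", 77, 12),
    ("FlyBase", 13297, 926),
    ("GDB", None, 1),
    ("GeneDB_Pfalciparum", 21, 3),
    ("GeneDB_Tbrucei", 5615, 1),
    ("Gramene", 296, 1799),
    ("HGNC", 1690, 117),
    ("JCVI", 29811, 55),
    ("MGI", 44861, 231),
    ("PAMGO_GAT/VMD/MGG", 54 + 1058 + 2, 2),
    ("PomBase", 10120, 1439),
    ("RefGenome", 27628, None),
    ("RGD", 95130, 148),
    ("Roslin", 7, 1),
    ("SGD", 2491, 602),
    ("SGN", 26, 13),
    ("TAIR", 6068, 29),
    ("TIGR", 69572, 205),
    ("UniProtKB", 99287, 782),
    ("WormBase", 740, 41),
    ("ZFIN", 526, 72),
]

iss_total = sum(iss for _, iss, _ in rows if iss is not None)
ic_total = sum(ic for _, _, ic in rows if ic is not None)
print(f"{iss_total:,} {ic_total:,}")  # 426,418 7,309
```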

* Reasoning for the potential annotation benefit of transferring IC-evidenced data: A search of the UniProt-GOA database found 3,463 IC annotations, to 2,486 distinct proteins, that supply a GO term not duplicated (exactly or by a child term) by any other annotation to the same entry. The other EXP annotations made to this set of 2,486 proteins have been used to propagate functional information (via ISS) to a further 5,118 proteins. Of those 5,118 proteins, only 81 have been separately annotated to a GO term (exact or child) that was supplied by the 3,463 non-transferable IC-evidenced annotations. This leaves 5,118 - 81 = 5,037 proteins, raising the possibility that a further 5,037 annotations could have been created if IC-evidenced annotations could be propagated, or created using non-EXP evidence in the target entry.
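The counting procedure in this reasoning can be sketched in three steps. This is a minimal illustration on toy data, not the query actually run against UniProt-GOA: the record layout, the `covered` and `potential_gain` functions, the placeholder terms T1, T2 and T2a, and the `descendants` map standing in for the GO graph are all hypothetical. As in the text, each target protein is counted once.

```python
from collections import namedtuple

# Hypothetical record layout; for ISS, 'with_from' names the source protein.
Annotation = namedtuple("Annotation", "protein term evidence with_from")


def covered(term, terms, descendants):
    """True if 'term' itself, or any of its descendant terms, is in 'terms'."""
    return term in terms or not descendants.get(term, set()).isdisjoint(terms)


def potential_gain(annotations, descendants):
    by_protein = {}
    for a in annotations:
        by_protein.setdefault(a.protein, []).append(a)

    # Step 1: IC terms not duplicated (exact or children) by any other
    # annotation to the same protein.
    unique_ic = {}
    for protein, anns in by_protein.items():
        for ic in anns:
            if ic.evidence != "IC":
                continue
            others = {a.term for a in anns if a is not ic}
            if not covered(ic.term, others, descendants):
                unique_ic.setdefault(protein, set()).add(ic.term)

    # Step 2: proteins that received ISS data from those proteins, with the
    # IC terms their sources could not pass on.
    targets = {}
    for a in annotations:
        if a.evidence == "ISS" and a.with_from in unique_ic:
            targets.setdefault(a.protein, set()).update(unique_ic[a.with_from])

    # Step 3: targets already annotated (exact or children) to one of those
    # terms by other means would gain nothing.
    already = {
        p for p, ic_terms in targets.items()
        if any(covered(t, {a.term for a in by_protein[p]}, descendants)
               for t in ic_terms)
    }
    return len(targets), len(already), len(targets) - len(already)


# Toy ontology: T2a is a child of T2.
descendants = {"T2": {"T2a"}}
annotations = [
    Annotation("P1", "T1", "IDA", ""),
    Annotation("P1", "T2", "IC", "T1"),
    Annotation("P2", "T1", "ISS", "P1"),
    Annotation("P3", "T1", "ISS", "P1"),
    Annotation("P3", "T2a", "IMP", ""),
]
print(potential_gain(annotations, descendants))  # (2, 1, 1)
```

In the terms of the reasoning above, the three returned numbers play the roles of the 5,118 ISS target proteins, the 81 already annotated to an IC-supplied term, and the 5,037 potential gains.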