Annotation Conf. Call, August 9, 2011
Present: Midori, Val, Antonia, Emily, Susan, Yasmin, Rama, Yuri, Dianna, Karen, Jodi, Judy, Li, Donghui, Kimberley, Stan, Shur-Jen, Varsha, Doug, Ruth, Lakshmi.
1. Enabling annotators to use more than one source of data to make more granular GO annotations. (action item from 26th July)
This issue was raised during the transcription-focused annotation call on the 26th of July:
- SGD and PomBase groups currently make IC-evidenced annotations where the supporting evidence comes from multiple papers. What would be the best way of making this statement?
Current methods applied:
- one IC annotation that includes multiple GOIDs in the From column and uses an unpublished GOREF as the reference (e.g. GO_REF:00000001) - PomBase
- two IC-evidenced annotations to the same GO term, each one supplying a different reference in the reference field and a different GO ID in the 'with' field, indicating the main sources of data used for the two IC annotations - SGD
- What format would be the best way of capturing this information now with the annotation format as it currently exists, that could be later inherited and displayed more appropriately by any future, more sophisticated GO annotation format?
- Could such annotations need evidence codes in addition to IC?
- Should a dedicated GO_REF be created if the PomBase format is used?
This issue relates to the previous Chain of Evidence discussions:
Discussion - the consensus was to make an IC-evidenced annotation, with ALL of the supporting GO_IDs in the 'with' field, citing a new GO_REF that would describe the general procedure used by the curator to make the annotation to the more granular term (this GO_REF would describe the curation procedure; it would not be specific to an annotation and so would not contain the supporting references). The GO_IDs in the 'with' field should all have been directly annotated to the same gene/gene product identifier. The user would have the reference to explain the methodology behind the annotation, and could use the GO IDs in the 'with' field to trace the evidence back to the primary annotations, which should exist in the same gene/gene product annotation record.
- this method was attractive because an annotation object may often have many EXP-supported annotations to the same GO_ID, any of which could be used to make the more granular IC-evidenced annotation statement. Therefore it would not make sense to pick out individual annotations/references as the source of the IC statement, but rather to allow any of these to act as a supporting primary annotation.
- this annotation format is intended as a medium-term solution to allow curators to annotate to more granular terms and would exist until a new, more-expressive annotation format is available to curators that can fully and specifically capture the set of annotations used for the creation of each annotation.
- a QC check should be created to check IC annotations: all GO IDs cited in the 'with' field should have been applied in column 5 of other GO annotations that have been annotated to the same gene/gene product identifier.
Action: Rama and Emily to draft a GO_REF and circulate it around the GO list.
Action: Rama and Emily to modify the IC evidence code documentation
Action: Chris/Amelia to create a new QC check for IC annotations, checking that each GO ID used in the 'with' field to support an IC annotation has also been applied in column 5 of another annotation to the same gene/gene product identifier.
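The QC check described above could be sketched roughly as follows. This is only an illustration, not the check that was eventually implemented: it assumes tab-separated GAF rows with the GO ID in column 5, the evidence code in column 7 and the 'with' field in column 8, it assumes multiple 'with' IDs are pipe-separated, and it treats only non-IC annotations as "supporting" (an assumption not settled on the call). The function names, gene identifiers and references are hypothetical placeholders.

```python
from collections import defaultdict


def make_row(db, obj_id, go_id, reference, evidence, with_field=""):
    """Hypothetical helper: build a tab-separated row holding GAF columns 1-8 only."""
    return "\t".join([db, obj_id, obj_id, "", go_id, reference, evidence, with_field])


def check_ic_with_fields(rows):
    """Return (gene, GO ID, missing 'with' IDs) for each IC annotation whose
    'with' GO IDs are not all directly annotated to the same gene product."""
    supporting = defaultdict(set)  # gene -> GO IDs from non-IC annotations (assumption)
    parsed = []
    for row in rows:
        if not row.strip() or row.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        cols = row.rstrip("\n").split("\t")
        gene = (cols[0], cols[1])           # columns 1-2: DB and object ID
        go_id, evidence = cols[4], cols[6]  # column 5: GO ID, column 7: evidence
        with_ids = [w for w in cols[7].split("|") if w]  # column 8: with/from
        parsed.append((gene, go_id, evidence, with_ids))
        if evidence != "IC":
            supporting[gene].add(go_id)

    problems = []
    for gene, go_id, evidence, with_ids in parsed:
        if evidence != "IC":
            continue
        known = supporting.get(gene, set())
        missing = [w for w in with_ids if w not in known]
        if missing:
            problems.append((gene, go_id, missing))
    return problems


# Placeholder data: GENE1 has both supporting annotations, GENE2 has only one.
rows = [
    make_row("DB", "GENE1", "GO:0006915", "PMID:placeholder1", "IDA"),
    make_row("DB", "GENE1", "GO:0003179", "PMID:placeholder2", "IMP"),
    make_row("DB", "GENE1", "GO:0003276", "GO_REF:placeholder", "IC",
             "GO:0006915|GO:0003179"),
    make_row("DB", "GENE2", "GO:0006915", "PMID:placeholder3", "IDA"),
    make_row("DB", "GENE2", "GO:0003276", "GO_REF:placeholder", "IC",
             "GO:0006915|GO:0003179"),
]
problems = check_ic_with_fields(rows)
for gene, go_id, missing in problems:
    print(gene, go_id, "unsupported 'with' IDs:", missing)
```

In this sketch only the GENE2 IC annotation is reported, because GO:0003179 has not been directly annotated to GENE2.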
2. IPI evidenced annotations; future use of this code with the 'with' field (Rama)
- IPI annotations should have an ID in the 'with' column moving forward (irrespective of the ontology). Do any curators use guilt-by-association methods to assign BP terms?
- If the interacting partner cannot be resolved to a unique ID, then IDA should be used.
- If multiple IDs are present in the 'with' column separated by a pipe, then it implies that the paper shows that gene product A interacts with B and gene product A interacts with C. This is implied/obvious because of requirement #2 above. There are differences in 'with' column usage by groups:
- groups no longer carry out guilt-by-association BP annotations using the IPI evidence code.
- existing annotations that apply the IPI code without an identifier in the 'with' column will be kept in the file; however, new annotations will be required to use an identifier in the 'with' column.
- If multiple IDs are separated by a pipe in the 'with' column, does that mean the gene product interacts with two other gene products simultaneously, or that A interacts with B and, independently, A interacts with C? ZFIN has the ability to distinguish between these two cases, but some groups do not (SGD cannot). However, as the differences in how the annotations look appear to stem from database restrictions, realistically little can be done until the CAF is adopted. In any case, IPI should be used only for scenarios where the binding partner is known.
- The IGI situation is different and will be addressed separately.
Action: Rama to email the GO list to check that the IPI restriction on new annotations is acceptable to all groups before implementation.
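As an illustration of the proposed IPI restriction, a check along the following lines could flag new IPI annotations that lack an identifier in the 'with' column while leaving legacy annotations alone. This is a sketch under stated assumptions: rows are lists of GAF columns with the evidence code in column 7, the 'with' field in column 8 and the date (YYYYMMDD) in column 14; the cutoff date, function names and identifiers are hypothetical, since no implementation date was agreed on the call.

```python
def gaf_row(obj_id, go_id, evidence, with_field, date):
    """Hypothetical helper: build a 15-column GAF-like row, filling unused columns."""
    cols = [""] * 15
    cols[0], cols[1], cols[4] = "DB", obj_id, go_id
    cols[6], cols[7], cols[13] = evidence, with_field, date
    return cols


def ipi_rows_missing_with(rows, cutoff_date):
    """Flag IPI annotations dated on or after cutoff_date with an empty 'with' column.

    Dates are YYYYMMDD strings, so plain string comparison orders them correctly.
    """
    flagged = []
    for cols in rows:
        evidence, with_field, date = cols[6], cols[7], cols[13]
        if evidence == "IPI" and not with_field.strip() and date >= cutoff_date:
            flagged.append((cols[1], cols[4], date))
    return flagged


def interaction_partners(with_field):
    """Split a pipe-separated 'with' value into individual partner IDs.

    Each ID is read as a separate interaction: A with B, and A with C.
    """
    return [p for p in with_field.split("|") if p]


# Placeholder data; the cutoff date is an assumption for illustration only.
ipi_rows = [
    gaf_row("GENE1", "GO:0005515", "IPI", "", "20100315"),           # legacy, kept
    gaf_row("GENE2", "GO:0005515", "IPI", "", "20111001"),           # new, flagged
    gaf_row("GENE3", "GO:0005515", "IPI", "DB:B|DB:C", "20111001"),  # new, valid
]
flagged = ipi_rows_missing_with(ipi_rows, cutoff_date="20110901")
print(flagged)
print(interaction_partners("DB:B|DB:C"))
```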
3. Usefulness of GO term intersections as a curation aid to assist annotation consistency? (Emily)
For terms that are intersections of other GO terms, a curator could be asked to consider additionally applying the more granular GO term to their annotation set if the gene product has already been co-annotated to all the GO terms described in the logical definition of the more granular term. For instance, if a gene product has been separately annotated to 'GO:0006915 apoptosis' and 'GO:0003179 heart valve morphogenesis', curation tools could suggest additionally annotating to 'GO:0003276 apoptosis involved in heart valve morphogenesis', perhaps via the IC method agreed from the discussion of point 1 above.
Would any curation tool be able to/be interested in developing such an aid for their curators? Should such suggestions be integrated into the web-based GAF submission tool?
Discussion: - Curators felt this would be a useful aid.
- We shouldn't be automatically integrating these inferences as they are probabilistic inferences.
- But offering these suggestions as a post-processing step will not work because curators should be alerted to these terms while they are curating/at the time of annotation and not after the GAF is made.
- PomBase and UniProt-GOA may be able to integrate the check in the annotation tool, however many other groups will need to make use of the suggestions that the web GAF submission tool returns.
- Another option is to use the rule engine's web services while making the annotation.
Action: Rama and Emily to keep the annotation group updated with progress.
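The suggestion aid discussed under point 3 could be sketched as a simple subset test. This is a minimal illustration, not a description of any existing curation tool: the `LOGICAL_DEFINITIONS` mapping and the function name are hypothetical, and the single entry simplifies the logical definition to the two terms named in the example above. The output is a list of suggestions for the curator to review, in line with the view that these inferences should not be integrated automatically.

```python
# Hypothetical mapping: granular term -> GO terms named in its logical definition.
# The single entry is taken from the example in these minutes.
LOGICAL_DEFINITIONS = {
    "GO:0003276": frozenset({"GO:0006915", "GO:0003179"}),
}


def suggest_granular_terms(annotated_go_ids, definitions=LOGICAL_DEFINITIONS):
    """Return granular terms whose defining terms are all already annotated
    to the gene product, excluding terms the gene product already has."""
    annotated = set(annotated_go_ids)
    return sorted(
        term
        for term, parts in definitions.items()
        if parts <= annotated and term not in annotated
    )


# A gene product annotated to both defining terms gets a suggestion.
suggestions = suggest_granular_terms({"GO:0006915", "GO:0003179"})
print(suggestions)
```

A tool using this approach would run the check at annotation time, so that the curator sees the suggestion while curating rather than after the GAF is made.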