Annotation Conf. Call, May 8, 2012

From GO Wiki

Agenda

1. Documentation for the IKR evidence code has been updated (Rama)

2. Further documentation for the annotation extension field (Emily)

3. GAF inferencing procedure (Chris)

4. Comment for cell fraction terms (Rachael)

GAF inferencing procedure

The GAF inference code runs weekly and generates annotations in BP and CC based on inter-ontology links.

  • Each inferred annotation uses the same evidence code and reference as the source annotation
  • The inferred annotation has assigned_by=GOC
  • Inferred annotations are not generated if they are strongly redundant within the sub-ontology. Here, strongly redundant means the gene product already has an annotation in the same sub-ontology, with an identical evidence code, to the same GO ID or one of its descendants.
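The rules above can be sketched roughly as follows. This is only an illustration, not the actual GAF inference code: the `Annotation` tuple, the `INTER_ONTOLOGY_LINKS` and `DESCENDANTS_OR_SELF` tables, and the term IDs (`mf1`, `bp1`, `bp1_child`) are all hypothetical stand-ins for a real ontology and GAF file.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    gene_product: str
    go_id: str
    aspect: str        # "MF", "BP" or "CC"
    evidence: str
    reference: str
    assigned_by: str


# Hypothetical toy ontology (assumption, not real GO data).
# mf1 part_of bp1, so an annotation to mf1 implies one to bp1.
INTER_ONTOLOGY_LINKS = {"mf1": [("bp1", "BP")]}
# For each term, the term itself plus its descendants.
DESCENDANTS_OR_SELF = {"bp1": {"bp1", "bp1_child"}}


def is_strongly_redundant(candidate, existing):
    """True if the gene product already has an annotation in the same
    sub-ontology, with the same evidence code, to the same GO ID or a
    descendant of it."""
    targets = DESCENDANTS_OR_SELF.get(candidate.go_id, {candidate.go_id})
    return any(
        a.gene_product == candidate.gene_product
        and a.aspect == candidate.aspect
        and a.evidence == candidate.evidence
        and a.go_id in targets
        for a in existing
    )


def infer(annotations):
    """Generate inferred annotations, skipping strongly redundant ones."""
    inferred = []
    for src in annotations:
        for target_id, aspect in INTER_ONTOLOGY_LINKS.get(src.go_id, []):
            # Same evidence code and reference as the source; assigned_by=GOC.
            candidate = Annotation(src.gene_product, target_id, aspect,
                                   src.evidence, src.reference, "GOC")
            if not is_strongly_redundant(candidate, annotations + inferred):
                inferred.append(candidate)
    return inferred
```

Note that this check treats every existing annotation alike, whoever assigned it.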

Unfortunately this causes problems for MOD pipelines. In order to remove stale annotations (avoiding the cascading delete problem), the MOD clears out all annotations assigned_by=GOC before reloading inferences.

However, this causes a strange cycling effect in the total number of annotations. Consider:

t1: ZFIN has an IDA annotation of gp1 to mf1; no BP annotation
t2: GAF inference script runs and generates an IDA annotation of gp1 to bp1, based on a mf1 part_of bp1 link
t3: ZFIN clears all GOC annotations, loads the inferred annotations and adds them to their set ** gp1 now has a BP annotation **
t4: GAF inference script runs again. The same inference is generated, but rejected, as it is redundant with the annotation produced at t2
t5: ZFIN clears all GOC annotations and loads the inferred annotations, which no longer include gp1 to bp1 ** gp1 has lost the BP annotation **

The cycle repeats, with ZFIN gaining and losing the annotation week after week.

Thanks to Doug for spotting this. We've confirmed this happens with both ZFIN and MGI.

The workaround is to change the redundancy checking step so that annotations with assigned_by=GOC are ignored. This means that at t4 in the above, the BP annotation will be generated, even though it is redundant with "a previous version of itself", so to speak. The MOD can safely clear and load. There is a danger that if the MOD doesn't clear assigned_by=GOC, then they will accumulate repeated annotations, but I think this danger is low.
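To make the cycle and the workaround concrete, here is a small toy simulation. It is a sketch under simplifying assumptions, not the real pipeline: annotations are plain tuples, the single `mf1 part_of bp1` link is hard-coded, the redundancy check only compares exact GO IDs (descendants are omitted), and the function names and the `ignore_goc` flag are made up for illustration.

```python
# Toy model: an annotation is (gene_product, go_id, evidence, assigned_by).
LINKS = {"mf1": "bp1"}  # hypothetical inter-ontology link: mf1 part_of bp1


def run_inference(gaf, ignore_goc):
    """Weekly inference run over the GAF submitted by the MOD."""
    inferred = []
    for gp, go_id, evidence, _assigned_by in gaf:
        target = LINKS.get(go_id)
        if target is None:
            continue
        # Redundancy check. With ignore_goc=True, annotations that were
        # themselves assigned_by=GOC do not count as redundant.
        redundant = any(
            g == gp and t == target and e == evidence
            and not (ignore_goc and by == "GOC")
            for g, t, e, by in gaf
        )
        if not redundant:
            inferred.append((gp, target, evidence, "GOC"))
    return inferred


def mod_reload(mod_set, inferred):
    """MOD clears all assigned_by=GOC annotations, then loads the new ones."""
    kept = [a for a in mod_set if a[3] != "GOC"]
    return kept + inferred


def simulate(weeks, ignore_goc):
    """Return the MOD's total annotation count after each weekly cycle."""
    mod_set = [("gp1", "mf1", "IDA", "ZFIN")]
    history = []
    for _ in range(weeks):
        inferred = run_inference(mod_set, ignore_goc)
        mod_set = mod_reload(mod_set, inferred)
        history.append(len(mod_set))
    return history


print(simulate(4, ignore_goc=False))  # [2, 1, 2, 1]  (annotation cycles)
print(simulate(4, ignore_goc=True))   # [2, 2, 2, 2]  (stable)
```

With the original check the count alternates between 2 and 1. With GOC annotations ignored during redundancy checking, it stays at 2, provided the MOD keeps clearing assigned_by=GOC before each load.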

Comment for cell fraction terms

Proposal: add a comment to all terms under cell fraction (GO:0000267). Def: A generic term for parts of cells prepared by disruptive biochemical techniques.

This was discussed previously (http://wiki.geneontology.org/index.php/RefGenome10Feb09_Phone_Conference#Fraction_terms) and the conclusion was that if cell fraction is the only experimental evidence available for that gene product, then it is OK to annotate to these terms. If other experimental evidence exists, then do not use these terms.

As there is no guidance to this effect on the terms themselves, curators who are not aware of the previous discussion are using these terms inappropriately.

Current comment: Note that this term refers to disrupted cells, and does not necessarily correspond to any specific structure(s) in an intact cell.

Proposed comment: Note that this term refers to disrupted cells, and does not necessarily correspond to any specific structure(s) in an intact cell. This term should only be used if no other experimental evidence exists for a natural subcellular location.