Responses to MGI from Karen

Since I won't be able to make comments in person, I'll make them here.

RE: Evidence Code Hierarchy

I think the big point that was made was that the specific hierarchy diagram we had on our web page for years, in which certain evidence codes were said to be more reliable than others, was simply not true. The old hierarchy diagram even ranked certain experimental codes above others. However, this has now been discussed at multiple GO meetings, and every time we have come to the conclusion that there is so much variability in quality within the same evidence code that it is neither reliable nor accurate to rank one experimental evidence code over another.

In the new documentation, we have put the evidence codes into groups based on type. However, I still don't think an experimental evidence code is necessarily always better than a sequence-based prediction. It seems to me that sometimes a sequence-based prediction for molecular function may be more informative than a mutant phenotype.

RE: Different rules for Evidence code usage based on database status

I don't think that we are making different usage rules for different groups. But we do need to acknowledge that some of the newer groups may not have the same level of funding and person power as the RGP groups.

QUOTE from MGI: "1. ISS: the object in the "with" field must point to an organism that an experiment backing the annotation has been made in. This is not a requirement for non-RGP databases."

This "rule" was agreed to at the 2006 Annotation Camp, without input from some groups, including TIGR. In discussions of the Evidence Code Committee, my take was that we agreed that this rule is appropriate when making annotations from pairwise BLAST comparisons, but is not appropriate for all types of sequence comparisons. My understanding is that we agreed to revert this "rule" at the last GO meeting on the basis of Michelle's arguments that ALL sequence-based analyses should use the ISS code.

RE: TAS and NAS

I never thought the rule was that the RGP groups would never use NAS or TAS, only that they would use them as little as possible. Thus, instead of making TAS annotations from a review or the introduction to a paper, we will track down the original citations and annotate from the primary sources instead.

Since the Evidence Code Committee (ECC) recognized that TAS and NAS are not as good as tracking down experimental evidence for the organism being annotated, the new documentation explicitly states the problems of depending on TAS and recommends not using it. However, the ECC also recognized that some groups will simply not have the resources to track down the original literature immediately every time.

RE: When to use IEA vs ISS

QUOTE from MGI: "The GO should restrict the ISS code situations where a comparison is made to a sequence from a source that has had an experiment done to back up the annotation. One can think of that situation as a computational method (the sequence alignment) plus experiment What do we do with situations, like sequence + structural analysis, such as programs that predict snoRNAs and tRNAs. These predict that a particular sequence may have certain functions, etc. because they look like something; the analysis appears to be solely based on computation. More like an IEA or RCA"

Having read the snoRNA papers multiple times, I'm not sure I'd agree that there is no experimental component to predicting these genes. In the analyses that have focused on developing a computational method to look for new ones, one of the steps is actually validating the method against the experimentally proven snoRNAs. With a large number of experimentally demonstrated snoRNAs having the same function, identifying that a candidate belongs to the box C/D family of snoRNAs is pretty good evidence that it has methylation guide activity. It's not good evidence that it will be involved in rRNA processing, since methylation guide snoRNAs target other types of RNA as well, but it is pretty good evidence for the function.

Going further, I think that even some of the domain- or motif-level predictions may have an experimental basis, when the relevant domain or motif is reliably associated with a specific function. I should add that I don't think this is necessarily true for all domains, but for some it seems reasonable, especially for making annotations that are relatively general.