Email summary on ND
Requirement for all 'unknown' annotations to use ND code
A question about the requirement that ND be the only evidence code allowed for 'unknown' annotations to the root nodes was raised within the Evidence Code Committee and was not resolved there. Discussion on the list so far is also mixed.
To me, the issue is that at the Jan GO meeting we agreed that evidence codes are ONLY about the type of evidence used to make the annotation, and not about anything else. However, by saying that people can use the ND evidence code as a way to find all the unknown annotations, we are encoding an extra meaning into it.
The email discussion of this issue is below.
Requirement that ND be the only allowable evidence code for unknown annotations
proposed new rule for ND:
Even if an author states in a paper that there is no data available or nothing is known about the gene product in a particular GO aspect, annotation to the corresponding root node should be made with ND evidence code citing either the annotating group's internal reference or the GOC's reference on use of the ND evidence code, not a specific paper.
comment in red in draft document:
I realize that we agreed to the above statement at the last GOC meeting, but...
The more I think about it, the more I'm uncomfortable with the decision that we made that unknown annotations can only be made with ND, especially since the reason stated to do so has nothing to do with evidence, but is to help people better identify the unknown annotations.
I think this is encoding information into the evidence code that is about something other than the evidence itself. I think this is poor practice, especially when we spent so much time at the Jan GO meeting discussing that evidence codes would be JUST a statement of the method by which the annotation was made.
Jane Lomax (15 Jun 2007)
I was under the impression that we'd agreed 2. at the Jan meeting i.e. ND is now the only allowable evidence code for unknown annotations?
Midori Harris (15 Jun 2007)
I understand, and would add that it also loses the information that at the time of writing, the authors -- who are presumably pretty well informed about the genes/gene products they study -- are aware of no relevant data. (Though this concern is not as grave as that of overloading an evidence code.)
Valerie Wood (22 Jun 2007)
I'm not so sure because:
1. If authors have specifically asserted that there is no information, this is usually a statement which is made based on looking at the database (for example if the author is dealing with a gene set).
2. Papers are frequently published concurrently, and it is clear that the authors have no knowledge of the parallel papers, so without a curator check an author statement is not necessarily a good indication that there is no functional data.
3. I'm pretty sure that when the unknowns disappeared, we advised software developers that they could retrieve the unknown annotations using the ND evidence code.....
Although I agree it seems bad practice to put info in the evidence code other than the evidence itself, I think it's more important that there is a very clear way to identify 'unknown' annotations.
It seems like not many software tools have caught up with the previous change to unknowns (for example, I haven't yet managed to find a way to look at GO term enrichment which recognises the unknown annotations... does anybody know of one?)
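To make the trade-off in this thread concrete, here is a minimal sketch of the two ways software could identify 'unknown' annotations in a tab-delimited gene association file: by the ND evidence code, or by annotation to one of the three root terms. It assumes the usual column order (object ID in column 2, GO ID in column 5, reference in column 6, evidence code in column 7). The database name, gene IDs, references, and function names are hypothetical and for illustration only, and the NAS-coded root annotation is an invented case showing what would happen if a code other than ND were allowed.

```python
# Root terms of the three GO aspects:
# biological_process, molecular_function, cellular_component.
ROOT_TERMS = {"GO:0008150", "GO:0003674", "GO:0005575"}


def parse_line(line):
    """Return (object_id, go_id, reference, evidence), or None for
    comment lines, blank lines, and lines with too few columns."""
    if line.startswith("!") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 7:
        return None
    return cols[1], cols[4], cols[5], cols[6]


def unknown_by_evidence(lines):
    """Object IDs of annotations found by filtering on the ND evidence code."""
    records = (parse_line(l) for l in lines)
    return [r[0] for r in records if r and r[3] == "ND"]


def unknown_by_root(lines):
    """Object IDs of annotations found by filtering on the root-node GO IDs."""
    records = (parse_line(l) for l in lines)
    return [r[0] for r in records if r and r[1] in ROOT_TERMS]


# Hypothetical sample data (not real annotations).
SAMPLE = [
    "!hypothetical example file",
    "ExampleDB\tG0001\tgeneA\t\tGO:0008150\tExampleDB_REF:0000001\tND\t\tP",
    "ExampleDB\tG0002\tgeneB\t\tGO:0003674\tPMID:0000000\tNAS\t\tF",
    "ExampleDB\tG0003\tgeneC\t\tGO:0006412\tPMID:0000000\tIDA\t\tP",
]

if __name__ == "__main__":
    print("by ND code:  ", unknown_by_evidence(SAMPLE))
    print("by root term:", unknown_by_root(SAMPLE))
```

If ND is the only code permitted on root-node annotations, the two filters return the same set. If other codes are permitted, a tool that filters on ND alone would miss geneB in this sample, which is the practical concern raised in point 3 above.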