Annotation consistency: HTP
(anyone can add themselves)
- Emily Dimmer, Stacia Engel, Val Wood, Ruth Lovering, Varsha Khodiyar, Susan Tweedie
Do we need to create a 'HTP' tag? Why would we need that:
- allow users to distinguish between large scale and gene-by-gene experiments
- would give the possibility to exclude this data from some analyses if required
- users running HTP experiments who validate their results against GO need to be able to remove HTP data produced by the same methods as theirs
2. Proposed solution(s)
3. Comments/counter arguments
[09-09-2008] I am very much for the creation of an HTP tag. Investigators have asked specifically how we are dealing with such results, and some would be interested in having the option of removing them from the annotation set they use to validate their own methods.
While I am not saying that all HTP methods necessarily produce less reliable results (as with everything, there is a range of quality), the huge number of annotations that such investigations can produce does have the potential to yield more erroneous data than small-scale experiments, particularly as, unlike in a small focused experiment, investigators do not have the same time or opportunity to evaluate all their results in the context of other knowledge.
In addition, for species or gene sets where we might not have much other data, we could end up with a set of manual annotations that mainly originates from one or two investigations, which might bias the resulting GO annotation set. As we could get so much data from these sources, I'd like to be extra careful; once it's in and tagged, we can then analyse the general confidence level. I get the feeling that experimentalists expect us to be careful about how we present information from such large datasets.
[09-09-2008 @ the conference call] Chris recommends adding those to ECO if we plan on using them.
[10-14-2008] In general, I am opposed to the creation of an HTP tag. This is because the volume of the experiment does not necessarily say anything about the quality of the data. In addition, the source of the data is already stated by the reference, and anyone who wants to remove that data set from their subsequent analysis can easily do so by parsing out by the source reference. Also, just because it is high volume does not mean the same algorithm or method is used for the data analysis, so removing HTP from subsequent analysis may remove data generated by a different method than the one being evaluated.
To summarize, the source reference gives the user the ability to include or exclude any data set they choose. The HTP tag does not indicate quality or type of data, only volume, and volume does not attest to quality or value.
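As a rough illustration of the "parse out by the source reference" argument above, here is a minimal Python sketch. It assumes annotations are in a tab-separated GAF file, where the sixth column holds the DB:Reference value (pipe-separated when there are several). The function name, the sample rows, the PMIDs and the GO IDs are made up for the example and are not real annotations.

```python
# Sketch only: drop every annotation that cites an unwanted reference.
# Assumes GAF layout, where index 5 (the sixth column) is DB:Reference.
REFERENCE_COLUMN = 5


def exclude_references(gaf_lines, excluded_refs):
    """Yield annotation rows (as lists of fields) not citing any excluded reference."""
    excluded = set(excluded_refs)
    for line in gaf_lines:
        if line.startswith("!"):  # GAF header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= REFERENCE_COLUMN:
            continue  # malformed or empty line
        refs = fields[REFERENCE_COLUMN].split("|")
        if excluded.isdisjoint(refs):
            yield fields


# Hypothetical sample rows (truncated to the first seven GAF columns).
sample = [
    "!gaf-version: 2.0",
    "\t".join(["DB", "G1", "geneA", "", "GO:0000001", "PMID:111", "IDA"]),
    "\t".join(["DB", "G2", "geneB", "", "GO:0000002", "PMID:222", "IDA"]),
    "\t".join(["DB", "G3", "geneC", "", "GO:0000003", "PMID:222|DB_REF:9", "IPI"]),
]

kept = list(exclude_references(sample, ["PMID:222"]))
print([row[1] for row in kept])
```

This only works if the user already knows which references are large-scale studies, which is the point of disagreement between the two positions.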
[10-14-2008] Maybe there should be an experimental code that could be attached to any annotation. It would indicate what type of experiment was behind the annotation. Researchers could sort the annotations for data analysis according to their specific interests. If they didn't want to include microarray data, then they could eliminate all annotations with a microarray tag (possibly "mic"). That would give more flexibility than just "HTP" or not.
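To make the experiment-type suggestion above concrete, here is a small Python sketch of how such a tag might be used. Everything in it is hypothetical: no such field exists in the annotation format, "mic" is the only tag named in the comment, and the other tags ("y2h", "single"), the record layout, the PMIDs and the GO IDs are invented placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str
    go_id: str
    reference: str
    experiment_tag: str  # hypothetical field, not part of any existing format


def without_tags(annotations, unwanted_tags):
    """Return the annotations whose experiment tag is not in unwanted_tags."""
    unwanted = set(unwanted_tags)
    return [a for a in annotations if a.experiment_tag not in unwanted]


# Made-up records for illustration only.
annotations = [
    Annotation("geneA", "GO:0000001", "PMID:111", "mic"),     # microarray
    Annotation("geneB", "GO:0000002", "PMID:222", "y2h"),     # invented tag
    Annotation("geneC", "GO:0000003", "PMID:333", "single"),  # invented tag
]

# A user validating a microarray method drops only microarray-derived data,
# keeping other large-scale results that a blanket HTP flag would also remove.
no_microarray = without_tags(annotations, {"mic"})
print([a.gene for a in no_microarray])
```

Compared with a single HTP/not-HTP flag, the user chooses which experiment types to exclude, at the cost of curators having to assign and maintain a controlled list of tags.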
- Examples where an HTP paper provides annotations that seem valid
- Examples where an HTP paper does not provide data that seems reliable enough to make an annotation