Email sum up on RCA

From GO Wiki
Revision as of 06:57, 20 September 2007 by Maria

Scope of the RCA evidence code


Here is my analysis of and recommendations for the future of the RCA evidence code:

Having reviewed six papers of the type that originally prompted SGD to request the RCA evidence code, I conclude that all of the methods described in these papers include analysis of experimental data, e.g. expression data, two-hybrid data, mass spec proteomic data, etc. Some also include sequence-based data, but it is never the entire basis of the analysis. Two of the analyses (Troyanskaya et al. and Wade et al.) combined expression data with promoter sequence data, a type of sequence data not typically considered in analyses appropriate for the ISS code. Two other analyses (Baxter et al. and Alves et al.) combined structural analysis either with experimental results or with a mathematical model designed to test which mechanisms could reproduce existing published experimental results. Some RCA analyses also utilize existing functional annotations for characterized genes (Gat-Viks et al.).

To summarize, all of these analyses combined multiple types of data, generally including experimental data such as expression data or protein-protein interaction data. Some include sequence data (in this set, either promoter sequence information or structural information), but none is based solely on sequence-based information.

Analyses based purely on sequence-based data should use the ISS evidence code (or the IEA code if they are not reviewed by a curator). This covers sequence similarity with experimentally characterized gene products, as determined by pairwise or multiple alignment; prediction methods for non-coding RNA genes; recognized functional domains, as determined by tools such as InterPro, Pfam, SMART, etc.; predicted protein features, e.g., transmembrane regions, signal sequence, etc.; and structural similarity with experimentally characterized gene products, as determined by crystallography, nuclear magnetic resonance, or computational prediction. The documentation does not currently list mapping files such as InterPro2GO, but I would count these as sequence-only data, since the underlying analysis rests entirely on the sequence of the gene product and the hits returned by various sequence analysis methods.

Because RCA is a curator-reviewed code, annotations made with it must be reviewed or assigned by a curator.
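
The rule proposed above can be sketched as a small decision function. This is only an illustration of the recommendation, not part of the GO documentation or of any existing tool: the function name, the data-type labels, and the choice to return None for cases this summary does not address (for example, an integrated analysis that no curator has reviewed) are all assumptions made for the sketch.

```python
from typing import Optional, Set

# Hypothetical labels for the sequence-based data types listed above
# (InterPro2GO-style mappings included, per the recommendation).
SEQUENCE_BASED = frozenset({
    "sequence_similarity",
    "ncrna_prediction",
    "functional_domain",
    "predicted_protein_feature",
    "structural_similarity",
    "interpro2go_mapping",
})


def suggest_evidence_code(data_types: Set[str], curator_reviewed: bool) -> Optional[str]:
    """Return "ISS", "IEA", or "RCA", or None where this summary states no rule."""
    if not data_types:
        return None
    # Sequence-only analyses: ISS when a curator reviewed them, otherwise IEA.
    if data_types <= SEQUENCE_BASED:
        return "ISS" if curator_reviewed else "IEA"
    # Integrated analyses: several data types, at least one of them not
    # sequence-based, and reviewed or assigned by a curator.
    if len(data_types) > 1 and curator_reviewed:
        return "RCA"
    # Anything else is left undecided here (an assumption of this sketch).
    return None
```

For instance, an analysis combining "expression" and "promoter_sequence" data (again, hypothetical labels) that a curator has reviewed would come out as RCA, while a curator-reviewed analysis using only "sequence_similarity" and "functional_domain" would come out as ISS.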

The documentation currently lists 'Text-based computation (e.g. text mining)' as acceptable for this evidence code. In the absence of specific examples of how this might be applied, I would suggest removing the mention of 'Text-based computation' until we have an actual example or two to examine and can judge whether it fits this evidence code.

Accepting these recommendations would basically return the RCA code to its original intent. It would also be consistent with two earlier decisions: the recommendation of the Evidence Code Committee (ECC) to overturn the 2006 Annotation Camp's recommendation to use RCA for sequence similarity comparisons where you could not put an experimentally characterized ortholog into the with column, and the January 2007 GOC meeting decision that all methods based only on sequence-based information should use the ISS code.

The GOC may not wish to consider renaming the evidence code, but having reviewed this set of papers, I think the phrase "Integrated Computational Analysis" would be a more descriptive name and more consistent with how authors of these types of methods describe them (the red highlighting in the sample papers page, URL below, shows where the authors used that word). I'm not sure this is sufficient to make clear the distinction between these methods and sequence-only methods, but it is better than "Reviewed Computational Analysis". In addition, right now the RCA documentation would exclude an analysis of this type if it were performed internally by a database group and not published. Thus, if the GOC is amenable to the idea of changing the name of the evidence code, I would suggest that we call it "Integrated Computational Analysis" with the abbreviation ICA.


Here are links to supplemental information regarding this evidence code:

Examples of the types of analyses the RCA code was intended to cover:

 http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCA-ExamplePapers.html

History of the RCA code:

 http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAhistory.html

Summary of controversy over RCA vs ISS in Evidence Code Committee:

 http://genetics.stanford.edu/~kchris/go/evCodeIssues/RCAvsISScontroversy.html

Proposed draft of new documentation for this code:

 http://www-dev.yeastgenome.org/draftGO/go/www/GO.evidence.new.shtml#ica
 (note that original RCA doc is still present for comparison)