Summary of ISS with (blank) proposal
Putting method/program names into the with field for ISS
I've reviewed several papers where ISS is the appropriate code, but for which only a method could be placed into the with field. Thus, I have some comments on how we might want to do this. I'll start with a little background.
At the last GO meeting, we agreed to "Always use a WITH column for IEA and ISS, containing a program name if necessary. For example, make a ref to tRNAscan." However, we did not work out how to implement doing this.
As phrased in the minutes, it sounds like the idea is just to put the name of the method in the with column. If that's all that is required, then it's fairly simple to find an appropriate text string from a paper to put in the with column. However, I'm kind of assuming that we don't want to allow uncontrolled text strings in the with column mixed in with things of the format namespace:ID.
Currently, to put something in the with column, it must have a namespace as well as an ID, e.g. Swiss-Prot:P51587. For program names or methods, there are a couple of problems with trying to put them into this type of format. One is that some of the methods to which researchers refer are not given an official name. The second, which applies to all the papers I've read so far, is that none of them have a namespace.
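The namespace:ID requirement can be expressed as a simple check that separates controlled values from bare program names. This is a minimal sketch: it assumes a namespace starts with a letter and contains only letters, digits, '_', '.' or '-', and that the ID is any non-empty run of non-whitespace characters. These character rules are an illustrative assumption, not the official GO specification, and `parse_with_value` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Assumed pattern (not the official GO rule): a namespace such as
# "Swiss-Prot" or "PMID", a colon, then a non-empty ID with no spaces.
WITH_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9_.-]*):(\S+)$")


def parse_with_value(value: str) -> Optional[Tuple[str, str]]:
    """Split a with-column value into (namespace, ID), or return None
    if it is an uncontrolled text string such as a bare program name."""
    match = WITH_PATTERN.match(value.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_with_value("Swiss-Prot:P51587"))  # ('Swiss-Prot', 'P51587')
print(parse_with_value("tRNAscan"))           # None: no namespace
```

A bare program name such as tRNAscan fails the check precisely because it has no namespace, which is the second problem described above.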
If we need to format these in a way that is compatible with the namespace:ID format, then GO could generate a 'database' of collected methods. An entry in the GO.xrf_abbs file like the one below could define a namespace for such a collection.
abbreviation: GO_CM
database: Gene Ontology Database collected methods
object: Accession (for collected method)
example_id: GO_CM:0000001
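One way to sanity-check a new xrf_abbs-style entry like this is to parse it and confirm that its example_id actually uses the declared abbreviation as its namespace. This is a minimal sketch assuming one 'tag: value' pair per line; the helper names are hypothetical, and this is not GO's own parser for GO.xrf_abbs.

```python
from typing import Dict


def parse_stanza(text: str) -> Dict[str, str]:
    """Parse 'tag: value' lines into a dict, splitting on the first colon
    so that values like 'GO_CM:0000001' keep their own colon."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line has no 'tag:' prefix: {line!r}")
        fields[tag.strip()] = value.strip()
    return fields


def example_id_matches(fields: Dict[str, str]) -> bool:
    """Check that example_id uses the entry's own abbreviation as its
    namespace and has a non-empty local ID."""
    namespace, _, local_id = fields["example_id"].partition(":")
    return namespace == fields["abbreviation"] and local_id != ""


GO_CM_ENTRY = """
abbreviation: GO_CM
database: Gene Ontology Database collected methods
object: Accession (for collected method)
example_id: GO_CM:0000001
"""

fields = parse_stanza(GO_CM_ENTRY)
print(fields["database"])          # Gene Ontology Database collected methods
print(example_id_matches(fields))  # True
```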
Then for the second part, we'd have to start a collection of these various methods, probably just a file somewhat like the GO.xrf_abbs file. For this, there are a couple issues to deal with:
1) The authors of methods don't always give them a clear name.
2) There isn't always a single source reference. For programmatic methods, there is often a single source reference. However, for the consensus features of either box C/D or box H/ACA snoRNAs, I wouldn't be comfortable designating a single reference as the source. In these cases, I'd be happier if we could associate a number of relevant refs with the 'method'. In other cases, an algorithm is mentioned by name, but no reference is cited.
However, with those issues in mind, perhaps collecting this information would work.
- accession: accession ID given by GO
- method name: the name given to a program by the authors, when available, or a descriptive name based on the paper
- developed in reference: the ID, e.g. PMID:xxxxx, of the reference describing the development of a method. This would be filled in when applicable but would not be required; it can be set to 'Not Applicable' for cases like 'box C/D snoRNA consensus', where no specific program was developed. I don't know how we want to deal with cases like 'TMpredict', where the paper cited a reference that appears irrelevant, or 'Kyte-Doolittle algorithm', where I didn't see a citation for the algorithm.
- other references: Useful for cases like 'box C/D snoRNA consensus' where there isn't a specific program that was developed, but where you can cite 1 or more references which describe what the consensus is.
- method classification: maybe this tag isn't necessary, but I thought it might be useful, particularly if we ever store this information in a database where this field could be searched.
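The fields above map naturally onto a small record type. The sketch below is one possible shape, not a settled format: it assumes seven-digit zero-padded accessions (as in GO_CM:0000001), 'Not Applicable' for consensus-type methods, '?' when no source reference could be found, and '; ' between multiple references, mirroring the entries listed below. `CollectedMethod`, `make_accession` and `to_stanza` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NOT_APPLICABLE = "Not Applicable"
UNKNOWN = "?"  # used when no source reference could be found


def make_accession(number: int) -> str:
    """Hypothetical helper: zero-pad to seven digits, as in GO_CM:0000001."""
    if not 1 <= number <= 9_999_999:
        raise ValueError(f"accession number out of range: {number}")
    return f"GO_CM:{number:07d}"


@dataclass
class CollectedMethod:
    accession: str
    method_name: str
    # A PMID, NOT_APPLICABLE for consensus-type methods, or UNKNOWN.
    developed_in_reference: str = UNKNOWN
    # Zero or more supporting references, e.g. for a consensus.
    other_references: List[str] = field(default_factory=list)
    method_classification: Optional[str] = None

    def to_stanza(self) -> str:
        """Render the record as 'tag: value' lines, omitting empty optional fields."""
        lines = [
            f"accession: {self.accession}",
            f"method name: {self.method_name}",
            f"developed in reference: {self.developed_in_reference}",
        ]
        if self.other_references:
            lines.append("other references: " + "; ".join(self.other_references))
        if self.method_classification:
            lines.append(f"method classification: {self.method_classification}")
        return "\n".join(lines)


consensus = CollectedMethod(
    accession=make_accession(2),
    method_name="box C/D snoRNA consensus",
    developed_in_reference=NOT_APPLICABLE,
    other_references=["PMID:8674114", "PMID:16484372"],
    method_classification="box C/D snoRNA gene prediction",
)
print(consensus.to_stanza())
```

Keeping `other_references` as a list, rather than a single field, is what lets a consensus-type method point at several papers without singling one out as the source.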
Below is what I would fill in for each field for the references listed at: http://genetics.stanford.edu/~kchris/go/evCodeIssues/withForISS-ExamplePapers.html
The comments in parentheses are just comments to correlate the info below with the Example papers, and would not be included in the proposed file.
accession: GO_CM:0000001
method name: box C/D snoRNA probabilistic model
developed in reference: PMID:10024243
method classification: box C/D snoRNA gene prediction
(would be used for example #1)

accession: GO_CM:0000002
method name: box C/D snoRNA consensus
developed in reference: Not Applicable
other references: PMID:8674114; PMID:16484372
method classification: box C/D snoRNA gene prediction
(would be used for example #s 2 & 3)

accession: GO_CM:0000003
method name: snoGPS
developed in reference: PMID:15306656
method classification: box H/ACA snoRNA gene prediction
(would be used for example #4)

accession: GO_CM:0000004
method name: box H/ACA snoRNA consensus
developed in reference: Not Applicable
other references: PMID:12007400
method classification: box H/ACA snoRNA gene prediction
(would be used for example #5)

accession: GO_CM:0000005
method name: TMpredict
developed in reference: ? (paper #6 cites a reference, but it seems incorrect; I did not find an appropriate citation via PubMed)
method classification: protein hydrophobicity
(would be used for example #6)

accession: GO_CM:0000006
method name: Kyte-Doolittle algorithm
developed in reference: ? (paper #7 does not cite a reference)
method classification: protein hydrophobicity
(would be used for example #7)

accession: GO_CM:0000007
method name: tRNAscan
developed in reference: PMID:1870126
other references: PMID:
method classification: tRNA gene prediction
(The Lowe & Eddy tRNAscan-SE ref referred to this program as "tRNAscan 1.3 by Fichant and Burks (12)" and cited this paper. However, this paper doesn't appear to name the algorithm at all.)

accession: GO_CM:0000008
method name: Pavesi et al. tRNA prediction algorithm
developed in reference: PMID:8165140
method classification: tRNA gene prediction
(They don't name their algorithm, so this name is derived from what they say, in conjunction with how it was referred to in the Lowe & Eddy paper on tRNAscan-SE.)

accession: GO_CM:0000009
method name: tRNAscan-SE
developed in reference: PMID:9023104
method classification: tRNA gene prediction
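Once such a file exists, annotation checks could resolve a GO_CM accession in the with column back to its entry, and the method classification field could support the kind of search mentioned earlier. A minimal sketch, assuming blank-line-separated records with one 'tag: value' line per field and the parenthetical comments removed (as noted, they would not be in the proposed file); the function names and the two-record sample are illustrative only.

```python
import re
from typing import Dict, List, Optional

Record = Dict[str, str]


def parse_methods(text: str) -> Dict[str, Record]:
    """Split a collected-methods file into blank-line-separated records
    and index them by accession, rejecting duplicates."""
    index: Dict[str, Record] = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        record: Record = {}
        for line in block.splitlines():
            tag, sep, value = line.partition(":")  # split on the first colon only
            if sep:
                record[tag.strip()] = value.strip()
        accession = record.get("accession")
        if not accession:
            raise ValueError(f"record without accession: {block!r}")
        if accession in index:
            raise ValueError(f"duplicate accession: {accession}")
        index[accession] = record
    return index


def resolve_with(index: Dict[str, Record], with_value: str) -> Optional[str]:
    """Map a with-column value such as 'GO_CM:0000009' to its method name."""
    record = index.get(with_value)
    return record.get("method name") if record else None


def by_classification(index: Dict[str, Record], classification: str) -> List[str]:
    """List method names sharing a 'method classification' value."""
    return [r["method name"] for r in index.values()
            if r.get("method classification") == classification]


SAMPLE = """
accession: GO_CM:0000007
method name: tRNAscan
developed in reference: PMID:1870126
method classification: tRNA gene prediction

accession: GO_CM:0000009
method name: tRNAscan-SE
developed in reference: PMID:9023104
method classification: tRNA gene prediction
"""

index = parse_methods(SAMPLE)
print(resolve_with(index, "GO_CM:0000009"))              # tRNAscan-SE
print(by_classification(index, "tRNA gene prediction"))  # ['tRNAscan', 'tRNAscan-SE']
```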