Summary of ISS with (blank) proposal

From GO Wiki

Putting method/program names into the with field for ISS


I've reviewed several papers where ISS is the appropriate code, but for which only a method could be placed into the with field. Thus, I have some comments on how we might want to do this. I'll start with a little background.

At the last GO meeting, we agreed to "Always use a WITH column for IEA and ISS, containing a program name if necessary. For example, make a ref to tRNAscan." However, we did not work out how to implement this.

As phrased in the minutes, it sounds like the idea is just to put the name of the method in the with column. If that's all that is required, then it's fairly simple to find an appropriate text string from a paper to put in the with column. However, I'm assuming that we don't want to allow uncontrolled text strings in the with column mixed in with entries of the format namespace:ID.

Currently, to put something in the with column, it must have a namespace as well as an ID, e.g. Swiss-Prot:P51587. For program names or methods, there are a couple of problems with putting them into this type of format. One is that some of the methods to which researchers refer are not given an official name. The second, which applies to all the methods in the papers I've read so far, is that none of them has a namespace.
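To make the namespace:ID requirement concrete, here is a minimal sketch of a check that tells a well-formed with value apart from a bare method name. The exact pattern (no whitespace, colon, or pipe in the namespace; a non-empty ID without whitespace) is my assumption, not an agreed GO rule, and the function name split_with_value is made up for illustration.

```python
import re

# Assumed shape of a with-column entry: namespace, a colon, then the ID.
# This pattern is an illustration, not an official GO specification.
WITH_PATTERN = re.compile(r"^(?P<namespace>[^\s:|]+):(?P<id>\S+)$")


def split_with_value(value):
    """Return (namespace, id) if value looks like namespace:ID, else None."""
    match = WITH_PATTERN.match(value)
    if match is None:
        return None
    return match.group("namespace"), match.group("id")


if __name__ == "__main__":
    for candidate in ("Swiss-Prot:P51587", "tRNAscan", "box C/D snoRNA consensus"):
        print(candidate, "->", split_with_value(candidate))
```

A bare program name such as tRNAscan fails the check, which is exactly the problem described above.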

If we need to format these in a way that is compatible with the namespace:ID format, then GO could generate a 'database' of collected methods. An entry in the GO.xrf_abbs file like the one below could define a namespace for such a collection.

 abbreviation: GO_CM
 database: Gene Ontology Database collected methods
 object: Accession (for collected method)
 example_id: GO_CM:0000001
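
If accessions in such a namespace were handed out sequentially, generating the next one could be sketched as below. The seven-digit zero padding is inferred from the example_id above; sequential assignment and the function names are assumptions for illustration only.

```python
import re

# Assumed accession shape, inferred from the example_id GO_CM:0000001.
ACCESSION_RE = re.compile(r"^GO_CM:(\d{7})$")


def format_accession(number):
    """Format a positive integer as a GO_CM accession."""
    if not 0 < number <= 9_999_999:
        raise ValueError(f"accession number out of range: {number}")
    return f"GO_CM:{number:07d}"


def next_accession(existing):
    """Return the accession following the highest one already in use."""
    highest = 0
    for accession in existing:
        match = ACCESSION_RE.match(accession)
        if match is None:
            raise ValueError(f"not a GO_CM accession: {accession!r}")
        highest = max(highest, int(match.group(1)))
    return format_accession(highest + 1)
```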

Then for the second part, we'd have to start a collection of these various methods, probably just a file somewhat like the GO.xrf_abbs file. For this, there are a couple of issues to deal with:

1) The authors of methods don't always give them a clear name.

2) There isn't always a single source reference. For programmatic methods, there is often a single source reference. However, for the consensus features for either box C/D or box H/ACA snoRNAs, I wouldn't be comfortable designating a single reference as the source. In these cases, I'd be happier if we could associate a number of relevant refs with the 'method'. In other cases, an algorithm is mentioned by name, but no reference is cited.

However, with those issues in mind, perhaps collecting the following information would work:

  • accession: accession ID given by GO
  • method name: the name given to a program by the authors, when available, or a descriptive name based on the paper
  • developed in reference: the ID, e.g. PMID:xxxxx, for the reference describing the development of a method. This would be filled in when applicable but would not be required, and could be set to 'Not Applicable' for cases like 'box C/D snoRNA consensus' where there isn't a specific program that was developed. I don't know how we want to deal with cases like 'TMpredict', where the paper cited a reference that appears irrelevant, or 'Kyte-Doolittle algorithm', where I didn't see a citation for the algorithm.
  • other references: Useful for cases like 'box C/D snoRNA consensus' where there isn't a specific program that was developed, but where you can cite 1 or more references which describe what the consensus is.
  • method classification: maybe this tag isn't necessary, but I thought it might be useful, particularly if we ever get to a situation where we have this in a database where you can search on this field.
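
As a rough sketch of how one record with these fields might be held in a program, the class below mirrors the tags listed above. The class name CollectedMethod, the use of None to stand for 'Not Applicable', and the all_references helper are my assumptions, not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollectedMethod:
    """One entry in the proposed collected-methods file (hypothetical model)."""

    accession: str
    method_name: str
    # None is used here to represent 'Not Applicable' or an unknown reference.
    developed_in_reference: Optional[str] = None
    other_references: List[str] = field(default_factory=list)
    method_classification: Optional[str] = None

    def all_references(self):
        """Return the developing reference (if any) followed by the others."""
        refs = []
        if self.developed_in_reference:
            refs.append(self.developed_in_reference)
        refs.extend(self.other_references)
        return refs


if __name__ == "__main__":
    consensus = CollectedMethod(
        accession="GO_CM:0000002",
        method_name="box C/D snoRNA consensus",
        other_references=["PMID:8674114", "PMID:16484372"],
        method_classification="box C/D snoRNA gene prediction",
    )
    print(consensus.all_references())
```

Keeping 'developed in reference' optional and 'other references' as a list covers both the single-source programs and the consensus-style methods that have several relevant refs.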

Below is what I would fill in for each field for the references listed at: http://genetics.stanford.edu/~kchris/go/evCodeIssues/withForISS-ExamplePapers.html

The comments in parentheses are just comments to correlate the info below with the Example papers, and would not be included in the proposed file.

accession: GO_CM:0000001
method name: box C/D snoRNA probabilistic model
developed in reference: PMID:10024243
method classification: box C/D snoRNA gene prediction
        (would be used for example #1)

accession: GO_CM:0000002
method name: box C/D snoRNA consensus
developed in reference: Not Applicable
other references: PMID:8674114; PMID:16484372
method classification: box C/D snoRNA gene prediction
        (would be used for example #s 2 & 3)

accession: GO_CM:0000003
method name: snoGPS
developed in reference: PMID:15306656
method classification: box H/ACA snoRNA gene prediction
        (would be used for example #4)

accession: GO_CM:0000004
method name: box H/ACA snoRNA consensus
developed in reference: Not Applicable
other references: PMID:12007400
method classification: box H/ACA snoRNA gene prediction
        (would be used for example #5)

accession: GO_CM:0000005
method name: TMpredict
developed in reference: ?
        (paper #6 cites a reference, but it seems incorrect;
        I did not find an appropriate citation via PubMed)
method classification: protein hydrophobicity
        (would be used for example #6)

accession: GO_CM:0000006
method name: Kyte-Doolittle algorithm
developed in reference: ? (paper #7 does not cite a reference)
method classification: protein hydrophobicity
        (would be used for example #7)

accession: GO_CM:0000007
method name: tRNAscan
developed in reference: PMID:1870126
other references: PMID:
method classification: tRNA gene prediction
        (The Lowe & Eddy tRNAscan-SE ref referred to this program as
        "tRNAscan 1.3 by Fichant and Burks (12)" and cited this
        paper. However, this paper doesn't appear to name the
        algorithm at all.)

accession: GO_CM:0000008
method name: Pavesi et al. tRNA prediction algorithm
developed in reference: PMID:8165140
method classification: tRNA gene prediction
        (they don't name their algorithm, so this name is
        derived from what they say, in conjunction with how
        it was referred to in the Lowe & Eddy paper on
        tRNAscan-SE.)

accession: GO_CM:0000009
method name: tRNAscan-SE
developed in reference: PMID:9023104
method classification: tRNA gene prediction
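
If the collection were kept as a file of blank-line-separated 'tag: value' stanzas like the ones above (without the parenthetical comments), reading it and searching on the classification field could be sketched as follows. The file layout and the function names are assumptions; the two sample records are copied from the list above.

```python
SAMPLE = """\
accession: GO_CM:0000003
method name: snoGPS
developed in reference: PMID:15306656
method classification: box H/ACA snoRNA gene prediction

accession: GO_CM:0000009
method name: tRNAscan-SE
developed in reference: PMID:9023104
method classification: tRNA gene prediction
"""


def parse_stanzas(text):
    """Split text into records, one dict of tag -> value per stanza."""
    records = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line closes the current stanza, if there is one.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so values like PMID:9023104 survive.
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'tag: value', got {line!r}")
        current[tag.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def names_by_classification(records, classification):
    """Return the method names whose classification matches exactly."""
    return [
        record["method name"]
        for record in records
        if record.get("method classification") == classification
    ]


if __name__ == "__main__":
    methods = parse_stanzas(SAMPLE)
    print(names_by_classification(methods, "tRNA gene prediction"))
```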