Inferred from Sequence or structural Similarity (ISS)

Overview

  • The ISS evidence code or one of its sub-categories should be used whenever a manual, sequence-based analysis forms the basis for an annotation. If the annotation has not been reviewed manually, the correct evidence code is IEA, even if all of the evidence supporting the annotation is sequence based. ISS should be used if a combination of sequence-based tools or methods is used. If only one particular type of sequence-based evidence is used, then one of the more specific sub-categories of ISS may be more appropriate for the annotation. There are three sub-categories of ISS: ISA, ISM, and ISO.
  • ISS can also be used for structural similarity with experimentally characterized gene products, as determined by crystallography, nuclear magnetic resonance, or computational prediction. In practice, ISS annotations are rarely, if ever, made purely from structural information. When included, structural information is generally at the level of secondary structure modeling or prediction derived from sequence information. Secondary structure information is particularly useful as one component of RNA gene predictions and in some domain models.
  • Population of the with field is important when using the ISS code or one of its sub-categories. The entry in the with field is the accession of the object or model to which your query has similarity. An entry in the with field is mandatory when using the ISS code or one of its sub-categories if the annotation is based on an alignment with other proteins (e.g. UniProtKB) or on a sequence model contained in a database (e.g. Pfam, InterPro). If the annotation is based on similarity to another gene product, then there must be experimental, or IC, evidence for the original annotation. If the annotation is based on a method such as tRNASCAN, which cannot be referred to with an accession number, the with field may be left empty. Entries in the with field should be in the format database:accession, where database is one of the abbreviations listed in the GO database abbreviations collection and accession is the accession number of the object the sequence similarity is with. Multiple entries in the with field should be separated by pipes.
  • If the searches and evaluation of the sequence-based data are described in a published paper, the ID of the paper (either one assigned by PubMed or one assigned by another database such as a Model Organism Database) should be placed in the reference column. However, if the group doing the GO annotation performed the searches and evaluation of the sequence-based data itself, and there is no published reference, a reference from the GO Consortium's collection of GO references can be used. If nothing in this set is appropriate, the annotating group should write a description of the methods of data collection and evaluation used and submit it to the GO Consortium. This description will be added to the reference collection and will receive a GO_REF accession number for use in annotations. In all cases, the ID of the reference describing the methodology of the sequence analysis should be placed in the reference column.
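
The with field format described above (database:accession entries separated by pipes) can be checked mechanically. The following is a minimal sketch, not an official GO tool: the function names are hypothetical, and KNOWN_DATABASES is a small illustrative stand-in for the GO database abbreviations collection, which is much larger.

```python
# Illustrative subset only; the real GO database abbreviations
# collection contains many more entries (assumption for this sketch).
KNOWN_DATABASES = {"UniProtKB", "Pfam", "InterPro", "SGD"}


def parse_with_field(value):
    """Split a with-field string into (database, accession) pairs.

    An empty string yields an empty list, which is acceptable only for
    methods that cannot be referenced by an accession number.
    """
    entries = []
    for item in value.split("|"):
        item = item.strip()
        if not item:
            continue
        # Split on the first colon only, so accessions that themselves
        # contain a colon stay intact.
        database, sep, accession = item.partition(":")
        if not sep or not database or not accession:
            raise ValueError(f"not in database:accession format: {item!r}")
        entries.append((database, accession))
    return entries


def unknown_databases(entries):
    """Return database prefixes that are not in the illustrative set."""
    return [db for db, _ in entries if db not in KNOWN_DATABASES]
```

For example, parse_with_field("UniProtKB:P12345|Pfam:PF00005") returns two (database, accession) pairs, while a bare accession without a database prefix raises ValueError. The accessions here are placeholders chosen only to show the format.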

Examples of ISS Usage

  • An ISS annotation is often based on more than one type of sequence-based evidence. Often, a host of searches is performed for any given query protein. These searches might include BLAST, profile HMMs, TMHMM, SignalP, PROSITE, InterPro, etc. Evaluation of the output from these search tools (bearing in mind that not every search yields results for every protein) leads an annotator to a particular ISS annotation for a particular protein. For example, a BLAST search might reveal that a query protein matches an experimentally characterized protein from another species at 50% identity over the full lengths of both proteins. After reading the literature about the match protein, the curator sees that the match protein is known to contain one domain located in the plasma membrane and another domain that extends into the cytoplasm. It is also known from the literature that the experimentally characterized match protein requires the binding of ATP to function. TMHMM analysis of the query protein predicts several membrane-spanning regions in one half of the protein (consistent with location in a membrane). In addition, PROSITE and Pfam results reveal the presence of an ATP-binding domain in the other half of the protein, which TMHMM predicts to be cytoplasmic. These four search results (BLAST, TMHMM, PROSITE, and Pfam) taken together point to a probable identification of the query protein as having the function of the match protein.
  • Curators are advised to carefully review other supporting evidence for terms transferred using ISS, particularly when transferring from annotations that originate from high-throughput experiments (evidence codes HTP, HDA, HMP, HGI, HEP).
  • PMID:8674114 describes a comparative analysis of several newly identified and previously characterized snoRNAs. The authors list a number of sequence features (conserved sequence elements, a region of complementarity to rRNA, and the spacings between them) that are characteristic of box C/D snoRNAs. Because the authors do not develop a predictive method, the analysis they describe is not considered to be a model, so ISM is not appropriate. Because membership in the box C/D snoRNA family is predictive of being a methylation guide, one could make annotations for a number of snoRNAs based on this paper. Note that the yeast U24 gene (snR24) is also experimentally characterized in this paper. Thus, for snR24 from S. cerevisiae, it is possible to make annotations using both the ISS and the IMP evidence codes, or one might choose not to make the ISS-based annotation for snR24 since experimental evidence is available.
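
The advice above about transfers from high-throughput annotations can be turned into a simple review flag. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the rule it applies (flag a transfer when every source annotation carries a high-throughput code) is one possible reading of the guidance, not a GO Consortium rule.

```python
# High-throughput evidence codes named in this guide.
HTP_CODES = {"HTP", "HDA", "HMP", "HGI", "HEP"}


def needs_extra_review(source_evidence_codes):
    """Return True when the source annotations are all high throughput.

    source_evidence_codes: evidence codes of the annotations on the
    experimentally characterized gene product that the term would be
    transferred from. An empty input returns False because there is
    nothing to transfer from.
    """
    codes = set(source_evidence_codes)
    return bool(codes) and codes <= HTP_CODES
```

A source supported only by HDA would be flagged, whereas a source that also has an IDA annotation would not. A curator might still choose to review both cases.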

Use of the With/From Field for ISS

  • The ISS evidence code requires curators to enter a stable database identifier for the entity to which the annotated object is similar in the With/From field of the Gene Association File (GAF).
  • If the With/From value is a gene or gene product, then there must be experimental evidence, or IC evidence, for the original annotation.
  • Acceptable types of entries in the With/From field include:
    • Genes or Gene Products (designated 'GP')
    • Protein-Containing Complexes (PCC)
  • These entry types are illustrated in Table 1:

    DB Object ID  DB Object Symbol  GO ID                             DB:Reference   Evidence Code  With (or) From
    12345         GP1               GO:0008150 (biological process)   PMID:12345678  ISS            GP2
    12345         GP1               GO:0003674 (molecular function)   PMID:12345678  ISS            GP2
    12345         GP1               GO:0005575 (cellular component)   PMID:12345678  ISS            GP2
    67890         PCC1              GO:0008150 (biological process)   PMID:12345678  ISS            PCC2
    67890         PCC1              GO:0003674 (molecular function)   PMID:12345678  ISS            PCC2
    67890         PCC1              GO:0005575 (cellular component)*  PMID:12345678  ISS            PCC2

  • *Cellular component annotations made between a protein-containing complex and a GO Cellular Component (CC) term are intended to reflect a 'part_of' relationship between the complex and the GO CC term. Curators should not make GO CC annotations that are the equivalent of a cross reference between a complex identifier and a GO term, as this information should already be captured in the xrefs field of the GO CC term.


  • Note that older ISS annotations may not contain an entry in the With/From field. This is because an entry in the With/From field for ISS evidence codes was not always mandatory. However, to ensure that entities used in the With/From field are those for which experimental evidence exists, an entry in the With/From field is now required.
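
The two With/From requirements in this section (the field must be populated, and a gene or gene product named there must itself have experimental or IC evidence) can be sketched as a consistency check over a set of annotations. This is an illustration only, not one of the GO Consortium's quality control checks. The Annotation record and function name are hypothetical, the check assumes every With/From value is a gene product identifier, and it requires the supporting annotation to use exactly the same GO ID, which is a simplification. The list of experimental codes comes from the general GO evidence code list rather than from this page.

```python
from collections import namedtuple

# Hypothetical, simplified record; a real GAF row has more columns.
Annotation = namedtuple("Annotation", "object_id go_id evidence with_from")

# Experimental codes (including high throughput) plus IC.
SUPPORTING_CODES = {
    "EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
    "HTP", "HDA", "HMP", "HGI", "HEP",
    "IC",
}


def check_iss_support(annotations):
    """Return (annotation, message) pairs for ISS rows that fail a check."""
    # (object, term) pairs that have experimental or IC support.
    supported = {
        (a.object_id, a.go_id)
        for a in annotations
        if a.evidence in SUPPORTING_CODES
    }
    problems = []
    for a in annotations:
        if a.evidence != "ISS":
            continue
        targets = [t for t in a.with_from.split("|") if t]
        if not targets:
            problems.append((a, "empty With/From"))
            continue
        for target in targets:
            if (target, a.go_id) not in supported:
                problems.append(
                    (a, f"no experimental or IC annotation of {target} to {a.go_id}")
                )
    return problems
```

Using placeholder identifiers in the style of Table 1, an ISS annotation of GP1 with GP2 in With/From passes only if GP2 has, for instance, an IDA annotation to the same term. Older ISS annotations with an empty With/From field would be reported by this sketch.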

When ISS Should NOT be Used

Quality Control Checks

Evidence and Conclusion Ontology

ECO:0000250 sequence similarity evidence used in manual assertion

Links

Curator Guide to GO Evidence Codes

Gene Ontology website GO Evidence Codes list

Review Status

Last reviewed: February 28, 2018

Unresolved issues

Which identifiers can be put in the 'with' field? Is a protein complex acceptable? See https://github.com/geneontology/go-annotation/issues/1925