Inferred from Sequence or structural Similarity (ISS)

Overview

The ISS evidence code, or one of its sub-categories, should be used whenever a manual, sequence-based analysis forms the basis for an annotation. If the annotation has not been reviewed manually, the correct evidence code is IEA, even if the evidence supporting the annotation is sequence-based. ISS should be used if a combination of sequence-based tools or methods is used. If only one particular type of sequence-based evidence is used, one of the more specific sub-categories of ISS may be more appropriate for the annotation. There are three sub-categories of ISS: ISA (Inferred from Sequence Alignment), ISM (Inferred from Sequence Model), and ISO (Inferred from Sequence Orthology).
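The decision rules in this paragraph can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not part of any GO tool: the method labels ("alignment", "model", "orthology") and their one-to-one mapping to ISA, ISM, and ISO are hypothetical, and real curation involves judgement that a lookup table cannot capture.

```python
from __future__ import annotations

# Hypothetical labels for a single type of sequence-based evidence, mapped to
# the ISS sub-category that would usually fit it (an assumption for this sketch).
SUBCATEGORY_BY_METHOD = {
    "alignment": "ISA",  # Inferred from Sequence Alignment
    "model": "ISM",      # Inferred from Sequence Model
    "orthology": "ISO",  # Inferred from Sequence Orthology
}


def choose_evidence_code(manually_reviewed: bool, method_types: set[str]) -> str:
    """Suggest an evidence code for a sequence-based GO annotation."""
    if not manually_reviewed:
        # Unreviewed annotations are IEA, even if the evidence is sequence-based.
        return "IEA"
    if not method_types:
        raise ValueError("a reviewed ISS-family annotation needs at least one method")
    if len(method_types) == 1:
        (only,) = method_types
        # One evidence type: use its sub-category if one fits, otherwise plain ISS.
        return SUBCATEGORY_BY_METHOD.get(only, "ISS")
    # A combination of sequence-based tools or methods: plain ISS.
    return "ISS"
```

For instance, `choose_evidence_code(True, {"alignment", "model"})` returns `"ISS"`, while the same call with `manually_reviewed=False` returns `"IEA"`.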

ISS can also be used for structural similarity to experimentally characterized gene products, where the structures are determined by crystallography, nuclear magnetic resonance (NMR), or computational prediction. In practice, ISS annotations are rarely, if ever, made purely from structural information. When included, structural information is generally at the level of secondary structure modeling or prediction derived from sequence information. Secondary structure information is particularly useful as one component of RNA gene predictions and in some domain models.

The With/From field is required for ISS annotations. Its entry is the accession of the object or model to which the annotated entity has similarity. Annotators must fill in the With/From field when using the ISS code or one of its sub-categories if the annotation is based on an alignment with other proteins (e.g. UniProtKB) or on a sequence model contained in a database (e.g. Pfam, InterPro). If the annotation is based on similarity to another gene product, the original annotation of that gene product must be supported by experimental or IC evidence. If the annotation is based on a method, such as tRNAscan, that cannot be referred to by an accession number, the With/From field may be left empty. Note, though, that such annotations without a With/From entry will be flagged with a warning (not filtered out) during the GO QC process.
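The requirement rules above can be expressed as a simple check. The sketch below is hypothetical: the `Basis` categories and the "ok"/"warning"/"error" labels are assumptions chosen to mirror the text, not the output of the actual GO QC pipeline.

```python
from enum import Enum


class Basis(Enum):
    """Hypothetical labels for what an ISS-family annotation is based on."""
    ALIGNMENT = "alignment"            # e.g. a UniProtKB protein
    SEQUENCE_MODEL = "model"           # e.g. a Pfam or InterPro entry
    UNREFERENCEABLE_METHOD = "method"  # e.g. tRNAscan, no accession available


def check_with_from(basis: Basis, with_from: str) -> str:
    """Return 'ok', 'warning', or 'error' for an ISS-family annotation."""
    if with_from.strip():
        return "ok"
    if basis in (Basis.ALIGNMENT, Basis.SEQUENCE_MODEL):
        # Mandatory when the similarity is to an accessioned protein or model.
        return "error"
    # May be empty for methods without an accession, but QC flags it.
    return "warning"
```

Distinguishing "error" from "warning" reflects the text: a missing entry is only tolerated when there is genuinely nothing to cite.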

Entries in the With/From field should be in the format database:accession, where database is an abbreviation listed in the GO database abbreviations collection and accession is the accession number of the object with which the sequence similarity was found. Multiple entries in the With/From field should be separated by pipes (|).
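A minimal parser for this format might look like the following. `KNOWN_DBS` is a small hypothetical subset standing in for the full GO database abbreviations collection, and the accessions in the usage note are illustrative only. The split is made on the first colon, because some accessions themselves contain a colon (for example MGI identifiers of the form MGI:MGI:nnnnn).

```python
from __future__ import annotations

# Hypothetical subset of the GO database abbreviations collection; a real
# check should load the full, current list.
KNOWN_DBS = {"UniProtKB", "Pfam", "InterPro", "MGI", "SGD", "RNAcentral"}


def parse_with_from(field: str) -> list[tuple[str, str]]:
    """Split a pipe-separated With/From value into (database, accession) pairs."""
    entries = []
    for raw in field.split("|"):
        entry = raw.strip()
        if not entry:
            continue  # tolerate stray or trailing pipes
        db, sep, accession = entry.partition(":")  # split on the FIRST colon only
        if not sep or not accession:
            raise ValueError(f"not in database:accession form: {entry!r}")
        if db not in KNOWN_DBS:
            raise ValueError(f"unknown database abbreviation: {db!r}")
        entries.append((db, accession))
    return entries
```

For example, `parse_with_from("UniProtKB:P12345|Pfam:PF00069")` yields two pairs, and `parse_with_from("MGI:MGI:97490")` keeps `MGI:97490` intact as the accession.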

If the searches and evaluation of the sequence-based data are described in a published paper, the ID of that paper (assigned either by PubMed or by another database, such as a Model Organism Database) should be placed in the DB:Reference column. If a published paper does not perform or report the sequence similarity but does mention related proteins, curators may still use that reference. However, if the curator evaluated the sequence-based data and the similarity is not mentioned in any published reference, GO_REF:0000024 should be used as the reference.
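The choice of reference can be sketched as follows. The function and its inputs are hypothetical; in particular, `paper_supports` folds together the two cases the text allows (the paper reports the analysis, or it at least mentions the related proteins), which a curator would judge by reading the paper.

```python
from typing import Optional

# Reference for a curator's own sequence analysis that no paper describes.
CURATOR_ANALYSIS_REF = "GO_REF:0000024"


def choose_reference(paper_id: Optional[str], paper_supports: bool) -> str:
    """Pick the DB:Reference value for an ISS annotation (sketch only).

    paper_id: PMID or Model Organism Database ID of a relevant paper, if any.
    paper_supports: True if that paper performs or reports the sequence
        analysis, or at least mentions the related proteins.
    """
    if paper_id and paper_supports:
        return paper_id
    # No published reference covers the similarity: cite the curator analysis.
    return CURATOR_ANALYSIS_REF
```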

Examples of ISS Usage

An ISS annotation is often based on more than one type of sequence-based evidence. Typically, a host of searches is performed for any given query protein. These searches might include BLAST, profile HMMs, TMHMM, SignalP, PROSITE, InterPro, etc. Evaluating the output from these search tools (bearing in mind that not every search will yield results for every protein) leads an annotator to a particular ISS annotation for a particular protein.

For example, a BLAST search might reveal that a query protein matches an experimentally characterized protein from another species at 50% identity over the full lengths of both proteins. After reading the literature about the matched protein, the curator sees that it is known to contain a domain located in the plasma membrane and another domain that extends into the cytoplasm. It is also known from the literature that the experimentally characterized protein requires ATP binding to function. TMHMM analysis of the query protein predicts several membrane-spanning regions in one half of the protein, consistent with location in a membrane. In addition, PROSITE and Pfam results reveal an ATP-binding domain in the other half of the protein, which TMHMM predicts to be cytoplasmic. Taken together, these four search results (BLAST, TMHMM, PROSITE, and Pfam) point to a probable identification of the query protein as having the function of the matched protein.
It is advised that curators carefully review other supporting evidence for terms transferred using ISS, particularly when transferring from annotations originating from high throughput experiments (evidence codes HTP, HDA, HMP, HGI, HEP).
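Combined with the requirement that transferred annotations rest on experimental or IC evidence, this advice suggests a simple triage of the source annotation's evidence code. The sketch below is an illustration, not a GO rule engine: the "accept"/"review"/"reject" labels are assumptions, and the code sets list the standard GO experimental and high-throughput evidence codes.

```python
# Experimental evidence codes and their high-throughput counterparts.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
HIGH_THROUGHPUT = {"HTP", "HDA", "HMP", "HGI", "HEP"}


def assess_source(evidence_code: str) -> str:
    """Triage the evidence behind an annotation being transferred by ISS.

    'accept': experimental or IC evidence
    'review': high-throughput evidence; check other supporting data first
    'reject': anything else (e.g. IEA, or another ISS-family annotation)
    """
    if evidence_code in EXPERIMENTAL or evidence_code == "IC":
        return "accept"
    if evidence_code in HIGH_THROUGHPUT:
        return "review"
    return "reject"
```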
PMID:8674114 describes a comparative analysis of several newly identified and previously characterized snoRNAs. The authors list a number of sequence features (both conserved sequence elements and a region of complementarity to rRNA) and spacings that are characteristic of box C/D snoRNAs. Because the authors do not develop a predictive method, the analysis they describe is not considered to be a model, so ISM is not appropriate. Since membership in the box C/D snoRNA family is predictive of function as a methylation guide, one could make annotations for a number of snoRNAs based on this paper. Note that the yeast U24 gene (snR24) is also experimentally characterized in this paper. Thus, for snR24 from S. cerevisiae, it is possible to make annotations using both the ISS and the IMP evidence codes; alternatively, one might choose not to make the ISS-based annotation for snR24, since experimental evidence is available.

Use of the With/From Field for ISS

The ISS evidence code requires curators to enter, in the With/From field of the Gene Association File (GAF), a stable database identifier for the entity to which the annotated gene product is similar.
If the With/From value is a gene or gene product, the original annotation of that entity must be supported by experimental or IC evidence.
Acceptable types of entries in the With/From field, illustrated in Table 1, include:
- Genes or gene products (designated 'GP')
- Protein-containing complexes (designated 'PCC')

DB Object ID	DB Object Symbol	GO ID	DB:Reference	Evidence Code	With (or) From
12345	GP1	GO:0008150 (biological process)	PMID:12345678	ISS	GP2
12345	GP1	GO:0003674 (molecular function)	PMID:12345678	ISS	GP2
12345	GP1	GO:0005575 (cellular component)	PMID:12345678	ISS	GP2
67890	PCC1	GO:0008150 (biological process)	PMID:12345678	ISS	PCC2
67890	PCC1	GO:0003674 (molecular function)	PMID:12345678	ISS	PCC2
67890	PCC1	GO:0005575 (cellular component)*	PMID:12345678	ISS	PCC2

* Cellular component annotations for a protein-containing complex are intended to reflect a 'part_of' relationship between the complex and the GO Cellular Component (CC) term. Curators should not make GO CC annotations that merely duplicate a cross-reference between a complex identifier and a GO term, as this information should already be captured in the xrefs field of the GO CC term.

Note that older ISS annotations may not contain an entry in the With/From field, because such an entry was not always mandatory for ISS evidence codes. An entry is now required so that the entities cited in the With/From field can be confirmed as ones for which experimental evidence exists.
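To find such legacy annotations, one could scan a GAF file for ISS-family rows whose With (or) From column is empty. This sketch assumes tab-separated GAF 2.x rows, with DB Object ID in column 2, GO ID in column 5, Evidence Code in column 7, and With (or) From in column 8 (1-based), and header lines starting with "!". The sample rows use placeholder values modelled on Table 1 and are truncated to nine columns for brevity.

```python
from __future__ import annotations

import io
from typing import Iterable, Iterator

ISS_FAMILY = {"ISS", "ISA", "ISM", "ISO"}


def iss_rows_missing_with(lines: Iterable[str]) -> Iterator[tuple[int, str, str]]:
    """Yield (line number, DB Object ID, GO ID) for ISS-family rows lacking With/From."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # malformed row; a real validator would report it
        if cols[6] in ISS_FAMILY and not cols[7].strip():
            yield lineno, cols[1], cols[4]


# Placeholder rows: the first ISS row lacks a With/From entry, the second has one.
sample = io.StringIO(
    "!gaf-version: 2.2\n"
    "DB\t12345\tGP1\tenables\tGO:0003674\tPMID:12345678\tISS\t\tF\n"
    "DB\t12345\tGP1\tinvolved_in\tGO:0008150\tPMID:12345678\tISS\tDB:GP2\tP\n"
)
print(list(iss_rows_missing_with(sample)))  # [(2, '12345', 'GO:0003674')]
```

Because the function accepts any iterable of lines, it works equally on an open file handle or an in-memory list.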