With field

From GO Wiki

Evidence codes that can include 'with' field information

Annotations that use certain evidence codes can contain information in the 'with' field (column 8):

Where an evidence code is not listed, it is assumed that no value should be included in the 'with' field.

Evidence Code   Mandatory or Optional?   Expected values
IC              Mandatory                GO identifier
IPI             Mandatory                Protein, Gene, RNA, Chemical identifiers
IGI             Optional                 Protein or Gene identifiers
IMP             Optional                 Alleles, Morpholinos (see the note below the table)
IGC             Optional                 -
RCA                                      ???
ISS             Mandatory                Protein, Gene identifier or Panther family, InterPro or Pfam identifier, CBS:SignalP, CBS:TargetP, CBS:TMHMM, MetaCyc identifiers, Enzyme Commission Numbers, KEGG or KEGG_PATHWAY identifiers
ISO             Mandatory                Protein or Gene identifier
ISM             Optional                 -
ISA             Mandatory?               Protein or Gene identifier
IEA             Mandatory?               Protein identifier, External term identifier used in an external2go mapping, InterPro identifier

Note on IMP (MGI): in some cases two alleles are present that are not alleles of the same gene. One is a conditional allele of the gene in question, and it requires a second allele (usually a Cre allele) to produce the knock-out of that gene in the tissue/cells of interest. In this case it would be incorrect to place the two alleles on separate annotation lines.
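
As an illustration only, the table above can be expressed as a lookup that reports whether a 'with' value is missing or unexpected for a given evidence code. The names `WITH_REQUIREMENT` and `check_with_presence` are hypothetical, and the "uncertain" level is an assumption used here for the rows the table marks with '?' or '???'.

```python
# Requirement levels transcribed from the table above.
# "uncertain" is an assumed label for rows marked 'Mandatory?' (ISA, IEA)
# or '???' (RCA); the table does not settle these cases.
WITH_REQUIREMENT = {
    "IC": "mandatory", "IPI": "mandatory", "ISS": "mandatory", "ISO": "mandatory",
    "IGI": "optional", "IMP": "optional", "IGC": "optional", "ISM": "optional",
    "ISA": "uncertain", "IEA": "uncertain", "RCA": "uncertain",
}


def check_with_presence(evidence_code, with_field):
    """Return 'ok', 'missing', 'unexpected' or 'unclear' for one annotation."""
    has_value = bool(with_field.strip())
    # Codes not listed in the table are assumed to take no 'with' value.
    requirement = WITH_REQUIREMENT.get(evidence_code, "forbidden")
    if requirement == "mandatory":
        return "ok" if has_value else "missing"
    if requirement == "forbidden":
        return "unexpected" if has_value else "ok"
    if requirement == "uncertain":
        return "unclear"
    return "ok"  # optional: present or absent are both acceptable
```

This sketch checks presence only; it does not test whether the value is one of the expected identifier types.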

Understanding Multiple values in 'with' fields

Interpretation of the pipe ('|') separating values:

From Midori:

- Pipes should only be used with the inferred-from-interaction evidence codes, i.e. IGI and IPI.

- With either IPI or IGI, piped 'with' entries indicate that the evidence is actually a 3-way (or 4-way, or n-ary ...) interaction, not just multiple pairwise interactions.


However, a wider set of evidence codes currently uses pipes in the 'with' field, and the MODs differ in how they interpret them:

FlyBase
  Evidence codes using pipe-separated values: IC, IPI, IGI
  Intended meaning: Pipe-separated values indicate a multi-way interaction and so all values displayed in one annotation line should always be grouped together.
  Alternative delimiters present? -

ZFIN
  Evidence codes using pipe-separated values: IC, IPI, ISS, IGI
  Intended meaning: Pipe-separated values indicate a multi-way interaction and so all values displayed in one annotation line should always be grouped together. In addition, for IGI evidence in binary interactions, there is the ZDB-GENE ID of the interacting partner, and the ZDB-MRPHLNO ID for the morpholino used to target the gene BEING ANNOTATED.
  Alternative delimiters present? -

TAIR
  Evidence codes using pipe-separated values: IPI, IGI
  Intended meaning: Pipe-separated values indicate a multi-way interaction and so all values displayed in one annotation line should always be grouped together.
  Alternative delimiters present? -

WormBase
  Evidence codes using pipe-separated values: IPI, ISS, IMP, IGI
  Intended meaning: Pipe-separated values indicate a multi-way interaction and so all values displayed in one annotation line should always be grouped together. For IMP annotations, RNAi and Phenotype identifiers are piped together for additional information.
  Alternative delimiters present? -

MGI
  Evidence codes using pipe-separated values: IC, IPI, ISO, IMP, IGI
  Intended meaning: Only binary interactions are displayed for binding, so multiple piped values indicate separate binary interactions between the gene product identified in column 2 and each of the ids listed in the 'with' field. It would therefore be correct to reinterpret the file so that each value in the 'with' field is displayed on a separate annotation line.
  Alternative delimiters present? Commas are currently used in some IMP and IGI annotations; however, these will shortly be removed.

SGD
  Evidence codes using pipe-separated values: IC, ISA, IGI
  Intended meaning: SGD does not currently specify whether multiple ids in the 'with' field indicate one-to-many or multiple one-to-one interactions; currently they could mean either.
  Alternative delimiters present? -

dictyBase
  Evidence codes using pipe-separated values: IC, IPI, ISS, IGI
  Intended meaning: Pipe-separated values indicate multiple independent interactions between the gene product identified in column 2 and each of the ids listed in the 'with' field. It would therefore be correct to reinterpret the file so that each value in the 'with' field is displayed on a separate annotation line.
  Alternative delimiters present? -

UniProtKB-GOA
  Evidence codes using pipe-separated values: none
  Intended meaning: Currently only one identifier is included in the 'with' field for each annotation (although we are intending to allow multiple values in the second half of 2011). Only binary binding interactions are displayed.
  Alternative delimiters present? -

RGD
  Evidence codes using pipe-separated values: IC, IPI, IGI
  Intended meaning: IPI - piped IDs should be interpreted as displaying multiple separate binary interactions.
  Alternative delimiters present? -

PomBase
  Evidence codes using pipe-separated values: IC, IGI
  Intended meaning:
    IC - pipe-separated GO IDs are used when both GO IDs are required to make the inference.
    IGI - pipe-separated values indicate a multi-way interaction and so all values displayed in one annotation line should always be grouped together.
    IPI - only binary binding interactions are captured with GO. Pipe-separated values are only used with the term "protein binding, bridging".
    ISS - only characterized orthologs or protein family identifiers are used. No pipe-separated values.

CGD
  Evidence codes using pipe-separated values: IPI, ISS, IGI
  Intended meaning: ?
  Alternative delimiters present? -

TIGR
  Evidence codes using pipe-separated values: ISS, IGI
  Intended meaning: ?
  Alternative delimiters present? -

EcoliWiki
  Evidence codes using pipe-separated values: IGI, IPI
  Intended meaning: Represented as multiple binary interactions.
  Alternative delimiters present? -

EcoCyc
  Evidence codes using pipe-separated values: IGI, IPI
  Intended meaning: No distinction is made.
  Alternative delimiters present? -

MTB
  Evidence codes using pipe-separated values: IPI
  Intended meaning: ?
  Alternative delimiters present? -
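
The practical consequence of these differences is that a consumer of annotation files has to know the source before splitting a 'with' field. The following is a minimal sketch of that idea, not an agreed procedure. `PIPE_MEANING` and `interpret_with` are hypothetical names, and the mapping covers only the sources whose stated meaning does not depend on the evidence code; RGD and PomBase (code-dependent) and the sources with unknown or unspecified meaning are deliberately left out and yield `None`.

```python
# Interpretation per source, transcribed from the table above.
# "group":  piped values form one multi-way interaction and stay together.
# "unwrap": each piped value is a separate binary interaction.
PIPE_MEANING = {
    "FlyBase": "group", "ZFIN": "group", "TAIR": "group", "WormBase": "group",
    "MGI": "unwrap", "dictyBase": "unwrap", "EcoliWiki": "unwrap",
}


def interpret_with(source, with_field):
    """Return a list of value groups, or None when the meaning is not stated."""
    values = [v.strip() for v in with_field.split("|") if v.strip()]
    if len(values) <= 1:
        # Nothing to disambiguate: zero or one value.
        return [values] if values else []
    meaning = PIPE_MEANING.get(source)
    if meaning == "group":
        return [values]
    if meaning == "unwrap":
        return [[v] for v in values]
    return None  # e.g. SGD: could be one-to-many or several one-to-one
```

Each inner list corresponds to one annotation line after reinterpretation, so an "unwrap" source produces one line per value while a "group" source keeps a single line.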


Out-of-date documentation on this - should be deleted?


N.B. UniProtKB-GOA is updating their display of 'with' field data originating from external MODs.

In future, external annotations that have used the 'with' field will only be included if:

  • the evidence code and with field combination is legal (e.g. no with fields for IDA-evidenced annotations, only GO identifiers in IC-evidenced annotations)
  • the 'with' field value matches a RegExp for the gene/protein/chemical/GO identifier, outlined here
  • a single value is included in the 'with' field, or it is known to be appropriate to 'unwrap' piped values into separate annotation lines, e.g. for MGI or dictyBase (this behaviour will change in summer 2010 for IC, IGI and IMP codes, when UniProtKB-GOA has updated its database schema to accept multiple values in the 'with' field). For IPI annotations, GOA intends to display only binary interactions.
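
The three conditions above could be combined roughly as follows. This is a sketch under stated assumptions rather than GOA's actual pipeline: `goa_import` and `UNWRAPPABLE_SOURCES` are hypothetical names, the legality check covers only the two examples quoted in the first bullet, and `id_is_valid` is a caller-supplied predicate standing in for the identifier RegExps, which are not reproduced here.

```python
import re

# Sources named in the text as safe to unwrap into separate lines.
UNWRAPPABLE_SOURCES = {"MGI", "dictyBase"}
GO_ID = re.compile(r"^GO:\d{7}$")


def goa_import(source, evidence_code, with_field, id_is_valid):
    """Return the 'with' values to import, one per annotation line,
    or None if the external annotation should be skipped."""
    values = [v for v in with_field.split("|") if v]
    # Condition 1: legal evidence code / 'with' combination
    # (only the two examples given in the text are checked).
    if evidence_code == "IDA" and values:
        return None
    if evidence_code == "IC" and not all(GO_ID.match(v) for v in values):
        return None
    # Condition 2: every value looks like a valid identifier.
    if not all(id_is_valid(v) for v in values):
        return None
    # Condition 3: a single value, or a source whose pipes may be unwrapped.
    if len(values) > 1 and source not in UNWRAPPABLE_SOURCES:
        return None
    return values
```

The text concerns annotations that have used the 'with' field; for an empty field this sketch simply returns an empty list.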

QC checks on the contents of 'with' fields:

Implemented

GO_AR:0000003 Annotations to 'protein binding ; GO:0005515' should be made with IPI, and the interactor should be in the 'with' field

GO_AR:0000004 Reciprocal annotations for protein binding should be made
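
As a rough illustration of the reciprocity rule, the check can be treated as a search for protein binding pairs whose reverse pair is absent. This is a sketch, not the implemented rule: `missing_reciprocals` is a hypothetical name, annotated objects are assumed to be already written in the same 'DB:ID' form as 'with' values, each 'with' field is assumed to hold a single interactor, and only direct annotations to GO:0005515 are considered.

```python
PROTEIN_BINDING = "GO:0005515"


def missing_reciprocals(annotations):
    """annotations: iterable of (object_id, go_id, evidence_code, with_value).

    Return the sorted (object, partner) pairs annotated to protein binding
    with IPI for which no (partner, object) annotation exists.
    """
    pairs = {
        (obj, partner)
        for obj, go_id, code, partner in annotations
        if go_id == PROTEIN_BINDING and code == "IPI" and partner
    }
    # A self-interaction (a, a) is its own reciprocal, so it never appears here.
    return sorted((a, b) for a, b in pairs if (b, a) not in pairs)
```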

Proposed

Hard QC: All IC annotations should include a GO id in column 8 (with)

Hard QC: All IPI annotations should include a nucleotide/protein/chemical identifier in column 8 (with)

Hard QC: IDA annotations should not include an identifier in column 8 (with)

Hard QC: All identifiers in the GAFs must use the correct DB abbreviation

Soft QC: All gene/protein/chemical identifiers used in GO annotations should conform to RegExps supplied in the GO.xref.abbs file
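
The proposed hard checks could be run per line of a GAF roughly as sketched below. The function name `hard_qc` and the `KNOWN_DBS` set are hypothetical; in practice the allowed abbreviations would come from the abbreviations file rather than a hard-coded set. The sketch assumes a tab-separated line with the DB in column 1, the evidence code in column 7 and the 'with' field in column 8, and it splits 'with' values on pipes only.

```python
import re

GO_ID = re.compile(r"^GO:\d{7}$")
# Hypothetical stand-in for the list of accepted DB abbreviations.
KNOWN_DBS = {"UniProtKB", "MGI", "SGD", "GO"}


def hard_qc(gaf_line, known_dbs=KNOWN_DBS):
    """Return a list of failure messages for one tab-separated GAF line."""
    cols = gaf_line.rstrip("\n").split("\t")
    db, evidence, with_field = cols[0], cols[6], cols[7]
    values = [v for v in with_field.split("|") if v]
    problems = []
    if db not in known_dbs:
        problems.append(f"unknown DB abbreviation in column 1: {db!r}")
    if evidence == "IC" and not any(GO_ID.match(v) for v in values):
        problems.append("IC without GO id in 'with'")
    if evidence == "IPI" and not values:
        problems.append("IPI without identifier in 'with'")
    if evidence == "IDA" and values:
        problems.append("IDA with identifier in 'with'")
    for v in values:
        # The DB abbreviation is the part before the first colon.
        if v.split(":", 1)[0] not in known_dbs:
            problems.append(f"unknown DB abbreviation in {v!r}")
    return problems
```

The soft check on identifier format is not covered here, since it depends on the per-database RegExps.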