Inferred from High Throughput Direct Assay (HDA)

HDA: High Throughput Direct Assay

Overview

The HDA evidence code indicates that a high throughput direct assay was carried out to determine the function, process, or component indicated by the GO term.

The HDA evidence code is equivalent to the IDA code, and the general guidelines for annotating with the IDA code should be followed.

Notes on what qualifies as high throughput data and general annotation guidance for high throughput experiments can be found on the HTP evidence code page.

HDA usage

Examples of high throughput assays that should be annotated using HDA:

Enzyme assays
In vitro reconstitution
Immunofluorescence (for cellular component)
Cell fractionation (for cellular component)
Physical interaction/binding assay (sometimes appropriate for cellular component or molecular function)

The majority of high-throughput direct assays are proteomics and fluorescence microscopy studies.

Proteomics

Proteomics methods using mass spectrometry are often used in the identification of proteins from purified macromolecular complexes and sub-cellular/extracellular compartments. For such experiments the major sources of false positives result from co-purifying contaminants and the mis-identification of peptides/proteins. When applied to large complexes and compartments, such experiments should always be annotated with the evidence code HDA as there are acknowledged sources of unavoidable error. Smaller complexes will often have their components analysed and confirmed in other ways, and so may be suitable for annotation with an IDA evidence code.

Mass spectrometry methods have changed significantly over the last two decades. Peptide mass fingerprinting (PMF) using MALDI-TOF of gel bands/spots was the method of choice for many years, and many valuable annotations have been based on these experiments. In some instances, such as the mitochondrial proteome and spliceosomal complexes, these experiments are the only ones available that provide good coverage of the components. An HDA evidence code should be used to support the GO term in these cases. PMF experiments were rarely reported with any standardized statistics that can usefully serve as a guide for annotation, so curators should pay particular attention to the purification methods and protein lists given in such experiments. If newer datasets are available, it is suggested that the datasets are compared and that the curator judges whether both or just the more recent should be annotated.

For high throughput proteomics, MALDI-TOF has been superseded and LC-MSMS is the current method of choice. Large datasets from LC-MSMS should report a false discovery rate (FDR) and the number of unique peptides used to assign a protein ID. When annotating proteins identified using LC-MSMS, ideally curators should look for:

An FDR of <1% for peptides (and even better, if given, for proteins)

A minimum of 2 unique peptides per protein

Note 1: Sometimes a protein identified by a single unique peptide is confirmed by an alternative method, or the identity of the peptide itself is confirmed by de novo sequencing (see PMID:17443350 for an example). Such a protein can therefore be annotated.

Note 2: Quantitative studies may also report “Razor peptides”. These are non-unique peptides that are matched to the most likely protein based on other peptides. Do not count these as 'unique'.

Note 3: For guidance on FDR, see http://www.bioinfor.com/fdr-tutorial and https://link.springer.com/protocol/10.1007%2F978-1-4939-3106-4_7.

Note 4: If other methodologies or different statistical methods have been used to define or further refine the final list of proteins, the curator should use their discretion as to whether this is sufficient to allow high-quality annotation with the GO.

Note 5: It is also desirable, and a requirement of many journals, that the mass spectrometry data has been deposited in a publicly available repository such as PRIDE (PMID:26527722).
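As an illustration only, the LC-MSMS criteria and Notes 1 and 2 above could be applied to a protein list along the following lines. This is a minimal sketch, not part of any GO tool: the `ProteinHit` record, its field names, and the `meets_lcms_guidance` function are hypothetical, and the example values are invented. It assumes the FDR is reported as a fraction and that unique and razor peptide counts are reported separately.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    """Hypothetical record for one protein identification in a dataset."""
    accession: str
    peptide_fdr: float                      # fraction, e.g. 0.005 means 0.5%
    unique_peptides: int                    # unique peptides only
    razor_peptides: int = 0                 # recorded, but never counted as unique (Note 2)
    independently_confirmed: bool = False   # e.g. de novo sequencing (Note 1)


def meets_lcms_guidance(hit, max_fdr=0.01, min_unique=2):
    """Return True if the hit passes the FDR and unique-peptide guidance."""
    # The guidance asks for an FDR strictly below 1%.
    if hit.peptide_fdr >= max_fdr:
        return False
    # Razor peptides are deliberately ignored here.
    if hit.unique_peptides >= min_unique:
        return True
    # Note 1: a single unique peptide is acceptable if confirmed another way.
    return hit.unique_peptides == 1 and hit.independently_confirmed


# Invented example data.
hits = [
    ProteinHit("P1", 0.005, 3),
    ProteinHit("P2", 0.005, 1, razor_peptides=4),
    ProteinHit("P3", 0.005, 1, independently_confirmed=True),
    ProteinHit("P4", 0.02, 5),
]
candidates = [h.accession for h in hits if meets_lcms_guidance(h)]
print(candidates)  # ['P1', 'P3']
```

A filter like this only narrows the list of candidates. The quality of the purification and any additional statistical refinement (Note 4) still need to be judged by the curator.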

Although the quality of the mass spectrometry is relatively easy to determine, it can be more difficult to assess the quality of the purification. In general, the authors should have taken steps to reduce the contaminants in the sample, but it is up to the curator to judge whether the sample is merely an enrichment rather than a purification. Methods that a curator might expect to see in a high-quality purification protocol include:

Multi-step purification (e.g. tandem affinity purification, PMID:20658971)
Purification protocol optimisation
Verification of purity by assaying contaminants and known components

Strategies that couple purification techniques with data analysis can significantly decrease the number of false positives arising because of contaminants. For example:

Excluding components that do not appear in replicates and repeats
Multivariate data/principal component analysis (PMID:22472443, PMID:27278775, PMID:25165137), e.g. LOPIT (PMID:15295017)
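The first strategy, excluding components that do not appear in replicates, can be sketched as a simple set operation. This is an illustrative example under stated assumptions: the function name `reproducible_proteins`, the `min_replicates` parameter, and the protein identifiers are hypothetical, and each replicate is assumed to be available as a set of protein identifiers.

```python
from collections import Counter


def reproducible_proteins(replicates, min_replicates=None):
    """Return proteins found in at least `min_replicates` replicates.

    `replicates` is a list of sets of protein identifiers. By default a
    protein must appear in every replicate to be retained.
    """
    if not replicates:
        return set()
    if min_replicates is None:
        min_replicates = len(replicates)
    # Count in how many replicates each protein was identified.
    counts = Counter(p for rep in replicates for p in set(rep))
    return {p for p, n in counts.items() if n >= min_replicates}


# Invented example data: three replicate purifications.
replicates = [
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"A", "B", "C"},
]
strict = sorted(reproducible_proteins(replicates))
relaxed = sorted(reproducible_proteins(replicates, min_replicates=2))
print(strict)   # ['A', 'B']
print(relaxed)  # ['A', 'B', 'C']
```

Whether a protein must appear in every replicate or only in most of them is a judgment call that depends on the study design. The threshold used here is an assumption, not a GO recommendation.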

For some cellular components, a high degree of purification may be achieved with a relatively simple, one-step protocol, e.g. separating plasma from plasma cells by centrifugation. Techniques such as principal component analysis can allow high-quality assignments to be made from simple enrichments. For other cellular components, a high degree of purity may be difficult to achieve, and curators are advised to consult the expert and GO curation communities to determine whether such sets should be annotated using the GO.

Fluorescence microscopy

There are many examples of large-scale subcellular localization studies based on fluorescently tagging proteins (PMID:25294944, PMID:16823372, PMID:26928762). Although these papers do not show that the proteins are active within the compartments, these datasets are valued by the scientific community and should be annotated with HDA. The nature of high throughput fluorescence studies does not lend itself to standardized statistics that would serve to guide curation. It is therefore important that curators pay particular attention to the text within the paper and supplementary tables before applying terms. Often these papers will include low throughput experiments to validate their approach (these should be independent experiments rather than examples chosen from the screen as an illustration), and these should be annotated with an IDA code as well as an HDA code.

Evidence and Conclusion Ontology

ECO:0007005 high throughput direct assay evidence used in manual assertion

Links

Curator Guide to GO Evidence Codes

Gene Ontology website GO Evidence Codes list

Review Status

Last reviewed: February 23, 2018