Inferred from High Throughput Experiment (HTP)

From GO Wiki

HTP: High Throughput Experiment

Overview

HTP: Inferred from High Throughput Experiment

  • This code is used in an annotation to indicate that a high-throughput experimental assay has been located in the cited reference, whose results indicate a gene product's function, process involvement, or subcellular location (indicated by the GO term). The HTP code is equivalent to the conventional EXP code.
  • The HTP code is the parent code for the HDA, HMP, HGI and HEP high throughput experimental codes.
  • The HTP evidence code can be used where any of the high-throughput assays described for the HDA, HMP, HGI, or HEP evidence codes is reported. However, groups are strongly encouraged to annotate to one of the more specific experimental codes (HDA, HMP, HGI, or HEP) instead of HTP.
  • Note: No equivalent high-throughput evidence code for IPI is provided. Curators should check that HTP interaction data has been curated by IntAct. If not, papers can be flagged by mailing intact-help@ebi.ac.uk.
  • A published reference should always be cited in the reference column, and no value should be entered into the with/from column of HTP annotations.
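The reference and with/from rule in the last bullet lends itself to a mechanical check. The following is a minimal sketch, assuming annotations are held as plain dictionaries with the hypothetical keys `evidence`, `reference`, and `with_from`; these names are illustrative and do not come from any GO tool. By default only the HTP code is checked, and the `codes` argument can be widened if a group applies the same rule to the more specific codes.

```python
def check_htp_annotation(annotation, codes=frozenset({"HTP"})):
    """Return a list of problems found in one annotation record.

    `annotation` is a dict with the hypothetical keys 'evidence',
    'reference' and 'with_from' (illustrative names only).
    """
    problems = []
    if annotation.get("evidence") not in codes:
        return problems  # the rule only applies to the listed evidence codes
    # A published reference should always be cited.
    if not (annotation.get("reference") or "").strip():
        problems.append("missing reference")
    # No value should be entered into the with/from column.
    if (annotation.get("with_from") or "").strip():
        problems.append("with/from must be empty")
    return problems
```

An empty list means the record passes; annotations with other evidence codes are ignored by this check.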

What qualifies as HTP data?

The term high-throughput data is often used to describe data that has been generated by automatic or semi-automatic methodology without validation of the results for individual gene products. The experiments can be viewed as screens: experiments performed in parallel without explicit target selection; they are generally not hypothesis-driven. As HTP datasets are generated from the scaling of experimental techniques used for hypothesis-driven approaches, the type of experiment itself cannot be used to define a HTP experiment. Characteristics that are often associated with HTP experiments should be used to guide the curator’s decision as to whether it should be classed as HTP for the purposes of annotation. These include:

  • Applying the same workflow to a large number of genes/gene products.
  • Generating data in an automated or semi-automated fashion.
  • Addressing open-ended rather than hypothesis-driven questions.
  • Generating dataset(s) (usually presented in tabular form).
  • Datasets expected to contain some ‘false positives’.
  • Ascribing the same property to all gene products that fall within a given measurement range.

HTP evidence codes can be used to annotate high-quality high-throughput studies. In cases where this can be determined, the dataset should ideally contain fewer than 1% false positives. The HTP evidence codes map directly to their low-throughput counterparts and, to ensure consistency, the same rules apply for annotation regardless of throughput.
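The relationships between the codes can be written down as a small lookup table. This is a sketch only: the parent/child grouping and the HTP to EXP equivalence are stated on this page, whereas the remaining pairings (HDA/IDA, HMP/IMP, HGI/IGI, HEP/IEP) are assumed here from the parallel naming of the codes and should be confirmed against the individual evidence code pages. The function names are hypothetical.

```python
# Child codes of HTP, as listed in the overview.
HTP_CHILDREN = ("HDA", "HMP", "HGI", "HEP")

# Low-throughput counterparts. Only HTP -> EXP is stated on this page;
# the other four pairs are an assumption based on the code names.
LOW_THROUGHPUT_COUNTERPART = {
    "HTP": "EXP",
    "HDA": "IDA",
    "HMP": "IMP",
    "HGI": "IGI",
    "HEP": "IEP",
}


def is_high_throughput(code):
    """True for HTP itself and for any of its child codes."""
    return code == "HTP" or code in HTP_CHILDREN


def low_throughput_counterpart(code):
    """Return the assumed low-throughput counterpart, or None if unknown."""
    return LOW_THROUGHPUT_COUNTERPART.get(code)
```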

General guidance for annotation of HTP experiments

1. The experiment should be of sufficient quality for GO annotation.

The same annotation standards should apply to both low- and high-throughput experimental results. As a result, the vast majority of HTP papers are not expected to meet the criteria for GO curation. As HTP experiments can generate a large number of annotations, it is especially important that curators undertake a rigorous review of the data presented. Curators should examine all aspects of the workflow: experimental design, controls, data handling, validation, statistics, etc. Many HTP workflows are complex, and curators are advised to contact the authors for guidance if needed. To be suitable for annotation, it is especially important that the authors have designed the experiment to minimize the likelihood of false positives. Examples of how experimental design can lower the number of false positives:

  • Careful design of the experimental setup.
  • Confirmation by repeated testing.
  • Verification by independent screening methods.
  • A high threshold for positive scoring.
  • A low false discovery rate for inclusion (ideally <1% FDR).
  • Identification and exclusion of common contaminants/house-keeping genes.
  • Multivariate data/principal component analysis.

Curators are encouraged to apply a “common sense” approach when adding annotations. Statistics and workflows cannot tell you how well an experiment has been conducted. This sometimes means a subjective eye-balling of the final list of genes/gene products. Do they look reliable? Curators may be able to estimate an FDR by spot-checking a small number of potential annotations. If a curator does not think that the data looks reliable, they should question the value of curating it using GO annotation - many datasets are better captured by other methods of curation/data repositories. Although this may mean that some interesting and valid GO annotations are not made, on the whole it will help maintain a higher standard of annotation.
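The spot-checking idea can be made slightly more concrete. The sketch below assumes a curator has manually checked a random sample of candidate annotations and counted how many look wrong; it returns the observed proportion together with an upper bound from the Wilson score interval. The choice of the Wilson interval, the function name, and the default `z=1.96` (roughly 95% confidence) are assumptions made for illustration, not part of the GO guidelines.

```python
import math


def spot_check_fdr(n_checked, n_false, z=1.96):
    """Estimate a false discovery rate from a manual spot check.

    Returns (observed proportion, Wilson upper bound). The Wilson score
    interval is one common choice for small samples; it is an assumption
    of this sketch, not a GO recommendation.
    """
    if n_checked <= 0 or not 0 <= n_false <= n_checked:
        raise ValueError("need n_checked > 0 and 0 <= n_false <= n_checked")
    p = n_false / n_checked
    denom = 1 + z**2 / n_checked
    centre = (p + z**2 / (2 * n_checked)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / n_checked + z**2 / (4 * n_checked**2)
    ) / denom
    return p, min(1.0, centre + half_width)
```

For example, finding no errors among 50 checked annotations gives an observed rate of 0 but an upper bound of roughly 7%, which illustrates why a small spot check can flag an unreliable dataset but cannot by itself establish a rate below 1%.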

If using Protein2GO, curators should mark a HTP paper that does not meet the standard for curation by using the topics tags “Not suitable for annotation” and “High Throughput”.

Choosing not to annotate a HTP paper does not necessarily mean that the data is of poor quality. As with conventional annotation, curators must assess whether they can confidently assign a GO term based on the experimental output.


2. Choose the term carefully.

  • Is the experiment specifically set up to address this biological question? Make sure to annotate to the conclusion, not to the assay.
  • Is a higher-level term more appropriate?
  • Is labeling with a term from the GO appropriate? The data may be better captured by another mechanism: as a dataset, phenotype annotation or pair-wise physical interaction, for example.


3. Annotations curated from HTP data do not need to be curated as a whole set.

  • Subsets generated by extra screening steps or statistical cut-offs should be used to select gene products to annotate.
  • If there is not enough information in the paper, contact the authors for guidance on how to interpret the data and statistics.
  • As with existing annotations, individual annotations may be removed if shown to be incorrect, e.g. in the light of new information.
  • Often a selected number of genes/gene products will be validated by low-throughput (LTP) analysis or further experimentation. These should be annotated using a conventional evidence code in addition to a HTP code.
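The subset selection described in this list can be sketched as a simple filter. The example assumes each row of the published dataset is a dictionary with the hypothetical keys `gene`, `fdr` (a per-gene false discovery rate or q-value reported by the authors), and `validated` (whether the authors confirmed the result by low-throughput work). The default cut-off of 0.01 mirrors the "ideally <1% FDR" guidance, and `"EXP"` stands in for whichever conventional code the curator judges appropriate.

```python
def select_annotations(rows, go_term, max_fdr=0.01, conventional_code="EXP"):
    """Build (gene, GO term, evidence code) tuples from a screened dataset.

    `rows` is a list of dicts with the hypothetical keys 'gene', 'fdr'
    and 'validated'; all names here are illustrative.
    """
    annotations = []
    for row in rows:
        if row["fdr"] >= max_fdr:
            continue  # outside the statistical cut-off: not annotated
        annotations.append((row["gene"], go_term, "HTP"))
        if row.get("validated"):
            # Validated hits get a conventional code in addition to HTP.
            annotations.append((row["gene"], go_term, conventional_code))
    return annotations
```

A gene that passes the cut-off and was also validated therefore yields two annotations, one with HTP and one with the conventional code, while genes outside the cut-off yield none.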


4. There is no minimum cut-off number that defines a HTP dataset.

HTP evidence code(s) should be used to annotate sets of genes/gene products, but it is independent of the number of entities annotated (i.e. there is no minimum number that defines what is classed as HTP).

Examples of Usage

Quality Control Checks

Evidence and Conclusion Ontology

ECO:0006056 high throughput evidence used in manual assertion

Links

Curator Guide to GO Evidence Codes

Gene Ontology website GO Evidence Codes list

Review Status

Last reviewed: February 23, 2018