Tips to Produce High Quality Annotations

From GO Wiki

Get the wider perspective

  • Favor a pathway-by-pathway or gene product-by-gene product approach for curation, rather than paper-by-paper.
    • GO annotations should be made considering the overall context in which a gene product functions, which may not be apparent from individual papers alone.
  • Read recent publications, including recent reviews if available.
    • Recent reviews can also be helpful for understanding the overall pathway or process you are annotating.
  • Look at existing annotations for the gene product and for the term you have chosen to annotate, to ensure consistency.
    • This may trigger the need to revise other annotations.
  • Remove incorrect annotations based on invalidated hypotheses.
    • If potentially incorrect annotations have been made by another group, create an issue in the go-annotation tracker.

Focus on the research hypothesis

  • Use existing knowledge to understand the hypothesis being tested and its relation to the experimental observation.

Capture the conclusion, not the assay

Make sure to distinguish between assays and GO terms

  • Some assays appear to correspond directly to GO terms. Be careful when interpreting the results of these assays, as they may not reflect the actual role of the gene product in a process.
  • Examples:
    • GO:0006309 apoptotic DNA fragmentation
    • GO:0006919 activation of cysteine-type endopeptidase activity involved in apoptotic process (AKA caspase activation)
      • These assays are often used to measure whether apoptosis occurred, but NOT to measure a specific, direct role of a gene product in that process.
    • GO:0042060 wound healing: it is incorrect to use this term for wound healing assay experiments. The wound healing assay measures 'GO:0048870 cell motility', NOT wound healing.
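
As a rough illustration of the examples above, a curator could keep a small watch list of terms that commonly mirror assay readouts and flag annotations to them for a second look. This is only a sketch: the watch list contains just the three terms named above, and the function name and the (gene product, GO ID) tuple layout are assumptions made for this example, not part of any GO tool.

```python
# Hypothetical review helper. The term list comes from the examples above;
# the function name and the annotation tuple layout are assumptions.
ASSAY_LIKE_TERMS = {
    "GO:0006309": "apoptotic DNA fragmentation is often a readout that apoptosis occurred",
    "GO:0006919": "caspase activation is often a readout that apoptosis occurred",
    "GO:0042060": "wound healing assays measure cell motility (GO:0048870)",
}


def flag_assay_like(annotations):
    """Return (gene_product, go_id, note) for annotations that deserve a second look."""
    return [
        (gene, go_id, ASSAY_LIKE_TERMS[go_id])
        for gene, go_id in annotations
        if go_id in ASSAY_LIKE_TERMS
    ]


if __name__ == "__main__":
    # Made-up gene product names, for illustration only.
    example = [("geneA", "GO:0042060"), ("geneB", "GO:0004531")]
    for gene, go_id, note in flag_assay_like(example):
        print(f"{gene}\t{go_id}\tcheck: {note}")
```

A flagged annotation is not necessarily wrong; the flag only prompts the curator to confirm that the paper shows a direct role of the gene product rather than an assay readout.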

Some assay conditions test general protein properties, not their function


  • 1. Requirement for a post-translational modification for activity: PMID:16500043, Fig. 3. DNAse II is a glycosylated protein, and the glycosylation is presumably necessary for the correct protein conformation: the protein is less active when glycosylation is impaired, for example by treatment with tunicamycin.
    • Hints that tunicamycin does not regulate the activity of DNAse II:
      • The paper doesn't mention that DNAse II binds tunicamycin (a direct interaction is necessary for a regulation annotation)
      • The paper indirectly shows that in the presence of tunicamycin DNAse II is not glycosylated (it has a lower molecular weight), and this lower molecular weight form is less active.
    • The conclusion of the experiment is that DNAse II is glycosylated. Since glycosylation is not known to be regulated (unlike phosphorylation), it is unlikely that this data supports any kind of regulation. The effect of tunicamycin is outside the scope of GO. The only annotation that can be made from Fig. 3 is GO:0004531 deoxyribonuclease II activity.
  • 2. Impact of chemicals that affect the cellular environment on an activity: PMID:19690162, Fig. 3. Trx contains a conserved active site with two cysteines that are essential for its redox activity. The authors investigated whether this site was necessary for interaction with SlrP by treating the cells with hydrogen peroxide. The interaction was reduced. In the discussion the authors suggest that the oxidized form of Trx (i.e. in the presence of H2O2) doesn't have the correct conformation for interaction with SlrP ("The residues necessary for interaction with SlrP could be buried in the dimeric form of Trx.").
    • Hints that hydrogen peroxide does not regulate Trx/SlrP binding:
      • Based on experimental data: hydrogen peroxide is not shown to directly interact with either interacting partner.
      • Based on biological knowledge: hydrogen peroxide is an oxidant, so its likely effect is the oxidation of the active-site cysteines of Trx, which likely affects its conformation.

Annotate gene products to their normal function

  • The scope of the GO is to capture the normal function of gene products, so special care must be taken to understand how experimental observations inform this.
  • Mutant phenotypes are an important source of GO annotations, but curators should consider carefully how to use that information for annotation.
  • Specific guidelines for annotating mutant phenotypes are available here.

Check term placement in the ontology

Make sure that the parents of the term are also consistent with the annotation. For example 'GO:0015616 DNA translocase activity' is a 'GO:0008094 DNA-dependent ATPase activity', although just by looking at the term name one cannot tell that DNA translocase is a type of ATPase.
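
The parent check described above can be sketched as a walk up the is_a hierarchy. The toy graph below contains only the one relationship mentioned in the text; the dictionary layout and the function name are assumptions for illustration, and a real check would read the parents from the current ontology release rather than from a hand-written table.

```python
# Toy is_a graph: only the GO:0015616 -> GO:0008094 edge comes from the text above.
# The data structure and function name are assumptions made for this sketch.
IS_A = {
    "GO:0015616": ["GO:0008094"],  # DNA translocase activity is_a DNA-dependent ATPase activity
    "GO:0008094": [],              # further parents omitted in this toy example
}


def ancestors(term, is_a=IS_A):
    """Collect every term reachable from `term` by following is_a edges."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


if __name__ == "__main__":
    # Every ancestor listed here is implied by an annotation to GO:0015616.
    print(sorted(ancestors("GO:0015616")))
```

Listing the ancestors before committing an annotation makes implied statements visible, such as ATPase activity for a DNA translocase, so the curator can confirm the evidence supports them too.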

Represent current knowledge

The GO is not an archive of all findings published on a protein. Do not hesitate to remove older annotations that are inconsistent with the current state of knowledge of a protein's role.

Annotations from large-scale datasets

The term high-throughput data is often used to describe data that has been generated by automatic or semi-automatic methodology without validation of the results for individual gene products. The experiments can be viewed as screens: experiments performed in parallel without explicit target selection; they are generally not hypothesis-driven. More details as to how to annotate high throughput data can be found in the Guide_to_GO_Evidence_Codes#High_Throughput_Experimental_Evidence_Codes page.
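
As a small illustration, annotations from high-throughput experiments can be told apart by their evidence code. The sketch below assumes tab-separated annotation lines in the GAF layout, where the evidence code is the seventh column, and uses the five high-throughput evidence codes (HTP, HDA, HMP, HGI, HEP). The function name and the sample lines are made up for this example.

```python
# GO high-throughput evidence codes.
HTP_CODES = {"HTP", "HDA", "HMP", "HGI", "HEP"}


def is_high_throughput(gaf_line):
    """True if a tab-separated GAF annotation line carries an HTP evidence code.

    Assumes the GAF column order, with the evidence code in the 7th column.
    Header/comment lines and short lines return False.
    """
    if gaf_line.startswith("!"):
        return False
    columns = gaf_line.rstrip("\n").split("\t")
    return len(columns) > 6 and columns[6] in HTP_CODES


if __name__ == "__main__":
    # Made-up annotation lines with placeholder identifiers, for illustration only.
    lines = [
        "DB\tID1\tsym1\t\tGO:0005634\tPMID:0000000\tHDA\t\tC",
        "DB\tID2\tsym2\t\tGO:0005634\tPMID:0000000\tIDA\t\tC",
    ]
    for line in lines:
        print(line.split("\t")[1], is_high_throughput(line))
```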

Avoiding predatory journals

Make sure you select high quality papers. Some journals have been labeled 'predatory journals' for their dubious practices with respect to the publication process. There is a list of these journals here: If you are in doubt, this list may help you decide whether or not to annotate the paper.

More information about predatory journals may be found in these articles:

Review Status

Last reviewed: 2019-04-11
