PAINT annotation guidelines

From GO Wiki
Revision as of 14:36, 17 November 2010 by Pascale (talk | contribs)

Semantics of annotations

  • An annotation you make at a node means that you are inferring that a particular function/process/component ALREADY existed at the node you are annotating, i.e. that a particular "character" was present in the ancestral gene/genome/organism. The trait may have evolved earlier, but the supporting data may not allow you to make the annotation at an earlier node.

For instance, you should not annotate a gene present in the common ancestor of all life with the term "nucleus", because that organism did not have a nucleus.

A "NOT" annotation means that an ancestral term that would otherwise be inherited is inferred to have been LOST in a particular descendant, and so will not be inherited past that point. We use NOT annotations to denote a functional change during evolution, so you need to make the positive annotation at the ancestor first, and then add the NOT annotations that indicate where that GO term was lost.
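
The inheritance rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Node class, the terms_present function, and the term label TERM_A are hypothetical and are not part of PAINT.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; illustrative only, not PAINT's data model."""
    name: str
    children: list = field(default_factory=list)
    positives: set = field(default_factory=set)  # terms first annotated at this node
    nots: set = field(default_factory=set)       # terms inferred LOST at this node


def terms_present(node, inherited=frozenset(), result=None):
    """Map each node name to the set of terms inferred present at that node."""
    if result is None:
        result = {}
    # A NOT only makes sense for a term inherited from an ancestor.
    orphans = node.nots - inherited
    if orphans:
        raise ValueError(
            f"{node.name}: NOT without ancestral positive: {sorted(orphans)}"
        )
    # Present here = inherited terms plus new positives, minus losses.
    present = (inherited | node.positives) - node.nots
    result[node.name] = set(present)
    for child in node.children:
        terms_present(child, frozenset(present), result)
    return result


# Toy tree: TERM_A arises at the root and is lost on the branch to clade_1.
leaf_1 = Node("leaf_1")
clade_1 = Node("clade_1", children=[leaf_1], nots={"TERM_A"})
leaf_2 = Node("leaf_2")
root = Node("root", children=[clade_1, leaf_2], positives={"TERM_A"})

for name, terms in terms_present(root).items():
    print(name, sorted(terms))
```

In this toy tree, leaf_2 inherits TERM_A from the root, while clade_1 and everything below it do not, because the NOT stops inheritance at that point.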

General Rules

  • In general, we will annotate to the most specific term possible and propagate as far back as possible, given the ancestral inference.
  • For molecular function and cellular component, address every experimental annotation. For every experimental annotation, either:
    • Use it for a propagation (note that if you already annotated a more specific term, you do not need to use the more general term)
    • Explain in the notes box why you didn't use it
  • For biological process: annotate all appropriate CELLULAR LEVEL PROCESSES. Annotate higher-level processes only if they do not require extensive work to clarify (i.e. you should not need to read entire papers).
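
As a rough sketch of the "address every experimental annotation" rule, the check below lists experimental terms that were neither propagated, nor covered by a more specific propagated term, nor explained in the notes. The function names, the parent map, and the term labels are all hypothetical; a real check would read the ontology and the annotations from the curation tool.

```python
def ancestors(term, parents):
    """All terms reachable by following parent links upward from `term`."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unaddressed(experimental, propagated, explained, parents):
    """Experimental terms that still need a propagation or a note."""
    # A propagated term also covers every more general term above it.
    covered = set(propagated)
    for term in propagated:
        covered |= ancestors(term, parents)
    return sorted(
        t for t in experimental if t not in covered and t not in explained
    )


# Toy ontology (child -> parents); labels are placeholders, not GO terms.
parents = {"specific_term": ["general_term"], "general_term": ["root_term"]}
experimental = ["general_term", "specific_term", "unrelated_term"]

print(unaddressed(experimental, {"specific_term"}, set(), parents))
# prints ['unrelated_term']
```

Here general_term counts as addressed because the more specific specific_term was propagated; unrelated_term still needs either a propagation or an explanation in the notes box.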


Initial Steps

  • Look at the tree topology to see if it makes sense. For example, use OrthoMCL mapping to do a reality check on the tree (each family is color coded in the first column of the 'TABLE' view). If the topology does not make sense, contact Paul and the tree will be edited as appropriate.
  • If you are not familiar with the family, it is very useful to spend a few minutes looking at a review, GeneWiki, etc. for an overview. Please record the reviews you used in the notes box.
  • It is generally easiest to start with Molecular Function, then Cellular Component, then Biological Process.

Annotation Rules

  • For closely related genes with opposite annotations, look at the papers and see whether they are really contradictory. If so, don't propagate. If not, make a note of the annotations so they can be addressed later by the specific MOD(s).
  • If an annotation looks indirect (IMPs may be more indirect), check whether there is something that looks more direct. Look for an annotation that could explain the indirect one, and use it if you can.
  • Scoping
    • Use common sense and keep the big picture of the tree and knowledge about the family in mind, i.e. we should not always limit ourselves to the bare minimal triangulation (e.g. LON family: propagation of mito., light strand promoter anti-sense binding annotation to base of euks). Always include an evidence note when doing so.
    • We can expand the scope of a BP term to reflect that of related MF and/or CC terms. For example (LONP1): the MF and CC mitochondrial terms apply to the entire LONP1 eukaryotic clade, so we can apply mitochondrial organization to the entire eukaryotic clade.
    • If a process (e.g. a "p53 dependent apoptotic process") involves a specific target, the scope of an inferred annotation should not extend beyond the phylogenetic distribution of the target.


Term-specific notes

  • Do not propagate GO:0005515 protein binding (will be suppressed from PAINT), GO:0005488 binding, or enzyme binding.
  • We will only propagate children of protein binding when the term is specific enough to indicate a specific protein family and/or provides useful biological information to a biologist wanting to learn more about the term, i.e. the molecular function is related to the biological process(es) annotated in this family.
  • We will propagate small molecule binding terms.
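
The term-specific rules above could be encoded as a small triage helper, sketched below. The function name, the return labels, and the is_protein_binding_child flag are assumptions made for illustration; the decision for children of protein binding remains a curator judgment, so the sketch only flags them for review.

```python
# IDs below are the ones quoted in the guideline; "enzyme binding" is matched
# by name because the guideline gives no ID for it.
NEVER_PROPAGATE_IDS = {"GO:0005515", "GO:0005488"}
NEVER_PROPAGATE_NAMES = {"enzyme binding"}


def propagation_decision(term_id, term_name, is_protein_binding_child=False):
    """Return 'skip', 'review', or 'propagate' for a candidate term."""
    if term_id in NEVER_PROPAGATE_IDS or term_name in NEVER_PROPAGATE_NAMES:
        return "skip"
    if is_protein_binding_child:
        # Propagate only if specific and biologically informative:
        # that call is left to the curator.
        return "review"
    return "propagate"


# The two non-quoted IDs below are placeholders, not real GO identifiers.
print(propagation_decision("GO:0005515", "protein binding"))
print(propagation_decision("GO:PLACEHOLDER1", "some specific protein binding", True))
print(propagation_decision("GO:PLACEHOLDER2", "some small molecule binding"))
```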


Dealing with NOTs

  • NOTs are important: they allow us to capture likely functional changes over evolution so we do not make incorrect homology inferences.
  • You can only make a NOT for positive annotations made to an ancestor, so make the positive annotation first.
  • Every NOT must have a manual note added in the Evidence pane. Add notes below the generic paragraph that pops up.
  • If there is a NOT annotation among the experimental MOD annotations, use the "Inferred from Descendant Sequences" evidence code.
  • If there is specific evidence that active site residues are missing or substituted (these are not yet automatically identified in PAINT, but will be taken from SwissProt and CDD in the future), use the "Inferred from Missing Residues" evidence code.
  • If the branch is relatively long (indicating relatively rapid sequence evolution, a potential clue of adaptive evolution), use the "Inferred from Rapid Divergence" evidence code.
  • NOT + rapid divergence: the line will not be in the GAF provided to the MOD but will be retained in the PAINT GAF. This makes it possible to say "do not propagate" for a particular clade, as distinct from adding an explicit NOT.
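
The NOT rules above can be summarised in a short sketch. All class and function names here are hypothetical, and the order in which the three cases are checked is an assumption of the sketch; the guidelines do not state a precedence when more than one case applies.

```python
from dataclasses import dataclass

DESCENDANT_SEQUENCES = "Inferred from Descendant Sequences"
MISSING_RESIDUES = "Inferred from Missing Residues"
RAPID_DIVERGENCE = "Inferred from Rapid Divergence"


@dataclass(frozen=True)
class NotAnnotation:
    node: str
    term: str
    evidence: str
    note: str  # curator's manual note from the Evidence pane


def choose_evidence(experimental_not=False, residues_missing=False,
                    long_branch=False):
    # Precedence among the three cases is an assumption of this sketch.
    if experimental_not:
        return DESCENDANT_SEQUENCES
    if residues_missing:
        return MISSING_RESIDUES
    if long_branch:
        return RAPID_DIVERGENCE
    raise ValueError("no supporting evidence for a NOT")


def make_not(node, term, note, **evidence_flags):
    # Every NOT must carry a manual note.
    if not note.strip():
        raise ValueError("every NOT needs a manual note")
    return NotAnnotation(node, term, choose_evidence(**evidence_flags), note)


def mod_gaf(nots):
    """NOT lines sent to the MOD: rapid-divergence NOTs are left out."""
    return [n for n in nots if n.evidence != RAPID_DIVERGENCE]


def paint_gaf(nots):
    """NOT lines kept in the PAINT GAF: all of them."""
    return list(nots)


nots = [
    make_not("clade_1", "TERM_A", "MOD reports a NOT", experimental_not=True),
    make_not("clade_2", "TERM_A", "long branch", long_branch=True),
]
print(len(mod_gaf(nots)), len(paint_gaf(nots)))
```

With these two toy NOTs, only the first would reach the MOD GAF, while both stay in the PAINT GAF.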