PAINT annotation guidelines

From GO Wiki

Initial Steps

  • Look at the tree topology to see if it makes sense. For example, use OrthoMCL mapping to do a reality check on the tree (each family is color coded in the first column of the 'TABLE' view). If it does not, contact Paul and the tree will be edited as appropriate.
  • Try to find a review (or two) on the protein family and read it quickly. If you can't find one, WikiGenes or the OMIM entry for any human genes can be helpful. Please write down the reviews or sources you used in the notes box ("Evidence.txt").

Semantics of annotations

  • Note that an annotation of a node means that you are inferring that the annotation most likely first evolved in this node. It means that the gene's molecular function, or its involvement in a particular biological process, or its location in a particular component, most likely evolved along the branch of the tree leading to that particular node. Of course, it is possible that the trait evolved before this point but that the supporting data does not allow you to make the annotation. Implications for tree annotation:
    • The function, process or component must have existed in the organism that had the gene. For instance, you should not annotate an ancestral gene present in the common ancestor of all life with the term "nucleus" because that organism did not have a nucleus.


  • We use NOT annotations to denote a functional change during evolution. Annotating a node with a NOT means that an ancestor of the node had a function/process/component annotation that was subsequently first LOST in that node. This lack of an annotation will then be inherited by subsequent descendants. Implications for tree annotation:
    • You will need to first make a positive annotation to an ancestral node, and then make NOT annotations for each descendant of that node that likely lost the annotation.
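
The inheritance semantics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `terms_in_effect` function and the example tree are hypothetical and are not part of PAINT. A positive annotation is inherited by all descendants until a NOT is made, and a NOT is rejected unless the term was annotated positively on an ancestor.

```python
class Node:
    """A node in a toy gene tree (hypothetical structure, not PAINT's data model)."""

    def __init__(self, name, children=(), gained=(), lost=()):
        self.name = name
        self.children = list(children)
        self.gained = set(gained)  # positive annotations made at this node
        self.lost = set(lost)      # NOT annotations made at this node


def terms_in_effect(node, inherited=frozenset(), out=None):
    """Return {node name: set of terms that apply}, walking from the root down."""
    if out is None:
        out = {}
    # A NOT is only meaningful if an ancestor carried the positive annotation.
    unsupported = node.lost - inherited
    if unsupported:
        raise ValueError(
            f"{node.name}: NOT without an ancestral positive annotation: "
            f"{sorted(unsupported)}"
        )
    current = (inherited | node.gained) - node.lost
    out[node.name] = set(current)
    # Both the gain and the loss are inherited by every descendant.
    for child in node.children:
        terms_in_effect(child, frozenset(current), out)
    return out


# Toy tree: the term is gained at the root and lost in one clade.
leaf_a = Node("leaf_a")
leaf_b = Node("leaf_b")
clade_b = Node("clade_b", [leaf_b], lost={"kinase activity"})
root = Node("root", [leaf_a, clade_b], gained={"kinase activity"})
result = terms_in_effect(root)
```

In this toy tree, `leaf_a` inherits the term from the root, while `clade_b` and its descendant `leaf_b` do not.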


General guidelines

  • In general, we will annotate to the most specific term possible and propagate as far back as possible, given the ancestral inference.
  • For molecular function and cellular component, address every experimental annotation. For every experimental annotation, either:
    • Use it for a propagation (note that if you already annotated a more specific term, you do not need to use the more general term)
    • Explain in the notes box why you didn't use it
  • For biological process: annotate all appropriate CELLULAR LEVEL PROCESSES. Higher level processes should be annotated only if they do not require extensive work to clarify (i.e. don't read entire papers).
  • It is generally easiest to start with molecular function, then cellular component, then biological process.
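
As a rough aid for the "address every experimental annotation" rule, the check below lists the experimental terms that have been neither used for a propagation nor explained in the notes. It is a sketch only: the function name, the `more_general` mapping (term to its more general terms) and the example terms are assumptions made for illustration, and a real check would take the term hierarchy from the ontology itself.

```python
def unaddressed_annotations(experimental, propagated, more_general, explained):
    """Return experimental terms that still need a propagation or a note.

    A term counts as addressed if it was propagated, if a more specific
    term was propagated, or if the curator explained why it was not used.
    """
    covered = set(propagated)
    for term in propagated:
        # Propagating a specific term also covers its more general terms.
        covered |= more_general.get(term, set())
    return sorted(set(experimental) - covered - set(explained))


# Illustrative data only.
experimental = {"protein kinase activity", "kinase activity", "ATP binding", "nucleus"}
propagated = {"protein kinase activity", "nucleus"}
more_general = {"protein kinase activity": {"kinase activity"}}
explained = set()

todo = unaddressed_annotations(experimental, propagated, more_general, explained)
```

Here `todo` contains only "ATP binding", because "kinase activity" is covered by the more specific propagated term.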

More detailed guidelines

  • For closely related genes with opposite annotations, read the papers to see whether the annotations are really contradictory. If they are, do not propagate. If they are not, make a note of the annotations so they can be addressed later by the specific MOD(s).
  • For an annotation that looks indirect, check whether there is one that looks more direct (IMPs may be more indirect). Look for an annotation that could explain the indirect one, and use it if possible.
  • Scoping
    • Use common sense and keep the big picture of the tree and knowledge about the family in mind (e.g. LON family: propagation of mito., light strand promoter anti-sense binding annotation to the base of the eukaryotes), i.e. we should not always limit ourselves to the bare minimum triangulation. Always include an evidence note when doing so.
    • We can expand the scope of a BP term to reflect that of related MF and/or CC terms. For example, in LONP1 the MF and CC mitochondrial terms apply to the entire eukaryotic LONP1 clade, so we can apply mitochondrial organization to the entire eukaryotic clade.
    • If a process (e.g. a "p53 dependent apoptotic process") involves a specific target, the scope of an inferred annotation should not extend beyond the phylogenetic distribution of the target.
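
The target-distribution rule can be expressed as a simple set comparison. The sketch below is hypothetical: the function name and the species lists are illustrative assumptions, not real curated data. It reports which species under a candidate node fall outside the known distribution of the target; any hit suggests the annotation should be made at a more recent node.

```python
def species_outside_target_range(clade_species, species_with_target):
    """Return species in the candidate clade that lack the specific target."""
    return sorted(set(clade_species) - set(species_with_target))


# Illustrative data only: species under the candidate node, and the species
# in which the target of the process is known to exist.
clade_species = {"human", "mouse", "yeast"}
species_with_target = {"human", "mouse"}

out_of_range = species_outside_target_range(clade_species, species_with_target)
```

An empty result means the proposed scope stays within the distribution of the target.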


Term-specific notes

  • Do not propagate GO:0005515 protein binding (will be suppressed from PAINT), GO:0005488 binding, and enzyme binding.
  • We will only propagate children of protein binding when the terms are specific enough to indicate a specific protein family and/or they provide useful biological information to the biologist wanting to learn more about this term, i.e. the molecular function is related to the biological process(es) that are annotated in this family.
  • We will propagate small molecule binding terms.
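
The hard exclusions in this list could be checked mechanically, as in the sketch below. The two GO identifiers come from the list above; "enzyme binding" is matched by label because no identifier is given here. The function name is hypothetical, and the identifier used for "ATP binding" in the example is a placeholder. Whether to propagate a child of protein binding remains a curator judgment and is deliberately not encoded.

```python
# Identifiers taken from the guideline text above.
SUPPRESSED_IDS = {"GO:0005515", "GO:0005488"}
# No identifier is given for this term, so it is matched by label.
SUPPRESSED_LABELS = {"enzyme binding"}


def is_never_propagated(go_id, label):
    """Return True for terms the guidelines say should not be propagated."""
    return go_id in SUPPRESSED_IDS or label.strip().lower() in SUPPRESSED_LABELS


checks = [
    is_never_propagated("GO:0005515", "protein binding"),
    is_never_propagated("GO:0005488", "binding"),
    is_never_propagated("", "enzyme binding"),
    # Placeholder identifier: a small molecule binding term is not suppressed.
    is_never_propagated("GO:9999999", "ATP binding"),
]
```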


Dealing with NOTs

  • NOTs are important: they allow us to capture likely functional changes over evolution so we do not make incorrect homology inferences.
  • You can only make a NOT for positive annotations made to an ancestor, so make the positive annotation first.
  • Every NOT must have a manual note added in the Evidence pane. Add notes below the generic paragraph that pops up.
  • If there is a NOT annotation among the experimental MOD annotations, use the "Inferred from Descendant Sequences" evidence code.
  • If there is specific evidence that active site residues are missing or substituted, use the "Inferred from Missing Residues" evidence code (these residues are not yet automatically identified in PAINT, but will be taken from SwissProt and CDD in the future).
  • If the branch is relatively long (indicating relatively rapid sequence evolution, a potential clue of adaptive evolution), use the "Inferred from Rapid Divergence" evidence code.
  • NOT + rapid divergence = the line will not be in the GAF provided to the MOD but will be retained in the PAINT GAF. This makes it possible to say "do not propagate" to a particular clade, as distinct from adding an explicit NOT.
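
The evidence-code choices above can be summarised as a small decision function. This is a sketch under stated assumptions: the function names are hypothetical, and the order in which the three conditions are tested is an assumption, since the guidelines do not say which code takes precedence when more than one condition holds. The second function mirrors the export rule for NOTs based on rapid divergence.

```python
DESCENDANT = "Inferred from Descendant Sequences"
MISSING_RESIDUES = "Inferred from Missing Residues"
RAPID_DIVERGENCE = "Inferred from Rapid Divergence"


def not_evidence_code(experimental_not, residues_missing, long_branch):
    """Pick an evidence code for a NOT annotation.

    The precedence used here (experimental NOT, then residues, then
    branch length) is an assumption, not a documented rule.
    """
    if experimental_not:
        return DESCENDANT
    if residues_missing:
        return MISSING_RESIDUES
    if long_branch:
        return RAPID_DIVERGENCE
    return None  # no stated basis for a NOT


def goes_in_mod_gaf(is_not, evidence_code):
    """A NOT based on rapid divergence stays in the PAINT GAF only."""
    return not (is_not and evidence_code == RAPID_DIVERGENCE)


code = not_evidence_code(experimental_not=False, residues_missing=False, long_branch=True)
exported = goes_in_mod_gaf(is_not=True, evidence_code=code)
```

In this example the NOT receives the rapid divergence code and is therefore kept out of the GAF provided to the MOD.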