PAINT User Guide

From GO Wiki
Revision as of 08:39, 7 December 2010 by Paul Thomas (talk | contribs) (Branch lengths)

Summary

PAINT is a Java application for viewing and annotating phylogenetic trees. This document briefly describes how to set up and use the tool.

Requirements

Java 1.6 must be installed.

Installing PAINT

PAINT is a Java application, and can be run on a wide variety of platforms. To install PAINT on either Mac or Windows, follow these steps:

  • Download the PAINT application at:

http://sourceforge.net/project/showfiles.php?group_id=184610

  • On a Mac, open a Unix terminal window, go to the directory containing the PAINT program, and execute the command: sh launchPAINT.sh. On a Windows machine, run the program launchPAINT.bat.


Using PAINT

Logging in

In the tree viewer, go to File->Login in the menu. Type in your user id and password.

Accessing a PANTHER family tree, and GO annotations for leaves

Once you’re logged in, you can search for a tree by going to File->Open from database. Enter a search term; wildcard characters are not supported, although some searches accept partial names without them. Alternatively, select a PANTHER family identifier from the drop-down menu. A list of families will be returned. To view a family, click on its family identifier in the search results box. In the future, you will be able to "lock" the book to prevent others from having write access during curation, but for now this is disabled.

Adjusting the view

Recommended configuration for curation

  • A larger window is better, so resize the entire window by dragging the lower right corner. Leave some space at the top of your screen, though, for the annotation matrix, so that you can read some of the vertical text in the matrix.
  • Undock the "matrix" panel from the bottom panel, then drag and resize it to line up with the right ("Grid") panel. The first row of the matrix should line up with the first leaf node of the tree in the tree panel.

Resizing the panel and column widths

You can resize the tree panel by clicking and dragging the partition between the two panels. The right panel can be toggled between multiple sequence alignment ("MSA") and a table of information about each sequence in the tree ("Grid"). Table columns can also be resized.

Viewing the multiple sequence alignment (MSA)

The trees were estimated from an MSA. You can toggle between the table view (“Grid”) and the MSA view (“MSA”) using the buttons just below the menu, above the table/MSA panel. To view ancestral sequences, first collapse the appropriate node in the tree (right-click, or apple-click on a Mac).

Rescaling the tree branch lengths

You may also want to rescale the trees if the branches look too long for comfortable viewing. Go to Tree->Scale... and enter a different number. For most trees we find 50 works well, but it depends on the tree.

Importing an existing GAF file

Gene Annotation Format (GAF) files can be imported to add to or modify existing annotations. This also lets you read back in annotations that were previously saved as a GAF file.

  • Open the appropriate tree (see "Accessing a PANTHER family tree, and GO annotations for leaves" above).
  • Go to File->Import in the menu.
  • Select the GAF file that contains the annotations, and click the Import button.

Annotating the trees with GO terms

Ancestral nodes in the tree can be annotated with any GO term that has been experimentally determined in one (or more) of its descendants, and then these “inferred” annotations can be propagated to its other descendants.

Annotating an ancestral node, and propagating to descendants by inheritance

  • Select the GO term from the matrix view, by clicking and holding.
  • Drag to the ancestral node you wish to annotate. When the mouse is over the node, it will turn dark. Release the mouse button to annotate.
  • NOTE that you can only annotate a node if AT LEAST ONE descendant has that experimental annotation (or a more specific one), so if a node does not turn dark it cannot be annotated.
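
The rule in the NOTE above can be sketched in a few lines of Python. This is only an illustration of the logic, not PAINT's own code: the tree, the term hierarchy, the term names and the function names (ancestors_or_self, descendants, can_annotate) are all made up for the example.

```python
# Toy data, invented for illustration; nothing here is read from PAINT.

# GO-like hierarchy: term -> its more general parent terms
PARENTS = {
    "specific_term": ["general_term"],
    "general_term": [],
    "unrelated_term": [],
}

# Tree: node -> child nodes
CHILDREN = {
    "root": ["anc1", "leafC"],
    "anc1": ["leafA", "leafB"],
    "leafA": [], "leafB": [], "leafC": [],
}

# Leaf -> terms with experimental evidence
EXPERIMENTAL = {
    "leafA": {"specific_term"},
    "leafB": set(),
    "leafC": {"unrelated_term"},
}

def ancestors_or_self(term):
    """Return the term together with every more general term above it."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen

def descendants(node):
    """Return all nodes below the given node in the tree."""
    found = []
    stack = list(CHILDREN.get(node, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(CHILDREN.get(current, []))
    return found

def can_annotate(node, term):
    """True if some descendant has the term, or a more specific one,
    as an experimental annotation."""
    for descendant in descendants(node):
        for annotated in EXPERIMENTAL.get(descendant, ()):
            if term in ancestors_or_self(annotated):
                return True
    return False

print(can_annotate("anc1", "general_term"))    # leafA has a more specific term
print(can_annotate("anc1", "unrelated_term"))  # only leafC, outside anc1, has it
```

In this toy tree, anc1 can take general_term because leafA carries the more specific term, while unrelated_term is only available at the root.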

Removing an annotation

  • Click on the desired node. Nodes with annotations are colored orange.
  • Go to the "annotations" panel (default position is at the bottom), find the annotation you wish to remove, and click on the trash can icon.

Annotating a descendant as having lost an ancestral function

  • Select the node by clicking on it.
  • Click on the "ECO/QUAL" column of the desired annotation.
  • Move the mouse over NOT to select it, and then select the evidence for the NOT annotation.

Saving (exporting) your annotations

Annotations can be saved as a Gene Annotation Format (GAF) file.

  • Go to File->Export in the menu.
  • Enter the file name (must end in .gaf), and click on the Export button.
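
An exported file can also be inspected outside PAINT. The sketch below assumes the file follows the tab-delimited GAF 2.0 layout (17 columns, with comment lines starting with "!"); whether PAINT writes exactly this version is an assumption you should check against your own export. The helper names read_gaf and is_negated are hypothetical, and the sample record is made up.

```python
import io

# Column names follow the GAF 2.0 layout (17 tab-separated fields).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]

def read_gaf(handle):
    """Return one dict per annotation line, skipping '!' comment lines."""
    rows = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        # Pad short lines (for example 15-column GAF 1.0) so every key exists.
        fields += [""] * (len(GAF_COLUMNS) - len(fields))
        rows.append(dict(zip(GAF_COLUMNS, fields)))
    return rows

def is_negated(row):
    """True if the qualifier column contains NOT (qualifiers are '|'-separated)."""
    return "NOT" in row["qualifier"].split("|")

# Made-up record for illustration only; this is not real PAINT output.
sample_fields = [
    "EXAMPLEDB", "X0001", "abc1", "NOT", "GO:0000000", "PMID:0",
    "IDA", "", "F", "", "", "protein", "taxon:0", "20101207",
    "EXAMPLEDB", "", "",
]
sample = "!gaf-version: 2.0\n" + "\t".join(sample_fields) + "\n"

annotations = read_gaf(io.StringIO(sample))
for row in annotations:
    print(row["db_object_id"], row["go_id"], row["evidence_code"], is_negated(row))
```

To read a real export, replace the io.StringIO(sample) argument with an open file handle, for example open("my_family.gaf"), where the file name is whatever you entered in the Export dialog.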

Interpreting the PANTHER trees

Speciation and duplication events

In the tree, a speciation node is shown with a circle, and a gene duplication node with a square.

Branch lengths

  • Branch lengths show the amount of sequence divergence that has occurred between a given node and its ancestral node, in terms of the average number of amino acid substitutions per site. Shorter branches indicate less sequence divergence and therefore greater conservation of ancestral characters. A branch might be shorter because of a slower evolutionary rate (greater negative selection), or because less "time" has gone by (actually a combination of number of generations and population dynamics), or both.
  • Very long branches indicate an unreliable divergence estimate, due to insufficient data. Note that sometimes there is not enough data to compare all branches that descend from a given node. In this case, we have set all descendant branches to a length of 2.0 (very long branches). Branch lengths of 2.0 are often due to a sequence fragment, and at a duplication node it may also indicate a gene that has been incorrectly broken into two different genes by a gene prediction program.
  • Following a gene duplication (after a square node), the relative lengths of the descendant branches are particularly useful: the shortest (least diverged) branch is more likely to have greater functional conservation.
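
As a rough illustration of how these rules of thumb combine, the following sketch picks the least diverged branch below a duplication node while setting aside branches whose length is 2.0, the value described above as a placeholder for an unreliable estimate. The function name, the input layout and the numbers are invented for the example; PAINT does not expose this as a function.

```python
UNRELIABLE_LENGTH = 2.0  # value the guide says is assigned when data are insufficient

def least_diverged_child(child_lengths):
    """Given {child name: branch length} for the branches below a duplication
    node, return the child with the shortest reliable branch, or None if
    every branch carries the 2.0 placeholder (or longer)."""
    reliable = {
        name: length
        for name, length in child_lengths.items()
        if length < UNRELIABLE_LENGTH
    }
    if not reliable:
        return None
    return min(reliable, key=reliable.get)

# Hypothetical branch lengths, in amino acid substitutions per site.
print(least_diverged_child({"copyA": 0.12, "copyB": 0.85}))
print(least_diverged_child({"copyA": 2.0, "copyB": 2.0}))
```

A shorter branch only makes functional conservation more likely; it is a hint for curation, not a decision rule.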

Multiple sequence alignment (MSA)

  • Some columns in the MSA have upper-case characters (and dashes '-' for insertions/deletions). These columns were used to estimate the phylogenetic tree.
  • Lower-case characters and periods (‘.’ for insertions/deletions) denote positions that were ignored when estimating the phylogenetic tree. Sometimes, tree errors arise because not enough columns were used, and the phylogeny could not be reconstructed well based on the included columns. Since they were not used in the phylogeny, lower-case characters can be particularly helpful in verifying the tree topology: any conserved insertions should be parsimoniously traceable to a common ancestor.
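
The case convention described above makes it straightforward to separate the two kinds of positions in an aligned sequence. The sketch below is a minimal example that assumes each aligned row is available as a plain string; the function name and the sample row are made up.

```python
def split_alignment_row(row):
    """Split one aligned sequence into the positions used to estimate the
    tree (upper-case letters and '-') and the positions that were ignored
    (lower-case letters and '.')."""
    used = "".join(ch for ch in row if ch.isupper() or ch == "-")
    ignored = "".join(ch for ch in row if ch.islower() or ch == ".")
    return used, ignored

# Made-up aligned row for illustration.
used, ignored = split_alignment_row("MK-lv..TA")
print(used)     # positions that contributed to the tree
print(ignored)  # positions left out of the tree estimate
```

Comparing the ignored strings across the descendants of a node is one way to look for conserved insertions when checking the tree topology.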

Known bugs

The tree building program has some known bugs that are being fixed. Most often, errors in the trees are due to problems with the sequence alignment, or with the specific MSA columns used to estimate the phylogeny. The program handles sequence fragments fairly robustly, but fragments can still cause errors. Another source of error is sequences that evolve very slowly, which generate little variation from which to estimate the phylogeny. In this case, the errors can usually be fixed by including additional alignment positions in the phylogeny estimate.

Reporting bugs or likely errors in the trees

Please email Paul directly at pdthomas@usc.edu.