PAINT User Guide

From GO Wiki

Summary

PAINT is a Java application for viewing and annotating phylogenetic trees. This document briefly describes how to set up and use the tool.

Note: PAINT is currently available as a beta release. Consequently, some bugs persist. Please be flexible and patient.

Requirements

Java 1.6 (aka Java 6 on a Macintosh) must be installed.
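
If you are unsure which Java version is installed, you can check the output of `java -version` (which Java prints to standard error). The sketch below is not part of PAINT. It assumes the output contains a quoted version string such as "1.6.0_65", and the function name is made up for illustration.

```python
import re

def java_version_at_least(version_output, minimum=(1, 6)):
    """Return True if the quoted version in `java -version` output is >= minimum."""
    # Capture the first one or two numeric components inside the quotes.
    match = re.search(r'"(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return False
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor) >= minimum

# Example with a typical first line of `java -version` output.
print(java_version_at_least('java version "1.6.0_65"'))
```

Newer releases report versions such as "17.0.2" and also pass this check. Whether PAINT runs correctly on them is not something this guide states.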

Installing PAINT

PAINT is a Java application, and can be run on a wide variety of platforms. To install PAINT on either Mac or Windows, follow these steps:

  • Download the PAINT application at:

http://sourceforge.net/project/showfiles.php?group_id=184610 OR http://sourceforge.net/projects/pantherdb/files/

Note: Depending on your location, it may be necessary to modify or replace the hibernate.cfg.xml file to use different GO mirrors to speed up PAINT. Choose the closest mirror from Berkeley, Princeton, or EBI.

  • On a Mac, open a Unix terminal window, go to the directory containing the PAINT program, and execute the command:
sh launchPAINT.sh

OR

./launchPAINT.sh
  • On a Windows machine, run the program launchPAINT.bat.

Using PAINT

Logging in

In the tree viewer, go to File->Login in the menu. Type in your user id and password. These may be filled in already.

Accessing a PANTHER family tree, and GO annotations for leaves

Once you’re logged in, you can search for a tree by going to File->Open from database. If you know the PANTHER family ID of the protein(s) you are interested in, you can either select it from the drop-down menu or type it directly into the lower dialog box. Otherwise, you can search by the gene symbol, gene identifier, protein identifier, or description of your gene of interest in the upper dialog box. Note that:

  1. Wildcard characters are not supported, though some of the searches accept partial names entered without wildcards.
  2. The search is (or may be) case-sensitive.
  3. You will have to select which type of search you would like using the radio buttons.
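
Taken together, the notes above describe plain substring matching. The sketch below only illustrates that behaviour. It is not PAINT's search code, and the family IDs and descriptions are invented for the example.

```python
def search_families(records, query, case_sensitive=True):
    """Return IDs of records whose description contains the query literally.

    No wildcard expansion is done, so '*' or '%' in the query is matched
    as an ordinary character.
    """
    if not case_sensitive:
        query = query.lower()
    hits = []
    for family_id, description in records:
        text = description if case_sensitive else description.lower()
        if query in text:
            hits.append(family_id)
    return hits

# Invented example records: (family ID, description).
records = [
    ("PTHR00001", "Tyrosine-protein kinase"),
    ("PTHR00002", "Histone deacetylase"),
]

print(search_families(records, "kinase"))   # partial name, matching case
print(search_families(records, "Kinase"))   # wrong case finds nothing
print(search_families(records, "kin*"))     # wildcard is taken literally
```

If a search returns nothing, try the exact capitalization used in the database and remove any wildcard characters.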

A list of families will appear in the “Search Results” box. Double-click on the family id to open that family in PAINT.

Opening the family may take several minutes.

In the future, you will be able to "lock" the book to prevent others from having write access during curation, but this feature is currently disabled. For now, you can view a family by double-clicking its identifier in the Search Results box, as described above.

Appearance and Basic Operation

Windows

PAINT is organized into tabbed panes that may be resized or split out into windows. Initially, the top pane shows a phylogenetic tree on the left and a table on the right; you may switch back and forth between the table and a multiple sequence alignment (MSA) using the button above the table.

The bottom pane contains three tabs: Associations, Evidence, and Annotation Matrix. You may also see a minimized tab for Status. Click on a tab to bring it to the front. Click the icons in the tabs or the upper right corner to Undock/Dock, Minimize, Maximize, or close individual tabs or groups of tabs. Tabs and panes may also be rearranged within a window by dragging. Windows may be closed, arranged, or resized in the standard ways.

Recommended configuration for curation

  • Bigger is better. Use as much of the monitor as you can afford. If you are using a laptop, you may wish to attach an external monitor.
  • We recommend that you use PAINT in a 4-window configuration with the Associations, Evidence, and Annotation Matrix tabs undocked into separate windows and arranged comfortably on your screen.
  • Adjust the width of the window and the partition between the Tree and the Table until you are comfortable with them. (Note: due to a display bug, you may not be able to see the bottom row of the table if the Tree panel is actually wide enough to accommodate the entire tree; make it slightly too small.)
  • Undock the Annotation Matrix from the bottom pane. Make it as wide and tall as possible without obscuring the tree window. Adjust the horizontal divider to separate the top pane from the matrix itself until you can see enough of the names of the GO terms in the top pane. Resize and reposition the tree window to align each protein with the corresponding row in the matrix.
  • Undock the Evidence and Associations tabs and find comfortable homes for them. You may wish to narrow the Annotation Matrix to make room.

Note: Creating very large Annotation Matrices may slow your computer down; this will manifest as slow scrolling and delayed screen refreshes when resizing the matrix. Consider sizing it to utilize only the top pane with the GO terms.

Resizing the panel and column widths

You can resize the tree panel by clicking and dragging the partition between the two panels. The right panel can be toggled between multiple sequence alignment ("MSA") and a table of information about each sequence in the tree ("Grid"). Table columns can also be resized.

Viewing the multiple sequence alignment (MSA)

The trees were estimated from an MSA. You can toggle back and forth between the table view (“Grid”) and the MSA view (“MSA”) using the buttons just below the menu, above the table/MSA panel. To view ancestral sequences, first collapse the appropriate node in the tree (right-click, or Apple-click (Command-click) on a Mac).

Rescaling the tree branch lengths

You may also want to rescale the trees if the branches look too long for comfortable viewing. Go to Tree->Scale... and enter a different number. For most trees we find 50 works well, but it depends on the tree.

Importing an existing GAF file

Gene Annotation Format (GAF) files can be imported to add new annotations or modify existing ones. Annotations that were previously saved as a GAF file can be read back in this way.

  • Open the appropriate tree (see "Accessing a PANTHER family tree, and GO annotations for leaves" above).
  • Go to File->Import in the menu.
  • Select the GAF file that contains the annotations, and click the Import button.
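
Before importing, it can help to inspect a GAF file outside PAINT. The sketch below assumes the standard GO Consortium GAF 2.0 layout: tab-separated lines with 17 columns, and comment lines starting with "!". The dictionary key names are our own, and the sample row is invented (its identifiers do not refer to real records).

```python
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]

def parse_gaf(lines):
    """Yield one dict per annotation line, skipping '!' comments and blanks."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        # GAF 1.0 has only 15 columns; pad so both versions map onto 17 names.
        fields += [""] * (len(GAF_COLUMNS) - len(fields))
        yield dict(zip(GAF_COLUMNS, fields))

# Invented sample: a header comment plus one annotation row.
sample = [
    "!gaf-version: 2.0",
    "\t".join([
        "UniProtKB", "P00000", "GENE1", "", "GO:0005524", "PMID:0000000",
        "IDA", "", "F", "", "", "protein", "taxon:9606", "20100101",
        "EXAMPLE", "", "",
    ]),
]

for row in parse_gaf(sample):
    print(row["db_object_id"], row["go_id"], row["evidence_code"])
```

To read a real file, pass an open file handle instead of the `sample` list.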

Annotating the trees with GO terms

Ancestral nodes in the tree can be annotated with any GO term that has been experimentally determined in one (or more) of its descendants, and then these “inferred” annotations can be propagated to its other descendants.

Annotating an ancestral node, and propagating to descendants by inheritance

  • Select the GO term in the matrix view by clicking and holding.
  • Drag to the ancestral node you wish to annotate. When the mouse is over the node, it will turn dark. Release the mouse button to annotate.
  • NOTE that you can only annotate a node if AT LEAST ONE descendant has that experimental annotation (or a more specific one), so if a node does not turn dark it cannot be annotated.
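
The rule in the note above can be sketched as a small check. This is an illustration, not PAINT's implementation. It assumes that experimental annotations sit on the leaves, that the tree is a mapping from node to children, and that "more specific" means the leaf's term has the dragged term among its ancestors. The evidence codes are the GO experimental codes; node names and term labels are invented.

```python
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

def leaves_under(tree, node):
    """Return all leaf names below `node` (or the node itself if it is a leaf)."""
    children = tree.get(node, [])
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(leaves_under(tree, child))
    return leaves

def can_annotate(tree, node, term, leaf_annotations, term_ancestors):
    """True if some leaf under `node` has experimental evidence for `term`
    or for a more specific term (one that has `term` among its ancestors)."""
    for leaf in leaves_under(tree, node):
        for leaf_term, code in leaf_annotations.get(leaf, []):
            if code not in EXPERIMENTAL_CODES:
                continue
            if leaf_term == term or term in term_ancestors.get(leaf_term, set()):
                return True
    return False

# Invented example: root -> (dup1 -> A, B) and C.
tree = {"root": ["dup1", "C"], "dup1": ["A", "B"]}
leaf_annotations = {
    "A": [("specific_term", "IDA")],   # experimental
    "B": [("specific_term", "IEA")],   # electronic, does not count
}
term_ancestors = {"specific_term": {"general_term"}}

print(can_annotate(tree, "dup1", "general_term", leaf_annotations, term_ancestors))
print(can_annotate(tree, "C", "specific_term", leaf_annotations, term_ancestors))
```

In this example, "dup1" can take the general term because leaf A carries experimental evidence for a more specific one, while "C" has no qualifying descendant and so would not turn dark.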

Removing an annotation

  • Click on the desired node. Nodes with annotations are colored orange.
  • Go to the "annotations" panel (default position is at the bottom), find the annotation you wish to remove, and click on the trash can icon.

Annotating a descendant as having lost an ancestral function

  • Select the node by clicking on it.
  • Click on the "ECO/QUAL" column of the desired annotation.
  • Hover the mouse over NOT to select it, and then select the evidence for the NOT annotation.
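
One way to picture the effect of a NOT annotation is as a stop sign for inheritance. The sketch below assumes that a node marked NOT, and everything below it, no longer inherits the ancestral annotation. That is our reading of "lost an ancestral function", not a description of PAINT's internals, and the tree and node names are invented.

```python
def propagate(tree, annotated_node, not_nodes):
    """Return the leaves that inherit an annotation made at `annotated_node`.

    Descent stops at any node in `not_nodes`, so that node and all of its
    descendants are treated as having lost the function.
    """
    inherited = []
    stack = [annotated_node]
    while stack:
        node = stack.pop()
        if node in not_nodes:
            continue
        children = tree.get(node, [])
        if not children:
            inherited.append(node)
        else:
            stack.extend(children)
    return sorted(inherited)

# Invented example: root -> (dup1 -> A, B) and C.
tree = {"root": ["dup1", "C"], "dup1": ["A", "B"]}

print(propagate(tree, "root", set()))      # every leaf inherits
print(propagate(tree, "root", {"B"}))      # B has lost the function
print(propagate(tree, "root", {"dup1"}))   # the whole dup1 clade has lost it
```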

Saving (exporting) your annotations

Annotations can be saved as a Gene Annotation Format (GAF) file.

  • Go to File->Export in the menu.
  • Enter the file name (must end in .gaf), and click on the Export button.
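
If you post-process exported annotations with your own scripts, the output side can be sketched as follows. This is a hypothetical helper, not what PAINT does internally. It assumes 17 tab-separated columns per row and a "!gaf-version: 2.0" header line, and it mirrors the .gaf file name requirement from the step above. The sample row is invented.

```python
import io

def check_gaf_name(name):
    """Return True if the file name ends in .gaf, as the Export dialog requires."""
    return name.endswith(".gaf")

def write_gaf(handle, annotations):
    """Write annotation rows (lists of 17 strings) to an open text handle."""
    handle.write("!gaf-version: 2.0\n")
    for fields in annotations:
        if len(fields) != 17:
            raise ValueError("expected 17 columns, got %d" % len(fields))
        handle.write("\t".join(fields) + "\n")

# Invented row; an in-memory buffer stands in for a real file.
row = ["UniProtKB", "P00000", "GENE1", "", "GO:0005524", "PMID:0000000",
       "IDA", "", "F", "", "", "protein", "taxon:9606", "20100101",
       "EXAMPLE", "", ""]
buffer = io.StringIO()
write_gaf(buffer, [row])
print(buffer.getvalue())
```

To write to disk instead, open a file whose name passes `check_gaf_name` and pass the handle to `write_gaf`.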

Reporting bugs and feature requests

  • Please add bugs and feature requests to their respective trackers.
  • For good measure, also send a message to the mailing list to get our attention:
    • pantherdb-paint-users @ lists.sourceforge.net

Interpreting the PANTHER trees

Speciation and duplication events

In the tree, a speciation node is shown with a circle, and a gene duplication node with a square.

Branch lengths

  • Branch lengths show the amount of sequence divergence that has occurred between a given node and its ancestral node, in terms of the average number of amino acid substitutions per site. Shorter branches indicate less sequence divergence and therefore greater conservation of ancestral characters. A branch might be shorter because of a slower evolutionary rate (greater negative selection), or because less "time" has gone by (actually a combination of number of generations and population dynamics), or both.
  • Very long branches indicate an unreliable divergence estimate, due to insufficient data. Note that sometimes there is not enough data to compare all branches that descend from a given node. In this case, we have set all descendant branches to a length of 2.0 (very long branches). Branch lengths of 2.0 are often due to a sequence fragment, and at a duplication node it may also indicate a gene that has been incorrectly broken into two different genes by a gene prediction program.
  • Following a gene duplication (after a square node), the relative branch lengths for descendant branches are particularly useful: the shortest branch (least diverged) is more likely to have greater functional conservation.
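
The branch-length guidance above can be turned into a small helper for comparing the branches below a duplication node. This is a sketch under stated assumptions: branch lengths are given as a mapping from child name to length, and any branch at or above the 2.0 placeholder value is set aside as unreliable. The function and variable names are invented.

```python
UNRELIABLE_LENGTH = 2.0  # length assigned when there is too little data to compare

def least_diverged_child(branch_lengths):
    """Given {child_name: branch_length} below a duplication node, return the
    child with the shortest reliable branch, or None if none is reliable."""
    reliable = {name: length for name, length in branch_lengths.items()
                if length < UNRELIABLE_LENGTH}
    if not reliable:
        return None
    return min(reliable, key=reliable.get)

# Invented branch lengths (substitutions per site).
print(least_diverged_child({"copy1": 0.12, "copy2": 0.45}))
print(least_diverged_child({"copy1": 2.0, "copy2": 0.45}))
```

The result is only a hint about which copy is more likely to have conserved the ancestral function. It does not replace checking the experimental annotations themselves.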

Multiple sequence alignment (MSA)

  • Some columns in the MSA have upper-case characters (and dashes '-' for insertions/deletions). These columns were used to estimate the phylogenetic tree.
  • Lower-case characters and periods (‘.’ for insertions/deletions) denote positions that were ignored when estimating the phylogenetic tree. Sometimes, tree errors arise because not enough columns were used, and the phylogeny could not be reconstructed well based on the included columns. Since they were not used in the phylogeny, lower-case characters can be particularly helpful in verifying the tree topology: any conserved insertions should be parsimoniously traceable to a common ancestor.
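
The two character conventions above make it easy to separate tree-building columns from ignored ones in a single aligned sequence. The sketch below assumes the alignment row is available as a plain string. The example sequence is invented.

```python
def tree_columns(aligned_sequence):
    """Keep characters from columns used to estimate the tree:
    upper-case residues and '-' gaps."""
    return "".join(ch for ch in aligned_sequence if ch.isupper() or ch == "-")

def ignored_columns(aligned_sequence):
    """Keep characters from columns ignored when estimating the tree:
    lower-case residues and '.' gaps."""
    return "".join(ch for ch in aligned_sequence if ch.islower() or ch == ".")

row = "MKt.LV-aa.R"  # invented alignment row
print(tree_columns(row))
print(ignored_columns(row))
```

Comparing the ignored characters across sequences is one way to look for the conserved insertions mentioned above.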

Known bugs

The tree-building program has some known bugs that are being fixed. Most often, the errors are due to problems with the sequence alignment or with the specific MSA columns used to estimate the phylogeny. The program handles sequence fragments fairly robustly, but fragments still cause some errors. Another source of error is sequences that evolve very slowly, generating little variation from which to estimate the phylogeny. In this case, the errors can usually be fixed by including additional alignment positions when estimating the phylogeny.

Reporting bugs or likely errors in the trees

Please email Paul directly at pdthomas@usc.edu.