PAINT User Guide

From GO Wiki

Revision as of 16:39, 13 December 2010

Summary

PAINT is a Java application for viewing and annotating phylogenetic trees. This document briefly describes how to set up and use the tool.

Note: PAINT is currently available as a beta release. Consequently, some bugs persist. Please be flexible and patient.

Requirements

Java 1.6 (aka Java 6 on a Macintosh) must be installed.

Installing PAINT

PAINT is a Java application, and can be run on a wide variety of platforms. To install PAINT on either Mac or Windows, follow these steps:

  • Download the PAINT application at:

http://sourceforge.net/project/showfiles.php?group_id=184610 OR http://sourceforge.net/projects/pantherdb/files/

Note: Depending on your location, it may be necessary to modify or replace the hibernate.cfg.xml file to use different GO mirrors to speed up PAINT. Choose the closest mirror from Berkeley, Princeton, or EBI.

  • On a Mac, open a Unix terminal window, go to the directory containing the PAINT program, and execute the command:
sh launchPAINT.sh

OR

./launchPAINT.sh
  • On a Windows machine, run the program launchPAINT.bat.

Using PAINT

Logging in

In the tree viewer, go to File->Login in the menu. Type in your user id and password. These may be filled in already.

Accessing a PANTHER family tree, and GO annotations for leaves

Once you’re logged in, you can search for a tree by going to File-->Open from database. If you know the PANTHER family ID of the protein(s) you are interested in, you can either select it from the drop-down menu or type it directly into the lower dialog box. Otherwise, you can search for the gene symbol, gene identifier, protein identifier, or description of your gene of interest in the upper dialog box. Note that:

  1. Wildcard characters won't work, though for some of the searches you can enter partial names with no wildcard characters.
  2. The search is (or may be) case-sensitive.
  3. You will have to select which type of search you would like using the radio buttons.

A list of families will appear in the “Search Results” box. Double-click on the family id to open that family in PAINT.

Opening the family may take several minutes.

In the future, you will be able to "lock" the book to prevent others from having write access during curation, but for now this is disabled. Right now you can view a family by clicking on the family identifier in the search results box.

Appearance and Basic Operation

Windows

PAINT is organized into tabbed panes that may be resized or split out into windows. Initially, the top pane shows a phylogenetic tree on the left and table on the right; you may switch back and forth between the table and a multiple sequence alignment (MSA) using the button above the table.

The bottom pane contains three tabs: Associations, Evidence, and Annotation Matrix. You may also see a minimized tab for Status. Click on a tab to bring it to the front. Click the icons in the tabs or the upper right corner to Undock/Dock, Minimize, Maximize, or close individual tabs or groups of tabs. Tabs and panes may also be rearranged within a window by dragging. Windows may be closed, arranged, or resized in the standard ways.

Phylogenetic tree and identifier table

Proteins are arranged in a phylogenetic tree showing their relationships; the tree is aligned with a table showing additional information and linkouts to various databases. You can adjust the relative sizes of each within the window by dragging the dot in the partition separating them. Click on the triangles at the top of the partition to expand or retract either of the panes. Note that the identifier table contains a lot of information that can be observed by scrolling to the right. Also, table columns can be resized.

Proteins with experimental annotations (IDA, EXP, IMP, IGI, IPI, or IEP evidence codes) for a particular ontology are colored and shown in boldface. You may select one ontology at a time to examine using the radio buttons at the top of the window. You may change the color used to indicate annotations using Edit → Curation status colors…. Descriptions here refer to the default colors.

The root and internal nodes of the tree are shown as circles (speciation events) and squares (gene duplication events). The nodes are numbered in a defined order starting with the root, AN0. (“AN” = “Ancestor.”) Mouse over a node to see its identifier. If you right-click on a node, a menu will appear with the options to “Collapse or expand node” to hide/show its descendants or “Reroot to node” to focus only on its descendants. As long as there is room, you may reroot downwards; however, the only option to go back up the tree is Tree → Reset Root to Main.

Click on a protein name in the tree to highlight the protein in the tree and the table. Clicking anywhere within a row in the table highlights the protein in the tree and the table; clicking on one of the blue linkouts will also open a link in your web browser. Left-click on a node in the tree to highlight the entire clade descended from it.

You may also want to rescale the tree if the branches are too long for comfortable viewing or too short to distinguish individual nodes. Select Tree->Scale... and enter a different number. For most trees, we find 50 works well, but it depends on the tree.

Viewing the multiple sequence alignment (MSA)

The trees were estimated from an MSA. Toggle back and forth between the table view (“Grid”) and the MSA view (“MSA”) using the buttons above the table/MSA panel. Note: You can view the sequence of a hypothetical ancestral protein (node) by first collapsing the appropriate node.

Associations window

Click on a protein in the tree or table. Annotations associated with that protein appear in the Associations window. Click the term name to link out to AmiGO. Click the reference to link out to the reference at PubMed or the appropriate model organism database.

The ECO/QUAL (Evidence code and qualifier) column shows the evidence code supporting the annotation and icons indicating any qualifiers, such as NOT (a red circle with a white X), colocalizes_with, or contributes_to. The icon resembling a ball-and-stick molecular figure indicates an experimental annotation, as opposed to an inferred annotation (see below).

Annotation matrix

The annotation matrix displays annotations associated with each protein in table format. Click on a protein in the tree and the corresponding row will be highlighted in the matrix. Each column indicates whether an annotation exists for a given term. Red squares indicate experimental annotations; blue squares indicate inferred annotations. A black dot within a square indicates that the annotation is directly to the term; a white dot indicates that the annotation is to a child term. NOT annotations are indicated with the same icon as in the Associations window, a red circle with a white X. Mouse over an annotation square to pop up the protein and the term.

In the upper pane, click on a term to highlight it; every protein annotated to that term will also be highlighted. If one protein is annotated to the selected term, its name and annotations will appear in the Associations window; if more than one protein is annotated to the term, the Associations window will indicate, in the upper left corner, which ancestral node is the most recent ancestor to all the annotated proteins.
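The "most recent ancestor" reported in the Associations window is the most recent common ancestor of the highlighted proteins. The following is a minimal Python sketch of that idea, not PAINT's actual implementation. The child-to-parent map `parent` and the gene names are hypothetical and chosen only for illustration.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def most_recent_common_ancestor(leaves, parent):
    """Return the deepest node that lies on every leaf-to-root path."""
    paths = [path_to_root(leaf, parent) for leaf in leaves]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking upward from the first leaf, the first shared node is the
    # most recent common ancestor.
    for node in paths[0]:
        if node in common:
            return node
    return None


# Hypothetical tree: AN0 is the root; AN1 and AN2 are internal nodes.
parent = {
    "AN1": "AN0", "AN2": "AN0",
    "geneA": "AN1", "geneB": "AN1",
    "geneC": "AN2",
}

print(most_recent_common_ancestor(["geneA", "geneB"], parent))
print(most_recent_common_ancestor(["geneA", "geneC"], parent))
```

In this made-up tree, geneA and geneB share the internal node AN1, while geneA and geneC only meet at the root.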

Technical note: The terms in the annotation matrix are arranged in a non-redundant first in-first out (FIFO) order, with all parent terms shown for each term used. A few very broad terms such as “protein binding” are not shown, even though they are listed in the associations table.

Evidence window

The evidence window is a text editor used to record notes on the curation process. It is pre-seeded with some boilerplate text.

“Find” function

The Find function (Edit → Find…) allows you to search for either a gene or a GO term. Select a gene or term search using the radio buttons. Searches are case-insensitive.

A gene search matches against any text in the identifier table. Scroll through the list of matches and click on a specific match to highlight it in the tree, table, and annotation matrix, and to display its annotations in the Associations window.

You may search GO terms using text, or you may use numbers to search for GO IDs.
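The search behavior described above can be sketched in Python as follows. This is an illustration only: it assumes substring matching, which this guide does not confirm, and the function names and the gene rows are hypothetical.

```python
def find_genes(query, identifier_rows):
    """Return rows in which any cell contains `query`, ignoring case."""
    needle = query.casefold()
    return [
        row for row in identifier_rows
        if any(needle in cell.casefold() for cell in row)
    ]


def find_terms(query, terms):
    """Search (go_id, name) pairs by ID digits if `query` is numeric, else by name."""
    needle = query.casefold()
    if needle.isdigit():
        return [term for term in terms if needle in term[0]]
    return [term for term in terms if needle in term[1].casefold()]


# Hypothetical identifier-table rows (symbol, protein ID, description).
rows = [
    ("GENE1", "P00001", "example kinase"),
    ("GENE2", "P00002", "example phosphatase"),
]
terms = [
    ("GO:0000001", "mitochondrion inheritance"),
    ("GO:0000002", "mitochondrial genome maintenance"),
]

print(find_genes("KINASE", rows))
print(find_terms("0000002", terms))
```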

Recommended configuration for curation

  • Bigger is better. Use as much of the monitor as you can afford. If you are using a laptop, you may wish to attach an external monitor.
  • We recommend that you use PAINT in a 4-window configuration with the Association, Evidence, and Annotation Matrix tabs undocked into separate windows and arranged comfortably on your screen.
  • Adjust the width of the window and the partition between the Tree and the Table until you are comfortable with them. (Note: due to a display bug, you may not be able to see the bottom row of the table if the Tree panel is actually wide enough to accommodate the entire tree; make it slightly too small.)
  • Undock the Annotation Matrix from the bottom pane. Make it as wide and tall as possible without obscuring the tree window. Adjust the horizontal divider to separate the top pane from the matrix itself until you can see enough of the names of the GO terms in the top pane. Resize and reposition the tree window to align each protein with the corresponding row in the matrix.
  • Undock the Evidence and Associations tabs and find comfortable places for them. You may wish to narrow the Annotation Matrix to make room.

Note: Creating a very large Annotation Matrix may slow your computer down; this will manifest as slow scrolling and delayed screen refreshes when resizing the matrix. Consider sizing it to utilize only the top pane with the GO terms.

Annotating the trees with GO terms

Ancestral nodes in the tree can be annotated with any GO term that has been experimentally determined in one (or more) of its descendants, and then these “inferred” annotations can be propagated to its other descendants.

Annotating an ancestral node, and propagating to descendants by inheritance

  • Select the GO term from the matrix view, by clicking and holding.
  • Drag to the ancestral node you wish to annotate. When the mouse is over the node, it will turn dark. Release the mouse button to annotate.
  • NOTE that you can only annotate a node if AT LEAST ONE descendant has that experimental annotation (or a more specific one), so if a node does not turn dark it cannot be annotated.
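The rule above (a node can be annotated only if at least one descendant has the experimental annotation, or a more specific one) can be sketched in Python as follows. This is a sketch under stated assumptions, not PAINT's code: the `children`, `leaf_annotations`, and `term_parents` structures and all term and gene names are hypothetical. The experimental evidence codes are the ones listed earlier in this guide.

```python
EXPERIMENTAL_CODES = {"IDA", "EXP", "IMP", "IGI", "IPI", "IEP"}


def term_and_ancestors(term, term_parents):
    """Return `term` plus every broader term reachable via `term_parents`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(term_parents.get(current, ()))
    return seen


def leaves_under(node, children):
    """Yield every leaf descended from `node` (or `node` itself if it is a leaf)."""
    kids = children.get(node, ())
    if not kids:
        yield node
        return
    for kid in kids:
        yield from leaves_under(kid, children)


def can_annotate(node, term, children, leaf_annotations, term_parents):
    """True if some leaf under `node` has an experimental annotation to
    `term` itself or to a more specific (child) term."""
    for leaf in leaves_under(node, children):
        for leaf_term, code in leaf_annotations.get(leaf, ()):
            if code not in EXPERIMENTAL_CODES:
                continue
            if term in term_and_ancestors(leaf_term, term_parents):
                return True
    return False


# Hypothetical data: "specific_term" is a child of "broad_term".
children = {"AN0": ["AN1", "geneC"], "AN1": ["geneA", "geneB"]}
term_parents = {"specific_term": ["broad_term"]}
leaf_annotations = {
    "geneA": [("specific_term", "IDA")],  # experimental
    "geneC": [("other_term", "IEA")],     # electronic, not experimental
}

print(can_annotate("AN1", "broad_term", children, leaf_annotations, term_parents))
print(can_annotate("AN0", "other_term", children, leaf_annotations, term_parents))
```

Here AN1 can take "broad_term" because geneA carries an experimental annotation to the more specific child term, whereas "other_term" is supported only by a non-experimental code and so cannot be used.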

Removing an annotation

  • Click on the desired node. Nodes with annotations are colored orange.
  • Go to the "annotations" panel (default position is at the bottom), find the annotation you wish to remove, and click on the trash can icon.

Annotating a descendant as having lost an ancestral function

  • Select the node by clicking on it.
  • Click on the "ECO/QUAL" column of the desired annotation.
  • Move the mouse over NOT to select it, and then select the evidence for the NOT annotation.

Saving (exporting) your annotations

Annotations can be saved as a Gene Annotation Format (GAF) file.

  • Go to File-->Export in the menu.
  • Enter the file name (must end in .gaf), and click on the Export button.

Importing an existing GAF file

Annotations that were saved as a Gene Annotation Format (GAF) file can be read back in, to add to or modify the existing annotations.

  • Open the appropriate tree (see "Accessing a PANTHER family tree, and GO annotations for leaves" above).
  • Go to File-->Import in the menu.
  • Select the GAF file that contains the annotations, and click the Import button.
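If you want to inspect a GAF file outside PAINT, note that GAF is a tab-delimited text format in which lines beginning with "!" are comments or header lines. The Python sketch below reads the first nine standard columns. It assumes the file follows the standard GAF column order (GAF 1.0 has 15 columns, GAF 2.0 has 17). The sample record is made-up placeholder data, and the field names are illustrative labels rather than anything defined by PAINT.

```python
GAF_FIELDS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence_code", "with_from", "aspect",
]


def read_gaf(lines):
    """Parse GAF lines into dicts holding the first nine standard columns."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        columns = line.split("\t")
        if len(columns) < 15:
            continue  # not a complete GAF row; ignore it in this sketch
        records.append(dict(zip(GAF_FIELDS, columns)))
    return records


# Made-up placeholder record with 15 tab-separated columns.
sample_row = "\t".join([
    "UniProtKB", "P00000", "EXAMPLE1", "", "GO:0000000", "PMID:0000000",
    "IDA", "", "F", "Example protein", "", "protein", "taxon:9606",
    "20101213", "EXAMPLE_DB",
])
sample_lines = ["!gaf-version: 1.0", sample_row]

for record in read_gaf(sample_lines):
    print(record["db_object_symbol"], record["go_id"], record["evidence_code"])
```

To read a real file, pass an open file handle instead of `sample_lines`, for example `read_gaf(open("my_annotations.gaf"))`, where the file name is hypothetical.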

Reporting bugs and feature requests

  • Please add bugs and feature requests to their respective trackers
  • And for good measure fire off a message to the mailing list to get our attention
    • pantherdb-paint-users @ lists.sourceforge.net

Interpreting the PANTHER trees

Speciation and duplication events

In the tree, a speciation node is shown with a circle, and a gene duplication node with a square.

Branch lengths

  • Branch lengths show the amount of sequence divergence that has occurred between a given node and its ancestral node, in terms of the average number of amino acid substitutions per site. Shorter branches indicate less sequence divergence and therefore greater conservation of ancestral characters. A branch might be shorter because of a slower evolutionary rate (greater negative selection), or because less "time" has gone by (actually a combination of number of generations and population dynamics), or both.
  • Very long branches indicate an unreliable divergence estimate, due to insufficient data. Note that sometimes there is not enough data to compare all branches that descend from a given node. In this case, we have set all descendant branches to a length of 2.0 (very long branches). Branch lengths of 2.0 are often due to a sequence fragment, and at a duplication node it may also indicate a gene that has been incorrectly broken into two different genes by a gene prediction program.
  • Following a gene duplication (after a square node), the relative branch lengths for descendant branches are particularly useful: the shortest branch (least diverged) is more likely to have greater functional conservation.
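The branch-length guidance above can be summarized with a short Python sketch: after a duplication, ignore branches carrying the 2.0 placeholder length and prefer the shortest remaining branch. The function name and the example lengths are hypothetical. Treating every length of 2.0 or more as unreliable is an assumption made for this illustration.

```python
UNRELIABLE_LENGTH = 2.0  # placeholder length assigned when data are insufficient


def least_diverged_child(branch_lengths):
    """Return the child with the shortest reliable branch, or None.

    `branch_lengths` maps each child of a duplication node to its branch
    length (average amino acid substitutions per site).
    """
    reliable = {
        child: length
        for child, length in branch_lengths.items()
        if length < UNRELIABLE_LENGTH
    }
    if not reliable:
        return None
    return min(reliable, key=reliable.get)


# Hypothetical duplication node with two descendant branches.
print(least_diverged_child({"paralogA": 0.12, "paralogB": 0.85}))
# All branches set to the 2.0 placeholder: no reliable comparison is possible.
print(least_diverged_child({"paralogC": 2.0, "paralogD": 2.0}))
```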

Multiple sequence alignment (MSA)

  • Some columns in the MSA have upper-case characters (and dashes '-' for insertions/deletions). These columns were used to estimate the phylogenetic tree.
  • Lower-case characters and periods (‘.’ for insertions/deletions) denote positions that were ignored when estimating the phylogenetic tree. Sometimes, tree errors arise because not enough columns were used, and the phylogeny could not be reconstructed well based on the included columns. Since they were not used in the phylogeny, lower-case characters can be particularly helpful in verifying the tree topology: any conserved insertions should be parsimoniously traceable to a common ancestor.
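The character conventions above can be illustrated with a small Python sketch that separates the positions used for tree estimation from the ignored ones. The aligned sequence shown is invented for this example.

```python
def tree_columns(aligned_sequence):
    """Keep positions used for tree estimation: upper-case residues and '-'."""
    return "".join(ch for ch in aligned_sequence if ch.isupper() or ch == "-")


def ignored_columns(aligned_sequence):
    """Keep positions ignored for tree estimation: lower-case residues and '.'."""
    return "".join(ch for ch in aligned_sequence if ch.islower() or ch == ".")


# Invented aligned sequence mixing both kinds of positions.
row = "MKt.LV-Aq."
print(tree_columns(row))     # MKLV-A
print(ignored_columns(row))  # t.q.
```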

Known bugs

The tree-building program has some known bugs that are being fixed. Most often, the errors are due to problems with the sequence alignment, or with the specific MSA columns used to estimate the phylogeny. The program handles sequence fragments fairly robustly, but fragments still cause errors. Another source of error is sequences that evolve very slowly, generating little variation from which to estimate phylogeny. In this case, the errors can usually be fixed by including additional alignment positions to consider in the phylogeny.

Reporting bugs or likely errors in the trees

Please email Paul directly at pdthomas@usc.edu.