PAINT User Guide

Revision as of 18:53, 27 February 2012

Summary

PAINT is a Java application for viewing and annotating phylogenetic trees. This document briefly describes how to set up and use the tool.

Note: PAINT is currently available as a beta release. Consequently, some bugs persist. Please be flexible and patient.

Requirements

Java 1.6 (aka Java 6 on a Macintosh) must be installed.

Installing and configuring PAINT

PAINT is a Java application, and can be run on a wide variety of platforms. To install PAINT on either Mac or Windows, follow these steps:

  • Download the PAINT application at:

http://sourceforge.net/projects/pantherdb/files/PAINT/

  • Configuring
    • Note: Depending on your location, it may be necessary to modify or replace the hibernate.cfg.xml file to use different GO mirrors to speed up PAINT. Choose the closest mirror from Berkeley (default, suitable for US West Coast users), Princeton, or EBI.
    • Edit paint/config/hibernate.cfg.xml
    • Select the database closest to you and uncomment its entry if necessary (along with the two lines for the associated user name and password)
      • Berkeley is on spoon.lbl.gov (formerly sin.lbl.gov - changed Feb 11, 2011)
      • EBI is on mysql.ebi.ac.uk
      • Princeton is on gomirror.princeton.edu
    • Comment out the unused database options by enclosing them in XML comment markers:
<!--  your commented out stuff goes inside here -->
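
After editing, it can be useful to confirm which mirror is actually active. The following is a minimal Python sketch, not part of PAINT: it parses a trimmed-down, hypothetical stand-in for hibernate.cfg.xml and lists the connection URLs that are not commented out. The property names follow common Hibernate usage, and the database name `go_db` and the credentials are placeholders; check your own file for the real values.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for paint/config/hibernate.cfg.xml.
# The database name "go_db" and the credentials are placeholders,
# not PAINT's real values.
SAMPLE_CONFIG = """\
<hibernate-configuration>
  <session-factory>
    <!--
    <property name="connection.url">jdbc:mysql://spoon.lbl.gov/go_db</property>
    <property name="connection.username">user</property>
    <property name="connection.password">secret</property>
    -->
    <property name="connection.url">jdbc:mysql://mysql.ebi.ac.uk/go_db</property>
    <property name="connection.username">user</property>
    <property name="connection.password">secret</property>
  </session-factory>
</hibernate-configuration>
"""


def active_connection_urls(config_text):
    """Return the connection URLs that are NOT commented out."""
    root = ET.fromstring(config_text)
    # ElementTree discards XML comments by default, so commented-out
    # properties never appear in this iteration.
    return [
        (prop.text or "").strip()
        for prop in root.iter("property")
        if (prop.get("name") or "").endswith("connection.url")
    ]


urls = active_connection_urls(SAMPLE_CONFIG)
print(urls)
```

If the list contains more than one URL, more than one database block is still uncommented.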

Launching PAINT

    • On a Windows machine, run the program launchPAINT.bat.
    • On a Mac, open a Unix terminal window, go to the directory containing the PAINT program, and execute the command:
sh launchPAINT.sh     OR      ./launchPAINT.sh

Using PAINT

Logging in

In the tree viewer, go to File → Login in the menu. User id and password will be filled in already. Click OK.

Loading a PANTHER family tree and its GO annotations

Once you’re logged in, you can search for a tree by going to File → Open from database. If you know the PANTHER family ID of the protein(s) you are interested in, you can either select it from the drop-down menu or type it directly into the lower dialog box. Otherwise, you can search for the gene symbol, gene identifier, protein identifier, or description of your gene of interest in the upper dialog box. Note that:

  1. Wildcard characters won't work, though for some of the searches you can enter partial names with no wildcard characters.
  2. The search is case-sensitive. Keep in mind that gene name case varies from species to species - which may sometimes help in queries.
  3. You will have to select which type of search you would like using the radio buttons.
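
The first two notes can be illustrated with a small Python sketch. It is not PAINT's search code; it assumes that a "partial name" means a plain substring match, and the gene symbols are only illustrative examples of species-specific casing.

```python
def matches(query, name):
    """Case-sensitive partial match with no wildcard support (assumed behavior)."""
    # '*' and '?' are treated as literal characters, not as wildcards.
    return query in name


# Example symbols written in different species' casing conventions.
names = ["Pax6", "PAX6", "pax6a"]

print([n for n in names if matches("PAX", n)])    # matches the upper-case symbol only
print([n for n in names if matches("PAX*", n)])   # the '*' is literal, so nothing matches
```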

A list of families will appear in the “Search Results” box. Double-click on the family id to open that family in PAINT.

Opening the family may take several minutes.

In the future, you will be able to "lock" the book to prevent others from having write access during curation, but for now this is disabled. Right now you can view a family by clicking on the family identifier in the search results box.

Appearance and Basic Operation

Windows

PAINT is organized into tabbed panes that may be resized or split out into windows.

  • The top left pane shows a phylogenetic tree.
  • The top right pane allows you to switch back and forth between (i) the Annotation Matrix; (ii) the Protein Information Table and (iii) a multiple sequence alignment (MSA) of all sequences.
  • Columns in the Protein Information Table can be resized.
  • The bottom pane contains two tabs: Annotations and Evidence.
  • All panes can be minimized and docked/undocked.
  • Click on a tab to bring it to the front.
  • Click the icons in the tabs or the upper right corner to Undock/Dock, Minimize, Maximize, or close individual tabs or groups of tabs.
  • Tabs and panes may also be rearranged within a window by dragging.
  • Windows may be closed, arranged, or resized in the standard ways.

Phylogenetic Tree

Proteins are arranged in a phylogenetic tree showing their evolutionary relationships. The tree can be rescaled if the branches are too long for comfortable viewing or too short to distinguish individual nodes. Select Tree → Scale... and enter a different number. For most trees, we find 50 works well, but it depends on the tree, i.e., on the closeness of the sequences in the tree.

The root and internal nodes of the tree are shown as circles (speciation events) and squares (gene duplication events). The nodes are numbered in a defined order starting with the root, AN0. (“AN” = “Ancestor.”) Mouse over a node to see its identifier. If you right-click on a node, a menu will appear with the options to “Collapse or expand node” to hide/show its descendants or “Reroot to node” to focus only on its descendants. You can repeat the reroot operation as many times as necessary; however, the only way to go back up the tree is via the PAINT menu: Tree → Reset Root to Main.

Proteins with experimental annotations (IDA, EXP, IMP, IGI, IPI, or IEP evidence codes) for a particular ontology are colored and shown in boldface. You may select one ontology at a time to examine using the radio buttons at the top of the window. You may change the color used to indicate annotations using Edit → Curation status colors… . Descriptions here refer to the default colors.

Navigating within the tree

  • Click on a protein name in the tree to highlight the protein in the tree and the table.
  • Left-click on a node in the tree to highlight the entire clade descended from it.

Protein Information table

The phylogenetic tree is aligned with a protein information table showing additional information and linkouts to various databases. You can adjust the relative sizes of each within the window by dragging the line in the partition separating them. Note that the identifier table contains a lot of information that can be observed by scrolling to the right.

Multiple sequence alignment (MSA)

The trees were estimated from an MSA. Toggle back and forth between the table view (“Protein Information”) and the MSA view (“MSA”) using the buttons above the table/MSA panel. Note: You can view the sequence of a hypothetical ancestral protein (node) by first collapsing the appropriate node.

Associations window

Click on a protein in the tree or table. Annotations associated with that protein appear in the Associations window. Click the term name to link out to AmiGO. Click the reference to link out to the reference at PubMed or the appropriate model organism database.

The ECO/QUAL (Evidence code and qualifier) column shows the evidence code supporting the annotation and icons indicating any qualifiers, such as NOT (a red circle with a white X), colocalizes_with, or contributes_to. The icon resembling a ball-and-stick molecular figure indicates an experimental annotation, as opposed to an inferred annotation (see below).

Annotation matrix

Note: The colors refer to the default colors in PAINT

The annotation matrix displays annotations associated with each protein in table format. Click on a protein in the tree and the corresponding row will be highlighted in the matrix. Each column indicates whether an annotation exists for a given term.

  • Experimental annotations are represented by red squares, while inferred annotations are represented by blue squares.
  • Black dots within the squares indicate that the annotation is directly to the term, white dots that the annotation is to a child term.
  • NOT annotations are indicated with the same icon as in the Associations window, a red circle with a white X.
  • Mouse over an annotation square to pop up the protein name and the term.
  • In the annotation matrix, click on a dotted square to highlight it; if the term is associated with the protein via experimental evidence (directly or via a child term; i.e., the red squares), every protein annotated to that term will also be highlighted in the Protein Information Table. (Inferred annotations are not highlighted.)
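
The color and dot rules above can be summarized in a short Python sketch. This is an illustration of the display rules under the default colors, not PAINT's code; the function name and the term labels are hypothetical, and the function assumes the annotated term is either the column's term or one of its child terms.

```python
# Experimental evidence codes as listed in the Phylogenetic Tree section.
EXPERIMENTAL_CODES = {"IDA", "EXP", "IMP", "IGI", "IPI", "IEP"}


def describe_cell(evidence_code, annotated_term, column_term):
    """Return (square_color, dot_color) for one annotation-matrix cell.

    Assumes annotated_term is column_term itself or a child of it.
    """
    square = "red" if evidence_code in EXPERIMENTAL_CODES else "blue"
    dot = "black" if annotated_term == column_term else "white"
    return square, dot


# Hypothetical term labels, for illustration only.
print(describe_cell("IDA", "TERM_X", "TERM_X"))        # direct experimental annotation
print(describe_cell("IAS", "TERM_X_CHILD", "TERM_X"))  # inferred annotation to a child term
```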

Technical note: The terms in the annotation matrix are arranged in a non-redundant first in-first out (FIFO) order, with all parent terms shown for each term used. A few very broad terms such as “protein binding” are not shown, even though they are listed in the associations table.

(Functionality removed:

If one protein is annotated to the selected term, its name and annotations will appear in the Associations window; if more than one protein is annotated to the term, the Associations window will indicate, in the upper left corner, which ancestral node is the most recent ancestor to all the annotated proteins.)

Evidence window

The evidence window is a text editor used to record notes on the curation process. It is pre-seeded with some boilerplate text.

“Find” function

The Find function (Edit → Find…) allows you to search for either a gene or a GO term. Select a gene or term search using the radio buttons. Searches are case-insensitive.

A gene search matches against any text in the identifier table. Scroll through the list of matches and click on a specific match to highlight it in the tree, table, and annotation matrix, and to display its annotations in the Associations window.

You may search GO terms using text, or you may use numbers to search for GO IDs.

Recommended configuration for curation

  • Bigger is better. Use as much of the monitor as you can afford. If you are using a laptop, you may wish to attach an external monitor.
  • We recommend that you use PAINT in a 4-window configuration with the Association, Evidence, and Annotation Matrix tabs undocked into separate windows and arranged comfortably on your screen.
  • Adjust the width of the window and the partition between the Tree and the Table until you are comfortable with them. (Note: due to a display bug, you may not be able to see the bottom row of the table if the Tree panel is actually wide enough to accommodate the entire tree; make it slightly too small.)
  • Undock the Annotation Matrix from the bottom pane. Make it as wide and tall as possible without obscuring the tree window. Adjust the horizontal divider to separate the top pane from the matrix itself until you can see enough of the names of the GO terms in the top pane. Resize and reposition the tree window to align each protein with the corresponding row in the matrix.
  • Undock the Evidence and Annotation tabs and find comfortable homes for them. You may wish to narrow the Annotation Matrix to make room.

Note: Creating a very large Annotation Matrix may slow your computer down; this will manifest as slow scrolling and delayed screen refreshes when resizing the matrix. Consider sizing it to utilize only the top pane with the GO terms.

Making an inference: Transferring annotations

An ancestral node in the tree can be annotated with any GO term that has been annotated to one (or more) of its descendants. These “inferred” annotations can then be propagated to the node's other descendants.

Annotating an ancestral node, and propagating to descendants by inheritance

  1. Click on and hold a GO term from the upper pane of the Annotation Matrix.
  2. Drag the term to the ancestral node you wish to annotate. When you mouse over it, the node will turn dark. Release the mouse button to annotate.
  3. The node is now annotated with that term using the evidence code “IDS” (“Inferred from Descendant Sequence”).
  4. All descendants of the node will now be annotated with that term using the evidence code “IAS” (“Inferred from Ancestral Sequence”). (Proteins and nodes already annotated with the term or one of its child terms will remain unchanged.)

Note: You may only annotate a node with a given GO term if AT LEAST ONE descendant has an annotation to that term or a child term. If a node does not turn dark (step 2), it cannot be annotated.
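
The inheritance rules described in this section can be sketched in Python as follows. This is a simplified model for illustration, not PAINT's implementation: the `Node` class, the `annotate_ancestor` function, and the term label `TERM_X` are all hypothetical, and child terms are passed in explicitly rather than looked up in the ontology.

```python
class Node:
    """A tree node (ancestral node or protein) with its GO annotations."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.annotations = {}  # GO term -> evidence code

    def descendants(self):
        for child in self.children:
            yield child
            yield from child.descendants()


def annotate_ancestor(node, term, child_terms=frozenset()):
    """Annotate node with term (IDS) and propagate to its descendants (IAS).

    Returns False if no descendant supports the annotation, mirroring a
    node that does not turn dark when the term is dragged over it.
    """
    accepted = {term} | set(child_terms)
    if not any(accepted & set(d.annotations) for d in node.descendants()):
        return False
    node.annotations.setdefault(term, "IDS")
    for d in node.descendants():
        # Descendants already annotated with the term or a child term stay unchanged.
        if not accepted & set(d.annotations):
            d.annotations[term] = "IAS"
    return True


protein_a = Node("A")
protein_a.annotations["TERM_X"] = "IDA"   # existing experimental annotation
protein_b = Node("B")
ancestor = Node("AN1", [protein_a, protein_b])

print(annotate_ancestor(ancestor, "TERM_X"))
print(ancestor.annotations, protein_a.annotations, protein_b.annotations)
```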

Removing an annotation

NOTE: This is not working in paint1.0_beta43

If you make a mistake or change your mind, you can remove an annotation.

  1. Click on the desired node. Nodes with annotations are colored orange.
  2. Go to the Associations window. To remove an annotation, click the trash can icon in the Undo column.

"NOT" annotations

Note: In the context of PAINT, adding the NOT modifier indicates a belief that the specified function was present in an ancestral protein and has been LOST in the indicated protein or clade. This is a special case of existing GO guidelines for NOT, which state that a NOT annotation may be made in situations where a particular function is expected but not observed.

You may add a NOT modifier to an existing annotation if you feel the evidence warrants it for one of the following reasons, listed here in order of decreasing strength:

  • There are experimental annotations indicating that a function has been lost from one or more proteins. (Evidence code IDS = Inferred from Descendant Sequences)
  • Specific residues have been mutated at, for example, an enzyme’s catalytic site, and a specified function is no longer possible. (Evidence code IMR = Inferred from Mutant Residues)
  • A protein or clade has evolved rapidly, losing the original function and gaining a new one. This may be visible as a long branch in the tree, but the meaning of “long” varies by context, and a visibly long branch is not strictly required. (Evidence code IRD = Inferred from Rapid Divergence)

To add the NOT qualifier:

  1. Select a node or protein from the tree. This may be either a directly annotated node or one of its children.
  2. In the ECO/QUAL column of the Associations window, click on the “IAS.” A popup menu will appear.
  3. Under “NOT,” select which evidence code justifies the NOT qualifier.
  4. All annotations to proteins and nodes descended from that node will have the NOT qualifier added.

You may remove a NOT qualifier by clicking the trash can in the Undo column. Note that you can only remove a qualifier from the specific node to which it was added.

Preventing propagation to a clade or protein

If you do not wish to allow a positive annotation to propagate into a particular clade AND you do not wish to make a statement as strong as a NOT annotation, you may block an annotation from propagating into the clade. Instead of selecting a choice from the NOT menu as above, choose “STOP.” In addition to the annotation no longer propagating downward, a small hash mark will appear near the node in the tree to indicate that the block exists. Note that a hash mark only indicates the existence of at least one block, not that every annotation through that node is blocked.

You may remove a STOP qualifier in the same way, and with the same restrictions and consequences, as a NOT qualifier.
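
The effect of NOT and STOP on propagation can be sketched in Python. This is an illustrative model only, not PAINT's code: the tree is a plain dictionary, the function name is hypothetical, and the sketch assumes that a STOP placed on a node removes the inherited annotation from that node as well as from its descendants.

```python
def propagate(tree, start, stops=(), nots=()):
    """Return {node: qualifier} for the descendants of start that inherit an annotation.

    tree:  dict mapping a node name to the list of its child names
    stops: nodes where propagation is blocked (assumed to include the node itself)
    nots:  nodes where a NOT qualifier was added; it applies to the whole clade
    """
    result = {}
    stack = [(child, "") for child in tree.get(start, [])]
    while stack:
        node, qualifier = stack.pop()
        if node in stops:
            continue  # nothing propagates into this clade
        if node in nots:
            qualifier = "NOT"
        result[node] = qualifier
        stack.extend((child, qualifier) for child in tree.get(node, []))
    return result


tree = {"AN0": ["AN1", "AN2"], "AN1": ["P1", "P2"], "AN2": ["P3", "P4"]}
inherited = propagate(tree, "AN0", stops={"AN2"}, nots={"P2"})
print(inherited)
```

In this example the clade under AN2 receives nothing, P2 carries the NOT qualifier, and AN1 and P1 inherit the plain annotation.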

Saving your annotations

To save your annotations to your hard drive in Gene Association File (GAF) format, select File → Save annotations. We recommend that, in the dialog box, you navigate to the Desktop and, since PAINT saves 8-9 files in one stroke, create a new folder in which to house these files. Follow the on-screen directions to complete the process.

Importing an existing GAF file

If you have quit PAINT and wish to resume curating a family, you may reload your annotations from the GAF file. Select File → Open from files… and navigate to the GAF file using the dialog box. Click on the GAF file that contains the annotations and follow the onscreen directions.
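
To inspect a saved file outside PAINT, a GAF file can be read as tab-separated text. The Python sketch below assumes the standard GAF column order (the first seven columns are the database, object ID, symbol, qualifier, GO ID, reference, and evidence code) and that comment lines start with "!". The sample line uses made-up identifiers, and the exact contents of the files PAINT writes are not verified here.

```python
import io

# Names for the first seven GAF columns, in standard GAF order.
GAF_FIELDS = ["db", "db_object_id", "db_object_symbol", "qualifier",
              "go_id", "reference", "evidence_code"]


def read_gaf(handle):
    """Parse GAF lines into dictionaries keyed by the first seven columns."""
    rows = []
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        rows.append(dict(zip(GAF_FIELDS, cols)))
    return rows


# Made-up example content; identifiers are placeholders.
sample = io.StringIO(
    "!gaf-version: 2.0\n"
    + "\t".join(["DB_X", "ID000", "SYMBOL1", "NOT", "GO:0000000", "REF:0", "IAS"])
    + "\n"
)
records = read_gaf(sample)
print(records[0]["qualifier"], records[0]["evidence_code"])
```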

Reporting bugs and feature requests

  • Please add bugs and feature requests to their respective trackers
  • And for good measure fire off a message to the mailing list to get our attention
    • pantherdb-paint-users @ lists.sourceforge.net

Other useful links

Interpreting the PANTHER trees

Speciation and duplication events

In the tree, a speciation node is shown with a circle, and a gene duplication node with a square.

Branch lengths

  • Branch lengths show the amount of sequence divergence that has occurred between a given node and its ancestral node, in terms of the average number of amino acid substitutions per site. Shorter branches indicate less sequence divergence and therefore greater conservation of ancestral characters. A branch might be shorter because of a slower evolutionary rate (greater negative selection), or because less "time" has gone by (actually a combination of number of generations and population dynamics), or both.
  • Very long branches indicate an unreliable divergence estimate, due to insufficient data. Note that sometimes there is not enough data to compare all branches that descend from a given node. In this case, we have set all descendant branches to a length of 2.0 (very long branches). Branch lengths of 2.0 are often due to a sequence fragment, and at a duplication node it may also indicate a gene that has been incorrectly broken into two different genes by a gene prediction program.
  • Following a gene duplication (after a square node), the relative branch lengths for descendant branches are particularly useful: the shortest branch (least diverged) is more likely to have greater functional conservation.
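
As a rough illustration of the reasoning above, the Python sketch below picks the least-diverged descendant branch after a duplication node while ignoring branches set to the 2.0 placeholder length. The function name and the branch names are hypothetical, and the result is only a hint for the curator, not a rule applied by PAINT.

```python
UNRELIABLE_LENGTH = 2.0  # length assigned when there is not enough data


def least_diverged(branches):
    """Return the name of the shortest reliable branch, or None if there is none.

    branches: dict mapping a descendant name to its branch length
              (average amino acid substitutions per site)
    """
    reliable = {name: length for name, length in branches.items()
                if length < UNRELIABLE_LENGTH}
    if not reliable:
        return None
    return min(reliable, key=reliable.get)


print(least_diverged({"copy_a": 0.12, "copy_b": 0.45}))
print(least_diverged({"copy_a": 2.0, "copy_b": 2.0}))
```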

Multiple sequence alignment (MSA)

  • Some columns in the MSA have upper-case characters (and dashes '-' for insertions/deletions). These columns were used to estimate the phylogenetic tree.
  • Lower-case characters and periods (‘.’ for insertions/deletions) denote positions that were ignored when estimating the phylogenetic tree. Sometimes, tree errors arise because not enough columns were used, and the phylogeny could not be reconstructed well based on the included columns. Since they were not used in the phylogeny, lower-case characters can be particularly helpful in verifying the tree topology: any conserved insertions should be parsimoniously traceable to a common ancestor.
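
The case convention can be applied programmatically when examining an exported alignment. The Python sketch below separates the characters of one aligned sequence into those used for tree estimation (upper case and '-') and those ignored (lower case and '.'). The function name and the example row are made up for illustration.

```python
def split_alignment_row(row):
    """Split one aligned sequence into (used, ignored) characters.

    used:    upper-case residues and '-' gaps (columns used to estimate the tree)
    ignored: lower-case residues and '.' gaps (columns not used)
    """
    used = "".join(c for c in row if c.isupper() or c == "-")
    ignored = "".join(c for c in row if c.islower() or c == ".")
    return used, ignored


used, ignored = split_alignment_row("MK-tl.ALV")
print(used, ignored)
```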

Known bugs

The tree building program has some known bugs that are being fixed. Most often, the errors are due to problems with the sequence alignment, or the specific MSA columns used to estimate the phylogeny. It performs fairly robust handling of sequence fragments, but sequence fragments still cause errors. Another source of error is when the sequences evolve very slowly, generating little variation from which to estimate phylogeny. In this case, the errors can usually be fixed by including additional alignment positions to consider in the phylogeny.

Bugs in version paint1.0_beta43

  • 'delete' does not work
  • When you select a node on the tree, not all descendant sequences get selected in the table view on the right panel.
  • Upon collapsing, the annotation matrix does not align correctly with the tree.

Reporting bugs or likely errors in the trees

Please email Paul directly at pdthomas@usc.edu.

Curation Guidelines

The curation guidelines have been published (Gaudet, Livestone, Lewis, Thomas, 2011) [1]


Please refer to this PDF file for curation guidelines: File:PAINT demo and curation guidelines 120710.pdf