PAINT User Guide: Difference between revisions

From GO Wiki
(48 intermediate revisions by 5 users not shown)
Line 1: Line 1:
[[Category:PAINT]]
=Summary=
=Summary=
[[PAINT]] is a Java application for viewing and annotating phylogenetic trees.  This document briefly describes how to set up and use the tool.
[[PAINT]] is a Java application for viewing and annotating phylogenetic trees.  This document briefly describes how to set up and use the tool.
'''Note''': PAINT is currently available as a beta release.  Consequently, some bugs persist.  Please be flexible and patient.


[[File:Background slides on family and function evolution.pdf]]
[[File:Background slides on family and function evolution.pdf]]


=Requirements=
=Requirements=
Java 1.6 (aka Java 6 on a Macintosh) must be installed.
Java 1.8 (aka Java 8 on a Macintosh) must be installed.


=Installing and configuring PAINT=
=Installing and configuring PAINT=
PAINT is a Java application, and can be run on a wide variety of platforms.  To install PAINT on either Mac or Windows, follow these steps:
PAINT is a Java application, and can be run on a wide variety of platforms.  To install PAINT on either Mac or Windows, follow these steps:
* Download the PAINT application at:
* Download the PAINT application at:
http://sourceforge.net/projects/pantherdb/files/PAINT/
http://paintcuration.usc.edu/
 
Select the latest version to download.
* Configuring
** '''Note''': Depending on your location, it may be necessary to modify or replace the hibernate.cfg.xml file to use different GO mirrors to speed up PAINT.  Choose the closest mirror from Berkeley (default, suitable for US West Coast users), Princeton, or EBI.
** Edit paint/config/hibernate.cfg.xml
** Select your closest database and uncomment if necessary (and two lines for the associated user name and password)
*** Berkeley is on spoon.lbl.gov '''(formerly sin.lbl.gov - changed Feb 11, 2011)'''
*** EBI is on mysql.ebi.ac.uk
*** Princeton is on gomirror.princeton.edu
** Make unused database options into comments
using brackets of
<nowiki><!--  your commented out stuff goes inside here --></nowiki>


=Launching PAINT=
=Launching PAINT=
Line 33: Line 20:
=Using PAINT=
=Using PAINT=


==Loading a PANTHER family tree and its GO annotations==
==Login==
You are required to log in before you can open a tree. The purpose is to record proper acknowledgement for all the curated annotations (of tree nodes) created by you.


There are two ways to load a PANTHER family tree
Go to File -> Login.


===Open a new tree===
If you just want to view the tree and annotations, you can enter 'gouser' as the username. The password is filled already. This is a read-only login.


Once you launched the PAINT tool, you can search for a tree in the PANTHER database by going to ''Annotate -> New''.
If you want to curate the tree, enter your username and password. If you don’t have a login and password, send an email to huaiyumi@usc.edu and request one.
If you know the PANTHER family ID of the protein(s) you are interested in, you can select it from the drop-down menu or type it directly into the lower dialog box.  Otherwise, you can search for the gene symbol, gene identifier, protein identifier, or description of your gene of interest in the upper dialog box. Note that:
#Wildcard characters won't work, though for some of the searches you can enter partial names with no wildcard characters.
#The search is case-sensitive. Keep in mind that gene name case varies from species to species - which may sometimes help in queries.
#You will have to select which type of search you would like using the radio buttons.
A list of families will appear in the “Search Results” box.  Double-click on the family id to open that family in PAINT.


Opening the family may take several minutes.
==Curating a gene family==
 
The analogy is to a library. You will first find and check out (lock) the families you want to curate, and then select a family to curate from your list of locked families. All families now have a curation status (curated, partially curated, uncurated).
In the future, you will be able to "lock" the book to prevent others from having write access during curation, but for now this is disabled. Right now you can view a family by clicking on the family identifier in the search results box.
 
===Open a previous curated tree===
To open a previously curated tree located on your hard drive, use ''Annotate -> Open'', and navigate to the folder that contains the GAF file. Open the tree with the ''.paint'' file.


===Step 1: Find and "lock" the families you would like to curate===
When you lock a family, other curators won’t be able to curate it. This prevents two people from working on the same family at the same time.
* Go to File -> Manage and View Books...
[[File:PAINT_search.png|thumb|Fig. 1 PAINT family search box|400px]]
**A window, as shown in Figure 1, will pop up. You can search for families in the following ways:
***Search by keywords in specified fields, such as Gene Symbol, Protein Identifier, Gene Identifier, or gene definition.
***Enter a specific PTHR ID
***Retrieve a list of all families, or just the uncurated families.
**Press the "submit" button to launch search
[[File:PAINT_family_search_results.png|thumb|Fig. 2 PAINT family search results|400px]]
* Select books to lock. Figure 2 shows an example in which all uncurated families are returned. There are 5 possible curation statuses:
**Manually curated – These families have been curated, and the curator believes that the curation is complete.
**Locked – These families are locked by a curator. The name of the curator who locked the family is shown in the "Locked by" column.
**Partially curated – These families have been curated only in part. The curator can unlock the family and leave it as partially curated.
**Require paint review – The previously curated PAINT annotations have changed because of updates in either PANTHER or GO.
**Unknown – These are uncurated families.
*Check the box in the “Lock” column of the families you want to check out, and click the “Lock or Unlock selected Books” button at the bottom of the panel.


===Step 2: Open a family to curate===
You can only curate families you have locked.
[[File:PAINT_family_opening.png|thumb|Fig. 3 Opening a previously locked family.|400px]]
*To open a family, click “View Locked Books”, and then click the “View” button (Figure 3).


===Step 3: Save your annotations===
You can choose to save but keep the family locked so you can continue the curation later. You can also save and unlock the family.
* Go to File -> Save to Database. A window will pop up with the following options.
**Cancel
**Save and unlock – The family will be unlocked and marked as Partially Curated.
**Save – The family will remain locked. The curator should do this as often as possible during the curation.
**Save, unlock & set curated – The family will be marked as Manually Curated.


---------
---------
Line 208: Line 215:


You may add a NOT modifier to an existing annotation if you feel the evidence warrants it for one of the following reasons, listed here in order of decreasing strength:
You may add a NOT modifier to an existing annotation if you feel the evidence warrants it for one of the following reasons, listed here in order of decreasing strength:
* There are experimental annotations indicating that a function has been lost from one or more proteins.  (Evidence code IDS = Inferred from Descendant Sequences)
* There are experimental annotations indicating that a function has been lost from one or more proteins.  (Evidence code IBA = Inferred from Biological Ancestor)
* Specific residues have been mutated at, for example, an enzyme’s catalytic site, and a specified function is no longer possible.  (Evidence code IMR = Inferred from Mutant Residues)
* Specific residues have been mutated at, for example, an enzyme’s catalytic site, and a specified function is no longer possible.  (Evidence code IKR = Inferred from Known Residues)
* A protein or clade has evolved rapidly, losing the original function and gaining a new one.  This may be visible as a long branch in the tree, but the meaning of “long” varies by context, and a visibly long branch is not strictly required.  (Evidence code IRD = Inferred from Rapid Divergence)
* A protein or clade has evolved rapidly, losing the original function and gaining a new one.  This may be visible as a long branch in the tree, but the meaning of “long” varies by context, and a visibly long branch is not strictly required.  (Evidence code IRD = Inferred from Rapid Divergence)
To add the NOT qualifier:
To add the NOT qualifier:
Line 215: Line 222:
# In the ECO/QUAL column of the Associations window, click on the “IAS.”  A popup menu will appear.
# In the ECO/QUAL column of the Associations window, click on the “IAS.”  A popup menu will appear.
# Under “NOT,” select which evidence code justifies the NOT qualifier.
# Under “NOT,” select which evidence code justifies the NOT qualifier.
# All annotations to proteins and nodes descended from that node will have the NOT qualifier added.
# If the evidence code is IBA or IKR, all annotations to proteins and nodes descended from that node will have the NOT qualifier added (as these codes represent good evidence for loss).
# If the evidence code is IRD, descendant sequences will not be annotated with the NOT qualifier, but the ancestral annotation will not be propagated to the descendants.  Thus this acts like a STOP PROPAGATION.
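
The two propagation rules above can be sketched in Python. This is only an illustrative model of the behaviour described in this guide, not PAINT's own code: the <code>Node</code> class, the <code>apply_not</code> function and the way annotations are stored are all hypothetical.

```python
from dataclasses import dataclass, field

# Evidence codes for a NOT, as listed in this guide.
PROPAGATING_CODES = {"IBA", "IKR"}  # good evidence for loss: descendants inherit the NOT
BLOCKING_CODES = {"IRD"}            # rapid divergence: acts like a STOP PROPAGATION


@dataclass
class Node:
    """Hypothetical tree node; `annotations` maps a GO term to a qualifier string."""
    name: str
    children: list = field(default_factory=list)
    annotations: dict = field(default_factory=dict)


def descendants(node):
    """Yield every node below `node`, depth first."""
    for child in node.children:
        yield child
        yield from descendants(child)


def apply_not(node, term, evidence_code):
    """Add a NOT for `term` at `node` and update its descendants."""
    if evidence_code not in PROPAGATING_CODES | BLOCKING_CODES:
        raise ValueError(f"unsupported evidence code: {evidence_code}")
    node.annotations[term] = "NOT"
    for desc in descendants(node):
        if evidence_code in PROPAGATING_CODES:
            # IBA / IKR: the NOT qualifier is added to every descendant.
            desc.annotations[term] = "NOT"
        else:
            # IRD: no NOT on descendants, and the ancestral annotation
            # is no longer propagated to them.
            desc.annotations.pop(term, None)
```

In this model, a NOT made with IKR marks the whole clade, whereas a NOT made with IRD marks only the selected node and leaves its descendants without the annotation.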


You may remove a NOT qualifier by clicking the trash can in the Undo column.  Note that you can only remove the qualifier from the specific node on which it was made.
You may remove a NOT qualifier by clicking the trash can in the Undo column.  Note that you can only remove the qualifier from the specific node on which it was made.


===Preventing propagation to a clade or protein===
In addition to the annotation no longer propagating downward, a small hash mark will appear near the node in the tree to indicate that the block exists.  Note that a hash mark only indicates the existence of at least one block, not that every annotation through that node is blocked.
If you do not wish to allow a positive annotation to propagate into a particular clade AND you do not wish to make a statement as strong as a NOT annotation, you may block an annotation from propagating into the clade.  Instead of selecting a choice from the NOT menu as above, choose “STOP.”  In addition to the annotation no longer propagating downward, a small hash mark will appear near the node in the tree to indicate that the block exists.  Note that a hash mark only indicates the existence of at least one block, not that every annotation through that node is blocked.
 
You may remove a STOP qualifier in the same way, and with the same restrictions and consequences, as a NOT qualifier.


===Annotation with Qualifiers===
===Annotation with Qualifiers===
Line 228: Line 233:
If you propagate an annotation with a qualifier, e.g. "NOT", "contributes_to" (for MF only), or "colocalizes_with" (for CC only), you will get a pop-up window asking if you wish to also propagate the relevant qualifier(s). The default is '''No'''; to accept this option, click '''OK'''. To propagate the qualifier, tick the '''Yes''' button and click '''OK'''.
If you propagate an annotation with a qualifier, e.g. "NOT", "contributes_to" (for MF only), or "colocalizes_with" (for CC only), you will get a pop-up window asking if you wish to also propagate the relevant qualifier(s). The default is '''No'''; to accept this option, click '''OK'''. To propagate the qualifier, tick the '''Yes''' button and click '''OK'''.


==Recording trees examined, but not annotated==
== Partial annotation of trees ==


When you examine a tree and feel that it should not be annotated for some reason, please record that in the Evidence Notes so that we can track the fact that the family has been examined. Please use one of these tags (in all caps) in the Notes section of the Evidence tab. You can add additional information after the tag if you wish (the syntax between the tag and the additional information has not been determined). Then, save your annotations as normal so that PAINT will save the notes file.
{|align='right'
|-
|[[File:PTHR24073-RabFamily.jpg|thumb|500px|Figure 10 The RAB GTPase superfamily]]
|}


* '''MISSING ANNOTATION''' - Use this if the tree looks OK, but there are insufficient experimental annotations to propagate any annotations.
When you want to annotate a very large family, e.g. the RAB GTPase superfamily (PTHR24073), it may not be feasible to annotate all clades at the same time. In this situation, you may choose to annotate only the clades you are knowledgeable about and confident in, and leave the other clades unexamined. When you do this, you should fully annotate the clades you choose to annotate. For example, if you choose to do the IFT27 clade, do it fully. Please don't make piecemeal annotations in various locations, as this may make it hard for a subsequent annotator to understand what has been done.
* '''MISSING SEQUENCE''' - Use this if you feel that a specific sequence or sequences is missing. You can list the IDs of the sequence(s) after the tag.
* '''BAD TREE''' - Use this if you feel that the tree has major problems beyond one or a few missing sequences.


==Saving your annotations==
We also agreed at the July 2014 PAINT Jamboree that you can make propagations all the way to the root if you feel that there is an ancestral role, even if you think that some clades have lost this. For example, in the RAB GTPase superfamily, we think that it had an ancestral function as a GTPase, but it is possible that some clades, e.g. the IFT22 clade, have lost this ancestral activity. You can make these high level propagations as part of your initial annotation of the family. If there are clades where this is wrong, perhaps the IBA annotation from PAINT will generate feedback that will help us correct it.


To save your annotations to your hard drive in Gene Association File (GAF) format, select ''Annotate -> Save''.  Since PAINT saves 8-9 files in one stroke, we recommend that, in the dialog box, you create a new folder in your hard drive designated to house these files. Typically, we recommend that you name this new folder with the ID of the tree, such as PTHR11409. Follow the on-screen directions to complete the process.
=== Recording partial annotation in the notes file ===
If you only partially annotate a tree, please record in the notes file which clades you have worked on using the node number, e.g. '''Eukaryota_PTN001180007''' as well as a common name, e.g. '''IFT27''', if it is helpful.


==Importing an existing GAF file==
=== Special SVN commit message for indicating partial annotation of a tree ===
If you have quit PAINT and wish to resume curating a family, you may reload your annotations from the GAF file.  Select ''Annotate -> Open'' and navigate to the GAF file using the dialog box.  Click on the ''.paint'' file and the tree will be loaded into the PAINT tool with all the previously curated annotations.
Instead of using the standard SVN commit message ("new PAINT annotations"), please commit with a special SVN log message: "new PAINT annotations for PARTIALLY annotated tree".


==Submitting PAINT annotations to the GO database using SVN==
==Recording trees examined, but not annotated==


===Check out the SVN directory=== 
When you examine a tree and feel that it should not be annotated for some reason, please record that in the Evidence Notes so that we can track the fact that the family has been examined. Please use one of these tags (in all caps) in the Notes section of the Evidence tab. You can add additional information after the tag if you wish (the syntax between the tag and the additional information has not been determined). Then, save your annotations as normal so that PAINT will save the notes file.
*Note you will only have to do this once. 
* The command below will create a local directory called GO-paint, which can be modified and checked back in to SVN. Your GO-paint directory will be a subdirectory of the directory you are in when you give this command. You can put it anywhere within your directory structure that you wish.
*: '''svn co svn+ssh://<font color="red">username</font>@ext.geneontology.org/share/go/svn/trunk/gene-associations/submission/paint GO-paint'''
*: where username is your GO username


* '''MISSING ANNOTATION''' - Use this if the tree looks OK, but there are insufficient experimental annotations to propagate any annotations.
* '''MISSING SEQUENCE''' - Use this if you feel that one or more specific sequences are missing. You can list the IDs of the sequence(s) after the tag.
* '''BAD TREE''' - Use this if you feel that the tree has major problems beyond one or a few missing sequences.
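
As a rough illustration of how notes carrying these tags could be checked or tallied later, here is a minimal Python sketch. It is not part of PAINT: the function name <code>parse_note</code> is hypothetical, and because the separator between the tag and any extra text is not defined, the sketch simply assumes that spaces, hyphens and colons after the tag can be skipped.

```python
# The three tags defined in this guide (all caps).
NOTE_TAGS = ("MISSING ANNOTATION", "MISSING SEQUENCE", "BAD TREE")


def parse_note(note):
    """Return (tag, extra_text) if the note starts with a known tag, else None.

    Assumption: any spaces, hyphens or colons right after the tag are
    separators, since the guide does not fix a syntax for them.
    """
    text = note.strip()
    for tag in NOTE_TAGS:
        if text.startswith(tag):
            extra = text[len(tag):].lstrip(" -:").rstrip()
            return tag, extra
    return None
```

For example, a note reading "MISSING SEQUENCE: P12345" would yield the tag and the sequence ID, while a lower-case "bad tree" would not be recognised, matching the all-caps requirement.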


===Create a new directory for a specific Panther family===
# '''cd GO-paint'''
# '''mkdir PTHRnnnnnn'''
#: (where PTHRnnnnnn is the family you will be working on, e.g. PTHR10003)
# '''svn add PTHRnnnnn'''
# '''svn commit -m "new directory for Panther family"'''
===Annotations for a new family, or notes for a family examined, but not annotated===
* '''Before opening PAINT'''
** ''Update'' - use this command to update your repository to the latest version
**# Move to the directory above your "GO-paint/" directory.
**#: '''cd <font color="red">[path]</font>/'''
**# Update the entire GO-paint directory with this command:
**#: '''svn update GO-paint/'''
** ''Create a new directory'' - Use the instructions above to create a directory specific for the Panther family you will be working on.
* '''While using PAINT''' on family PTHRnnnnn, save to this new directory
* '''After saving''', when you're finished editing the PAINT notes and annotations, check in to the SVN repository
** '''To save all files''' for an annotated family:
**# Go to the directory where you've saved the new PAINT files
**#: '''cd <font color="red">[path]</font>/GO-paint/PTHRnnnnnn'''
**#: where [path] is wherever you saved the SVN directory and PTHRnnnnnn is the family you're working on
**# Add the new files to SVN
**#: '''svn add PTHR*'''
**# Commit the new files
**#: '''svn commit -m "new PAINT annotations"'''
** '''To save ONLY the notes file, PTHR*.txt''', for a family you are not annotating at this time:
**# Go to the directory where you've saved the new PAINT files
**#: '''cd <font color="red">[path]</font>/GO-paint/PTHRnnnnnn'''
**#: where [path] is wherever you saved the SVN directory and PTHRnnnnnn is the family you're working on
**# Add only the new *.txt file to SVN
**#: '''svn add PTHR*.txt'''
**# Commit the new file
**#: '''svn commit -m "new note for examined but unannotated Panther family"'''
===To commit notes for a family examined, but not annotated===
* After saving, when you're finished editing the PAINT annotations, check in '''only the PTHR*.txt''' file to the SVN repository
** Step one: go to the directory where you've saved the new PAINT files
**: '''cd <font color="red">[path]</font>/GO-paint/PTHRnnnnnn'''
**: where [path] is wherever you saved the SVN directory and PTHRnnnnnn is the family you're working on
** Step two: add only the new *.txt file to SVN
**: '''svn add PTHR*.txt'''
** Step three: commit the new files
**: '''svn commit -m "new note for examined but unannotated Panther family"'''
===Updating annotations for a family that was previously done===
* Before opening PAINT, update the SVN directory - use this command to update your repository to the latest version
** Step one: move to the directory above your "GO-paint/" directory.
**: '''cd <font color="red">[path]</font>/'''
** Step two: update the entire GO-paint directory.
**: '''svn update GO-paint/'''
* While using PAINT on family PTHRnnnnn, save to the appropriate directory
* After saving, when you're finished editing the PAINT annotations, check in to the SVN repository
** Step one: go to the directory where you've saved the new PAINT files
**: '''cd <font color="red">[path]</font>/GO-paint/PTHRnnnnnn'''
**: where [path] is wherever you saved the SVN directory and PTHRnnnnnn is the family you're working on
** Step two: commit the new files
**: '''svn commit -m "updated PAINT annotations"'''
===To remove PAINT annotations for a family (e.g. if you've made a mistake)===
* Step one: go to the directory where you've saved the new PAINT files
*: '''cd <font color="red">[path]</font>/GO-paint/PTHRnnnnnn'''
*: where [path] is wherever you saved the SVN directory and PTHRnnnnnn is the family you're working on
* Step two: remove the files from SVN
*: '''svn rm PTHRnnnnn*'''
* Step three: commit the updated files
*: '''svn commit -m "undoing"'''
You can webView SVN here -
http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/gene-associations/submission/paint/
GO SVN help page:
http://www.geneontology.org/GO.svn.help.shtml
== Reporting bugs and feature requests ==
* Please add bugs and feature requests to their respective trackers
** [https://sourceforge.net/tracker/?group_id=184610&atid=1126622 Paint Bugs]
** [https://sourceforge.net/tracker/?group_id=184610&atid=1126623 Paint Feature Requests]
* And for good measure fire off a message to the mailing list to get our attention
** pantherdb-paint-users @ lists.sourceforge.net
==Other useful links==
*To see existing annotations and notes:
**http://pantree.org/
*PAINT saved GAF file repository:
**http://www.geneontology.org/gene-associations/submission/paint/
*PAINT SVN instructions:
**
*Annotation tracker:
**http://amigo.berkeleybop.org/cgi-bin/amigo/phylotree


=Interpreting the PANTHER trees=
=Interpreting the PANTHER trees=
==Speciation and duplication events==
==Speciation and duplication events, and horizontal transfer==
In the tree, a speciation node is shown with a circle, and a gene duplication node with a square.
In the tree, a speciation node is shown with a circle, and a gene duplication node with a square.  Horizontal transfer events also appear in the tree, though more rarely, and these are represented with a diamond.


==Branch lengths==
==Branch lengths==
Line 364: Line 273:


==Known bugs==
==Known bugs==
===Errors in phylogenetic trees, PANTHER version 8.0===
===Errors in phylogenetic trees, PANTHER version 12.0===
Most often, the errors in phylogenetic trees are due to problems with the sequence alignment, or the specific MSA columns used to estimate the phylogeny.  The phylogeny inference program performs fairly robust handling of sequence fragments, but sequence fragments still cause errors.  Another source of error is when the sequences evolve very slowly, generating little variation from which to estimate phylogeny.  In this case, the errors can usually be fixed by including additional alignment positions to consider in the phylogeny.
Most often, the errors in phylogenetic trees are due to problems with the sequence alignment, or the specific MSA columns used to estimate the phylogeny.  The phylogeny inference program performs fairly robust handling of sequence fragments, but sequence fragments still cause errors.  Another source of error is when the sequences evolve very slowly, generating little variation from which to estimate phylogeny.  In this case, the errors can usually be fixed by including additional alignment positions to consider in the phylogeny.


===Bugs in version paint1.0_beta56===
=Reporting bugs or likely errors in the trees=
* Upon collapsing, the ancestral sequence at that node is not shown in the MSA
 
==Tree issues==
If a Panther tree needs to be reviewed, please create a ticket in the Panther GitHub tracker: https://github.com/pantherdb/Helpdesk/issues


==Reporting bugs or likely errors in the trees==
==PAINT issues==
Please email Paul directly at pdthomas@usc.edu.
Issues with the PAINT tools should be reported in this tracker: https://github.com/pantherdb/db-PAINT/issues


=Curation Guidelines=
=Curation Guidelines=


'''Those guidelines have been published (Gaudet, Livestone, Lewis, Thomas, 2011) [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3178059/?tool=pubmed]'''
'''Those guidelines have been published (Gaudet, Livestone, Lewis, Thomas, 2011) [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3178059/?tool=pubmed]'''
Please refer to this PDF file for curation guidelines: [[File:PAINT_demo_and_curation_guidelines_120710.pdf]]


Curation guidelines are described in detail on this page:
Curation guidelines are described in detail on this page:
http://wiki.geneontology.org/index.php/PAINT_SOP
http://wiki.geneontology.org/index.php/PAINT_SOP

Revision as of 17:52, 9 January 2020

Summary

PAINT is a Java application for viewing and annotating phylogenetic trees. This document briefly describes how to set up and use the tool.

File:Background slides on family and function evolution.pdf

Requirements

Java 1.8 (aka Java 8 on a Macintosh) must be installed.

Installing and configuring PAINT

PAINT is a Java application, and can be run on a wide variety of platforms. To install PAINT on either Mac or Windows, follow these steps:

  • Download the PAINT application at:

http://paintcuration.usc.edu/

Launching PAINT

  • On a Windows machine, run the program launchPAINT.bat.
  • On a Mac, open a Unix terminal window, go to the directory containing the PAINT program, and execute the command:
sh launchPAINT.sh     OR      ./launchPAINT.sh

Using PAINT

Login

You are required to log in before you can open a tree. The purpose is to record proper acknowledgement for all the curated annotations (of tree nodes) created by you.

Go to File -> Login.

If you just want to view the tree and annotations, you can enter 'gouser' as the username. The password is already filled in. This is a read-only login.

If you want to curate the tree, enter your username and password. If you don’t have a login and password, send an email to huaiyumi@usc.edu and request one.

Curating a gene family

The analogy is to a library. You will first find and check out (lock) the families you want to curate, and then select a family to curate from your list of locked families. All families now have a curation status (curated, partially curated, uncurated).

Step 1: Find and "lock" the families you would like to curate

When you lock a family, other curators won’t be able to curate it. This prevents two people from working on the same family at the same time.

  • Go to File -> Manage and View Books...
Fig. 1 PAINT family search box
    • A window, as shown in Figure 1, will pop up. You can search for families in the following ways:
      • Search by keywords in specified fields, such as Gene Symbol, Protein Identifier, Gene Identifier, or gene definition.
      • Enter a specific PTHR ID
      • Retrieve a list of all families, or just the uncurated families.
    • Press the "submit" button to launch search
Fig. 2 PAINT family search results
  • Select books to lock. Figure 2 shows an example in which all uncurated families are returned. There are 5 possible curation statuses:
    • Manually curated – These families have been curated, and the curator believes that the curation is complete.
    • Locked – These families are locked by a curator. The name of the curator who locked the family is shown in the "Locked by" column.
    • Partially curated – These families have been curated only in part. The curator can unlock the family and leave it as partially curated.
    • Require paint review – The previously curated PAINT annotations have changed because of updates in either PANTHER or GO.
    • Unknown – These are uncurated families.
  • Check the box in the “Lock” column of the families you want to check out, and click the “Lock or Unlock selected Books” button at the bottom of the panel.

Step 2: Open a family to curate

You can only curate families you have locked.

Fig. 3 Opening a previously locked family.
  • To open a family, click “View Locked Books”, and then click the “View” button (Figure 3).

Step 3: Save your annotations

You can choose to save but keep the family locked so you can continue the curation later. You can also save and unlock the family.

  • Go to File -> Save to Database. A window will pop up with the following options.
    • Cancel
    • Save and unlock – The family will be unlocked and marked as Partially Curated.
    • Save – The family will remain locked. The curator should do this as often as possible during the curation.
    • Save, unlock & set curated – The family will be marked as Manually Curated.
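
The effect of each option in the save dialog can be summarised in a small Python sketch. This is only a reading of the list above, not PAINT's implementation: the names SAVE_OPTIONS and after_save are hypothetical, and the sketch assumes that a plain "Save" leaves the status unchanged because the guide only says the family remains locked.

```python
# Each dialog option maps to (still_locked, new_status).
# new_status is None where the guide does not describe a status change.
SAVE_OPTIONS = {
    "Save and unlock": (False, "Partially Curated"),
    "Save": (True, None),
    "Save, unlock & set curated": (False, "Manually Curated"),
}


def after_save(option, locked, status):
    """Return the (locked, status) pair of a family after choosing `option`."""
    if option == "Cancel":
        # Nothing is saved, so nothing changes.
        return locked, status
    new_locked, new_status = SAVE_OPTIONS[option]
    return new_locked, new_status or status
```

For a family that is currently locked, after_save("Save", True, "Locked") keeps it locked, while the two unlock options release it with the status named in the dialog.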

Appearance and Basic Operation

Windows

Fig. 1 Main paint window

PAINT is organized into three main panes (Figure 1):

  • The upper left pane shows a phylogenetic tree.
  • The upper right pane allows you to switch back and forth between (i) the Annotation Matrix; (ii) the Protein Information Table and (iii) a multiple sequence alignment (MSA) of all sequences.
  • The bottom pane contains two tabs: Annotations and Evidence.

All the tabbed panes may be resized or split out into windows.

  • Click on a tab (e.g., Protein Information, Evidence) to bring it to the front.
  • Click the icons in the tabs or the upper right corner to Undock/Dock, Minimize, Maximize, or close individual tabs or groups of tabs (red circle, Figure 1).
  • Tabs and panes may also be rearranged within a window by dragging.
  • Columns in the Protein Information Table can be resized.
  • Windows may be closed, arranged, or resized in the standard ways.

Phylogenetic Tree

A phylogenetic tree contains nodes and branches (Figure 1). There are three types of nodes: root, internal, and leaf. Leaf nodes correspond to the proteins in the tree. Root and internal nodes represent the inferred most recent common ancestor of their descendants. Branch lengths may be interpreted as time estimates between the nodes.

Fig. 2 PAINT phylogenetic tree

The root and internal nodes of the tree are shown as circles (speciation events) and squares (gene duplication events). If the tree has been previously curated, the nodes may be colored to indicate the type of annotation (e.g., with inferred or experimental evidence). More details are given in the "Making an inference" section of this guide. The nodes are numbered in a defined order starting with the root, AN0 (“AN” = “Ancestor”). Mouse over a node to see its identifier. If you right-click on a node, a menu will appear (Figure 2) with the following options:

  • Collapse node - the entire clade is collapsed to a single node (rectangle). All the descendants are hidden, but the GO term assignments to them are still available for annotation. Right-click the node again and select "Expand node" to re-expand it.
  • Reroot to node - make the selected node the root, and hide the rest of the tree. This is useful when the tree is too large. To bring back the entire tree, use the menu "Tree -> Reset Root to Main".
  • Export seq ids from leaves - the ids of all leaf sequences descended from the node are exported to a text file.
  • Prune -- All nodes descended from the node are removed from the tree.
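
To make the "Export seq ids from leaves" option concrete, the Python sketch below collects the leaf ids under a chosen node. It is an illustration only and not PAINT's code: the TreeNode class and both function names are hypothetical, and writing one id per line is an assumed file layout.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """Hypothetical tree node: internal nodes have children, leaves have none."""
    node_id: str
    children: list = field(default_factory=list)


def leaf_ids(node):
    """Return the ids of all leaf sequences descended from `node`, left to right."""
    if not node.children:
        return [node.node_id]
    ids = []
    for child in node.children:
        ids.extend(leaf_ids(child))
    return ids


def export_leaf_ids(node, path):
    """Write the leaf ids under `node` to a text file, one id per line (assumed layout)."""
    with open(path, "w") as handle:
        handle.write("\n".join(leaf_ids(node)) + "\n")
```

Calling leaf_ids on an internal node returns only the sequences in that clade, which mirrors what you see highlighted when you left-click the node in the tree.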

The tree branches can be rescaled if they are too long for comfortable viewing or too short to distinguish individual nodes. The default branch scale is 50, which works for most trees. To rescale, select Tree->Scale... and enter a different number.

Proteins with experimental annotations (IDA, EXP, IMP, IGI, IPI, or IEP evidence codes) for a particular ontology are colored and shown in boldface (blue circles, Figure 1). You may select one ontology at a time to examine using the radio buttons (red arrow, Figure 1) at the top of the window. You may change the color used to indicate annotations using Edit → Curation status colors… . Descriptions here refer to the default colors.

Navigating within the tree

  • Click on a protein name in the tree to highlight the protein in the tree and the table.
  • Left-click on a node in the tree to highlight the entire clade descended from it.

Annotation matrix

Figure 3. PAINT Annotation matrix

Note: The colors refer to the default colors in PAINT

The annotation matrix gives an overview of the annotations associated with any proteins in table format. It displays one of the three Gene Ontologies at a time. You can switch to a different ontology by clicking the radio button on the upper left part of the window (red arrow, Figure 1). Mouse-over the downward triangle to see the GO term (yellow circle, Figure 1). The terms in the annotation matrix are grouped, with the most specific terms on the left. A few very broad terms such as “protein binding” are not shown, even though they are listed in the Annotations pane.

The matrix has a row for each gene/gene product in the tree, and a column for each GO term that is directly annotated to at least one gene/gene product in the tree.

  • Click on a protein in the tree and the corresponding row will be highlighted in the matrix.
  • The annotations of the corresponding proteins and GO terms in the matrix are shown in colored squares (Figure 3).
    • When you first open a tree, only the experimental annotations are shown. These are the annotations that can be used for annotating ancestral genes.
      • Experimental annotations are represented by green color. If it is a direct annotation (i.e. the actual annotation is to that exact term in that column of the matrix), there is a black dot in the middle of the green square. If it is an indirect annotation (i.e. the actual annotation is to a child of the term in that column of the matrix), there is a white dot in the middle of the square.
      • NOT annotations are indicated by a red circle with a white X.
    • When you have annotated an ancestral node, inferred annotations are also shown in the matrix. This allows you to easily keep track of what you've already annotated.
      • Inferred annotations are represented by blue color, with either a black (direct) or white (indirect) dot in the center, or X for NOT as above.
  • Mouse-over an annotation square to see the tool tip of the protein name and the term.
  • Click on the annotation square to highlight the row. All the annotations to the protein, as well as the evidence and confidence codes, will be displayed in the Annotation panel (see below for more details).
  • Right-clicking (or Command-clicking on a Mac) an experimental annotation (green square) in the matrix automatically highlights the inferred most recent common ancestor (MRCA) node for the term.
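The MRCA highlighted by the right-click is the deepest node whose clade contains every protein carrying the experimental annotation. The following is a minimal sketch of that idea in Python, not PAINT's own code. It assumes a toy tree stored as a child-to-parent dictionary; the node names and the `mrca` helper are hypothetical.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(leaves, parent):
    """Return the most recent common ancestor of the given leaves."""
    leaves = list(leaves)
    # Nodes (including the leaf itself) shared by every path to the root.
    common = set(path_to_root(leaves[0], parent))
    for leaf in leaves[1:]:
        common &= set(path_to_root(leaf, parent))
    # Walking up from any leaf, the first shared node is the most recent one.
    for node in path_to_root(leaves[0], parent):
        if node in common:
            return node
    raise ValueError("leaves do not share a root")


# Hypothetical tree: AN0 is the root, AN1 and AN2 are internal nodes.
PARENT = {"AN1": "AN0", "AN2": "AN0", "P1": "AN1", "P2": "AN1", "P3": "AN2"}

print(mrca(["P1", "P2"], PARENT))  # proteins in the same clade
print(mrca(["P1", "P3"], PARENT))  # proteins that only share the root
```

In this toy tree, annotations on P1 and P2 point to AN1, while annotations on P1 and P3 point all the way up to AN0.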



Protein Information table

Figure 4. Protein information table

The phylogenetic tree is aligned with a protein information table showing additional information and linkouts to various databases (Figure 4). You can adjust the relative sizes of each within the window by dragging the line in the partition separating them. Note that the identifier table contains a lot of information that can be observed by scrolling to the right.

Navigating within the Protein Information table

  • Click anywhere within a row in the table to highlight the protein in the tree and the table.
  • Clicking on one of the blue linkouts opens the link in your web browser.


Multiple sequence alignment (MSA)

Figure 5 Multiple Sequence Alignment view

The trees were estimated from an MSA (Figure 5). The evolutionarily conserved part of the alignment is indicated with uppercase letters. The other, less conserved region is in lowercase letters. If a sequence is missing a position in the match state, it is called a delete state and is designated by a dash. If a sequence needs to insert a position in the less conserved region in order to keep the match state region aligned, it is called an insert state and is designated by a dot.

The conserved columns are colored with dark blue, blue or light blue, which indicates the conservation of 80%, 60% or 40%, respectively, in the column.
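To make the shading rule concrete, here is a small Python sketch. The guide does not say how PAINT computes conservation, so two assumptions are made here: conservation is taken as the fraction of sequences sharing the most common residue in a column (gaps count against it), and the percentages are treated as "at least" thresholds. The function names are hypothetical.

```python
from collections import Counter


def column_conservation(column):
    """Fraction of sequences sharing the most common residue in a column."""
    # Ignore gap characters when counting residues (assumption).
    residues = [c.upper() for c in column if c not in "-."]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    # Divide by the full column height so gaps lower the score (assumption).
    return most_common_count / len(column)


def shade(conservation):
    """Map a conservation fraction to the shade described in the guide."""
    if conservation >= 0.8:
        return "dark blue"
    if conservation >= 0.6:
        return "blue"
    if conservation >= 0.4:
        return "light blue"
    return None  # column is not colored


# One string per alignment column, one character per sequence.
for column in ["AAAAAAAAAG", "AAAAAAAGGG", "AAAAAGGGTT", "AAGGTTCCLL"]:
    print(column, shade(column_conservation(column)))
```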

Toggle back and forth between the table view (“Protein Information”) and the MSA view (“MSA”) using the buttons above the table/MSA panel.

Note: You can view the sequence of a hypothetical ancestral protein (node) by first collapsing the appropriate node.



Annotations window

Figure 6 The annotations window
Figure 7.

Click on a protein in the tree or table. Annotations associated with that protein appear in the Annotation pane (Figure 6). It contains five columns.

  • Term name - It lists the GO term name and accession. Click the term to link out to AmiGO. A term with a NOT annotation is shown in pink with strikethrough. If there is a green arrow in front of the term, it means "contributes_to" (Figure 6B).
  • Reference - A direct annotation to the term lists the reference (or experimental evidence) of the annotation. Click the reference to link out to the reference at PubMed or the appropriate model organism database. An inferred annotation lists the PAINT reference number (e.g.,PAINT_REF:0011409).
  • ECO/QUAL (Evidence code and qualifier) - This column shows the evidence code supporting the annotation and icons indicating any qualifiers, such as NOT (red arrow, Figure 6A), an experimental annotation (yellow arrow, Figure 6A), or an inferred annotation (orange arrow, Figure 6A).
  • With - This column contains the evidence to support the inference. If a directly inferred node (orange color, Figure 9B) is selected, the column shows the sequence id(s) used for the inference that have experimental annotation (Figure 7A). When an individual protein is selected, it shows the ancestral node from which the inference is inherited (Figure 7B).
  • Undo - this is used to undo an inference made by PAINT (see Remove annotation section below).

Evidence window

The evidence window is a text editor used to record notes on the curation process. It is pre-seeded with a general overview of the PAINT process (GO_REF:0000033 on http://www.geneontology.org/cgi-bin/references.cgi). NOTE: The purpose of the annotation notes is to convey important points about the annotations and the phylogenetic tree both to other annotators and to users, so annotators should try to make the notes as clear as possible.

The annotator uses the Evidence window to describe important points in the annotation process, including:

  • References used to annotate the family (for example, a few major reviews)
  • Any important points about the family topology, including potential inconsistencies in the tree
  • Reasons for annotating to a different node than the MRCA (most recent common ancestor), i.e., the node that triangulation of annotations identifies.

“Find” function

Figure 8

The Find function (Edit → Find…, Figure 8A) allows you to search for either a gene or a GO term. Select a gene or term search using the radio buttons (Figure 8B). Searches are case-insensitive.

A gene search looks for an exact match against any text stored in the database, such as a sequence identifier, gene symbol, or even gene name (red arrow, Figure 8C). The search does not return partial matches (blue arrow, Figure 8C). To do a partial match, add wildcard character(s) (*) before and/or after the search term. Scroll through the list of matches and click on a specific match to highlight it in the tree, table, and annotation matrix, and to display its annotations in the Annotations window.
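The matching rules described above (case-insensitive, exact unless a * wildcard is used) can be illustrated with a short Python sketch. This is not PAINT's implementation. The records, field layout, and helper names are hypothetical.

```python
import re


def build_matcher(query):
    """Compile a case-insensitive pattern where '*' matches any run of characters."""
    # Escape everything except '*', which becomes the regex '.*'.
    parts = [re.escape(part) for part in query.split("*")]
    return re.compile(".*".join(parts), re.IGNORECASE)


def find_genes(query, records):
    """Return the records in which any stored field matches the whole query."""
    matcher = build_matcher(query)
    return [
        record for record in records
        if any(matcher.fullmatch(field) for field in record)
    ]


# Hypothetical records: (sequence identifier, gene symbol, gene name).
RECORDS = [
    ("P00001", "ADE1", "adenine deaminase"),
    ("P00002", "ADE2", "adenosine deaminase"),
]

print(find_genes("ade1", RECORDS))  # exact match, case-insensitive
print(find_genes("ade", RECORDS))   # no partial match without a wildcard
print(find_genes("ade*", RECORDS))  # trailing wildcard matches both records
```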

You may search GO terms using text, or you may use numbers to search for GO IDs.

Recommended configuration for curation

  • Bigger is better. Use as much of the monitor as you can afford. If you are using a laptop, you may wish to attach an external monitor.
  • Adjust the width of the window and the partition between the Tree and the Table until you are comfortable with them.

Making an inference: Transferring annotations

Ancestral nodes in the tree can be annotated with any GO term that has been annotated to one (or more) of its descendants. These “inferred” annotations can be propagated to its other descendants.

Annotating an ancestral node, and propagating to descendants by inheritance

Figure 9

In the example shown in Figure 9, the GO term adenine deaminase activity in the 1st column of the Annotation Matrix (red arrow, Figure 9A) is annotated to three proteins. To annotate the ancestral node,

  1. right-click (or Command-click on a Mac) a GO term (green square) in the Annotation Matrix; the inferred node will be highlighted in grey (blue arrow, Figure 9A);
  2. drag the term to the ancestral node you wish to annotate. This can be the inferred node or any other nodes. When you mouse over it, the node will turn dark. Release the mouse button to annotate. (Click here for a video demo of the procedure: http://youtu.be/8kHrdiuNfos.)
  3. the node is now annotated with that term using the evidence code “IBD” (“Inferred from Biological aspect of Descendant”).
  4. All descendants of the node will now be annotated with that term using the evidence code “IBA” (“Inferred from Biological aspect of Ancestor”) (Figure 9B). (Proteins and nodes already annotated with the term or one of its child terms will remain unchanged.)


Note: You may only annotate a node with a given GO term if AT LEAST ONE descendant has an annotation to that term or a child term. If a node does not turn dark (step 2), it cannot be annotated.
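The propagation rule can be summarized in a few lines of Python. This is an illustrative sketch under simplifying assumptions, not PAINT's code: the tree is a parent-to-children dictionary, experimental annotations are plain sets of term names, and child terms in the ontology are ignored (only exact term matches are checked). All names are hypothetical.

```python
def descendants(node, children):
    """Yield every node below `node` in the tree."""
    for child in children.get(node, []):
        yield child
        yield from descendants(child, children)


def annotate(node, term, children, annotations, experimental):
    """Annotate `node` with `term` (IBD) and propagate to descendants (IBA).

    `annotations` maps node -> {term: evidence code} and is updated in place.
    Returns False, changing nothing, when no descendant has experimental
    evidence for the term (the node "does not turn dark").
    """
    below = list(descendants(node, children))
    if not any(term in experimental.get(d, set()) for d in below):
        return False
    annotations.setdefault(node, {})[term] = "IBD"
    for d in below:
        # Nodes and proteins that already carry the term are left unchanged.
        if term in experimental.get(d, set()) or term in annotations.get(d, {}):
            continue
        annotations.setdefault(d, {})[term] = "IBA"
    return True


# Hypothetical family: AN0 is the root, P1 has an experimental annotation.
CHILDREN = {"AN0": ["AN1", "P3"], "AN1": ["P1", "P2"]}
EXPERIMENTAL = {"P1": {"adenine deaminase activity"}}

inferred = {}
annotate("AN1", "adenine deaminase activity", CHILDREN, inferred, EXPERIMENTAL)
print(inferred)
```

Here AN1 receives IBD and P2 receives IBA, while P1 keeps its experimental annotation and P3, outside the clade, is untouched.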

Removing an annotation

If you make a mistake or change your mind, you can remove an annotation.

  1. Click on the desired node. Nodes with inferred annotations are colored orange (Figure 9B).
  2. Go to the Annotation tab. To remove an annotation, click the trash can icon in the Undo column (Figure 7A).

"NOT" annotations

Note: In the context of PAINT, adding the NOT modifier indicates a belief that the specified function was present in an ancestral protein and has been LOST in the indicated protein or clade. This is a special case of existing GO guidelines for NOT, which state that a NOT annotation may be made in situations where a particular function is expected but not observed.

You may add a NOT modifier to an existing annotation if you feel the evidence warrants it for one of the following reasons, listed here in order of decreasing strength:

  • There are experimental annotations indicating that a function has been lost from one or more proteins. (Evidence code IBA = Inferred from Biological aspect of Ancestor)
  • Specific residues have been mutated at, for example, an enzyme’s catalytic site, and a specified function is no longer possible. (Evidence code IKR = Inferred from Key Residues)
  • A protein or clade has evolved rapidly, losing the original function and gaining a new one. This may be visible as a long branch in the tree, but the meaning of “long” varies by context, and a visibly long branch is not strictly required. (Evidence code IRD = Inferred from Rapid Divergence)

To add the NOT qualifier:

  1. Select a node or protein from the tree. This may be either a directly annotated node or one of its children.
  2. In the ECO/QUAL column of the Associations window, click on the “IAS.” A popup menu will appear.
  3. Under “NOT,” select which evidence code justifies the NOT qualifier.
  4. If the evidence code is IBA or IKR, all annotations to proteins and nodes descended from that node will have the NOT qualifier added (as these have good evidence for loss)
  5. If the evidence code is IRD, descendant sequences will not be annotated with the NOT qualifier, but the ancestral annotation will not be propagated to the descendants. Thus this acts like a STOP PROPAGATION.
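The difference between steps 4 and 5 can be sketched in Python as follows. This is only an illustration of the rule as stated above, using a hypothetical data model in which `annotations` holds inferred annotations only. How PAINT itself stores qualifiers and evidence is not described in this guide.

```python
def add_not(node, term, evidence, children, annotations):
    """Add a NOT qualifier for `term` at `node` and update its descendants.

    `annotations` maps node -> {term: {"evidence": str, "not": bool}}
    and is updated in place.
    """
    if evidence not in {"IBA", "IKR", "IRD"}:
        raise ValueError(f"unsupported evidence code for NOT: {evidence}")
    if term not in annotations.get(node, {}):
        raise KeyError(f"{node} has no existing annotation to {term}")
    annotations[node][term] = {"evidence": evidence, "not": True}

    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        stack.extend(children.get(current, []))
        inherited = annotations.get(current, {})
        if term not in inherited:
            continue
        if evidence == "IRD":
            # Acts like STOP PROPAGATION: descendants lose the inherited
            # annotation but are not given a NOT qualifier.
            del inherited[term]
        else:
            # IBA or IKR: good evidence for loss, so descendants get NOT too.
            inherited[term]["not"] = True


# Hypothetical clade: AN1 and its two leaves all inherited the same term.
CLADE = {"AN1": ["P1", "P2"]}


def fresh():
    return {n: {"GTPase activity": {"evidence": "IBA", "not": False}}
            for n in ("AN1", "P1", "P2")}


with_ikr = fresh()
add_not("AN1", "GTPase activity", "IKR", CLADE, with_ikr)
with_ird = fresh()
add_not("AN1", "GTPase activity", "IRD", CLADE, with_ird)
print(with_ikr["P1"], with_ird["P1"])
```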

You may remove a NOT qualifier by clicking the trash can in the Undo column. Note that you can only remove a qualifier from the specific node on which it was made.

In addition to the annotation no longer propagating downward, a small hash mark will appear near the node in the tree to indicate that the block exists. Note that a hash mark only indicates the existence of at least one block, not that every annotation through that node is blocked.

Annotation with Qualifiers

If you propagate an annotation with a qualifier, e.g. "NOT", "contributes_to" (for MF only), or "colocalizes_with" (for CC only), you will get a pop-up window asking if you wish to also propagate the relevant qualifier(s). The default is No; to accept this option, click OK. To propagate the qualifier, tick the Yes button and click OK.

Partial annotation of trees

Figure 10 The RAB GTPase superfamily

When you want to annotate a very large family, e.g. the RAB GTPase superfamily (PTHR24073), it may not be feasible to annotate all clades at the same time. In this kind of situation, you may choose to annotate only the clades you are knowledgeable and confident of, and leave other clades unexamined. When you do this, you should fully annotate the clades you choose to annotate. For example, if you choose to do the IFT27 clade, do it fully. Please don't do piecemeal annotations in various locations that may make it hard for a subsequent annotator to understand what has been done.

We also agreed at the July 2014 PAINT Jamboree that you can make propagations all the way to the root if you feel that there is an ancestral role, even if you think that some clades have lost this. For example, in the RAB GTPase superfamily, we think that it had an ancestral function as a GTPase, but it is possible that some clades, e.g. the IFT22 clade, have lost this ancestral activity. You can make these high level propagations as part of your initial annotation of the family. If there are clades where this is wrong, perhaps the IBA annotation from PAINT will generate feedback that will help us correct it.

Recording partial annotation in the notes file

If you only partially annotate a tree, please record in the notes file which clades you have worked on using the node number, e.g. Eukaryota_PTN001180007 as well as a common name, e.g. IFT27, if it is helpful.

Special SVN commit message for indicating partial annotation of a tree

Instead of using the standard SVN commit message ("new PAINT annotations"), please commit with a special SVN log message: "new PAINT annotations for PARTIALLY annotated tree".

Recording trees examined, but not annotated

When you examine a tree and feel that it should not be annotated for some reason, please record that in the Evidence Notes so that we can track the fact that the family has been examined. Please use one of these tags (in all caps) in the Notes section of the Evidence tab. You can add additional information after the tag if you wish (syntax between tag and additional info not discussed or determined). Then, save your annotations as normal so that PAINT will save the notes file.

  • MISSING ANNOTATION - Use this if the tree looks OK, but there are insufficient experimental annotations to propagate any annotations.
  • MISSING SEQUENCE - Use this if you feel that a specific sequence or sequences is missing. You can list the IDs of the sequence(s) after the tag.
  • BAD TREE - Use this if you feel that the tree has major problems beyond one or a few missing sequences.
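Because the tags are fixed, all-caps strings, notes files can be scanned for them with a short script. The sketch below is one possible approach in Python. Since the syntax between a tag and any additional information has not been determined, it simply assumes the tag starts a line and that spaces, colons, or hyphens may separate it from the rest. The `find_tag` helper is hypothetical.

```python
TAGS = ("MISSING ANNOTATION", "MISSING SEQUENCE", "BAD TREE")


def find_tag(notes):
    """Return (tag, extra text) for the first line starting with a known tag."""
    for line in notes.splitlines():
        stripped = line.strip()
        for tag in TAGS:
            if stripped.startswith(tag):
                # Drop assumed separators between the tag and the extra text.
                return tag, stripped[len(tag):].strip(" :-")
    return None  # no tag: the tree was presumably annotated as normal


print(find_tag("MISSING SEQUENCE: P12345, Q67890"))
print(find_tag("Annotated the IFT27 clade only."))
```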


Interpreting the PANTHER trees

Speciation and duplication events, and horizontal transfer

In the tree, a speciation node is shown with a circle, and a gene duplication node with a square. Horizontal transfer events also appear in the tree, though more rarely, and these are represented with a diamond.

Branch lengths

  • Branch lengths show the amount of sequence divergence that has occurred between a given node and its ancestral node, in terms of the average number of amino acid substitutions per site. Shorter branches indicate less sequence divergence and therefore greater conservation of ancestral characters. A branch might be shorter because of a slower evolutionary rate (greater negative selection), or because less "time" has gone by (actually a combination of number of generations and population dynamics), or both.
  • Very long branches indicate an unreliable divergence estimate, due to insufficient data. Note that sometimes there is not enough data to compare all branches that descend from a given node. In this case, we have set all descendant branches to a length of 2.0 (very long branches). Branch lengths of 2.0 are often due to a sequence fragment, and at a duplication node it may also indicate a gene that has been incorrectly broken into two different genes by a gene prediction program.
  • Following a gene duplication (after a square node), the relative branch lengths for descendant branches are particularly useful: the shortest branch (least diverged) is more likely to have greater functional conservation.
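As a worked illustration of the last two points, the following Python sketch picks the least diverged branch below a duplication node while setting aside branches of length 2.0, which the guide describes as placeholders for unreliable estimates. The node names and the `least_diverged` helper are hypothetical, and the result is a hint for the curator, not a rule applied by PAINT.

```python
UNRELIABLE_LENGTH = 2.0  # length assigned when there is not enough data


def least_diverged(branch_lengths):
    """Return the child with the shortest reliable branch, or None."""
    reliable = {
        child: length
        for child, length in branch_lengths.items()
        if length < UNRELIABLE_LENGTH
    }
    if not reliable:
        return None  # every branch is a 2.0 placeholder
    return min(reliable, key=reliable.get)


# Branch lengths (substitutions per site) below a hypothetical duplication node.
print(least_diverged({"AN5": 0.12, "AN9": 0.87}))
print(least_diverged({"AN5": 2.0, "AN9": 2.0}))
```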

Multiple sequence alignment (MSA)

  • Some columns in the MSA have upper-case characters (and dashes '-' for insertions/deletions). These columns were used to estimate the phylogenetic tree.
  • Lower-case characters and periods (‘.’ for insertions/deletions) denote positions that were ignored when estimating the phylogenetic tree. Sometimes, tree errors arise because not enough columns were used, and the phylogeny could not be reconstructed well based on the included columns. Since they were not used in the phylogeny, lower-case characters can be particularly helpful in verifying the tree topology: any conserved insertions should be parsimoniously traceable to a common ancestor.
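The character conventions above make it easy to separate the two kinds of positions in an aligned sequence. A minimal Python sketch follows, assuming each sequence is available as a plain string copied from the MSA view. The function names and the example sequence are made up for illustration.

```python
def match_columns(aligned_sequence):
    """Keep the positions used for the tree: uppercase residues and '-'."""
    return "".join(c for c in aligned_sequence if c.isupper() or c == "-")


def insert_columns(aligned_sequence):
    """Keep the positions ignored for the tree: lowercase residues and '.'."""
    return "".join(c for c in aligned_sequence if c.islower() or c == ".")


row = "MKt.LV-Agh"
print(match_columns(row))   # positions that contributed to the phylogeny
print(insert_columns(row))  # positions useful for checking the topology
```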

Known bugs

Errors in phylogenetic trees, PANTHER version 12.0

Most often, the errors in phylogenetic trees are due to problems with the sequence alignment, or the specific MSA columns used to estimate the phylogeny. The phylogeny inference program performs fairly robust handling of sequence fragments, but sequence fragments still cause errors. Another source of error is when the sequences evolve very slowly, generating little variation from which to estimate phylogeny. In this case, the errors can usually be fixed by including additional alignment positions to consider in the phylogeny.

Reporting bugs or likely errors in the trees

Tree issues

If a PANTHER tree needs to be reviewed, please create a ticket in the PANTHER GitHub tracker: https://github.com/pantherdb/Helpdesk/issues

PAINT issues

Issues with the PAINT tools should be reported in this tracker: https://github.com/pantherdb/db-PAINT/issues

Curation Guidelines

These guidelines have been published (Gaudet, Livstone, Lewis, Thomas, 2011) [1].

Curation guidelines are described in detail on this page: http://wiki.geneontology.org/index.php/PAINT_SOP