Difference between revisions of "Phylogenetic Annotation Project"

From GO Wiki
 
(231 intermediate revisions by 16 users not shown)
Line 1: Line 1:
=Goal of the Reference Genome Annotation Project=
+
''Note that this project was formerly called the Reference Genome Annotation Project.''
The GO consortium has established the complete annotation of 12 reference genomes as a priority goal. These reference genomes are:
 
  
<b><i>Arabidopsis thaliana</i>, <i>Caenorhabditis elegans</i>, <i>Danio rerio</i>, <i>Dictyostelium discoideum</i>, <i>Drosophila melanogaster</i>, <i>Escherichia coli</i>, <i>Gallus gallus</i>, <i>Homo sapiens</i>, <i>Mus musculus</i>,
+
=Overview=
<i>Rattus norvegicus</i>, <i>Saccharomyces cerevisiae</i>, <i>Schizosaccharomyces pombe</i></b>
+
The Phylogenetic Annotation Project performs annotation inferences across evolutionarily related proteins based on the known function of proteins within PANTHER [http://pantherdb.org/] phylogenetic family trees.
 +
 +
= PAINT (Phylogenetic Annotation and INference Tool) =
 +
PAINT is a Java software application for supporting inference of ancestral as well as present-day characters (represented by ontology terms) in the context of a phylogenetic tree.  PAINT is currently being used in the GO [[Phylogenetic Annotation Project]] to support inference of GO function terms (molecular function, cellular component and biological process) by homology.
  
The Reference Genome GO Annotation Team, with representatives from each genome annotation group, will coordinate annotation, facilitate implementation of GO Consortium annotation priorities, and provide metrics to assess progress toward the goal of broad and deep annotation of the reference genomes. This group will be responsible for coordinating the annotation of the twelve reference genomes. It represents the annotation expertise within the GO Consortium and provides key liaisons to the model organism databases that have primary responsibility for annotating the reference genomes.
+
==Resources for PAINT annotation==
  
=== More information===
+
===[[PAINT annotation guidelines|PAINT Annotation principles]]===
====[[Reference Genome Annotation Project Summary]]====
+
The [[PAINT SOP|PAINT Annotation principles]] page describes the PAINT annotation guidelines.
* [http://geneontology.org/GO.refgenome.shtml?all Reference genome web page]
 
====[[Reference_Genome Contact Persons from each database]]====
 
----
 
  
=Progress Reports=
+
===[[PAINT_User_Guide|PAINT User Guide]]===
* October 2009 RefGen [[Reference_Genome_October_2009]]
+
The [[PAINT_User_Guide|PAINT User Guide]] provides annotation guidelines for PAINT annotation as well as step-by-step instructions on how to use the PAINT tool.
* September 2009 RefGen [[Reference_Genome_September_2009]]
 
* July 2009 RefGen [[RefGenProgress_2009-07]]
 
* June 2009 RefGen [[RefGenProgress_2009-06]]
 
* May 2009 RefGen [[RefGenProgress_2009-05]]
 
* April 2009 RefGen [[RefGenProgress_2009-04]]
 
* April 1, 2009 Third Reference Genome Annotation Meeting, Eugene, OR [[Oregon_Reference_Genomes_Meeting| Agenda and Minutes]]
 
* March 2009 RefGen [[RefGenProgress_2009-03]]
 
* February 2009 RefGen [[RefGenProgress_2009-02]]
 
* September 2008 RefGen [[RefGenProgress_2008-09-10]]
 
* August 2008 RefGen [[RefGenProgress_2008-08-13]]
 
* July 2008 RefGen [[RefGenProgress_2008-07-18]]
 
* June 2008 RefGen [[RefGenProgress_2008-06-18]]
 
* May 2008 RefGen [[RefGenProgress_2008-06-04]]
 
  
* April 20, 2008 Second Reference Genome Annotation Meeting, Salt Lake City, UT [http://wiki.geneontology.org/index.php/Reference_Genome_Meeting_Minutes_April_2008 Minutes]
+
===[http://pantree.org/tree/allTrees.jsp PAINT trees curation status]===
 +
The [http://pantree.org/tree/allTrees.jsp PAINT trees curation status] page provides a list of all Panther trees and the history of their curation status.
  
* Sept 27, 2007: First Reference Genome Annotation Meeting, Princeton, NJ [http://wiki.geneontology.org/index.php/Reference_Genome_minutes Minutes]
+
=[[PAINT_database_update_pipeline|PAINT Update pipeline]]=
  
----
+
=[[PAINT_GAF_production|PAINT GAF production]]=
 +
==[[PAINT GAF QC-examples|PAINT GAF QC-examples]]==
  
=Communication=
+
= [[PAINT Conference Calls]]=  
====[[Reference Genome Mailing list]]====
+
* Monthly, on the first Tuesday of the month, 9 AM Pacific/6 PM Europe (as of March 2019)
 +
* The Zoom link is in the Google Calendar
  
==== [[Conference Calls]]====
+
* [[PAINT_Conference_Calls]] Agendas and Minutes
  
====  [[Reference Genomes Meetings | Meetings]]====
 
  
====  [[Electronic_jamborees| Electronic jamborees ]]====
+
=Reporting bugs or likely errors in the trees=
----
 
  
=Annotation Targets=
+
==Tree issues==
===[[Panther gene lists]]===
+
If a Panther tree needs to be reviewed, please create a ticket in the Panther GitHub tracker: https://github.com/pantherdb/Helpdesk/issues
  
From May 2008
+
==PAINT issues==
===[http://spreadsheets.google.com/ccc?key=pZhlLFuj8ewDe799QTmxzCA&hl=en Target Gene List] (May 2008-) ===
+
Issues with the PAINT tools should be reported in this tracker: https://github.com/pantherdb/db-PAINT/issues
  
 +
==Pantree issues==
 +
Issues with the Pantree.org site should be reported at: https://github.com/pantherdb/PanTree
  
===[http://dcn.spreadsheets.google.com/ccc?id=o16926456948884040128.4584390909151853752.07000735126025259412.442372083524637957 Target Gene List August 2006-April 2008]===
+
=Pages to review=
 +
* http://wiki.geneontology.org/index.php/PAINT_annotation_working_group
 +
* [[reference proteomes files]]: to be moved elsewhere
 +
* Metrics: Discussion on annotation progress measurements
 +
**From 2017 Grant, suggestions for metrics:
 +
*** fraction of human proteins in annotated families (PAINT progress)
 +
*** impact: number of annotations added, for human and for other species
 +
** From a previous grant, see [[Image:HowToCaptureMetrics3.doc|thumb|Description]]
 +
** Other ideas (to be reviewed): [[Metrics:_breath_and_depth_of_annotations |Breadth and Depth]]
 +
**** http://wiki.geneontology.org/index.php/GO_Reference_Genome_Meeting_Metric_Plan
  
* Access requires your email to be added to the system.  Email Pascale if you would like to be added.
+
=Archived & retired Pages=
* This spreadsheet contains links to separate spreadsheets maintained by each of the reference genome groups.
 
  
===[[Procedure for selection of target genes]]===
+
These pages are kept for reference, but the information in them is not the most current.
 
+
* [[Reference Genome Mailing list]] - disabled
===[[Procedure for filling Genome-Specific spreadsheets]]===
+
* [[Electronic_jamborees| Electronic jamborees ]]
 
+
* [[Annotation_pipeline]] By Judy, Suzi, Michael
----
+
* [[Ideas for publicizing Ref.Genome Annotation Data]]
 
 
=Gene Annotation=
 
 
 
===[[GAFs for trees-based annotations]]===
 
 
 
===[[PAINT_SOP |Standard Operating Procedure for Tree-based propagation of annotations]]===
 
 
* [[PAINT-GONUTS integration]]
 
* [[PAINT-GONUTS integration]]
 +
* [[Reference Genome Annotation Project Summary]]
 +
* [[Progress_Reports#Reference_Genomes | Project timeline]]
 +
* [[Reference_Genome Contact Persons from each database]]
 +
* [[Reference Genome Progress Reports]]
 +
* [[Procedure for selection of target genes]]
 +
* [[Procedure for filling Genome-Specific spreadsheets]]
 +
* [[Tools_for_identifying_orthologs|Tools for orthology determination]]: A summary of tools available to identify orthologs.
 +
* [[Orthology discussion page|SOP for determining ortholog (by database)]]: The purpose of this page was to discuss the method by which each group establishes orthology between reference genome genes and human disease genes. We now collaborate with PANTHER to provide that. (Issues are different)
 +
* [[Ref_Gen_pub_draft | Reference Genome Web Page Draft]]: We now have a real web page!
 +
* [[List of potentially problematic families for all vs. all BLAST methods of orthology determination]]
 +
* [[Running P-POD orthology tool on the reference genomes gene set]] by Kara Dolinski at Princeton - Nov2007.
 +
* [[Reference_Genome_sequence_annotation]]: GFF3 sequence files for reference genome MODs
 +
* [[Reference Genome Database Requirements Discussion]]
 +
* [[Source_Forge_items_for_reference_genomes_(Retired)]]
 +
* [[Reference Genome Publication Counts]]
 +
* [[Review_of_trees-based_annotations_(Retired)]]
 +
* [[GAF file 2.0]] survey of contributing groups
 +
* [[RG:_Software|Reference Genome Software]] Plan to have some tracking system - supplanted with the db-version of Paint (2017)
 +
* [[Ref_genome_Annotation_progress_ideas_(Retired)]]
  
===[[Annotation_pipeline]]===
 
By Judy, Suzi, Michael
 
 
===Annotation Quality control===
 
* [[Annotation QC]]
 
* Annotation completion Source Forge tracker [http://sourceforge.net/tracker/?group_id=36855&atid=1040173 http://sourceforge.net/tracker/?group_id=36855&atid=1040173]
 
* [[Reference_Genome_Database_Reports]].  Those reports are generated with the GOOSE SQL interface and provide lists of potentially mis-annotated genes.
 
 
===Annotation Consistency Issues===
 
 
* [[Annotation consistency: IEA, ISS, IC Usage Discussion]]: Tanya Berardini, Emily Dimmer, Pascale Gaudet, David Hill, Chris Mungall, Kimberly VanAuken
 
* [[Annotation consistency: IDA or IC for processes]]: Tanya Berardini, Emily Dimmer, Pascale Gaudet, David Hill, Chris Mungall, Kimberly VanAuken, Ruth Lovering
 
* [[Annotation consistency: HTP]] Annotation of high throughput experiments, including microarray data  [[SGD GO HTP guidelines]] : Stacia Engel, Emily Dimmer, Val Wood, Ruth Lovering
 
* [[Annotation consistency: Using IEP]], including microarray data, and heat shock protein example: Emily Dimmer, Stacia Engel, Pascale Gaudet, Ruth Lovering, Varsha Khodiyar, Val Wood
 
* [[Annotation consistency: x protein binding and with]] : Becky Foulger, David Hill
 
* [[Annotation consistency: xx binding in the context of gene product]]: example is 'co-enzyme binding' from electronic jamboree : Pascale Gaudet, David Hill, Victoria Petri
 
* [[Annotation consistency: 'Response to' terms]]: Check that all databases use evidence codes correctly for those terms. Tanya Berardini, Emily Dimmer, Pascale Gaudet, Ruth Lovering
 
 
====Improving GO terms and definitions====
 
* [[Annotation consistency: Clarification of oligomerization, dimerization, protein complex assembly]]: Debby Siegele
 
* [[Annotation consistency: chaperone activity definition]]: clarify the definitions of unfolded/misfolded protein binding and add chaperone activity as a synonym to both of the terms. Also add ‘de novo’ synonym to ‘unfolded protein binding’.  Victoria Petri
 
* [[Binding terms working group|Binding terms working group]]
 
 
===[[Misused terms]]===
 
This page provides a list of often misused terms and (hopefully) an explanation as to how to use the term properly. This information should also be included in the 'comments' of the OBO file.
 
 
===[[Variant_annotation]] ===
 
*This page describes how each database handles curation of multiple forms of the same gene
 
 
===[[Other taxa annotations | Providing annotations to GOA in taxa other than your MOD's]] ===
 
* Please follow these instructions if you encounter a gene not from your database that you need to annotate.
 
 
===[[Reference Genome Gene Index | Gene Annotation wiki pages]]===
 
* The purpose of these pages is to allow discussion of annotation and orthology issues related to particular genes.  The individual gene pages are to be created as needed.
 
 
----
 
 
=Annotation Progress=
 
== Lung Development Gene Annotation Progress ==
 
== Graphical views of the annotations: ==
 
=== [http://www.geneontology.org/images/RefGenomeGraphs/ Selected refG target sets] ===
 
* PPOD clusters selected since April 2008
 
* Manually curated target sets selected before April 2008
 
 
=== [http://proto.informatics.jax.org/prototypes/GOgraphEX/PPOD12_Graphs/ All PPOD clusters with at least one object from each of the twelve refG organisms] ===
 
 
==[[Reference Genomes Metrics | Metrics: Discussion on annotation progress measurements]]==
 
 
----
 
 
=Orthology determination=
 
 
====[[List of potentially problematic families for all vs. all BLAST methods of orthology determination]] ====
 
 
==Data used to make orthology calls==
 
 
====New [[gene2geneproduct file]]====
 
At the April 2009 Reference Genome meeting it was decided to create a new file to replace the GP2protein file, called 'gene2geneproduct'. Specifications can be found on this page (will be added soon).
 
 
====[[GAF file 2.0]]====
 
The GAF file should contain 17 columns, and the meanings of columns 2, 12 and 17 have been modified. See that page for specifications.
 
 
====Data used for [[Running P-POD orthology tool on the reference genomes gene set]] ====
 
by Kara Dolinski at Princeton - Nov2007
 
* This page contains a description of the project and the requirements for providing files for the P-POD analysis. 
 
 
====GFF3 sequence files for reference genome MODs====
 
[[Reference_Genome_sequence_annotation]]
 
----
 
 
=Software/database development=
 
 
*'''[[Reference Genome Database Requirements Discussion]]'''
 
*'''[[RG:_Software|Reference Genome Software]]'''
 
*'''[[RG_Software_group|Software group]]'''
 
*'''[[PAINT_SOP|PAINT]]'''
 
 
The purpose of this page is to discuss features and requirements that would be desirable in a database used to replace the existing Google Spreadsheet system for managing target genes, their annotations and metrics.
 
 
=Retired Pages=
 
These pages are kept for reference, but the information in them is not the most current.
 
 
====[[Tools_for_identifying_orthologs|Tools for orthology determination]]====
 
A summary of tools available to identify orthologs.
 
  
==== [[Orthology discussion page|SOP for determining ortholog (by database)]]====
+
==Past Annotation targets==
  
* The purpose of this page was to discuss the method by which each group establishes orthology between reference genome genes and human disease genes.
+
* [[RefG annotation priorities]] of September 2009
We now collaborate with PANTHER and P-POD to provide that. (Issues are different)
+
*[[Lung_branching_morphogenesis_genes]] December 2009
 +
* [http://proto.informatics.jax.org/prototypes/GOgraphEX/PPOD12_Graphs/ All PPOD clusters with at least one object from each of the twelve refG organisms]
 +
*[http://spreadsheets.google.com/ccc?key=pZhlLFuj8ewDe799QTmxzCA&hl=en Target Gene List]: May 2008-Jan 2010
 +
*[[Tree annotation progress]] 2010-2011
 +
* [[RefG_Heart_Development_co-curation#Heart_Development_Transcription_Annotation_Targets]]: May-Sept 2011
 +
* [[Wnt_signaling_Pathway]] June-Sept 2010
 +
* [[Apoptosis Reference Genome Targets]] February-April 2011
 +
* [[PAINT_-_Apoptosis_(Archived)]]
 +
* [[PAINT - Apoptosis]] Nov 2013
 +
* DNA repair family list: http://goo.gl/BaQxMC 2014
 +
* http://dcn.spreadsheets.google.com/ccc?id=o16926456948884040128.4584390909151853752.07000735126025259412.442372083524637957
 +
Target Gene List August 2006-April 2008
 +
* [[Reference_Genome_Genes_(Retired)]]
 +
* [[PAINT_trees_to_review (Retired)]]
 +
== Review Status ==
  
==== [[Ref_Gen_pub_draft | Reference Genome Web Page Draft]]====
+
Last reviewed: 2021-07-01
* We now have a real web page!
+
[[Category:PAINT]]
----
 

Latest revision as of 22:50, 30 June 2021

Note that this project was formerly called the Reference Genome Annotation Project.

Overview

The Phylogenetic Annotation Project performs annotation inferences across evolutionarily related proteins based on the known function of proteins within PANTHER [1] phylogenetic family trees.

PAINT (Phylogenetic Annotation and INference Tool)

PAINT is a Java software application for supporting inference of ancestral as well as present-day characters (represented by ontology terms) in the context of a phylogenetic tree. PAINT is currently being used in the GO Phylogenetic Annotation Project to support inference of GO function terms (molecular function, cellular component and biological process) by homology.

Resources for PAINT annotation

PAINT Annotation principles

The PAINT Annotation principles page describes the PAINT annotation guidelines.

PAINT User Guide

The PAINT User Guide provides annotation guidelines for PAINT annotation as well as step-by-step instructions on how to use the PAINT tool.

PAINT trees curation status

The PAINT trees curation status page provides a list of all Panther trees and the history of their curation status.

PAINT Update pipeline

PAINT GAF production

PAINT GAF QC-examples

PAINT Conference Calls

  • Monthly, on the first Tuesday of the month, 9 AM Pacific/6 PM Europe (as of March 2019)
  • The Zoom link is in the Google Calendar


Reporting bugs or likely errors in the trees

Tree issues

If a Panther tree needs to be reviewed, please create a ticket in the Panther GitHub tracker: https://github.com/pantherdb/Helpdesk/issues

PAINT issues

Issues with the PAINT tools should be reported in this tracker: https://github.com/pantherdb/db-PAINT/issues

Pantree issues

Issues with the Pantree.org site should be reported at: https://github.com/pantherdb/PanTree

Pages to review

Archived & retired Pages

These pages are kept for reference, but the information in them is not the most current.


Past Annotation targets

Target Gene List August 2006-April 2008

Review Status

Last reviewed: 2021-07-01