Beginning Annotation SOP

From GO Wiki

Revision as of 13:36, 16 October 2006

This page is a place to build the SOP on beginning annotation.


These are the guidelines that the PIs gave me:

The initial step would be to create a document that outlines the annotation process. In addition, a case study would be very useful, such as how the chicken genome came to be annotated: what order events happened in, how the timing worked, what software was used, how the group interacted with their GOA mentors, and so on.

As for the documentation, you might start with this outline:

First, a brief statement about how the annotation process starts once the genes or gene products are defined (i.e. unique, stable identifiers from UniProt or RefSeq are available for their sequences). Second, the document should include steps for doing GO annotations by various methods, including automated methods, such as the InterProScan approach or incorporating experimentally based annotations of orthologs, and curated methods, such as assigning literature-based (experimental) GO annotations. The document should provide pointers to the existing documentation wherever possible. Third, there should be information on the gene association file format and how to submit.

Once this documentation [essentially a 'standard operating procedure', not a detailed how-to] is defined, it can then be used to frame the inquiries of annotation groups and to support these groups in many contexts.


SOP for starting annotation

Automatic Annotation

A. Automatic GO annotation tools

Many groups have developed GO-related annotation tools. Look at the annotation tools on the GO website:

http://www.geneontology.org/GO.tools.annotation.shtml

Please write to the GO-Friends mailing list if you have specific annotation tool needs. All the tool developers are there and will help you choose a good tool, or may modify a tool to include the functionality that you need. Mail go-friends at geneontology.org.

B. Automatic annotation based on GO mapping files and GO-annotated protein datasets for those users with database infrastructure in place.

1) sequence-based methods

BLAST2GO: You may also like to try BLAST2GO to find GO annotations for sequences similar to yours.
http://www.geneontology.org/GO.tools.other.shtml#blast2go
Use GOst
http://www.godatabase.org/cgi-bin/gost/gost.cgi

2) Domain-based comparison methods

InterProScan: Run your sequences through InterProScan, either online or by downloading and running the application on your own computer.
http://www.ebi.ac.uk/InterProScan/
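
If you run InterProScan locally, collecting the GO terms it reports for each sequence can be sketched as below. This is a minimal sketch, not part of the original SOP. It assumes tab-separated output in which the first column is the protein accession and the 14th column holds GO identifiers separated by "|" (the layout of current InterProScan 5 TSV output when GO lookup is enabled); check the column position against the version you run. The function name is hypothetical.

```python
import re
from collections import defaultdict

GO_ID = re.compile(r"GO:\d{7}")
GO_COLUMN = 13  # zero-based index; ASSUMED position of the GO column in the TSV


def go_terms_from_interproscan_tsv(lines):
    """Collect the GO IDs reported for each protein accession.

    `lines` is any iterable of TSV rows (an open file works).
    Rows too short to contain the GO column are skipped.
    """
    terms = defaultdict(set)
    for raw in lines:
        row = raw.rstrip("\n").split("\t")
        if len(row) <= GO_COLUMN:
            continue
        # findall ignores "-" placeholders and any suffix after the ID
        terms[row[0]].update(GO_ID.findall(row[GO_COLUMN]))
    return {protein: sorted(ids) for protein, ids in terms.items() if ids}
```

Annotations derived this way are electronic and unreviewed; the manual annotation steps later in this SOP still apply.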

3) other

keyword2go - a mapping of Swiss-Prot keywords to GO
ec2go - a mapping of EC numbers to GO
See the additional mappings page on the GO website (link here).
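
For users with their own infrastructure, applying a mapping file such as ec2go can be sketched as follows. This is a minimal sketch under stated assumptions: each mapping line is taken to have the shape "EC:1.1.1.1 > GO:term name ; GO:0004022", with comment lines starting with "!". Verify this against the file you download. The function names and the gene identifiers in the usage note are hypothetical.

```python
import re
from collections import defaultdict

# ASSUMED line shape: "<external id> > GO:<term name> ; GO:<7 digits>"
_MAPPING_LINE = re.compile(
    r"^(?P<ext>\S+)\s+>\s+GO:(?P<name>.*?)\s*;\s*(?P<go>GO:\d{7})\s*$"
)


def parse_ext2go(lines):
    """Return {external_id: [(go_id, go_term_name), ...]} from mapping lines."""
    mapping = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):  # skip blanks and comments
            continue
        match = _MAPPING_LINE.match(line)
        if match:  # lines that do not fit the assumed shape are ignored
            mapping[match.group("ext")].append(
                (match.group("go"), match.group("name"))
            )
    return dict(mapping)


def annotate_by_ec(gene_to_ec, ec2go):
    """Translate each gene's EC numbers into GO IDs via the parsed mapping."""
    return {
        gene: [go_id for ec in ecs for go_id, _ in ec2go.get(ec, [])]
        for gene, ecs in gene_to_ec.items()
    }
```

For example, `annotate_by_ec({"geneA": ["EC:1.1.1.1"]}, parse_ext2go(open("ec2go")))` would list the GO IDs mapped to that EC number, assuming a local copy of the mapping file named `ec2go`.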

C. Annotation Services

1) TIGR's Annotation Engine Service for prokaryotic genomes. This free service provides automatic annotation and database infrastructure to anyone with a prokaryotic DNA sequence they wish to annotate.
www.tigr.org/AnnotationEngine

2) GenDB

D. GOblet
http://goblet.molgen.mpg.de/

E. If you do not have any database infrastructure you can use the public repositories.

A) Submit your sequences to one of the large public repositories.
EBI Submissions, including EMBL-Bank: http://www.ebi.ac.uk/Submissions/
GenBank Submissions: http://www.ncbi.nlm.nih.gov/Genbank/submit.html
DDBJ Submission: http://www.ddbj.nig.ac.jp/sub-e.html

(EMBL-Bank, GenBank and DDBJ exchange data amongst themselves, so you can use any of these submission interfaces and have your data appear in all three resources.)

Once your sequences have been processed and passed along the pipeline to all the related databases, you will be able to retrieve:

1) Unique stable identifiers, such as UniProt or RefSeq identifiers, for your sequences.

2) Some level of automatic GO annotation for your sequences. (Publicly available.)
(I am hoping that I can insert a picture here that shows where sequences go in and where annotated sequences come out so that people can choose their favourite provider and be sure that they know what they are getting and what they are missing out on by choosing to download at that point.)


Manual Annotation

1. Literature-based manual annotation - Check and improve your annotations against the literature.

1) Read the manual annotation guidelines on the GO Consortium website.
http://www.geneontology.org/GO.annotation.shtml

2) Contact the GO Consortium to ask about annotation camps and mentoring.
[insert description of camps and mentoring]

2. Sequence-based manual annotation.

Manual annotation based on sequence similarity involves the manual review of a range of sequence-based search data, including BLAST-type searches, domain-based searches (InterPro, Pfam, TIGRFAMs, PROSITE, etc.), SignalP, TMHMM, paralogous families, COGs, etc. The annotator evaluates this information by looking at alignments, scores, etc., while taking into consideration the genomic context of the gene product being annotated, including neighboring genes, possible operons, syntenic regions, pathway and system reconstruction, etc.

PAMGO example

Here is an example of how a new group has started working with the GO Consortium.

The Plant-Associated Microbe Gene Ontology (PAMGO) Group

In 2003 the genome sequence of the tomato pathogen Pseudomonas syringae pv. tomato DC3000 was published. This project was a collaboration between Robin Buell at The Institute for Genomic Research (TIGR) and Alan Collmer of Cornell University. As part of the annotation of P. syringae, TIGR provided some GO assignments to the P. syringae proteins. Discussion between Alan Collmer and Brett Tyler at the NSF Plant Genome Research Program Awardees Meeting that fall revealed a shared awareness of the potential power of the GO and led to the formation of the Plant-Associated Microbe Gene Ontology (PAMGO) working group. Brett Tyler coordinated the effort to bring together PIs from genome projects representing the major groups of microbial pathogens: Bacteria, Fungi, Oomycetes, and Nematodes. The PAMGO group recognized the potential power of the GO to greatly facilitate research in areas common to all these pathogens by providing a robust framework for comparing functions across species. Since TIGR is a member of the GO Consortium, the new PAMGO group entered into collaboration with TIGR staff Michelle Gwinn-Giglio and Linda Hannick to develop terms specific for interactions between pathogens and their hosts.

During 2004 the PAMGO Interest Group worked to develop high-level terms to describe processes relevant to plant-microbe associations, which would provide a framework for the later development of more detailed terms. Candace Collmer (Wells College), while on sabbatical leave, and Michelle Gwinn-Giglio (TIGR) led the effort. This activity began with a full-day workshop of all the PAMGO participants on April 23, 2004 at TIGR. The workshop participants defined a set of high-level terms and relationships that would be as general as possible, not only for pathogens of all kingdoms, but for the whole range of host-microbe interactions from mutualism to parasitism, and for all hosts, not only plants. Further refinement of the terms and their definitions occurred by email, and on June 2, 2004 the proposed terms were submitted to the GO community for discussion. On Aug 22-23, 2004 Candace and Alan Collmer and Michelle Gwinn-Giglio presented the proposal at a GO content meeting focused on pathogenesis, metabolism, and the cell cycle at the Carnegie Institution, Stanford, CA, and on Oct. 15-16, 2004 Michelle Gwinn-Giglio presented three modified options to a GO Consortium Meeting in Chicago. These high-level terms generated much debate, both at the original workshop and within the wider GO community, because of the varied ways in which different communities use words such as "Symbiosis" and "Pathogenesis", and the difficulty of defining the term "Pathogenesis" consistently, given that some organisms may or may not cause disease depending on the physical environment and the physiological or genetic status of the host. This discussion highlighted the varied usage of these terms and stimulated user communities to think about how these terms should be used. A final version was agreed upon and resubmitted to GO on Dec 14, 2004 and made part of the active ontologies on Jan 31, 2005.

In addition to the term development activities in 2004, the PAMGO group was also busy writing a grant to the NSF/USDA Microbial Genome Sequencing Program to fund their GO development work. Fortunately, the grant was awarded and provides 3 years of funding (Fall 2005-Fall 2008) for PAMGO to continue the development of more granular terms under the initial PAMGO term set. The PAMGO group is now actively working on terms that will describe the myriad ways that pathogens affect the metabolism of their hosts. A PAMGO jamboree is scheduled for summer 2006 where it is expected that these additional terms will be entered into the active ontologies. Using the PAMGO terms, as well as the rest of the GO ontologies, PAMGO annotators are assigning GO terms to the proteins from the PAMGO organisms that have a role in interacting with their hosts. It is anticipated that PAMGO will begin sending in association files of these annotations at the end of this year.

PAMGO people and pathogens:

Virginia Bioinformatics Institute

Phytophthora sojae (Oomycete)
Phytophthora ramorum (Oomycete)
Brett Tyler
Trudy Torto-Alalibo
Marcus Chibucos
Rays Jiang

Agrobacterium tumefaciens (Bacterium)
Joao Setubal
Joshua Shallom

Cornell University

Pseudomonas syringae pv. tomato DC3000 (Bacterium)
Pseudomonas syringae pv. phaseolicola 1448A (Bacterium)
Pseudomonas syringae pv. syringae B728A (Bacterium)
Alan Collmer
Magdalen Lindeberg
Candace Collmer (Wells College, September-May)

University of Wisconsin

Erwinia chrysanthemi 3937 (Bacterium)
Nicole Perna
Jeremy Glasner

North Carolina State University

Magnaporthe grisea (Fungus)
Meloidogyne hapla (Nematode)
Ralph Dean
David Bird
Thomas Mitchell
Shaowu Meng

The Institute for Genomic Research

Michelle Gwinn-Giglio
Linda Hannick
Robin Buell
Owen White