AmiGO FAQ

From GO Wiki

How do I browse genes from all the different participating databases annotated to a particular term?

The GO Consortium has developed AmiGO for searching and browsing the Gene Ontology and the gene products that member databases have annotated using GO terms. Using AmiGO, you can search for a GO term, or for one or more gene products, and view the associated GO annotations. For a term, the results include the GO hierarchy, the definition and synonyms, external links, and the complete set of gene product associations for the term and any of its children. AmiGO also allows you to filter your results if you wish to see only a subset of the data.

How do I find manually annotated gene products only, i.e. how do I sort by evidence code?

The GO Consortium has developed AmiGO for searching and browsing the Gene Ontology and the gene products that member databases have annotated using GO terms. Using AmiGO, you can search for one or more gene products and view their GO annotations. The results can be filtered so that only annotations with a user-defined set of evidence codes are shown. At present, AmiGO shows only manual annotations (it excludes all annotations with the evidence code IEA), but it will soon allow all annotation data to be shown.
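
The same kind of evidence-code filtering can be approximated outside AmiGO on a downloaded gene association file. The sketch below is illustrative only: it assumes the tab-delimited gene association layout in which the evidence code is the seventh column and comment lines start with "!". The function name and the sample rows are made up for this example.

```python
def filter_by_evidence(lines, exclude=("IEA",)):
    """Yield gene association lines whose evidence code is not excluded."""
    excluded = set(exclude)
    for line in lines:
        # Skip blank lines and "!" comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # too few columns to carry an evidence code
        # Assumption: column 7 (index 6) holds the evidence code.
        if cols[6] not in excluded:
            yield line


# Made-up sample rows, not real annotations.
SAMPLE_ROWS = [
    "!comment line",
    "DB\tID1\tsymA\t\tGO:0005739\tREF:1\tIDA",
    "DB\tID2\tsymB\t\tGO:0005739\tREF:2\tIEA",
]
manual_rows = list(filter_by_evidence(SAMPLE_ROWS))
print(len(manual_rows))
```

Passing a different tuple as `exclude` would filter on another set of evidence codes.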

Where can I view or download the complete sets of GO annotations?

As with the vocabularies, the gene product/GO association sets from contributing groups are available at the GO web site. Tab-delimited files of the associations between gene products and GO terms that are made by the member organizations are available from their individual FTP sites, from the GO FTP site (ftp://ftp.geneontology.org/pub/go/gene-association), or from a link on the Current Annotations table.

The gene association file format is described in the GO annotation guide. These files store IDs for objects (genes/gene products) in the database that contributed the file (e.g. FlyBase IDs, Swiss-Prot accession IDs for proteins) as well as citation and evidence data. There are also files containing Swiss-Prot/TrEMBL protein sequence identifiers for gene products that have been annotated using GO terms; they are available via FTP.

You can also download the annotations in MySQL format from GO Database Downloads. Note, however, that the MySQL database dumps should not be treated as flat files to be parsed directly; they are meant to be loaded into a MySQL database and queried. One issue to be aware of is that the Current Annotations table and the GO database do not agree on what constitutes a single "annotation". The former counts each line in a gene_association file as an annotation, whereas the GO database (and the file you're looking at, if parsed directly) counts a gene product-GO association referenced by a publication as a single annotation, even if it is supported by more than one piece of evidence.
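
To make the two counting conventions concrete, here is a minimal sketch that counts the same gene association lines both ways. It assumes the gene association column order (database, object ID, symbol, qualifier, GO ID, reference, evidence code) and treats the tuple (database, object ID, GO ID, reference) as one association; that key is an approximation for illustration, not the GO database's actual schema. The sample rows are made up.

```python
def count_annotations(lines):
    """Return (line_count, distinct_association_count) for association lines."""
    line_count = 0
    associations = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blanks and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue
        line_count += 1
        # The evidence code (cols[6]) is deliberately left out of the key,
        # so two lines differing only in evidence collapse into one.
        associations.add((cols[0], cols[1], cols[4], cols[5]))
    return line_count, len(associations)


# Made-up rows: the first two differ only in evidence code.
SAMPLE_ROWS = [
    "DB\tID1\tsymA\t\tGO:0005739\tREF:1\tIDA",
    "DB\tID1\tsymA\t\tGO:0005739\tREF:1\tIMP",
    "DB\tID2\tsymB\t\tGO:0005739\tREF:2\tIDA",
]
counts = count_annotations(SAMPLE_ROWS)
print(counts)  # (3, 2)
```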

Where can I find GO annotations of proteins and ESTs?

Gene objects in model organism databases typically have multiple nucleotide sequences from the public databases associated with them, including expressed sequence tags (ESTs) and one or more protein sequences. There are two ways to obtain sets of sequences with GO annotations:

  1. from the model organism databases
  2. from the annotation sets for transcripts and proteins contributed to the GO by Compugen and SWISS-PROT

Obtaining GO annotations for model organism sequence sets: In the gene association files, the GO terms are associated with an accession ID for a gene or gene product from the contributing data resource. Files associating genes with sequence IDs are usually also available from the contributing model organism database. For example, the Mouse Genome Informatics FTP site includes the gene association files contributed to the GO, as well as other reports that include official mouse gene symbols and names and all curated gene-to-sequence ID associations.

Obtaining GO annotations for transcripts and proteins in general: Large transcript and protein sequence data sets are annotated to the GO by Compugen and SWISS-PROT/TrEMBL, respectively. These files can be downloaded directly from the GO web site. The species of origin for each sequence is included in the association files.

How can I get FASTA files of proteins annotated to a particular GO term?

On the GO web site, select the link to the AmiGO browser (which will allow you to search the GO gene associations contributed by all the participating databases) and enter your chosen GO term (e.g. 'mitochondrion') in the Search box. Toggle the 'Terms' button and click on 'Submit Query'. The resulting page will present a list of all Gene Product Associations to the queried term and its children. Note that associations may be filtered by Species, Data Source, and Evidence Code, and can also be restricted to only those gene products annotated directly to the queried term. Check the gene products whose sequences you require and, at the bottom of the page, toggle the option box to 'Get FASTA sequences'. Hit the 'Submit Query' button. If you would like sequences for all of the gene products, click on the 'Select all' option.

How do I find all the human genes that have been annotated with a particular GO term?

GO terms have been associated with a non-redundant set of human proteins described in SWISS-PROT/TrEMBL/InterPro and Ensembl. These annotations are available in the GOA-Human file on the EBI and GO FTP sites.

GOA project data are also accessible from Ensembl and from the EMBL/DDBJ/GenBank nucleotide sequences stored at EMBL-Bank. For more information about browsing GOA project data at EBI, see the EBI's GOA page.
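
Once the GOA-Human file has been downloaded, pulling out the gene products annotated to one term is a simple scan. The sketch below is a rough illustration under stated assumptions: it uses the gene association column order (symbol in column 3, qualifier in column 4, GO ID in column 5), skips annotations qualified with NOT, and only finds direct annotations, so annotations to child terms are not included. The function name and sample rows are made up.

```python
def symbols_for_term(lines, go_id):
    """Return sorted symbols directly annotated to go_id."""
    symbols = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blanks and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5 or cols[4] != go_id:
            continue
        # A NOT qualifier means the gene product is not associated with the term.
        if "NOT" in cols[3].split("|"):
            continue
        symbols.add(cols[2])
    return sorted(symbols)


# Made-up rows, not real annotations.
SAMPLE_ROWS = [
    "DB\tID1\tGENE1\t\tGO:0005739\tREF:1\tIDA",
    "DB\tID2\tGENE2\tNOT\tGO:0005739\tREF:2\tIDA",
    "DB\tID3\tGENE3\t\tGO:0000001\tREF:3\tIMP",
    "DB\tID1\tGENE1\t\tGO:0005739\tREF:4\tIMP",
]
hits = symbols_for_term(SAMPLE_ROWS, "GO:0005739")
print(hits)  # ['GENE1']
```

For a real gzipped file, the lines could be supplied by `gzip.open(path, "rt")` from the Python standard library.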

Is it possible to browse GO database using a GenBank accession number or gi number?

The GO database does not include GenBank accession numbers for annotated genes (or gene products), with the exception of an annotation dataset provided by Compugen, Inc. at ftp://ftp.geneontology.org/pub/go/gene-associations/gene_association.compugen.Genbank.gz and http://www.geneontology.org/doc/Compugen.README

For annotations provided by the GO Annotations at EBI (GOA) project, a file of cross-references to database entries including GenBank/EMBL/DDBJ is available at ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/human.xrefs.gz

For some other annotation sets, there are files containing Swiss-Prot/TrEMBL protein sequence identifiers and model organism database IDs, available from ftp://ftp.geneontology.org/pub/go/gp2protein/

Can I search GO using Boolean operators?

Yes - you can perform this sort of search on the ontologies using the ontology editing tool OBO-Edit, which is developed by the GO Consortium. Full instructions for searching using OBO-Edit are available in the OBO-Edit help menu.

What are the recommended data access policies?

The GO Database server, http://www.godatabase.org, is a shared resource, and thus we require data mining to be performed in a manner that allows others to use this resource at the same time. Any activity that mines the GO Database using AmiGO must be controlled so that only one request is made at a time. You may download and install the database locally. You can also retrieve all the source files that define the data within the database. Details on installing the database locally are available at http://www.godatabase.org/dev/database/
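
One simple way to respect the one-request-at-a-time rule in a script is to issue requests strictly in sequence, never in parallel. The following is a minimal sketch, not an official client: `fetch_sequentially` and its `fetch` callback are hypothetical names, and the pause between requests is an extra courtesy assumed here rather than something the policy specifies.

```python
import time


def fetch_sequentially(urls, fetch, pause_seconds=1.0):
    """Call fetch(url) for each URL in turn, one request at a time."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Assumed courtesy delay between requests.
            time.sleep(pause_seconds)
        # Each call finishes before the next one starts.
        results.append(fetch(url))
    return results


# Demonstration with a stand-in fetch function that makes no network calls.
calls = []


def fake_fetch(url):
    calls.append(url)
    return len(url)


demo = fetch_sequentially(["a", "bb"], fake_fetch, pause_seconds=0)
print(demo)  # [1, 2]
```

In real use, `fetch` could wrap a standard-library call such as `urllib.request.urlopen`.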

For more information, please contact the GO helpdesk.

What is the best way to obtain the GO annotations for a list of UniProt Accession Numbers in batch?

With UniProt accession numbers, you can obtain all GO annotations by parsing the GOA gene association files, which are provided in a simple 15-column tab-delimited format. These files are available from our FTP site at ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/

The GOA project offers a number of different files at this site. To look at the entire collection of GO annotations to proteins in UniProtKB, use: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gene_association.go_uniprot.gz

If you are only interested in proteins from a particular species, we also provide non-redundant, species-specific files for human, mouse, rat, zebrafish, chicken, cow and Arabidopsis proteins. These files are created using the International Protein Index (IPI), which provides a top-level guide to the main databases that describe the proteomes of higher eukaryotic organisms. For example, the human file is at: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/gene_association.goa_human.gz

Further information on the content and format of our gene association files is available from our ReadMe at http://www.ebi.ac.uk/GOA/goaHelp.html

Please contact GOA help for further assistance.
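
As a rough sketch of the batch lookup described above, the function below scans gene association lines once and collects the GO IDs and evidence codes for a set of accessions. It assumes that column 2 holds the accession, column 5 the GO ID, and column 7 the evidence code; the function name, the accessions, and the sample rows are made up for illustration.

```python
from collections import defaultdict


def annotations_for_accessions(lines, accessions):
    """Map each requested accession to a sorted list of (GO ID, evidence code)."""
    wanted = set(accessions)
    found = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blanks and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue
        if cols[1] in wanted:
            found[cols[1]].add((cols[4], cols[6]))
    # Accessions with no annotations map to an empty list.
    return {acc: sorted(found[acc]) for acc in wanted}


# Made-up rows and accessions, not real annotations.
SAMPLE_ROWS = [
    "DB\tACC1\tsymA\t\tGO:0005739\tREF:1\tIDA",
    "DB\tACC1\tsymA\t\tGO:0000001\tREF:2\tIEA",
    "DB\tACC2\tsymB\t\tGO:0005739\tREF:3\tIMP",
]
batch = annotations_for_accessions(SAMPLE_ROWS, ["ACC1", "ACC3"])
print(batch["ACC1"])
```

Reading the file once and testing membership in a set keeps the lookup practical even for the full UniProtKB file and a long accession list.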

What is the best way to link into AmiGO?

AmiGO is under constant development, and we suggest that you check back with this FAQ periodically to get the most recent information.

That being said, we try to keep stable URLs that can be used as links on other sites. The URL formats will be:

Terms

The format to use for linking to GO term information is:

http://amigo.geneontology.org/cgi-bin/amigo/term-details?term=<GO ID>

For example:

http://amigo.geneontology.org/cgi-bin/amigo/term-details?term=GO:0043473

Please do not use the "session_id" argument in the linking URL.

Gene Products

The format to use for linking to GO gene product information is:

http://amigo.geneontology.org/cgi-bin/amigo/gp-details?gp=<DB>:<DB ID>

For example:

http://amigo.geneontology.org/cgi-bin/amigo/gp-details?gp=FB:FBgn0000015
http://amigo.geneontology.org/cgi-bin/amigo/gp-details?gp=MGI:MGI:1861998

Please do not use the "session_id" argument in the linking URL.
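
For sites that generate many links, the two formats above can be wrapped in small helpers. This is an illustrative sketch only: the function names are made up, the GO ID check assumes the usual "GO:" plus seven digits form, and the optional `extension` argument is there so that a ".cgi" suffix can be appended if the server requires it.

```python
import re

AMIGO_BASE = "http://amigo.geneontology.org/cgi-bin/amigo"


def term_url(go_id, extension=""):
    """Build a link to the term details page for a GO ID."""
    # Assumption: GO IDs look like "GO:" followed by seven digits.
    if not re.fullmatch(r"GO:\d{7}", go_id):
        raise ValueError(f"not a GO ID: {go_id!r}")
    return f"{AMIGO_BASE}/term-details{extension}?term={go_id}"


def gene_product_url(db, db_id, extension=""):
    """Build a link to the gene product details page for <DB>:<DB ID>."""
    return f"{AMIGO_BASE}/gp-details{extension}?gp={db}:{db_id}"


print(term_url("GO:0043473"))
print(gene_product_url("FB", "FBgn0000015"))
print(gene_product_url("MGI", "MGI:1861998", extension=".cgi"))
```

No "session_id" argument is ever added, in line with the request above.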

Current details

We are currently undergoing an infrastructure transition (which should be done in a week or so), and until that is complete, the URL formats are:

http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0043473
http://amigo.geneontology.org/cgi-bin/amigo/gp-details.cgi?gp=FB:FBgn0000015
http://amigo.geneontology.org/cgi-bin/amigo/gp-details.cgi?gp=MGI:MGI:1861998

These are the same as the formats above, but with a ".cgi" extension.