AmiGO Manual: FAQ

From GO Wiki

How do I browse genes from all the different participating databases annotated to a particular term?

The GO Consortium has developed AmiGO for searching and browsing the Gene Ontology and the gene products that member databases have annotated using GO terms. Using AmiGO, you can search for a GO term and view the gene products annotated to it, or search for one or more gene products and view their GO annotations. For a term, the results include the GO hierarchy, the definition and synonyms, external links, and the complete set of gene product associations for the term and any of its children. AmiGO also allows you to filter your results if you wish to see only a subset of the data.

What data does AmiGO use? Are there IEAs? If so, which ones?

This subject is covered more fully in the AmiGO manual's overview.

How do I find manually annotated gene products only, i.e. how do I sort by evidence code?

Search results can be filtered so that only annotations using a user-defined set of evidence codes are shown. At present, AmiGO only uses manual annotations and a limited set of IEA annotations (see the overview for details).
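The same kind of evidence-code filtering can also be applied offline to a downloaded gene association file. The sketch below is a minimal illustration, not AmiGO's own implementation. It assumes the tab-delimited gene association layout in which comment lines start with '!' and the evidence code is the seventh column; the function name and the sample rows are made up for illustration.

```python
def filter_by_evidence(lines, excluded=frozenset({"IEA"})):
    """Yield annotation rows whose evidence code is not in `excluded`."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # skip malformed rows
        if cols[6] not in excluded:  # column 7 = evidence code (assumed layout)
            yield cols


# Made-up sample rows, for illustration only.
sample = [
    "!gaf-version: 2.0",
    "UniProtKB\tFAKE001\tGENE1\t\tGO:0005739\tREF:1\tIDA\t\tC",
    "UniProtKB\tFAKE002\tGENE2\t\tGO:0005739\tREF:2\tIEA\t\tC",
]
manual = list(filter_by_evidence(sample))
print([row[1] for row in manual])
```

Passing a different `excluded` set (or an empty one) changes which evidence codes are dropped.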

Where can I view or download the complete sets of GO annotations?

Annotations can be downloaded either as part of the GO database or as tab-delimited flat files. File format information is linked from these pages.

Where can I find GO annotations of proteins and ESTs?

Gene objects in model organism databases typically have multiple nucleotide sequences from the public databases associated with them, including expressed sequence tags (ESTs) and one or more protein sequences. There are two ways to obtain sets of sequences with GO annotations:

  1. from the model organism databases
  2. from the annotation sets for transcripts and proteins contributed to the GO by Compugen and UniProt

Obtaining GO annotations for model organism sequence sets

In the gene association files, GO terms are associated with an accession ID for a gene or gene product from the contributing data resource. Files that associate these gene IDs with sequence IDs are usually also available from the contributing model organism database. For example, the Mouse Genome Informatics FTP site includes the gene association files contributed to the GO, as well as other reports that include official mouse gene symbols and names and all curated gene : sequence ID associations.

Obtaining GO annotations for transcripts and proteins in general

Large transcript and protein sequence data sets are annotated to the GO by Compugen and UniProt, respectively. These files can be downloaded directly from the GO web site. The species of origin for each sequence is included in the association files.

How can I get FASTA files of proteins annotated to a particular GO term?

On the GO web site, select the link to the AmiGO browser (which will allow you to search the GO gene associations contributed by all the participating databases) and enter your chosen GO term (e.g. 'mitochondrion') in the Search box. Toggle the 'Terms' button and click on 'Submit Query'. The resulting page will present a list of all Gene Product Associations to the queried term and its children. Note that associations may be filtered by Species, Data Source, and Evidence Code, and may be restricted to only those gene products annotated directly to the queried term. Check the gene products whose sequences you require, or click on the 'Select all' option if you would like sequences for all of them. Then, at the bottom of the page, toggle the option box to 'Get FASTA sequences' and hit the 'Submit Query' button.

How do I find all the human genes that have been annotated with a particular GO term?

GO terms have been associated with a non-redundant set of human proteins described in SWISS-PROT/TrEMBL/InterPro and Ensembl. These annotations are available in the GOA-Human file on the EBI and GO FTP sites.

GOA project data are also accessible from Ensembl and from the EMBL/DDBJ/GenBank nucleotide sequences stored at EMBL-Bank. For more information about browsing GOA project data at EBI, see the EBI's GOA page.
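Once the GOA-Human file has been downloaded, the gene products annotated directly to a term can be pulled out with a short script. This is a rough sketch under stated assumptions: the file is taken to be in the tab-delimited gene association format (object ID in column 2, symbol in column 3, GO ID in column 5), and the helper name and sample rows are hypothetical. Unlike AmiGO, it does not include annotations to child terms, which would also require the ontology structure.

```python
def products_for_term(lines, go_id):
    """Return sorted (object_id, symbol) pairs annotated directly to go_id."""
    found = set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        # Column 5 = GO ID (assumed layout); a set removes duplicate rows.
        if len(cols) >= 5 and cols[4] == go_id:
            found.add((cols[1], cols[2]))
    return sorted(found)


# Made-up sample rows, for illustration only.
rows = [
    "!gaf-version: 2.0",
    "UniProtKB\tFAKE003\tGENE3\t\tGO:0005739\tREF:3\tIDA\t\tC",
    "UniProtKB\tFAKE001\tGENE1\t\tGO:0005739\tREF:1\tTAS\t\tC",
    "UniProtKB\tFAKE001\tGENE1\t\tGO:0005739\tREF:4\tIDA\t\tC",
    "UniProtKB\tFAKE002\tGENE2\t\tGO:0005634\tREF:2\tIDA\t\tC",
]
print(products_for_term(rows, "GO:0005739"))
```

For a real download, the lines could be read with `gzip.open(path, "rt")`, where `path` is wherever the file was saved locally.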

What gene or protein IDs should I use?

The list of authoritative database groups for certain species identifies the database groups that assume sole responsibility for collecting and submitting annotations for one or more species. If you can convert your IDs into the IDs used by the relevant database group, you will be able to find the data you are looking for far more quickly and efficiently.

We maintain a list of suggested resources for mapping gene and protein IDs.

Is it possible to browse the GO database using a GenBank accession number or gi number?

The GO database does not include GenBank accession numbers for annotated genes (or gene products), with the exception of an annotation dataset provided by Compugen, Inc. at ftp://ftp.geneontology.org/pub/go/gene-associations/gene_association.compugen.Genbank.gz and http://www.geneontology.org/doc/Compugen.README

For annotations provided by the GO Annotations at EBI (GOA) project, a file of cross-references to database entries including GenBank/EMBL/DDBJ is available at ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/human.xrefs.gz

For some other annotation sets, there are files containing Swiss-Prot/TrEMBL protein sequence identifiers and model organism database IDs, available from ftp://ftp.geneontology.org/pub/go/gp2protein/

Can I search GO using Boolean operators?

Yes - you can perform this sort of search on the ontologies using the ontology editing tool OBO-Edit, which is developed by the GO Consortium. Full instructions for searching using OBO-Edit are available in the OBO-Edit help menu.

What are the recommended data access policies?

The GO database server is a shared resource, so we require data mining to be performed in a manner that allows others to use the resource at the same time. Any activity that mines the GO database using AmiGO must be controlled so that only one request is made at a time. Alternatively, you may download and install the database locally, and you can also retrieve all the source files that define the data within the database. More details on the database, including downloads and installation, can be found in the GO database guide.
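One simple way to honour the one-request-at-a-time rule in a script is to issue queries strictly in sequence, waiting for each response before sending the next and pausing briefly in between. The sketch below assumes a caller-supplied `fetch` function (hypothetical; it stands in for whatever performs a single AmiGO request), and the one-second default pause is an arbitrary choice rather than a documented limit.

```python
import time


def fetch_serially(queries, fetch, pause_seconds=1.0, sleep=time.sleep):
    """Run fetch(query) for each query, one at a time, pausing between calls."""
    results = {}
    for i, query in enumerate(queries):
        if i > 0:
            sleep(pause_seconds)  # leave a gap between consecutive requests
        results[query] = fetch(query)  # blocks until this request completes
    return results


# Illustration with a stand-in fetch function; no network access happens here.
demo = fetch_serially(
    ["GO:0005739", "GO:0005634"],
    fetch=lambda q: q.lower(),
    pause_seconds=0.0,
)
print(demo)
```

Because nothing is run in parallel, at most one request is ever outstanding. For large jobs, a local installation of the database is likely the better option.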

For more information, please contact the GO helpdesk.

What is the best way to obtain the GO annotations for a list of UniProt Accession Numbers in batch?

With UniProt accession numbers, you can obtain all GO annotations by parsing a GOA gene association file. These files are provided in a simple tab-delimited format and are available from the GOA ftp site.

The GOA project offers users a number of different files; for example:

  • all UniProtKB proteins with GO annotation
  • human proteins
  • non-redundant, species-specific files for human, mouse, rat, zebrafish, chicken, cow and Arabidopsis proteins, for users interested only in proteins from a particular species (these files are created using the International Protein Index (IPI), which provides a top-level guide to the main databases that describe the proteomes of higher eukaryotic organisms)

Further information on the content and format of our gene association files can be found in the ReadMe.
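As a rough illustration of batch lookup from a gene association file, the sketch below collects the GO IDs and evidence codes for a list of accessions. It is not an official GOA tool. It assumes the tab-delimited layout with the accession in column 2, the GO ID in column 5 and the evidence code in column 7, and that lines beginning with '!' are comments; the function name and sample rows are invented for the example.

```python
from collections import defaultdict


def annotations_for(lines, accessions):
    """Map each requested accession to a sorted list of (GO ID, evidence code)."""
    accessions = list(accessions)
    wanted = set(accessions)  # set membership keeps the scan fast for long lists
    hits = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 7 and cols[1] in wanted:
            hits[cols[1]].add((cols[4], cols[6]))
    # Accessions with no annotation in the file map to an empty list.
    return {acc: sorted(hits[acc]) for acc in accessions}


# Made-up sample rows, for illustration only.
gaf = [
    "!gaf-version: 2.0",
    "UniProtKB\tFAKE001\tGENE1\t\tGO:0005739\tREF:1\tIDA\t\tC",
    "UniProtKB\tFAKE001\tGENE1\t\tGO:0005634\tREF:2\tIEA\t\tC",
    "UniProtKB\tFAKE002\tGENE2\t\tGO:0005739\tREF:3\tTAS\t\tC",
]
print(annotations_for(gaf, ["FAKE001", "FAKE999"]))
```

The file is scanned once regardless of how many accessions are requested. A compressed download could be read line by line with `gzip.open(path, "rt")`, where `path` is a local file name of your choosing.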

Please contact GOA help for further assistance.

What is the best way to link into AmiGO?

AmiGO is under constant development, so we suggest that you check back periodically for the most recent information (and code accordingly). That being said, please check the wiki pages on linking.

How do I install AmiGO locally?

Full documentation for downloading and installing AmiGO is available here.

Can I get citations from AmiGO?

Traffic citations: probably not; however, AmiGO is compatible with Zotero, a Firefox extension for managing references, allowing users to download information about the publications cited.