Warning: The contents of the FAQs that have been struck out have been moved to the new website.

If you do not find the answer to your question here, you can email the GO helpdesk.

Questions by Category


The GO database

Annotation FAQ

GOA (GO Annotation @ EBI)

Ontology and GO content related questions

Mappings to other classification systems

GO software and tools

File Formats

GO Consortium

Applications of GO

Getting involved in GO

Legal issues

General GO Questions

What is GO?

The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The GO collaborators are developing three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner. There are three separate aspects to this effort: first, we write and maintain the ontologies themselves; second, we make cross-links between the ontologies and the genes and gene products in the collaborating databases; and third, we develop tools that facilitate the creation, maintenance and use of ontologies.

The use of GO terms by several collaborating databases facilitates uniform queries across them. The controlled vocabularies are structured so that you can query them at different levels: for example, you can use GO to find all the gene products in the mouse genome that are involved in signal transduction, or you can zoom in on all the receptor tyrosine kinases. This structure also allows annotators to assign properties to gene products at different levels, depending on how much is known about a gene product.
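As a rough illustration of querying at different levels, the sketch below uses a tiny hand-made hierarchy. The term names, gene product names and the PARENTS and ANNOTATIONS tables are invented for the example and are not real GO data; the real ontologies are much larger and contain more relationship types.

```python
# Toy fragment of an ontology: child term -> list of parent terms.
# Term names and gene products are illustrative only, not real GO data.
PARENTS = {
    "receptor tyrosine kinase signaling": ["signal transduction"],
    "G protein-coupled receptor signaling": ["signal transduction"],
    "signal transduction": ["biological process"],
}

# Hypothetical annotations: gene product -> terms it is annotated to.
ANNOTATIONS = {
    "geneA": ["receptor tyrosine kinase signaling"],
    "geneB": ["G protein-coupled receptor signaling"],
    "geneC": ["signal transduction"],
    "geneD": ["biological process"],
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def products_for(term):
    """Gene products annotated to the term or to any of its descendants."""
    return sorted(
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(term in ancestors(t) for t in terms)
    )


broad = products_for("signal transduction")
narrow = products_for("receptor tyrosine kinase signaling")
```

Querying the broad term collects products annotated anywhere beneath it, while querying the narrow term returns only the products annotated at that level of detail.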

Why do we need GO?

To ask meaningful questions, biologists often need to retrieve and analyse data from disparate sources. For example, if you were searching for new targets for antibiotics, you might want to find all the gene products that are involved in bacterial protein synthesis, but that have significantly different sequences or structures from those in humans. But if one database describes these molecules as being involved in 'translation', whereas another uses the phrase 'protein synthesis', it will be difficult for you - and even harder for a computer - to find functionally equivalent terms.

The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The GO collaborators are developing three ontologies - a word used by computer scientists to mean 'specifications of a relational vocabulary' - that describe biological processes, cellular components and molecular functions in a species-independent manner.

Ontologies provide a vocabulary for representing and communicating knowledge about a topic, and a set of relationships that hold among the terms of the vocabulary. They can be structurally very complex, or relatively simple. Most importantly, ontologies capture domain knowledge in a way that can easily be processed by a computer. Because the terms in an ontology and the relationships between the terms are carefully defined, the use of ontologies facilitates making standard annotations, improves computational queries, and can support the construction of inference statements from the information at hand.

Genomic sequencing projects and microarray experiments alike produce electronically generated data flows that require computer-accessible systems to work with the information. As systems that make domain knowledge available to both humans and computers, GO and the many other bio-ontologies being created (see the OBO web page for some examples) are essential to the process of extracting biological insight from enormous sets of data.

Which biological domains are supported by GO?

The current ontologies of the GO project are molecular function, biological process, and cellular component. These three areas are considered independent of each other. The ontologies are developed to include all terms falling into these domains without consideration of whether the biological attribute is restricted to certain taxonomic groups. Therefore, biological processes that occur only in plants (e.g. photosynthesis) or mammals (e.g. lactation) are included.

Other biological ontologies are discussed on the OBO website.

Can I reason over GO?

It is possible to do some reasoning over GO now, and we expect to do much more in the future. We provide logical definitions or 'cross-products' for some terms as part of the extended GO file, which can be reasoned over. Our ontology editor OBO-Edit has a reasoner integrated.

For more information see the cross-product documentation and this 2011 J. Biomed. Informatics paper.

What is beyond the scope of the GO project?

Almost as important as understanding the scope of the GO project is understanding what the GO project is not. The most common misapprehensions are (1) that the GO is a system for naming genes and proteins and (2) that the GO attempts to describe all of biology. The GO neither names genes or gene products, nor attempts to provide structured vocabularies beyond its three domains: molecular function, biological process and cellular component.

GO is not a nomenclature for genes or gene products. The vocabularies describe molecular phenomena (e.g. programmed cell death), not biological objects (e.g. proteins or genes). Sharing gene product names would entail tracking evolutionary histories and reflecting both orthologous and paralogous relationships between gene products. Different research communities have different naming conventions. Different organisms have different numbers of members in gene families. The GO project focuses on the development of vocabularies to describe attributes of biological objects, not on the naming of the objects themselves. This point is particularly important to understand because many genes and gene products are named for their function.

How do I find GO annotations for 'my' genes?

The GO Consortium has developed AmiGO for searching and browsing the Gene Ontology and the gene products that member databases have annotated using GO terms. Using AmiGO, you can search for one or more gene products and view their GO annotations. More AmiGO questions...

Where can I view or download the complete sets of GO annotations?

As with the vocabularies, the gene product sets (gene association files) from contributing groups are freely available; you can download them from the annotation downloads section of the GO website. The files are in tab-delimited text; the file format is described in the GO annotation guide. Gene association files contain all evidence pertinent to the annotation, including database IDs and gene product names, as well as citation and evidence data.
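A minimal sketch of reading such a tab-delimited file is shown below. It assumes the 17-column GAF 2.x layout and lines beginning with '!' as comments; the GO annotation guide remains the authority on the format. The column keys, the read_gaf helper and the sample row are all made up for illustration, and the identifiers in the row are placeholders rather than real records.

```python
import csv
import io

# Column keys assumed from the GAF 2.x layout (17 tab-separated columns).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]


def read_gaf(handle):
    """Yield one dict per annotation row, skipping blank and '!' comment lines."""
    rows = (line for line in handle if line.strip() and not line.startswith("!"))
    reader = csv.reader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # Pad short rows so files with fewer trailing columns still map cleanly.
        fields = fields + [""] * (len(GAF_COLUMNS) - len(fields))
        yield dict(zip(GAF_COLUMNS, fields))


# Made-up example content; every identifier here is a placeholder.
SAMPLE = (
    "!gaf-version: 2.1\n"
    "ExampleDB\tX0001\tgenA\t\tGO:0000000\tPMID:0000000\tIEA\t\tP\t"
    "example protein\t\tprotein\ttaxon:0000\t20140101\tExampleDB\t\t\n"
)

annotations = list(read_gaf(io.StringIO(SAMPLE)))
```

To read a downloaded file instead of the sample string, pass an open text file handle to read_gaf; compressed downloads would first need to be opened with a decompressing reader.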

Sometimes the number of GO annotations changes significantly over a short period of time. Why?

Most annotations in association files are electronically inferred (IEA). As with all types of annotations, IEAs change over time, with an overall increasing trend. However, in the specific case of IEAs, significant fluctuations in numbers may sometimes be observed over a short period of time. These are nearly always due not to bugs but to one or more of the following reasons:

  • All IEA annotations that are over one year old are removed from association files. This is part of quality control procedures. Another procedure, which the GO started implementing in mid-2014, is taxonomic checking. A technical summary of annotation QC checks may be found here: http://geneontology.org/page/annotation-quality-control
  • Electronic annotations are provided to UniProt-GOA by various groups, including Ensembl, InterPro and UniProt. UniProt-GOA then includes these in the annotation files that it submits to the GO Consortium. There are numerous reasons why electronic annotations can fluctuate; e.g., InterPro may have changed a mapping that affected a large number of annotations; a mapping between a GO term and a UniProt keyword may have been added or removed; Ensembl may have changed their orthology sets; new quality checking procedures may have been introduced; a supplying group may have had a problem providing the annotations. Since a single electronic annotation method tends to affect a large number of proteins, fluctuations are likely to be larger than in a manual annotation set. UniProt-GOA aims to record all the known changes to the datasets they provide in the release notes here: http://www.ebi.ac.uk/GOA/news
  • Lastly, new genome assemblies for various species are periodically released, and that may contribute to changes in gene annotations.
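To make the first point above concrete, the sketch below shows how an age-based filter applied only to IEA annotations can change the totals from one release to the next. It is an illustration and not the Consortium's actual QC code: the 365-day cutoff is an assumed reading of "over one year old", and the function names and records are invented.

```python
from collections import Counter
from datetime import date, timedelta


def drop_stale_iea(annotations, today, max_age_days=365):
    """Keep every non-IEA annotation and only IEAs dated on or after the cutoff."""
    cutoff = today - timedelta(days=max_age_days)  # assumed one-year rule
    return [
        a for a in annotations
        if a["evidence_code"] != "IEA" or a["date"] >= cutoff
    ]


def count_by_evidence(annotations):
    """Tally annotations per evidence code."""
    return Counter(a["evidence_code"] for a in annotations)


# Made-up records for illustration only.
records = [
    {"evidence_code": "IEA", "date": date(2013, 1, 15)},
    {"evidence_code": "IEA", "date": date(2014, 5, 1)},
    {"evidence_code": "IDA", "date": date(2010, 3, 2)},
]

kept = drop_stale_iea(records, today=date(2014, 6, 1))
before = count_by_evidence(records)
after = count_by_evidence(kept)
```

Comparing the before and after tallies per evidence code is a quick way to check whether a change in file size is confined to IEA annotations, which would point to one of the reasons listed above rather than to a bug.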

However, if you think that an observed change in the size of an annotation file cannot be explained by any of the above, and suspect a bug, please contact us using the form here: http://geneontology.org/form/contact-go

How do I browse the GO?

The GO Consortium has developed AmiGO for searching and browsing the Gene Ontology and the gene products that member databases have annotated using GO terms. Browsing the GO tree or searching for a term allows you to see term information and the hierarchy for the term, cross-references to external databases, and the complete set of gene product associations for the term and any of its children. More AmiGO questions...

Other tools with GO browsing capabilities can be found on the GO tools page of the GO website.

How do I cite GO?

The GO database and vocabularies are in the public domain. The annotations provided by member organizations in the Current Annotations table are also in the public domain. There are no restrictions on their use, although third parties are asked to give appropriate acknowledgement to the GO Consortium and to the appropriate member organization(s). To reference the Gene Ontology Consortium, please cite this paper:

Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium (2000) Nature Genet. 25: 25-29

We also recommend that you include in your paper the date on which you retrieved the GO information. The GO ontology, gene_associations, and documentation files have version numbers and dates, which may be used for this purpose. The GO is evolving, and changes will occur over time.