Ontology FAQ

From GO Public

What is GO "content"?

GO content refers to the content of the ontologies themselves and the biology underlying them. It includes anything to do with terms and their organisation, definitions, synonyms and the relationships between terms.

What is an ontology?

Ontologies are 'specifications of a relational vocabulary'. In other words, they are sets of defined terms, like those you would find in a dictionary, but the terms are also given hierarchical relationships to one another. The terms in a given vocabulary are likely to be restricted to those used in a particular field or domain; in the case of GO, the terms are all biological.

How can I suggest new GO terms?

The GO vocabularies are updated on a regular basis, and suggestions from the community for additional terms or for other improvements are very welcome. You can make and track your suggestions via the Curator Requests Tracker.

This system is very simple to use - please see the instructions on the GO website.

You can also submit your suggestions to the GO helpdesk.

Is there any way to convert downstream GO terms to a GO slim term?

There is a script, map2slim.pl, that does essentially this. It uses the GO MySQL database and Perl API, so you should familiarize yourself with those. The script is in the directory http://www.fruitfly.org/developers/src/go-dev/apps/query-utils/

Database and API documentation are available.

Web-based implementations of map2slim are also available; GO Term Mapper is a tool developed at Princeton University which can be used to map terms to their corresponding GO slim term for any species, while the SGD Gene Ontology Slim Mapper does the same for Saccharomyces cerevisiae data.
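The general idea of mapping a specific term up to a slim can be sketched in a few lines of Python. This is only an illustration, not the algorithm used by map2slim.pl: the toy graph, the made-up term IDs and the function name map_to_slim are all assumptions, and the walk simply climbs the parents of a term and stops on each path at the first slim term it meets.

```python
from collections import deque

# Toy child -> parents graph; these IDs are invented for illustration only.
PARENTS = {
    "T:leaf": ["T:mid1", "T:mid2"],
    "T:mid1": ["T:slimA"],
    "T:mid2": ["T:slimB"],
    "T:slimA": ["T:root"],
    "T:slimB": ["T:root"],
    "T:root": [],
}


def map_to_slim(term, slim_terms, parents=PARENTS):
    """Return the slim terms first reached when walking up from `term`."""
    found = set()
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in slim_terms:
            found.add(current)
            continue  # do not climb past a slim term on this path
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found


print(map_to_slim("T:leaf", {"T:slimA", "T:slimB", "T:root"}))
```

Because a term can have several parents, a single term may map to more than one slim term, as it does for the hypothetical "T:leaf" here.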

Can a term that is listed two places in an ontology file have children in one place but not the other?

No - the term will always have the same children, wherever and however many times it appears.

Can a term in one ontology have parents in one of the other two ontologies?

Yes - there are now links between the molecular function and biological process ontologies. See "Are there now links between the function and process ontologies?" below.

Does the GO ID have any meaning?

The GO IDs are purely unique identifiers; they do not encode any information about a term or its position relative to other terms in the tree.
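In practice this means software should treat a GO ID as an opaque key and, at most, check that it is well formed. The sketch below assumes the usual shape of a GO ID ("GO:" followed by seven digits, as in the IDs quoted elsewhere in this FAQ); the function name is hypothetical.

```python
import re

# Shape check only: "GO:" followed by exactly seven ASCII digits.
GO_ID_PATTERN = re.compile(r"GO:[0-9]{7}")


def is_well_formed_go_id(identifier):
    """Return True if `identifier` looks like a GO ID; says nothing about meaning."""
    return GO_ID_PATTERN.fullmatch(identifier) is not None


print(is_well_formed_go_id("GO:0008150"))
```

Note that nothing is derived from the digits themselves: numerically close IDs are not necessarily related terms.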

Why is there no definition for my GO ID?

This is because not all GO terms have definitions yet. Currently over 95% of terms are defined, and eventually all GO terms will have a definition.

If you would like to suggest a definition for an undefined term, please submit it to the requests tracker.

Why is the term Gene_Ontology now obsolete?

The former root node GO:0003673 Gene_Ontology is now obsolete because it did not represent an actual biological concept. It was originally created because some software, including AmiGO, relies on there being a root node. The developers have now created an artificial node in the MySQL database called "all" that serves as the root of all possible concepts.

Where have the 'unknown' terms gone?

Good principles of ontological design state that terms should represent biological entities that actually exist, e.g., functional activities that are catalyzed by enzymes, biological processes that are carried out in cells, specific locations or complexes in cells, etc. To adhere to these principles, the Gene Ontology Consortium has removed the terms "biological process unknown" (GO:0000004), "molecular function unknown" (GO:0005554) and "cellular component unknown" (GO:0008372) from the ontology.

The "unknown" terms violated this principle of sound ontological design because they did not represent actual biological entities but instead represented annotation status. Annotations to "unknown" terms distinguished between genes that were curated when no information was available and genes that were not yet curated (i.e., not annotated). Annotation status is now indicated by annotating to the root nodes, i.e. "biological_process" (GO:0008150), "molecular_function" (GO:0003674), or "cellular_component" (GO:0005575). These annotations continue to signify that a given gene product is expected to have a molecular function, biological process, or cellular component, but that no information was available as of the date of annotation.
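For anyone processing annotation data, the distinction described above might be sketched as follows. The root IDs are the ones given in this FAQ; the function name, the status strings and the input (a plain list of GO IDs for one gene product) are assumptions made for illustration.

```python
# Root terms, as listed in this FAQ.
ROOT_TERMS = {
    "GO:0008150": "biological_process",
    "GO:0003674": "molecular_function",
    "GO:0005575": "cellular_component",
}


def annotation_status(go_ids):
    """Classify a gene product by the GO IDs it is annotated to."""
    ids = set(go_ids)
    if not ids:
        return "not yet curated"
    if ids <= set(ROOT_TERMS):
        # Only root-node annotations: curated, but nothing was known.
        return "curated, no information available"
    return "curated, informative annotation"


print(annotation_status(["GO:0008150"]))
```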

Adhering to principles of correct ontology design should allow GO users to take advantage of existing tools and reasoning methods developed by the ontological community.

How can I calculate the 'level' of a GO term?

GO terms do not occupy strict fixed levels in the hierarchy. Because GO is a Directed Acyclic Graph (DAG), a term can appear at different levels depending on which path is followed through the DAG. This is especially true if one mixes is_a and part_of relations. It is therefore more meaningful to ask for the maximum (or minimum, or average) depth of a given term.

We do not pre-generate reports showing this. If you genuinely want this information you can perform SQL queries on our database to get it. See this example.
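To see why a single 'level' is not well defined, here is a minimal Python sketch that computes the minimum and maximum depth of a term in a toy DAG. The graph, the term names and the function name are hypothetical, and depth is counted as the number of edges to the root; this is not the SQL approach referred to above.

```python
from functools import lru_cache

# Toy child -> parents graph; "C" is reachable by a short and a long path.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["A"],
    "C": ["root", "B"],
}


def depths(term, parents=PARENTS):
    """Return (min_depth, max_depth) of `term`, counting edges up to the root."""

    @lru_cache(maxsize=None)
    def walk(node):
        node_parents = parents[node]
        if not node_parents:
            return (0, 0)  # the root is at depth 0
        results = [walk(p) for p in node_parents]
        shortest = 1 + min(r[0] for r in results)
        longest = 1 + max(r[1] for r in results)
        return (shortest, longest)

    return walk(term)


print(depths("C"))
```

In this toy graph "C" sits at depth 1 by one path and depth 3 by another, which is exactly the ambiguity described above.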

But you may want to reconsider whether you want this information at all! The (maximum) depth of a term may not be as informative as you think.

A more informative metric would be the information content of the node based on annotations. See, for example, the work of Alterovitz et al.
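One common way to define annotation-based information content is IC(t) = -log2 p(t), where p(t) is the fraction of all annotations that are made to t or to any of its descendants. The sketch below uses that definition on invented counts; it is an assumption for illustration and not necessarily the exact measure used by Alterovitz et al.

```python
import math

# Toy child -> parents graph and invented direct annotation counts.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],
}
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 4, "C": 2}


def ancestors_and_self(term, parents=PARENTS):
    """Return `term` plus every term reachable by following parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term, parents=PARENTS, counts=DIRECT_COUNTS):
    """IC(term) = -log2(annotations to term or its descendants / all annotations)."""
    cumulative = {t: 0 for t in parents}
    for annotated_term, n in counts.items():
        # Each annotation counts once for the term and once for each ancestor.
        for ancestor in ancestors_and_self(annotated_term, parents):
            cumulative[ancestor] += n
    total = sum(counts.values())
    return -math.log2(cumulative[term] / total)


print(information_content("C"))
```

Under this definition the root always has an information content of zero, and rarely used terms score higher than heavily used ones, regardless of their depth. A term with no annotations at all has no defined value and would raise an error in this sketch.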

What do the cvs version numbers in the obo file mean?

In the header of the obo file you can see something like the following:

 remark: cvs version: $Revision: 1.293 $

Note that this is not a content version; it is the version of the particular file within the version control system. Do not use or cite this version, as it varies depending on where you obtained the file.

UPDATE: April 2009

All cvs version numbers in obo files in the GO CVS repository are now synchronized with the editor's version. However, they will NOT be synchronized in other cvs repositories.

We will soon move to include a data-version tag, as specified in obo-format 1.2.

E.g.

 !data-version 1.1.293

The first number is the major version number, and will most likely stay at "1" for a while. The next two numbers are the minor version, derived from the editor's cvs file.
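If you need to inspect header lines such as the remark shown above, a small reader along the following lines may help. It is a sketch under simple assumptions: header lines have the form "tag: value", the header ends at the first stanza line beginning with "[", and the sample text and function names are made up for illustration. Remember that the cvs revision it extracts is not a content version.

```python
import re


def read_obo_header(lines):
    """Collect 'tag: value' pairs that appear before the first stanza."""
    header = {}
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            break  # first stanza, e.g. [Term]; the header is over
        tag, sep, value = line.partition(":")
        if sep:  # skip blank lines and lines without a tag separator
            header.setdefault(tag.strip(), []).append(value.strip())
    return header


def cvs_revision(header):
    """Return the revision string from a 'cvs version' remark, or None."""
    for remark in header.get("remark", []):
        match = re.search(r"\$Revision:\s*([0-9.]+)\s*\$", remark)
        if match:
            return match.group(1)
    return None


# Hypothetical sample, modelled on the remark line quoted in this FAQ.
SAMPLE = """format-version: 1.2
remark: cvs version: $Revision: 1.293 $

[Term]
id: GO:0008150
""".splitlines()

print(cvs_revision(read_obo_header(SAMPLE)))
```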

Are there now links between the function and process ontologies?

Yes - having successfully implemented the 'regulates' relationships in the biological process (BP) ontology, we have now added 'regulates' relationships within the molecular function (MF) ontology and between the BP and MF ontologies. These changes are only present in the file:

go/ontology/obo_format_1_2/gene_ontology_ext.obo

Specifically, we have made the implicit regulatory relationships between 'regulation of molecular function' BP terms and the corresponding MF terms explicit. For example:

   * regulation of kinase activity (BP) regulates kinase activity (MF) 

Similarly, we have made the implicit regulatory relationships between terms within the MF ontology explicit. For example:

   * calcium channel regulator activity (MF) regulates calcium channel activity (MF) 

The former are the first inter-ontology links in the GO vocabularies. Note that if software has been constructed with the assumption that there are no inter-ontology links, then this software may break when presented with these new inter-ontology links.

Adding these relationships improves the ability of the ontology to represent biology completely and accurately. The average GO user will benefit from these new links because they will be able to ask and answer more complex questions than they could previously. Users must understand what the different relationships mean and how the various GO tools utilize them.

The addition of these links also has major implications for tools that ignore relationship types when summarizing annotations. For example, it is important to understand whether a query will return all children of a term regardless of their relationship to the parent, or whether it can discriminate between relationship types. If your tool of choice lumps annotations to 'calcium channel regulator activity' together with those to its regulates parent 'calcium channel activity', a query for calcium channels will also retrieve gene products that function as calcium channel regulators (and not necessarily as channels!). More sophisticated tools will allow users to customize queries to return results that better reflect their interests. For example, tools that are upgraded to take relationships into consideration will allow users to look for processes or functions, and specify whether to include or exclude their regulates children.
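The difference between a tool that ignores relationship types and one that respects them can be sketched in Python as follows. The edge list is a small illustrative fragment built around the calcium channel example, not an extract of the ontology file, and the function name and default relationship set are assumptions.

```python
# (child, relationship, parent) triples; an illustrative fragment only.
EDGES = [
    ("voltage-gated calcium channel activity", "is_a",
     "calcium channel activity"),
    ("calcium channel regulator activity", "regulates",
     "calcium channel activity"),
    ("calcium channel inhibitor activity", "is_a",
     "calcium channel regulator activity"),
]


def descendants(term, allowed=("is_a", "part_of"), edges=EDGES):
    """Return all terms below `term`, following only the `allowed` relationships."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child, relationship, parent in edges:
            if parent == current and relationship in allowed and child not in found:
                found.add(child)
                stack.append(child)
    return found


# Channels only: regulates links are not followed.
print(descendants("calcium channel activity"))
# Channels and their regulators: regulates links are followed as well.
print(descendants("calcium channel activity",
                  allowed=("is_a", "part_of", "regulates")))
```

With the default setting the query returns only the channel term; once regulates is allowed, the regulator and inhibitor terms are pulled in too, which is the lumping effect described above.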

Can I download the ontologies as an Excel spreadsheet?

Sorry, no. The complex graph structure of GO, where terms can have one or more parent terms, means that it cannot be rendered as a spreadsheet. It would probably also be too big for Excel to cope with.

For more information on the links between the function and process ontologies, see FP-regulates.
