Principles for creating a GO term

General Rules For GO Terms

The Gene Ontology (GO) describes our knowledge of the biological domain with respect to three aspects (original source: GO website ontology documentation):

Molecular Function

Molecular-level activities performed by gene products. Molecular function terms describe activities that occur at the molecular level, such as “catalysis” or “transport”. GO molecular function terms represent activities rather than the entities (molecules or complexes) that perform the actions, and do not specify where, when, or in what context the action takes place. Molecular functions generally correspond to activities that can be performed by individual gene products (i.e., a protein or RNA), but some activities are performed by molecular complexes composed of multiple gene products. Examples of broad functional terms are catalytic activity and transporter activity; examples of narrower functional terms are adenylate cyclase activity or Toll-like receptor binding. To avoid confusion between gene product names and their molecular functions, GO molecular function terms often end with the word “activity” (a protein kinase would have the GO molecular function protein kinase activity).

Cellular Component

The locations relative to cellular structures in which a gene product performs a function, either cellular compartments (e.g., mitochondrion), or stable macromolecular complexes of which they are parts (e.g., the ribosome). Unlike the other aspects of GO, cellular component classes refer not to processes but rather to cellular anatomy.

Biological Process

The larger processes, or ‘biological programs’ accomplished by multiple molecular activities. Examples of broad biological process terms are DNA repair or signal transduction. Examples of more specific terms are pyrimidine nucleobase biosynthetic process or glucose transmembrane transport. Note that a biological process is not equivalent to a pathway. At present, the GO does not try to represent the dynamics or dependencies that would be required to fully describe a pathway.

Naming conventions

Points For Style

The following stylistic points should be applied to all aspects of the ontologies.

Spelling conventions

Where the accepted spelling differs between British and US usage, use the US form, e.g. polymerizing, signaling, rather than polymerising, signalling. There is a dictionary of words used in GO terms in the file GODict.DAT.


Abbreviations

Avoid abbreviations unless they're self-explanatory. Use full element names, not symbols. Use hydrogen for H+. Use copper and zinc rather than Cu and Zn. Use copper(I), copper(II), etc., rather than cuprous, cupric, etc. For biomolecules, spell out the term in full wherever practical: use fibroblast growth factor, not FGF.

Greek symbols

Spell out Greek symbols in full: e.g. alpha, beta, gamma.
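
As a minimal sketch of how this rule could be checked automatically, the helper below replaces Greek characters in a candidate term string with their spelled-out names. The function name `spell_out_greek`, the partial letter table, and the sample string are illustrative assumptions, not part of any GO tooling.

```python
# Partial, hand-written table (an assumption for illustration);
# extend it with any further letters that occur in your terms.
GREEK_NAMES = {
    "α": "alpha",
    "β": "beta",
    "γ": "gamma",
    "δ": "delta",
    "ε": "epsilon",
    "κ": "kappa",
    "λ": "lambda",
    "μ": "mu",
    "σ": "sigma",
    "ω": "omega",
}


def spell_out_greek(term: str) -> str:
    """Return the term with known Greek letters replaced by their names."""
    return "".join(GREEK_NAMES.get(ch, ch) for ch in term)


# Hypothetical input string, used only to show the substitution.
print(spell_out_greek("α-ketoglutarate"))
```

Characters that are not in the table are passed through unchanged, so the helper is safe to run on terms that are already spelled out.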

Upper vs. lower case

GO terms are all lower case except where demanded by context, e.g. DNA, not dna.

Singular vs. Plural

Use the singular form of the term, except where a term is only used in the plural (e.g. caveolae).

Be Descriptive

Aim to be reasonably descriptive, even at the risk of some verbal redundancy. Remember, databases that refer to GO terms might list only the finest-level terms associated with a particular gene product. If the parent is aromatic amino acid family biosynthesis, then the child should be aromatic amino acid family biosynthesis, anthranilate pathway, not just "anthranilate pathway".

Anatomical Qualifiers

Do not use anatomical qualifiers in the biological process and molecular function ontologies. For example, GO has the molecular function term DNA-directed DNA polymerase activity but neither "nuclear DNA polymerase" nor "mitochondrial DNA polymerase". Terms with anatomical qualifiers are unnecessary because annotators can use the cellular component ontology to attribute location to gene products, independently of process or function.

Gene Products

It is easy to confuse a gene product and its molecular function, because very often these are described in exactly the same words. For example, "alcohol dehydrogenase" can describe what you can put in an Eppendorf tube (the gene product) or it can describe the function of that gene product. There is, however, a formal difference: a single gene product might have several molecular functions, and many gene products can share a single molecular function. For example, there are many gene products that have the function alcohol dehydrogenase activity. Some, but by no means all, of these are encoded by genes with the name "alcohol dehydrogenase". A particular gene product might have both the functions alcohol dehydrogenase activity and acetaldehyde dismutase activity, and perhaps other functions as well. It's important to grasp that, whenever we use terms such as alcohol dehydrogenase activity in GO, we mean the function, not the entity; for this reason, most GO molecular function term names end with the word 'activity'.
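
The many-to-many relationship described above can be sketched as two lookup tables built from one list of annotations. The gene product identifiers `gene_product_1` and `gene_product_2` are hypothetical placeholders; only the two function names come from the text.

```python
from collections import defaultdict

# (gene product, molecular function) pairs. The product identifiers are
# hypothetical; the function names are the ones used in the paragraph above.
annotations = [
    ("gene_product_1", "alcohol dehydrogenase activity"),
    ("gene_product_1", "acetaldehyde dismutase activity"),
    ("gene_product_2", "alcohol dehydrogenase activity"),
]

functions_of = defaultdict(set)   # gene product -> its molecular functions
products_with = defaultdict(set)  # molecular function -> gene products

for product, function in annotations:
    functions_of[product].add(function)
    products_with[function].add(product)

# One gene product can have several functions...
print(sorted(functions_of["gene_product_1"]))
# ...and one function can be shared by several gene products.
print(sorted(products_with["alcohol dehydrogenase activity"]))
```

Keeping the function as a separate key, rather than as an attribute of a named gene product, mirrors the point that the GO term denotes the activity and not the entity.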

Referring to Gene Products in Synonyms and Term Names

As noted above, GO terms do not represent gene products; GO term strings should avoid using gene product names if possible. In some cases, however, there are practical reasons for including gene product names, because biologists will search for them.

Gene product names can be used as synonyms for terms that do not name gene products in the primary text strings. Such synonyms are narrower than the terms. For some biological concepts, it would be awkward to use a wording that avoids mentioning a gene product name. In these cases, we use the word 'class' along with the gene product name, to indicate that the term is not restricted to the gene product named or to the species in which the gene product is found. An example is the class of cell cycle regulators known as p53:

term: DNA damage response, signal transduction by p53 class mediator ; GO:0030330
definition: A cascade of processes induced by the cell cycle regulator phosphoprotein p53,
                   or an equivalent protein, in response to the perception of DNA damage.

Dependent ontology terms

Some GO terms imply the presence of others in the ontology. Examples from the process ontology include the following:

  • If either X biosynthesis or X catabolism exists, then the parent X metabolism must also exist.
  • If regulation of process X exists, then the process X must also exist. Potentially any process in the ontology can be regulated. Note: X may refer to a phenotype (for example cell size in regulation of cell size); in these cases, X should not be added to the ontology.
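
The two rules above can be sketched as a simple consistency check over a list of term names. This is a string-matching illustration under stated assumptions, not how GO tooling works: the function `missing_implied_terms`, the `phenotypes` parameter, and the sample term names are all hypothetical, and the check makes a single pass and only handles names that begin with "regulation of".

```python
def missing_implied_terms(term_names, phenotypes=frozenset()):
    """Return names that the given terms imply but that are absent.

    phenotypes: values of X (e.g. "cell size") that are regulated but
    should not themselves be added to the ontology.
    """
    names = set(term_names)
    missing = set()
    for name in names:
        if "regulation of " in name:
            # Rule 2: "regulation of X" implies the process X,
            # unless X is a phenotype.
            if name.startswith("regulation of "):
                target = name[len("regulation of "):]
                if target not in names and target not in phenotypes:
                    missing.add(target)
            continue
        # Rule 1: "X biosynthesis" or "X catabolism" implies "X metabolism".
        for suffix in (" biosynthesis", " catabolism"):
            if name.endswith(suffix):
                parent = name[: -len(suffix)] + " metabolism"
                if parent not in names:
                    missing.add(parent)
    return missing


# Illustrative term names, not a real extract of the ontology.
terms = [
    "glucose biosynthesis",
    "regulation of cell size",
    "regulation of DNA repair",
    "DNA repair",
]
print(missing_implied_terms(terms, phenotypes={"cell size"}))
```

In this sample, the only gap reported is the parent metabolism term: DNA repair is already present, and cell size is excluded because it is listed as a phenotype.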

Asserted relationships

In some cases, the reasoner will not infer all of the appropriate relationships between a term and other terms in the ontology or in external ontologies. In these cases, the relationships must be asserted manually. For example, _part of_ relationships in developmental processes are not inferred and must be asserted: 'GO:0021549 cerebellum development' 'part of' some 'metencephalon development' is asserted manually.

Disjoint terms


Review Status

Last reviewed: