Principles for creating a GO term
- 1 General Rules For GO Terms
- 2 Naming conventions
- 3 Points For Style
- 4 Gene Products
- 5 Referring to Gene Products in Synonyms and Term Names
- 6 Dependent ontology terms
- 7 Asserted relationships
- 8 Disjoint terms
- 9 Review Status
General Rules For GO Terms
As explained in An Introduction to GO, the purpose of GO is to define particular attributes of gene products. When adding a new term, ensure that it is a valid concept within the scope of GO.
Out of scope
- Gene products: e.g. cytochrome c is not in the ontologies, but attributes of cytochrome c, such as oxidoreductase activity, are.
- Processes, functions or components that describe mutants or diseases: e.g. "oncogenesis" is not a valid GO term because causing cancer is not the normal function of any gene.
- Attributes of sequence such as intron/exon parameters: these are not attributes of gene products and are described in a separate Sequence Ontology (see the Open Biomedical Ontologies website for more information).
- Protein domains or structural features.
- Spontaneous reactions.
Naming conventions
David: Do we want to cite this?
- Naming conventions paper (2009)
Points For Style
The following stylistic points should be applied to all aspects of the ontologies.
Where the accepted spelling differs between British and US usage, use the US form, e.g. polymerizing, signaling, rather than polymerising, signalling. There is a dictionary of words used in GO terms in the file GODict.DAT.
Avoid abbreviations unless they are self-explanatory. Use full element names, not symbols. Use hydrogen for H+. Use copper and zinc rather than Cu and Zn. Use copper(I), copper(II), etc., rather than cuprous, cupric, etc. For biomolecules, spell out the term in full wherever practical: use fibroblast growth factor, not FGF.
Spell out Greek symbols in full: e.g. alpha, beta, gamma.
Upper vs. lower case
GO terms are all lower case except where demanded by context, e.g. DNA, not dna.
Singular vs. Plural
Use the singular form of the term, except where a term is only used in the plural (e.g. caveolae).
Aim to be reasonably descriptive, even at the risk of some verbal redundancy. Remember, databases that refer to GO terms might list only the finest-level terms associated with a particular gene product. If the parent is aromatic amino acid family biosynthesis, then the child should be aromatic amino acid family biosynthesis, anthranilate pathway, not just "anthranilate pathway".
Do not use anatomical qualifiers in the biological process and molecular function ontologies. For example, GO has the molecular function term DNA-directed DNA polymerase activity but neither "nuclear DNA polymerase" nor "mitochondrial DNA polymerase". Terms with anatomical qualifiers are unnecessary because annotators can use the cellular component ontology to attribute a location to a gene product, independently of its process or function.
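Several of the style points above are mechanical enough to check automatically. The following is a minimal sketch, not an existing GO tool: the function name `check_term_style` and the small word lists are hypothetical stand-ins (a real check would presumably draw its vocabulary from GODict.DAT), and it covers only US spelling, element symbols, Greek symbols and capitalization.

```python
import re

# Hypothetical, deliberately tiny word lists used only for illustration.
BRITISH_TO_US = {"polymerising": "polymerizing", "signalling": "signaling"}
GREEK_SYMBOLS = {"α": "alpha", "β": "beta", "γ": "gamma"}
ELEMENT_SYMBOLS = {"Cu": "copper", "Zn": "zinc"}
# Tokens that keep their capitals because context demands it (assumed list).
ALLOWED_UPPERCASE = {"DNA", "RNA", "ATP"}


def check_term_style(name):
    """Return a list of style warnings for a candidate GO term name."""
    warnings = []
    for word in re.findall(r"[A-Za-z]+", name):
        us_form = BRITISH_TO_US.get(word.lower())
        if us_form is not None:
            warnings.append(f"use US spelling '{us_form}' instead of '{word}'")
        if word in ELEMENT_SYMBOLS:
            warnings.append(f"spell out '{ELEMENT_SYMBOLS[word]}' instead of '{word}'")
        elif word != word.lower() and word not in ALLOWED_UPPERCASE:
            warnings.append(f"lower-case '{word}' unless context demands capitals")
    for symbol, spelled in GREEK_SYMBOLS.items():
        if symbol in name:
            warnings.append(f"spell out '{spelled}' instead of '{symbol}'")
    return warnings


for candidate in ("DNA-directed DNA polymerase activity", "insulin signalling pathway"):
    print(candidate, "->", check_term_style(candidate))
```

A check like this can only flag candidates for a curator to review; rules such as "be reasonably descriptive" or "avoid anatomical qualifiers" still need human judgment.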
Gene Products
It is easy to confuse a gene product with its molecular function, because the two are very often described in exactly the same words. For example, "alcohol dehydrogenase" can describe what you can put in an Eppendorf tube (the gene product), or it can describe the function of that gene product. There is, however, a formal difference: a single gene product might have several molecular functions, and many gene products can share a single molecular function. For example, there are many gene products that have the function alcohol dehydrogenase activity. Some, but by no means all, of these are encoded by genes with the name "alcohol dehydrogenase". A particular gene product might have both the functions alcohol dehydrogenase activity and acetaldehyde dismutase activity, and perhaps other functions as well. It is important to grasp that, whenever we use terms such as alcohol dehydrogenase activity in GO, we mean the function, not the entity; for this reason, most GO molecular function terms end with the word 'activity'.
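The many-to-many relationship between gene products and molecular functions can be pictured as a small lookup in both directions. This is an illustrative sketch only: the gene product labels and the `annotations` list are hypothetical, and only the two function names come from the text above.

```python
from collections import defaultdict

# Hypothetical annotations as (gene product, molecular function term) pairs.
annotations = [
    ("gene product A", "alcohol dehydrogenase activity"),
    ("gene product A", "acetaldehyde dismutase activity"),
    ("gene product B", "alcohol dehydrogenase activity"),
]

functions_of = defaultdict(set)   # gene product -> its molecular functions
products_with = defaultdict(set)  # molecular function -> gene products having it
for product, function in annotations:
    functions_of[product].add(function)
    products_with[function].add(product)

# One gene product can have several functions ...
print(sorted(functions_of["gene product A"]))
# ... and one function can be shared by several gene products.
print(sorted(products_with["alcohol dehydrogenase activity"]))
```

Only the function names on the right-hand side would be GO terms; the gene products themselves stay outside the ontology.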
Referring to Gene Products in Synonyms and Term Names
As noted above, GO terms do not represent gene products; GO term strings should avoid using gene product names if possible. In some cases, however, there are practical reasons for including gene product names, because biologists will search for them.
Gene product names can be used as synonyms for terms that do not name gene products in the primary text strings. Such synonyms are narrower than the terms. For some biological concepts, it would be awkward to use a wording that avoids mentioning a gene product name. In these cases, we use the word 'class' along with the gene product name, to indicate that the term is not restricted to the gene product named or to the species in which the gene product is found. An example is the class of cell cycle regulators known as p53:
term: DNA damage response, signal transduction by p53 class mediator ; GO:0030330
definition: A cascade of processes induced by the cell cycle regulator phosphoprotein p53, or an equivalent protein, in response to the perception of DNA damage.
Dependent ontology terms
Some GO terms imply the presence of others in the ontology. Examples from the process ontology include the following:
- If either X biosynthesis or X catabolism exists, then the parent X metabolism must also exist.
- If regulation of process X exists, then the process X must also exist. Potentially any process in the ontology can be regulated. Note: X may refer to a phenotype (for example cell size in regulation of cell size); in these cases, X should not be added to the ontology.
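The two rules above can be sketched as a simple consistency check over a list of term names. This is a rough illustration under stated assumptions, not part of any GO tooling: the function `missing_dependencies` and its `phenotype_targets` parameter are hypothetical, the matching is purely string-based, and the suffix rule is skipped for regulation terms because the rules above do not say how they combine.

```python
def missing_dependencies(term_names, phenotype_targets=()):
    """Return sorted names of terms implied by term_names but absent from it.

    phenotype_targets lists regulated entities (e.g. "cell size") that are
    phenotypes and therefore should not be added to the ontology.
    """
    names = set(term_names)
    skip = set(phenotype_targets)
    missing = set()
    for name in names:
        if name.startswith("regulation of "):
            # "regulation of X" implies that the process X exists.
            target = name[len("regulation of "):]
            if target not in names and target not in skip:
                missing.add(target)
            continue
        for suffix in (" biosynthesis", " catabolism"):
            # "X biosynthesis" or "X catabolism" implies "X metabolism".
            if name.endswith(suffix):
                parent = name[: -len(suffix)] + " metabolism"
                if parent not in names:
                    missing.add(parent)
    return sorted(missing)


terms = [
    "tryptophan biosynthesis",
    "glycolysis",
    "regulation of glycolysis",
    "regulation of cell size",
]
print(missing_dependencies(terms, phenotype_targets={"cell size"}))
```

Whether a regulated X is a phenotype cannot be decided from the name alone, so in this sketch that list has to be supplied by a curator.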
Asserted relationships
In some cases, the reasoner will not infer all of the appropriate relationships between a term and other terms in the ontology or in external ontologies. In these cases, the relationships must be asserted manually. For example, _part of_ relationships between developmental processes are not inferred and must be asserted: for 'GO:0021549 cerebellum development', the relationship 'part of' some 'metencephalon development' is asserted manually.
Last reviewed: