Principles for creating a GO term

From GO Wiki
Revision as of 10:29, 5 April 2019 by Pascale (talk | contribs)

General Rules For GO Terms

As explained in An Introduction to GO, the purpose of GO is to define particular attributes of gene products. When adding a new term, ensure that it is a valid concept within the scope of GO. The following concepts should not be introduced as GO terms:

  • Gene products: e.g. cytochrome c is not in the ontologies, but attributes of cytochrome c, such as oxidoreductase activity, are.
  • Processes, functions or components that are unique to mutants or diseases: e.g. "oncogenesis" is not a valid GO term because causing cancer is not the normal function of any gene.
  • Attributes of sequence such as intron/exon parameters: these are not attributes of gene products and are described in a separate sequence ontology (see the Open Biomedical Ontologies website for more information).
  • Protein domains or structural features.
  • Protein-protein interactions.


Points For Style

The following stylistic points should be applied to all aspects of the ontologies.

Spelling conventions

Where the accepted spelling differs between British and US usage, use the US form, e.g. polymerizing, signaling, rather than polymerising, signalling. There is a dictionary of words used in GO terms in the file GODict.DAT.

Abbreviations

Avoid abbreviations unless they're self-explanatory. Use full element names, not symbols: use hydrogen for H+, and copper and zinc rather than Cu and Zn. Use copper(I), copper(II), etc., rather than cuprous, cupric, etc. For biomolecules, spell out the term in full wherever practical: use fibroblast growth factor, not FGF.

Greek symbols

Spell out Greek symbols in full: e.g. alpha, beta, gamma.

Upper vs. lower case

GO terms are all lower case except where demanded by context, e.g. DNA, not dna.
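
The spelling, Greek-symbol and case conventions above lend themselves to a simple automated check. The following is a minimal sketch, not an official GO tool: the function name `check_term_name` and the lookup tables are hypothetical, the tables are deliberately tiny, and they stand in for a real word list such as GODict.DAT. Only DNA is taken from the text above; RNA and ATP are assumed to be further context-demanded capitals.

```python
import re

# Illustrative, non-exhaustive lookup tables (assumptions, not GODict.DAT).
UK_TO_US = {"polymerising": "polymerizing", "signalling": "signaling"}
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma"}
# Tokens that keep their capitals "where demanded by context".
# DNA comes from the guideline; RNA and ATP are assumed examples.
CASE_EXCEPTIONS = {"DNA", "RNA", "ATP"}


def check_term_name(name):
    """Return a list of style warnings for a candidate term name."""
    warnings = []
    # Greek symbols should be spelled out in full.
    for symbol, spelled in GREEK.items():
        if symbol in name:
            warnings.append(f"spell out '{symbol}' as '{spelled}'")
    # Inspect each ASCII word for spelling and case.
    for word in re.findall(r"[A-Za-z]+", name):
        us_form = UK_TO_US.get(word.lower())
        if us_form:
            warnings.append(f"use US spelling '{us_form}' instead of '{word}'")
        if word != word.lower() and word not in CASE_EXCEPTIONS:
            warnings.append(f"lower-case '{word}' unless context demands capitals")
    return warnings
```

A name such as DNA-directed DNA polymerase activity produces no warnings, whereas a name containing "signalling" or "β" is flagged. A curator would still need to review each warning, since capitals demanded by context cannot be listed exhaustively.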

Singular vs. Plural

Use the singular form of the term, except where a term is only used in the plural (e.g. caveolae).

Be Descriptive

Aim to be reasonably descriptive, even at the risk of some verbal redundancy. Remember, databases that refer to GO terms might list only the finest-level terms associated with a particular gene product. If the parent is aromatic amino acid family biosynthesis, then the child should be aromatic amino acid family biosynthesis, anthranilate pathway, not just "anthranilate pathway".

Anatomical Qualifiers

Do not use anatomical qualifiers in the biological process and molecular function ontologies. For example, GO has the molecular function term DNA-directed DNA polymerase activity but neither "nuclear DNA polymerase" nor "mitochondrial DNA polymerase". Terms with anatomical qualifiers are unnecessary because annotators can use the cellular component ontology to attribute location to gene products, independently of process or function.

Gene Products

It is easy to confuse a gene product and its molecular function, because very often these are described in exactly the same words. For example, "alcohol dehydrogenase" can describe what you can put in an Eppendorf tube (the gene product) or it can describe the function of that gene product. There is, however, a formal difference: a single gene product might have several molecular functions, and many gene products can share a single molecular function. For example, there are many gene products that have the function alcohol dehydrogenase activity. Some, but by no means all, of these are encoded by genes with the name "alcohol dehydrogenase". A particular gene product might have both the functions alcohol dehydrogenase activity and acetaldehyde dismutase activity, and perhaps other functions as well. It's important to grasp that, whenever we use terms such as alcohol dehydrogenase activity in GO, we mean the function, not the entity; for this reason, most GO molecular function terms end with the word 'activity'.


Referring to Gene Products in Synonyms and Term Names

As noted above, GO terms do not represent gene products; GO term strings should avoid using gene product names if possible. In some cases, however, there are practical reasons for including gene product names, because biologists will search for them.

Gene product names can be used as synonyms for terms that do not name gene products in the primary text strings. Such synonyms are narrower than the terms. For some biological concepts, it would be awkward to use a wording that avoids mentioning a gene product name. In these cases, we use the word 'class' along with the gene product name, to indicate that the term is not restricted to the gene product named or to the species in which the gene product is found. An example is the class of cell cycle regulators known as p53:

term: DNA damage response, signal transduction by p53 class mediator ; GO:0030330
definition: A cascade of processes induced by the cell cycle regulator phosphoprotein p53,
                   or an equivalent protein, in response to the perception of DNA damage.


TO BE MOVED 

Comments

All terms have an optional comments field for adding extra information about an entry. The purpose of this is to help annotators, especially if you have obsoleted or redefined a term. Comments can be anything relevant to the term or term definition. If you write a comment, you must use the appropriate syntax.

To refer to other terms in the ontologies, use the format

comment: Also see '[term name] ; GO:0000000'.

To make any other comment, prefix it with the following:

comment: Note that [comment].

See also the comment syntax for obsoletions and term splits.
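
The two comment formats above can be generated mechanically, which helps keep the syntax consistent. This is a minimal sketch; the helper names `see_also_comment` and `note_comment` are hypothetical and are not part of any GO tool.

```python
def see_also_comment(term_name, go_id):
    """Format a comment that refers to another term in the ontologies."""
    return f"comment: Also see '{term_name} ; {go_id}'."


def note_comment(text):
    """Format any other comment, prefixed with 'Note that'."""
    # Strip a trailing period so the result ends with exactly one.
    return f"comment: Note that {text.rstrip('.')}."
```

For example, `see_also_comment("DNA damage response, signal transduction by p53 class mediator", "GO:0030330")` yields a comment in the first format, using the term quoted earlier on this page.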


Synonyms

Often when terms are created, there are several words or phrases that could be used as the term name. In such cases, one form will be chosen as term name whilst the other possible names are added as synonyms. Despite the name, GO synonyms are not always 'synonymous' in the strictest sense of the word, as they do not always mean exactly the same as the term they are attached to. Instead, a GO synonym may be broader or narrower than the term string; it may be a related phrase; it may be alternative wording, spelling or use a different system of nomenclature; or it may be a true synonym. This flexibility allows GO synonyms to serve as valuable search aids, as well as being useful for applications such as text mining and semantic matching.

Having a single, broad relationship between a GO term and its synonyms is adequate for most search purposes, but for other applications such as semantic matching, the inclusion of a more formal relationship set is valuable. For this reason, GO records a relationship type for each synonym. These relationships are stored in the OBO format GO file.

Synonym scopes

The synonym relationship scopes are:

  • exact: the synonym is an exact equivalent of the term name
    e.g. ornithine cycle is an exact synonym of urea cycle
  • broad: the synonym is broader than the term name
    e.g. cell division is a broad synonym of cytokinesis
  • narrow: the synonym is narrower or more precise than the term name
    e.g. pyrimidine-dimer repair by photolyase is a narrow synonym of photoreactive repair
  • related: the synonym and the term name are related in some other way
    e.g. cytochrome bc1 complex is a related synonym of ubiquinol-cytochrome-c reductase activity; virulence is a related synonym of pathogenesis

The synonym scope related should be used where the relationship between a term and its synonym is NOT exact, narrower or broader.
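
To illustrate how a scope might be recorded in the OBO format file, the sketch below renders a synonym tag in the style of the OBO 1.2 flat-file syntax (quoted text, scope keyword, then a bracketed dbxref list). The helper name `synonym_line` is hypothetical, and the empty dbxref list and the default scope are assumptions made for the example.

```python
SCOPES = ("EXACT", "BROAD", "NARROW", "RELATED")


def synonym_line(text, scope="RELATED"):
    """Render one synonym tag in OBO flat-file style."""
    scope = scope.upper()
    if scope not in SCOPES:
        raise ValueError(f"unknown synonym scope: {scope}")
    # Escape backslashes and double quotes inside the quoted synonym text.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    # The trailing [] is an empty dbxref list (assumed here for simplicity).
    return f'synonym: "{escaped}" {scope} []'
```

With the examples from the list above, `synonym_line("ornithine cycle", "exact")` would sit in the urea cycle stanza, and `synonym_line("cell division", "broad")` in the cytokinesis stanza.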

In some cases, broader and narrower synonyms are created in the place of new parent or child terms because some synonym strings may not be valid GO terms but may still be useful for search purposes. For example, the string "respiration" is synonymous with both cellular respiration, the energy-generating metabolic processes of a cell, and respiratory gaseous exchange, or breathing; as its meaning is ambiguous, it is unsuitable for use as a GO term string, but we can add it as a broad synonym to both terms.

Adding synonyms

When you add a synonym using OBO-Edit, choose a scope from the pull-down selector (see the OBO-Edit user guide for more information). OBO-Edit will incorporate the synonym scope into the OBO format flat file when you save. The default synonym scope is 'related synonym', but this should be changed to a different scope if appropriate.

The number of synonyms for a term is not limited, and the same text string can be used as a synonym for more than one GO term.

Add synonyms if you edit a term name but the old name is still a valid synonym; for example, if you change "respiration" to "cellular respiration", keep "respiration" as a synonym. This helps other users to find familiar terms.

Add synonyms if the term has (or contains) a commonly used abbreviation. For example, FGF binding could be used as a synonym for fibroblast growth factor binding.

Do not add a synonym if the only difference is case (e.g. start vs. START). Synonyms, like term names, are all lower case except where demanded by context (e.g. DNA, not dna).
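
The case rule above is easy to check before a synonym is added. The following sketch assumes a hypothetical helper, `should_add_synonym`, that compares a candidate against the term name and its existing synonyms while ignoring case; it does not cover any of the other rules on this page.

```python
def should_add_synonym(candidate, term_name, existing_synonyms):
    """Return False when the candidate differs from a known name only by case."""
    known = {term_name.casefold()} | {s.casefold() for s in existing_synonyms}
    return candidate.casefold() not in known
```

Under this check, START is rejected as a synonym for start, while FGF binding is accepted for fibroblast growth factor binding.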

Rules For Synonyms

  • acronyms are exactly synonymous with the full name, as long as the acronym is not used in any other sense elsewhere
  • include implicit information when making a decision and take into account which ontology the term is in; e.g. an entry that ends in 'factor' is not synonymous with a molecular function
  • jargon-type phrases are exactly synonymous with the full name, as long as the phrase is not used in any other sense elsewhere
  • proton is exactly synonymous with hydrogen where hydrogen refers to H+ (hydrogen ion); proton is not synonymous with H2 (hydrogen gas)
  • ligand is not exactly synonymous with binding (ligand is an entity, binding is an action)
  • x receptor ligand is not exactly synonymous with x (x is only one of the potential ligands, so x receptor ligand is broader than x)
  • x complex is not exactly synonymous with x (x is ambiguous - could be describing the activity of x)
  • x transporter is broader than x porter, x symporter or x antiporter


Cross-referencing other databases

General database cross-references, or general dbxrefs, should be used where a GO term is identical to an object in another database. For more information on syntax, please refer to the GO File Format Guide; for a complete list of dbxrefs, see the database cross-references page.

Database cross-references used in GO

Ontology   Database                                                               Sample dbxref
Function   Enzyme Commission                                                      EC:3.5.1.6
Function   Transport Protein Database                                             TC:2.A.29.10.1
Function   University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD)  UM-BBD_enzymeID:e0310
Function   MetaCyc metabolic pathway database                                     MetaCyc:XXXX-RXN
Process    MetaCyc metabolic pathway database                                     MetaCyc:2ASDEG-PWY
Process    University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD)  UM-BBD_pathwayID:dcb
Component  None

The GO database cross-references set is maintained by the BioMOBY project; please email the GO helpdesk to suggest any changes to this file.
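
The sample dbxrefs listed above all share the shape of a database prefix and an accession separated by a colon. As a rough sketch under that assumption, the hypothetical helper `parse_dbxref` splits a dbxref at its first colon; the GO File Format Guide remains the authority on the full syntax.

```python
def parse_dbxref(dbxref):
    """Split a 'DB:accession' string at the first colon."""
    prefix, sep, accession = dbxref.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a DB:accession string: {dbxref!r}")
    return prefix, accession
```

For instance, `parse_dbxref("EC:3.5.1.6")` separates the Enzyme Commission prefix from the EC number, and `parse_dbxref("UM-BBD_enzymeID:e0310")` keeps the underscore-qualified prefix intact.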

Dependent ontology terms

Some GO terms imply the presence of others in the ontology. Examples from the process ontology include the following:

  • If either X biosynthesis or X catabolism exists, then the parent X metabolism must also exist.
  • If regulation of process X exists, then the process X must also exist. Potentially any process in the ontology can be regulated. Note: X may refer to a phenotype (for example cell size in regulation of cell size); in these cases, X should not be added to the ontology.
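
The two dependency rules above can be approximated with a purely string-based check over term names. This is a minimal sketch: the function name `missing_dependencies` and the `phenotypes` parameter are hypothetical, and a real check would work on the ontology's relationships and not on name patterns alone.

```python
def missing_dependencies(term_names, phenotypes=()):
    """List terms implied by the given names but absent from them."""
    present = set(term_names)
    missing = set()
    prefix = "regulation of "
    for name in present:
        if name.startswith(prefix):
            # 'regulation of X' implies X, unless X is a phenotype
            # (e.g. cell size), which should not be added to the ontology.
            target = name[len(prefix):]
            if target not in present and target not in phenotypes:
                missing.add(target)
            continue
        for suffix in (" biosynthesis", " catabolism"):
            if name.endswith(suffix):
                # 'X biosynthesis' or 'X catabolism' implies 'X metabolism'.
                parent = name[: -len(suffix)] + " metabolism"
                if parent not in present:
                    missing.add(parent)
    return sorted(missing)
```

Phenotypes have to be supplied by the caller, because nothing in the name "regulation of cell size" reveals that cell size is a phenotype and not a process.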

Asserted relationships

In some cases, the reasoner will not infer all of the appropriate relationships between a term and other terms in the ontology or in external ontologies. In these cases, those relationships must be asserted manually. For example, _part of_ relationships in developmental processes are not inferred and must be asserted: 'GO:0021549 cerebellum development' 'part of' some 'metencephalon development' is asserted manually.

Disjoint terms

Disjoint_Documentation

Review Status

Last reviewed:
