Principles for creating a GO term

From GO Wiki
Revision as of 20:16, 25 May 2009

General Rules For GO Terms

As explained in An Introduction to GO, the purpose of GO is to define particular attributes of gene products. When adding a new term, ensure that it is a valid concept within the scope of GO. The following concepts should not be introduced as GO terms:

  • Gene products: e.g. cytochrome c is not in the ontologies, but attributes of cytochrome c, such as oxidoreductase activity, are.
  • Processes, functions or components that are unique to mutants or diseases: e.g. "oncogenesis" is not a valid GO term because causing cancer is not the normal function of any gene.
  • Attributes of sequence such as intron/exon parameters: these are not attributes of gene products and are described in a separate sequence ontology (see the Open Biomedical Ontologies website for more information).
  • Protein domains or structural features.
  • Protein-protein interactions.


Points For Style

The following stylistic points should be applied to all aspects of the ontologies.

Spelling conventions

Where there are differences in the accepted spelling between British and US usage, use the US form, e.g. polymerizing, signaling, rather than polymerising, signalling. There is a dictionary of words used in GO terms in the file GODict.DAT.

Abbreviations

Avoid abbreviations unless they are self-explanatory. Use full element names, not symbols: use hydrogen for H+, and copper and zinc rather than Cu and Zn. Use copper(I), copper(II), etc., rather than cuprous, cupric, etc. For biomolecules, spell out the term in full wherever practical: use fibroblast growth factor, not FGF.

Greek symbols

Spell out Greek symbols in full: e.g. alpha, beta, gamma.
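When a term name is drafted from a paper that uses Greek characters, a small lookup can spell them out before the name is entered. This is a minimal sketch; `GREEK_NAMES` and `spell_out_greek` are hypothetical helpers, not part of any GO tool, and the table covers only a few lower-case letters.

```python
# Hypothetical helper: replace Greek characters in a draft term name with
# their spelled-out names. Only a handful of lower-case letters are listed.
GREEK_NAMES = {
    "α": "alpha",
    "β": "beta",
    "γ": "gamma",
    "δ": "delta",
    "ε": "epsilon",
    "κ": "kappa",
    "λ": "lambda",
    "μ": "mu",
}


def spell_out_greek(name: str) -> str:
    """Return name with every listed Greek character spelled out."""
    return "".join(GREEK_NAMES.get(ch, ch) for ch in name)


print(spell_out_greek("interferon-γ production"))  # interferon-gamma production
```

Characters missing from the table pass through unchanged, so a curator still has to review the result.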

Upper vs. lower case

GO terms are all lower case except where demanded by context, e.g. DNA, not dna.

Singular vs. Plural

Use the singular form of the term, except where a term is only used in the plural (e.g. caveolae).

Be Descriptive

Aim to be reasonably descriptive, even at the risk of some verbal redundancy. Remember, databases that refer to GO terms might list only the finest-level terms associated with a particular gene product. If the parent is aromatic amino acid family biosynthesis, then the child should be aromatic amino acid family biosynthesis, anthranilate pathway, not just "anthranilate pathway".
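The naming pattern in this example can be expressed as a tiny helper. This is an illustrative sketch only: `descriptive_child_name` and `repeats_parent_wording` are hypothetical, and the prefix check is a rough heuristic that fits the "parent, qualifier" pattern shown here but not every kind of child term.

```python
def descriptive_child_name(parent: str, qualifier: str) -> str:
    """Prefix the qualifier with the full parent name so the child term
    still makes sense when it is listed on its own."""
    return f"{parent}, {qualifier}"


def repeats_parent_wording(child: str, parent: str) -> bool:
    """Rough check that a 'parent, qualifier' style child names its parent."""
    return child.startswith(parent + ", ")


parent = "aromatic amino acid family biosynthesis"
child = descriptive_child_name(parent, "anthranilate pathway")
print(child)  # aromatic amino acid family biosynthesis, anthranilate pathway
print(repeats_parent_wording("anthranilate pathway", parent))  # False
```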

Anatomical Qualifiers

Do not use anatomical qualifiers in the biological process and molecular function ontologies. For example, GO has the molecular function term DNA-directed DNA polymerase activity but neither "nuclear DNA polymerase" nor "mitochondrial DNA polymerase". Such anatomically qualified terms are unnecessary because annotators can use the cellular component ontology to attribute location to gene products, independently of process or function.

Gene Products

It is easy to confuse a gene product and its molecular function, because very often these are described in exactly the same words. For example, "alcohol dehydrogenase" can describe what you can put in an Eppendorf tube (the gene product) or it can describe the function of this stuff. There is, however, a formal difference: a single gene product might have several molecular functions, and many gene products can share a single molecular function. For example, there are many gene products that have the function alcohol dehydrogenase activity. Some, but by no means all, of these are encoded by genes with the name "alcohol dehydrogenase". A particular gene product might have both the functions alcohol dehydrogenase activity and acetaldehyde dismutase activity, and perhaps other functions as well. It's important to grasp that, whenever we use terms such as alcohol dehydrogenase activity in GO, we mean the function, not the entity; for this reason, most GO molecular function terms are appended with the word 'activity'.


Referring to Gene Products in Synonyms and Term Names

As noted above, GO terms do not represent gene products; GO term strings should avoid using gene product names if possible. In some cases, however, there are practical reasons for including gene product names, because biologists will search for them.

Gene product names can be used as synonyms for terms that do not name gene products in the primary text strings. Such synonyms are narrower than the terms. For some biological concepts, it would be awkward to use a wording that avoids mentioning a gene product name. In these cases, we use the word 'class' along with the gene product name, to indicate that the term is not restricted to the gene product named or to the species in which the gene product is found. An example is the class of cell cycle regulators known as p53:

term: DNA damage response, signal transduction by p53 class mediator ; GO:0030330
definition: A cascade of processes induced by the cell cycle regulator phosphoprotein p53,
                   or an equivalent protein, in response to the perception of DNA damage.


Term Definitions

Always define new terms

If you create a new term, or refine an existing term, you should add a definition for it, and note the references used in composing the definition.

Write definitions carefully

Definitions should explain clearly to the reader what is meant by a particular term. They should be concise, full sentences. They should begin with an upper-case letter and end with a period (full stop). Proofread your definitions carefully to eliminate typos and double spaces. The definition should be written at the same level of specificity as the term itself. It should also be consistent with the guidelines for the contents of each ontology. As with term names, avoid using abbreviations that may be ambiguous (e.g. "ER" can mean "endoplasmic reticulum" or "estrogen receptor").
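Only the mechanical parts of these rules can be checked automatically. The sketch below, with the hypothetical function `definition_problems` and a deliberately short `AMBIGUOUS_ABBREVIATIONS` set, flags capitalization, the final period, double spaces and a known ambiguous abbreviation; judging clarity and specificity still needs a curator.

```python
import re

# Illustrative list only; "ER" can mean endoplasmic reticulum or estrogen receptor.
AMBIGUOUS_ABBREVIATIONS = {"ER"}


def definition_problems(definition: str) -> list[str]:
    """Return formatting problems found in a draft definition (empty if none)."""
    problems = []
    if not definition or not definition[0].isupper():
        problems.append("should begin with an upper-case letter")
    if not definition.endswith("."):
        problems.append("should end with a period")
    if "  " in definition:
        problems.append("contains a double space")
    words = set(re.findall(r"[A-Za-z]+", definition))
    for abbreviation in sorted(AMBIGUOUS_ABBREVIATIONS & words):
        problems.append(f"ambiguous abbreviation: {abbreviation}")
    return problems


print(definition_problems("transport into the ER  lumen"))
```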

Use Aristotelian definitions

Ideally, definitions should follow the genus-differentia ("Aristotelian") pattern: they should take the form of a genus (a generic term, the is_a parent) and differentia (discriminating characteristics that mark instances of the specific term as different from its is_a sibling terms). (Note: cross-products can be represented using "intersection_of" tags in OBO files, and are a means of formally expressing the genus and differentiae; see the Logical Definitions documentation for more information.)
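For illustration, the genus and differentiae of a logical definition can be rendered as OBO intersection_of lines, with the genus written as a bare ID and each differentia as a relation followed by a target ID. The helper `intersection_lines` is hypothetical, and the IDs in the example are placeholders that are not meant to refer to real GO terms; check the Logical Definitions documentation for the conventions GO actually uses.

```python
def intersection_lines(genus_id: str, differentiae: list[tuple[str, str]]) -> list[str]:
    """Render a genus-differentia definition as OBO intersection_of tags.

    genus_id is the is_a parent; each differentia is a (relation, target_id)
    pair that distinguishes the term from its is_a siblings.
    """
    lines = [f"intersection_of: {genus_id}"]
    for relation, target_id in differentiae:
        lines.append(f"intersection_of: {relation} {target_id}")
    return lines


# Placeholder IDs, for illustration only.
for line in intersection_lines("GO:1111111", [("part_of", "GO:2222222")]):
    print(line)
```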

Database cross-references for definitions

If you define a term, you must document where your definition came from. If you use OBO-Edit, the software won't allow you to commit a definition without entering a cross-reference for it. Database cross-references have two parts, separated by a colon: an abbreviation for the database being cross-referenced (see the list of database cross-references used in GO) and the ID of the item in that database.
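Because a dbxref is two parts joined by a colon, splitting one is straightforward. In this sketch `DbXref` and `parse_dbxref` are hypothetical names; the split happens at the first colon so that any colon inside the ID part is preserved, and strings lacking either part are rejected.

```python
from typing import NamedTuple


class DbXref(NamedTuple):
    database: str    # database abbreviation, e.g. "EC" or "GOC"
    identifier: str  # ID of the item in that database


def parse_dbxref(text: str) -> DbXref:
    """Split a DATABASE:ID cross-reference at its first colon."""
    database, sep, identifier = text.partition(":")
    if not sep or not database or not identifier:
        raise ValueError(f"not a DATABASE:ID cross-reference: {text!r}")
    return DbXref(database, identifier)


print(parse_dbxref("EC:3.5.1.6"))  # DbXref(database='EC', identifier='3.5.1.6')
```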

Definitions may be created by individual curators, groups of curators, community experts, or by consensus at meetings. For any such sources, use the database abbreviation 'GOC'. A list of curator cross-references currently in use is available; the guidelines for creating new dbxrefs are as follows:

  • If the definition comes from an individual curator's head, use GOC as the database abbreviation and your initials, in lower case, as the ID; e.g. a definition written by Michael Ashburner has the dbxref GOC:ma.
  • For a definition created by a group of curators, use the database abbreviation with '_curators' appended; e.g. a definition written by several curators at TAIR has the dbxref GOC:TAIR_curators.
  • If an expert from the community has contributed to a definition, use the expert's initials following 'GOC:expert_'; e.g. a definition from John Pringle has the dbxref GOC:expert_jrp.
  • For definitions created at meetings, the dbxref has 'mtg_' followed by the meeting start date; e.g. definitions written at the June 2006 content meeting on CNS development have the dbxref GOC:mtg_15jun06.
  • If the definition comes from a book, use the ISBN; e.g. a dbxref to the Oxford Dictionary of Molecular Biology would be ISBN:0198506732. Hyphens should be removed from the ISBN.
  • If the definition comes from a paper, use the PubMed ID, e.g. PMID:11910864. If the paper doesn't have a PubMed ID, use another ID such as a DOI or model organism database ID.
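These conventions are regular enough to generate. This sketch uses hypothetical helper names; lower-casing expert initials and the day-month-year layout of meeting IDs (as in 15jun06, with zero-padded days assumed) are inferred from the examples above rather than stated as rules, so check them against the list of curator cross-references before relying on them.

```python
from datetime import date

MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def curator_xref(initials: str) -> str:
    """Definition written by one curator, e.g. GOC:ma."""
    return f"GOC:{initials.lower()}"


def group_xref(database: str) -> str:
    """Definition written by a group of curators, e.g. GOC:TAIR_curators."""
    return f"GOC:{database}_curators"


def expert_xref(initials: str) -> str:
    """Definition contributed by a community expert, e.g. GOC:expert_jrp."""
    return f"GOC:expert_{initials.lower()}"


def meeting_xref(start: date) -> str:
    """Definition written at a meeting, keyed by its start date (day zero-padding assumed)."""
    return f"GOC:mtg_{start.day:02d}{MONTHS[start.month - 1]}{start.year % 100:02d}"


def isbn_xref(isbn: str) -> str:
    """Book source, with hyphens removed from the ISBN."""
    return "ISBN:" + isbn.replace("-", "")


def pmid_xref(pmid: int) -> str:
    """Paper source identified by its PubMed ID."""
    return f"PMID:{pmid}"


print(meeting_xref(date(2006, 6, 15)))  # GOC:mtg_15jun06
```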

Use of standard definitions

Wherever a 'standard' definition exists for a group of related terms, it should be used; please see the ontology guides for standard definitions used in each ontology. If you find yourself repeatedly using the same text string in a series of definitions, please send your standard definition to Amelia Ireland, who keeps an up-to-date version of the list of standard definitions.

Redefining terms

A GO ID is really associated with a definition rather than with the term name. If we change the wording but not the meaning of a term, the GO ID stays the same; a new meaning requires a new GO ID, even if the text string doesn't change. Here's a trivial example that illustrates when we do and don't change GO IDs:

Assume that we have a term mouse, GO ID GO:0000123, in an ontology; it is defined as a small furry mammal.

We decide to change the term wording to Mus musculus, keeping the definition the same. In this case we merely update the text; the GO ID stays the same because the meaning stays the same. We may choose to keep "mouse" as a synonym, but there would still only be one ID associated with the term.

We decide that the term "mouse" should instead mean a piece of computer equipment. In this case, the old term and ID are moved to the obsolete category, and "mouse", as newly defined, gets a new GO ID, GO:0000456. The old GO ID and definitions are saved for posterity in case we ever need to know what happened to them.

See the term obsoletion protocol for details on how to obsolete a term.
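The rule that an ID follows the meaning, not the wording, can be sketched as two operations on a simple term record. Everything here is hypothetical: `Term`, `rename`, `redefine` and especially `mint_id`, which stands in for whatever actually allocates GO IDs; it simply counts upward from 456 so that the mouse example can reproduce GO:0000456.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Term:
    go_id: str
    name: str
    definition: str
    synonyms: list[str] = field(default_factory=list)
    obsolete: bool = False


# Hypothetical ID allocator standing in for the real one.
_id_numbers = count(456)


def mint_id() -> str:
    return f"GO:{next(_id_numbers):07d}"


def rename(term: Term, new_name: str, keep_old_as_synonym: bool = True) -> Term:
    """Same meaning, new wording: the GO ID is unchanged."""
    if keep_old_as_synonym:
        term.synonyms.append(term.name)
    term.name = new_name
    return term


def redefine(term: Term, new_definition: str) -> Term:
    """New meaning: obsolete the old term and create a term with a new ID."""
    term.obsolete = True
    return Term(mint_id(), term.name, new_definition)
```

For example, `rename(Term("GO:0000123", "mouse", "A small furry mammal."), "Mus musculus")` keeps GO:0000123, whereas calling `redefine` on the original term with a new definition marks it obsolete and returns a term with a fresh ID.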


Comments

All terms have an optional comments field for adding extra information about an entry. The purpose of this is to help annotators, especially if you have obsoleted or redefined a term. Comments can be anything relevant to the term or term definition. If you write a comment, you must use the appropriate syntax.

To refer to other terms in the ontologies, use the format

comment: Also see '[term name] ; GO:0000000'.

To make any other comment, prefix it with the following:

comment: Note that [comment].

See also the comment syntax for obsoletions and term splits.
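Both comment forms are fixed templates, so they are easy to generate consistently. In this hedged sketch the function names are hypothetical, and the seven-digit check on the ID is based on the GO:0000000 placeholder in the syntax above.

```python
import re


def see_also_comment(term_name: str, go_id: str) -> str:
    """Format a pointer to another term: comment: Also see 'name ; GO:nnnnnnn'."""
    if not re.fullmatch(r"GO:\d{7}", go_id):
        raise ValueError(f"unexpected GO ID: {go_id!r}")
    return f"comment: Also see '{term_name} ; {go_id}'."


def note_comment(text: str) -> str:
    """Format any other remark as: comment: Note that [comment]."""
    return f"comment: Note that {text.rstrip('.')}."


print(see_also_comment(
    "DNA damage response, signal transduction by p53 class mediator", "GO:0030330"))
```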


Synonyms

Often when terms are created, there are several words or phrases that could be used as the term name. In such cases, one form will be chosen as term name whilst the other possible names are added as synonyms. Despite the name, GO synonyms are not always 'synonymous' in the strictest sense of the word, as they do not always mean exactly the same as the term they are attached to. Instead, a GO synonym may be broader or narrower than the term string; it may be a related phrase; it may be alternative wording, spelling or use a different system of nomenclature; or it may be a true synonym. This flexibility allows GO synonyms to serve as valuable search aids, as well as being useful for applications such as text mining and semantic matching.

Having a single, broad relationship between a GO term and its synonyms is adequate for most search purposes, but for other applications such as semantic matching, the inclusion of a more formal relationship set is valuable. For this reason, GO records a relationship type for each synonym. These relationships are stored in the OBO format GO file.

Synonym scopes

The synonym relationship scopes are:

  • exact: the synonym means the same as the term name; e.g. ornithine cycle is an exact synonym of urea cycle
  • broad: the synonym is broader than the term name; e.g. cell division is a broad synonym of cytokinesis
  • narrow: the synonym is narrower or more precise than the term name; e.g. pyrimidine-dimer repair by photolyase is a narrow synonym of photoreactive repair
  • related: the terms are related; e.g. cytochrome bc1 complex is a related synonym of ubiquinol-cytochrome-c reductase activity, and virulence is a related synonym of pathogenesis

The synonym scope related should be used where the relationship between a term and its synonym is NOT exact, narrower or broader.
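In the OBO flat file, each synonym is written on a synonym line that carries its scope as one of the keywords EXACT, BROAD, NARROW or RELATED, followed by a list of dbxrefs (empty here). The sketch below assumes that OBO 1.2 layout; `SynonymScope` and `synonym_line` are hypothetical names, and the default scope mirrors OBO-Edit's default of related synonym.

```python
from enum import Enum


class SynonymScope(Enum):
    EXACT = "EXACT"      # means the same as the term name
    BROAD = "BROAD"      # broader than the term name
    NARROW = "NARROW"    # narrower or more precise than the term name
    RELATED = "RELATED"  # related, but not exact, broader or narrower


def synonym_line(text: str, scope: SynonymScope = SynonymScope.RELATED) -> str:
    """Render a synonym tag with its scope and an empty dbxref list."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'synonym: "{escaped}" {scope.value} []'


print(synonym_line("ornithine cycle", SynonymScope.EXACT))
# synonym: "ornithine cycle" EXACT []
```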

In some cases, broader and narrower synonyms are created in the place of new parent or child terms because some synonym strings may not be valid GO terms but may still be useful for search purposes. For example, the string "respiration" is synonymous with both cellular respiration, the energy-generating metabolic processes of a cell, and respiratory gaseous exchange, or breathing; as its meaning is ambiguous, it is unsuitable for use as a GO term string, but we can add it as a broad synonym to both terms.

Adding synonyms

When you add a synonym using OBO-Edit, choose a scope from the pull-down selector (see the OBO-Edit user guide for more information). OBO-Edit will incorporate the synonym scope into the OBO format flat file when you save. The default synonym scope is 'related synonym', but this should be changed to a different scope if appropriate.

The number of synonyms for a term is not limited, and the same text string can be used as a synonym for more than one GO term.

Add synonyms if you edit a term name but the old name is still a valid synonym; for example, if you change "respiration" to "cellular respiration", keep "respiration" as a synonym. This helps other users to find familiar terms.

Add synonyms if the term has (or contains) a commonly used abbreviation. For example, FGF binding could be used as a synonym for fibroblast growth factor binding.

Do not add a synonym if the only difference is case (e.g. start vs. START). Synonyms, like term names, are all lower case except where demanded by context (e.g. DNA, not dna).
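The case rule can be enforced at the moment a synonym is added. In this hypothetical `add_synonym` helper, a candidate is skipped if it differs from the term name, or from an existing synonym of the same term, only by case; rejecting repeats within one term is an added assumption, since the guidelines only say that the same string may be reused across different terms.

```python
def add_synonym(term_name: str, synonyms: list[str], candidate: str) -> bool:
    """Append candidate to synonyms unless it matches the term name or an
    existing synonym when case is ignored. Returns True if it was added."""
    folded = candidate.casefold()
    if folded == term_name.casefold():
        return False
    if any(folded == existing.casefold() for existing in synonyms):
        return False
    synonyms.append(candidate)
    return True


fgf_synonyms: list[str] = []
add_synonym("fibroblast growth factor binding", fgf_synonyms, "FGF binding")
print(fgf_synonyms)  # ['FGF binding']
```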

Rules For Synonyms

  • acronyms are exactly synonymous with the full name, as long as the acronym is not used in any other sense elsewhere
  • include implicit information when making a decision and take into account which ontology the term is in; e.g. an entry that ends in 'factor' is not synonymous with a molecular function
  • jargon type phrases are exactly synonymous with the full name, as long as the phrase is not used in any other sense elsewhere
  • proton is exactly synonymous with hydrogen where hydrogen refers to H+ (hydrogen ion); proton is not synonymous with H2 (hydrogen gas)
  • ligand is not exactly synonymous with binding (ligand is an entity, binding is an action)
  • x receptor ligand is not exactly synonymous with x (x is only one of the potential ligands, so x receptor ligand is broader than x)
  • x complex is not exactly synonymous with x (x is ambiguous - could be describing the activity of x)
  • x transporter is broader than x porter, x symporter or x antiporter


Cross-referencing other databases

General database cross-references, or general dbxrefs, should be used where a GO term is identical to an object in another database. For more information on syntax, please refer to the GO File Format Guide; for a complete list of dbxrefs, see the database cross-references page.

Database cross-references used in GO

Ontology   Database                                                               Sample dbxref
Function   Enzyme Commission                                                      EC:3.5.1.6
Function   Transport Protein Database                                             TC:2.A.29.10.1
Function   University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD)  UM-BBD_enzymeID:e0310
Function   MetaCyc metabolic pathway database                                     MetaCyc:XXXX-RXN
Process    MetaCyc metabolic pathway database                                     MetaCyc:2ASDEG-PWY
Process    University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD)  UM-BBD_pathwayID:dcb
Component  None

The GO database cross-references set is maintained by the BioMOBY project; please email the GO helpdesk to suggest any changes to this file.
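The sample table can be read as a lookup from each ontology to the database prefixes it lists. As a sketch that assumes those sample rows are the whole set (the complete list is on the database cross-references page, so a real check should load that instead), the hypothetical `xref_allowed` function tests whether a general dbxref's prefix is listed for a given ontology.

```python
# Prefixes taken from the sample table only; the full list lives elsewhere.
XREF_PREFIXES = {
    "function": {"EC", "TC", "UM-BBD_enzymeID", "MetaCyc"},
    "process": {"MetaCyc", "UM-BBD_pathwayID"},
    "component": set(),
}


def xref_allowed(ontology: str, xref: str) -> bool:
    """True if the xref's database prefix is listed for this ontology."""
    prefix, sep, _ = xref.partition(":")
    return bool(sep) and prefix in XREF_PREFIXES[ontology.lower()]


print(xref_allowed("Process", "MetaCyc:2ASDEG-PWY"))  # True
print(xref_allowed("Process", "EC:3.5.1.6"))  # False
```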

Dependent ontology terms

Some GO terms imply the presence of others in the ontology. Examples from the process ontology include the following:

  • If either X biosynthesis or X catabolism exists, then the parent X metabolism must also exist.
  • If regulation of process X exists, then the process X must also exist. Potentially any process in the ontology can be regulated. Note: X may refer to a phenotype (for example cell size in regulation of cell size); in these cases, X should not be added to the ontology.
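Both examples are pattern-based, so a rough consistency check can be run over a set of term names. The sketch below is hypothetical and purely string-based; because the guidelines exclude phenotypes such as cell size, the caller has to supply those names explicitly.

```python
def missing_dependencies(term_names: set[str],
                         phenotypes: frozenset[str] = frozenset()) -> set[str]:
    """Return implied parent terms that are absent from term_names."""
    missing = set()
    for name in term_names:
        # X biosynthesis or X catabolism implies X metabolism.
        for suffix in (" biosynthesis", " catabolism"):
            if name.endswith(suffix):
                parent = name[: -len(suffix)] + " metabolism"
                if parent not in term_names:
                    missing.add(parent)
        # regulation of X implies X, unless X is a phenotype.
        if name.startswith("regulation of "):
            process = name[len("regulation of "):]
            if process not in term_names and process not in phenotypes:
                missing.add(process)
    return missing


terms = {"urea biosynthesis", "regulation of cytokinesis", "regulation of cell size"}
print(sorted(missing_dependencies(terms, frozenset({"cell size"}))))
# ['cytokinesis', 'urea metabolism']
```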