Annotation Extension: Capturing cell and tissue types

From GO Wiki
Revision as of 12:25, 15 May 2012 by Edimmer (talk | contribs)

Introduction

This page provides guidelines for including cellular component, cell, tissue or anatomy type contextual information for a GO annotation in the annotation extension field (Column 16 of the GAF 2.0 annotation file format).

This is a subset of the guidelines laid out on the Annotation Extension guidance page.

Usefulness of capturing cell type or tissue type specific location of action

Highly specific investigative methods such as laser capture microdissection are becoming more commonplace. They allow investigators to isolate individual cells from heterogeneous tissues, so that downstream genetic or proteomic analysis is not contaminated by the surrounding tissue.

It is therefore important to be able to provide users with this contextual information on the processes and locations of gene products found to occur in specific cell types. This is of interest because, for instance, organelles may have differing constituents depending on the cell type in which they are located.

Example

~105 distinct human proteins are thought to be primarily located in the peroxisome; however, in any one cell or tissue type only ~50 different proteins are present, many having cell-type-specific localization (reference).

Curator Usage notes

1. When including cellular component, cell, tissue or anatomy type information in the annotation extension field, no judgment is made as to whether a gene product is involved in a particular process only in that particular location. In other words, curators are simply annotating the available contextual spatial location data from a paper.

Therefore it is incorrect to assume that, for instance, a gene product used in a GO annotation that has a cell type identifier in the annotation extension field is involved in the curated process only in that annotated cell type. Similarly, it would be a mistake to conclude that lack of location information in the annotation extension field indicates that a given gene product is involved in a process in all the locations where the gene product is found.

The only correct interpretation of a GO annotation with a cell type identifier in the annotation extension field is that in one particular experiment a given gene product was found to be involved in a particular process in a particular cell.

2. Cell type location should not be inferred from investigations that use immortalized cell lines. Such cell lines should be treated as an experimental tool rather than an indication of the biological context of function. As the process of immortalization is known to involve multiple genetic changes, a curator should ensure they are confident that the studied process is carried out in the equivalent normal cell type.

Specifying the subcellular location in which a process happens

Process terms can be further specified by subcellular location. For example: plastid translational elongation

At the time of writing this term is not declared in GO. Again we use the occurs_in relation:

 Col 5: GO:0006414 
 Col 16: occurs_in(GO:0009536)

Why, you might ask, can we not instead make two annotations to:

  • GO:0032544 ! plastid translation
  • GO:0006414 ! translational elongation

The answer is that co-annotation carries less information. Computationally, we have no way of knowing that these two processes are linked. See the FAQ.

Note that the majority of the time, BP x CC cross-products should be pre-composed in the ontology. If the above scenario comes up, consider requesting a new term plastid translational elongation rather than using col 16.

Also note that when using a GO ID in col 16, a redundant annotation should sometimes be added. See #Guidelines
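
The loss of information under co-annotation can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the Annotation record, the elongation_in_plastid function and the gene product identifier GENE1 are all hypothetical and are not part of GO or of any GO tool.

```python
from typing import NamedTuple, Tuple

class Annotation(NamedTuple):
    gene_product: str                             # hypothetical identifier
    go_id: str                                    # Col 5
    extension: Tuple[Tuple[str, str], ...] = ()   # Col 16 as (relation, ID) pairs

def elongation_in_plastid(annotations):
    """Gene products whose translational elongation is explicitly tied to the plastid."""
    return {
        a.gene_product
        for a in annotations
        if a.go_id == "GO:0006414" and ("occurs_in", "GO:0009536") in a.extension
    }

# One annotation with an extension: the process and its location are linked.
extended = [Annotation("GENE1", "GO:0006414", (("occurs_in", "GO:0009536"),))]

# Two separate annotations: nothing links the elongation to the plastid.
co_annotated = [
    Annotation("GENE1", "GO:0032544"),  # plastid translation
    Annotation("GENE1", "GO:0006414"),  # translational elongation
]

print(elongation_in_plastid(extended))      # {'GENE1'}
print(elongation_in_plastid(co_annotated))  # set()
```

With the extended annotation the query succeeds; with the two co-annotations it cannot, because the link between the process and the location was never recorded.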

Appropriate relations for cell, tissue or anatomical structure annotation extensions

  • part_of - Indicates a GO Cellular Component is part_of a specific cell type from a cell type ontology.
  • occurs_in - Indicates a GO Molecular Function or GO Biological Process occurs_in a specific cell type from a cell type ontology.

Example:

Q8IFM5 GO:0046789 host cell surface receptor binding occurs_in(CL:0000232)

Q9UHK6 GO:0008111 alpha-methylacyl-CoA racemase activity occurs_in(CL:0000057)

Q03426 GO:0004496 mevalonate kinase activity occurs_in(CL:0002620)|occurs_in(CL:0000542)

  • has_target_cell (this relationship may be obsoleted; if a change occurs to the cell, then use has_output or has_input instead?)


P21781 GO:0050679 positive regulation of epithelial cell proliferation has_target_cell(CL:0000083)

  • has_regulation_target

Q9ULC5-1 GO:0042981 regulation of apoptotic process has_regulation_target(CL:1000335)

Using cell or tissue type ontologies to enhance Cellular Component annotations

Specifying that a gene product is located in a cellular component of a specific cell type or gross anatomical entity

For example: If a gene product is localized to the mitochondrial membrane (GO:0031966) in a spermatocyte (CL:0000017):

 col 5: GO:0031966
 col 16: part_of(CL:0000017)

If a gene product is localized to the cell hair (GO:0070451) of a plant root hair cell (PO:0000256):

 col 5: GO:0070451
 col 16: part_of(PO:0000256)

If a gene product is localized to a nucleus (GO:0005634) in the cerebellum (UBERON:0002037) in mouse:

 col 5: GO:0005634
 col 16: part_of(UBERON:0002037)

If a gene product is localized to the plasma membrane (GO:0005886) of epithelial cells (CL:0000066) in the lung (UBERON:0002048) in mouse:

 col 5: GO:0005886
 col 16: part_of(CL:0000066),part_of(UBERON:0002048)

If the same publication shows localization to the nucleus (GO:0005634) in both the cerebellum (UBERON:0002037) and spinal cord (UBERON:0002240):

  col 5: GO:0005634
  col 16: part_of(UBERON:0002037)|part_of(UBERON:0002240)
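
The two separators used in the examples above can be told apart mechanically. The following is a minimal Python sketch, assuming (as in the GAF 2.0 format) that a pipe separates independent extension statements while a comma joins relations that apply together. The function name parse_extension and the regular expression are illustrative assumptions, not part of any GO tool.

```python
import re

# One relation(ID) unit, e.g. part_of(UBERON:0002037)
_UNIT = re.compile(r"^(\w+)\(([A-Za-z_]+:\w+)\)$")

def parse_extension(col16):
    """Return a list of statements; each statement is a list of (relation, ID) pairs."""
    statements = []
    for statement in col16.split("|"):        # pipe: independent statements
        pairs = []
        for unit in statement.split(","):     # comma: relations that apply together
            match = _UNIT.match(unit.strip())
            if match is None:
                raise ValueError(f"not a relation(ID) expression: {unit!r}")
            pairs.append((match.group(1), match.group(2)))
        statements.append(pairs)
    return statements

# Two independent locations reported in the same publication
print(parse_extension("part_of(UBERON:0002037)|part_of(UBERON:0002240)"))

# One combined location: a cell type within a tissue
print(parse_extension("part_of(CL:0000066),part_of(UBERON:0002048)"))
```

The first call yields two statements of one pair each; the second yields a single statement containing both pairs.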

Use cases

1. Toll-like receptor 4 (TLR4) (O00206) is located intracellularly in the perinuclear region (GO:0048471 perinuclear region of cytoplasm) only in dendritic cells (CL:0000451), PMID:15027902

So the annotation would be:

DB Object ID (Col 2)  DB Object Symbol (Col 3)  GO ID (Col 5)  Reference (Col 6)  Extension (Col 16)
O00206                TLR4                      GO:0048471     PMID:15027902      part_of(CL:0000451)


2. TLR4 is located on the cell surface (GO:0009986) in monocytes (CL:0000576), PMID:15027902

So the annotation would be:

DB Object ID (Col 2)  DB Object Symbol (Col 3)  GO ID (Col 5)  Reference (Col 6)  Extension (Col 16)
O00206                TLR4                      GO:0009986     PMID:15027902      part_of(CL:0000576)
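
As a rough illustration of where these values sit in a file, the sketch below places the second use case into a 17-column, tab-separated GAF 2.0 line. The helper gaf_line is hypothetical. Only the columns given on this page are filled in, plus the DB (UniProtKB) and the aspect (C, for Cellular Component); the remaining columns are left empty here, although a real file must also supply the mandatory ones such as evidence code, taxon, date and assigned-by.

```python
def gaf_line(db, object_id, symbol, go_id, reference, aspect, extension=""):
    """Build one tab-separated GAF 2.0 line (17 columns); unspecified columns stay empty."""
    columns = [""] * 17
    columns[0] = db          # Col 1:  DB
    columns[1] = object_id   # Col 2:  DB Object ID
    columns[2] = symbol      # Col 3:  DB Object Symbol
    columns[4] = go_id       # Col 5:  GO ID
    columns[5] = reference   # Col 6:  DB:Reference
    columns[8] = aspect      # Col 9:  Aspect (C, F or P)
    columns[15] = extension  # Col 16: Annotation Extension
    return "\t".join(columns)

line = gaf_line("UniProtKB", "O00206", "TLR4", "GO:0009986",
                "PMID:15027902", "C", "part_of(CL:0000576)")
fields = line.split("\t")
print(len(fields), fields[4], fields[15])
```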

Using cell or tissue type ontologies to enhance Molecular Function and Biological Process annotations

Specifying that a gene product is involved in a process in a specific cell or tissue type

For example: If a gene product is involved in transcription (GO:0006350) in Purkinje cells (CL:0000121):

 col 5: GO:0006350
 col 16: occurs_in(CL:0000121)

or if a gene product is involved in gluconeogenesis (GO:0006094) in the liver (UBERON:0002107):

 col 5: GO:0006094
 col 16: occurs_in(UBERON:0002107)
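
The pairing of relation and GO aspect described on this page (part_of for Cellular Component, occurs_in for Molecular Function and Biological Process) lends itself to a simple consistency check. The sketch below is an assumption-laden illustration: the function name and message format are hypothetical, and it deliberately ignores any relation other than part_of and occurs_in, since this page does not tie relations such as has_target_cell to a location rule.

```python
# Location relation expected for each GO aspect, per the guidance on this page.
EXPECTED_LOCATION_RELATION = {"C": "part_of", "F": "occurs_in", "P": "occurs_in"}

def check_location_relation(aspect, relation):
    """Return None if the relation is acceptable for the aspect, else a message."""
    if relation not in ("part_of", "occurs_in"):
        return None  # other relations are outside the scope of this check
    expected = EXPECTED_LOCATION_RELATION[aspect]
    if relation != expected:
        return f"aspect {aspect}: expected {expected}, found {relation}"
    return None

print(check_location_relation("P", "occurs_in"))  # None
print(check_location_relation("C", "occurs_in"))  # aspect C: expected part_of, found occurs_in
```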

Use cases

1. Human SLC22A5 (UniProtKB:O76082) is involved in quorum sensing involved in interaction with host (GO:0052106) in colonic epithelial cells (CL:0000066), PMID:18005709

So the annotation would be:

DB Object ID (Col 2)  DB Object Symbol (Col 3)  GO ID (Col 5)  Reference (Col 6)  Extension (Col 16)
O76082                SLC22A5                   GO:0052106     PMID:18005709      occurs_in(CL:0000066)


2. Human Wnt7a (UniProtKB:O00755) is involved in positive regulation of epithelial cell proliferation involved in wound healing (GO:0060054) in corneal epithelial cells (CL:0000575), PMID:15802269

So the annotation would be:

DB Object ID (Col 2)  DB Object Symbol (Col 3)  GO ID (Col 5)  Reference (Col 6)  Extension (Col 16)
O00755                Wnt7a                     GO:0060054     PMID:15802269      occurs_in(CL:0000575)


Exception

One exception to using the occurs_in relationship for enhancing Biological Process annotations arises when annotating a gene product to terms such as '<X> cell fate commitment'. The commitment actually occurs in a stem cell, before the 'X cell' forms. For example, an annotation to 'myoblast cell fate commitment' should not carry the annotation extension occurs_in(CL:0000056). That extension would state that the commitment to become a myoblast is occurring in the myoblast cell (CL:0000056) when, in fact, it is occurring in a stem cell.
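
A curation tool could flag this case with a simple label comparison. The following is only a heuristic sketch: the function names are hypothetical, the caller is assumed to supply the GO term label and the label of the cell type used in the extension, and real term names may not follow the '<X> cell fate commitment' pattern exactly.

```python
def _strip_cell(label):
    """Drop a trailing ' cell' so 'myoblast cell' and 'myoblast' compare equal."""
    label = label.strip()
    return label[: -len(" cell")] if label.endswith(" cell") else label

def is_suspect_commitment_extension(go_label, relation, cell_label):
    """True if occurs_in points at the very cell type the term commits to."""
    suffix = "fate commitment"
    if relation != "occurs_in" or not go_label.endswith(suffix):
        return False
    committed_type = _strip_cell(go_label[: -len(suffix)])
    return committed_type == _strip_cell(cell_label)

# CL:0000056 is the myoblast cell type named in the text above.
print(is_suspect_commitment_extension(
    "myoblast cell fate commitment", "occurs_in", "myoblast"))   # True
print(is_suspect_commitment_extension(
    "myoblast cell fate commitment", "occurs_in", "stem cell"))  # False
```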

Multiple annotation extensions for cell type

The publication may describe the localization of a gene product in two or more distinct cell types.

For example: Theoretical gene 1234 is located in the mitochondrial membrane (GO:0031966) of Purkinje cells (CL:0000121) and bipolar neurons (CL:0000103), PMID:54321

So the annotation would be:

DB Object ID (Col 2)  DB Object Symbol (Col 3)  GO ID (Col 5)  Reference (Col 6)  Extension (Col 16)
1234                  Theo                      GO:0031966     PMID:54321         part_of(CL:0000121)|part_of(CL:0000103)

N.B. No meaning is attached to the order of the cell type identifiers listed in column 16.
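
Because order carries no meaning, software comparing two column 16 values should not compare the raw strings. A minimal Python sketch of an order-insensitive comparison follows; the function name extension_key is hypothetical, and treating the order of comma-joined relations as equally insignificant is an assumption made here rather than something stated on this page.

```python
def extension_key(col16):
    """Normalize a column 16 value into a set of sets, discarding order."""
    return frozenset(
        frozenset(unit.strip() for unit in statement.split(","))
        for statement in col16.split("|")
    )

a = "part_of(CL:0000121)|part_of(CL:0000103)"
b = "part_of(CL:0000103)|part_of(CL:0000121)"
print(a == b)                                 # False: the raw strings differ
print(extension_key(a) == extension_key(b))   # True: same content, different order
```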


Requesting new ontology terms for cell type

If the cell type term you require does not exist, you can make a request on the Cell Type Ontology SourceForge tracker or, for plant cell types, on the Plant Ontology SourceForge tracker.