Annotation Extension: Capturing cell and tissue types: Difference between revisions

From GO Wiki
Latest revision as of 09:13, 12 October 2021

Introduction

This page provides guidelines for including contextual information about cellular component, cell, tissue, or anatomy as an annotation extension for a GO annotation.

This is a subset of the guidelines laid out in the Annotation Extension guidance page.

Usefulness of capturing cell type or tissue type specific location of action

Investigative methods that work solely with a specific tissue or cell type (such as laser capture microdissection) are becoming more commonplace and allow downstream genetic or proteomic analysis that is not contaminated by surrounding tissue. In addition, the separation of subcellular particles via cell fractionation techniques enables the study of the constituents of a particular cell part or organelle.

It is therefore important to give users specific contextual information in annotation statements that describe the processes and locations of gene products studied in such specific locations.

Aspects to consider when capturing localization context

1. When including cellular component, cell, tissue or anatomy type information in the annotation extension field, no judgment is made as to whether the gene product is involved in the annotated function/process in just the location stated in the annotation extension field. In other words, curators are simply annotating the available contextual spatial location data from a paper.

Therefore, it is incorrect to assume that, for instance, a gene product in a GO annotation that has a cell type identifier in the annotation extension field is involved in the curated process only in that cell type. Similarly, it would be a mistake to conclude that a lack of location information in the annotation extension field indicates that a gene product is involved in a process in all the locations where it is found.

The only correct interpretation of a GO annotation with a specific spatial location in the annotation extension field is that in one particular experiment a given gene product was found to be involved in a particular process in a particular location.

2. Cell type location should not be inferred from investigations that use immortalized cell lines. Such cell lines should be treated as an experimental tool rather than an indication of the biological context of function. Because immortalization is known to involve multiple genetic changes, a curator should ensure that they are confident the studied process is carried out in the equivalent normal cell type.

Specifying the subcellular location in which a function or process occurs

Some terms in the Biological Process ontology already have subcellular localization specified; these pre-composed cross-product subclasses will have an is_a relationship to Biological Process parents and an occurs_in relation to a term in Cellular Component. For example: GO:0070125; mitochondrial translational elongation.

In many cases, it will be appropriate for a curator to request a specific Biological Process X Cellular Component subclass.

However, where a specific subclass of a function or process is not specified by its subcellular location, curators can use the annotation extension field to provide localization contextual information directly in the Biological Process or Molecular Function annotation.

Example:

 Col 5: GO:0006414 
 Col 16: occurs_in(GO:0009536)

Why not make two separate GO annotations instead?

  • GO:0032544 ! plastid translation
  • GO:0006414 ! translational elongation

The answer is that co-annotation carries less information. Computationally we have no way of knowing these two processes are linked.

Also note that when using a GO ID in the annotation extension field, a redundant annotation should sometimes be added. At present this additional annotation needs to be made manually; however, guidelines are being determined to help annotation pipelines decide automatically when such an annotation should be created.
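As a rough illustration of where the Col 5 and Col 16 values from the example above sit in a Gene Association File row, the sketch below places values by 1-based column number. It assumes the 17-column, tab-separated GAF 2.x layout; `make_gaf_row` is a hypothetical helper, and the row it builds leaves every other column (including mandatory ones) empty, so it is not a valid submission line.

```python
GAF_COLUMNS = 17  # assumption: GAF 2.x rows have 17 tab-separated columns


def make_gaf_row(values: dict[int, str]) -> str:
    """Build a tab-separated row from {1-based column number: value}.

    Columns that are not supplied are left empty.
    """
    cells = [""] * GAF_COLUMNS
    for col, value in values.items():
        if not 1 <= col <= GAF_COLUMNS:
            raise ValueError(f"column {col} is outside 1..{GAF_COLUMNS}")
        cells[col - 1] = value  # convert the 1-based column to a 0-based index
    return "\t".join(cells)


# The translational elongation example: one annotation, with the plastid
# location carried in the extension column rather than in a second annotation.
row = make_gaf_row({5: "GO:0006414", 16: "occurs_in(GO:0009536)"})
fields = row.split("\t")
print(fields[4], fields[15])
```

Because the location travels in the same row as the GO ID, a consumer reading the row does not have to guess which process the location belongs to.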

Appropriate relations for adding cell, tissue or anatomical structure contextual information

  • part_of - Indicates that a GO Cellular Component is part_of a specific cell, tissue, or anatomical structure.
  • occurs_in - Indicates that a GO Molecular Function or GO Biological Process occurs_in a specific cell, tissue, or anatomical structure.
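The pairing of relation and GO aspect listed above can be expressed as a small lookup. This is only a sketch: the aspect labels used as dictionary keys and the function name `relation_is_appropriate` are assumptions made for illustration, not part of any GO tooling.

```python
# Relation expected for cell, tissue, or anatomy context, keyed by GO aspect.
# The aspect labels are illustrative names chosen for this sketch.
ALLOWED_RELATION = {
    "cellular_component": "part_of",
    "molecular_function": "occurs_in",
    "biological_process": "occurs_in",
}


def relation_is_appropriate(aspect: str, relation: str) -> bool:
    """Return True if `relation` is the one listed for this GO aspect."""
    return ALLOWED_RELATION.get(aspect) == relation


print(relation_is_appropriate("cellular_component", "part_of"))
print(relation_is_appropriate("biological_process", "part_of"))
```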

Using cell or tissue type ontologies to enhance Cellular Component annotations

Specifying that a gene product is located in a cellular component of a specific cell type or gross anatomical entity

For example: If a gene product is localized to the mitochondrial membrane (GO:0031966) in a spermatocyte (CL:0000017):

 col 5: GO:0031966
 col 16: part_of(CL:0000017)

If a gene product is localized to the cell hair (GO:0070451) of a plant root hair cell (PO:0000256):

 col 5: GO:0070451
 col 16: part_of(PO:0000256)

If a gene product is localized to a nucleus (GO:0005634) in the cerebellum (UBERON:0002037) in mouse:

 col 5: GO:0005634
 col 16: part_of(UBERON:0002037)

If a gene product is localized to the plasma membrane (GO:0005886) of epithelial cells (CL:0000066) in the lung (UBERON:0002048) in mouse:

 col 5: GO:0005886
 col 16: part_of(CL:0000066),part_of(UBERON:0002048)

If the same publication shows localization to the nucleus (GO:0005634) in both the cerebellum (UBERON:0002037) and spinal cord (UBERON:0002240):

  col 5: GO:0005634
  col 16: part_of(UBERON:0002037)|part_of(UBERON:0002240)
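The last two examples use different separators: a comma joins conditions that apply together (a cell type within a tissue), while a pipe separates independent locations. A minimal parsing sketch under that reading follows; `parse_extension` and its regular expression are hypothetical, and the pattern only covers the simple `relation(ID)` form shown on this page.

```python
import re

# One expression of the form relation(IDENTIFIER), e.g. part_of(CL:0000121).
_EXPR = re.compile(r"^\s*(\w+)\(([^()\s]+)\)\s*$")


def parse_extension(col16: str) -> list[list[tuple[str, str]]]:
    """Split a column 16 value into groups of (relation, identifier) pairs.

    Pipes separate independent groups; commas separate the expressions
    that belong together within one group.
    """
    groups = []
    for group in col16.split("|"):
        parsed = []
        for expr in group.split(","):
            match = _EXPR.match(expr)
            if match is None:
                raise ValueError(f"unrecognised extension expression: {expr!r}")
            parsed.append((match.group(1), match.group(2)))
        groups.append(parsed)
    return groups


# Two independent locations reported by the same publication.
for group in parse_extension("part_of(UBERON:0002037)|part_of(UBERON:0002240)"):
    print(group)
```

An empty string raises `ValueError` in this sketch; a real pipeline would probably treat an empty column 16 as "no extension" before calling the parser.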

Use cases

1. Toll-like receptor 4 (TLR4) (O00206) is located intracellularly in the perinuclear region (GO:0048471 perinuclear region of cytoplasm) only in dendritic cells (CL:0000451), PMID:15027902

So the annotation would be:

DB Object ID (Col 2)  DB Object Symbol (Col 3)  GO ID (Col 5)  Reference (Col 6)  Extension (Col 16)
O00206                TLR4                      GO:0048471     PMID:15027902      part_of(CL:0000451)


2. TLR4 is located on the cell surface (GO:0009986) in monocytes (CL:0000576), PMID:15027902

So the annotation would be:

DB Object ID (Col 2)  DB Object Symbol (Col 3)  GO ID (Col 5)  Reference (Col 6)  Extension (Col 16)
O00206                TLR4                      GO:0009986     PMID:15027902      part_of(CL:0000576)

Using cell or tissue type ontologies to enhance Molecular Function and Biological Process annotations

Specifying that a gene product is involved in a process in a specific cell or tissue type

For example: If a gene product is involved in transcription (GO:0006350) in Purkinje cells (CL:0000121):

 col 5: GO:0006350
 col 16: occurs_in(CL:0000121)

or if a gene product is involved in gluconeogenesis (GO:0006094) in the liver (UBERON:0002107):

 col 5: GO:0006094
 col 16: occurs_in(UBERON:0002107)

Use cases

1. Human SLC22A5 (UniProtKB:O76082) is involved in quorum sensing involved in interaction with host (GO:0052106) in colonic epithelial cells (CL:0000066), PMID:18005709

So the annotation would be:

DB Object ID (Col 2)  DB Object Symbol (Col 3)  GO ID (Col 5)  Reference (Col 6)  Extension (Col 16)
O76082                SLC22A5                   GO:0052106     PMID:18005709      occurs_in(CL:0000066)


2. Human Wnt7a (UniProtKB:O00755) is involved in positive regulation of epithelial cell proliferation involved in wound healing (GO:0060054) in corneal epithelial cells (CL:0000575), PMID:15802269

So the annotation would be:

DB Object ID (Col 2)  DB Object Symbol (Col 3)  GO ID (Col 5)  Reference (Col 6)  Extension (Col 16)
O00755                Wnt7a                     GO:0060054     PMID:15802269      occurs_in(CL:0000575)


Exception

One exception to using the occurs_in relation to enhance Biological Process annotations arises when annotating a gene product to terms such as '<X> cell fate commitment'. The commitment occurs in a stem cell, before the 'X cell' forms. For example, an annotation to 'myoblast cell fate commitment' should not carry the annotation extension occurs_in(CL:0000056), because that would state that the commitment to become a myoblast is occurring in the myoblast (CL:0000056) when, in fact, it is occurring in a stem cell.
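A curation pipeline could flag candidates for this exception with a simple text check. The sketch below is a heuristic only: `needs_fate_commitment_review` is a hypothetical name, and matching on the term label is an assumption made for illustration rather than an established rule.

```python
def needs_fate_commitment_review(go_label: str, extension: str) -> bool:
    """Flag an occurs_in extension on a '... cell fate commitment' term.

    A flagged annotation is not necessarily wrong; it should be checked by
    a curator, because the commitment happens before the named cell exists.
    """
    is_commitment_term = "cell fate commitment" in go_label.lower()
    uses_occurs_in = "occurs_in(" in extension
    return is_commitment_term and uses_occurs_in


print(needs_fate_commitment_review("myoblast cell fate commitment",
                                   "occurs_in(CL:0000056)"))
```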

Multiple annotation extensions for cell type

The publication may describe the localization of a gene product in two or more distinct cell types.

For example: Theoretical gene 1234 is located in the mitochondrial membrane (GO:0031966) of Purkinje cells (CL:0000121) and bipolar neurons (CL:0000103), PMID:54321

So the annotation would be:

DB Object ID (Col 2)  DB Object Symbol (Col 3)  GO ID (Col 5)  Reference (Col 6)  Extension (Col 16)
1234                  Theo                      GO:0031966     PMID:54321         part_of(CL:0000121)|part_of(CL:0000103)

N.B. No meaning is attached to the order of the cell type identifiers listed in column 16.
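Since order carries no meaning, two column 16 values can be compared as sets rather than as strings. The following is a minimal sketch; `same_extensions` is a hypothetical helper, and it assumes the pipe and comma separators used in the examples on this page.

```python
def same_extensions(a: str, b: str) -> bool:
    """Compare two column 16 values while ignoring the order of entries."""

    def normalise(col16: str) -> frozenset:
        # Each pipe-separated group becomes a set of its comma-separated parts.
        return frozenset(
            frozenset(part.strip() for part in group.split(","))
            for group in col16.split("|")
        )

    return normalise(a) == normalise(b)


print(same_extensions(
    "part_of(CL:0000121)|part_of(CL:0000103)",
    "part_of(CL:0000103)|part_of(CL:0000121)",
))
```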


Requesting new ontology terms for cell type

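If the cell type term you require does not exist, you can make a request on the Cell Type Ontology SourceForge tracker (http://sourceforge.net/tracker/?group_id=76834&atid=925065) or, for plant cell types, on the Plant Ontology SourceForge tracker (http://sourceforge.net/tracker/?group_id=76834&atid=835555).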