Annotation Cross Products

Please note, this page no longer contains the primary documentation for the annotation extension field. Information here is being moved to the main Annotation Extension wiki page, and supporting pages.


NB: Annotation cross-products are now referred to as annotation extensions

Each GO annotation pairs a single gene product to a single term from the ontology. This restricts annotators in what they can say - there must be a pre-existing term in the ontology, or one must be requested. It would be far less restrictive if the annotator could combine additional terms in a single annotation. These terms could even come from other OBO ontologies, or they could be gene products.

This page describes the proposed new column 16 in the GAF, which allows additional terms to be specified to extend the meaning of an annotation. If and when an annotator chooses to do this, they are effectively creating an "on-the-fly" cross-product term. We say "on-the-fly" because the combinatorial term is not added to the ontology (although it could be at a later stage, if the ontology editors choose to do so).

Edimmer info now here

This proposal owes a lot to the MGI structured notes internal field in the MGD database.

See also the minutes from the Column 16 phone call


== The basic idea ==

An existing GO term can be enhanced by one or more relationship-term pairs. These are written as relation(term)

For example, if a gene product Slp1 is localized to the plasma membrane of T-cells, the GAF would look like this (most columns omitted for brevity):

Gene (col 2) Term (col 5) Ref (col 6) Ext (col 16)
Slp1 GO:0005886 PMID:1234567 part_of(CL:0000084)

Here CL:0000084 is the ID for T-cell in the OBO Cell ontology.
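
As a rough illustration, the extension can be read back out of a GAF row with a few lines of Python. This is a minimal sketch, assuming a tab-delimited GAF 2.0 line with 17 columns; the function name `extension_field` and the sample row are hypothetical, and only the columns shown above are filled in.

```python
def extension_field(gaf_line: str) -> str:
    """Return the raw column 16 value of a tab-delimited GAF line ('' if absent)."""
    columns = gaf_line.rstrip("\n").split("\t")
    # Column numbers in the text are 1-based, so column 16 is index 15.
    return columns[15] if len(columns) > 15 else ""


# Build an illustrative 17-column row (assumption: GAF 2.0 layout).
row = [""] * 17
row[1] = "Slp1"                   # col 2: gene
row[4] = "GO:0005886"             # col 5: GO term
row[5] = "PMID:1234567"           # col 6: reference
row[15] = "part_of(CL:0000084)"   # col 16: annotation extension
line = "\t".join(row)

print(extension_field(line))
```

A line with fewer than 16 columns simply yields an empty string, which mirrors the fact that the column is optional.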


The ontology used need not be external - a GO ID can be used, for example to state that a function F is executed during process P. This is stronger than co-annotation as the F and the P are linked.

Gene product identifiers can also be used.

information now present here and here Edimmer

== Relations used ==

<strike> The following set of relations are used. Their use will be explained in the examples that follow:

For biological processes and molecular functions:

For cell components:

This information is now out-of-date, as we now have the go_annotation_extension_relations.obo file Edimmer

== Enhancing Cell Component Annotations ==

=== Localization ===

<strike>Localization annotations can be enhanced by specifying either

<strike>* the cell type which that cell component is part of
<strike>* the gross anatomical entity which the cell component is part of

<strike>If a gene product is located to the mitochondrial membrane in a spermatocyte:

 col 5: GO:0031966
 col 16: part_of(CL:0000017)

If a gene product is localized to a nucleus in the cerebellum in mouse:

 col 5: GO:0005634
 col 16: part_of(MA:0000198)

If a gene product is localized to the plasma membrane of epithelial cells in the lung in mouse:

 col 5: GO:0005886
 col 16: part_of(CL:0000066),part_of(MA:0000415)

If the same publication shows localization to the nucleus in both the cerebellum and spinal cord:

  col 5: GO:0005634
  col 16: part_of(MA:0000198)|part_of(MA:0000216)


Above examples are now at: http://wiki.geneontology.org/index.php/Annotation_Extension:_Capturing_cell_and_tissue_types R. Huntley 3-May 2012

<strike>=== Specifying the stage at which a localization is observed ===

<strike>Cellular component annotations can be enhanced by specifying that localization is observed during a cell cycle or developmental stage, or in the context of a specific biological process.

<strike>If a gene product is localized to the nuclear periphery in S phase, G2, and mitosis (S. pombe Ulp1; PMID:11884512):

 col 5: GO:0034399 ! nuclear periphery
 col 16: exists_during(GO:0000084)|exists_during(GO:0000085)|exists_during(GO:0007067) ! S phase of mitotic cell cycle, G2 phase of mitotic cell cycle, and mitosis respectively

<strike>If a gene product is localized to the spindle pole body (SPB) and nucleolus in interphase and to the actin contractile ring, the mitotic spindle, and kinetochores during mitosis (S. pombe Clp1; PMID:16085490):

 col 5: GO:0005816 ! spindle pole body
 col 16: exists_during(GO:0051329) ! interphase of mitotic cell cycle
 col 5: GO:0005730 ! nucleolus
 col 16: exists_during(GO:0051329) ! interphase of mitotic cell cycle
 col 5: GO:0005826 ! actomyosin contractile ring
 col 16: exists_during(GO:0007067) ! mitosis
 col 5: GO:0005819 ! spindle
 col 16: exists_during(GO:0007067) ! mitosis
 col 5: GO:0000777 ! condensed chromosome kinetochore
 col 16: exists_during(GO:0007067) ! mitosis

<strike>Note that an experiment that supports 'CC exists_during BP' may also support an annotation of the 'BP occurs_in CC' pattern.

<strike>Also see the go-discuss mailing list for more information.

Above examples are now at: here and here R. Huntley 03-05-2012

== Enhancing Molecular Function and Biological Process Annotations ==

=== Specifying the location in which a process happens ===

<strike>This is very similar to cellular component, except that we cannot use part_of, because part_of must hold between either two physical entities or two processes/functions, not between a process and a cell component. Instead we use occurs_in.

<strike>If a gene product is involved in transcription in Purkinje cells:

 col 5: GO:0006350
 col 16: occurs_in(CL:0000121)

<strike>If a gene product is involved in gluconeogenesis in the liver:

 col 5: GO:0006094
 col 16: occurs_in(MA:0000358)

Above examples are now at: http://wiki.geneontology.org/index.php/Annotation_Extension:_Capturing_cell_and_tissue_types R. Huntley 03-05-2012


Process terms can be further specified by subcellular location. For example: plastid translational elongation

At the time of writing this term is not declared in GO. Again we use the occurs_in relation:

 Col 5: GO:0006414 
 Col 16: occurs_in(GO:0009536)

Why, you might ask, can we not instead make two annotations to:

  • GO:0032544 ! plastid translation
  • GO:0006414 ! translational elongation

The answer is that co-annotation carries less information. Computationally we have no way of knowing these two processes are linked. See the FAQ

Note that the majority of the time, BP x CC cross-products should be pre-composed in the ontology. If the above scenario comes up, consider requesting a new term plastid translational elongation rather than using col 16.

Also note that when using a GO ID in col 16, a redundant annotation should sometimes be added. See #Guidelines

Above example is now: here and here

=== Specifying the developmental stage at which a process occurs ===

We can use a developmental stage ontology and the part_of relation. part_of is used because both the process/function and the developmental stages are things with temporal parts (they are occurrents in bfo-speak).

For example, apoptosis during Segmentation:1-4 somites in zebrafish

 col5: GO:0006915
 col16: part_of(ZFS:0000023)

Above example is now at: http://wiki.geneontology.org/index.php/Part_of

=== Response to chemicals ===

Here we use the has_input relation - the input to the process is a chemical, the output is some change in state as a result of exposure.

 col5: GO:nnnnnn
 col16: has_input(CHEBI:nnnnn)

Sometimes it is better to request a GO term here

Above example is now at: http://wiki.geneontology.org/index.php/Has_input

=== Functions carried out as part of a process ===

We use the part_of relation to link function and process (this relation is already used for the inter-ontology links)

For example, if a gene product is observed to have GTPase activity as a part of the nerve growth factor receptor signaling pathway, you would annotate:

 col5: GO:0003924
 col16: part_of(GO:0048011)

Note you should also include a separate annotation in which GO:0048011 is in col5, so that people who are not using col 16 will not be worse off than they are now. See guidelines.
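
The redundant-annotation rule can be sketched in code. The following is a minimal illustration, assuming annotations are held as simple (gene, term, reference, extension) tuples; the helper name `with_redundant_rows` and the gene name `GeneX` are hypothetical, and a real pipeline would also need to carry over evidence codes and the other GAF columns.

```python
import re

# Matches relation(GO:nnnnnnn) and captures the GO ID used as the target.
GO_TARGET = re.compile(r"\w+\((GO:\d{7})\)")


def with_redundant_rows(gene, term, ref, ext):
    """Return the extended annotation plus one plain row per GO ID in col 16."""
    rows = [(gene, term, ref, ext)]
    for go_id in GO_TARGET.findall(ext):
        # The redundant row has the GO ID in col 5 and an empty col 16.
        rows.append((gene, go_id, ref, ""))
    return rows


for row in with_redundant_rows(
    "GeneX", "GO:0003924", "PMID:nnnn", "part_of(GO:0048011)"
):
    print(row)
```

Extensions that point at other ontologies (for example a CL ID) produce no extra row in this sketch, since the redundancy guideline only concerns GO IDs.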

Note that you would not say something like this:

 col5: GO:0016301
 col16: part_of(GO:0016310)
  • GO:0016301 - kinase activity
  • GO:0016310 - phosphorylation

This is harmless but pointless, because we already know from gene_ontology_ext that kinase activity is part_of phosphorylation.

Above examples are now at: http://wiki.geneontology.org/index.php/Part_of

=== Function-Process-Component threesomes ===

 col5: GO Function ID
 col16: part_of(GO PROCESS ID),occurs_in(GO CC ID)

Also include 2 redundant annotation lines

Above example is now at: http://wiki.geneontology.org/index.php/Part_of and http://wiki.geneontology.org/index.php/Occurs_in

=== Specifying interacting partner gene products ===

Gene products are participants in processes/functions. We would use the has_participant relation or one of its subtypes to indicate the particular role that additional gene products play. These will typically be has_input or has_output.

Note that the gene product in col 2 is also a participant. However, the relationship between the process in col5 and this gene product is currently implicit, so we don't need to worry about it here.

If in doubt about which relation to use, it is always possible to use the most generic relation, has_participant. Of course, this does not carry as much information but at least should be correct.

The gene product can be identified by a UniProtKB ID or by a gene product ID from the same MOD that is contributing the annotation (e.g. FBtr or FBpp for FlyBase). If the target is a protein then you should always use a protein ID, rather than a gene ID as a proxy. Similarly, if the target is a gene (e.g. in transcription) then use a gene ID.

As an example, consider SIRP beta2 which in concert with CD47 positively regulates cell-cell adhesion (PMID:15383453)

The annotation for SIRP beta2 would be:

 col5: GO:0022409
 col16: has_participant(UniProtKB:Q08722)

Here Q08722 is the ID for CD47

NOTE: (2010-04-30) - we decided there is a simpler way to do this using GO:0050839 ! cell adhesion molecule binding. However, this carries less information than saying specifically that it's involved in cell-cell adhesion, and I don't think it's appropriate to add a F->P link from 'cell adhesion molecule binding'.


Phosphorylation targets

If protein SGD:A phosphorylates protein SGD:B then annotate A to:

 col5: GO:0004672
 col16: has_input(SGD:B)

GO:0004672 is protein kinase activity

Note that we would not include a separate annotation line for B, because we only have annotation lines for active participants.

Strictly speaking, the input is SGD:B in the unphosphorylated state and the output is SGD:B in the phosphorylated state. However, we do not currently have IDs for these separate protein forms. Really B is both an input and an output. We standardize on has_input here.

Note there is no need to say

 col16: has_input(SGD:B),has_input(CHEBI:15422)

or even

 col16: has_input(SGD:B),has_input(CHEBI:15422),has_output(SGD:B),has_output(CHEBI:16761)

These are correct but pointless, because the additional information is redundant with what we already know about kinase activity (this is actually made computable in MF x CHEBI).

Also, there is no need to make a separate col 16 annotation for the phosphorylation process, as this can be inferred.

If protein SGD:A phosphorylates protein SGD:B and SGD:C then annotate A to:

 col5: GO:0004672 (protein kinase activity)
 col16: has_input(SGD:B),has_input(SGD:C)

There is some redundancy with interaction databases here. Capturing this as GO annotation is more expressive as you can say "A phosphorylates B during pathway C". But if you want to capture this in interaction databases exclusively we have tools for generating GO annotations from these (just as we have tools for capturing GO annotations from pathway databases)

transport targets

TODO

Examples of this are now at http://wiki.geneontology.org/index.php/Localizes#Annotation_Extension_Usage_Examples

transcription targets

TODO

Specifying inter-species protein binding partners

If an experiment showed binding of two proteins from the same species, then the identifier for the binding partner would go in both column 8 and column 16. If it was an inter-species experiment, i.e. a protein from one species and a binding partner from another species, then the accession for the binding partner actually used in the binding experiment would go in column 8 and the accession for the inferred in vivo binding partner would go in column 16.

Use case

1. Chicken SFRP1 (Q9DEQ4) interacts with mouse Frizzled-2 (Q9JIP6) PMID:16172602. The actual experiment was performed with chicken and mouse proteins, but a curator can infer that the chicken SFRP1 would bind the chicken Frizzled-2 (Q9IA06) and the mouse Frizzled-2 would bind the mouse SFRP1 (Q8C4U3). The GO term used to annotate chicken SFRP1 should be 'frizzled binding' (GO:0005109).

So the reciprocal annotations would be:

DB (Col 2) Object (Col 3) GO ID (Col 5) Reference (Col 6) With (Col 8) Extension (Col 16)
Q9DEQ4 SFRP1 GO:0005109 PMID:16172602 Q9JIP6 has_participant(Q9IA06)
Q9JIP6 FZD2 GO:0005515 PMID:16172602 Q9DEQ4 has_participant(Q8C4U3)


== External Ontologies required ==

Only ontologies committed to the principles of the [http://obofoundry.org OBO Foundry] should be included.

  • CHEBI : Chemical Entities
  • CL : Cell ontology
  • taxon-centric anatomy ontologies (AOs):
    • ZFA (zebrafish)
    • MA (adult mouse)
    • FMA (human)
    • XAO (xenopus)
    • FBbt (fly)
    • WBbt (worm)
    • PO (plant anatomy)
    • (add others here)

Open questions:

  • terms such as "blastocyst" in human? FMA is only adult structures

== Use Cases ==

Immune System regulation terms: BP and CL

(see email thread from Evelyn on GO list, "another immune related query GO and CL")

Chicken IL-10 is secreted from, e.g., macrophages, but causes 'negative regulation of interferon gamma biosynthesis' in chicken splenocytes.

TODO: need help refining this use case. It was decided that splenocytes were not a great example

Subcellular localisation (CC) within a specific type of cell (CL)

  • Toll-like receptor 4 (TLR4) (O00206) is located intracellularly in the perinuclear region (GO:0048471) only in immature DC, PMID:15027902
  • TLR4 is located on the cell surface (GO:0005887) in monocytes, PMID:15027902

Evelyn's comments: So protein localisation is cell type specific and for immune gene GO annotation I think we need to be able to capture this.

Another example:

We want to annotate "localised to nucleus of spermatocyte"

Note that we have some pre-coordinated CC-CL terms in GO. See XP:cellular_component_xp_cell

Example from MGI: TODO

Regulation of expression and specific gene products

The GO will never pre-coordinate terms such as:

  • regulation of oskar mRNA translation
  • regulation of oskar mRNA transcription

But it is perfectly appropriate to post-compose such terms at annotation time.

The GO term used would be "regulation of transcription/translation"

The properties column would contain an ID for oskar or oskar mRNA. Technically it should be

  • a gene ID for "regulation of gene expression"
  • a transcript ID for "regulation of transcription"
  • a protein ID for "regulation of translation"

However, this can often be difficult. We can relax this so long as we are clear on what it means to provide a gene ID for "regulation of translation"

Ruth: example of protein ID in Column 16:

PMID:9368760 From the experiment summarized by 'In vitro, expressed PDPK1 (PDK1) O15530 phosphorylated Thr308 of AKT1 (PKB alpha) P31749', the following annotations could be made using column 16:

PDPK1 (PDK1) O15530 GO:0018107 peptidyl-threonine phosphorylation IDA PMID:9368760 column 16: PKBalpha/AKT1 P31749
PDPK1 (PDK1) O15530 GO:0032148 activation of protein kinase B activity IDA PMID:9368760 column 16: PKBalpha/AKT1 P31749
PDPK1 (PDK1) O15530 GO:0004674 protein serine/threonine kinase activity IDA PMID:9368760 column 16: PKBalpha/AKT1 P31749

Additional examples of GO annotation protein targets in column 16:

For Molecular Function annotations:

  • P01023 GO:0004867 serine-type endopeptidase inhibitor activity IDA PMID:12538697 column16:P48740
  • P01023 GO:0004867 serine-type endopeptidase inhibitor activity IDA PMID:12538697 column16:O00187
  • Q9BRA2 GO:0047134 protein-disulfide reductase activity IDA PMID:1859519 column16:P63167
  • Q13535 GO:0004672 protein kinase activity IDA PMID:14657349 column16:Q14683

For Biological Process annotations:

  • P31749 GO:0006469 negative regulation of protein kinase activity IMP PMID:9373175 column16:P49841
  • Q92574 GO:0031397 negative regulation of ubiquitination IDA PMID: 11175345 column16:P49815
  • Q8K4B2 GO:0043407 negative regulation of MAP kinase activity IMP PMID:17379480 column16:P47811

But would we include information in Column 16 for function and process terms?

Also the above in vitro experiment provides very good evidence for function and process terms, but would column 16 be completed for less direct experiment evidence, eg:

PMID:9373175 co-expression of AKT1 (AKT/PKB alpha) P31749 with GSK3B (GSK3beta) P49841 in human 293 cells leads to the inactivation of GSK3B. This effect is also seen with transfection with PDK1 and GSK3B.

Could this be interpreted as

  • AKT1 P31749 GO:0006469 negative regulation of protein kinase activity IMP PMID:9373175 column 16: GSK3B P49841
  • PDPK1 (PDK1) O15530 GO:0006469 negative regulation of protein kinase activity IMP PMID:9373175 column 16: GSK3B P49841

Put another way: will column 16 be limited to cases where there is evidence of a direct interaction between two proteins? Or will it be used more generally when a protein is part of a cascade that affects many proteins, in which case a large number of proteins will probably end up in column 16?

Would it be possible to pipe together multiple accessions which are 'targets' of GO annotation into column16?

Binding

https://sourceforge.net/tracker2/?func=detail&aid=2175326&group_id=36855&atid=440764

Response to drug (BP + CHEBI)

See tracker item discussion.

We don't want to make children of "response to drug" as this would violate the TP rule ("drugs" do not always play the role of drugs). Instead we would like to indicate when the response to chemical X is a drug-response at annotation time

Linking together annotations

Question from Emily:

"In addition, would this column be the place to specifically link together annotations from the different GO vocabularies? For instance if you had say, four annotations for protein X which had been annotated to: 'regulation of transcription', 'protein stabilization', 'cytoplasm' and 'nucleus' - a curator might want to link the 'regulation of transcription' process annotation specifically with the cellular component 'nucleus'."

The two options here are:

  1. group the annotations together somehow, perhaps using a grouping ID.
  2. redundantly indicate the localisation information

In the second scenario, there would be a normal-looking annotation to 'nucleus' with nothing in the properties column. There would also be an annotation to 'regulation of transcription', and this would have 'nucleus' in the properties column.

== Guidelines ==

=== Constraints on relations ===

To help ensure the correct relations are used in the correct circumstances, we provide this table:

{| class="wikitable"
! Relation !! Col 5 (core term) !! Col 16
|-
| occurs_in || BP || CC or CL or gross anatomy term
|-
| part_of || CC || CC or CL or gross anatomy term
|-
| part_of || MF || MF or BP
|-
| part_of || BP || BP
|-
| has_participant || BP or MF || CC or CL or gross anatomy or CHEBI or (typically) gene product
|-
| has_input || BP or MF || CC or CL or gross anatomy or CHEBI or (typically) gene product
|-
| has_output || BP or MF || CC or CL or gross anatomy or CHEBI or (typically) gene product
|-
| results_in_transport_of || BP or MF || CC or CHEBI or (typically) gene product
|-
| response_to || BP or MF || CL or CHEBI or (typically) gene product
|}
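
The constraint table above lends itself to a simple lookup. The sketch below is one possible encoding, not an official validator: the names `CONSTRAINTS` and `is_allowed`, and the short labels for target kinds (`"anatomy"`, `"gene_product"`), are assumptions made for illustration.

```python
# Targets shared by the participant-style relations in the table.
ANY_PARTICIPANT = {"CC", "CL", "anatomy", "CHEBI", "gene_product"}

# relation -> list of (allowed core-term aspects, allowed col 16 target kinds)
CONSTRAINTS = {
    "occurs_in": [({"BP"}, {"CC", "CL", "anatomy"})],
    "part_of": [
        ({"CC"}, {"CC", "CL", "anatomy"}),
        ({"MF"}, {"MF", "BP"}),
        ({"BP"}, {"BP"}),
    ],
    "has_participant": [({"BP", "MF"}, ANY_PARTICIPANT)],
    "has_input": [({"BP", "MF"}, ANY_PARTICIPANT)],
    "has_output": [({"BP", "MF"}, ANY_PARTICIPANT)],
    "results_in_transport_of": [({"BP", "MF"}, {"CC", "CHEBI", "gene_product"})],
    "response_to": [({"BP", "MF"}, {"CL", "CHEBI", "gene_product"})],
}


def is_allowed(relation, core_aspect, target_kind):
    """True if the table permits this relation between core term and target."""
    return any(
        core_aspect in cores and target_kind in targets
        for cores, targets in CONSTRAINTS.get(relation, [])
    )


print(is_allowed("occurs_in", "BP", "CL"))
print(is_allowed("part_of", "BP", "CC"))
```

Unknown relations are rejected rather than raising an error, which keeps the check usable as a filter over whole annotation files.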


For example, part_of would not be used between a process and a component. It *could* be used in a CC annotation to note the cell type, e.g. spermatocyte:

 col 16: part_of(CL:0000017)

Here CL:0000017 is the CL ID for "spermatocyte". If the GO term in the annotation was for "nucleus", then the overall meaning of the annotation would be "a nucleus that is in a spermatocyte".

== Additional Example (obsolete) ==

THIS SECTION DEPRECATED - CURRENTLY REARRANGING EXAMPLES

The following examples are expressed as pseudo-GAFs. We omit some columns for brevity. (note that the parts after the ! would not be in the actual file, we are just including them here to make the examples readable!)


BP-MF Example

Here is gene 1234 that executes GTPase activity as part of an intracellular signaling cascade

Gene (col 2/3) Term (col 5) Ref (col 6) Ext (col 16)
Gene1234 GO:0003924 ! GTPase activity PMID:nnnn part_of(GO:0007242) ! intracellular signaling cascade
Gene1234 GO:0007242 ! intracellular signaling cascade PMID:nnnn (empty)

CC-CL Example

Here is an imaginary gene localized to the mitochondrial membrane in a spermatocyte:

Gene (col 2/3) Term (col 5) Ref (col 6) Ext (col 16)
Gene1234 GO:0031966 ! mitochondrial membrane PMID:nnnn part_of(CL:0000017) ! spermatocyte

BP x CC

  • Gene1234 has a gene product that is involved in plastid translational elongation

At the time of writing this term is not declared in GO. Here we use the occurs_in relation:

Gene (col 2/3) Term (col 5) Ref (col 6) Ext (col 16)
gene1234 GO:0006414 ! translational elongation PMID:nnnn occurs_in(GO:0009536) ! plastid

gene1234 GO:0009536 ! plastid PMID:nnnn (empty)

Why, you might ask, can we not just co-annotate to

  • GO:0032544 ! plastid translation
  • GO:0006414 ! translational elongation

The answer is that co-annotation carries less information. Computationally we have no way of knowing these two processes are linked. See the FAQ

BP x anatomy example

Example of a gene product executing its function in a particular location. Here we use the occurs_in relation:

Gene (col 2/3) Term (col 5) Ref (col 6) Ext (col 16)
CREB GO:0006094 ! gluconeogenesis PMID:nnnn occurs_in(MA:0000358) ! liver

binding example

See 2175326

Also: http://gowiki.tamu.edu/wiki/index.php/RefGenome_Electronic_Jamboree_2008-10_PFKL

E coli pfkA has a function in PEP binding

DB (col 1) GeneID (col 2) Gene Symbol (col 3) Term (col 5) Ref (col 6) Ext (col 16)
UniProt P0A796 pfkA GO:0042301 ! phosphate binding PMID:17307338 has_input(CHEBI:44897) ! phosphoenolpyruvic acid

Important points:

  • The most specific available pre-coordinated term goes in Col 5 (i.e. phosphate binding, not binding). This ensures that searches for phosphate binding work in the absence of a reasoner
    • Note that we used GO:0005488 binding, not phosphate binding. Not sure I understand why one would use the latter --JimHu 07:40, 27 March 2009 (PDT)
  • It's not clear which CHEBI term to use: CHEBI:44897 or CHEBI:18021 (phosphoenolpyruvate)?
    • I chose the former in this example simply because it has an is_a parent. CHEBI terms without is_a parents should NOT be used. This is because we need the is_a parent to figure out the correct parentage in GO
    • See the thread on the GO list for further discussion

TLR example

  • Toll-like receptor 4 (TLR4) (O00206) is located intracellularly in the perinuclear region (GO:0048471) only in immature DC, PMID:15027902
  • TLR4 is located on the cell surface (GO:0005887) in monocytes, PMID:15027902

In this example, one of the CL terms is not present, so the GO annotator would make a request on the CL tracker (for a list of trackers, see the front page of http://obofoundry.org)

Gene (col 2/3) Term (col 5) Ref (col 6) Ext (col 16)
TLR4 O00206 perinuclear region (GO:0048471) PMID:15027902 part_of(CL:new) ! immature dendritic cell
TLR4 O00206 cell surface (GO:0005887) PMID:15027902 part_of(CL:0000576) ! monocyte

Multiple localizations example

What if the publication describes separate observations - perhaps one for bipolar neuron and one for Purkinje cell?

We can separate these using the pipe symbol |. This is equivalent to splitting the annotation over two lines. For example:


Gene (col 2/3) Term (col 5) Ref (col 6) Ext (col 16)
Gene1234 GO:0031966 ! mitochondrial membrane PMID:nnnn part_of(CL:0000121) PIPE part_of(CL:0000103) ! Purkinje cell & bipolar neuron


(I can't figure out how to include a pipe in a wiki table so I just wrote PIPE!)

The "|" separator indicates that this is a separate localization of a different instance of this gene product.

Remember that the CL term names would not be in the GAF; they are included here to make the examples readable.

What if we want to annotate two separate observations of the same subcellular localization - one from an astrocyte of the hippocampus, the other from a B cell in the lymph?

We use the "," to indicate an additional extension for the same observation. So the above would be:

Gene (col 2/3) Term (col 5) Ref (col 6) Ext (col 16)
Gene1234 GO:0031966 ! mitochondrial membrane PMID:nnnn part_of(CL:0000127),part_of(MA:0000953) PIPE part_of(CL:0000236),part_of(MA:0002520) ! one from an astrocyte of the hippocampus, the other from a B cell in the lymph

This would be equivalent to two annotations

Gene (col 2/3) Term (col 5) Ref (col 6) Ext (col 16)
Gene1234 GO:0031966 ! mitochondrial membrane PMID:nnnn part_of(CL:0000127),part_of(MA:0000953) ! astrocyte of the hippocampus
Gene1234 GO:0031966 ! mitochondrial membrane PMID:nnnn part_of(CL:0000236),part_of(MA:0002520) ! a B cell in the lymph
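
The equivalence between one piped row and several separate rows can be sketched as follows. This is a minimal illustration, assuming annotations are plain (gene, term, reference, extension) tuples and that col 16 contains no nested expressions; the function names `split_extension` and `expand_rows` are hypothetical.

```python
def split_extension(ext):
    """Split col 16: '|' separates independent annotations,
    ',' joins extensions that belong to the same annotation."""
    return [group.split(",") for group in ext.split("|") if group]


def expand_rows(gene, term, ref, ext):
    """Turn one row with a piped col 16 into one row per '|' group."""
    groups = split_extension(ext)
    if not groups:
        # No extension at all: the row is returned unchanged.
        return [(gene, term, ref, "")]
    return [(gene, term, ref, ",".join(group)) for group in groups]


rows = expand_rows(
    "Gene1234",
    "GO:0031966",
    "PMID:nnnn",
    "part_of(CL:0000127),part_of(MA:0000953)|part_of(CL:0000236),part_of(MA:0002520)",
)
for row in rows:
    print("\t".join(row))
```

Each resulting row keeps its comma-joined extensions together, so the cell type and anatomy term of one observation are never mixed with those of the other.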

Here is another, real life example from MGI:


Gene (col 2/3) Term (col 5) Ref (col 6) Ext (col 16)
MGI:1919277 Slc39a4 GO:0016324 ! apical plasma membrane PMID:nnnn part_of(MA:0000337),part_of(CL:0000584) ! enterocyte of small intestine
MGI:1919277 Slc39a4 GO:0016324 ! apical plasma membrane PMID:nnnn part_of(EMAP:6894),part_of(CL:0000223) ! endodermal cell of TS22\,extraembryonic component

Response to drug

E.g. "response to cocaine".

Option 1:

Gene (col 2/3) Term (col 5) Ref (col 6) Ext (col 16)
moody (FBgn0025631) GO:0042220 ! response to cocaine PMID:nnnn response_to(CHEBI:23888) ! drug

Here we need a new relation, "response_to"


Transport

See: http://mcb.asm.org/cgi/content/full/22/4/1266?view=long&pmid=11809816

Gene (col 2/3) Term (col 5) Ref (col 6) Ext (col 16)
Ipo11 (MGI:nnnn) ribosomal import into nucleus PMID:11809816 results_in_transport_of(Uniprot:nnnnnn) ! rpL12

== Implementation Plan ==

  1. test annotation files will be made available to Berkeley (contributors: MGI, GOA, Dicty...?) with col16 populated
  2. Berkeley will populate a test database (Seth)
  3. toy version of AmiGO with CL IDs queryable
  4. change schema of production db
  5. officially add spec for col16
  6. annotation contributors start adding columns
  7. CL populated and queryable in public amigo
  8. Extend scheme to other OBO ontologies

The toy version of AmiGO should be ready by the GO meeting.

== Database Implementation ==

See SWUG:Database


== FAQ ==

=== Will this replace existing combinatorial GO terms like "B cell differentiation"? ===

No! It is important to keep terms like this pre-coordinated in the GO.

=== When do I request a new term and when do I use the annotation xp column? ===

Request a new term if it seems like a sensible new term to have in GO. A combinatorial term in GO is fine if it corresponds to a commonly used scientific term and the combination is not completely arbitrary or accidental.

For more on this important issue, and a discussion of when to pre-compose and when to compose at annotation time, see this thread on the GO list from March 2009: http://fafner.stanford.edu/pipermail/go/2009-March/016501.html

=== How will this column be used by tools? ===

Tools and databases do not have to use col 16. If they elect not to use it, they are no worse off than prior to the introduction of column 16. It is an optional extension.

However, we do recommend that tools start using it in order to provide more accurate results and queries ASAP. For example, using the annotation XP column it may be possible to get more sensitive term enrichment results.

=== What happens when new specific GO terms corresponding to the annotation XPs are added? ===

Let's say annotator A wishes to annotate to "plastid translational elongation", but there is no such term in GO, because it is (for example) deemed to be not sufficiently different from generic translational elongation.

They should then annotate to "translational elongation" and also put "occurs_in(plastid)" in col 16.

Then let's say that later on we discover that "plastid translational elongation" does belong in GO after all (because of policy changes or because we discover something about the biology), so the term gets added.

Crucially, the annotator need do nothing. Their annotation can be automatically mapped forward, once an entry for "plastid translational elongation" is added to XP:biological_process_xp_cellular_component
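
The forward mapping could work roughly as sketched below. This is an illustration only: a plain dictionary stands in for XP:biological_process_xp_cellular_component, the ID `GO:9999999` is a made-up placeholder for the not-yet-existing term, the function name `map_forward` is hypothetical, and only a single relation(term) extension is handled.

```python
import re

# Matches exactly one relation(term) property, with no nesting.
SINGLE_PROPERTY = re.compile(r"^(\w+)\(([^()]+)\)$")

# Stand-in for the cross-product definitions file:
# (core term, relation, extension term) -> pre-composed term.
XP_DEFINITIONS = {
    # GO:9999999 is a hypothetical ID for "plastid translational elongation".
    ("GO:0006414", "occurs_in", "GO:0009536"): "GO:9999999",
}


def map_forward(term, ext):
    """Replace (term, ext) by the pre-composed term when one is defined."""
    match = SINGLE_PROPERTY.match(ext)
    if match:
        key = (term, match.group(1), match.group(2))
        if key in XP_DEFINITIONS:
            # The extension is now captured by the new term itself.
            return XP_DEFINITIONS[key], ""
    return term, ext


print(map_forward("GO:0006414", "occurs_in(GO:0009536)"))
print(map_forward("GO:0006414", "occurs_in(CL:0000121)"))
```

Annotations whose extension has no matching cross-product definition pass through unchanged, so the mapping can be rerun safely whenever new terms are added.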

=== Why allow GO IDs in col 16? Can I just co-annotate instead? ===

Information now available here

Co-annotation is not sufficient. Important information is lost. For example, if a gene has 4 annotations to

  • mitochondrion
  • nucleus
  • translation
  • transport
We have no way of knowing whether the gene is involved in
  • nuclear translation vs mt translation (or both)
  • transport within, to or from cytoplasm or nucleus

== Appendix ==

=== Grammar for col 16 ===

This is specified as a BNF grammar. This is necessary to keep the field extensible enough for future use. Note that the column is optional, so there is no requirement for people to parse it. It is an 'added bonus' column.

 PropertiesSet := Properties | Properties "|" PropertiesSet
 Properties := Property | Property ',' Properties
 Property := Relation '(' Term ')'
 Term := ID
 Relation := Relation-Abbrev | ID
 ID := ID-Space ':' Local-ID
 ID-Space := XML-NMToken
 Local-ID := chars
 Relation-Abbrev := chars

Relations can be abbreviated; eg part_of can be used in place of OBO_REL:part_of
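
A parser for the non-nested part of this grammar can be quite small. The following is a minimal sketch rather than a reference implementation: the name `parse_properties_set` and the regular expression are assumptions, identifiers are only checked for the ID-Space ':' Local-ID shape, and the nested '^' form is not handled.

```python
import re

# Property := Relation '(' Term ')', where Term is ID-Space ':' Local-ID.
# The relation may be an abbreviation (part_of) or a full ID (OBO_REL:part_of).
PROPERTY = re.compile(
    r"^(?P<relation>[^()|,\s]+)\((?P<term>[^()|,\s:]+:[^()|,\s]+)\)$"
)


def parse_properties_set(text):
    """Parse col 16 into a list of groups, each a list of (relation, term)."""
    result = []
    for properties in text.split("|"):      # PropertiesSet
        group = []
        for prop in properties.split(","):  # Properties
            match = PROPERTY.match(prop.strip())
            if match is None:
                raise ValueError(f"not a valid Property: {prop!r}")
            group.append((match.group("relation"), match.group("term")))
        result.append(group)
    return result


print(parse_properties_set("part_of(MA:0000198)|part_of(MA:0000216)"))
print(parse_properties_set("OBO_REL:part_of(CL:0000017)"))
```

Malformed input, such as a term without an ID-Space prefix, raises `ValueError` instead of being silently skipped, so problems in a file surface early.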

This can be extended to allow for nested expressions:

 Term := ID | ID '^' Properties


Meetings