Relationships between annotation objects and ontology terms

From GO Wiki


 TO REVIEW

[gp] [relationship] [GO term]

The relationship between an annotation object (gp) and the GO term it is annotated to should be made explicit.

As this relationship is currently implicit, annotation groups and users have interpreted it in slightly different ways (for instance, does a CC annotation mean that the gp is simply located in the associated location, or that it is active there?). Therefore, to apply relationships automatically to all existing GO annotations, we would need to use the most conservative values: [gp] part_of [cc], participates_in [bp] and actively_participates_in [mf]. These default relationships should be correct in all cases. Groups are welcome to subsequently apply more descriptive relationships to sets of annotations where they are able.
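As a minimal sketch of this retrofit, assuming GAF-style rows where column 4 holds '|'-separated values and column 9 holds the aspect (C, F or P), the conservative defaults could be applied as follows. The table and function names are hypothetical and not part of any existing GOC tool.

```python
# Hypothetical sketch: give every annotation an explicit relationship,
# using the conservative defaults proposed on this page.

# Default relationship per GAF aspect (column 9: C, F or P).
DEFAULT_RELATION = {
    "C": "part_of",
    "F": "actively_participates_in",
    "P": "participates_in",
}


def retrofit_qualifier(qualifier: str, aspect: str) -> str:
    """Return a column 4 value that always names a relationship.

    Assumes column 4 values are separated by '|' and that 'NOT' is the only
    existing value that is not itself a relationship; colocalizes_with and
    contributes_to are kept as relationships in their own right.
    """
    values = [v.strip() for v in qualifier.split("|") if v.strip()]
    negated = "NOT" in values
    relations = [v for v in values if v != "NOT"]
    if not relations:
        # Unqualified (or NOT-only) annotation: fall back to the default.
        relations = [DEFAULT_RELATION[aspect]]
    return "|".join((["NOT"] if negated else []) + relations)
```

An empty column 4 on a C annotation would become part_of, and a bare NOT on a P annotation would become NOT|participates_in, while existing contributes_to and colocalizes_with values are left untouched.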

Making the relationship always explicit would improve users' interpretation of GO annotations. It would also allow us to develop a set of more descriptive relationships that carry more information in an annotation and help with existing ontology issues, such as the separate host cell CC terms and the membrane topology terms discussed below.

Relationships could be added into the qualifier field (column 4). The existing qualifier 'NOT' could be piped in front of the chosen relationship (e.g. NOT|part_of, NOT|participates_in), while the other current qualifiers, colocalizes_with and contributes_to, would become relationships in their own right.
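A minimal sketch, assuming at most one relationship per annotation, of how an annotation tool might read and write a column 4 value that combines the NOT qualifier with a relationship. The Qualifier class is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Qualifier:
    """A parsed column 4 value: optional negation plus one relationship."""

    negated: bool
    relation: Optional[str]  # None if column 4 names no relationship yet

    @classmethod
    def parse(cls, column4: str) -> "Qualifier":
        # Split on the pipe and ignore surrounding whitespace,
        # so "NOT | part_of" and "NOT|part_of" parse the same way.
        values = [v.strip() for v in column4.split("|") if v.strip()]
        relations = [v for v in values if v != "NOT"]
        if len(relations) > 1:
            raise ValueError(f"expected one relationship, got {relations}")
        return cls(negated="NOT" in values,
                   relation=relations[0] if relations else None)

    def to_column4(self) -> str:
        # NOT always goes in front of the relationship.
        parts = ["NOT"] if self.negated else []
        if self.relation is not None:
            parts.append(self.relation)
        return "|".join(parts)
```

Keeping negation as a separate flag means NOT never has to be treated as a relationship itself, which matches the idea of piping it in front of the chosen relationship.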

Greater use of the qualifier field would make its values more visible: currently many users ignore column 4 because it is so often empty. In addition, using column 4 would mean that annotation databases would not need to add a further field to their annotation tools and displays, which would be time-consuming for some groups.

Groups could choose the level at which to use these relationships: some groups might decide to apply only the default relationships, while others might use a limited range of more descriptive relationships that the GOC agrees upon.

Relationships would be fully defined, including their appropriate scope of usage, and made available here:

http://www.geneontology.org/scratch/xps/go_annotation_extension_relations.obo
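Assuming the relations file uses standard OBO [Typedef] stanzas, an annotation tool could load it to check that a column 4 value names a defined relationship. This is a deliberately small parser for illustration only; the sample stanza is invented, and a real tool would more likely use an existing OBO library.

```python
from typing import Dict, List


def parse_typedefs(obo_text: str) -> Dict[str, Dict[str, List[str]]]:
    """Collect [Typedef] stanzas from OBO-format text, keyed by their id."""
    typedefs: Dict[str, Dict[str, List[str]]] = {}
    current = None  # tag/value pairs of the stanza being read, if a Typedef
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Typedef] stanzas are kept.
            current = {} if line == "[Typedef]" else None
            continue
        if current is None or not line or line.startswith("!"):
            continue
        tag, sep, value = line.partition(":")
        if not sep:
            continue
        tag, value = tag.strip(), value.strip()
        current.setdefault(tag, []).append(value)
        if tag == "id":
            typedefs[value] = current
    return typedefs


# Hypothetical stanzas for illustration only, not copied from the real file.
SAMPLE = """\
[Typedef]
id: located_in_host
name: located in host

[Term]
id: GO:0005634
name: nucleus
"""
```

A check such as `"located_in_host" in parse_typedefs(SAMPLE)` could then reject relationships that have not been defined.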

Cellular Component terms

Default: [gp] part_of [GO cc term]

- automatically retrofitted to all non-qualified cellular component annotations.

Other possible relationships:

[gp] colocalizes_with [GO cc term] - relationship currently exists as a qualifier.

[gp] active_in [GO cc term]


[gp] transported_by [GO cc term]


[gp] posttranslationally_modified_in [GO cc term]


Advantages:

- allow curators to indicate more precisely the relationship between the annotated gp and GO term.

- Many users assume that annotated GO CC terms indicate where the gp is active. This is not always the case - especially if a curated paper only supplies subcellular location information and does not indicate any link between function and gp location.

[gp] located_in_other_organism [GO cc term]

[gp] located_in_host [GO cc term]

Advantages:

- Using the above two relationships would remove the need for separate host cell and cell CC terms. Why have the terms 'host cell nucleus' and 'nucleus' separately when they describe the same entity, only from the differing perspectives of the annotated gene products?

- If users were to review the composition of a virally-infected human nucleus, they would expect both the viral and the human proteins to be annotated to the same location. The relationship that an annotation object has to a GO term seems better placed in the annotation than in the ontology.

- Use of such qualifiers would mean that many of the CC terms under 'GO:0043657; host cell' could be merged into the corresponding cell child terms, with the annotations automatically updated to apply the appropriate relationship based on the GO term originally used. For example, if a gp is annotated to 'GO:0042025; host cell nucleus', the annotation would be updated to located_in_host + 'GO:0005634; nucleus'.

- this would help correct annotations from the Reactome, UniProtKB and PAMGO groups. Groups that do not annotate pathogens or venoms would not need to change any annotations. As only a limited number of groups and files would be affected, this could be a second step, carried out after the default relationships have been added to all files.

- the inference pipeline that uses interontology relationships between BP and CC terms could be further developed, so that a multi-organism BP annotation would infer a CC annotation with the correct [gp] to GO term qualifier.
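The term merge described in the points above could be driven by a lookup from each 'host cell' term to the cell term it is merged into. A minimal sketch containing only the one pair given above; a real table would cover the relevant terms under 'GO:0043657; host cell', and the names here are hypothetical.

```python
from typing import Tuple

# Hypothetical lookup from 'host cell' CC terms to the cell terms they would
# be merged into; only the pair given on this page is filled in here.
HOST_TO_CELL_TERM = {
    "GO:0042025": "GO:0005634",  # host cell nucleus -> nucleus
}


def merge_host_term(relation: str, go_id: str) -> Tuple[str, str]:
    """Return (relation, go_id) after merging 'host cell' CC terms."""
    cell_term = HOST_TO_CELL_TERM.get(go_id)
    if cell_term is None:
        return relation, go_id  # not a host cell term: unchanged
    return "located_in_host", cell_term
```

After this rewrite, a viral protein formerly annotated to host cell nucleus and a human protein annotated to nucleus share GO:0005634 and differ only in their relationship.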


[gp] member_of [GO cc protein complex term]

[gp] intrinsic_to [GO cc term]

[gp] extrinsic_to [GO cc term]

[gp] spans [GO cc term]

[gp] partially_spans (?) [GO cc term]

Advantages:

- such qualifiers would remove membrane topology information from the GO and move it into the annotation, simplifying the CC ontology.

- again, these qualifiers could be retrofitted automatically based on the GO term originally applied in an annotation.
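As an illustration of this automatic retrofit, assuming CC term labels of the form 'intrinsic to X' or 'extrinsic to X', the topology could be moved from the term into the relationship as below. Working on labels is only for readability; a real retrofit would work from term IDs and the ontology structure, and spans or partially_spans would need their own rules. The function name is hypothetical.

```python
import re
from typing import Tuple

# Assumed label convention: "intrinsic to <location>" / "extrinsic to <location>".
TOPOLOGY_LABEL = re.compile(r"^(intrinsic|extrinsic) to (.+)$")


def split_topology(relation: str, term_label: str) -> Tuple[str, str]:
    """Move membrane topology from the CC term label into the relationship."""
    match = TOPOLOGY_LABEL.match(term_label)
    if match is None:
        return relation, term_label  # no topology in the label: unchanged
    topology, location = match.groups()
    return f"{topology}_to", location
```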

Molecular Function

Default: [gp] 'actively_participates_in' [GO MF term]

- automatically retrofitted to all non-qualified molecular function annotations.

[gp] 'contributes_to' [GO MF term] - relationship currently exists as a qualifier.

[gp] functions_in_other_organism [GO MF term]


[gp] functions_in_host [GO MF term]

Advantages:

- Currently, MF annotations do not indicate any multi-organism aspect to their activity.

Example: m1-Toxin1 from green mamba venom binds to human M1 muscarinic receptors. It competes with the antagonist NMS, slows the dissociation of NMS when the antagonist has been applied beforehand, and prevents the action of muscarinic agonists (PMID:1562434). This interaction has been annotated to GO:0048019 receptor antagonist activity.

As the curator had not annotated a BP term, the GOC pipeline inferred one from the annotation to GO:0048019: 'GO:0009968 negative regulation of signal transduction'.

However, ideally the inferred annotation would have been 'GO:NEW; negative regulation of signal transduction in other organism'.

If the curator had added the 'functions_in_other_organism' qualifier (or equivalent), the GOC MF->BP inference pipeline could have suggested a more appropriate BP term.
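A minimal sketch of how an MF->BP inference step could take the relationship into account, using only the terms from the m1-Toxin1 example. The table layout and function name are hypothetical and not the actual GOC pipeline; GO:NEW is kept as the placeholder for the term that does not yet exist.

```python
from typing import Dict, Optional

# Hypothetical MF -> BP inference table containing only the terms from the
# m1-Toxin1 example; "GO:NEW" is the placeholder for a term not yet created.
MF_TO_BP: Dict[str, Dict[str, str]] = {
    "GO:0048019": {  # receptor antagonist activity
        "default": "GO:0009968",  # negative regulation of signal transduction
        "other_organism": "GO:NEW",  # ... in other organism
    },
}


def infer_bp(mf_id: str, relation: str) -> Optional[str]:
    """Suggest a BP term for an MF annotation, taking its relationship into account."""
    targets = MF_TO_BP.get(mf_id)
    if targets is None:
        return None  # no inference rule for this MF term
    if relation == "functions_in_other_organism":
        return targets["other_organism"]
    return targets["default"]
```

With the default actively_participates_in relationship the pipeline would still suggest GO:0009968; only the more descriptive relationship switches it to the multi-organism term.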

[gp] substrate_of [GO MF term]

- currently the targets of an activity can only be recorded in the 'with' or annotation extension fields (columns 8 and 16) of an annotation to the gene product carrying out the activity, so the curator needs to know which gene product that is. Allowing the target itself to be the annotated object (column 2) would help curators capture this information even when the acting gene product is uncertain.

Example: PMID:10085113 describes the caspase cleavage site in Atrophin-1, indicating that it is a target of executioner caspases and involved in the execution phase of apoptosis. Although the paper uses caspase 3 to demonstrate that this protein is a caspase substrate, Atrophin-1 is likely to be the target of other executioner caspases as well.

This information could be captured as:

Atrophin-1 [substrate_of] GO:0004197 cysteine-type endopeptidase activity
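To show where the relationship would sit, here is a sketch that builds a GAF 2.0-style line (17 tab-separated columns) for the Atrophin-1 example, with the substrate as the annotated object in column 2 and substrate_of in column 4. The object ID is a hypothetical placeholder, and columns not discussed here, including the evidence code, are left empty rather than guessed.

```python
from typing import Dict

GAF_COLUMN_COUNT = 17  # GAF 2.0 annotation lines have 17 tab-separated columns


def gaf_line(columns: Dict[int, str]) -> str:
    """Build a tab-separated GAF line from {1-based column number: value}."""
    row = [""] * GAF_COLUMN_COUNT
    for number, value in columns.items():
        row[number - 1] = value
    return "\t".join(row)


# Illustrative values only: required columns not discussed on this page
# (e.g. DB, evidence code) are deliberately left empty.
atrophin_line = gaf_line({
    2: "ATROPHIN1_ID",   # hypothetical placeholder: the substrate is the annotated object
    4: "substrate_of",   # proposed relationship in the qualifier column
    5: "GO:0004197",     # cysteine-type endopeptidase activity
    6: "PMID:10085113",
    9: "F",              # aspect: molecular function
    10: "Atrophin-1",    # DB object name
})
```

Under the current approach, Atrophin-1 would instead appear in column 8 or 16 of an annotation whose column 2 names a specific caspase.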

Biological Process

Default: [gp] participates_in [GO bp term]

- automatically retrofitted to all non-qualified biological process annotations.

Relationships used by other GOC groups

TAIR

 Count  Relationship                      Aspect
    76  colocalizes with                  C
   966  constituent of                    C
    15  contributes to                    F
    63  expressed during                  P
   172  expressed in                      C
     1  expressed only during             C
    13  expressed only during             P
   858  functions as                      F
 16354  functions in                      F
     1  has                               C
 47277  has                               F
    61  has                               P
  1569  has protein modification of type  P
     4  involved in                       F
 98344  involved in                       P
   127  is downregulated by               P
   796  is subunit of                     C
 58973  located in                        C
     1  member of                         C
     5  regulates                         C
   547  related to                        P
    16  represses                         P
   140  required for                      P
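To compare TAIR's usage with the defaults proposed above, the table can be tallied per aspect. A small sketch using the table's numbers as given; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (count, relationship, aspect) rows copied from the TAIR table above.
TAIR_ROWS: List[Tuple[int, str, str]] = [
    (76, "colocalizes with", "C"),
    (966, "constituent of", "C"),
    (15, "contributes to", "F"),
    (63, "expressed during", "P"),
    (172, "expressed in", "C"),
    (1, "expressed only during", "C"),
    (13, "expressed only during", "P"),
    (858, "functions as", "F"),
    (16354, "functions in", "F"),
    (1, "has", "C"),
    (47277, "has", "F"),
    (61, "has", "P"),
    (1569, "has protein modification of type", "P"),
    (4, "involved in", "F"),
    (98344, "involved in", "P"),
    (127, "is downregulated by", "P"),
    (796, "is subunit of", "C"),
    (58973, "located in", "C"),
    (1, "member of", "C"),
    (5, "regulates", "C"),
    (547, "related to", "P"),
    (16, "represses", "P"),
    (140, "required for", "P"),
]


def totals_by_aspect(rows: List[Tuple[int, str, str]]) -> Counter:
    """Sum the counts for each GO aspect (C, F, P)."""
    totals: Counter = Counter()
    for count, _relation, aspect in rows:
        totals[aspect] += count
    return totals


def most_used_by_aspect(rows: List[Tuple[int, str, str]]) -> Dict[str, str]:
    """Return the most frequently used relationship for each aspect."""
    best: Dict[str, Tuple[int, str]] = {}
    for count, relation, aspect in rows:
        if aspect not in best or count > best[aspect][0]:
            best[aspect] = (count, relation)
    return {aspect: relation for aspect, (_count, relation) in best.items()}
```

For this table, the most-used relationships are located in (C), has (F) and involved in (P).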


Additional Qualifiers