Annotation Extension: Capturing participants

From GO Wiki

N.B. This page is under construction!

Introduction

Allowable relations for participant annotation extensions

has_participant

Usage notes

1. The types of participants that can be captured in the annotation_extension column (Col 16) include gene products (using a UniProtKB ID or a MOD gene product ID, such as FBtr or FBpp for FlyBase), chemicals (using a ChEBI ID) and complexes (using a complex ID, e.g. GO:protein complex or a child term).

2. If the participant is a protein, always use a protein ID rather than a gene ID as a proxy. Similarly, if the target is a gene (e.g. in transcription), use a gene ID.

3. Two annotations cannot differ solely in the contents of the annotation_extension column, because processing this field will be optional for users. All of the information should therefore be added to one annotation line, with separate statements in the annotation_extension column separated by pipes (|).
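
As an illustration of note 3, the following Python sketch collapses annotation rows that differ only in their extension into a single row with pipe-separated statements. The row layout (a tuple of the other column values plus an extension string) and the function name merge_extensions are assumptions made for this example, not part of any GO tool. The accessions come from the P01023 examples later on this page; wrapping them in has_participant(...) is likewise an assumption.

```python
def merge_extensions(rows):
    """Collapse rows that differ only in the extension into one row each.

    Each row is a (core, extension) pair, where `core` is a tuple holding
    all the other column values. This layout is hypothetical.
    """
    merged = {}  # dicts preserve insertion order in Python 3.7+
    for core, extension in rows:
        statements = merged.setdefault(core, [])
        # Skip empty extensions and exact duplicates.
        if extension and extension not in statements:
            statements.append(extension)
    # Separate statements are joined with pipes, as described in note 3.
    return [(core, "|".join(statements)) for core, statements in merged.items()]


rows = [
    (("P01023", "GO:0004867", "PMID:12538697", "IDA"),
     "has_participant(UniProtKB:P48740)"),
    (("P01023", "GO:0004867", "PMID:12538697", "IDA"),
     "has_participant(UniProtKB:O00187)"),
]
merged = merge_extensions(rows)
for core, extension in merged:
    print("\t".join(core + (extension,)))
```

Keying on every column except the extension means that a user who ignores column 16 never sees two apparently identical lines.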

Specifying participant gene products

Gene products are participants in Biological Processes and Molecular Functions. We would use a subtype of the has_participant relation to indicate the particular role that additional gene products play.

Note that the gene product in column 2 (DB_Object_ID) is also a participant. However, the relationship between the process/function in column 5 (GO_ID) and this gene product is currently implicit.

If in doubt about which relation to use, it is always possible to use the most generic relation, has_participant. This carries less information, but it should at least be correct.

Note: this needs to be revised, as we are discouraging the use of has_participant directly in annotation.


Use case

SIRP beta2 (UniProtKB:Q9P1W8) acts in concert with CD47 (UniProtKB:Q08722) to positively regulate cell-cell adhesion (PMID:15383453)

The annotation for SIRP beta2 would be:

DB (Col 2)   Object (Col 3)   GO ID (Col 5)   Reference (Col 6)   Extension (Col 16)
Q9P1W8       SIRPG            GO:0022409      PMID:15383453       has_participant(UniProtKB:Q08722)
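
For readers who process column 16 programmatically, a minimal sketch of splitting an extension field into (relation, identifier) pairs might look like the following. It assumes only the syntax described on this page, i.e. relation(ID) statements separated by pipes; the function name parse_extension and the regular expression are illustrative assumptions, not an official parser.

```python
import re

# One statement looks like relation(ID), e.g. has_participant(UniProtKB:Q08722).
_STATEMENT = re.compile(r"^\s*([A-Za-z_]+)\(([^()]+)\)\s*$")


def parse_extension(field):
    """Split a column 16 value into a list of (relation, identifier) pairs."""
    statements = []
    for chunk in field.split("|"):
        if not chunk.strip():
            continue  # tolerate an empty field or a stray pipe
        match = _STATEMENT.match(chunk)
        if match is None:
            raise ValueError(f"unrecognised extension statement: {chunk!r}")
        relation, identifier = match.groups()
        statements.append((relation, identifier.strip()))
    return statements


parsed = parse_extension("has_participant(UniProtKB:Q08722)")
print(parsed)
```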


transcription targets

TODO

regulation targets

TODO

Specifying inter-species protein binding partners

If an experiment showed binding of two proteins from the same species, then the identifier for the binding partner would go in both column 8 and column 16. If it was an inter-species experiment, i.e. a protein from one species and a binding partner from another species, then the accession for the binding partner actually used in the binding experiment would go in column 8 and the accession for the inferred in vivo binding partner would go in column 16.

Use case

1. Chicken SFRP1 (Q9DEQ4) interacts with mouse Frizzled-2 (Q9JIP6) (PMID:16172602). The actual experiment was performed with chicken and mouse proteins, but a curator can infer that chicken SFRP1 would bind chicken Frizzled-2 (Q9IA06) and that mouse Frizzled-2 would bind mouse SFRP1 (Q8C4U3). The GO term used to annotate chicken SFRP1 should be 'frizzled binding' (GO:0005109).

So the reciprocal annotations would be:

DB (Col 2)   Object (Col 3)   GO ID (Col 5)   Reference (Col 6)   With (Col 8)   Extension (Col 16)
Q9DEQ4       SFRP1            GO:0005109      PMID:16172602       Q9JIP6         has_participant(Q9IA06)
Q9JIP6       FZD2             GO:0005515      PMID:16172602       Q9DEQ4         has_participant(Q8C4U3)
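
The rule for filling columns 8 and 16 can be sketched as a small Python helper. The function name binding_columns and its arguments are hypothetical; the sketch simply encodes the rule stated above, using has_participant as in the table.

```python
def binding_columns(experimental_partner, in_vivo_partner=None):
    """Return (column 8, column 16) values for a binding annotation.

    experimental_partner: accession actually used in the binding experiment.
    in_vivo_partner: accession of the inferred same-species partner; leave
        as None when the experiment already used proteins from one species.
    """
    if in_vivo_partner is None:
        # Same-species experiment: the same identifier goes in both columns.
        in_vivo_partner = experimental_partner
    return experimental_partner, f"has_participant({in_vivo_partner})"


# Chicken SFRP1 (Q9DEQ4): tested against mouse Frizzled-2 (Q9JIP6),
# inferred to bind chicken Frizzled-2 (Q9IA06) in vivo.
with_col, extension_col = binding_columns("Q9JIP6", "Q9IA06")
print(with_col, extension_col)
```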


Use cases

Regulation of expression and specific gene products

The GO will never pre-coordinate terms such as:

  • regulation of oskar mRNA translation
  • regulation of oskar mRNA transcription

But it is perfectly appropriate to post-compose such terms at annotation time.

The GO term used would be "regulation of transcription" or "regulation of translation".

The properties column would contain an ID for oskar or oskar mRNA. Technically it should be:

  • a gene ID for "regulation of gene expression"
  • a transcript ID for "regulation of transcription"
  • a protein ID for "regulation of translation"

However, this can often be difficult. We can relax this as long as we are clear on what it means to provide a gene ID for "regulation of translation".
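
The pairing of GO term and identifier type listed above could be checked with something like the following Python sketch. The table EXPECTED_ID_TYPE, the function check_target and the strict flag are all hypothetical; in particular, treating the relaxed mode as "a gene ID is accepted as a proxy" is only one possible reading of the relaxation discussed here.

```python
# Expected identifier type per GO term label, as listed above.
EXPECTED_ID_TYPE = {
    "regulation of gene expression": "gene",
    "regulation of transcription": "transcript",
    "regulation of translation": "protein",
}


def check_target(term_label, id_type, strict=True):
    """Return True if id_type is acceptable as the target of term_label.

    With strict=False, a gene ID is also accepted as a proxy for the
    transcript or protein (an assumed interpretation of the relaxed rule).
    """
    expected = EXPECTED_ID_TYPE.get(term_label)
    if expected is None:
        return True  # no rule recorded for this term
    if id_type == expected:
        return True
    return (not strict) and id_type == "gene"


print(check_target("regulation of translation", "gene"))                # strict
print(check_target("regulation of translation", "gene", strict=False))  # relaxed
```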

Ruth: example of protein ID in Column 16:

PMID:9368760: From the experiment summarized by 'In vitro, expressed PDPK1 (PDK1) O15530 phosphorylated Thr308 of AKT1 (PKB alpha) P31749', the following annotations could be made using column 16:

PDPK1 (PDK1) O15530 GO:0018107 peptidyl-threonine phosphorylation IDA PMID:9368760 column 16: PKBalpha/AKT1 P31749
PDPK1 (PDK1) O15530 GO:0032148 activation of protein kinase B activity IDA PMID:9368760 column 16: PKBalpha/AKT1 P31749
PDPK1 (PDK1) O15530 GO:0004674 protein serine/threonine kinase activity IDA PMID:9368760 column 16: PKBalpha/AKT1 P31749

Additional examples of GO annotation protein targets in column 16: For Molecular Function annotations:

  • P01023 GO:0004867 serine-type endopeptidase inhibitor activity IDA PMID:12538697 column16:P48740
  • P01023 GO:0004867 serine-type endopeptidase inhibitor activity IDA PMID:12538697 column16:O00187
  • Q9BRA2 GO:0047134 protein-disulfide reductase activity IDA PMID:1859519 column16:P63167
  • Q13535 GO:0004672 protein kinase activity IDA PMID:14657349 column16:Q14683

For Biological Process annotations:

  • P31749 GO:0006469 negative regulation of protein kinase activity IMP PMID:9373175 column16:P49841
  • Q92574 GO:0031397 negative regulation of ubiquitination IDA PMID:11175345 column16:P49815
  • Q8K4B2 GO:0043407 negative regulation of MAP kinase activity IMP PMID:17379480 column16:P47811

But would we include information in Column 16 for function and process terms?

Also, the above in vitro experiment provides very good evidence for function and process terms, but would column 16 be completed for less direct experimental evidence, e.g.:

PMID:9373175: co-expression of AKT1 (AKT/PKB alpha) P31749 with GSK3B (GSK3beta) P49841 in human 293 cells leads to the inactivation of GSK3B. This effect is also seen with transfection with PDK1 and GSK3B.

Could this be interpreted as:

  • AKT1 P31749 GO:0006469 negative regulation of protein kinase activity IMP PMID:9373175 column 16: GSK3B P49841
  • PDPK1 (PDK1) O15530 GO:0006469 negative regulation of protein kinase activity IMP PMID:9373175 column 16: GSK3B P49841

To restate the question: will column 16 be limited to cases where there is evidence of a really direct interaction between two proteins? Or will it be used more generally when a protein is part of a cascade that leads to an effect on many proteins, in which case a large number of proteins will probably end up in column 16?

Would it be possible to pipe together multiple accessions that are 'targets' of a GO annotation in column 16?

Multiple annotation extensions for targets