2010 GO camp binding documentation issues

Background

Agreed Guidelines for GOC website 19 July 2010

The binding group presented annotation policy suggestions at the last GO Consortium meeting; see Binding_terms_working_group#2010_discussion (beware: this is a huge page!). These annotation guidelines should be finalized, and a full set of guidelines on how annotation data is curated and presented in GAF files should be documented. For instance:

  • state that binding annotations, especially those where the GO term does not specify a particular binding partner, should, where possible, indicate the interacting partner in the 'with' column
  • fully describe the usage of pipes etc. in the 'with' column
  • ask that protein binding annotations be reciprocal - if protein A is annotated as binding protein B, the reverse annotation should be provided

The use of column 16 to identify protein/gene targets of a molecular function GO annotation could benefit this binding discussion. Go to protein binding Annotation consistency

Links to pages of relevance

Discussion 30-04-2010

Present: Emily Dimmer, Jennifer Smith, Tom (RGD), Ruth Lovering, Ben Hitz, Chris Mungall, Ursula Hinz, Lakshmi Pillai, Serenella Ferro Rojas.

Chair: Emily

Minutes: Ruth

Minutes:

Emily described how Column 16 should be used to provide additional annotation information when the GO term and its definition do not fully describe the experimental data available.

For example, an appropriate use of column 16: With a GO:0004713 protein tyrosine kinase activity annotation (column 5) the specific target protein identifier could be included in column 16.

Chris pointed out that inappropriate use of column 16 would be to include a target identifier which is implied by the GO term.

For example, an inappropriate use of column 16: With a GO:0005524 ATP binding annotation (column 5) the ChEBI ID for ATP should not be included in column 16 as binding to ATP is explicitly implied by the GO term.
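The distinction drawn above can be sketched as a simple check. This is a minimal illustration in Python, not an agreed GOC tool: the `IMPLIED_TARGETS` table, the function name and the placeholder target `UniProtKB:TARGET_X` are hypothetical, and the ChEBI identifier used here for ATP is an assumption that should be verified against ChEBI before reuse.

```python
# Hypothetical lookup: GO terms whose name or definition already implies a target.
# The ChEBI ID given for ATP is an assumption for illustration; verify before reuse.
IMPLIED_TARGETS = {
    "GO:0005524": {"CHEBI:15422"},  # ATP binding already implies ATP
}


def column16_target_is_redundant(go_id, target_id):
    """Return True when the column 16 target is already implied by the GO term."""
    return target_id in IMPLIED_TARGETS.get(go_id, set())


# Appropriate: a kinase annotation naming its specific protein target.
# "UniProtKB:TARGET_X" is a placeholder, not a real accession.
print(column16_target_is_redundant("GO:0004713", "UniProtKB:TARGET_X"))
# Inappropriate: ATP binding annotated with ATP itself in column 16.
print(column16_target_is_redundant("GO:0005524", "CHEBI:15422"))
```

A real check would need the implied targets to come from the ontology itself rather than from a hand-made table.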

Chris and Emily discussed the inconsistent column usage that will be created through the use of column 16 and the 'with' column. This issue to be discussed elsewhere!

Ruth asked if column 16 would be completed for both the function and process terms, for example for GO:0004713 protein tyrosine kinase activity and GO:0018108 peptidyl-tyrosine phosphorylation. Chris explained that linking the function and process ontologies will mean that these annotations will be redundant as GO:0018108 peptidyl-tyrosine phosphorylation will be the parent of GO:0004713 protein tyrosine kinase activity.

Serenella asked how users would identify proteins that bind ATP. Chris explained that the ontology developers need to improve the ontology to facilitate this.

It was suggested that the use of column 16 should be discussed at the GO camp. Possible example papers to be suggested by working group, ideally to cover protein kinase activity on proteins x, y, z; the use of column 16 to specify the cell type in which a specific process is occurring; the use of column 16 to specify a gene regulated by a specific transcription factor.

Specifying interacting partner gene products

Chris reviewed the discussions that have taken place on the column 16 wiki and the relationship ontologies (has_participant, has_input) that have been proposed to capture the relationship between a protein and a target.

The first example in this wiki is SIRP beta2 which in concert with CD47 positively regulates cell-cell adhesion (PMID:15383453)

The annotation for SIRP beta2 would be:

col5: GO:0022409
col16: has_participant(UniProtKB:Q08722)

Here Q08722 is the ID for CD47
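As an illustration of how a column 16 value of this shape could be read by a script, the following Python sketch splits a single `relation(ID)` expression into its two parts. The regular expression and the function name are assumptions made for this example; they are not taken from the column 16 documentation, and they handle only one expression at a time.

```python
import re

# Assumed shape of a single column 16 expression: relation(DB:ID)
EXTENSION_RE = re.compile(r"^\s*([A-Za-z_][\w-]*)\(([^()\s]+)\)\s*$")


def parse_extension(expression):
    """Split 'relation(DB:ID)' into (relation, target); raise ValueError otherwise."""
    match = EXTENSION_RE.match(expression)
    if match is None:
        raise ValueError(f"not a relation(ID) expression: {expression!r}")
    return match.group(1), match.group(2)


# The SIRP beta2 example from the minutes.
annotation = {"col5": "GO:0022409", "col16": "has_participant(UniProtKB:Q08722)"}
relation, target = parse_extension(annotation["col16"])
print(relation, target)
```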

Chris suggested a more descriptive functional child term would be helpful, such as 'cell adhesion binding involved in cell-cell adhesion'. Ruth pointed out the existence of 'cell adhesion molecule binding' as a term. However, this carries less information than saying specifically that it is involved in cell-cell adhesion, and it is not appropriate to add a F->P link from 'cell adhesion molecule binding'.

Chris has added a SF tracker item: https://sourceforge.net/tracker/?func=detail&aid=2994920&group_id=36855&atid=440764 proposing a term link 'cell adhesion molecule binding involved in (regulating?) cell-cell adhesion'

Cross species experiments

Emily suggested that when cross species experiments are annotated and the direct binding protein is added to column 8 (the 'with' column), the orthologous gene (in vivo participant) in the same species as the annotated protein should be added to column 16, e.g. human protein A; GO:0005515 protein binding; [with column] mouse protein B; [column 16] human ortholog protein B.

Ruth suggested that GOC members should comment on this proposal before the idea is developed fully.

In theory, Pro IDs or InterPro IDs could be included in column 16 (or the 'with' column?)

Chris also confirmed that caution should be applied when using column 16 and a process annotation. For example it would not be appropriate to add all proteins phosphorylated following activation of a kinase in a cell based system; because many of the proteins would be phosphorylated by other kinases (due to a cascade of activation) not by the protein annotated with the GO term 'phosphorylation'. Therefore, only targets known to be the direct target of the annotated protein should be included in column 16 regardless of the GO term associated with the annotated protein.

To close, Ruth brought up Pascale's request to remove the GO term 'water binding'. It was agreed that this request should be submitted to SF.


Agenda for discussion 26-05-2010

Survey open: Click here to take survey Results of survey

1. Cross species experiments and capture of in vivo targets/participant in column 16 (see Cross species experiments in minutes 30/04/2010 above) Binding_discussion_emails#Use_of_column_16_to_capture_in_vivo_participant

2. Use of column 16 to identify specific targets of molecular function annotations Binding_discussion_emails#Using_column_16_in_conjunction_with_an_catalytic_activity_annotation

3. Suggestion to obsolete gene specific terms such as P53 binding

4. Any other issues raised by the survey not covered by the above discussions

5. Format of the Binding Group's Focused Annotation Session at the GO camp.

Possible GO Camp agenda

1. 20 min to go over current GO guidelines on binding, as agreed; see Binding_Guidelines

  • Background and examples
  • Quality control checks now operational

2. 15 min summary from Jane/Chris on ontology development to enable users to retrieve information on specific chemical substrates and what impact this has on GO annotation.

3. 40 min to summarise decisions that have been made with respect to binding since the GOC meeting in March

4. 15 min for further discussion

Minutes 26-05-2010

Present: Emily Dimmer, Shur-Jen, Ruth Lovering, Ursula Hinz, Lakshmi Pillai, Serenella Ferro Rojas, Peter, Harold, Li, Pascale, Suzi

Chair: Emily

Minutes: Ruth

Minutes:

1A. From looking at the survey and subsequent discussions, it became evident that many curators were concerned with the idea of capturing in vivo targets/participant in column 16, where the curator would need to make the judgement call as to what would be the in vivo target of an assay. It was agreed that this idea should be laid to rest for the moment. (See Cross species experiments in minutes 30/04/2010 above.) Binding_discussion_emails#Use_of_column_16_to_capture_in_vivo_participant

1B. It was agreed that Question 1 from the binding survey would be circulated to the annotation email list to find out what people think about all protein identifiers present in column 8 being represented in column 16. It was proposed that this data could be automatically created by Mike Cherry via the GAF1 to GAF2 conversion script, so this would not be a financial burden to each MOD.


1B. Cross species experiments and capture of in vivo targets/participant in column 16

Survey suggests people do not want to follow the suggestions. Ursula pointed out that GO makes evidence-supported statements, whereas this would not be evidence supported. Harold agreed with Ursula.


2. Use of column 16 to identify specific targets of molecular function annotations Binding_discussion_emails#Using_column_16_in_conjunction_with_an_catalytic_activity_annotation (this is covered by Question 3 and 4 on the binding survey).

Question 3 from survey: It was agreed that the proposal 'The target of an enzyme activity (when not specifically implied by the GO term) should be included in column 16' should be circulated to the GO annotation email list.

Question 4 from survey: It was agreed that Emily would write this as a proposed binding QC, with a link so that people can check their annotations to catalytic activity using IPI evidence code and decide if they would be happy for these annotations to be prohibited or not.

3. Suggestion to obsolete gene specific terms such as P53 binding: there was a lot of debate about this. In general it was agreed that 'family' binding terms, such as actin binding, were useful, but there was no agreement about the usefulness of binding terms such as P53 binding. Harold pointed out how the current system makes it hard for users to find all 'actin binding' proteins, as some will be annotated to 'protein binding' IPI WITH an actin protein ID, whereas others are annotated to 'actin binding'. Emily pointed out that QCs could be created to pick up these inconsistencies, e.g. by adding 'actin binding' to any proteins binding an 'actin protein ID'. It was agreed that this issue would be discussed at another binding call with GO editors present to explain the rationale here.
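The kind of QC Emily describes could look roughly like the Python sketch below. It is only an illustration: the dictionary layout, the function name and the entries in `ACTIN_IDS` are placeholders invented for this example, and each 'with' value is assumed to hold a single identifier.

```python
PROTEIN_BINDING = "GO:0005515"
ACTIN_BINDING = "GO:0003779"

# Placeholder identifiers standing in for a curated list of actin proteins.
ACTIN_IDS = {"UniProtKB:ACTIN_A", "UniProtKB:ACTIN_B"}


def suggest_actin_binding(annotations):
    """Return gene products annotated to protein binding IPI WITH an actin ID
    that have no 'actin binding' annotation yet."""
    has_actin_term = {
        a["gene_product"] for a in annotations if a["go_id"] == ACTIN_BINDING
    }
    suggestions = set()
    for a in annotations:
        if (
            a["go_id"] == PROTEIN_BINDING
            and a["evidence"] == "IPI"
            and a["with"] in ACTIN_IDS
            and a["gene_product"] not in has_actin_term
        ):
            suggestions.add(a["gene_product"])
    return sorted(suggestions)


annotations = [
    {"gene_product": "geneA", "go_id": PROTEIN_BINDING, "evidence": "IPI", "with": "UniProtKB:ACTIN_A"},
    {"gene_product": "geneB", "go_id": ACTIN_BINDING, "evidence": "IDA", "with": ""},
    {"gene_product": "geneB", "go_id": PROTEIN_BINDING, "evidence": "IPI", "with": "UniProtKB:ACTIN_B"},
]
print(suggest_actin_binding(annotations))  # ['geneA']
```

Whether such a script should add the annotation automatically or only report candidates to curators is left open here, as it was in the discussion.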

4. no other issues to be discussed

5. The outline proposed agenda for the GO camp was agreed, with the exception that the 'in vivo' targets would not be included. Ruth, Ursula and Emily to circulate examples that will be used in the GO camp to the binding group before the camp, for comments.

GO Camp agenda

Minutes: Damien Lieberherr and Jim Hu

1. (Ursula) 20 min to go over current GO guidelines on binding, see Binding_Guidelines

  • Background and examples
  • Quality control checks now operational

2. (Chris) 15 min summary of ontology development to enable users to retrieve information on specific chemical substrates and what impact this has on GO annotation.

3. (Ruth) 40 min to summarise decisions that have been made with respect to binding since the GOC meeting in March

4. (Ursula/Ruth) 15 min for discussion of annotation examples

Agenda for discussion 02-07-2010

1. Confirmation of guidelines and QC statements for GOC website

2. Inclusion of an exception to the guidelines to resolve the conflict between the two guidelines/QCs:

Use the ‘with’ column or column 16 to add information to the annotation, only if this information is not included in the GO term and/or definition. All IPI annotations should be reciprocal.

Proposed exception to guidelines: Prioritise using the IPI evidence code rather than the IDA evidence code even when the protein ID is inferred by the GO term and/or definition, to follow protein binding guideline that all IPI annotations should be reciprocal.

For example:

Gene Product | GO term and ID | Evidence Code | Gene product in 'WITH' column | Notes
Tp53 | GO:0008134 transcription factor binding | IPI | Trp73 | none
Trp73 | GO:0002039 Tp53 binding | IPI | Tp53 | rather than IDA and no 'with'


3. How specific to make substrate/product target information? I think this was discussed at a GOC meeting, with the decision that this would rely on curator judgement and restriction to in vivo targets. However I can't find this documented anywhere.

4. What if paper (1) shows ligand binding, and paper (2) shows enzyme activity, where the ligand=substrate. Should the curator delete the ligand binding annotation in line with current policy?

5. Propagation of CHEBI IDs in function ontology to process terms?

Minutes 02-07-2010

Present: Ruth, Pascale, Peter, Harold, Jim, Shur-Jen, Tom, Serenella, Li Ni, Debby, Ursula

1. Confirmation of guidelines and QC statements for GOC website

Comments:

  • Jim to qualify the use of the with column with IPI, whereas column 16 is pre/post composed GO term
  • Delete: The GO is committed to ‘annotating to the experiment’.
  • For general GO guidelines Add: The curator should annotate the 'intention of the experiment' and implied in vivo targets.
  • Concern about the inclusion of 'catalytic subunit' to the second sentence. Keep the statement more vague for now but ask the Protein complex working group to review this paragraph, ask Chris if a reasoner could be applied to these annotations.
  • Debby to incorporate suggestions from this call
  • Discuss guidelines again before these are put on GOC web site.

2. Inclusion of an exception to the guidelines to resolve the conflict between the two guidelines/QCs:

Use the ‘with’ column or column 16 to add information to the annotation, only if this information is not included in the GO term and/or definition. All IPI annotations should be reciprocal.

Proposed exception to guidelines: Prioritise using the IPI evidence code rather than the IDA evidence code even when the protein ID is inferred by the GO term and/or definition, to follow protein binding guideline that all IPI annotations should be reciprocal.

For example:

Gene Product | GO term and ID | Evidence Code | Gene product in 'WITH' column | Notes
Tp53 | GO:0008134 transcription factor binding | IPI | Trp73 | none
Trp73 | GO:0002039 Tp53 binding | IPI | Tp53 | rather than IDA and no 'with'
  • Actin binding: Pascale does not want to use the with column because it is just used for 'experimental' not in vivo. However, many groups would rather annotate to the specific target included in the with column. IMEX discussions may lead to these statements being further defined.
  • ACTION item: groups to discuss how to reach a consistent approach to annotation of the experimental target, such as actin 1.

3. How specific to make substrate/product target information? I think this was discussed at a GOC meeting, with the decision that this would rely on curator judgement and restriction to in vivo targets. However I can't find this documented anywhere.

  • Agreed that statement needed to confirm curator judgement required.

4. What if paper (1) shows ligand binding, and paper (2) shows enzyme activity, where the ligand=substrate. Should the curator delete the ligand binding annotation in line with current policy?

  • Redundancy, consistency or comprehensiveness?
  • Agreed that IC for binding based on enzyme activity is redundant. Should we remove all IC to binding based on enzyme activities?
  • Action item: discuss with GOC comprehensively capturing binding experiments, rather than not capturing what has been shown in experimental data.

5. Propagation of CHEBI IDs in function ontology to process terms?

  • not discussed

Agenda for discussion 12-07-2010

1. Confirmation of guidelines and QC statements for GOC website

2. Other discussion items only if time

  • How to tackle the list of unresolved issues?
  • Propagation of CHEBI IDs in function ontology to process terms?
  • When to request a new catalytic activity or binding GO term rather than to provide specificity using Column 8 (with)/column 16?

Minutes from discussion 12-07-2010

Present: Ruth Lovering (Chair), Peter D'Eustachio, Debby Siegele, Rebecca Foulger (Minutes), Jodi, Shur-Jen, Jim Hu, Li

1. Confirmation of guidelines and QC statements for GOC website.

Ruth read through the guidelines at: http://wiki.geneontology.org/index.php/Talk:2010_GO_camp_binding_documentation_issues#Debby_8_Jul_2010_draft_of_Binding_Policy_combining__Jim_and_Ruth.27s_edits_and_last_week.27s_discussion

We went through each sentence in turn to keep/modify, and Debby has since updated the wording of the guidelines as we agreed in the call.

Key points:

  • The wording needs to be consistent in the naming of column 8 ('with column') and column 16 ('annotation extension column').
  • The original wording stated that the 'with column' could only be used with IPI, IC, IEA, IGI, or ISS evidence codes. Li pointed out that MGI use allele identifiers in column 8, with the IMP evidence code. Although this is outside of the binding documentation, it was confirmed after the call that this annotation policy is valid and in the guidelines (http://www.geneontology.org/GO.evidence.shtml#imp).
  • Debby will rewrite the 4th paragraph to make it clearer to curators when they should be using column 8 vs column 16. It currently reads: 'Thus, the annotation of Protein A to GO:0005515:protein binding with evidence code IPI and Protein B in the with/from column (8) makes the statement that Protein A has the molecular function of "interacting selectively and non-covalently with any protein or protein complex…". The function being annotated is not selective binding of Protein B.'
  • The examples will move to be closer to the paragraph that describes them.
  • Jodi asked whether the documentation was for curators or users, as wording will vary accordingly.
  • For cross-species binding annotations, it was noted that many databases (including MGI) cannot make reciprocal cross-species interaction annotations. In addition, there was concern that in some cases it may not be appropriate to make the reciprocal cross-species annotation. In example 1 in the binding guideline documentation, human p300 was shown to bind specifically to Drosophila histone H3, so human EP300 (Q09472) is annotated as GO:0042393:histone binding IPI WITH:Drosophila histone H3 (P02299). Is it appropriate to annotate Drosophila histone H3 (P02299) as GO:0035035 histone acetyltransferase binding IPI WITH:human EP300 (Q09472)? This also touches on Pascale's point that perhaps some annotations should be made using the IDA code if the bound target represents a class of proteins rather than a specific protein, e.g. binding to human H2AFB1 (H2A histone family, member B1) to represent binding to any histone H2A rather than specifically this histone.
  • Sanity checks on reciprocal binding annotations will move to be an unresolved issue.
  • For the sentence: 'The with/from column (8) and the annotation extension column (16) should be used only for direct interactions and only when the binding relationship is not already included in the GO term and/or definition. See column16 documentation for relationship types to use when adding IDs in the annotation extension column (16)'. If the 'with column' can only be used for direct interactions, what happens for a pull-down where you can't tell if it's direct or indirect? Ruth pointed out that this comes down to curator judgement, and they'll leave the wording for now.
  • By the end of the week, we need to come up with 2 clear examples (send to Ruth) of when to use column 16.

Agenda for discussion 23-07-2010

1. Reinstate quality control procedure: Reciprocal annotations for protein binding should be made

  • a) How to address cross species reciprocal annotations when many MODs cannot create these
  • b) Should curators be concerned about whether the reciprocal annotation is likely to be relevant in vivo, how will this be decided? eg does Drosophila histone bind orthologs of those bound by human histones?
  • c) If the identity of the protein is known should it always be included in the 'with' column? eg should protein ID be included in column 8 when intent of the experiment is to show binding to a class of proteins rather than a specific protein (eg histone or actin binding).


2. How to tackle the list of unresolved issues?

  • Volunteers to suggest a guideline to address the issues to make discussions quicker
  • Prioritisation

Comment from Sandra Orchard (IntAct): one way of dealing with the issue is that the MODS do not add such entries - if such papers are sent to IntAct we can export the interactions to GO with a true reflection of how the experiment was done and the MODS can then each decide whether to import cross species interactions or not.

Minutes 23-07-2010

Present: Ruth Lovering (Chair), Rama, Jodi, Jim Hu, Li, Becky

1. Reinstate quality control procedure: Reciprocal annotations for protein binding should be made

  • a) How to address cross species reciprocal annotations when many MODs cannot create these

Jim: wants to propagate the inference, not the evidence (column 8): put the experimental substrate in column 8 and the in vivo substrate in column 16. Only physiologically relevant experiments should be captured.

Rama: a tool is being developed to create these reciprocal annotations. This was a soft QC, not a hard QC, so inferences will be generated by script. It is up to MODs whether they include these annotations.

Jim: for intraspecies experiments, curators should always aim to make the reciprocal annotation.

Becky: use of column 16 and reciprocal annotations are related but distinct issues.

Rama: How is IMEX handling this?

Action Item Rama: to find out how IMEX discussions are going.

Action Item Ruth: rewrite the QC and circulate (maybe Doodle poll) so that this can be added to the guidelines.

  • b) Should curators be concerned about whether the reciprocal annotation is likely to be relevant in vivo, how will this be decided? eg does Drosophila histone bind orthologs of those bound by human histones?
  • c) If the identity of the protein is known should it always be included in the 'with' column? eg should protein ID be included in column 8 when intent of the experiment is to show binding to a class of proteins rather than a specific protein (eg histone or actin binding).

2. How to tackle the list of unresolved issues?

  • Volunteers to suggest a guideline to address the issues to make discussions quicker
  • Prioritisation

It was agreed that new working groups should be set up to tackle individual unresolved issues.

  • Li pointed out that 'Propagation of CHEBI IDs in function ontology to process terms' is being covered by David and Harold.
  • Li also pointed out that 'Removing protein complexes in MF and BP ontologies' has been discussed by Harold and others in emails.
  • Becky requested that 'When to request a new catalytic activity or binding GO term rather than to provide specificity using Column 8 (with)/column 16' be prioritised, as a decision is needed urgently on this.
  • Jim agreed to look at protein C-terminus/ N-terminus binding
  • Becky to look at Enzyme binding see wiki enzyme binding discussion.

Action item Ruth: rearrange unresolved issues into subheadings, and create new section for unresolved issues being dealt with by other (new) working groups.

Action item Rama: prioritise unresolved issues and request new working groups are established.

Action item Rama: create a wiki with instructions on how to search archived emails.

Agenda for discussion ?-2010

1. Review minutes (action items) from 23-07-2010

2. Enzyme binding see wiki enzyme binding discussion

Issues raised which have been resolved without the creation of new guidelines

1. How to ensure that ATPases are included in a set of “ATP-binding” proteins? Ontology development needed to represent ATP binding always associated with a specific function, non-heme iron as cofactor or zinc-dependent alcohol dehydrogenases. (Chris/Jane's new 'has part relationship' will address this issue)

2. New binding guideline not to create redundant binding annotations will lead to a lack of coherence between manual (experimental) GO annotation and annotation inferred from electronic annotation (IEA). How is this to be addressed? (Chris/Jane's new 'has part relationship' will address this issue)

3. The use of IC (inferred by curator) to enable curators to annotate to a binding term where catalytic activity has been shown, but no binding assays were performed. (Chris/Jane's new 'has part relationship' will address this issue)

4. Can we guide curator judgement on the interpretation of the boundary between binding and catalysis or is there a legitimate hybrid boundary region? (Chris/Jane's new 'has part relationship' will address this issue)

5. Should we distinguish substrate binding from effector binding? (Chris/Jane's new 'has part relationship' will address this issue)

6. Can we develop a way to annotate the "process" relationships with the various "molecular functions"? (Cross-ontology relationships will address this issue)

Unresolved issues

1. protein C-terminus/ N-terminus binding [2] Jim to Chair

I'm not sure that this needs much more than what I just added to the SF item. --JimHu 16:45, 23 July 2010 (UTC)

2. Enzyme binding. See this SF item. Should enzyme binding be only to the catalytic subunit in multimeric enzymes, or does e.g. binding to regulatory subunits also count as enzyme binding? What about if the regulatory subunit is not in complex with its catalytic subunit (e.g. bound to an inhibitor)? Do we need an enzyme binding term at all, or would protein binding + WITH column be sufficient? Jane & Becky to Chair

3. Impact of collaboration with the protein-protein interaction curation community IMEx on protein binding guidelines Rama to Chair

  1. Use of multiple ids in the 'with' field for binding assays - to indicate binary or 1:many interactants. Currently binary interactions shown by piped values, 1:many interactants by comma-separated list. Should we restrict binding annotations to just binary interactions?
  2. How does IMEx deal with cross species experiments
  3. If the identity of the protein is known should it always be included in the 'with' column? eg should protein ID be included in column 8 when intent of the experiment is to show binding to a class of proteins rather than a specific protein (eg histone or actin binding).
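The current convention mentioned in item 1 above (pipes for separate binary interactants, commas for 1:many interactants) can be illustrated with a short Python sketch. The function names and the placeholder identifiers are invented for this example, and the reading of pipes and commas follows only the one-sentence description in these notes, not a formal specification.

```python
def parse_with_field(value):
    """Split a 'with' (column 8) value into groups of identifiers.

    Pipes separate independent binary interactants; commas inside a group
    list interactants that belong to one 1:many interaction.
    """
    if not value:
        return []
    return [
        [item.strip() for item in group.split(",") if item.strip()]
        for group in value.split("|")
        if group.strip()
    ]


def is_binary_only(value):
    """True when every group in the 'with' value names exactly one interactant."""
    return all(len(group) == 1 for group in parse_with_field(value))


# Placeholder identifiers, for illustration only.
print(parse_with_field("UniProtKB:X1|UniProtKB:X2"))   # two binary interactions
print(parse_with_field("UniProtKB:X1,UniProtKB:X2"))   # one 1:many interaction
```

A check like `is_binary_only` is one way the proposed restriction to binary interactions could be tested, should that restriction be adopted.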

Unresolved issues being discussed by other working groups

1. Transferring cross species information by ISS and inclusion of non-in-vivo targets in column 8 or 16 Column 16 working group

2. Can / should the GO hierarchy be used to accommodate catalogues of specific molecules and their behaviors, if not by a core group of GO annotators then by collaborating groups? What if there were 40 distinct substrates identified, all physiologically relevant in some instance (or more likely, all tested in vitro and possibly physiologically relevant) is this full list going to be added to column 16? As we accumulate more and more high-throughput data we are going to need a much better way of dealing with this. Column 16 working group

3. Annotations to GO:0008144 drug binding. New working group - Drug Binding

4. Priority: Scope of protein binding terms in GO ([3], [4]) and when to request a new catalytic activity or binding GO term rather than to provide specificity using Column 8 (with)/column 16? for example New working group or Column 16 working group?

5. Propagation of CHEBI IDs in function ontology to process terms David and Harold resolving

6. Should all substrate/target-binding information go in Column 16 or should protein substrate/target information be put in Column 8? Column 16 working group

7. Removing protein complexes in MF and BP ontologies? Protein complex Working group

  • MF: 'homodimer/heterodimer' molecular functions,
  • versus BP: protein oligomerization and children (protein homooligomerization , protein tetramerization, etc)
  • versus CC annotations
    • Example: ALAD Human; "Human PBGS purifies with eight Zn(II) per homo-octamer" PMID: 11032836
    • Proposal: move the term ' protein oligomerization' to cellular component ontology as 'protein oligomer'
  • Those terms represent related entities; so they should be in a single ontology if the ontologies are supposed to be orthogonal. It seems bad ontological practice, and leads to redundant annotations to F/P/C.
  • Do people agree that we try and move all the MF/BP 'components' to CC?

8. structural constituents terms Protein complex Working group

  • Are structural constituents terms helpful for genes lacking a catalytic activity? for example 'structural constituent of ribosome'.
  • What do we want users to do with this information? This is already captured in 'complexes'.
  • Can we live with some subunits of complexes having no known MF? (remember they have a BP)
  • The 'scaffold' function of actin seems legitimate; however this has been shown. For most of the ribosomal subunits, we don't know which one has a role in the structure of the complex. Or, we can say that any protein in any complex provides structure, and create MF functions for all PC terms.

Agreed annotation policy

Binding_Guidelines

As many terms in the Molecular Function ontology implicitly or explicitly imply the binding of a chemical or protein, it is unnecessary to co-annotate a gene product to a term from the binding node of GO to describe the binding of substrates or products that are already adequately captured in the definition of the Molecular Function term.

For instance, an enzyme MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a transporter MUST bind the molecules it transports. Therefore, as binding is implied, curators should avoid making redundant annotations.
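One way a curation group might look for such redundant annotations is sketched below in Python. This is an illustration under stated assumptions, not part of the agreed policy: the `IMPLIED_BINDING` table is hand-made for the example (a real version would come from the ontology), the function name is hypothetical, and the output is meant as a list of candidates for curator review rather than for automatic deletion.

```python
# Hypothetical, hand-made table: activity term -> binding terms it implies.
IMPLIED_BINDING = {
    "GO:0016887": {"GO:0005524"},  # ATPase activity implies ATP binding
}


def redundant_binding_candidates(annotations):
    """Return (gene_product, binding_term) pairs where the binding term is
    implied by another annotation on the same gene product.

    `annotations` is an iterable of (gene_product, go_id) tuples.
    """
    terms_by_product = {}
    for gene_product, go_id in annotations:
        terms_by_product.setdefault(gene_product, set()).add(go_id)

    candidates = []
    for gene_product, terms in sorted(terms_by_product.items()):
        implied = set()
        for term in terms:
            implied |= IMPLIED_BINDING.get(term, set())
        for binding_term in sorted(terms & implied):
            candidates.append((gene_product, binding_term))
    return candidates


example = [
    ("geneA", "GO:0016887"),
    ("geneA", "GO:0005524"),
    ("geneB", "GO:0005524"),
]
print(redundant_binding_candidates(example))  # [('geneA', 'GO:0005524')]
```

Here geneB is not reported, because its ATP binding annotation is not implied by any other annotation on that gene product.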

Future ontology development efforts should be relied upon to improve the searching capability of any user using GO who is specifically interested in gene products carrying out a certain type of substrate/product binding.

There will be some cases, however, where it is appropriate to annotate a binding relationship. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to GO:0005524 ATP binding using an IDA evidence code.

The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible; use the binding term if the experiment shows binding directly, don’t use the binding term if the experiment shows catalysis, but not the specific binding activity.

The above paragraph indicates that curators will want to include additional information in their annotations where the definition of an associated Molecular Function term is unable to adequately describe the specific substrate/target being bound, and where the request of a more-specific Molecular Function term would be considered inappropriate. The annotation extension column (16) can be used to capture this information. However, use the ‘with’ column (8) or annotation extension column (16) to add information to the annotation, only if this information is not included in the GO term and/or definition.

The annotation extension (column 16) should only be used for the direct target of a catalytic activity (using the relationship ontology).

Curators should use their judgment to decide whether the interaction is physiologically relevant and capture information relevant to the in vivo situation, not artificial substrates.

Annotations to the protein binding terms should be maximally informative. Where possible the precise identity of the interacting protein should be captured in the 'with' field of an annotation. Similarly, usage of child terms that describe a particular class of protein binding (e.g. receptor tyrosine kinase binding) should be applied in preference to the parent term 'protein binding'; GO:0005515.

Ongoing relevant ontology development

Has_part relationships provide links to implied substrate binding (Chris and Jane are developing has_part relationships to imply substrate binding). Existing GO to follow this new format, e.g. Transcription factor activity has_part DNA binding. Request new 'has_part' relationships (and terms) if these do not exist.

Proposed additions to annotation policy

  1. Quality Control procedure: IPI evidence code annotations must be to the molecular function 'binding' (GO:0005488) term or child terms (if SGD annotation review supports this).
  2. To be proposed to GOC: What if paper (1) shows ligand binding, and paper (2) shows enzyme activity, where the ligand=substrate? Should the curator delete the ligand binding annotation in line with current policy? discussed 02-07-2010
    1. Add to QC: Remove all IC to binding based on enzyme activities, once binding part_of parent is created
    2. Comprehensively capture binding experiments.
  3. Soft Quality Control procedure: reciprocal annotations for protein binding should be made

This rule applies to GO:0005515 and its descendants when the IPI evidence code is used. Some MODs can't annotate non-MOD proteins, so they cannot reciprocally annotate cross-species experiments. Plus, if the experiment uses a conserved protein, e.g. a histone from a very distantly related species, is the reciprocal annotation really physiologically relevant?
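A soft check for missing reciprocal annotations could be sketched as follows in Python. The tuple layout and the function name are assumptions for this example. Descendants of GO:0005515 are not looked up here; a caller would have to pass them in through `binding_terms`.

```python
PROTEIN_BINDING = "GO:0005515"


def missing_reciprocals(annotations, binding_terms=frozenset({PROTEIN_BINDING})):
    """Return the existing (protein, partner) pairs that lack a reverse annotation.

    `annotations` is an iterable of (gene_product, go_id, evidence, with_id).
    Only IPI annotations to a term in `binding_terms` are considered.
    """
    pairs = {
        (gene_product, with_id)
        for gene_product, go_id, evidence, with_id in annotations
        if evidence == "IPI" and go_id in binding_terms and with_id
    }
    return sorted((a, b) for a, b in pairs if (b, a) not in pairs)


example = [
    ("A", PROTEIN_BINDING, "IPI", "B"),
    ("B", PROTEIN_BINDING, "IPI", "A"),
    ("A", PROTEIN_BINDING, "IPI", "C"),  # no C -> A annotation
]
print(missing_reciprocals(example))  # [('A', 'C')]
```

Because this is a soft QC, the sketch only reports the gaps; it does not decide whether a cross-species reciprocal annotation is appropriate.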

Examples (papers) and discussion of GO annotation issues

Agreed Quality Control procedures

Annotation_Quality_Control_Checks

1. No use of the 'NOT' qualifier with 'protein binding'; GO:0005515.

2. Annotations to 'protein binding'; GO:0005515, should only be supplied with an evidence code where the interactor can be identified in the 'with' field

3. Reciprocal annotations for protein binding should be made

4. Annotations to 'protein binding' should not use the ISS evidence code
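Checks 1, 2 and 4 above apply to single annotations and could be expressed as in the Python sketch below. The dictionary keys and the function name are invented for this example, and check 2 is interpreted here simply as "the 'with' field must not be empty", which is an assumption about the intended rule. Check 3 (reciprocity) needs the whole annotation set and is not covered.

```python
PROTEIN_BINDING = "GO:0005515"


def check_protein_binding(annotation):
    """Return a list of QC messages for one annotation (empty if it passes)."""
    problems = []
    if annotation["go_id"] != PROTEIN_BINDING:
        return problems
    # The qualifier column may hold several pipe-separated values.
    if "NOT" in annotation.get("qualifier", "").split("|"):
        problems.append("check 1: 'NOT' qualifier used with protein binding")
    if not annotation.get("with", "").strip():
        problems.append("check 2: no interactor in the 'with' field")
    if annotation["evidence"] == "ISS":
        problems.append("check 4: ISS evidence code used with protein binding")
    return problems


good = {"go_id": PROTEIN_BINDING, "qualifier": "", "evidence": "IPI", "with": "UniProtKB:Q08722"}
bad = {"go_id": PROTEIN_BINDING, "qualifier": "NOT", "evidence": "ISS", "with": ""}
print(check_protein_binding(good))  # []
print(len(check_protein_binding(bad)))  # 3
```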

Suggestions for Quality Control procedures

Annotation_Quality_Control_Checks

1. Only use the IEP evidence code with terms from the Biological Process Ontology

2. Curators should not use the IPI evidence code along with catalytic activity molecular function terms

3. Annotation Intersection Alerts
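Suggestions 1 and 2 above could be prototyped as in the Python sketch below. The field names are assumptions for this example, and `CATALYTIC_TERMS` is a deliberately tiny stand-in: a real check would derive the full set of catalytic activity terms from the ontology.

```python
# Abbreviated, illustrative set of catalytic activity terms; a real check would
# collect all descendants of 'catalytic activity' from the ontology.
CATALYTIC_TERMS = {"GO:0004713"}  # protein tyrosine kinase activity


def iep_outside_process(annotation):
    """True when IEP is used with a term outside Biological Process (aspect 'P')."""
    return annotation["evidence"] == "IEP" and annotation["aspect"] != "P"


def ipi_with_catalytic_activity(annotation):
    """True when IPI is used with a term listed as a catalytic activity."""
    return annotation["evidence"] == "IPI" and annotation["go_id"] in CATALYTIC_TERMS


kinase_ipi = {"go_id": "GO:0004713", "aspect": "F", "evidence": "IPI"}
process_iep = {"go_id": "GO:0018108", "aspect": "P", "evidence": "IEP"}
print(ipi_with_catalytic_activity(kinase_ipi))  # True
print(iep_outside_process(process_iep))  # False
```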



Back to 2010_GO_camp_Meeting_Agenda