Talk:2010 GO camp Meeting Agenda: Difference between revisions

From GO Wiki
===How is Downstream Effect defined (Rachael and Varsha)===


'''Rachael and Varsha: Annotating to downstream processes'''
Minutes: Yasmin & Ursula


Examples (1-4): see presentation


*Discussion of Survey (see presentation)


Everybody annotates downstream processes, at least occasionally.
Most participants felt that annotating a downstream effect was acceptable when no other information was available. Many participants felt it would be desirable to revise such annotations later, but that this was not always feasible, for various good reasons (see presentation).


'''Proposed guidelines:'''


*'''Guideline 1''': Request new, specific terms describing a process involved in another process. Example: for growth factor BMP2 that regulates cardiac cell differentiation, it is more informative to use a composite term, such as “regulation of transcription involved in cardiac cell differentiation” as opposed to using two unlinked terms, e.g. “regulation of transcription” and “regulation of cardiac cell differentiation”. (The terms do not exactly match the case of BMP2).


*'''Guideline 2''': For small-scale experiments, one should annotate according to the experimental evidence in the paper. However, use curator judgment, and also take into account the quality of the evidence, etc.
 


If a gene product has a central role affecting multiple downstream processes, one should annotate only the core process. When a gene product is specific for a particular pathway and/or has just a few targets, one should annotate the downstream processes.


'''Discussion of examples:'''


a) The yeast RNA polII subunit should be annotated only to the core process.
Li: the ontology developers in the group should discuss this. ACTION ITEM:


*'''Guideline 3''': If a gene product has limited experimental literature, such as a newly characterized protein, it is acceptable to annotate to a more general 'downstream' process.
 
Lively discussion of the RNA polII subunit example: should the experimental annotation (indirect effects) be kept?


Mike A: rpb2 is required for every transcription process; it is not useful to list indirect effects. The gene product should be annotated to the core process using ISS, and the phenotype-based experimental annotation should be removed. Describing the knockout phenotype is not informative.


*COMMENT: if it has a specific effect, one should keep both the specific downstream effect and the description of the core process.


Kimberly: the rpb2 annotation originated from phenotype-to-GO mappings (ISS). We will review the pipeline issue.
Judy: we need to clarify the appropriate use of these evidence codes.


*'''Guideline 4''': annotation of ligand receptor signaling pathways (intercellular vs. intracellular)
For intercellular signaling, the ligand is part of the pathway. For intracellular signaling, the ligand regulates the pathway.


Becky: the ligand is part of the pathway. The pathway ends when the response is initiated.


'''CONSENSUS''': need to clarify where pathways start and end.  


'''ACTION''': take intracellular example back to signaling group for clarification.


Went through slides showing the (simplified) insulin receptor and NF-kappa-B pathways: everybody agreed.
Suggested Quality Control checks: (see presentation).


'''Discussion of survey example:'''
*'''Question 1''': Functions of the deubiquitinating protease CYLD  


Most participants (almost 90%) would annotate to the core process, regulation of microtubule organization.
Ursula: the survey result may reflect the limited time available for doing the survey. Reading several papers on the subject makes a difference.


*'''Q2''': Bre1-Histone H2B monoubiquitination regulates histone H3 methylation
Most (85%) of the participants chose a “histone ubiquitination” term


Paul: allow a distinction between the core process and regulation of the core process.


*'''CONCLUSION''': The ontology should be revised and the annotations checked.


*'''ACTION POINTS''':  
**revise process terms for transcription
**define start and end points of signaling processes

Revision as of 08:38, 21 June 2010


Day 1 morning session

9:00 Introductions and objectives of the meeting

  1. Introductions & Logistics: Serenella Ferro Rojas
  • Poll for Thursday lunch reservations, depending on weather.
  • Dinner at Brasserie la Bourse on the Carouge
    • ~ 1.9 km from meeting site

Friday: reception at noon celebrating Amos Bairoch's Otto Naegeli prize.

Introductions

Goals: Pascale Gaudet

GO – Ontology, annotation, tools and technical aspects

Chairs: Serenella Ferro Rojas and Pascale Gaudet

GO overview

An introduction to the GO ontology: terms, definitions, synonyms, relationships, cross-products. Jane Lomax

  • Inter-ontology links
    • Most tools don't make inferences across the ontologies, so redundant annotations are made.
    • Cross products
      • between GO ontologies
      • external ontologies (cell ontology; CHEBI)
  • Ontology development
    • large scale targeted projects
    • logical consistency
    • small-scale requests (Sourceforge tracker; in future via AmiGO)

Q/A: classical relationships (e.g. part_of within an ontology) are a subset of cross-products.

Annotation Process

General overview of the annotation guidelines used by GO, and contributing resources. Rama Balakrishnan
    • Annotation guidelines

Goal: say as much as possible about a gene product. Be useful to bench and computational biologists.

  • GO annotation: Gene product association with GO terms and other info.
    • Core
      • gene product identifiers
      • GO term
      • Reference
      • Evidence code
    • Additional info
      • qualifiers
      • with/from
      • Annotation detail (column 16)
      • Isoform
  • Sources
    • Manual
    • Automated
    • PAINT (new)
      • inter-ontology inferences (new)
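The core and additional fields above correspond to columns of the GO Annotation File (GAF). As a minimal sketch, assuming the GAF 2.0 layout (17 tab-separated columns, with the annotation extension in column 16 and the gene product form, e.g. an isoform, in column 17), a data line can be split into named fields as follows. The class and field names are illustrative, not an official API, and columns not discussed above are skipped.

```python
from dataclasses import dataclass


@dataclass
class GafAnnotation:
    """One GAF 2.0 data line; only the columns discussed above are kept."""
    db: str                    # column 1, e.g. UniProtKB
    db_object_id: str          # column 2, gene product identifier
    db_object_symbol: str      # column 3
    qualifier: str             # column 4, e.g. NOT, contributes_to
    go_id: str                 # column 5, GO term
    reference: str             # column 6
    evidence_code: str         # column 7
    with_from: str             # column 8
    aspect: str                # column 9: P, F or C
    taxon: str                 # column 13
    assigned_by: str           # column 15
    annotation_extension: str  # column 16
    gene_product_form_id: str  # column 17, e.g. an isoform


def parse_gaf_line(line: str) -> GafAnnotation:
    """Split one tab-separated GAF 2.0 data line into named fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError(f"expected at least 15 columns, got {len(cols)}")
    cols += [""] * (17 - len(cols))  # tolerate lines that stop after column 15
    return GafAnnotation(
        db=cols[0], db_object_id=cols[1], db_object_symbol=cols[2],
        qualifier=cols[3], go_id=cols[4], reference=cols[5],
        evidence_code=cols[6], with_from=cols[7], aspect=cols[8],
        taxon=cols[12], assigned_by=cols[14],
        annotation_extension=cols[15], gene_product_form_id=cols[16],
    )
```

Header lines in a GAF start with `!` and would need to be skipped before calling `parse_gaf_line`.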

Difference between previous GO camps and this one: this one is more internal and focused on strengthening guidelines.

  • Challenges ...
  • Avoiding redundancy.
    • Authoritative sources
      • no MOD - UniProt-GOA.


General overview of UniProtKB/Swiss-Prot manual annotation. Serenella
  • protein selected for manual annotation based on priorities
    • Recent papers chosen for high impact
    • Curation of specific processes (e.g. ubiquitin-like conjugation)
    • User requests

Flow

  • sequence curation
    • One record for all different products for the same gene
  • Sequence analysis: automated, with manual checking (domains, PTMs, etc.)
  • Literature curation. Species, protein names, gene names, journals, tissues, plasmids
    • Stored as free-text comment lines with controlled tags(?)
    • Sequence annotation of features (relation to SO?)
    • GO annotation: 50 curators; automated: spkw2go, mappings2GO, etc.
  • Family-based curation
  • Attribution
  • QA and integration
    • e.g. throw an error when a nucleus keyword is assigned to a bacterial protein

Q: Isoforms?

A: linked to parent ID - ACCESSION_#

Q: Connection between references and items.

A: Findable in the XML. This is being retrofitted to older entries.

Q: What is the unit of annotation - Genes, isoforms?

A: Isoforms yes. Not yet things like cleavage products, but should be in the future.

Break

Binding documentation

Binding has been discussed at three consortium meetings.

Current guidelines

Ursula Hinz presents guidelines on binding annotation (see presentation)

  • Binding biological entity (not today)

Macromolecules (proteins)

Binding of macromolecules

  • If possible, use one of the numerous child terms of GO:0005515 protein binding
    • Protein binding should always be annotated with IPI evidence code
    • Curators must use the “with” column for the interaction partner
    • Do not forget reciprocal annotation
  • Evidence
    • IPI for specific proteins
    • Use IDA evidence code if the partner cannot be identified, i.e. IDA for classes of protein
  • Protein binding annotations made with IPI should not be propagated with ISS, but annotations to child terms can be
  • Do not use the NOT qualifier with GO:0005515 protein binding, because it would mean no interaction with any other protein under any circumstances
    • NOT with child terms is OK.
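The protein binding rules above lend themselves to an automated check. The following is a hypothetical sketch, not an existing GO QC script: it encodes only the rules stated for GO:0005515 itself (IPI evidence, a partner in the “with” column, no NOT qualifier, no ISS propagation), and the class and function names are illustrative.

```python
from dataclasses import dataclass

PROTEIN_BINDING = "GO:0005515"


@dataclass
class Annotation:
    go_id: str
    evidence_code: str
    with_from: str = ""   # GAF column 8
    qualifier: str = ""   # GAF column 4, pipe-separated


def protein_binding_violations(a: Annotation) -> list:
    """List the protein binding guidelines that one annotation breaks.

    Only GO:0005515 itself is checked; child terms are skipped, since the
    guidelines allow NOT and ISS propagation for them.
    """
    if a.go_id != PROTEIN_BINDING:
        return []
    problems = []
    if "NOT" in a.qualifier.split("|"):
        problems.append("NOT is not allowed with GO:0005515")
    if a.evidence_code == "ISS":
        problems.append("GO:0005515 should not be propagated with ISS")
    elif a.evidence_code != "IPI":
        problems.append("GO:0005515 should use IPI")
    if a.evidence_code == "IPI" and not a.with_from.strip():
        problems.append("IPI needs the interaction partner in 'with'")
    return problems
```

An empty list means the annotation passes; reciprocal annotation of the partner would need a separate, file-level check.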


Small molecules

Binding small molecules

  • To avoid redundant annotation, GO terms for small molecule binding should not be annotated when they are already mentioned in the MF GO term

However, sometimes the molecule is not clear from, or not included in, the description of the MF GO term; in that case it can be annotated (see example in the presentation)

    • avoid redundant annotation of substrates, including transporter substrates
    • e.g. ATP binding for ATPases (exceptions where hydrolysis not shown)
    • Example DNA demethylase/dioxygenase
      • Are annotations to alkylated DNA binding, O2 binding, etc. redundant?

Discussion

Q: protein binding - evidence that it does not bind a specific protein. Need a new GO term?

A: No. Use column 16 or create a new GO term; this is still in discussion. GO terms are appropriate if the proteins can be put into groups. We don't want terms for specific proteins.

Q: What is wrong with having 25K GO terms?

A: Does it matter? May be able to do all PRO classes. Instantiate as needed.

Comment on NOT terms: IntAct only annotates negative interactions for isoforms where a different isoform has a positive interaction. Negatives are not exported to GO.

Judy's summary: the discussion is about whether we are going to instantiate lots of protein binding terms. PRO families could be used for terms. Column 16 could be used for NOT and specific isoforms.

Emily: some things are not well captured by GO.

Is there possible redundancy if the MF is annotated without experimental evidence and the target binding is indicated in column 16 (e.g. the target protein is a transcription factor and the MF term is transcription factor binding, without evidence)? Is this a source of inconsistency between organism-specific annotations?

The level of experimentation differs among organisms (e.g. yeast vs. human), which implies different ways of annotating. This is not seen as a negative point.

Annotation extension discussion

Ruth

  • Annotation extension = column 16
  • Should only be used for direct targets.
  • Examples
    • Co-IP of Lnx-I and Boz. Use two txn factor binding annotations with IPI, with the partner in the “with” column.
      Q: Do we need exp evidence that (e.g.) Boz is a txn factor?
      A: curator judgement at present. Rama: SGD would read the paper and check other annotations of Boz, not rely only on the assertion in the paper. The same paper does not have to show that Boz is a txn factor. Ruth: in humans, would use sequence analysis, e.g. domains. Actually, SGD doesn't annotate protein binding.

Paul: Annotations for the target must exist somewhere. Does annotating binding to proteins of function X create redundancy when the target already has function X?

Jane: Won't always be function terms. e.g. LIM binding domain binding.

Ruth: GOC still needs more discussion.

Judy: there is no inconsistency between what SGD does and what Ruth does. Annotations are consistent, but SGD chooses to make different annotations. MODs bring specific experimental strengths. This is a difference, not an inconsistency.

Mike L.: BioGRID curation does a lot of this. How much can be transferred? Ruth: more on this later.

  • Column 16 example: Lnx-1 ubiquitinates Boz but not Gsc.
    • Annotation. Lnx-1 has ubiquitin-protein ligase activity IDA Col 16:Boz
    • Annotate protein ubiquitination IDA w/o target.

Q: problem of propagation across species. Col 16 identifier is species-specific.

A: Transferring from human to mouse. Use col 16 or not?

One problem raised with column 16 is annotation propagation by ISS, because the ID used in column 16 is species-specific. Alternatives:

  • Column 16 should be excluded from propagation by ISS, which is consistent with the current ISS procedure for with/from
  • Column 16 should use protein classes from sources like PRO to allow propagation
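To make the two alternatives concrete, here is a hypothetical sketch of an ISS transfer that supports both: dropping column 16 entirely, or keeping only extensions whose identifiers denote classes. Which identifiers count as classes was not decided at the meeting; treating `PR:` and `CHEBI:` prefixes as class-level is an assumption of this example, as are all names in it.

```python
from dataclasses import dataclass, replace

# Assumption: ID prefixes treated as species-neutral classes (PRO classes,
# CHEBI chemicals). Which identifiers qualify was left open at the meeting.
CLASS_PREFIXES = ("PR:", "CHEBI:")


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_id: str
    evidence_code: str
    with_from: str = ""
    extension: str = ""  # column 16, e.g. "has_input(CHEBI:16827)"


def filler_is_class(unit: str) -> bool:
    """True if the identifier inside relation(ID) has a class-level prefix."""
    filler = unit[unit.index("(") + 1 : unit.rindex(")")].strip()
    return filler.startswith(CLASS_PREFIXES)


def transfer_by_iss(source: Annotation, target: str, keep_classes: bool = True) -> Annotation:
    """Project an annotation onto a homolog with ISS.

    keep_classes=False drops column 16 entirely (first alternative); True keeps
    only extension groups whose fillers are all classes (second alternative).
    A group containing any species-specific filler is dropped as a whole.
    """
    kept = []
    if keep_classes:
        for group in filter(None, source.extension.split("|")):
            if all(filler_is_class(unit) for unit in group.split(",")):
                kept.append(group)
    return replace(
        source,
        gene_product=target,
        evidence_code="ISS",
        with_from=source.gene_product,  # ISS cites the source gene product in 'with'
        extension="|".join(kept),
    )
```

Dropping a whole group, rather than single units, is a conservative choice: removing one unit from a conjunction would silently broaden what the extension says.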

Q: is this redundant annotation of enzyme substrates?

A: No, we are doing substrate binding if the GO term does not provide the information.

Judy: knowledge statements vs description of the experiment.

Jim: column 16 post-composition is equivalent to creating a precomposed term, so ISS should be allowed (as appropriate, depending on whether the column 16 ID is a class or a specific gene product).

Paul: Think in terms of how we will do this with PAINT. We are annotating to ancestor nodes.

Comment: does this discussion generalize? A more general solution is to associate records with an external reference; this is a relational-structure problem. For binding, let the protein interaction databases handle these.

Several people suggest that we should not have terms like "txn factor binding".

Ruth: Quick summary

  • Use the “with” column with IPI if the GO term definition does not provide the information
  • Use column 16 for target
  • Disagreement about propagation of column 16 by ISS
  • Ideally, use the information in “with” or column 16 to make inferences about the function of the protein. Other functions could come from other annotations of the target.

Kimberly: this has major implications for display. Keep the more specific terms (at least for now).

Ruth: enumeration of the kinds of targets could make things less clear.

When not to use Col 16

  • For indirect targets
  • FGF2 -> receptor -> phosphorylation of Erk2 goes up. Erk2 is NOT a direct target of FGF2. Activation goes via Ras.

Ruth gave an example of when annotators should not use column 16 (see presentation). She mentioned that the relationship ontology is being renamed. Using the relationship ontology's has_input (substrate) and has_output (product) with CHEBI IDs in column 16 is a complicated way of annotating. To simplify annotation, it was proposed not to use the relationship ontology here and instead to put a RHEA ID (reaction database) in column 16, which gives the substrate and product information.

The annotation rules specify that catalytic activity terms should not be annotated with the evidence code IPI. There are 144 such annotations in the GO database, 88 of them from SGD. The evidence code IMP is stronger and should be preferred for these annotations. However, particular cases can occur and have to be considered individually.
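Finding annotations like the 144 mentioned above amounts to selecting IPI annotations whose term is catalytic activity or one of its descendants. The sketch below illustrates that with a tiny, simplified is_a fragment keyed by term labels; the graph, the tuple layout and the function names are assumptions, and a real check would walk the full ontology.

```python
# Simplified, hypothetical is_a fragment (child -> parents); a real check
# would load the full ontology instead.
IS_A = {
    "ATPase activity": {"hydrolase activity"},
    "hydrolase activity": {"catalytic activity"},
    "catalytic activity": {"molecular_function"},
    "protein binding": {"binding"},
    "binding": {"molecular_function"},
}


def ancestors(term):
    """Return every term reachable from `term` by following is_a edges upward."""
    seen, stack = set(), [term]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def catalytic_ipi(annotations):
    """Yield (gene, term, assigned_by) for IPI annotations under catalytic activity."""
    for gene, term, evidence, assigned_by in annotations:
        is_catalytic = term == "catalytic activity" or "catalytic activity" in ancestors(term)
        if evidence == "IPI" and is_catalytic:
            yield gene, term, assigned_by
```

Counting the flagged rows per `assigned_by` value (for example with `collections.Counter`) would give a per-group breakdown of the kind quoted above.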


Col 16 relationship ontology

  • has_participant
    • has_input
    • has_output

Relationships go along with the ID in Col 16.

Usage

  • Lnx-1 is_a ub protein ligase IDA has_input Boz.

Col 16 and CHEBI

Concerning the annotation of small molecule binding, the idea is that the molecule could be given in column 16 of an MF term that does not already describe it in its definition. There can be inconsistency when annotating calcium binding (small molecule binding), because calcium binding may or may not be required for the function. This calcium binding issue has to be discussed further.

Annotation in column 16 provides a certain level of knowledge (e.g. that the function of the target protein is known), which could also be displayed. What should be annotated in column 16, how far should we go (e.g. annotation of small molecule binding with CHEBI IDs), and where should we stop? There are concerns about how far to push annotation in GO relative to what GO was designed for: describing what the genes are doing.

Example: steroid hydroxylase.

  • CYP11B2 is_a steroid hydroxylase activity IDA has_input CHEBI:16827 Corticosterone
  • CYP11B2 is_a steroid hydroxylase activity IDA has_output CHEBI:16827 Aldosterone

Where to draw the line with respect to specificity continues to be a topic of discussion.

Kimberly: Connections between CHEBI IDs and process terms: how will these be handled by GO? Will CHEBI IDs in the function ontology propagate to process terms?

IPI and catalytic activity. Deprecate these?

  • Rama: in SGD these came from combination of IPI and IMP evidence (Editorial comment: this is because SGD doesn't do GO:0005515).

Binding is not sufficient to infer activity by itself. GO does not capture multiple experiments in a single annotation. This is a general problem.

Judy: rules are made to be broken. (!)

Interaction with the IMEx consortium.


Survey responses

See slides.

Results of the survey

Part2

  • Annotators were consistent in evidence code usage, but differed in MF term annotation (parent vs. child term)

Part1

  • Using column 16 seems OK for MF terms, but not for BP terms

Possible action items

More discussion by the working group:

  • ISS propagation of binding across species requires additional discussion. Should the column 16 identifier refer to a class? Should column 16 be transferred in ISS transfer?
  • CHEBI IDs and process terms: how will these be handled by GO? Will CHEBI IDs in the function ontology propagate to process terms?

Day 1 afternoon session

Annotation and Annotation Propagation

HAMAP presentation (Alan Bridge)

Questions

Rama: How do you know which annotations are propagated and which derived from literature?

Alan: By the evidence tags, e.g. references, by similarity etc.

Paul: You said that you don’t propagate isoforms?

Alan: Isoform information is sourced from TrEMBL, we don’t project any isoform information

Judy: How does UniProt envisage to integrate their system with all the other available orthology prediction sources, to ensure that everyone works with a common set of proteins/families for GO annotation propagation?

Suzi: There is an initiative to create a common set of sequences in a common set of species to start building orthology groups. A set of species has been prepared by Dan Barrell at the EBI.

Judy: this effort needs to understand its relationship with other propagation methods

Pascale: Rolf participated in the QFO meetings; the current session is only meant to highlight the differences between methods.

Alan: In a first step, UniProt will also compare the output of their annotations with those produced by the Reference genome project using PAINT on selected protein families.

Judy: HAMAP and Quest for Orthologs both have related groupings, i.e. sets of proteins with similarities. What is your global view? The utility of this effort lies in integration into a global network.

Alan: We have integrated into InterPro and see several trends emerging; from these, we are separating into groups.

Paul: If groups want to use HAMAP, will they have to fill out an identity card for their species for it to work properly?

Alan: You can either specify the species most closely related to yours or ask a curator to fill one in for you, as it is a closed system and quite an involved process.

Suzi: There will be a follow up meeting for QFO next year, other groups can join in and contribute

Compara presentation (Javier Herrero)

Questions

Judy: What is the source statement for GO annotations derived from Compara and how can all these annotations be retrieved?

Emily: Compara annotations are in the GOA database, there is a GOref 19 specific for Compara-derived annotations and their annotations are present in the UniProtKB-GOA GAF.

Reference Genome presentation (Pascale)

Questions

How are the ‘high quality’ protein sets defined that are used by the project?

The sequences are from different sources for the different species and are put in a standard format using UniProtKB accession numbers.

Tree-based GO annotation presentation (Paul)

Questions

Cecilia: Which GO term to choose to annotate nodes of common ancestors? Is it better to use less specific GO terms to be able to move up to a higher node in the phylogenetic tree?

Paul: It’s better to annotate to the most specific term possible (explained in more detail in the PAINT demo presentation of Mike)

PAINT demonstration (Mike)

Questions

Judy: Concerned that correcting already existing GO annotations on proteins by going back to already curated papers during the process of annotating a tree with PAINT may be too time consuming and is not very efficient.

Cecilia: When single sequences below an annotated node are deselected for GO annotation propagation (because of curator judgement), how are these ‘negative’ GO annotations shown to the user? Is it more useful to not have an annotation there or to have a NOT annotation there?

Paul/Mike: There are two possibilities. If annotation propagation has been deselected because of rapid divergence of a branch, the annotation is not shown at all in the entries concerned. If annotation propagation has been deselected because critical residues are missing from the sequence, the GO annotation is propagated with a ‘NOT’ qualifier and is available to the user.
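The two cases Paul and Mike describe can be summarized as a small propagation rule. This is an illustrative sketch only, not PAINT's implementation; the enum, the function name and the dictionary output are hypothetical.

```python
from enum import Enum


class Deselection(Enum):
    """Why a curator excluded a leaf from inheriting a node annotation."""
    RAPID_DIVERGENCE = "rapid divergence of the branch"
    MISSING_RESIDUES = "missing critical residues"


def propagate(go_id, leaves, deselected):
    """Return {leaf: (qualifier, go_id)} for the leaves under an annotated node.

    `deselected` maps excluded leaves to a Deselection reason.
    """
    result = {}
    for leaf in leaves:
        reason = deselected.get(leaf)
        if reason is None:
            result[leaf] = ("", go_id)       # ordinary inherited annotation
        elif reason is Deselection.MISSING_RESIDUES:
            result[leaf] = ("NOT", go_id)    # visible to users as a NOT annotation
        # RAPID_DIVERGENCE: the leaf gets no annotation at all
    return result
```

Keeping the reason explicit is what lets the two cases behave differently: silence for divergence, a NOT annotation for missing residues.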

'Response to' terms

Pascale’s presentation: http://wiki.geneontology.org/images/9/9b/WG-Response-to-Becky-Pascale.pdf. The aim of the working group is to improve the representation of biological responses. This has a lot of overlap with downstream events and signalling.

1. Definition is very wide. The current GO definition of “response to stimulus” is shown on slide 3. Because the definition is so broad, the term is being over-annotated. Slide 7 shows the numbers of annotations to some high-level “response to” terms. These high-level terms have many child terms, which should be used where possible instead of the high-level terms. This doesn’t currently affect many annotations, but annotation to high-level terms should be avoided in the future.

Judy: We seem to be spending a lot of time discussing a small number of annotations. And the annotations to high-level terms are not wrong. Curators wouldn’t use a high-level term if they could use a more specific one.

Rama: Sometimes curators use high-level terms to group a number of child terms.

Kimberly: It’s not always clear when to create new terms.

Paul: ‘response to stress’ means a response to at least one stress. If the response is to several stresses, we should annotate to each stress.

Judy: Agrees with this. GO is now 12 years old. If there are few annotations, they are legacy and are fine.

Pascale: Would like the guidelines clarified for future use.

Judy: High-level terms haven’t been used much.

Pascale: We need to be careful about grouping stresses under a parent term, as the parent term then means two different things. This is a general issue with GO. For example, DNA binding can be annotated to both positive and negative strands. Annotating to the parent term is not the same as annotating to multiple child terms.

Judy: Agrees, and we need to clarify this if it is a confusing issue.

Li: If something is a general core factor and is annotated to lots of child terms, is there a danger of over-annotating?

Pascale: If this is what it does, it’s not wrong to annotate to all the child terms.

Summary of the above discussion from Pascale: Avoid annotating to high-level terms if possible. Annotating to child terms is preferable and is not equivalent to annotating to the parent term.

Proposal 1: High-level ‘response to’ terms should not be used.

Day 2 Morning

Binding continued

Summary of ontology development

Chris Mungall presents rules for binding propagation (see presentation)

In the case of transcription factor activity, which has DNA binding as a parent, will the same format apply? This has to be considered.

  • It has been decided to add a has_part relationship as a link in the ontology.
  • The propagation of has part relationship is not suitable in all cases (see example given in the presentation) and this makes the rules more difficult.

Example G capable_of ATPase activity -> G capable_of ATP binding

  • Materialize relationships at central location

Workflow:

  • Curator annotates to ATPase activity
  • GAF pipeline materializes ATP binding using the same evidence code (EC)
  • Reimport allows a query on ATP binding to recover ATPases, etc.
    • Q: does redundancy of annotation raise issues? Probably not?
  • Alternatives
    • Navigation via CHEBI too complex.
    • is_a between ATPase activity and ATP binding
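The materialization step in the workflow above can be sketched as follows, assuming has_part links are available as a simple mapping from a function term to the terms it has as parts. Term labels stand in for GO IDs, and the names are hypothetical. Unlike a production pipeline, this sketch applies every listed link and follows only one has_part step, whereas the discussion noted that has_part propagation is not suitable in all cases.

```python
from dataclasses import dataclass, replace

# Hypothetical has_part links between MF terms; in the proposed workflow these
# would come from the ontology (largely from MF x CHEBI logical definitions).
HAS_PART = {"ATPase activity": ["ATP binding"]}


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    term: str
    evidence_code: str
    reference: str


def materialize(annotations):
    """Return the input plus annotations implied by one step of has_part.

    Implied lines reuse the original evidence code and reference, so each can
    be traced back to the curated annotation it came from.
    """
    originals = list(annotations)
    out = list(originals)
    seen = {(a.gene_product, a.term) for a in originals}
    for a in originals:
        for implied in HAS_PART.get(a.term, []):
            if (a.gene_product, implied) not in seen:
                seen.add((a.gene_product, implied))
                out.append(replace(a, term=implied))
    return out
```

Because already-present lines are skipped, running the function again on its own output adds nothing, which keeps a reimport step from duplicating lines.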

Automated population of the ontology using intersection_of terms ... has_input + has_output. The has_part links will mainly be populated automatically in the ontology using MF x CHEBI logical definitions, but this can generate errors. It is also important to keep the original evidence code and original PubMed ID, which makes it possible to trace the ATP binding annotation back to its source.

Concerning the problems of propagating has_part, why not use a link like “necessitate”? This could be an alternative.

Ontology will contain information to relieve annotators of making redundant annotations.

Q: How will the chain of evidence work for the materialized ATP binding added to the GAF. A: original EC, reference, and ...?

Q: Look at other ontologies, e.g. txn factors. A: Don't want txn factor as a child of binding.

Q: is materializing a permanent solution? A: See later discussion.

Extended GO

  • Problem: software development assumes the prior version of the GO structure
  • Links are only in GO_ext files.
  • Future: more links. Software will have to catch up.
  • Materialization service for function to process links

Column 16

  • Want to limit precomposition
  • Annotate as if relationships are there

Syntax:

relation (class)
  • When to request a new term vs. use col 16: would the term make sense in an enrichment analysis?
  • Reasoner can find equivalent terms if they exist, and materializer will add lines to the GAF.
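Column 16 entries of the form relation(class) can be parsed mechanically, and a reasoner-like lookup can then check whether an annotation plus its extension matches an existing precomposed term. The sketch assumes the GAF 2.0 convention (pipes separate alternative groups, commas join units within a group), represents logical definitions as a hypothetical dictionary, and assumes that “involved in” terms are defined with part_of. Labels replace IDs for readability. A real reasoner does far more than this exact-match lookup.

```python
import re

# One extension unit: relation(filler), e.g. has_input(CHEBI:16827)
UNIT = re.compile(r"^\s*([A-Za-z_]+)\(([^()]+)\)\s*$")


def parse_extension(text):
    """Parse column 16 into groups of (relation, filler) pairs.

    Assumes the GAF 2.0 layout: '|' separates alternative groups and ','
    joins the units that all apply together.
    """
    groups = []
    for group in filter(None, text.split("|")):
        units = []
        for unit in group.split(","):
            match = UNIT.match(unit)
            if match is None:
                raise ValueError(f"malformed extension unit: {unit!r}")
            units.append((match.group(1), match.group(2).strip()))
        groups.append(units)
    return groups


# Hypothetical logical definitions of precomposed terms:
# (genus term, relation, filler) -> equivalent precomposed term.
PRECOMPOSED = {
    ("regulation of transcription", "part_of", "cardiac cell differentiation"):
        "regulation of transcription involved in cardiac cell differentiation",
}


def equivalent_terms(go_term, extension):
    """Find precomposed terms equivalent to go_term plus a one-unit group."""
    found = []
    for units in parse_extension(extension):
        if len(units) == 1:
            relation, filler = units[0]
            term = PRECOMPOSED.get((go_term, relation, filler))
            if term is not None:
                found.append(term)
    return found
```

Any term returned by `equivalent_terms` is the kind of line a materializer could add to the GAF alongside the post-composed annotation.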

Column 17

Isoforms. No time to discuss

Discussion

  • Extensions provide greater expressivity
  • Possibility of expressing things different ways, but reasoner can link synonymous annotations made in different ways by annotators.

Q: relationship matrix? A: this exists in part

GO browsers

Rachael Huntley

  • AmiGO
  • QuickGO

AmiGO

Live demo

  • Gene search
  • Term search
    • View direct or include annotations to child terms
  • More tools
    • GOOSE: SQL environment
      • precomposed SQL query list. Can request new ones via help
    • GO slimmer
    • Visualization - input GOIDs and see relationships
    • OpenSearch - Browser widgets and OSX dashboard
    • Homolog Set Summary - for reference genomes
  • AmiGO labs - more stuff
    • Cross-product term request will issue GOIDs for specific types of cross-products (regulation, part_of, downstream process terms)
    • Coannotation - see genes annotated to two GO terms

QuickGO

EBI

  • Gene search
    • download options, web services
  • Term search also shows co-occurrence with other terms. Default EC selection was discussed.
  • Annotation views have filtering options.
    • Unlike current AmiGO, taxon filtering uses hierarchical relationships.
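The difference between exact and hierarchical taxon filtering can be illustrated with a toy taxonomy. This sketch is not QuickGO's implementation; the parent map is a hypothetical, heavily simplified fragment (real lineages have many more ranks), and the function names are illustrative.

```python
# Hypothetical, heavily simplified taxonomy fragment: child taxon -> parent taxon.
PARENT = {
    "Homo sapiens": "Mammalia",
    "Mus musculus": "Mammalia",
    "Mammalia": "Vertebrata",
    "Danio rerio": "Vertebrata",
    "Saccharomyces cerevisiae": "Fungi",
}


def lineage(taxon):
    """Yield the taxon itself and then each ancestor up to the root."""
    while taxon is not None:
        yield taxon
        taxon = PARENT.get(taxon)


def filter_by_taxon(annotations, query_taxon, hierarchical=True):
    """Keep (gene, taxon) pairs matching query_taxon, optionally including descendants."""
    if hierarchical:
        return [a for a in annotations if query_taxon in lineage(a[1])]
    return [a for a in annotations if a[1] == query_taxon]
```

With `hierarchical=False`, a query for a higher-level taxon such as Mammalia matches nothing unless annotations are recorded at exactly that level, which is the behavior the hierarchical filter avoids.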

Annotation of complexes

Minutes by Kristian Axelsen and edited by Mike Livstone

Quick summary of session: There has been a need to address the following situation: Complexes are multiprotein machines that carry out a specific process or reaction. While it is clear that there should be annotations to the process for the catalytic subunit, there is a desire to annotate, using experimental evidence codes, other subunits in the complex based on their membership in the complex. One proposal has been to create a new experimental evidence code, "ICM" (Inferred from Complex Membership). The general consensus in the session was that this type of inference should not be made and, as a consequence, ICM should not be created.


More detailed notes:

The background for the sessions at this GO camp is that, in group annotation sessions on sets of 5-10 genes, the same three types of problems always appeared.

So the working groups were created to identify the issues, improve annotation, make annotation guidelines, and provide QC checks.

Bernd presented the current situation with a very broad definition of a complex, but stressed that "complex" terms should be defined so that they could be used in other organisms and not only in the organism where they were first seen.

Current Guidelines by Ontology:

  • CC: gene products can be annotated to complexes; "colocalizes_with" qualifier also allowed. (slides 8, 9)
  • MF and BP: Gene products are not annotated to complexes
  • MF allows "contributes_to" in the context of a complex (slide 10)
  • MF: catalytic and regulatory subunits can get different annotations (slide 20)

The use of the contributes_to qualifier in MF annotations was discussed. It is to be used for essential subunits only.

Annotations to MF should NOT be done based on IPI alone.


A lot of the discussion in the working group was concerned with how to annotate the subunits which are not responsible for the catalytic activity.


Working group suggestion: create a new evidence code, ICM (Inferred from Complex Membership)

(Note: The consensus at the end of discussion was not to create this code.)

Furthermore, annotators were urged to be more willing to annotate the MF as "unknown" when that is the case. It is acceptable not to know.

General consensus: We need to be more conservative when assigning MFs

This would also be more in line with the biologists' view.


Working group suggestion (Proposal 2): remove the following statement from the evidence code documentation (IDA): "a fractionation experiment might provide 'direct assay' evidence that a gene product is in the nucleus, but 'protein interaction' (IPI) evidence for its function or process."

General consensus: This statement should be removed (this was also a conclusion from the Binding session).

An important example that was discussed: yeast RNA polymerase II vs. III. PolII is much better studied, and subunits that are indispensable for PolII function are annotated to transcription with "contributes_to". In contrast, the same level of detail is not available for PolIII, so all of its subunits get contributes_to transcription. This reflects the level of understanding of the two complexes, but it does not sit well with many curators because it means that where we know less, we make more annotations.

Summary (by Paul Thomas): We would like to be able to annotate entire complexes to MF and BP. For single gene products we should only annotate a MF for the subunits essential for the complex activity.

The use of contributes_to was raised. Pascale said incautiously that personally, she would have no problem getting rid of contributes_to.

Again, it should only be used for MF annotations of the subunits essential for activity.

Minute taker's comment (KA): This is perhaps an issue for the next camp/the continued work of the working group
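The rule that contributes_to is restricted to MF annotations lends itself to an automated QC check. The sketch below is hypothetical (not an existing GO or MOD pipeline) and assumes the tab-separated GAF 2.0 layout, where column 4 holds pipe-separated qualifiers and column 9 the aspect (P, F or C). It only catches contributes_to used outside MF; whether a subunit is essential for the complex activity still needs curator review. The gene symbols and GO IDs in the example are placeholders.

```python
def gaf_row(symbol, qualifier, go_id, evidence, aspect):
    """Build a hypothetical 17-column GAF 2.0 line; only a few columns matter here."""
    cols = [
        "DB", "ID:" + symbol, symbol, qualifier, go_id, "REF:hypothetical",
        evidence, "", aspect, "", "", "protein", "taxon:0", "20100101", "DB",
        "", "",
    ]
    return "\t".join(cols)


def flag_contributes_to_outside_mf(gaf_lines):
    """Return (symbol, GO ID, aspect) for rows using contributes_to outside MF."""
    flagged = []
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        cols = line.rstrip("\n").split("\t")
        qualifiers = cols[3].split("|") if cols[3] else []  # column 4
        aspect = cols[8]                                      # column 9
        if "contributes_to" in qualifiers and aspect != "F":
            flagged.append((cols[2], cols[4], aspect))
    return flagged


if __name__ == "__main__":
    lines = [
        "!gaf-version: 2.0",
        gaf_row("SUB1", "contributes_to", "GO:hypothetical_MF", "IDA", "F"),
        gaf_row("SUB2", "contributes_to", "GO:hypothetical_BP", "IMP", "P"),
        gaf_row("SUB3", "", "GO:hypothetical_CC", "IDA", "C"),
    ]
    print(flag_contributes_to_outside_mf(lines))  # [('SUB2', 'GO:hypothetical_BP', 'P')]
```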

Another issue: MF terms are sometimes added to a complex based on early experiments. When more detailed knowledge becomes available and new terms are added, it should be possible (and easier) to remove the old annotations, even when they were made by different groups.

Misc.:

  • Michael pointed out that ICM really is an ISS inference
  • Paul says we need to be able to annotate complexes directly, the same way we annotate gene products.


Day 3 Morning

How is Downstream Effect defined (Rachael and Varsha)

Rachael and Varsha: Annotating to downstream processes

Minutes: Yasmin & Ursula

  • Definition of down-stream process, as proposed by the working group: everyone agreed with it.

Examples (1-4): see presentation

  • Discussion of Survey (see presentation)

Everybody does at least occasionally annotate down-stream processes.

Most participants felt that annotating down-stream effect was ok, when no other information was available. Many participants felt it would be desirable to revise such annotations at a later time, but that this was not always feasible for various good reasons (see presentation)

Proposed guidelines:

  • Guideline 1: Request new, specific terms describing a process involved in another process. Example: for growth factor BMP2 that regulates cardiac cell differentiation, it is more informative to use a composite term, such as “regulation of transcription involved in cardiac cell differentiation” as opposed to using two unlinked terms, e.g. “regulation of transcription” and “regulation of cardiac cell differentiation”. (The terms do not exactly match the case of BMP2).
  • Guideline 2: For small-scale experiments, one should annotate to the experimental evidence in the paper. However, use curator judgment and also take into account the quality of the evidence, etc.

If a gene product has a central role affecting multiple down-stream processes one should only annotate the core process. When a gene product is specific for a particular pathway and/or has just a few targets, one should annotate the down-stream processes.

Discussion of examples:

a) yeast RNA polII subunit should only be annotated to the core process.

b) for proteins associated with the yeast spliceosome, annotation describing indirect effects has been removed.

c) S. pombe sre1 (direct transcriptional regulator of genes that have a role in heme and lipid biosynthesis): new terms should be requested, e.g. “Regulation of transcription involved in heme biosynthesis”.

Li (to Chris): Are we ready to represent this kind of transcriptional regulation process in GO? Everything involves transcriptional regulation; does GO want to represent this?

Chris: Yes, we should represent this. For the time being we should use precomposed terms. Use AmiGO Labs to request terms. Later it may be possible to use column 16 instead.

ACTION ITEM (Li): the ontology developers in the group should discuss this.

  • Guideline 3: If a gene product has limited experimental literature, such as a newly characterized protein, it is acceptable to annotate to a more general 'downstream' process.

Lively discussion of example of RNA polII subunit: should one keep the experimental annotation (indirect effects)?

Mike A: rpb2 is required for every transcription process; it is not useful to list indirect effects. The gene product should be annotated to the core process using ISS, and the phenotype-based experimental annotation should be removed. Describing the knockout phenotype is not informative.

  • COMMENT: if it has a specific effect, one should keep both specific down-stream effect and description of core process.

Kimberly: the rpb2 annotation originated from phenotype-to-GO mappings (ISS). We will review the pipeline issue.

Kimberly: How can one connect the core process with the biological knowledge? This is what is being tested in C. elegans. We need feedback.

Sylvain: Should knockout data be used for GO annotation? He proposed using it only if there is further experimental characterization of the gene product. There are multiple phenotypes for any mutation, especially if it affects an important gene product. It is not the goal of GO to describe phenotypes.

Li: in this case this is a core process, but when the underlying function of a gene product is unknown, then making these annotations will give more information for the user

Many participants agreed with the above statements. But: it really depends on the MODs whether they want to keep the annotation or not.

Mike A: proposes to delete the evidence code IMP. IMP should be used very sparingly

Pascale: annotation based only on mutants may be misleading. In such cases, we’d need further information.

Kimberly: but users may want that information. A different evidence code for such cases would be welcome.

Rachael - if you didn't do IMP then what would you use? IDA?

Sylvain: effects seen in HTP data were all annotated to different developmental processes.

Pascale: at the organismal level it's hard to annotate directly.

Michael: IMP is an absolutely valid code for some processes; we need to set a boundary for when to use it to capture phenotypes.

Mutant data can be essential and have yielded precious information.

Phenotypes should be captured using existing phenotype descriptions, and maybe by a dedicated database.

Need to take into account whether there is a paper discussing the mutant phenotype, or whether this is stand-alone (HTP) data. We want to capture what the authors are saying, and what was accepted for publication by the reviewers.

Judy: we need to clarify what is the appropriate use of these evidence codes

  • Guideline 4: annotation of ligand receptor signaling pathways (intercellular vs. intracellular)

For intercellular signaling, the ligand is part of the pathway. For intracellular signaling, the ligand regulates the pathway.

Pascale: this is confusing.

Becky: the goal is to avoid over-annotation. Is the ligand part of the pathway?

Becky, Pascale: Yes. Varsha: this is another discussion.

Becky: the ligand is part of the pathway. The pathway ends when the response is initiated.

CONSENSUS: need to clarify where pathways start and end.

ACTION: take intracellular example back to signaling group for clarification.

Going through slides showing (simplified) insulin receptor pathway and NF-kappa-B pathway: everybody agreed.

Becky: a lot of the time you don't know that the stimulus/ligand/receptor is involved in multiple pathways; she would prefer to request new terms, e.g. “X signaling involved in Y pathway”.

Pascale: likes this representation; it is useful for thinking about how to represent the biology correctly.

Varsha: signaling diagram will be presented to the signaling group

Summary slide (documentation) presented for dealing with cases such as the RNA polII subunit (guideline 3): where reasonable, remove annotations of downstream effects once the core activity is known. But there may be good reasons to keep such annotations; it really depends on the case and on the contributing MOD (see presentation).

Suggested Quality Control checks: (see presentation).

Discussion of survey example:

  • Question 1: Functions of the deubiquitinating protease CYLD

Most participants (almost 90%) would annotate to the core process, regulation of microtubule organization. A sizeable minority also selected downstream effects.

This was Ursula’s example. Ursula was strongly inclined to stick to “regulation of microtubule cytoskeleton organization”. This protein regulates everything, and it does not make sense to annotate “everything”.

Rama - SGD would be cautious too

Ursula: survey result may reflect limited time available for doing the survey. Reading several papers on the subject makes a difference.

  • Q2: Bre1-Histone H2B monoubiquitination regulates histone H3 methylation

Most (85%) of the participants chose a “histone ubiquitination” term

65% chose also histone methylation. Most participants would like “Regulation of histone methylation”, or a new term such as “Histone monoubiquitination involved in regulation of histone methylation”.

Val: would annotate to the main activity of the enzyme, but also to regulation; there are many pombe papers with experimental evidence.

Problem: “Regulation of pathway X” is part of pathway X. People are not sure how to display the experimental data.

Rachael, Sylvain: it is important to avoid ending up with all histone-modifying proteins having exactly the same annotation. They have distinct activities, and this should be shown by the annotation.

Val: will ask people to look into this.

Sylvain: each histone modification affects other histone modifications (positive and negative regulation). There are about 80 histone-modifying enzymes, and each has down-stream effects on other histone modifications.

Ruth: these are process terms and users would find it useful to know which genes are involved in this process: how to convey the information?

Sylvain: Propose to change the definitions of existing terms, so that ubiquitination includes effect on subsequent methylation, etc.

Sylvain: proposed splitting “regulates” into cases where “regulates process X” is part of process X, and cases where “regulates process X” is NOT part of process X. Is this possible?

Paul: allow distinction between core process and regulation of core process

  • CONCLUSION: Ontology should be revised, annotation checked.
  • ACTION POINTS:
    • revise process terms for transcription
    • define start and end points of signaling processes