2010 GO camp Geneva minutes

From GO Wiki

Day 1 morning session

9:00 Introductions and objectives of the meeting

  1. Introductions & Logistics: Serenella Ferro Rojas
  • Poll for Thursday lunch reservations, depending on weather.
  • Dinner at Brasserie la Bourse on the Carouge
    • ~ 1.9 km from meeting site

Friday Reception at noon for Amos Bairoch celebration of the Otto Naegeli prize.

Introductions

Goals: Pascale Gaudet

GO – Ontology, annotation, tools and technical aspects

Chairs: Serenella Ferro Rojas and Pascale Gaudet

GO overview

An introduction to the GO ontology : terms, definitions, synonyms, relationships, cross-products. Jane Lomax

  • Inter-ontology links
    • Most tools don't make inferences across the ontologies, so annotators make redundant annotations.
    • Cross products
      • between GO ontologies
      • external ontologies (cell ontology; CHEBI)
  • Ontology development
    • large scale targeted projects
    • logical consistency
    • small scale requests (SourceForge tracker; future via AmiGO)

Q/A: classical relationships (e.g. part_of within an ontology) are a subset of cross-products.

Annotation Process

General overview of the annotation guidelines used by GO, and contributing resources. Rama Balakrishnan
    • Annotation guidelines

Goal: say as much as possible about a gene product. Be useful to bench and computational biologists.

  • GO annotation: Gene product association with GO terms and other info.
    • Core
      • gene product identifiers
      • GO term
      • Reference
      • Evidence code
    • Additional info
      • qualifiers
      • with/from
      • Annotation detail (16)
      • Isoform
  • Sources
    • Manual
    • Automated
    • PAINT (new)
      • inter-ontology inferences (new)
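
As an illustration only, the core and additional fields listed above could be held in a record like the one below. This is a minimal sketch, not the GAF specification: the class name, field names and the example identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    # Core fields from the list above
    gene_product_id: str
    go_term: str
    reference: str
    evidence_code: str
    # Additional info (all optional)
    qualifiers: List[str] = field(default_factory=list)
    with_from: List[str] = field(default_factory=list)
    annotation_detail: Optional[str] = None  # column 16
    isoform: Optional[str] = None

    def is_negated(self) -> bool:
        """True when the NOT qualifier is present."""
        return "NOT" in self.qualifiers


# Hypothetical identifiers, for illustration only
example = Annotation(
    gene_product_id="EXAMPLE:P00001",
    go_term="GO:0005515",
    reference="EXAMPLE_REF:0000001",
    evidence_code="IPI",
    with_from=["EXAMPLE:P00002"],
)
```

Only the four core fields are required here; everything under "Additional info" defaults to empty.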

Differences between previous GO camps and this one: this one is more internal and focused on strengthening guidelines.

  • Challenges ...
  • Avoiding redundancy.
    • Authoritative sources
      • no MOD - UniProt-GOA.


General overview UniProtKB/SwissProt manual annotation. Serenella
  • protein selected for manual annotation based on priorities
    • Recent papers chosen for high impact
    • Curation of specific processes (e.g. ubiquitin-like conjugation)
    • User requests

Flow

  • sequence curation
    • One record for all different products for the same gene
  • Sequence analysis. - automated. manual checking. domains, ptms, etc.
  • Literature curation. Species, protein names, gene names, journals, tissues, plasmids
    • Store as comment lines free text with controlled tags(?)
    • Sequence annotation of features (relation to SO?)
    • GO annotation 50 curators, Automated: spkw2go, mappings2GO, etc.
  • Family-based curation
  • Attribution
  • QA and integration
    • e.g. throw error when nucleus kw for bacterial protein
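
A QA rule of the kind mentioned above (nucleus keyword on a bacterial protein) might look like the sketch below. The function name, the rule table and the way taxonomy is passed in are assumptions made for illustration; this is not the UniProtKB pipeline.

```python
# Hypothetical rule table: keywords that should never appear for a kingdom.
FORBIDDEN_KEYWORDS = {
    "Bacteria": {"Nucleus"},
}


def check_keywords(kingdom, keywords):
    """Raise ValueError if an entry carries a keyword forbidden for its kingdom."""
    bad = FORBIDDEN_KEYWORDS.get(kingdom, set()) & set(keywords)
    if bad:
        raise ValueError(
            f"keyword(s) {sorted(bad)} not allowed for {kingdom} proteins"
        )
    return True
```

An entry that passes simply returns True; more rules could be added to the table without changing the function.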

Q: Isoforms?

A: linked to parent ID - ACCESSION_#

Q: Connection between references and items.

A: Findable in the XML. This is being retrofitted to older entries.

Q: What is the unit of annotation - Genes, isoforms?

A: Isoforms yes. Not yet things like cleavage products, but should be in the future.

Break

Binding documentation

Binding has been discussed at three consortium meetings.

Current guidelines

Ursula:

  • Binding biological entity (not today)

Macromolecules (proteins)

    • specific proteins vs. protein classes vs. protein domains
  • GO:0005515 must be used with IPI, and a reciprocal annotation should be made.
  • Use child terms
  • Evidence
    • IPI for specific proteins
    • IDA for classes of protein
  • Propagation
    • GO:0005515 should not be propagated via ISS.
    • propagation of child term annotations is OK
  • Do not use NOT with GO:0005515
  • NOT with child terms is OK.
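
The protein binding rules above lend themselves to an automated check. The following is a rough sketch under the assumption that an annotation is reduced to a GO ID, an evidence code and a list of qualifiers; the function name and message texts are made up for this example.

```python
PROTEIN_BINDING = "GO:0005515"


def check_protein_binding(go_term, evidence_code, qualifiers=()):
    """Return a list of guideline problems for one annotation (empty if none)."""
    problems = []
    if go_term != PROTEIN_BINDING:
        # Child terms are not restricted by these particular rules.
        return problems
    if evidence_code == "ISS":
        problems.append("GO:0005515 should not be propagated via ISS")
    elif evidence_code != "IPI":
        problems.append("GO:0005515 must be used with IPI")
    if "NOT" in qualifiers:
        problems.append("do not use NOT with GO:0005515")
    return problems
```

The reciprocal-annotation rule is not covered here, since it needs the full annotation set rather than a single line.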

Small molecules

  • avoid redundant annotation of substrates, including transporter substrates
    • e.g. ATP binding for ATPases (exceptions where hydrolysis not shown)
    • Example DNA demethylase/dioxygenase
      • Are annotations to alkylated DNA binding, O2 binding etc. redundant?

Discussion

Q: protein binding - evidence that it does not bind a specific protein. Need a new GO term?

A: No. Use column 16 or create new GO term. Still in discussion. GO terms if the proteins can be put into groups. Don't want specific protein terms.

Q: What is wrong with having 25K GO terms?

A: Does it matter? May be able to do all PRO classes. Instantiate as needed.

Comment: NOT terms. IntAct only annotates negative interactions for isoforms where a different isoform has a positive interaction. Negatives are not exported to GO.

Judy summary: the discussion is about whether we are going to instantiate lots of protein binding terms. PRO families could be used for terms. Column 16 could be used for NOT and specific isoforms.

Emily: some things are not well captured by GO.

Annotation extension discussion

Ruth

  • Annotation extension = column 16
  • Should only be used for direct targets.
  • Examples
    • Co-IP. Lnx-I and Boz. Use two txn factor binding annotations with IPI and with for partner.
      Q: Do we need exp evidence that (e.g.) Boz is a txn factor?
      A: curator judgement at present. Rama: SGD would read the paper and check other annotations of Boz, not rely just on the assertion in the paper. The same paper does not have to show Boz is a txn factor. Ruth: in humans, would use sequence analysis, e.g. domains. Actually SGD doesn't annotate protein binding.

Paul: Annotations for the target must exist somewhere. Does this create redundancy to annotate binding to proteins of function X where target has function X?

Jane: Won't always be function terms. e.g. LIM binding domain binding.

Ruth: GOC still needs more discussion.

Judy: no inconsistency in what SGD does and what Ruth does. Annotations are consistent but SGD chooses different annotations to make. MODs bring specific special experimental strengths. This is a difference, not an inconsistency.

Mike L.: BioGRID curation does a lot of this. How much can be transferred? Ruth: more on this later.

  • Column 16 example: Lnx-1 ubiquitinates Boz but not Gsc.
    • Annotation. Lnx-1 has ubiquitin-protein ligase activity IDA Col 16:Boz
    • Annotate protein ubiquitination IDA w/o target.

Q: problem of propagation across species. Col 16 identifier is species-specific.

A: Transferring from human to mouse. Use col 16 or not?

Propagation of Col 16 issues. Binding across species requires additional discussion. Should the column 16 identifier be to a class? Should column 16 be transferred in ISS transfer? More discussion!

Q: is this redundant annotation of enzyme substrates?

A: No, we are doing substrate binding if the GO term does not provide the information.

Judy: knowledge statements vs description of the experiment.

Jim: column 16 post composition is equivalent to creation of a precomposed term, so ISS should be allowed (as appropriate, depending on whether the 16 ID is a class vs a specific product).

Paul: Think in terms of how we will do this with PAINT. We are annotating to ancestor nodes.

Comment: is the discussion generalizing? More general solution is to associate records with an external reference. Relational structure problem. In terms of binding let the protein interaction databases handle these.

Several people suggest that we should not have terms like "txn factor binding".

Ruth: Quick summary

  • Use the with column with IPIs if the GO term definition does not provide the information
  • Use column 16 for target
  • In disagreement about propagation of column 16 by ISS
  • Ideally, use info from with or col 16 to make inferences about the function of the protein. Other functions could come from other annotations of the target.

Kimberly: this has major implications for display. Keep the more specific terms (at least for now).

Ruth: enumeration of the kinds of targets could make things less clear.

When not to use Col 16

  • For indirect targets
  • FGF2 -> receptor -> phosphorylation of Erk2 goes up. Erk2 is NOT a direct target of FGF2. Activation goes via Ras.

Col 16 relationship ontology

  • has_participant
    • has_input
    • has_output

Relationships go along with the ID in Col 16.

Usage

  • Lnx-1 is_a ub protein ligase IDA has_input Boz.

Col 16 and CHEBI

Example: steroid hydroxylase.

  • CYP11B2 is_a steroid hydroxylase activity IDA has_input CHEBI:16827 Corticosterone
  • CYP11B2 is_a steroid hydroxylase activity IDA has_output CHEBI:16827 Aldosterone

Where do we draw the lines with respect to specificity continues to be an issue of discussion.

Kimberly: Connections between CHEBI IDs and process terms: how will these be handled by GO? Will CHEBI IDs in the function ontology propagate to process terms?


IPI and catalytic activity. Deprecate these?

  • Rama: in SGD these came from combination of IPI and IMP evidence (Editorial comment: this is because SGD doesn't do GO:0005515).

Binding is not sufficient to infer activity by itself. GO does not capture multiple experiments in a single annotation. This is a general problem.

Judy: rules are made to be broken. (!)

Interaction with the IMEx consortium.

Survey responses

See slides.

Day 2 Morning

Binding continued

Summary of ontology development

Chris Mungall presentation postponed from yesterday

  • addition of has_part for reactions
  • True path issues and propagation

Rules for has_part are more difficult than for other propagation.

Example G capable_of ATPase activity -> G capable_of ATP binding

  • Materialize relationships at central location

Workflow:

  • Curator annotates to ATPase activity
  • GAF pipeline materializes ATP binding using same EC
  • Reimport allows a query against ATP binding to recover ATPases etc.
    • Q: does redundancy of annotation raise issues? Probably not?
  • Alternatives
    • Navigation via CHEBI too complex.
    • is_a between ATPase activity and ATP binding
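
The materialization workflow above can be sketched roughly as follows. This is an illustration only, not the GAF pipeline: the link table, the row layout (gene product, term, evidence code, reference) and the function names are assumptions, and term names are used in place of GO IDs.

```python
# Hypothetical link table: function term -> terms it implies (has_part style).
IMPLIED = {"ATPase activity": ["ATP binding"]}


def materialize(annotations, implied=IMPLIED):
    """Return the input rows plus inferred rows.

    Each row is (gene_product, term, evidence_code, reference). Inferred
    rows reuse the evidence code and reference of the source row.
    """
    seen = set(annotations)
    out = list(annotations)
    for gene_product, term, evidence_code, reference in annotations:
        for target in implied.get(term, []):
            row = (gene_product, target, evidence_code, reference)
            if row not in seen:  # avoid duplicating an existing annotation
                seen.add(row)
                out.append(row)
    return out


def query(rows, term):
    """Gene products annotated (directly or by materialization) to a term."""
    return sorted({gp for gp, t, _, _ in rows if t == term})
```

After materialization, a query against ATP binding recovers the ATPase as well, without the curator having made the second annotation by hand.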

Automated population of ontology using intersection_of terms ... has_input + has_output

Ontology will contain information to relieve annotators of making redundant annotations.

Q: why not have a necessitates relationship. e.g. ATPase necessitates ATP_binding. That's an alternative (different from the is_a alternative?)

Q: How will the chain of evidence work for the materialized ATP binding added to the GAF. A: original EC, reference, and ...?

Q: Look at other ontologies, e.g. txn factors. A: Don't want txn factor as a child of binding.

Q: is materializing a permanent solution? A: See later discussion.

Extended GO

  • Problem: software development assumes a prior version of the GO structure
  • Links are only in GO_ext files.
  • Future: more links. Software will have to catch up.
  • Materialization service for function to process links

Column 16

  • Want to limit precomposition
  • Annotate as if relationships are there

Syntax:

relation (class)
  • When to request new term vs use col16 - would the term make sense in an enrichment analysis
  • Reasoner can find equivalent terms if they exist, and materializer will add lines to the GAF.
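
For illustration, a column 16 value in the relation (class) form could be read with something like the sketch below. The parser and its names are hypothetical, and the set of relations is just the three listed in these minutes, not an authoritative list.

```python
import re

# Relations mentioned in these minutes; not an authoritative list.
KNOWN_RELATIONS = {"has_participant", "has_input", "has_output"}

# relation, optional spaces, then one class ID in parentheses
_EXTENSION = re.compile(r"^\s*(\w+)\s*\(\s*([^()\s]+)\s*\)\s*$")


def parse_extension(text):
    """Split 'relation(class)' into (relation, class_id), or raise ValueError."""
    match = _EXTENSION.match(text)
    if match is None:
        raise ValueError(f"not in relation(class) form: {text!r}")
    relation, class_id = match.group(1), match.group(2)
    if relation not in KNOWN_RELATIONS:
        raise ValueError(f"unknown relation: {relation!r}")
    return relation, class_id
```

For example, `parse_extension("has_input(CHEBI:16827)")` gives the relation and the CHEBI ID as a pair.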

Column 17

Isoforms. No time to discuss

Discussion

  • Extensions provide greater expressivity
  • Possibility of expressing things different ways, but reasoner can link synonymous annotations made in different ways by annotators.

Q: relationship matrix? A: this exists in part

GO browsers

Rachael Huntley

  • AmiGO
  • QuickGO

AmiGO

Live demo

  • Gene search
  • Term search
    • View direct or include annotations to child terms
  • More tools
    • GOOSE: SQL environment
      • precomposed SQL query list. Can request new ones via help
    • GO slimmer
    • Visualization - input GOIDs and see relationships
    • OpenSearch - Browser widgets and OSX dashboard
    • Homolog Set Summary - for reference genomes
  • AmiGO labs - more stuff
    • Cross-product term request will issue GOIDs for specific types of cross-products (regulation)