2010 GO camp Geneva minutes

Day 1 morning session

9:00 Introductions and objectives of the meeting

  1. Introductions & Logistics: Serenella Ferro Rojas
  • Poll for Thursday lunch reservations, depending on weather.
  • Dinner at Brasserie la Bourse on the Carouge
    • ~ 1.9 km from meeting site

Friday Reception at noon for Amos Bairoch celebration of the Otto Naegeli prize.

Introductions

Goals: Pascale Gaudet

GO – Ontology, annotation, tools and technical aspects

Chairs: Serenella Ferro Rojas and Pascale Gaudet

GO overview

An introduction to the GO ontology : terms, definitions, synonyms, relationships, cross-products. Jane Lomax

  • Inter-ontology links
    • Most tools don't make inferences across the ontologies, so redundant annotations are made.
    • Cross products
      • between GO ontologies
      • external ontologies (cell ontology; CHEBI)
  • Ontology development
    • large scale targeted projects
    • logical consistency
    • small scale requests (Sourceforge tracker; future via Amigo)

Q/A: classical relationships (e.g. part_of within an ontology) are a subset of cross-products.

Annotation Process

General overview of the annotation guidelines used by GO, and contributing resources. Rama Balakrishnan
    • Annotation guidelines

Goal: say as much as possible about a gene product. Be useful to bench and computational biologists.

  • GO annotation: Gene product association with GO terms and other info.
    • Core
      • gene product identifiers
      • GO term
      • Reference
      • Evidence code
    • Additional info
      • qualifiers
      • with/from
      • Annotation detail (column 16)
      • Isoform
  • Sources
    • Manual
    • Automated
    • PAINT (new)
      • inter-ontology inferences (new)
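
The fields listed above can be pictured as a simple record. The following is a minimal sketch and not the GO annotation file format: the class name, the field names, and the identifiers in the example are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """Hypothetical record mirroring the core/additional split in the minutes."""
    # Core fields
    gene_product_id: str
    go_term: str
    reference: str
    evidence_code: str
    # Additional info (all optional)
    qualifiers: List[str] = field(default_factory=list)
    with_from: List[str] = field(default_factory=list)
    annotation_detail: Optional[str] = None  # "column 16"
    isoform: Optional[str] = None


# Placeholder values only; this is not a real annotation.
example = Annotation(
    gene_product_id="DB:GENE_PRODUCT_1",
    go_term="GO:0005515",
    reference="PMID:0000000",
    evidence_code="IPI",
    with_from=["DB:GENE_PRODUCT_2"],
)
print(example.evidence_code, example.with_from)
```

Keeping the additional fields optional reflects the idea that an annotation is complete with the four core fields alone.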

Differences between previous GO camps and this one: this one is more internal and focused on strengthening guidelines.

  • Challenges ...
  • Avoiding redundancy.
    • Authoritative sources
      • no MOD - UniProt-GOA.


General overview UniProtKB/SwissProt manual annotation. Serenella
  • protein selected for manual annotation based on priorities
    • Recent papers chosen for high impact
    • Curation of specific processes (e.g. ubiquitin-like conjugation)
    • User requests

Flow

  • sequence curation
    • One record for all different products for the same gene
  • Sequence analysis: automated, with manual checking (domains, PTMs, etc.)
  • Literature curation. Species, protein names, gene names, journals, tissues, plasmids
    • Store as comment lines free text with controlled tags(?)
    • Sequence annotation of features (relation to SO?)
    • GO annotation: 50 curators. Automated: spkw2go, mappings2GO, etc.
  • Family-based curation
  • Attribution
  • QA and integration
    • e.g. throw error when nucleus kw for bacterial protein
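
A consistency rule of the kind mentioned in the QA step could be sketched as follows. This is not UniProt's QA code: the function name, the keyword string, and the taxonomic labels are assumptions made for illustration.

```python
def check_keyword_consistency(taxon_domain, keywords):
    """Return error messages for implausible keyword/taxon combinations.

    taxon_domain: hypothetical label such as "Bacteria" or "Eukaryota".
    keywords: list of keyword strings attached to the entry.
    """
    errors = []
    # Bacteria have no nucleus, so a nucleus keyword signals a curation error.
    if taxon_domain == "Bacteria" and "Nucleus" in keywords:
        errors.append("Keyword 'Nucleus' used for a bacterial protein")
    return errors


print(check_keyword_consistency("Bacteria", ["Nucleus", "Kinase"]))
print(check_keyword_consistency("Eukaryota", ["Nucleus"]))
```

Returning a list of messages rather than raising on the first problem lets an entry be checked against several rules in one pass.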

Q: Isoforms?

A: linked to parent ID - ACCESSION_#

Q: Connection between references and items.

A: Findable in the XML. This is being retrofitted to older entries.

Q: What is the unit of annotation - Genes, isoforms?

A: Isoforms yes. Not yet things like cleavage products, but should be in the future.

Break

10:30 Binding documentation

Binding has been discussed at three consortium meetings.

Current guidelines

Ursula:

  • Binding biological entity (not today)

Macromolecules (proteins)

    • specific proteins vs. protein classes vs. protein domains
  • GO:0005515 must be used with IPI, and the reciprocal annotation should be made.
  • Use child terms
  • Evidence
    • IPI for specific proteins
    • IDA for classes of protein
  • Propagation
    • GO:0005515 should not be propagated via ISS.
    • propagation of child term annotations is OK
  • Do not use NOT with GO:0005515
  • NOT with child terms is OK.
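
The macromolecule rules above lend themselves to an automated check. The sketch below is one possible reading of them, assuming the IPI rule refers to GO:0005515 (protein binding); the function name and the placeholder child-term ID are hypothetical and not part of any GO tool.

```python
PROTEIN_BINDING = "GO:0005515"


def check_protein_binding(go_term, evidence_code, qualifiers=()):
    """Return guideline violations for one annotation (empty list if none)."""
    problems = []
    if go_term != PROTEIN_BINDING:
        # Child terms may carry NOT and may be propagated, so nothing to flag.
        return problems
    if evidence_code == "ISS":
        problems.append("GO:0005515 should not be propagated via ISS")
    elif evidence_code != "IPI":
        problems.append("GO:0005515 must be used with IPI")
    if "NOT" in qualifiers:
        problems.append("NOT must not be used with GO:0005515")
    return problems


print(check_protein_binding("GO:0005515", "IPI"))
print(check_protein_binding("GO:0005515", "ISS", ["NOT"]))
# "GO:9999999" is a made-up stand-in for a child term.
print(check_protein_binding("GO:9999999", "ISS", ["NOT"]))
```

The reciprocal-annotation rule is not covered here, because checking it needs the full annotation set rather than a single record.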

Small molecules

  • avoid redundant annotation of substrates, including transporter substrates
    • e.g. ATP binding for ATPases (exceptions where hydrolysis not shown)
    • Example DNA demethylase/dioxygenase
      • Are annotations to alkylated DNA binding, O2 binding etc. redundant?

Discussion

Q: protein binding - evidence that it does not bind a specific protein. Need a new GO term?

A: No. Use column 16 or create new GO term. Still in discussion. GO terms if the proteins can be put into groups. Don't want specific protein terms.

Q: What is wrong with having 25K GO terms?

A: Does it matter? May be able to do all PRO classes. Instantiate as needed.

Comment: NOT terms. IntAct only annotates negative interactions for isoforms where a different isoform has a positive interaction. Negatives are not exported to GO.

Judy summary: discussion of whether we are going to instantiate lots of protein binding terms. PRO families could be used for terms. Column 16 could be used for NOT and specific isoforms.

Emily: some things are not well captured by GO.

Annotation extension discussion

Ruth

  • Annotation extension = column 16
  • Should only be used for direct targets.
  • Examples
    • Co-IP. Lnx-I and Boz. Use two txn factor binding annotations with IPI, with the partner in the with column.
      Q: Do we need exp evidence that (e.g.) Boz is a txn factor?
      A: Curator judgement at present. Rama: SGD would read the paper and check other annotations of Boz, not rely only on an assertion in the paper. The same paper does not have to show that Boz is a txn factor. Ruth: in humans, would use sequence analysis, e.g. domains. Actually SGD doesn't annotate protein binding.

Paul: Annotations for the target must exist somewhere. Does this create redundancy to annotate binding to proteins of function X where target has function X?

Jane: Won't always be function terms. e.g. LIM binding domain binding.

Ruth: GOC still needs more discussion.

Judy: no inconsistency in what SGD does and what Ruth does. Annotations are consistent but SGD chooses different annotations to make. MODs bring specific special experimental strengths. This is a difference, not an inconsistency.

Mike L.: BioGRID curation does a lot of this. How much can be transferred? Ruth: more on this later.

  • Column 16 example: Lnx-1 ubiquitinates Boz but not Gsc.
    • Annotation. Lnx-1 has ubiquitin-protein ligase activity IDA Col 16:Boz
    • Annotate protein ubiquitination IDA w/o target.
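
The column 16 example can be written out as two records, one function annotation carrying the target and one process annotation without it. This is only an illustrative sketch: the dictionary keys are hypothetical, gene symbols stand in for real database identifiers, and GO term names are used instead of IDs.

```python
annotations = [
    {
        "gene_product": "Lnx-1",
        "go_term_name": "ubiquitin-protein ligase activity",
        "evidence_code": "IDA",
        "column_16": "Boz",  # direct target of the activity
    },
    {
        "gene_product": "Lnx-1",
        "go_term_name": "protein ubiquitination",
        "evidence_code": "IDA",
        "column_16": None,  # process annotation made without a target
    },
]


def targets(records):
    """Collect the column 16 targets that were actually recorded."""
    return [r["column_16"] for r in records if r["column_16"] is not None]


print(targets(annotations))
```

The negative result for Gsc is not represented here; how to record it is part of the open discussion about NOT and column 16.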

Q: problem of propagation across species. Col 16 identifier is species-specific.

A: Transferring from human to mouse. Use col 16 or not?

Propagation of Col 16 issues. Binding across species requires additional discussion. Should the column 16 identifier point to a class? Should column 16 be transferred in ISS transfer? More discussion!

Q: is this redundant annotation of enzyme substrates?

A: No, we are doing substrate binding if the GO term does not provide the information.

Judy: knowledge statements vs description of the experiment.

Jim: column 16 post composition is equivalent to creation of a precomposed term, so ISS should be allowed (as appropriate, depending on whether the 16 ID is a class vs a specific product).

Paul: Think in terms of how we will do this with PAINT. We are annotating to ancestor nodes.

Comment: is the discussion generalizing? More general solution is to associate records with an external reference. Relational structure problem. In terms of binding let the protein interaction databases handle these.

Several people suggest that we should not have terms like "txn factor binding".

Ruth: Quick summary

  • Use the with column with IPIs if the GO term definition does not provide the information
  • Use column 16 for target
  • In disagreement about propagation of column 16 by ISS
  • Ideally, use info from with or col 16 to make inferences about the function of the protein. Other functions could come from other annotations of the target.

Summary of ontology development

Chris not available until after lunch