Reference Genome minutes (Archived)

From GO Wiki
Revision as of 07:38, 2 October 2007 by Pascale


Minutes for the reference genome meeting, September 26-27, 2007

Orthology determination

Kara Dolinski

Background information

  • ‘Aggregator’ tools:
    • YOGY (PMID 16845020; not really maintained): includes all the reference genomes except chicken and zebrafish. Methods include KOGs, InParanoid, HomoloGene, OrthoMCL and a table of curated orthologs between budding yeast and fission yeast.
    • bioPIXIE (PMID 16420673; Princeton, Troyanskaya): a data integration approach that incorporates data from several methods to generate a ‘probability of orthology’. Suggestion: use the same protein sets with all the algorithms and update them as required. Agreed as a good idea (see action item).
    • P-POD (PMID 17712414): based on OrthoMCL and Jaccard coefficient clustering.
    • Problem: none of these tools does exactly what is wanted, and there is no gold-standard set. It is sometimes necessary to search manually for an ortholog when no tool finds one (for example, short proteins, or divergent proteins such as those from E. coli).
    • Another problem is the way databases handle orthologs: the mouse schema can’t cope with many-to-many relationships.
    • Methods for doing orthology comparisons (see slide).
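A minimal sketch of one way to compare the output of several orthology tools, assuming each tool's predictions for two species have already been reduced to a set of (gene in species A, gene in species B) pairs. Agreement between tools is measured with the Jaccard coefficient (the statistic named in the P-POD entry), and each pair is scored by how many tools predict it. This is an illustration, not the method used by P-POD, YOGY or bioPIXIE; the tool names and gene IDs are hypothetical. Storing orthology as a table of pairs also shows one way around the many-to-many problem noted for the mouse schema: a gene can appear in several pairs without any special case.

```python
from itertools import combinations

# Hypothetical predictions from three hypothetical tools; gene IDs are
# illustrative only. Each prediction is a (species A gene, species B gene)
# pair, so one-to-many and many-to-many orthology need no special handling
# (geneA2 below has two predicted orthologs in tool_1).
predictions = {
    "tool_1": {("geneA1", "geneB1"), ("geneA2", "geneB2"), ("geneA2", "geneB3")},
    "tool_2": {("geneA1", "geneB1"), ("geneA2", "geneB2")},
    "tool_3": {("geneA1", "geneB1"), ("geneA3", "geneB4")},
}


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A & B| / |A | B|; taken as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_agreement(preds: dict) -> dict:
    """Jaccard coefficient for every pair of tools, keyed by (tool, tool)."""
    return {
        (t1, t2): jaccard(preds[t1], preds[t2])
        for t1, t2 in combinations(sorted(preds), 2)
    }


def support_counts(preds: dict) -> dict:
    """Number of tools that predict each ortholog pair."""
    counts = {}
    for pairs in preds.values():
        for pair in pairs:
            counts[pair] = counts.get(pair, 0) + 1
    return counts


for (t1, t2), score in pairwise_agreement(predictions).items():
    print(t1, t2, round(score, 2))
# tool_1 tool_2 0.67
# tool_1 tool_3 0.25
# tool_2 tool_3 0.33

# Simple consensus: pairs predicted by at least two of the three tools.
consensus = sorted(p for p, n in support_counts(predictions).items() if n >= 2)
print(consensus)  # [('geneA1', 'geneB1'), ('geneA2', 'geneB2')]
```

Counting supporting tools is the crudest form of consensus; bioPIXIE's ‘probability of orthology’ is a different kind of integration that this sketch does not attempt to reproduce.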

Comparison of tools is made more difficult by:

  1. different species being covered by different tools;
  2. inconsistent use of identifiers (TreeFam is a mess - MA). Emily points out that UniProt and Ensembl have joined forces to reduce differences (in proteins?) and fill in gaps in UniProt, starting with human, with mouse second on the list for clean-up;
  3. not all sets being based on the same proteins, owing to differing frequency of updates/maintenance.
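A minimal sketch of why inconsistent identifiers distort tool comparisons: two hypothetical tools report the same human/mouse ortholog pair under different ID systems, so a naive set intersection finds no overlap until both sides are mapped to a common accession. The IDs and the ID_MAP table are invented for illustration; a real mapping would come from a cross-reference resource (for example, UniProt/Ensembl cross-references), which is not modeled here.

```python
# Hypothetical outputs from two tools for human/mouse ortholog pairs.
# tool_x uses one (hypothetical) ID system, tool_y another; the first pair
# in each set refers to the same two proteins.
tool_x = {("ENS_H1", "ENS_M1"), ("ENS_H2", "ENS_M2")}
tool_y = {("UP_H1", "UP_M1"), ("UP_H3", "UP_M3")}

# Hypothetical cross-reference table; ENS_H2/ENS_M2 are deliberately unmapped.
ID_MAP = {"ENS_H1": "UP_H1", "ENS_M1": "UP_M1"}


def normalize(pairs, id_map):
    """Rewrite each ID in (gene_a, gene_b) pairs to a common accession.

    IDs without a mapping are kept unchanged, so they stay visible as
    unmapped instead of being silently dropped.
    """
    return {(id_map.get(a, a), id_map.get(b, b)) for a, b in pairs}


raw_shared = tool_x & tool_y  # naive comparison on raw IDs
norm_shared = normalize(tool_x, ID_MAP) & normalize(tool_y, ID_MAP)
unmapped = sorted({i for pair in tool_x for i in pair if i not in ID_MAP})

print(len(raw_shared), len(norm_shared))  # 0 1
print(unmapped)  # ['ENS_H2', 'ENS_M2']
```

Reporting unmapped IDs alongside the overlap makes it clear how much of any apparent disagreement between tools comes from identifiers rather than from the orthology calls themselves.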

  • Quality of orthology tools: PMID 17440619, assessing the performance of orthology detection methods.