Reference Genome minutes (Archived)

From GO Wiki

Revision as of 11:37, 2 October 2007

Minutes for the reference genome meeting, September 26-27, 2007

Orthology determination

Kara Dolinski

Background information

Available tools:
- InParanoid (PMID: 15608241)
- HomoloGene (PMID: 17170002): updated, but does not include all species; also does not perform very well, since it is based on reciprocal BLAST rather than on phylogeny.
- HCOP (PMID: 16284797)
- TreeFam (PMID: 16381935)
- Compara (no publication yet): produces gene trees.
- OrthoMCL (PMID: 12952885)
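To illustrate what "reciprocal BLAST based" means, here is a minimal sketch of reciprocal best hits (RBH) between two proteomes. It assumes BLAST results have already been parsed into hypothetical (query, subject, bit score) tuples; the function names and toy identifiers are made up for illustration, and this is not HomoloGene's actual pipeline.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical input row: (query_id, subject_id, bit_score), e.g. parsed from
# tabular BLAST output. A higher bit score means a better hit.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the single highest-scoring subject for each query (first seen wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


if __name__ == "__main__":
    # Toy data with made-up identifiers.
    human_vs_mouse = [("hA", "mA", 500.0), ("hA", "mB", 300.0), ("hB", "mB", 250.0)]
    mouse_vs_human = [("mA", "hA", 495.0), ("mB", "hA", 310.0), ("mB", "hB", 240.0)]
    print(sorted(reciprocal_best_hits(human_vs_mouse, mouse_vs_human)))  # [('hA', 'mA')]
```

In the toy data, hB/mB is not reported because mB's best hit is hA. By construction RBH yields only one-to-one pairs, so it cannot express many-to-many orthology relationships, unlike tree-based methods.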

‘Aggregator’ tools:
- YOGY (PMID: 16845020): not really maintained; covers all species except chicken and zebrafish. Methods include KOGs, InParanoid, HomoloGene, OrthoMCL and a table of curated orthologs between budding yeast and fission yeast.
- bioPIXIE (PMID: 16420673; Princeton): a data integration approach. Proposal: with Troyanskaya, incorporate data from several methods to generate a ‘probability of orthology’, using the same protein sets with all the algorithms and updating as required. Agreed as a good idea (see action item).
- P-POD (PMID: 17712414; http://ortholog.princeton.edu/findorthofamily.html): based on OrthoMCL (http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi) and Jaccard coefficient clustering.
- Problem: none of these tools does exactly what is wanted, and there is no gold-standard set. It is sometimes necessary to search for an ortholog manually when no tool finds one (for example, short proteins, or divergent proteins such as those of E. coli).
- Another problem is how databases handle orthologs: the mouse schema cannot cope with many-to-many relationships.

- Methods for orthology comparisons (see slide)
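P-POD is described above as combining OrthoMCL with Jaccard coefficient clustering. The Jaccard coefficient of two sets A and B is |A ∩ B| / |A ∪ B|. The sketch below assumes each protein is represented by its set of hits from an all-vs-all BLAST, links two proteins when their Jaccard coefficient reaches a threshold, and reports connected components (single linkage). The hit-set representation, the 0.5 threshold and the function names are assumptions for illustration, not P-POD's documented procedure.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b|; defined here as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_clusters(hit_sets: Dict[str, Set[str]], threshold: float = 0.5) -> List[Set[str]]:
    """Single-linkage clusters: link proteins whose hit sets have Jaccard >= threshold."""
    parent = {p: p for p in hit_sets}  # union-find over protein identifiers

    def find(p: str) -> str:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    # Compare every pair of proteins (quadratic; fine for a sketch).
    for p, q in combinations(hit_sets, 2):
        if jaccard(hit_sets[p], hit_sets[q]) >= threshold:
            parent[find(p)] = find(q)

    groups: Dict[str, Set[str]] = {}
    for p in hit_sets:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())


if __name__ == "__main__":
    # Made-up hit sets: each protein hits itself plus its homologs.
    hits = {
        "yeastA": {"yeastA", "wormA", "flyA"},
        "wormA": {"yeastA", "wormA", "flyA"},
        "flyA": {"yeastA", "wormA", "flyA"},
        "flyB": {"flyB", "wormB"},
        "wormB": {"flyB", "wormB"},
    }
    for cluster in jaccard_clusters(hits):
        print(sorted(cluster))
```

Comparing hit sets rather than single best hits lets a protein join a group through shared neighbours, which can recover relationships that a one-to-one method misses; the threshold controls how permissive that is.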

Comparison of tools is made more difficult by:

- different tools covering different species;
- inconsistent use of identifiers (TreeFam is a mess - MA). Emily pointed out that UniProt and Ensembl have joined forces to reduce differences (in proteins?) and to fill in holes in UniProt, starting with human, with mouse second on the list for clean-up;
- not all sets being based on the same proteins, because of the varying frequency of updates and maintenance.
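The bioPIXIE proposal is to run all algorithms on the same protein sets and combine their outputs. As a much simpler stand-in (a naive vote count, not bioPIXIE's integration method and not a calibrated probability), the sketch below reports the fraction of methods that predict each ortholog pair. The method names and predictions in the test are hypothetical, and the sketch assumes every method was run on the same protein set, since otherwise a missing pair cannot be distinguished from a missing protein.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def normalize(pair: Pair) -> Pair:
    """Order a pair so (a, b) and (b, a) count as the same prediction."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def support_fraction(predictions: Dict[str, Iterable[Pair]]) -> Dict[Pair, float]:
    """Fraction of methods that predict each ortholog pair.

    Assumes every method was run on the same protein set, so a missing pair
    means the method did not call it (not that the protein was absent).
    """
    n_methods = len(predictions)
    counts: Counter = Counter()
    for pairs in predictions.values():
        # Count each pair at most once per method.
        counts.update({normalize(p) for p in pairs})
    return {pair: c / n_methods for pair, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical methods and made-up predictions.
    preds = {
        "method_1": [("hA", "mA"), ("hB", "mB")],
        "method_2": [("mA", "hA")],
        "method_3": [("hA", "mA"), ("hC", "mC")],
    }
    for pair, frac in sorted(support_fraction(preds).items()):
        print(pair, round(frac, 2))
```

A real integration would weight methods by their measured reliability rather than counting them equally.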

- Quality of orthology tools: PMID: 17440619, an assessment of the performance of orthology detection methods.

Retrieved from "https://wiki.geneontology.org/index.php?title=Reference_Genome_minutes_(Archived)&oldid=10327"