RefGenome12Jun07 Phone Conference (Archived)
Present
- Rex Chisholm (dictyBase)
- Petra Fey (dictyBase)
- Pascale Gaudet (dictyBase)
- Karen Christie (SGD)
- Ruth Lovering (HGNC)
- Fiona McCarthy (AgBase)
- Judy Blake (MGI)
- David Hill (MGI)
- Harold Drabkin (MGI)
- Mary Dolan (MGI)
- Emily Dimmer (GOA)
- Kimberly Van Auken (WormBase)
- Donghui Li (TAIR)
- Doug Howe (ZFIN)
- Susan Tweedie (FlyBase)
Discuss agenda items for meeting
- We should be able to have the meeting at Princeton after the GOC meeting
1. Strategies for identifying orthologs
- Currently in use: YOGY, Inparanoid, Treefam
- one issue is about using consistent strategies
- we'd like to call in an expert at the meeting to help us: Richard Durbin (Pfam, Treefam), Erik Sonnhammer (Pfam, Inparanoid), or Paul Thomas (Panther)
- Judy: orthology analysis: we want to have tools, but we need to make sure we don't inappropriately infer information from that
- Rex: it's about specificity
- KarenC: different tools give different results; I'd like to know why and understand how those tools work
- Judy: I completely agree. We need to have by the meeting a white paper about how those tools work to provide a basis for discussion
- Kimberly: also noticed different results with different tools
- Emily: GOA has started to transfer electronic annotations; it's hard to keep track of the orthology information with different genome versions, especially for multispecies databases
- MaryD/David: we'd like to produce a tool that would help make ISS annotations
- Emily: if this is automated, it should be IEA
- Rex: I think we should look at the superset of all the orthologs identified by different tools and processes.
- Val Wood (extra comments by email): I have been using all of the predictors for many years. In summary (details later), they all have false positives and false negatives, and so are useful in different respects. However, Treefam, because it uses full alignments and phylogenetic methods, has much higher accuracy, and I'd recommend we use Treefam first as standard (rather than the predictors included in YOGY: KOGs, Homologene, Inparanoid, OrthoMCL).
Obviously, organisms that have synteny with human would probably be better to use in combination with this. Using any of the predictors for the myosins (at least for fungal genomes) gives nonsense. This is often the case for coiled-coil, repetitive, or low-complexity proteins with any reciprocal-BLAST-based method.
The most likely issues we would have with Treefam are missed ortholog calls for divergent orthologs, but this will mainly affect the fungal genomes. When Treefam misses predictions (I have found examples where the fungal outgroups are missing, but usually the predictors miss these too), I can submit the relevant outgroups and the families can be updated. Treefam has many other advantages over the predictors, and should allow you to spot lineage-specific gene losses by inspecting the trees.
I am in the process of checking if any fungal orthology calls are missed by Treefam in the current reference genome set, and I will report back on this. So far, I haven't found any problems with Treefam (other than the myosins), but there are a few fungal outgroup omissions.
ACTION ITEM: Judy, Petra, Karen, Donghui, Kimberly and Val will write a white paper with an overview of how the different tools work, algorithm explanations, order of preference, and pitfalls in identifying orthologs
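One way to picture the superset approach Rex suggested, and the tool disagreements Karen and Kimberly noted, is the minimal Python sketch below. It assumes each tool's output has already been reduced to (reference gene, ortholog) pairs; the function names and gene identifiers are hypothetical placeholders, not real predictions or any tool's actual API.

```python
from collections import defaultdict


def merge_ortholog_calls(calls_by_tool):
    """Union the ortholog pairs from every tool, recording which tools made each call."""
    support = defaultdict(set)
    for tool, pairs in calls_by_tool.items():
        for pair in pairs:
            support[pair].add(tool)
    return dict(support)


def split_by_agreement(support, tools):
    """Separate pairs called by every tool from pairs that at least one tool missed."""
    all_tools = set(tools)
    agreed = {pair for pair, seen in support.items() if seen == all_tools}
    # For each disputed pair, list the tools that did NOT make the call.
    disputed = {
        pair: sorted(all_tools - seen)
        for pair, seen in support.items()
        if seen != all_tools
    }
    return agreed, disputed


# Hypothetical example data (placeholder identifiers, not real predictions).
calls = {
    "Treefam": {("HUMAN:geneA", "YEAST:geneA1"), ("HUMAN:geneB", "YEAST:geneB1")},
    "Inparanoid": {("HUMAN:geneA", "YEAST:geneA1"), ("HUMAN:geneC", "YEAST:geneC1")},
}

support = merge_ortholog_calls(calls)
agreed, disputed = split_by_agreement(support, calls)

for pair in sorted(agreed):
    print("all tools agree:", pair)
for pair, missing in sorted(disputed.items()):
    print("needs review:", pair, "not called by", ", ".join(missing))
```

Keeping the per-tool support alongside each pair means the superset does not hide where the tools disagree; the disputed pairs are the ones a curator would want to inspect before making an annotation.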
2. How to prioritize disease genes
- Emily: will NCBI display that set of genes? This is a nice morbid map to provide and would give publicity
- Ruth: write a paper?
- Rex: important good addition
- Emily: do we have a target number of genes?
- Rex, Judy: all disease genes (~20,000)
- Emily: do the genes have to be in morbid map?
- Pascale: if there is a paper, then it's a good target gene. Data must be convincing (not just expression)
- Judy: Should we think of new ways to target new genes? including complex diseases
ACTION ITEM: Rex, Pascale, Emily will summarize the strategy used to identify target genes and suggest possible improvements
3. How to assess the progress made towards curation of reference genome genes; strategies for improvement
- We need to have a way to measure progress; right now it's rather crude. The data Rex sent last week was just counting how many genes each database has looked at. It included genes with no orthologs. See Status of Reference Genome Project, June 5, 2007
- David: We leave the lines with no orthologs blank.
- Rex: We had agreed to write it down that we checked
- We should have a monthly meeting and everyone would provide stats
- Synchronicity of curation: it would be more helpful if we were all curating at the same time
- (a few) perhaps there are too many genes per month?
- Rex: we discussed how many papers need to be curated at the last GOC meeting (i.e., we don't need to do everything)
ACTION ITEM: Rex, Karen, Emily, Susan will write a list of information needed from curators to measure annotation progress
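The distinction raised above between a blank line and a line recorded as "checked, no ortholog" can be illustrated with a small Python sketch. The status labels, function name, and gene names are assumptions made for illustration only; they are not the format of the stats Rex circulated.

```python
from collections import Counter

# Hypothetical status labels: a gene counts as "checked" whether it was
# annotated or was examined and found to have no ortholog.
CHECKED_STATES = {"annotated", "no_ortholog"}


def summarize_progress(status_by_gene):
    """Tally target genes by curation status for one database."""
    counts = Counter(status_by_gene.values())
    total = len(status_by_gene)
    checked = sum(counts[state] for state in CHECKED_STATES)
    return {
        "total": total,
        "checked": checked,
        "annotated": counts["annotated"],
        "no_ortholog": counts["no_ortholog"],
        # Anything without a recognized checked status is treated as not yet checked.
        "not_checked": total - checked,
    }


# Hypothetical example data for one database.
statuses = {
    "geneA": "annotated",
    "geneB": "no_ortholog",
    "geneC": "not_checked",
    "geneD": "annotated",
}

summary = summarize_progress(statuses)
print(summary)
```

Recording "no ortholog" explicitly, rather than leaving the line blank, is what lets a tally like this separate genes that were examined from genes nobody has looked at yet.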
4. Discussions regarding metrics, including making a plan for how to use metrics
ACTION ITEM: Rex, Judy, Ruth will summarize ideas on how to capture breadth and depth of annotations
5. Review of progress toward database and tool development (software group)
ACTION ITEM: Mary, Sohel, Chris will send a summary of what is being done by the software group
- See the wiki page for Sohel's outline of use cases for the reference genomes database. He plans to have a prototype available in early July.
- Please be sure to consult the Reference Genome Database Requirements Discussion page to add your input.
6. Annotation consistency discussion
- Karen: John Mullen annotation consistency study; we'll see if data is available
- Kimberly: maybe hard to compare across such different organisms
- Petra: Maybe we can look at a couple of specific disease genes that have been annotated and compare
ACTION ITEM: Pascale, Donghui will provide a framework for discussing consistency of annotations across genomes
7. Other points
- Karen E wants a GFF3 file for the sequence of all organisms in the Reference Genome project.
- Judy: she wanted that for some tool development.
ACTION ITEM: Susan and Rex will present suggestions for publicizing the project and developing a web presence
Review annotation stats
Regular monthly phone conference. Use to review stats, open ref genome SourceForge items
- Every second Tuesday of each month; alternate times: 8 AM/11 AM Pacific time
- Next meeting: Tuesday July 10, 1 PM CDT (11 AM PDT, 7 PM BST)
- Tentative agenda: 20070710_ReferenceGenomeCall