RefGenome07Aug07 Phone Conference (Archived)

From GO Wiki

Present

  • Petra Fey (dictyBase)
  • Pascale Gaudet (dictyBase)
  • Karen Christie (SGD)
  • Ruth Lovering (HGNC)
  • Judy Blake (MGI)
  • David Hill (MGI)
  • Harold Drabkin (MGI)
  • Mary Dolan (MGI)
  • Emily Dimmer (GOA)
  • Kimberly Van Auken (WormBase)
  • Donghui Li (TAIR)
  • Susan Tweedie (FlyBase)
  • Val Wood (Sanger - pombe)
  • Victoria Petri (RGD)
  • Chris Mungall (NCBO)

Review Action Items

Tuesday August 7, 2007, 10 AM CDT (8 AM PDT, 4 PM BST)

  1. Judy, Petra, Karen, Donghui and Kimberly summarize how the different tools for identifying orthologs work: algorithm explanations, order of preference and pitfalls in identifying orthologs
    • Judy recommended reading Alexeyenko, Lindberg, Perez-Bercoff and Sonnhammer (2006) Overview and comparison of ortholog databases. Drug Discovery Today: Technologies, 3: 137.
  2. Annotation consistency/quality control: Rex and Pascale will send all the curators on the RG mailing list a list of genes to verify; we'll see how it goes and discuss next time. See example ATPAF2 below.
  3. Continue discussion on Outreach: publicizing the project and developing a web presence (led by Susan and Rex)
  4. Continue discussion on Metrics: breadth and depth of annotations (Rex, Judy, Ruth)
    • Not done; let's do it prior to the next meeting to be able to discuss it at the reference genome meeting at the end of September.

Curation QC

  • example: ATPAF2
  • The idea is each curator (or a small group of curators) could generate a page like this when curating a gene; other groups could consult the page when curating the ortholog from their database
  • Very heated discussion about the usefulness of ISS annotations in the ref genome annotation effort.
  • There were also concerns that this is not scalable (although if there are 10 curators and 20 new genes a month, that's two genes per curator)
  • Judy: To expand a little bit more on my take on the relationship between ISS and the RefGenome Project.

The Ref Genome Project is to provide comprehensive annotations for high priority genes. These annotations would then be useful for emerging genome projects, which would be able to jump-start their genome annotation pipeline through the use of ISS annotations to the experimental data in a closely related taxon.

Note that for the emerging genomes, they would not be making use of the ISS annotations in the MOD file.

So we can look at the ISS annotations as more part of the MOD annotation strategy than as an objective of the Reference Genome Project.

Of course, as a curator is working on a gene, one works both from the Ref Genome perspective but also as a curator for the particular MOD group, and the curator would probably want to add appropriate ISS as they proceeded.

  • Pascale: the goal of that page was not about ISS (those are more visible than they should have been); it was to have an example of how one curator could have a look at all publications and annotations and comment.
  • Val: I spotted this today (Dolinski & Botstein 2007, PMID: 17678444); people might find the discussion of orthology and the lists of orthologous genes considered to have the same cellular roles useful.

The list is quite conservative though. I estimate >2300 are conserved 1:1 from pombe to human (and have a similar core cellular function), and around another 900 where it is more difficult to make functional transfer because they are members of multigene families which are likely to have different/specialized roles/different substrates and affect different downstream processes (frequently signalling proteins/kinases/GTPases etc).

Many of these genes are annotated in higher organisms with roles inferred from mutant phenotype (for example, roles in development, gamete generation or positive regulation of body size) without annotations to their core cellular roles (for example, as part of the ER to Golgi transport machinery or the ribosome biogenesis pathway). Often the researcher works mainly on human/mouse but switches to one of the yeast species to do the experiments to identify the cellular role, but this annotation will never be added to the higher eukaryote if ISS annotations are not made from the yeasts.

Obviously annotation transfer needs to be used with caution; for instance, nuclear processes are more conserved than cytoskeletal processes. I can't actually think of an example where a conserved 1:1 ortholog that is a member of a conserved complex doesn't have the same basic cellular role from yeast to human.

It seems that people working on higher organisms would really need to have these core roles annotated when using GO for genome-wide analysis, and identifying the common/core reference genome annotations in the lower eukaryotes would be an ideal way to do this, rather than annotating to the unknown terms.

  • Mary Shimoyama: I think we have to keep in mind the focus and interest of researchers using the higher organisms and their research approaches. Unlike with mouse, the GO annotations for rat do not usually come from mutant phenotypes because of the lack of knockouts. However, as with mouse, researchers are primarily interested in a gene's relationship to phenotype and disease, or its involvement in a pathway. I would argue that for many of our users, the biological process annotations and cellular component annotations are more important than the molecular function ones. For so many of them, the GO annotations are a means to identify possible candidate genes from a larger list. If they have genes of interest for which there are no mammal GO annotations, they will go to AmiGO or to the literature if they desire to get information on molecular function found in other organisms, but by and large, they are primarily interested in information from rat, mouse and human, and this is why we use ISS annotations among the 3 species and not beyond. I believe it is important to provide access to the annotations across many species, but doubt the utility of providing an ISS annotation from yeast to rat as part of our dataset. Furthermore, many of our users would question the validity and usefulness of such annotations.

Curation Targets

  • We have now shared a spreadsheet where curators can make suggestions for targets [1]

  • RGD (Mary Shimoyama) sent two lists: one of neurological and cardiovascular diseases (692 genes); the other with genes associated with obesity (1222 genes)
  • How should we proceed to select new targets? We decided to choose from the neurological diseases. Pascale, Emily and Ruth will select targets for September.


Reference genome curation tool

  • RG software group and some curators had a phone conference to discuss the new tool
  • Sohel has made a new version of the interface

http://rails-dev.bioinformatics.northwestern.edu:24000/

  • Keep sending comments (Reference_Genome_Database_Requirements_Discussion); however, since Sohel leaves July 31, development will be delayed

  • How to interact with AmiGO

Next meeting

Tuesday Sept 11, 1 PM CDT (11 AM PDT, 7 PM BST)