13 Oct 2009 RefGen Phone Conference (Archived)
Making the Reference Genome project more visible on MOD pages
This is based on a request from Peter Good, who says that the reference genome effort is not visible enough. We should all put a note on the front pages of our sites (or wherever is most appropriate) saying that we are participating and, if possible, flag the genes selected for curation as part of the reference genome project.
EcoliWiki has already implemented this nicely: http://ecoliwiki.net. [ACTION ITEM]: All groups to advertise the reference genome effort on their sites.
- GOA: Done: http://www.ebi.ac.uk/GOA/RGI/index.html (under 'GOA projects' on our front page). Also includes a link to annotations to human RefGen proteins in QuickGO.
- MGI: Added a link on the MGI functional annotation home page (http://www.informatics.jax.org/function.shtml) and on the gene detail page of every selected RefGen gene (e.g. http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=markerDetail&key=42061).
- TAIR: The web page is ready and will be pushed to the production server by the end of this week (10-16-09). Flagging genes will take a little longer, but a request for changes to the database schema and curation interface has been made.
- WormBase: In progress. We plan to add a link to the gene summary page of each Ref Genome gene. This requires a small model change and will likely take ~2 months to go live. We will also add a link to the WB home page, which should be done by the end of this week.
- EcoliWiki: Done: http://ecoliwiki.net
Uncoupling monthly gene annotation from the annotation status of genes
Many more genes have been comprehensively annotated than just our selected monthly targets, and we should capture that status for all genes, not only the monthly selections. (The format of the gene selection will also change somewhat; see below.) Can everyone provide a list of their comprehensively annotated genes?
- GOA: yes
- MGI: yes
- dictyBase: yes
Annotation targets: Now until Dec 2009
Mouse lung genes (see 'An Idea for Reference Genome Annotation Direction' below). We ask that each database annotate these genes; in early 2010 we will write a paper describing the advantages and extra information that annotating each of the reference genomes has provided.
Judy (MGI) obtained the list from Carol Bult at Jax, who is studying early lung development in mice. It is the set of most relevant genes that she has compiled, both from previous work and from her own work on gene expression regulation at very early stages of mouse lung development. She uses GO analysis and MouseNet tools to evaluate co-expression and regulation of the mouse genes. She collaborates with biologists at Harvard and elsewhere, as well as with Jax scientists, and has two graduate students working with her on the experimental aspects of this study. Carol is both a computational and an experimental biologist, and a professor here at Jax.
[Comment from Rachael] I can't be at the call tonight, but I am concerned about the number of distinct human proteins these 28 PANTHER families produce (55 for PTHR24416 alone). Could Carol come up with a shorter list of genes that are specifically involved in a particular part of early lung development? It would be good to weed out any genes that are not specific to lung development. This would give a more manageable list and probably a more interesting story, as I suspect we may not get clear results from annotating large numbers of genes involved in a wide variety of processes.
An Idea for Reference Genome Annotation Direction
The co-curation activities made possible by the RefGen project have a great deal of untapped potential. Would it therefore be worth setting up a scheme whereby GOC groups could submit short proposals to the RefGen curation body, suggesting co-curation projects that the RefGen curators could focus on for a specified period? I have outlined some possible advantages of this idea below.
Advantages:
- the group/curator heading a chosen project would have a vested interest in pushing forward the co-curation work, helping to improve annotation consistency and the GO terms available. These curators would therefore support Pascale's coordination efforts.
- as the selected genes in a project would share a common theme, curators from all the participating groups should gain a deeper understanding of the biology in that area; this should improve the consistency of the annotations available for that system and inform ontology development discussions.
- projects should aim eventually to produce targeted publications on the usefulness of the GO resource for a particular area of biology. For instance, a publication could compare the annotations generated for the same system across diverse species, exploring interesting differences and similarities in the data, perhaps in collaboration with external investigators in the chosen domain.
- focused co-curation could directly aid ontology development work. For instance, where an ontology development effort has recently generated new terms to describe a particular process, these could be given to the group to be 'road tested' (with the understanding that the terms must be publicly available and that ontology developers are confident that a reasonable number of terms already exist in a usable state).
- where a recent ontology content meeting has generated a specific set of terms for an area of biology, coordinated curation work could quickly generate annotations that apply the newly created terms and confirm that the new terms are appropriate for all species. If this work is carried out soon after the ontology development effort, external experts involved in the content meeting may still be interested in helping with questions arising from annotation discussions.
Possible requirements for these co-curation proposals:
1. project proposals should be designed to create annotations to targets that are of interest to human biomedical research
2. proposals should have a defined time frame, i.e. they should propose a limited number of proteins that could be annotated in approximately 3-4 months
3. at the end of this annotation period, the project should aim to produce a publication that demonstrates the usefulness of the GO annotation resource and the value of the co-curation effort. The curators leading the co-curation exercise, together with the Reference Genome group, will be primary authors of such a publication.
4. The annotation effort should, if possible, encourage external collaborations that use and expand the information resource provided by the co-curation effort.
By running multiple small, focused annotation projects, we hope that more curators would become confident enough to take part in ontology development efforts, and that this would also help demonstrate the usefulness of the Reference Genome initiative to external users, two points emphasized at the recent GOC meeting.
There are a number of ongoing projects where this proposal could be applied, which include:
- The current co-curation project for the Loop of Henle (initiated by Yasmin for the Renal GO Initiative; it involves 4 curators annotating ~200 conserved orthologs found to be involved in the development of a kidney structure that differs greatly between species, see: http://wiki.geneontology.org/index.php/Loop_of_Henle_Cocuration): GOA-UniProtKB
- Selected genes involved in cardiovascular development (targeted to use terms developed by the recent Heart Ontology Content workshop): BHF-UCL
- Selected genes involved in lung development: MGI
- Signaling. When the GO editors are satisfied that the terms in this area are adequate, curation groups could be asked to take a selection of different signaling pathways to test the structure of the ontology.
Possible future projects: Ageing, Cell cycle, DNA repair (GOA-UniProtKB; Rachael Huntley)