RGD Progress Report December 2008

RGD, The Rat Genome Database, December 2008

1. Staff working on GOC tasks

GO Curators: Stan Laulederkind, Victoria Petri, Jennifer Smith (3 FTE, of which 0.8 FTE is funded by the NHGRI GOC grant)

IT staff associated with GO-related projects, such as development of the online curation tool and pipelines, updates/loads of the GO ontologies into the database, and generation and submission of RGD Gene Association files: Jeff dePons, Alex Stoddard (2.5 FTE, none funded by the NHGRI grant)

2. Annotation progress

As of November 15, 2008, the GO Consortium site lists the following RGD annotation numbers: 20,236 rat genes with 169,127 GO annotations, of which 87,760 are non-IEA. The 2007 report, based on the GOC site figures as of November 3, 2007, listed 31,301 gene products with annotations and 183,028 annotations, of which 79,739 were non-IEA (note that the 2007 figure counts gene products rather than genes). Overall, this appears to be a decrease in both total genes and total annotations, although the number of non-IEA annotations has increased. Note, however, that there have been many gene merges/deletions at Entrez Gene and many GO terms have been obsoleted. In addition, the Gene Association File contains annotations from GOA for entities that do not have an RGD:ID (proteins, several transcripts); these annotations are appended to the end of the GAF when it is sent to GOC. A comparison based on RGD's internal reporting system is therefore more representative of the GO data in RGD.
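Because the GOC totals depend on what is counted, it can help to see how a GAF can be tallied so that RGD gene records are kept apart from appended GOA entries. The following is a minimal sketch, assuming the GAF 1.0 layout (tab-separated columns, `!` comment lines, column 1 = DB, column 2 = DB object ID, column 7 = evidence code). The function `summarize_gaf` and the toy records are hypothetical illustrations, not RGD's reporting system.

```python
from collections import Counter
from typing import Iterable


def summarize_gaf(lines: Iterable[str]) -> dict:
    """Tally annotations in GAF 1.0-style lines (hypothetical helper).

    Assumes tab-separated columns where column 1 is the DB, column 2 the
    DB object ID and column 7 the evidence code; lines starting with '!'
    are header/comment lines and are skipped.
    """
    total = 0
    non_iea = 0
    rgd_objects = set()   # distinct objects whose DB column is "RGD"
    by_db = Counter()     # annotation count per value of the DB column
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        db, object_id, evidence = cols[0], cols[1], cols[6]
        total += 1
        by_db[db] += 1
        if evidence != "IEA":
            non_iea += 1
        if db == "RGD":
            rgd_objects.add(object_id)
    return {
        "annotations": total,
        "non_iea": non_iea,
        "rgd_objects": len(rgd_objects),
        "by_db": dict(by_db),
    }


# Toy records with made-up identifiers (truncated to nine columns), not real data.
sample = [
    "!gaf-version: 1.0",
    "\t".join(["RGD", "TOY1", "GeneA", "", "GO:0005515", "PMID:0", "IDA", "", "F"]),
    "\t".join(["RGD", "TOY1", "GeneA", "", "GO:0005737", "GO_REF:0", "IEA", "", "C"]),
    "\t".join(["UniProtKB", "TOY2", "ProtB", "", "GO:0005634", "GO_REF:0", "IEA", "", "C"]),
]
summary = summarize_gaf(sample)
print(summary)
```

Counting distinct objects only where the DB column is "RGD" separates gene counts from the GOA protein and transcript entries appended to the file.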

Internal RGD reports show that, from January 2008 to the present, the number of manually generated GO annotations increased from approximately 16,600 to 20,655, an increase of roughly 25% so far this year. Over the same period, the number of genes with manual GO annotations increased from approximately 4,000 to 4,904, an increase of about 23%.
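The percentages can be checked directly from the stated figures. This is a small sketch; `percent_increase` is a hypothetical helper, and because the January baselines are approximate, the results are approximate as well.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100


# Figures from the internal RGD reports quoted above (baselines are approximate).
annotations = percent_increase(16_600, 20_655)
genes = percent_increase(4_000, 4_904)
print(f"annotations: {annotations:.1f}%, genes: {genes:.1f}%")
```

With the rounded baseline of 16,600, the annotation growth comes out near 24.4%, and the gene growth near 22.6%.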

3. Methods and strategies for annotation

Because the pipelines for GO annotations are automated and updated weekly, the curators devote all of their effort to manual annotation. Although RGD curators also annotate to other ontologies, approximately 75% of their curation effort is devoted to GO annotation.

a. Literature curation: RGD targets gene sets for manual curation, and all rat papers published about those genes are curated. In 2008, three major types of gene datasets were curated:

(1) disease-related: urogenital and breast cancer genes

(2) genes which are part of the Reference Genome Annotation Project

(3) genes involved in targeted metabolic, signaling, regulatory, and disease pathways.

b. Computational annotation strategies:

(1) Annotations to rat genes made manually by other groups are imported electronically from GOA, retaining their evidence codes, with the originating group credited in the source.

(2) ISS - RGD does not currently make manual ISS annotations. ISS annotations are imported from MGD and GOA. The pipelines for these have been redesigned to allow weekly updates and to filter out redundant and inappropriate associations. The IT developers have worked closely with the curators to ensure the robustness of the pipelines; orthology is assessed manually.
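The report does not spell out how redundancies are filtered. The following is a hypothetical sketch of one plausible rule set, not RGD's actual pipeline logic: imported ISS annotations are dropped when the same gene/term pair already has a manual annotation (assumed rule), and exact duplicates are kept only once (assumed rule). The `Annotation` type and `filter_iss` function are illustrative names.

```python
from typing import Iterable, List, NamedTuple


class Annotation(NamedTuple):
    gene_id: str    # gene identifier (toy values below)
    go_id: str      # GO term, e.g. "GO:0005515"
    evidence: str   # evidence code, e.g. "ISS", "IDA"
    with_from: str  # supporting object, e.g. the ortholog used for ISS


def filter_iss(incoming: Iterable[Annotation],
               existing_manual: Iterable[Annotation]) -> List[Annotation]:
    """Hypothetical redundancy filter for imported ISS annotations."""
    annotated_pairs = {(a.gene_id, a.go_id) for a in existing_manual}
    seen = set()
    kept = []
    for a in incoming:
        if a.evidence != "ISS":
            continue  # this sketch only handles ISS imports
        if (a.gene_id, a.go_id) in annotated_pairs:
            continue  # assumed rule: manual annotation already covers this pair
        if a in seen:
            continue  # assumed rule: drop exact duplicates
        seen.add(a)
        kept.append(a)
    return kept


# Toy data with made-up identifiers.
manual = [Annotation("RGD:TOY1", "GO:0005515", "IDA", "")]
incoming = [
    Annotation("RGD:TOY1", "GO:0005515", "ISS", "MGI:TOY1"),  # covered by manual
    Annotation("RGD:TOY1", "GO:0005737", "ISS", "MGI:TOY1"),
    Annotation("RGD:TOY1", "GO:0005737", "ISS", "MGI:TOY1"),  # exact duplicate
]
kept = filter_iss(incoming, manual)
print(kept)
```

Using a `NamedTuple` makes each annotation hashable, so set membership handles both the pair lookup and duplicate detection.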

(3) IEA - rat annotations based on the mappings of InterPro entries, Enzyme Commission numbers and Swiss-Prot keywords to GO are imported electronically from GOA with the IEA evidence code. Annotations from GOA in all categories are updated weekly.

c. Priorities for annotation: RGD assigns priorities for annotating genes to GO terms in several ways. These include the genes in the monthly list for the Reference Genome Annotation Project, genes associated with targeted diseases, and genes involved in particular pathways. RGD also participated in the two electronic jamborees held in August and October of this year. Collaborators on GO ontology development have previously published a list of genes identified in humans as drug transporters; the orthologous rat genes will be targeted for annotation in the future.

4. Presentations and publications

a. Papers with substantial GO content - none

b. Presentations including Talks and Tutorials and Teaching

(1) Society of Toxicology, March 16-20, 2008, Seattle, WA – demonstration of all RGD resources, including gene ontology data.

(2) “Introduction to Medical Informatics”, September 2008, University of Wisconsin, Milwaukee – a presentation on ontologies, including GO: the vocabularies, the consortium, and GO search and analysis tools.

c. Poster presentations

(1) The Biology of Genomes, May 6-10, 2008, Cold Spring Harbor, NY – posters presented information about customizable datasets and Disease Portal curation at RGD.

(2) ISMB 2008 (16th Annual International Conference on Intelligent Systems for Molecular Biology), July 19-23, 2008, Toronto, Canada – posters presented information on data pipelines and Disease Portals at RGD.

(3) Genome Informatics, September 10-14, 2008, Hinxton, UK – a poster on RGD’s pipelines, including those for loading the GO ontologies and GOA data, and a poster on RGD’s curation tool, including how GO annotations are made and how obsolete terms are handled.

(4) Rat Genomics & Models, December 3-6, 2008, Hinxton, UK – a poster on enhanced navigation of RGD’s data, including gene ontology annotations.

5. Other Highlights

A. GO terms contributed by RGD

RGD submitted 25 new term/synonym requests this past year, resulting in 29 new terms/synonyms being added to the ontology. For two of the requests, additional sibling terms were added when the originally requested term was approved. All but two requests were approved: one was rejected and one is not yet resolved.

B. Annotation outreach and user advocacy efforts - none

C. Other highlights - none