2007-09 SAB minutes: Difference between revisions

From GO Wiki

Revision as of 15:31, 1 October 2007

Ontology Development

Slides

Discussion

David Hill presented the report on the major areas of development over this past year. One of the first major discussion points was how to continue to have content meetings in light of limited resources.

The content meetings are extremely productive because preliminary work goes into training the scientists in ontology development and distributing proposed ontology changes among the group before the meeting is held. During the face-to-face meetings, there is a lot of discussion aimed at reaching a consensus view of the ontology and making the changes. These meetings involve 10-15 people and cost approximately $15,000.

Submitting an R13 meeting/conference grant was one suggestion. Another suggestion was to have an active outreach to industry. We could write to heads of bioinformatics and pharma groups and see what areas they want expanded. In exchange, we would propose that they host a meeting working on this area. Reaching companies through PR and marketing was strongly discouraged. There is an EBI industry group that Michael Ashburner started that might be tapped for contacts.

The work on cross-products was commended because it moves GO further towards being computed upon.

In order to increase our efficiency in structural work, Larry Hunter suggested collaborating more with NCBO and connecting with the greater ontology world. Chris Mungall pointed out that GO is providing resources that allow it to be connected to the greater ontology world, such as publishing in RDF for the semantic web and providing a mapping to OWL. David Botstein cautioned against representing ontologies for the sake of theoretical frameworks, saying that GO should remain grounded in biology and content. There was discussion about the current production status of some of the tools provided by NCBO.

Larry Hunter asked about quality control metrics. John Day-Richter gave a demo of OBO-Edit to show several built-in tools for maintaining quality control on the ontology. OBO-Edit has a reasoner which identifies errors the ontology editor has made. In addition, editors can add their own filters to identify errors, such as disjoint errors. OBO-Edit is used to edit ontologies by GO as well as by other ontology groups, such as the Jackson Laboratory phenotype curators.

Reference Genome

Slides

Discussion

Rex Chisholm presented the progress of the Reference Genome group after approximately a year’s worth of work. The focus has been on “comprehensive” annotation because it is possible.

Larry Hunter asked how many papers are linked to a gene. Because the processes for obtaining the literature sets differ so much between groups, the individual database groups report the numbers on the Google spreadsheet.

AZ guy asked if any text mining processes have been incorporated in order to identify appropriate papers. Although MODs have had some collaboration with groups, the papers are all manually reviewed. Larry suggested that the MODs could be involved with groups to help identify papers in a common way.

There was significant discussion about how the priorities should be set for the list of genes. Since currently OMIM and the RGD disease portal are being used to help set priorities, there may be fewer genes to annotate for non-mammalian organisms. Simon suggested prioritizing genes that have been identified in the recent genome-wide association studies. Many of these have not been annotated yet in GO. Another suggestion from Larry Hunter was identifying metabolic disease genes.

In order to increase the number of genes annotated, Larry suggested that genes with fewer papers be selected. Rex pointed out the counter-argument that these genes may not be of general interest. However, this could help those doing high-throughput annotations. Judy pointed out that many organisms do provide breadth of annotation using IEA annotation and these data are available to the high-throughput community. In response to the concern about the total number of genes, David Hill pointed out that all the genes addressed in a publication are annotated during the process of curating for the Reference Genome gene. These genes, however, are not tracked but contribute to the overall goal of providing GO annotations based on experimental literature.

There was some discussion about the literature review process used to define “comprehensive” annotation. David Botstein suggested that a review be used for highly studied genes and the primary literature be used for genes with fewer papers. The caveat to this suggestion is that the experimental system is not clearly stated in review articles. Since the goal of the Reference Genome project is to capture experimental data in that organism, David Hill pointed out that the review is often a good place to start in order to identify the relevant publications that can be used for an annotation. Larry suggested we do an experiment to see if we can reach “comprehensive” annotation for a gene using ~25 publications.

Rex then described the need to identify the ortholog in each respective organism, because the human gene is the one on the list. Currently, individual curators identify the orthologs because they are best suited to understand how their organism’s genome compares to the other genomes. Tools such as YOGY, INPARANOID, OrthoMCL, TreeFam, and Homologene are used in order to find orthologs and not just domain conservation. The method and ortholog are recorded, but curators do not mark when they feel the assertion is wrong. In order to save some time for the curators, Larry suggested that a decision tree that reflects the curators’ decision process could be made into a tool. There was a little discussion about the software needs of the Reference Genome group and how the list of genes, as well as the ortholog calls made by the curators, will be integrated with AmiGO. Integrating the list of genes and providing an ability to search for Reference Genome genes in AmiGO will be important in publicizing this project.

Other aspects of the Reference Genome project briefly touched upon were curation consistency and how curation drives ontology development. Midori and David reiterated that ontology requests from the Reference Genome project are made high priority and there are very few requests left open.

The discussion on Reference Genomes concluded with the goals for this upcoming year. The majority of the conversation focused on continuing to make progress on the number of genes annotated and the strategy for identifying target genes. Mike Cherry again reminded the advisory board that there are other genes being annotated during this time independent of the reference genome effort.

With regard to identifying target genes, Barry inquired whether we have been communicating with potential users on the side of clinical medicine, especially those working on disease models, in order to help us prioritize which diseases to focus on. In addition, the individual user communities of each of the model organisms can provide a feedback mechanism. Judy remarked that we do have many outlets for gathering feedback to help us prioritize.

David Botstein commented that we should refine how we say we use OMIM. Not all genes in OMIM are well characterized and not all diseases in OMIM impact a significant percentage of the population. Before we publicize the Reference Genome project, we should identify the total number of genes from OMIM that fit our criteria, whether that be diseases that are well characterized or diseases with the highest number of afflicted people, etc. This may actually be a manageable subset of all OMIM records.

Other suggestions for identifying gene lists were the ENCODE set, key signaling pathways and other biochemical pathways. Not all these suggestions are mutually exclusive so a handful of genes can be picked from all these lists. There was some discussion on how it would be interesting to see if annotations from these other lists produce similar types of results as those from the disease gene list.

Another issue confronting the Reference Genome Project is resources: GOA curators do not get a break because they always have the largest number of genes to curate (since it’s all human) and these genes have the most literature. There was some discussion again about how little effort has been put into parsing the human literature in a sensible way.

Annotation Outreach

Slides

Talk slides: Media:outreach_princeton.ppt

Discussion

User Support and Providing Tools and Resources for Users

Slides

Discussion