2007-09 SAB minutes

From GO Wiki
Revision as of 07:56, 2 October 2007

Ontology Development

Slides

Discussion

David Hill presented the report on the major areas of development over the past year. One of the first major discussion points was how to continue holding content meetings given limited resources.

The content meetings are extremely productive because preliminary work goes into training the scientists in ontology development and into distributing proposed ontology changes among the group before the meeting is held. During the face-to-face meetings, there is a lot of discussion aimed at reaching a consensus view of the ontology and making the changes. These meetings involve 10-15 people and cost approximately $15,000.

Submitting an R13 meeting/conference grant was one suggestion. Another suggestion was active outreach to industry: we could write to heads of bioinformatics and pharma groups to see which areas they want expanded and, in exchange, propose that they host a meeting working on those areas. Reaching companies through PR and marketing was strongly discouraged. There is an EBI industry group that Michael Ashburner started that might be tapped for contacts.

The work on cross-products was commended because it moves GO further towards being computed upon.

To increase our efficiency in structural work, Larry Hunter suggested collaborating more with NCBO and connecting with the wider ontology world. Chris Mungall pointed out that GO already provides resources that allow it to be connected to the wider ontology world, such as publishing in RDF for the semantic web and providing a mapping to OWL. David Botstein cautioned against representing ontologies for the sake of theoretical frameworks; GO should remain grounded in biology and content. There was discussion about the production status of some of the tools NCBO currently provides.

Larry Hunter asked about quality control metrics. John Day-Richter gave a demo of OBO-Edit showing several built-in tools for maintaining quality control on the ontology. OBO-Edit has a reasoner that identifies errors made by the ontology editor. In addition, editors can add their own filters to identify errors, such as disjoint errors. OBO-Edit is used to edit ontologies both by GO and by other ontology groups, such as the Jackson Laboratory phenotype curators.

Reference Genome

Slides

Discussion

Rex Chisholm presented the progress of the Reference Genome group after approximately a year of work. The focus has been on “comprehensive” annotation because it is feasible.

Larry Hunter asked how many papers are linked to a gene. Because the processes for obtaining the literature sets differ so much, the individual database groups report their numbers on the Google spreadsheet.

The AZ representative asked whether any text-mining processes have been incorporated to identify appropriate papers. Although the MODs have had some collaboration with text-mining groups, the papers are all reviewed manually. Larry suggested that the MODs could work with such groups to help identify papers in a common way.

There was significant discussion about how priorities should be set for the list of genes. Since OMIM and the RGD disease portal are currently being used to help set priorities, there may be fewer genes to annotate for non-mammalian organisms. Simon suggested prioritizing genes identified in recent genome-wide association studies, many of which have not yet been annotated in GO. Larry Hunter also suggested identifying metabolic disease genes.

To increase the number of genes annotated, Larry suggested selecting genes with fewer papers. Rex pointed out the counter-argument that these genes may not be of general interest; however, this could help those doing high-throughput annotation. Judy pointed out that many organisms do provide breadth of annotation through IEA annotations, and these data are available to the high-throughput community. In response to the concern about the total number of genes, David Hill pointed out that all the genes addressed in a publication are annotated while curating for a Reference Genome gene. These additional genes are not tracked, but they contribute to the overall goal of providing GO annotations based on experimental literature.

There was some discussion about the literature review process used to define “comprehensive” annotation. David Botstein suggested using reviews for highly studied genes and the primary literature for genes with fewer papers. The caveat is that review articles do not clearly state the experimental system. Since the goal of the Reference Genome project is to capture experimental data in each organism, David Hill pointed out that a review is often a good starting point for identifying the relevant publications that can be used for an annotation. Larry suggested an experiment to see whether “comprehensive” annotation can be reached for a gene using ~25 publications.

Rex then described the need to identify the ortholog in each organism, because the gene on the list is the human gene. Currently, individual curators identify the orthologs, since they are best suited to understand how their organism’s genome compares to the other genomes. Tools such as YOGY, INPARANOID, OrthoMCL, TreeFam, and Homologene are used to find true orthologs rather than mere domain conservation. The method and ortholog are recorded, but curators do not mark cases where they feel the assertion is wrong. To save curators time, Larry suggested that a decision tree reflecting the curators’ decision process could be made into a tool. There was brief discussion about the software needs of the Reference Genome group and how the list of genes, as well as the ortholog calls made by the curators, will be integrated with AmiGO. Integrating the list of genes and the ability to search for Reference Genome genes in AmiGO will be important for publicizing this project.

Other aspects of the Reference Genome project briefly touched upon were curation consistency and how curation drives ontology development. Midori and David reiterated that ontology requests from the Reference Genome project are given high priority and that very few requests remain open.

The discussion on Reference Genomes finished with goals for the upcoming year. Most of the conversation focused on continuing to increase the number of genes annotated and on the strategy for identifying target genes. Mike Cherry again reminded the advisory board that other genes are being annotated during this time, independently of the Reference Genome effort.

With regard to identifying target genes, Barry asked whether we have been communicating with potential users in clinical medicine, especially those working on disease models, to help us prioritize which diseases to focus on. In addition, the user communities of each model organism can provide a feedback mechanism. Judy remarked that we do have many outlets for feedback to help us prioritize.

David Botstein commented that we should refine how we describe our use of OMIM. Not all genes in OMIM are well characterized, and not all diseases in OMIM affect a significant percentage of the population. Before we publicize the Reference Genome project, we should identify the total number of OMIM genes that fit our criteria, whether those are well-characterized diseases, diseases affecting the largest number of people, etc. This may turn out to be a manageable subset of all OMIM records.

Other suggestions for gene lists were the ENCODE set, key signaling pathways, and other biochemical pathways. These suggestions are not mutually exclusive, so a handful of genes could be picked from each list. Participants noted it would be interesting to see whether annotations from these other lists produce results similar to those from the disease gene list.

Another issue confronting the Reference Genome project is resources: GOA curators never get a break, because they always have the largest number of genes to curate (all of them human) and these genes have the most literature. There was some further discussion about how little effort has been put into parsing the human literature in a sensible way.

Annotation Outreach

Slides

Talk slides: Media:outreach_princeton.ppt

Discussion

Presented by Jennifer Deegan. The purpose of the Annotation Outreach group is to get more groups annotating and to see whether we can get their annotations into the GO database. Accomplished within the past year:

  • Added an SOP for new users to the GO website, containing flowcharts to help with IEA, ISS and manual annotation
  • Posted a list of the meetings and conferences the Outreach group has attended on the wiki
  • Produced a DAG map of the annotation groups we are in contact with

The group would like feedback from the SAB with respect to

  • How to help people with no funding
  • How to help people who don’t like GO (function/process links missing; too many obsolete terms)
  • How to help people who don’t want to use the full complexity of GO (too many terms; the use of references and evidence codes) and would like to annotate a genome in two days (which is being done by some groups)

The SAB questioned the validity of annotations done so quickly. Barry asked whether these ‘fairly correct’ annotations are actually incorrect or just high level; if they are generally correct, maybe we can use some of their strategies. Jennifer said that the annotations tend to be incomplete. Mike Cherry pointed out that we cannot really expect much more from these groups without dedicated curators: if the ‘quick and dirty’ approach is sufficient for their purposes, it is hard to push them to do more, since their resources for annotation are limited. Rex mentioned that we have a ‘mentors system’ in which experienced curators mentor new groups; however, so far those collaborations have not returned any data to GO, so maybe they are not worth the investment. Craig NM said that we need to define which qualities make an annotation ‘useful’. Larry Hunter added that we need to consider other models for integrating data: one is to use the ‘weak’ annotations these groups provide; another is to hold ‘annotation jamborees’ where experts give input and expert curators then use that data (this has been done, but the GOC is concerned because the annotations resulting from jamborees rarely get updated). Judy suggested that an alternative to genome-centered jamborees would be to meet with an expert (for example, in diabetes) and annotate a set of genes with their help.

David (via Skype) asked whether adding Function/Process links to the ontology would help curation.

There was also a more general discussion about the funding of sequencing projects without support for annotation. That is outside the GOC’s control, but we would like to be able to provide data for better IEAs. Larry asked what it costs to curate a genome: if a researcher wants funding to do that, how much money does he or she need? Nobody was able to give a precise figure because too many factors are involved (number of papers, etc.).


User Support and Providing Tools and Resources for Users

Slides

User Advocacy: Media:GO-User-Advocacy-SAB2007-final.ppt

Discussion

Presented by Eurie Hong. The purpose of the User Advocacy group is to provide communication channels between the GOC and the research community. Tools put in place to that end:

  • a quarterly newsletter covering papers that use GO, a gene of the quarter, and new software development; it is sent to GO friends and GO databases, and the MODs also highlight the newsletter
  • GO help: a group of curators answers emails sent to GO help, either directly or by forwarding them to the appropriate group. Volume ranges from 20 to 140 emails per month (counting all emails, queries and responses), lately ~70 per month; the number of requests has increased since we switched to the web form. Emails are usually answered within 24 hours, but resolving the problem can take longer. Larry pointed out that we should track the actual number of requests, not just the total number of emails.
  • website development
  • documentation (FAQ, AmiGO help, minimal standards for third-party tools to better organize the tools page); workshops (see slides)

Future Plans

  • We will develop tutorials with the help of the Moodle group (moodle.org)
  • We will no longer hold general users meetings; their scope was so wide that they were probably not helping anyone. In the future we will hold more focused meetings, for example maintaining the MGED meeting
  • We will make a set of core slides for presentations to help communicate the core GO ideas better

Suggestions from the SAB:

  • Craig: you could also make videos.
  • Larry: you should define the communities you are trying to target. The ‘new’ GO users are a very varied group, so explaining and distinguishing them better would help target their needs; there are also pharmaceutical companies and bio-ontology people [the Outreach group has started doing this; see SOPs]. Barry pointed out that it would also be helpful to know how much the annotations are actually used (the GOC thinks this is hard to determine); Larry suggested tracking papers that use GO and defining a set of use cases (in addition to microarrays). Judy noted that we do have some of that data (http://geneontology.org/cgi-bin/biblio.cgi): close to 1500 papers, classified in broad categories, about half of them on microarrays. David State is concerned that GO usage drives its own development: for example, GO doesn’t cover pathology, and therefore there are no users working on pathology.
  • Barry: one common criticism is that GO is full of mistakes; critics need to send us suggested corrections, and we should recruit people who could send corrections. Craig suggested using a wiki-like medium to let people make suggestions; Suzi replied that there is a wiki in place that can be used that way.
  • The discussion about errors in GO led Larry to remark that people were not really aware of GO’s dynamism; the AZ representative pointed out that people were also put off by too much dynamism and by terms becoming obsolete.


Production services:

Software:

Discussion