Curator Discussion Minutes-20080414

Revision as of 14:22, 23 April 2008

I wrote these minutes for our MGI Wiki and thought I would share with the rest of you. Feel free to change, elaborate, etc.
-Terry Meehan, Alleles & Phenotypes, MGI, Jackson Laboratory

The moderator, Mike Cherry, started the phone call with an informal roll call. Most of the major model organism databases (MODs) were represented, including MGI, RGD, SGD, ZFIN, WormBase, Candida Genome Database, and UniProt.

Discussion of C. elegans paper by Andrei Petcherski of WormBase

Brief discussion about the WormBase annotation process
1. Use GO terms
2. Look for specific techniques (transgene, RNAi, mutational analysis)
3. Note which figures are relevant for each
Interesting figures on the breakdown of the database
1. 4,232 papers annotated
2. Largest category is expression data (1,219) followed by RNAi (938)
3. Organism Identification

Metadata

During discussion of the C. elegans paper, Mary of RGD asked whether other curators had problems identifying the species of origin for a given protein or gene used in a paper. This generated a great deal of discussion:

Reasons for absence of information

1. RGD finds many researchers treat rat and mouse as the same thing
2. Speculation that authors want to blur the distinction so their research has more relevance to human disease
3. Others pointed out that many researchers don’t know the species of origin; they simply have an expression vector in a tube labeled "Beta-Actin"
4. To people in a particular subfield, such as reviewers, the species of origin may be obvious
5. Many researchers don’t care

Lack of species origin data creates problems in databases

1. Time spent by curators on this issue is a waste of resources

At the protein-protein interaction database, 50% of curators’ time is spent trying to figure this out

2. Good research is left out of databases

Metadata to collect

1. Authors should include, at a minimum, identifiers for the protein(s) or gene(s) used

There was almost universal consensus for this
TAIR and Plant Physiology collaboration
PLOS journals are moving towards collecting metadata

2. Other pieces of metadata to include would be strain identification, alleles, and isoforms

Curators agreed it’s helpful, but if you ask for too much, you’ll get nothing
Strain data is often misunderstood by researchers
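
The metadata discussed above could be captured in a simple record. The following is only a rough sketch of what such a submission record might look like, not a format anyone on the call proposed. The class name `ReagentMetadata`, its field names, the helper `missing_required`, and the placeholder identifier `EXAMPLE:0001` are all hypothetical. Only the identifier is treated as required, matching the consensus in item 1; everything else is optional, in keeping with the warning about asking for too much.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReagentMetadata:
    """Hypothetical record for one gene or protein used in a paper."""
    identifier: str                 # required: database identifier for the gene/protein
    species: str = ""               # optional: species of origin, if known
    strain: str = ""                # optional: strain identification
    alleles: List[str] = field(default_factory=list)  # optional: alleles used
    isoform: str = ""               # optional: isoform

# Assumption: only the identifier is mandatory, per the consensus in item 1.
REQUIRED_FIELDS = ("identifier",)


def missing_required(record: ReagentMetadata) -> List[str]:
    """Return the names of required fields that were left blank."""
    return [name for name in REQUIRED_FIELDS
            if not getattr(record, name).strip()]


# Usage: a record with only a (placeholder) identifier passes the minimal check.
record = ReagentMetadata(identifier="EXAMPLE:0001")
print(missing_required(record))
```

Keeping the required set small and separate from the record definition means optional fields such as strain or isoform could later be made mandatory without changing the record itself.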

Metadata collection and enforcement

Debate over whether there should be a central database or whether collection should be left to individual journals

1. Central database would allow for improvements to the data collection form without consulting journals

Specific forms could be tailored to different areas of research
Changes to forms could be made quickly
Central gathering site is more stable

2. Journals should collect metadata

Most journals would not give up control
Too many forms at a central site would generate confusion
Journals should instead be convinced to collect metadata as part of their manuscript submission process
Journals have more enforcement power
Discussion also covered where metadata should be placed: in the methods section, in the keywords, or on the journal website.

Convincing Journals to collect metadata

1. Curators should present a united front

Start generating a letter
Further need for a biocurator society

2. Benefits to journal

Makes journals more "searchable" by data-mining software
Curated papers will get more hits
SGD actually tracks how many people go to PubMed or journal homepages from their website (METRICS)

3. Raising awareness

Get the scientific community involved
Protein Structure Database resulted from the community realizing benefits of IDs
Writing editorials
Meetings
Think of further ways this could benefit authors