Curator Discussion Minutes-20080414
I wrote these minutes for our MGI Wiki and thought I would share them with the rest of you. Feel free to change, elaborate, etc.
-Terry Meehan, Alleles & Phenotypes, MGI, Jackson Laboratory
The moderator, Mike Cherry, started the phone call with an informal roll call. Most of the major MODs (model organism databases) were represented, including MGI, RGD, SGD, ZFIN, WormBase, Candida Genome Database, and UniProt.
Discussion of C. elegans paper by Andrei Petcherski of WormBase
Brief discussion about the WormBase annotation process
1. Use GO terms
2. Look for specific techniques (transgene, RNAi, mutational analysis)
3. Note which figures are relevant for each
Interesting figure on the breakdown of the database
1. 4,232 papers annotated
2. Largest category is expression data (1,219) followed by RNAi (938)
Organism Identification
Metadata
During discussion of the C. elegans paper, Mary of RGD asked whether other curators had problems identifying the species of origin for a given protein or gene used in a paper. This generated a great deal of discussion:
Reasons for absence of information
1. RGD finds many researchers treat rat and mouse as the same thing
2. Speculation that authors want to blur the distinction so their research has more relevance to human disease
3. Others pointed out that many researchers don’t know the species of origin; they simply have an expression vector in a tube labeled "Beta-Actin"
4. To people in a particular subfield, such as reviewers, the species of origin may be obvious
5. Many researchers don’t care
Lack of species-of-origin data creates problems in databases
1. Time spent by curators on this issue is a waste of resources
- At the protein-protein interaction database, 50% of curators’ time is spent trying to figure this out
2. Good research is left out of databases
Metadata to collect
1. Authors should include at a minimum identifiers for the protein(s) or gene(s) used
- There was almost universal consensus for this
- TAIR and Plant Physiology collaboration
- PLOS journals are moving towards collecting metadata
2. Other pieces of metadata to include would be strain identification, alleles, and isoforms
- Curators agreed it’s helpful, but if you ask for too much, you’ll get nothing
- Strain data is often misunderstood by researchers
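As a rough illustration of the metadata discussed above, the following Python sketch models one record per gene or protein, with the identifier as the only required field and strain, alleles, and isoforms optional. The class name, field names, the explicit species field, and the placeholder identifier are all assumptions made for this example; no such format was agreed on during the call.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityMetadata:
    """Hypothetical record for one gene or protein used in a paper."""
    identifier: str                                     # required: database identifier
    species: str = ""                                   # assumption: stored explicitly, optional
    strain: str = ""                                    # optional
    alleles: List[str] = field(default_factory=list)    # optional
    isoforms: List[str] = field(default_factory=list)   # optional


def missing_required(record: EntityMetadata) -> List[str]:
    """Return the names of required fields that are empty or whitespace."""
    required = ("identifier",)
    return [name for name in required if not getattr(record, name).strip()]


# "EXAMPLE:0001" is a placeholder, not a real database identifier.
record = EntityMetadata(identifier="EXAMPLE:0001", species="Mus musculus")
print(missing_required(record))
```

Keeping the required set to the identifier alone reflects the concern raised in the discussion that asking authors for too much may yield nothing.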
Metadata collection and enforcement
Debate over whether there should be a central database or whether collection should be left to individual journals
1. Central database would allow for improvements to the data collection form without consulting journals
- Specific forms could be tailored to different areas of research
- Changes to forms could be made quickly
- Central gathering site is more stable
2. Journals should collect metadata
- Most journals would not give up control
- Too many forms at a central site would generate confusion
- Journals should instead be convinced to collect metadata as part of their manuscript submission process
- Journals have more enforcement power
- Discussion also included where metadata should be placed: in the methods section, in the keywords, or on the journal website.
Convincing Journals to collect metadata
1. Curators should present a united front
- Start generating a letter
- Further need for a biocurator society
2. Benefits to journal
- Makes journals more "searchable" by data-mining software
- Curated papers will get more hits
- SGD actually tracks how many people go to PubMed or journal homepages from their website (METRICS)
3. Raising awareness
- Get the scientific community involved
- Protein Structure Database resulted from the community realizing benefits of IDs
- Writing editorials
- Meetings
- Think of further ways this could benefit authors