BioCurator Discussion Topics

From GO Wiki
Revision as of 20:19, 18 April 2008

Project Information.

(Please edit below and add information about your project. Add your project if it is not listed below.)

Number of papers annotated per year?

  • CGD
  • dictyBase
  • DIP
about 1000/year; protein-protein interactions only
  • SGD
  • TAIR
  • RGD
  • UCL
  • UniProtKB
  • WormBase
  • Zfin

What should be in a publication?

  • CGD
  • dictyBase
  • DIP
bare minimum:
gene/protein/EST/etc. name (as used in the paper), database identifier, species (taxon id) provided for every gene/protein/DNA fragment used in the paper (including controls)
useful (but, IMHO, optional):
functional annotation as on the TAIR/Plant Physiology page
unrealistic:
formalized annotation of individual experiments
NOTE: If given a choice between the bare minimum now and useful additional information sometime in the future, I (Lukasz) would take the former, as consistently providing all DB identifiers already saves a LOT of work
  • SGD
  • TAIR
  • RGD
  • UCL
  • UniProtKB
  • WormBase
  • Zfin
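
As a rough illustration of the "bare minimum" proposed in the DIP entry above (name as used in the paper, database identifier, taxon id), here is a minimal Python sketch of what one supplement record could look like. The field names, the tab-separated layout and the accession strings are assumptions made up for this example; nothing here is an agreed format.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SupplementEntry:
    """One gene/protein/DNA fragment used in a paper (hypothetical layout)."""
    name_in_paper: str  # name exactly as the authors wrote it
    database: str       # source database, e.g. "UniProtKB"
    identifier: str     # accession in that database
    taxon_id: int       # NCBI Taxonomy id of the source species


def missing_fields(entry):
    """Return the names of bare-minimum fields left empty in an entry."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


def to_tsv(entries):
    """Serialise entries as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([f.name for f in fields(SupplementEntry)])
    for entry in entries:
        writer.writerow([getattr(entry, f.name) for f in fields(SupplementEntry)])
    return buffer.getvalue()


# Placeholder accessions, not real database records.
# Taxon ids: 9606 = human, 10116 = rat.
entries = [
    SupplementEntry("Trx", "UniProtKB", "EXAMPLE_ACC_1", 9606),
    SupplementEntry("Trx (control)", "UniProtKB", "", 10116),
]

for entry in entries:
    gaps = missing_fields(entry)
    if gaps:
        print(f"{entry.name_in_paper}: missing {', '.join(gaps)}")

print(to_tsv(entries))
```

A check like `missing_fields` could run when the authors fill in the form, so that an entry without an identifier or species is caught before the paper is published rather than by a curator afterwards.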

Suggested text for a letter to journal editors

(Please edit here and include your thoughts.)

  Here's a rough draft of the idea we (i.e. DIP, hopefully with support of 
as many databases as possible) would like to persuade the journal editors 
to adopt (Lukasz/DIP):
 
  Recent years have seen a rapid increase in the quantity of biological 
data published in research papers. As the volume of the data increases, 
it is of utmost importance to organize and combine it in a systematic way. 
This is one of the primary roles of the numerous biological databases: RCSB, 
GenBank, UniProt, SwissProt, DIP, IntAct, MINT, SGD (yeast), FlyBase, 
WormBase, TAIR (Arabidopsis), RGD (rat) and many others.
  With the exception of RCSB and GenBank, where direct data deposition by 
the authors is imposed by journal editors and/or funding agencies, 
biological databases generally depend on curators to manually extract individual pieces 
of information from research papers for database deposition. This curation is
labor-intensive, and curators agree that the major stumbling block to 
efficient curation of biological literature is incomplete and/or ambiguous 
information about the identity of the biomolecules and genes studied.  Every curator can provide horror stories of tracing the 
identity of a single protein used in a paper through a chain of 'prepared 
as described in...' and 'obtained as a gift from...' phrases only to 
discover at the end of the trail that it is still impossible to identify 
the protein as coming from human or rat without contacting the authors (1). 
The problem seems to be universal across every journal and every database 
with which we have contact.
  Over the years, a number of researchers have raised this issue in 
numerous commentaries, reviews and editorials, mostly without any response. 
Two recent initiatives, however, seem to suggest that the situation is 
changing. TAIR initiated a partnership with the journal Plant Physiology (2) 
aimed at capturing as much functional data as possible with minimal burden 
imposed on both the journal editorial office and the authors. Similarly, FEBS 
Letters (3), in collaboration with the MINT database, attempts to recruit 
authors to capture protein interaction data.
  While we are quite excited about these two attempts, we realize that 
the scope of the problem is much broader.  Curation efforts of many 
individual databases would immediately become more efficient if a list 
of biomolecules and genes, each with a reference to the relevant database, 
were published as a simple electronic supplement available to every journal 
reader. We believe it would translate into rapid dissemination of the 
information from such papers to many diverse databases. As every database 
references the original source of data, the supplement would improve database 
coverage and increase the visibility of both individual articles and the 
journals in which they are published.
  We (as DIP, but also as a member of the biocurator forum that includes CGD, 
dictyBase, DIP, SGD, TAIR, RGD, UniProt, WormBase, Zfin; and as a member of 
the IMEx consortium of interaction databases grouping DIP, IntAct, MINT, 
MPact, BioGRID) wonder if Journal XXX would be willing to implement a policy 
requiring the authors of accepted papers to prepare, with the help of the 
database community, an electronic supplement file listing all the biomolecules 
and genes studied in the manuscript. One possible approach would be to implement 
a form similar to the one prepared by TAIR for Plant Physiology:
     http://www.aspb.org/publications/tairsubmission.cfm
that would produce, as its output, a file to be included within the electronic 
supplement.
 
References: 
(1) most recent example from PNAS:
    Bartsch S, Monnet J, Selbach K, Quigley F, Gray J,
    von Wettstein D, Reinbothe S, Reinbothe C
    PNAS 105(12):4933-8 (2008)
    Three thioredoxin targets in the inner envelope membrane of 
    chloroplasts function in protein import and chlorophyll
    metabolism.

   There's absolutely no way to identify the Trx protein used in the
   experiments described in Fig 1. As a result, 6 interactions of
   this protein are lost to the DIP, IntAct and MINT databases; the same
   holds for any other functional data reported in the paper.
 
(2) Plant Physiology 146:1022-1023 (2008)
   Plant Physiology and TAIR Partnership

(3) Superti-Furga G, Wieland F, Cesareni G
   Finally: The digital, democratic age of scientific abstracts
   FEBS Letters 582(8),1169