BioCurator Discussion Topics
Revision as of 20:19, 18 April 2008
Project Information.
(Please edit below and add information about your project. Add your project if it is not listed below.)
Number of papers annotated per year?
- CGD
- dictyBase
- DIP
- about 1000/year; protein-protein interactions only
- SGD
- TAIR
- RGD
- UCL
- UniProtKB
- WormBase
- Zfin
What should be in a publication?
- CGD
- dictyBase
- DIP
- bare minimum:
- gene/protein/EST/etc name (as used in the paper), database identifier, species (taxon id) provided for every gene/protein/DNA fragment used in the paper (including controls)
- useful (but, IMHO, optional):
- functional annotation as on the TAIR/Plant Physiology page
- unrealistic:
- formalized annotation of individual experiments
- NOTE: If given a choice between a bare minimum now and useful additional information sometime in the future, I (Lukasz) would take the former, as consistently providing all DB identifiers already saves a LOT of work
- SGD
- TAIR
- RGD
- UCL
- UniProtKB
- WormBase
- Zfin
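As an illustration of DIP's "bare minimum" above, here is a minimal sketch of what a per-paper supplement file could look like if written as tab-separated text. Nothing here is an agreed format: the `MoleculeEntry` record, its column names, the `write_supplement` and `missing_fields` helpers, and the placeholder accession are all assumptions made for the example. Only the three required pieces of information (name as used in the paper, database identifier, taxon id) come from the list above.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class MoleculeEntry:
    """One gene/protein/DNA fragment used in a paper (hypothetical layout)."""
    name_in_paper: str        # name exactly as used in the manuscript
    database: str             # database the identifier belongs to
    identifier: str           # accession in that database
    taxon_id: int             # NCBI Taxonomy id of the source species
    is_control: bool = False  # controls are listed too


def write_supplement(entries):
    """Return the entries as tab-separated text, one row per molecule."""
    columns = [f.name for f in fields(MoleculeEntry)]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for entry in entries:
        writer.writerow([getattr(entry, c) for c in columns])
    return buffer.getvalue()


def missing_fields(entry):
    """List the bare-minimum fields that were left empty for an entry."""
    required = ("name_in_paper", "database", "identifier", "taxon_id")
    return [name for name in required if not getattr(entry, name)]


if __name__ == "__main__":
    # "ACC_PLACEHOLDER_1" is a made-up accession; 9606 is the taxon id for human.
    entries = [MoleculeEntry("ProteinX", "UniProtKB", "ACC_PLACEHOLDER_1", 9606)]
    print(write_supplement(entries))
    # An entry with no database, identifier or species would be flagged:
    print(missing_fields(MoleculeEntry("ProteinX", "", "", 0)))
```

A flat, one-row-per-molecule file is used here only because it is easy for authors to fill in and for any database to parse; a web form like the TAIR one could produce the same rows.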
Suggested text for a letter to journal editors
(Please edit here and include your thoughts.)
Here's a rough draft of the idea we (ie DIP, hopefully with support of as many databases as possible) would like to persuade the journal editors to adopt (Lukasz/DIP):

Recent years have seen a rapid increase in the quantity of biological data published in research papers. As the volume of the data increases, it is of utmost importance to organize and combine it in a systematic way. This is one of the primary roles of the numerous biological databases: RCSB, GenBank, UniProt, SwissProt, DIP, IntAct, MINT, SGD (yeast), FlyBase, WormBase, TAIR (Arabidopsis), RGD (rat) and many others.

With the exception of RCSB and GenBank, where direct data deposition by the authors is imposed by journal editors and/or funding agencies, biological databases generally depend on curators to manually extract individual pieces of information from research papers for database deposition. This curation is labor-intensive, and curators agree that the major stumbling block to efficient curation of biological literature is incomplete and/or ambiguous information about the identity of the biomolecules and genes studied. Every curator can provide horror stories of tracing the identity of a single protein used in a paper through a chain of 'prepared as described in...' and 'obtained as a gift from...' phrases, only to discover at the end of the trail that it is still impossible to identify the protein as coming from human or rat without contacting the authors (1). The problem seems to be universal across every journal and every database with which we are in contact.

Over the years, a number of researchers have raised this issue in numerous commentaries, reviews and editorials, mostly without any response. Two recent initiatives, however, suggest that the situation is changing. TAIR initiated a partnership with the journal Plant Physiology (2) aimed at capturing as much functional data as possible with minimal burden imposed on both the journal editorial office and the authors. Similarly, FEBS Letters (3), in collaboration with the MINT database, is attempting to recruit authors to capture protein interaction data.

Whereas we are quite excited about these two attempts, we realize that the scope of the problem is much broader. Curation efforts of many individual databases would immediately become more efficient if a list of biomolecules and genes, each with a reference to the relevant database, were published as a simple electronic supplement available to every journal reader. We believe it would translate into rapid dissemination of the information from such papers to many diverse databases. As every database references the original source of data, the supplement would improve database coverage and increase the visibility of both individual articles and the journals in which they are published.

We (as DIP, but also as a member of the biocurator forum that includes CGD, dictyBase, DIP, SGD, TAIR, RGD, UniProt, WormBase, Zfin; and as a member of the IMEx consortium of interaction databases grouping DIP, IntAct, MINT, MPact, BioGRID) wonder if Journal XXX would be willing to implement a policy requiring the authors of accepted papers to prepare, with the help of the database community, an electronic supplement file listing all the biomolecules and genes studied in the manuscript. One possible approach would be to implement a form similar to the one prepared by TAIR for Plant Physiology:
http://www.aspb.org/publications/tairsubmission.cfm
that would produce, as its output, a file to be included in the electronic supplement.

References:

(1) Most recent example from PNAS: Bartsch S, Monnet J, Selbach K, Quigley F, Gray J, von Wettstein D, Reinbothe S, Reinbothe C. Three thioredoxin targets in the inner envelope membrane of chloroplasts function in protein import and chlorophyll metabolism. PNAS 105(12):4933-8 (2008). There is absolutely no way to identify the Trx protein used in the experiments described in Fig 1. As a result, 6 interactions of this protein are lost to the DIP, IntAct and MINT databases; the same holds for any other functional data reported in the paper.

(2) Plant Physiology and TAIR Partnership. Plant Physiology 146:1022-1023 (2008).

(3) Superti-Furga G, Wieland F, Cesareni G. Finally: The digital, democratic age of scientific abstracts. FEBS Letters 582(8),1169.