Caenorhabditis elegans (Retired)
Here is a summary of how WormBase curators fill in the publication counts in the reference genome spreadsheet (using the column labels from the latest version of the spreadsheet).
K - Total Publications
For this metric, we count the total number of research articles associated with a gene in WormBase. We are NOT including reviews, meeting abstracts, Worm Breeder's Gazette articles, or WormBook chapters in this number.
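The counting rule for column K can be illustrated with a minimal Python sketch. It assumes that each paper record carries a publication-type label and the set of genes associated with it. The `Paper` record, the type labels and the sample IDs are hypothetical and do not reflect WormBase's actual data model.

```python
from dataclasses import dataclass

# Hypothetical type labels for the excluded categories; WormBase's own vocabulary may differ.
EXCLUDED_TYPES = {"review", "meeting_abstract", "worm_breeders_gazette", "wormbook_chapter"}


@dataclass(frozen=True)
class Paper:
    paper_id: str      # hypothetical identifier
    paper_type: str    # e.g. "research_article" or one of EXCLUDED_TYPES
    genes: frozenset   # gene names associated with this paper


def total_publications(papers, gene):
    """Column K: count papers linked to `gene`, skipping reviews, abstracts, WBG and WormBook."""
    return sum(
        1
        for p in papers
        if gene in p.genes and p.paper_type not in EXCLUDED_TYPES
    )


# Hypothetical sample records, for illustration only.
sample = [
    Paper("p1", "research_article", frozenset({"unc-22"})),
    Paper("p2", "review", frozenset({"unc-22"})),
    Paper("p3", "research_article", frozenset({"lin-12", "unc-22"})),
]
```

With this sample, `total_publications(sample, "unc-22")` counts p1 and p3 but not the review p2, giving 2.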
At WormBase, genes are associated with papers in two ways: 1) via a script that automatically associates genes mentioned in the abstract, and 2) via manual association as part of our first-pass paper curation.
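The automatic association step can be pictured as dictionary matching against the abstract text. The sketch below is hypothetical and is not the WormBase script: it assumes a list of known gene names and matches them case-sensitively as whole tokens, so that, for example, `unc-2` is not reported when only `unc-22` appears.

```python
import re


def genes_mentioned(abstract, known_genes):
    """Return the known gene names that occur as whole tokens in `abstract`.

    Simplifying assumptions: matching is case-sensitive (so protein names such
    as UNC-22 are not matched), and a name must not be preceded by a word
    character or hyphen, nor followed by a word character.
    """
    found = set()
    for gene in known_genes:
        pattern = r"(?<![\w-])" + re.escape(gene) + r"(?!\w)"
        if re.search(pattern, abstract):
            found.add(gene)
    return found
```

For example, `genes_mentioned("Mutations in unc-22 disrupt twitchin.", {"unc-2", "unc-22", "lin-12"})` returns `{"unc-22"}`. The manual first-pass curation then catches genes that the abstract does not name in this form.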
Ranjana and I routinely cross-check the list of papers in WormBase with those returned from both PubMed and Textpresso (http://textpresso.org) searches to make certain that no papers are missing from the WormBase bibliography.
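The cross-check is essentially a set difference. The minimal sketch below assumes that all three lists have already been mapped to a common paper identifier (in practice WormBase, PubMed and Textpresso use different IDs, and that mapping is not shown). It reports each missing paper together with the search source or sources that found it.

```python
def missing_from_bibliography(wormbase_ids, pubmed_ids, textpresso_ids):
    """Map each paper ID found by a search but absent from WormBase to the sources that found it."""
    wormbase = set(wormbase_ids)
    missing = {}
    for source, ids in (("PubMed", pubmed_ids), ("Textpresso", textpresso_ids)):
        for paper_id in set(ids):  # set() ignores duplicates within one search result
            if paper_id not in wormbase:
                missing.setdefault(paper_id, []).append(source)
    return missing
```

For example, with WormBase holding `p1` and `p2`, PubMed returning `p2` and `p3`, and Textpresso returning `p3` and `p4`, the result flags `p3` (found by both searches) and `p4` (found only by Textpresso).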
L - Triaged papers
We don't really triage papers for GO curation. We do, however, prioritize papers for curation based on the scores they receive from Textpresso keyword searches on the gene name. Higher-scoring papers are curated first and lower-scoring papers later.
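The prioritization amounts to sorting papers by score in descending order. In this minimal sketch the input is assumed to be a hypothetical mapping from paper ID to a numeric Textpresso score; ties are broken by paper ID only so that the queue order is reproducible.

```python
def curation_queue(scores):
    """Order paper IDs by Textpresso score, highest first; ties are broken by paper ID."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [paper_id for paper_id, _ in ranked]
```

For example, `curation_queue({"p1": 2.5, "p2": 7.0, "p3": 2.5})` puts `p2` first, followed by `p1` and `p3`.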
Because Textpresso is very efficient at returning *all* papers that mention a gene anywhere in the full text, its results overestimate the number of papers likely to contain GO-curatable information. To get a rough approximation of that number, we also use the results of PubMed searches: a research article returned by a PubMed search on the gene name and the word 'elegans' is likely to contain significant primary research information. We record this number in column L and are investigating how good an indicator it is for assessing potential curation workload.
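One way to obtain such a count programmatically is NCBI's E-utilities `esearch` service, which returns only the hit count when called with `rettype=count`. The sketch below is an assumption, not the curators' documented procedure: it builds the request URL for a query of the form "gene AND elegans" and parses the `<Count>` element from a response, but it does not make the network call itself. Note that this count includes every publication type PubMed returns, so it remains an approximation.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_count_url(gene):
    """Build an esearch URL that asks PubMed only for the number of hits."""
    # Assumed query form: the gene name plus the word 'elegans'.
    params = {"db": "pubmed", "term": f"{gene} AND elegans", "rettype": "count"}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_count(xml_text):
    """Extract the integer <Count> value from an esearch XML response."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))
```

Fetching the URL (subject to NCBI's usage guidelines) and passing the response body to `parse_count` yields the number recorded in column L.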
M - Number of papers read
This is the number of papers we have read and checked for possible GO annotations.
N - Number of papers producing GO annotations
This is the number of papers from which we actually made GO annotations.