Here is the GOA group's practice of counting publications.

We fill in only columns *L* and *O*; the reasons are given below. If it is absolutely necessary for the metrics to succeed, we would be willing to count the number of publications read (column *N*). However, this would take more effort on our part and slow down the curation process somewhat. If columns *L* and *O* are sufficient for your calculations, we will continue to provide only those numbers. If you feel strongly that one of the other columns should also be completed, we will start to fill in column *N*, but note that we will not be able to complete this column for proteins already annotated.

L - Total Publications

Total number of papers found by a PubMed search using the gene/protein name and its synonyms, limited to human genes and the English language. This number will contain *many* false positives, since the references in this list could be a) reviews, b) clinical papers that describe the human disease but give no functional information about the gene/protein, or c) papers that merely mention the human gene (e.g. 'the mouse gene is homologous to the human gene') but are actually about the mouse gene.
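As a rough illustration of the kind of search described for column *L*, the sketch below assembles a PubMed query string from a gene/protein name and its synonyms. The function name `build_pubmed_query`, the example gene and synonyms, and the exact choice of filters (`humans[MeSH Terms]` and `english[Language]`) are assumptions made for illustration; the search the curators actually run may be phrased differently.

```python
def build_pubmed_query(name, synonyms=()):
    """Build a PubMed query string for a gene/protein name and its synonyms.

    Hypothetical helper: it only assembles the query text and does not
    contact PubMed.
    """
    # Quote each term so multi-word names are searched as phrases.
    terms = [f'"{term}"' for term in (name, *synonyms)]
    name_clause = " OR ".join(terms)
    # Assumed filters: restrict to human studies and English-language papers.
    return f"({name_clause}) AND humans[MeSH Terms] AND english[Language]"


# Example gene name and synonyms, chosen only for illustration.
query = build_pubmed_query("TP53", ["p53", "tumor protein p53"])
print(query)
```

The number of hits returned for such a query would correspond to column *L*. Because the query matches any paper mentioning the name, it cannot exclude reviews, clinical papers, or papers that only mention the human gene in passing.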

M - Triaged papers

We do not count a triaged set of papers with GO annotation, or even those specifically about the human protein. We do a basic triage, reading the title and abstract of each paper in the PubMed search results to find papers potentially suitable for curation. These are not counted as a triaged number because quite often we find more citations whilst reading these papers (see *N* below) or, on reading a paper in full, it becomes obvious that it is not about the human protein. We also look at what is already known about the protein, e.g. by looking at GeneRIFs, to decide which papers to read, and we generally read the most recent papers first.

N - Number of papers read

We do not count the number of papers read, since this is not the same as the number of papers selected for reading from the PubMed list (described in *M* above). Very often a citation that is not in the original PubMed list appears in a paper, and the cited paper is quickly scanned to see whether it contains any relevant information to curate. We consider this 'reading a paper' as well, but we do not keep track of how many times it is done. Also, as mentioned above, we may read a paper in full only to discover that the authors used the mouse protein instead of the human one.

O - Number of papers producing GO annotations

Number of papers actually used to produce a GO annotation.