Drosophila melanogaster (Retired)

Revision as of 14:36, 8 February 2007

Return to Reference Genome Annotation Project Main Page

Return to Reference Genome Publication Counts



L - Total Publications

This is the total number of publications about the gene based on 3 sources:

1. Publications that are already associated with the gene in FlyBase. Only primary research literature is included in this list (reviews, abstracts, personal communications, FlyBase analysis refs, Drosophila Information Service reports, etc. are excluded). These papers are known to contain some mention of the gene because they have already been curated by FlyBase; however, that curation may pre-date GO.

2. Publications (excluding reviews) identified in a PubMed search (title/abstract/MeSH terms) for all identifiers associated with that gene and/or gene product in Drosophila. Some searches are modified (e.g. short synonyms ignored) to reduce the number of false positives. Where there are very few or no PubMed hits, a full text search is carried out at PubMed Central.

3. Any additional publications that are cited within other publications and were used for GO annotation. Other papers that were 'skimmed' in the annotation process are not included in the total publications count.
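The PubMed search in source 2 can be sketched as a query-string builder. This is only an illustration, not the query actually used: the function name, the `min_length` cut-off for "short synonyms", the placeholder identifiers and the way the search is restricted to Drosophila are all assumptions. The field tags `[tiab]` (title/abstract), `[mh]` (MeSH terms) and `[pt]` (publication type) are standard PubMed search tags.

```python
def build_pubmed_query(identifiers, organism="Drosophila", min_length=3):
    """Build a PubMed query string from a gene's identifiers.

    Hypothetical helper: `min_length` is an assumed cut-off for what
    counts as a 'short synonym'; the source does not give one.
    """
    # Ignore short synonyms, which tend to produce false positives.
    kept = [name for name in identifiers if len(name) >= min_length]
    if not kept:
        raise ValueError("no identifiers left after dropping short synonyms")

    # Search each remaining identifier in title/abstract and MeSH terms.
    terms = " OR ".join(f'"{name}"[tiab] OR "{name}"[mh]' for name in kept)

    # Restrict to the organism (assumed form) and exclude reviews.
    return f"({terms}) AND {organism}[All Fields] NOT review[pt]"


# Placeholder identifiers, not real gene names.
query = build_pubmed_query(["examplegene", "eg"])
print(query)
```

If every identifier is short, the function raises instead of returning an empty search; in that situation the text above suggests falling back to a full text search at PubMed Central.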

M - Triaged papers

FlyBase do not triage papers in the course of normal curation, and no link is made between papers and genes until the paper is fully curated. As a result, I've had to work out my own triage process for this project; the process varies somewhat depending on the number of publications and the status of our current GO data.

For a low number of publications (<10), I review the titles and abstracts of the PubMed hits and eliminate only the obvious false positives. In this case M is very similar to N.

For a modest number of publications (<100), I review the titles and abstracts of the PubMed hits, eliminate the obvious false positives and prioritise the ones that look most promising for GO data. In this case the difference between M and N is more marked.

For a large number of publications (>100), even triaging them is too time-consuming, so I tackle annotating these genes via reviews. Historically, FlyBase has curated a lot of GO data from reviews; where there are many existing NAS or TAS annotations, I would normally start by tracing those back to the source. Frequently this approach is sufficient to achieve 'complete' annotation. For these cases, M = N as there is no triage.

Note: TAS/NAS annotations are removed where appropriate for all genes, not just those associated with many publications.
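The three tiers described above can be summarised as a small lookup. This is a rough sketch only: the function name and the returned labels are made up for illustration, and the handling of exactly 100 publications is an assumption, since the text gives "<100" and ">100" but does not say which tier 100 itself falls into.

```python
def triage_strategy(publication_count):
    """Return a short description of the triage approach for a gene.

    Hypothetical helper mirroring the thresholds in the text.
    """
    if publication_count < 0:
        raise ValueError("publication_count must be non-negative")
    if publication_count < 10:
        # Low: only the obvious false positives are eliminated (M close to N).
        return "remove obvious false positives"
    if publication_count < 100:
        # Modest: false positives are removed and the rest are prioritised.
        return "remove obvious false positives and prioritise"
    # Large: no triage, annotate via reviews (M = N).
    # Assumption: exactly 100 is treated as 'large'.
    return "annotate via reviews, no triage"


for count in (4, 57, 350):
    print(count, "->", triage_strategy(count))
```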

N - Number of papers read

As the name says, but this number may be smaller than the actual number read (see L, source 3, above). FlyBase has been routinely curating GO data since 2000; papers associated with the gene that have been curated by FlyBase since 2000 are included in this count even though they may not have been read as part of this curation initiative.

O - Number of papers producing GO annotations

The total number of newly and previously curated papers producing GO annotation.