TAIR December 2012
Revision as of 20:07, 5 December 2012
TAIR, The Arabidopsis Information Resource, December 2012
WORK IN PROGRESS, HAS NOT BEEN COMPLETED, DO NOT USE NUMBERS
1. Staff working on GOC tasks
Tanya Berardini, Donghui Li
The total number of FTEs working on GOC tasks is 1.4.
2. Annotation progress
Table 1: Number of Annotations to Various GO Aspects
Annotations | BP (12/10) | BP (12/09) | change | MF (12/10) | MF (12/09) | change | CC (12/10) | CC (12/09) | change | |
---|---|---|---|---|---|---|---|---|---|---|
non-IEA/non-ND | 17690 | 15868 | + 1822 | 11219 | 10603 | + 616 | 19841 | 19209 | + 632 | |
IEA | 12095 | 10688 | + 1407 | 19293 | 19934 | - 641 | 10505 | 10452 | + 53 | |
ND | 9875 | 14284 | - 4409 | 5060 | 8813 | - 3753 | 10237 | 14501 | - 4264 |
Table 2: Number of Genes Annotated to Various GO Aspects
Genes | BP (12/10) | BP (12/09) | change | MF (12/10) | MF (12/09) | change | CC (12/10) | CC (12/09) | change | |
---|---|---|---|---|---|---|---|---|---|---|
non-IEA/non-ND | 7981 | 7385 | + 596 | 7189 | 6988 | + 201 | 7619 | 7378 | + 241 | |
IEA | 6973 | 6807 | + 166 | 7924 | 8135 | - 211 | 7538 | 7783 | - 245 | |
ND | 9875 | 14284 | - 4409 | 5059 | 8812 | - 3753 | 10233 | 14497 | - 4264 |
- The numbers of ND annotations, and of genes carrying ND annotations, decreased because annotations to pseudogenes and transposable element genes were removed after a QC check revealed that such annotations existed.
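The QC step described in the note above can be sketched as a filter over annotation records. This is a minimal illustration, not TAIR's actual pipeline: the record fields, the `gene_type` labels, and the locus names (`GENE_A` and so on) are assumptions made for the example.

```python
from dataclasses import dataclass

# Gene types whose annotations the QC check removes (labels are hypothetical).
EXCLUDED_GENE_TYPES = {"pseudogene", "transposable_element_gene"}


@dataclass(frozen=True)
class Annotation:
    locus: str      # gene identifier (made-up names in this example)
    go_id: str      # GO term ID
    evidence: str   # evidence code, e.g. "ND", "IEA", "IDA"


def qc_filter(annotations, gene_types):
    """Split annotations into (kept, removed) by the annotated gene's type.

    gene_types maps locus -> gene type; loci missing from the map are kept.
    """
    kept, removed = [], []
    for ann in annotations:
        if gene_types.get(ann.locus) in EXCLUDED_GENE_TYPES:
            removed.append(ann)
        else:
            kept.append(ann)
    return kept, removed


gene_types = {
    "GENE_A": "protein_coding",
    "GENE_B": "pseudogene",
    "GENE_C": "transposable_element_gene",
}
annotations = [
    Annotation("GENE_A", "GO:0008150", "ND"),
    Annotation("GENE_B", "GO:0008150", "ND"),
    Annotation("GENE_C", "GO:0003674", "ND"),
    Annotation("GENE_A", "GO:0005634", "IDA"),
]
kept, removed = qc_filter(annotations, gene_types)
```

Returning the removed records alongside the kept ones makes it easy to report how many annotations a QC pass dropped, which is the kind of change the ND rows in the tables reflect.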
3. Methods and strategies for annotation
a. Literature curation: We continue to put most of our effort (95%) into annotation of gene products from the literature.
b. Computational annotation strategies: With every genome release, we run two computational GO annotation pipelines, one based on the InterPro2GO mapping and the other based on a TargetP analysis. The results are integrated into our GO annotation file. These pipelines account for roughly 5% of our annotation effort. We also integrate GOA Arabidopsis GO annotations into our gene association file so that all Arabidopsis annotations, regardless of original source, are now relayed to GO via TAIR with the appropriate source attribution.
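The integration described above can be sketched in two steps: turn domain hits into IEA annotations through an InterPro-to-GO lookup, then merge those with externally sourced annotations while carrying the source attribution along. This is an illustrative sketch only. The record layout, the `assigned_by` labels, the locus names, the sample mapping entry, and the "first source wins" rule for duplicates are all assumptions, not TAIR's actual pipeline or file format.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    locus: str
    go_id: str
    evidence: str
    assigned_by: str   # source attribution carried through the merge


def annotate_from_domains(domain_hits, interpro_to_go, assigned_by):
    """Turn (locus, InterPro ID) hits into IEA annotations via the mapping."""
    out = []
    for locus, ipr_id in domain_hits:
        # Domains with no mapped GO term simply produce no annotation.
        for go_id in interpro_to_go.get(ipr_id, []):
            out.append(Annotation(locus, go_id, "IEA", assigned_by))
    return out


def merge_annotations(*sources):
    """Concatenate sources, dropping repeats of the same (locus, GO ID, evidence).

    Earlier sources win, so the first source's attribution is retained
    (an assumed policy for this sketch).
    """
    seen, merged = set(), []
    for source in sources:
        for ann in source:
            key = (ann.locus, ann.go_id, ann.evidence)
            if key not in seen:
                seen.add(key)
                merged.append(ann)
    return merged


# Illustrative inputs; the mapping entry and loci are examples only.
interpro_to_go = {"IPR000719": ["GO:0004672"]}
domain_hits = [("GENE_A", "IPR000719"), ("GENE_B", "IPR999999")]
computed = annotate_from_domains(domain_hits, interpro_to_go, "TAIR")
external = [
    Annotation("GENE_A", "GO:0004672", "IEA", "EXTERNAL"),
    Annotation("GENE_C", "GO:0005634", "IDA", "EXTERNAL"),
]
merged = merge_annotations(computed, external)
```

Keeping `assigned_by` on every record is what allows a single merged file to be relayed onward while still crediting the original source of each annotation.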
c. Priorities for annotation:
(1) literature of any age pertaining to Reference Genome genes,
(2) literature describing the characterization of previously undescribed ('novel') genes,
(3) recent literature from high-impact-factor journals.
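One way to picture the three priorities above is as a sort key over a queue of candidate papers. The sketch below is hypothetical: the `Paper` fields, the placeholder identifiers, and the fallback priority of 4 for papers matching none of the criteria are assumptions for illustration, not a description of TAIR's curation tooling.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    paper_id: str                 # placeholder identifier
    reference_genome_gene: bool   # concerns a Reference Genome gene (any age)
    novel_gene: bool              # characterizes a previously undescribed gene
    high_impact_recent: bool      # recent paper in a high-impact-factor journal


def priority(paper):
    """Return 1-3 for the three stated priorities, 4 for everything else."""
    if paper.reference_genome_gene:
        return 1
    if paper.novel_gene:
        return 2
    if paper.high_impact_recent:
        return 3
    return 4


def curation_queue(papers):
    """Order papers by priority; sorted() is stable, so ties keep input order."""
    return sorted(papers, key=priority)


papers = [
    Paper("P1", False, False, True),
    Paper("P2", False, True, False),
    Paper("P3", True, False, True),
    Paper("P4", False, False, False),
]
queue = curation_queue(papers)
```

A paper that meets several criteria is ranked by the highest-priority one it satisfies, which is why `P3` sorts first here even though it is also a recent high-impact paper.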
4. Presentations and publications
GO 2012 Publications, Talks, Posters
Donghui Li, Tanya Z. Berardini, Robert Muller and Eva Huala (2012) Building an efficient curation workflow for the Arabidopsis literature corpus. Database, in press.
Kimberly Van Auken, Tanya Berardini, Robert Dodson, Laurel Cooper, Donghui Li, Juancarlos Chan, Yuling Li, Siddhartha Basu, Hans-Michael Mueller, Rex Chisholm, Eva Huala and Paul Sternberg (2012) Text Mining in the BioCuration Workflow: Applications for Literature Curation at WormBase, dictyBase, and TAIR. Database, in press.
Chih-Hsuan Wei, Bethany R. Harris, Donghui Li, Tanya Z. Berardini, Eva Huala, Hung-Yu Kao and Zhiyong Lu (2012) Accelerating literature curation with text mining tools: A case study of using PubTator to curate genes in PubMed abstracts. Database, in press.
5. Other Highlights
A. Ontology Development Contributions
- GO terms contributed by TAIR
NEED TO UPDATE Donghui Li submitted 58 SourceForge term requests on behalf of TAIR curators from December 2009 to December 2010 (each request may contain multiple terms). Of these 58 requests, 52 have been closed, and 56 new GO terms have been created.
- Other ontology development work
NEED TO UPDATE Tanya Berardini:
- continues to participate in creating cross-products for terms within and among the three GO namespaces
- continues to participate in the project to align GO with ChEBI
- participates in the rota for handling SourceForge requests
- participates in the gatekeeper rota for TermGenie requests
B. Annotation outreach and user advocacy efforts
- PAINT annotation
Donghui Li does PAINT-based annotation.
- TAIR/Journal collaboration
TAIR continues to collect controlled vocabulary annotations via its online tool, the TAIR Online Submission Tool.
TAIR can accept annotations based on any journal article, regardless of the journal it was published in, provided that the article has a DOI or a PMID. Submitters must be registered at TAIR.
- GO help
Tanya Berardini continues to help staff the GO helpdesk. This involves answering the questions that come in through gohelp@geneontology.org or forwarding them to the appropriate parties for response. Nine GOC curators rotate through this task, one week at a time.
C. Other highlights - none