TAIR December 2012

TAIR, The Arabidopsis Information Resource, December 2012

WORK IN PROGRESS, HAS NOT BEEN COMPLETED, DO NOT USE NUMBERS

1. Staff working on GOC tasks

Tanya Berardini, Donghui Li

The total number of FTE working on GOC tasks is 1.4.

2. Annotation progress

Column labels refer to the following snapshot dates: 11/11 = 11/03/2011 and 12/12 = 12/06/2012.

Table 1: Number of Annotations to Various GO Aspects

Annotations	BP (11/11)	BP (12/12)	change	MF (11/11)	MF (12/12)	change	CC (11/11)	CC (12/12)	change
non-IEA/non-ND	19401	Fill	+ Fill	12293	Fill	+ Fill	19841	Fill	+ Fill
IEA	11276	Fill	+ Fill	18479	Fill	- Fill	10505	Fill	+ Fill
ND	9904	Fill	- Fill	8335	Fill	- Fill	10237	Fill	- Fill

Table 2: Number of Genes Annotated to Various GO Aspects

Genes	BP (11/11)	BP (12/12)	change	MF (11/11)	MF (12/12)	change	CC (11/11)	CC (12/12)	change
non-IEA/non-ND	8463	Fill	+ Fill	7410	Fill	+ Fill	8661	Fill	+ Fill
IEA	6620	Fill	+ Fill	7672	Fill	- Fill	6148	Fill	- Fill
ND	9904	Fill	- Fill	5270	Fill	- Fill	9116	Fill	- Fill
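As a rough illustration of how the counts in Tables 1 and 2 could be derived, the sketch below tallies annotations and distinct genes per GO aspect and evidence class from GAF-style tab-separated lines. It assumes the standard GAF column order (object ID in column 2, evidence code in column 7, aspect in column 9). The function names and the sample rows are hypothetical and made up for illustration; this is not TAIR's actual reporting script.

```python
from collections import defaultdict

# GAF aspect codes mapped to the abbreviations used in the tables.
ASPECT_NAMES = {"P": "BP", "F": "MF", "C": "CC"}


def evidence_class(code):
    """Bucket an evidence code into the three rows used in the tables."""
    if code in ("IEA", "ND"):
        return code
    return "non-IEA/non-ND"


def summarize_gaf(lines):
    """Return (annotation counts, gene counts) keyed by (class, aspect)."""
    annotation_counts = defaultdict(int)
    genes = defaultdict(set)
    for line in lines:
        # Skip blank lines and GAF header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        gene_id, evidence, aspect = cols[1], cols[6], cols[8]
        aspect_name = ASPECT_NAMES.get(aspect)
        if aspect_name is None:
            continue
        key = (evidence_class(evidence), aspect_name)
        annotation_counts[key] += 1
        genes[key].add(gene_id)
    gene_counts = {key: len(ids) for key, ids in genes.items()}
    return dict(annotation_counts), gene_counts


def _row(gene_id, go_id, evidence, aspect):
    """Build a minimal 9-column GAF-style line (made-up values)."""
    return "\t".join(
        ["TAIR", gene_id, gene_id, "", go_id, "REF:0", evidence, "", aspect]
    )


# Hypothetical sample data, not real annotations.
SAMPLE_GAF = [
    "!gaf-version: 2.0",
    _row("gene1", "GO:0000001", "IMP", "P"),
    _row("gene1", "GO:0000002", "IDA", "P"),
    _row("gene2", "GO:0000003", "IEA", "F"),
    _row("gene3", "GO:0000004", "ND", "C"),
    _row("gene2", "GO:0000001", "IMP", "P"),
]

if __name__ == "__main__":
    annotations, gene_totals = summarize_gaf(SAMPLE_GAF)
    for key in sorted(annotations):
        print(key, annotations[key], gene_totals[key])
```

Under this counting rule, a gene that has both IEA and non-IEA annotations in the same aspect is counted once in each row. Whether the tables follow that rule is an assumption.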

3. Methods and strategies for annotation

a. Literature curation: We continue to put most of our annotation effort (95%) into annotation of gene products from the literature.

b. Computational annotation strategies: With every genome release, we run two computational GO annotation pipelines: one based on the InterPro2GO mapping and the other on a TargetP analysis. The results are integrated into our GO annotation file. This work represents roughly 5% of our annotation effort.
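The mapping-based pipeline can be pictured roughly as follows: a mapping file associates InterPro entries with GO terms, and each gene inherits, as IEA annotations, the GO terms mapped to its predicted domains. This is a minimal sketch, not the pipeline TAIR runs. It assumes mapping lines of the form "InterPro:ID description > GO:term name ; GO:ID", and the function names, InterPro IDs, GO IDs and gene names are all hypothetical placeholders.

```python
from collections import defaultdict


def parse_interpro2go(lines):
    """Parse mapping lines into {InterPro ID: set of GO IDs}."""
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skip blank lines and header/comment lines.
        if not line or line.startswith("!"):
            continue
        left, _, right = line.partition(" > ")
        if not right:
            continue
        # "InterPro:IPR999991 Example domain A" -> "IPR999991"
        ipr_id = left.split()[0].split(":", 1)[1]
        # "GO:example function ; GO:9999991" -> "GO:9999991"
        go_id = right.rsplit(" ; ", 1)[-1].strip()
        mapping[ipr_id].add(go_id)
    return dict(mapping)


def annotate(domain_hits, mapping):
    """Turn per-gene domain hits into sorted IEA annotation tuples.

    Each tuple is (gene, GO ID, evidence code, supporting InterPro entry).
    """
    annotations = set()
    for gene, ipr_ids in domain_hits.items():
        for ipr_id in ipr_ids:
            for go_id in mapping.get(ipr_id, ()):
                annotations.add((gene, go_id, "IEA", "InterPro:" + ipr_id))
    return sorted(annotations)


# Hypothetical mapping lines and domain hits, made up for illustration.
SAMPLE_MAPPING = [
    "!date: example",
    "InterPro:IPR999991 Example domain A > GO:example function ; GO:9999991",
    "InterPro:IPR999991 Example domain A > GO:example process ; GO:9999992",
    "InterPro:IPR999992 Example domain B > GO:example component ; GO:9999993",
]
SAMPLE_HITS = {"geneA": ["IPR999991"], "geneB": ["IPR999992", "IPR000000"]}

if __name__ == "__main__":
    for annotation in annotate(SAMPLE_HITS, parse_interpro2go(SAMPLE_MAPPING)):
        print("\t".join(annotation))
```

Domain hits with no mapped GO term (IPR000000 in the sample) simply produce no annotation.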

c. Integration of non-TAIR Arabidopsis annotations: We integrate GOA Arabidopsis GO annotations into our gene association file so that all Arabidopsis annotations, regardless of original source, are now relayed to GO via TAIR with the appropriate source attribution. The following types of annotations are included in our Arabidopsis gene association file:

 1. Literature-based annotations made by TAIR curators
 2. Community annotations made via TAIR's TOAST annotation tool (see below)
 3. GOA annotations for Arabidopsis with experimental evidence codes
 4. PAINT-based Arabidopsis annotations from RefGenome group
 5. Function-Process link-based annotations from GOC 
 6. TIGR's annotations from Arabidopsis functional annotation project
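Combining the annotation sources listed above while keeping their attribution could look something like the sketch below. It is only an illustration under stated assumptions: the record fields, the rule that two annotations are duplicates when gene, GO term, reference and evidence code all match, and the rule that the first source listed wins are hypothetical choices, not a description of how the TAIR gene association file is built.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    gene_id: str
    go_id: str
    reference: str
    evidence: str
    assigned_by: str  # source attribution carried through the merge


def merge_sources(*sources):
    """Merge annotation lists; on duplicates the earliest source wins."""
    merged = {}
    for source in sources:
        for annotation in source:
            key = (
                annotation.gene_id,
                annotation.go_id,
                annotation.reference,
                annotation.evidence,
            )
            # setdefault keeps the record (and attribution) seen first.
            merged.setdefault(key, annotation)
    return list(merged.values())


# Hypothetical records, made up for illustration.
TAIR_SET = [
    Annotation("gene1", "GO:0000001", "REF:1", "IMP", "TAIR"),
    Annotation("gene2", "GO:0000002", "REF:2", "IDA", "TAIR"),
]
EXTERNAL_SET = [
    Annotation("gene1", "GO:0000001", "REF:1", "IMP", "OtherGroup"),
    Annotation("gene3", "GO:0000003", "REF:3", "IPI", "OtherGroup"),
]

if __name__ == "__main__":
    for annotation in merge_sources(TAIR_SET, EXTERNAL_SET):
        print(annotation)
```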

d. Priorities for annotation:

 1. Literature describing the characterization of previously undescribed ('novel') genes
 2. Genes that do not have any GO annotations at all (in none of the three aspects)
 3. Recent literature from high-impact-factor journals

e. Review of user-submitted annotations (see TOAST section below)

 Donghui and Tanya review the annotations submitted via TOAST, checking that terms are mapped correctly and that the
 proper evidence_with information is entered where required. Follow-up with the submitter by email is sometimes necessary.

4. Presentations and publications

GO 2012 Publications, Talks, Posters

Publications

Tanya Z. Berardini, Donghui Li, Robert Muller, Raymond Chetty, Larry Ploetz, Shanker Singh, April Wensel, and Eva Huala (2012) Assessment of community-submitted ontology annotations from a novel database-journal partnership. Database doi:10.1093/database/bas030

Donghui Li, Tanya Z. Berardini, Robert Muller and Eva Huala (2012) Building an efficient curation workflow for the Arabidopsis literature corpus. Database, in press.

Van Auken, Kimberly; Berardini, Tanya; Dodson, Robert; Cooper, Laurel; Li, Donghui; Chan, Juancarlos; Li, Yuling; Basu, Siddhartha; Mueller, Hans-Michael; Chisholm, Rex; Huala, Eva; Sternberg, Paul (2012) Text Mining in the BioCuration Workflow: Applications for Literature Curation at WormBase, dictyBase, and TAIR. Database, in press.

Chih-Hsuan Wei, Bethany R. Harris, Donghui Li, Tanya Z. Berardini, Eva Huala, Hung-Yu Kao and Zhiyong Lu (2012) Accelerating literature curation with text mining tools: A case study of using PubTator to curate genes in PubMed abstracts. Database, in press.

Talks

Donghui Li, From experimental data to structured knowledge: Literature curation workflow at The Arabidopsis Information Resource. 5th International Biocuration Conference (BioCreative), Washington, DC, USA, April 2-4, 2012.

5. Other Highlights

A. Ontology Development Contributions

GO terms contributed by TAIR

NEED TO UPDATE Donghui Li has submitted 58 SourceForge term requests on behalf of TAIR curators from December 2009 to December 2010 (each request may contain multiple terms). Of these 58 requests, 52 have been closed. 56 new GO terms have been created.

Other ontology development work

Tanya Berardini:

continues to participate in creating cross-products for terms within and among the three GO namespaces
participated in LEGO prototyping and modeling
participated in the project to align GO with ChEBI
is working on aligning the Plant Ontology (PO) with GO with respect to anatomical structures used in the development branch of the biological process ontology
participates in the rota for SourceForge requests (4 people in the rota, one week at a time, weekly conference call)
participates in the gatekeeper rota for TermGenie requests (4 people in the rota, one week at a time)
attends the weekly GO editors conference call

Donghui Li:

attends regular GO annotation conference calls as the TAIR representative

B. Annotation outreach and user advocacy efforts

PAINT annotation

Donghui Li does PAINT-based annotation.

TOAST (TAIR Online Annotation Submission Tool)

TAIR continues to collect controlled vocabulary annotations via its online tool, TOAST.

TAIR can accept annotations based on any journal article, regardless of the journal it was published in, provided that the article has a DOI or a PMID. Submitters must be registered at TAIR.

GO help

Tanya Berardini continues to help staff the GO helpdesk. This involves answering the questions that come in through gohelp@geneontology.org or forwarding them to the appropriate parties for response. Eight GOC curators rotate this task, one week at a time.

C. Other highlights

Donghui Li continues to serve as a member of the BioCreative User Advisory Group. This involves defining the task for BioCreative V: the development of modules that help GO curators identify articles with curatable GO information (triage) and extract gene function terms and the associated evidence sentences from full-length articles. It also involves developing a strategy to create a gold-standard annotated literature corpus for text mining.