Ontology Development Progress Report December 2010

From GO Wiki

Latest revision as of 13:50, 1 July 2014

Metrics

GO term statistics

November 30, 2009

Ontology     Current  Defined  Obsolete  Total
Function        8657     8474       798   9455
Process        17533    17465       508  18041
Component       2613     2613       117   2731
All            28803    29976      1424  30227


November 24, 2010

Ontology     Terms
Function      8898
Process      19900
Component     2773
All          31571

Note that all GO terms are now defined. The November 2010 counts above do not include the 1460 obsolete terms.
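As a worked example of reading the two tables together, the net growth in live terms for each ontology can be computed by subtracting the November 2009 "Current" counts from the November 2010 "Terms" counts. This sketch assumes that both columns count the same thing (non-obsolete terms), which the note above suggests but the tables do not state outright.

```python
# Counts copied from the two tables above.
# Assumption: the 2009 "Current" column and the 2010 "Terms" column both count
# live (non-obsolete) terms, so their difference is the net change in live terms.
CURRENT_2009 = {"Function": 8657, "Process": 17533, "Component": 2613}
TERMS_2010 = {"Function": 8898, "Process": 19900, "Component": 2773}


def net_growth(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return the per-ontology change in term counts between two snapshots."""
    return {name: after[name] - before[name] for name in before}


growth = net_growth(CURRENT_2009, TERMS_2010)
for name, delta in growth.items():
    print(f"{name:<10} {delta:+d}")
print(f"{'All':<10} {sum(growth.values()):+d}")
```

The per-ontology changes (+241 Function, +2367 Process, +160 Component) add up to +2768, the same as the difference between the two "All" rows (31571 - 28803). This is a net figure: terms made obsolete during the year are subtracted from the terms added.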

Tracker statistics (Nov. 30, 2009 - Nov. 24, 2010)

  • items opened: 1158
  • items closed: 1242

Tracker report

[Figure: Tracker Report for 2010]

Ontology development

Internal cross-products

We have made considerable progress this year on creating cross-products for GO terms. The first set of cross-products, between regulatory processes and the processes or functions they regulate, was added to the GO file at the beginning of 2010. Two further sets have since been added: biological processes involved in other biological processes, and cellular components that are part of other cellular components.
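A cross-product (computable) definition describes a GO term as a genus plus a differentia, for example "a biological regulation that regulates protein phosphorylation". The minimal sketch below assumes OBO 1.2-style `intersection_of` tags and builds such a definition for one regulation term. The class and function names are hypothetical, and the GO IDs are shown for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    id: str
    name: str


@dataclass(frozen=True)
class CrossProduct:
    """Genus-differentia definition: 'a <genus> that <relation> <filler>'."""
    genus: Term
    relation: str  # e.g. "regulates" or "part_of"
    filler: Term


def to_obo_stanza(defined: Term, xp: CrossProduct) -> str:
    """Render a term and its cross-product as OBO 1.2-style stanza lines."""
    return "\n".join([
        "[Term]",
        f"id: {defined.id}",
        f"name: {defined.name}",
        f"intersection_of: {xp.genus.id} ! {xp.genus.name}",
        f"intersection_of: {xp.relation} {xp.filler.id} ! {xp.filler.name}",
    ])


# IDs are shown for illustration only.
biological_regulation = Term("GO:0065007", "biological regulation")
protein_phosphorylation = Term("GO:0006468", "protein phosphorylation")
regulation_term = Term("GO:0001932", "regulation of protein phosphorylation")
xp = CrossProduct(biological_regulation, "regulates", protein_phosphorylation)
print(to_obo_stanza(regulation_term, xp))
```

Because the definition is expressed in terms of existing terms and relations, a reasoner can check that a term's asserted parents are consistent with it rather than relying on manual placement alone.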

TermGenie

These changes have allowed us to develop TermGenie, a tool that lets users add new GO terms conforming to a cross-product template directly to the ontologies. New terms are automatically placed correctly within the ontology, and their textual definitions and synonyms are generated automatically. The tool reduces the workload for ontology editors and helps to minimize human error in the ontologies.
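The sketch below is not TermGenie's implementation; it only illustrates the general idea of template-based term generation, assuming a hypothetical "regulation of X" template. Given the name of an existing process, it derives the new term's name, textual definition, a synonym, and the cross-product the term would carry. The template wording, the synonym rule and all names in the code are assumptions.

```python
from dataclasses import dataclass


@dataclass
class GeneratedTerm:
    name: str
    definition: str
    synonyms: list[str]
    # (genus, relation, filler) triple for the computable definition
    cross_product: tuple[str, str, str]


def regulation_template(target: str) -> GeneratedTerm:
    """Hypothetical 'regulation of X' template in the spirit of TermGenie.

    The same target name fills every slot, so the name, definition, synonym
    and logical definition cannot drift out of step with one another.
    """
    return GeneratedTerm(
        name=f"regulation of {target}",
        definition=(
            "Any process that modulates the frequency, rate or extent "
            f"of {target}."
        ),
        synonyms=[f"{target} regulation"],
        cross_product=("biological regulation", "regulates", target),
    )


term = regulation_template("protein phosphorylation")
print(term.name)
print(term.definition)
print(term.synonyms)
```

Generating every field from one filled-in template is what removes the opportunity for typing errors and inconsistent wording that arise when editors write each field by hand.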

External cross-products: ChEBI alignment

The biggest effort this year has gone into aligning GO with the Chemical Entities of Biological Interest (ChEBI) ontology, with the aim of generating cross-products between GO and ChEBI. This work included a two-day meeting in July with the ChEBI ontology developers to reconcile some critical differences between the two ontologies. The project requires major changes to both GO and ChEBI, and we hope the first GO-ChEBI cross-products will be added to GO early in 2011. We have finished a first draft of a paper on this project for submission to Nature Chemical Biology.

OBO Foundry

We are active members of the OBO Foundry, and earlier this year GO became one of the founding OBO Foundry ontologies, having undergone peer review and been found to meet the agreed OBO Foundry standards.

Function-Process links

We continued to add relationships between the function and process ontologies; links between the transporter and transport terms were completed in June.

Mappings

In 2010, over 1500 enzyme reactions in GO were synchronized with MetaCyc, KEGG, RHEA and Enzyme Commission (EC) numbers, and the reaction text was converted to use ChEBI names.

GO slims

A new version of the generic (non-species-specific) GO slim was developed this year; a draft is currently available.

Taxon Triggers

The taxon trigger file is a set of taxonomic restrictions on specific GO terms that is used for automatic quality control of annotations. The file resides in the GO CVS repository and is updated as new restrictions are made. A paper on this project has recently been published in BMC Bioinformatics.
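A taxon trigger check can be sketched as follows, assuming two kinds of restriction, `only_in_taxon` and `never_in_taxon`, and a toy lineage table standing in for a real taxonomy. The taxa, term restrictions, ancestor map and function name are illustrative only and are not taken from the trigger file.

```python
# Toy data: illustrative only, not taken from taxon_go_triggers.obo.
LINEAGE = {  # taxon -> the taxon plus all of its ancestors
    "Homo sapiens": {"Homo sapiens", "Mammalia", "Metazoa", "Eukaryota"},
    "Drosophila melanogaster": {"Drosophila melanogaster", "Insecta", "Metazoa", "Eukaryota"},
    "Arabidopsis thaliana": {"Arabidopsis thaliana", "Viridiplantae", "Eukaryota"},
}
ANCESTORS = {  # GO term -> its ancestor terms (transitive closure)
    "chloroplast": {"plastid"},
}
RESTRICTIONS = {  # GO term -> list of (relation, taxon)
    "lactation": [("only_in_taxon", "Mammalia")],
    "plastid": [("never_in_taxon", "Metazoa")],
}


def taxon_violations(term: str, taxon: str) -> list[str]:
    """Return restrictions broken by annotating `term` to a gene from `taxon`.

    Restrictions on ancestor terms also apply, because an annotation to a
    term implies annotation to each of its ancestors.
    """
    lineage = LINEAGE[taxon]
    problems = []
    for checked in sorted({term} | ANCESTORS.get(term, set())):
        for relation, restricted in RESTRICTIONS.get(checked, []):
            if relation == "only_in_taxon" and restricted not in lineage:
                problems.append(f"{checked} only_in_taxon {restricted}")
            elif relation == "never_in_taxon" and restricted in lineage:
                problems.append(f"{checked} never_in_taxon {restricted}")
    return problems


print(taxon_violations("lactation", "Homo sapiens"))
print(taxon_violations("lactation", "Arabidopsis thaliana"))
print(taxon_violations("chloroplast", "Drosophila melanogaster"))
```

The last call shows why ancestor propagation matters: "chloroplast" carries no restriction of its own in this toy data, but its ancestor "plastid" does, so a fly annotation to "chloroplast" is still flagged.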

Content Projects

Transcription

A major overhaul of the transcription process and function terms was undertaken this year. To obtain expert input, one curator attended the Gene Transcription in Yeast meeting in Sant Feliu de Guixols, Spain, in late June. Several rounds of changes have been made to the ontologies so far, and we expect the project to be completed by the end of the year.

Signaling

The GO signaling working group aims to expand and improve the signaling terms in GO. At the start of 2010, signaling terms in the process ontology were restructured to include different types and mechanisms of signaling, and to connect signaling pathways with the processes they regulate. We are currently standardising these terms to ensure more consistent annotation, and have begun to refine the signaling terms in the function ontology.

We have recently recruited three signaling experts to help with the restructuring and to ensure that signaling is represented correctly in GO. A two-day workshop bringing together GO editors, GO annotators and signaling experts is scheduled for February 2011.

Kidney Development

Following a one-day meeting with renal experts in January 2010, 445 new terms relating to kidney development were added to GO. The terms represent the development of the various renal systems across organisms:

  • metanephros (mammalian; 129 terms)
  • pronephros (amphibian; 24 terms)
  • mesonephros (fish; 102 terms)
  • renal system/Malpighian tubule (insect; 18 terms)

A publication describing this work is in preparation.

Viral terms

A project to remodel the virus-related terms in GO began this year. The working group, which includes many external collaborators, has met several times and has devised a broad structure for representing viral processes and components in GO. The new structure will be implemented in 2011.

Heart Development

A paper describing the heart development project has been submitted to Developmental Biology.