Biological Pathways as GO-CAMs

There is a vast amount of computable knowledge captured in biological pathway databases. The project(s) described here seek to convert that knowledge into GO-CAMs. These conversions directly expand the content of the GO-CAM knowledge base and can be used to provide GO-CAM curators with 'seeds' from which new models can grow. The main repository for code related to this project is https://github.com/geneontology/pathways2GO

Mapping from Reactome pathways to GO-CAM models

As one of the largest and most actively curated pathway databases, Reactome provides a good starting point for building and testing the conversion process. With the BioPAX export as input, the pathways2GO code generates valid OWL GO-CAM models using a combination of PaxTools (for handling BioPAX) and the OWL-API (for building the OWL models). Details about the mapping process and results are currently located in:

Alignment between RHEA, Reactome, and GO

Once a pathway is converted into the OWL structure of a GO-CAM, OWL reasoners such as Arachne and ELK can be applied to infer class membership for component reactions (as GO molecular functions) and pathways (as biological processes). Classifying the members of GO-CAM models with the GO makes it possible to query the integrated knowledge base using the structure of the GO. For example, queries like 'show all genes involved in Wnt signaling' can leverage the knowledge in pathway databases (e.g., the Wnt pathway from Reactome) but query using GO terms (e.g., 'Wnt signaling pathway, planar cell polarity pathway' GO:0060071 and its parent 'Wnt signaling pathway' GO:0016055).
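The kind of query described above can be sketched in a few lines of Python. This is only an illustration of querying through the GO hierarchy, not the GO-CAM query infrastructure: the two-term hierarchy is a simplification taken from the example in the text, and the gene names and annotations (gene_A, gene_B, gene_C) are made-up placeholders.

```python
# Toy is_a edges (child -> parents), simplified from the example in the text.
IS_A = {
    "GO:0060071": {"GO:0016055"},  # Wnt signaling pathway, planar cell polarity pathway
    "GO:0016055": set(),           # Wnt signaling pathway
}

# Hypothetical gene-to-GO-term classifications (placeholder gene names).
ANNOTATIONS = {
    "gene_A": {"GO:0060071"},
    "gene_B": {"GO:0016055"},
    "gene_C": set(),
}


def ancestors(term, is_a):
    """Return the term itself plus every term reachable through is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(is_a.get(current, ()))
    return seen


def genes_involved_in(query_term, annotations, is_a):
    """Genes annotated to the query term or to any of its descendants."""
    return sorted(
        gene
        for gene, terms in annotations.items()
        if any(query_term in ancestors(t, is_a) for t in terms)
    )


if __name__ == "__main__":
    print(genes_involved_in("GO:0016055", ANNOTATIONS, IS_A))
```

Querying with the parent term returns genes classified under either term, which is the benefit of classifying pathway content with the GO: a gene attached only to the more specific pathway is still found by the broader query.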

In this example, Reactome curators have already provided the mapping to GO classes, but this is not always the case. Within Reactome, which is one of the most intensely manually curated pathway knowledge bases with the deepest connections to the GO, only about 50% of the reactions and 50% of the pathways are mapped to GO terms. In many other databases, e.g., many of those collected in the Pathway Commons collection, there are no mappings at all. Apart from manually adding these classifications (a solution that has problems with cost, consistency and scale), it is possible to infer them automatically based on logical definitions (written as OWL axioms) for GO terms. While there are many such definitions, there are large gaps in areas of importance to pathways such as Catalytic Activity and Binding.

The goal of this project is to leverage the RHEA database of biochemical reactions to construct logical definitions for the children of Catalytic Activity. This should help automate the classification of pathway components, thus facilitating their integration into the GO-CAM knowledge base. Further, it should help improve the structure of this branch of the GO.

Logically defining Catalytic Activity

The definitions are structured primarily based on the inputs and outputs of the reaction. For example, a logical definition for the GO term ‘nucleoside phosphate kinase activity’ amounts to the rule:

If
  X has type 'catalytic activity'
  and X has input ATP and X has input nucleoside 5'-monophosphate
  and X has output ADP and X has output nucleoside 5'-diphosphate
Then
  X has type 'nucleoside phosphate kinase activity'

This definition can be extracted automatically from the RHEA database entry that is listed as an xref of the GO term 'nucleoside phosphate kinase activity'.
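The rule above can be illustrated with a minimal Python sketch. It is not the pathways2GO implementation, which expresses such definitions as OWL axioms and relies on a reasoner; the class and function names here (Definition, Reaction, definition_from_reaction, classify) are hypothetical. Participants are matched by exact name, so the sketch does not capture chemical class subsumption (for example, recognizing a specific nucleotide as a nucleoside 5'-monophosphate), which a reasoner working over a chemical ontology would handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    """A logical definition: a GO label plus required inputs and outputs."""
    go_label: str
    inputs: frozenset
    outputs: frozenset


@dataclass(frozen=True)
class Reaction:
    """A reaction from a pathway model, described by its participants."""
    inputs: frozenset
    outputs: frozenset


def definition_from_reaction(go_label, left, right):
    """Build a definition from the two sides of a reference reaction."""
    return Definition(go_label, frozenset(left), frozenset(right))


def classify(reaction, definitions):
    """Return labels of all definitions whose requirements the reaction meets.

    Subset tests mirror 'has input some X' restrictions: extra participants
    on the reaction do not prevent a match.
    """
    return [
        d.go_label
        for d in definitions
        if d.inputs <= reaction.inputs and d.outputs <= reaction.outputs
    ]


# The definition from the rule in the text.
NPK = definition_from_reaction(
    "nucleoside phosphate kinase activity",
    left=["ATP", "nucleoside 5'-monophosphate"],
    right=["ADP", "nucleoside 5'-diphosphate"],
)

if __name__ == "__main__":
    reaction = Reaction(
        inputs=frozenset({"ATP", "nucleoside 5'-monophosphate"}),
        outputs=frozenset({"ADP", "nucleoside 5'-diphosphate"}),
    )
    print(classify(reaction, [NPK]))
```

A reaction that lacks any required input or output simply receives no classification from that definition, which is how gaps in the set of definitions show up as unclassified pathway components.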

Mapping from GO-CAMs to BioPAX models

Work in progress.