OBO-Edit: OBO Parser - Getting Started

Java Event Based Parser for OBO - Getting Started:

The Java event-based parser is part of the OBO-Edit source code, in the org.geneontology.oboedit.dataadapter package. The OBO-Edit sources are available from our SourceForge CVS repository. For information on accessing the repository, see:

   OBO-Edit: Getting the Source Code

We apologize in advance for the very sparse documentation in these source files.

The classes of interest are:

  • OBOParser
  • DefaultOBOParser
  • OBOParseEngine

Note: Until very recently, these classes were named GOBOParser, DefaultGOBOParser and GOBOParseEngine. They were renamed in the recent code overhaul.

The basic idea is that OBOParseEngine reads and parses a Collection of OBO files to generate events (like "readID" and "readDefinition"). Each OBOParseEngine is associated with an implementation of OBOParser. Each time OBOParseEngine generates an event, the corresponding OBOParser method is called. Thus, if OBOParseEngine sees the line "name: kinase" in an OBO file, it will call readName("kinase", null) on its OBOParser.
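
The event flow described above can be sketched in a few lines of self-contained Java. This is not the OBO-Edit code: TagHandler, parse() and eventsFor() are hypothetical stand-ins for OBOParser and OBOParseEngine, invented here for illustration only. The real readName takes a second argument, which this sketch leaves out, and the term ID in the example is made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustration only: a toy event-based parser, not the OBO-Edit API.
class EventParserSketch {

    // Hypothetical stand-in for OBOParser: one callback per kind of event.
    interface TagHandler {
        void readID(String id);
        void readName(String name);
    }

    // Hypothetical stand-in for OBOParseEngine: reads lines, fires events.
    static void parse(BufferedReader in, TagHandler handler) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank lines and stanza headers such as [Term]
            }
            String tag = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (tag.equals("id")) {
                handler.readID(value);
            } else if (tag.equals("name")) {
                handler.readName(value);
            }
            // every other tag is ignored in this sketch
        }
    }

    // Runs the toy engine over a string and records the events it fired.
    static List<String> eventsFor(String oboText) {
        final List<String> events = new ArrayList<String>();
        TagHandler recorder = new TagHandler() {
            public void readID(String id) { events.add("readID(" + id + ")"); }
            public void readName(String name) { events.add("readName(" + name + ")"); }
        };
        try {
            parse(new BufferedReader(new StringReader(oboText)), recorder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : eventsFor("[Term]\nid: EX:0000001\nname: kinase\n")) {
            System.out.println(event);
        }
    }
}
```

The engine knows nothing about what happens to the values it reads; swapping in a different handler changes the behaviour without touching the parsing loop, which is the same separation that OBOParseEngine and OBOParser provide.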

DefaultOBOParser is an implementation of OBOParser that populates the OBO-Edit datamodels from an OBO file. If you want to use OBO-Edit's datamodels, you can use DefaultOBOParser like so:

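    // Needs java.util.Collection and java.util.LinkedList, plus the OBO-Edit classes.
    // engine.parse() is assumed to declare checked exceptions; narrow "throws Exception" to match.
    public static OBOSession getSession(String path) throws Exception {
        DefaultOBOParser parser = new DefaultOBOParser();
        OBOParseEngine engine = new OBOParseEngine(parser);
        // OBOParseEngine can parse several files at once
        // and create one munged-together ontology,
        // so we need to provide a Collection to the setPaths() method
        Collection paths = new LinkedList();
        paths.add(path);
        engine.setPaths(paths);
        // run the parse; the engine fires its events into the parser
        engine.parse();
        OBOSession session = parser.getSession();
        return session;
    }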

If you're populating a database, or doing something else where building the OBO-Edit datamodels would just waste memory, you can create your own implementation of OBOParser and skip the datamodel generation step altogether.
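
As a rough sketch of that approach, the handler below writes one row per term as soon as the term is complete and holds only the current term in memory. It is self-contained and does not use the OBO-Edit classes: TermEvents, RowWriter and the endOfTerm() callback are hypothetical names assumed for illustration, and a real OBOParser implementation would have to implement that interface's actual methods instead. For a database, the write() call would be replaced by an insert statement.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Illustration only: hypothetical names, not the OBO-Edit API.
class StreamingHandlerSketch {

    // Hypothetical, much-reduced stand-in for the OBOParser callbacks.
    interface TermEvents {
        void readID(String id);
        void readName(String name);
        void endOfTerm();
    }

    // Emits one tab-separated row per term; no ontology is built in memory.
    static class RowWriter implements TermEvents {
        private final Writer out;
        private String id;
        private String name;
        private int rows;

        RowWriter(Writer out) { this.out = out; }

        public void readID(String id) { this.id = id; }

        public void readName(String name) { this.name = name; }

        public void endOfTerm() {
            try {
                out.write(id + "\t" + name + "\n");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            rows++;
            // forget the finished term so memory use stays constant
            id = null;
            name = null;
        }

        int rowCount() { return rows; }
    }

    // Replays the events a parse engine might fire for two made-up terms.
    static String demo() {
        StringWriter sink = new StringWriter();
        RowWriter handler = new RowWriter(sink);
        handler.readID("EX:0000001");
        handler.readName("kinase");
        handler.endOfTerm();
        handler.readID("EX:0000002");
        handler.readName("phosphatase");
        handler.endOfTerm();
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```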