OBO-Edit: Potential Refactoring
There are several areas in which OBO-Edit might benefit from a large scale refactoring. This page lays out these possibilities, and hopefully will invite discussion amongst the users of the OBO-Edit libraries. I feel that now is the perfect time to do these refactors, because OBO-Edit libraries are being used in more and more applications. Making these changes now will have a smaller impact on the user community than any future changes, and it will be much simpler to make these changes before OBO-Edit changes hands.
Package Naming Changes Proposal
Change the prefix of all packages from org.geneontology.oboedit to org.oboedit
Break OBO-Edit into the following packages (and corresponding jars):
- This package would include all the general-purpose code for representing and manipulating OBO ontologies. All of this code would be application agnostic and would include NO gui elements.
- org.bbop (or something - this could go into org.geneontology) (or all of org.geneontology could be dumped into this package)
- This package would contain all the gui pieces that I'm calling the BBOP Application Framework. These packages would have NO dependencies on org.obo or org.oboedit, but would contain pieces from OBO-Edit that would be useful for rapidly developing ANY future swing application, including the general plugin system, the background task system, the component system, and the docking system.
- This package would contain all the OBO-Edit-specific gui code, including OBO-Edit startup tasks, OBO-Edit GUI components, settings, resources, etc.
These changes would have little impact on OBO-Edit, although there may be a certain amount of work in removing unexpected dependencies from components that ought to be indepent. It will have a much greater impact on Phenote & other users of the OBO-Edit libraries, because they'll break everyone's code who uses OBO-Edit libraries. On the other hand, in most cases fixing the problems will just be a matter of changing import statements, and most of those can be done with a search and replace.
Time taken, outside code broken.
These changes should make it much clearer what each of the pieces of OBO-Edit actually does, and will enforce independence between pieces of code that should not be interdependent.
Possible Changes to the Datamodel
I'm not necessarily advocating the following change, but I'm putting it on the table for debate. I'd like to hear people's reactions to these changes on the discussion page or via email so we can decide on the best course of action.
Weak version: Merging together the sub-interfaces of AnnotatedObject
Currently, the commonly-used objects in OBO-Edit are broken into many, many sub-interfaces. Effectively, there's one sub-interface for each kind of metadata (ie there's a SynonymedObject that provides methods for working with synonyms, a DbxrefedObject for working with Dbxrefs, etc). All of these sub-interfaces could be merged into one big interface (probably the grouping interface currently called AnnotatedObject).
Strong version: Merge together sub-interfaces of AnnotatedObject AND merge AnnotatedObject into LinkedObject
The proposal above still keeps LinkedObject separate from AnnotatedObject. LinkedObject is an interface that contains the methods for fetching term children and parents. Note that this is a much bigger move than the one proposed above, because AnnotatedObject ONLY contains metadata; that is, data that does not affect the structure (and hence computable meaning) of the ontology. If these are merged into LinkedObject, we've now fused the metadata and structural data interfaces, which will make our datamodel much more inflexible and hard to adapt to changes in the OBO format.
This will have little impact on external tools that use the GO datamodels, but it would require a lot of changes to OBO-Edit. It would probably require a week of work in OBO-Edit.
Much easier to approach for new developers, less explicit casting will be necessary in the code.
Much less flexible datamodel, especially if OBO introduces novel concepts not currently supported in OBO-Edit.
Currently, if a term is an intersection, the genus/differentia definition for that intersection is encoded using a special kind of Link object, and those links are stored with the regular parent/child links for that term. This made sense initially, but it has led to a large amount of code that iterates over a term's parents or children and has to make a special check to ignore these intersection links.
Proposal: Create another interface
I suggest we create yet another datamodel interface: IntersectionObject with the following methods:
- Collection<Link> getIntersectionLinks();
and the related OBO format bookkeeping methods (if necessary - see below). If we adopt the interface merging proposals above, obviously these methods will be merged into the appropriate interface.
This would probably take 3-5 days, and may break a small amount of outside code.
- Makes it much easier to work with LinkedObjects of all kinds, particularly when actually trying to get intersections to work right.
- Simplifies the reasoner and saves memory, because the reasoner currently needs to do a pass where it separates out all these intersection links anyway
- At least 2 days to implement this
- Would probably break some of Chris's code on the intersection editor
Remove all OBO bookkeeping methods from the datamodel
OBO format is very forgiving when it comes to round-tripping unrecognized data. OBO-Edit can round-trip unrecognized tags, stanzas, and even trailing modifier data specified in braces at the end of a tag.
Currently, when an unrecognized tag is found within a [Term], [Typedef] or [Instance] stanza, the unrecognized tag is stored within the object represented by that stanza. When a trailing modifier is detected, the trailing modifier data is stored and retrieved using a method specific to trailing modifier data in that specific tag.
This leads to a large amount of specialized data adapter information being stored in the OBO-Edit datamodel itself, and nearly doubles the amount of methods in the datamodels.
Weak version: Store all this metadata in OBOSession instead
Currently, unrecognized stanzas (and their associated tags) are stored in the OBOSession. Using a hash, we could also store unrecognized tags and trailing modifiers in the OBOSession, and remove all the special purpose methods from the other datamodel classes.
Strong version: Store all this metadata in an AdapterMetaData object
This information is not proper datamodel information at all - it's really special-purpose bookkeeping data for a particular data adapter. OBO-Edit already has a mechanism for dealing with other OBO-specific bookkeeping data: the OBOFileAdapter has a method called getMetaData that returns an object called OBOMetaData which holds various pieces of obo-specific information (for example, OBOMetaData knows the import dependency graph for the most recently loaded collection of files. This information is not relevant to the meaning of the ontology, but is used for configuration purposes by Phenote).
I suggest that we move all unrecognized tag and trailing modifier data into the OBOMetaData object, where it can be easily retrieved by people who need it, but is kept safely out of sight from regular users.
Point of interest: we could also generalize this system so that every data adapter could return its own adapter-specific metadata object.
Either version of this proposal would probably take about 3 days.
It becomes less straightforward to access OBO meta data, particularly trailing modifiers.
The datamodel becomes far cleaner and it becomes much easier to support new OBO tags (because there's no need to code per-tag methods for storing trailing modifier data).