Chris's email

Current workflow

Currently all sets of XPs (cross-products) are maintained by cjm, using a combination of term parsing and hand editing. The exceptions are the standard regulation XPs, which are generated entirely automatically, and the CHEBI XPs, whose initial set was contributed by Mike Bada.

Each XP set has its own wiki page. The complete set can be seen here: http://wiki.geneontology.org/index.php/Category:Cross_Products

cjm periodically runs the OE reasoner over individual XP sets, finds inconsistencies and missing links, and submits tracker items. This leads to some fixes to GO. If a curator rejects a reasoner suggestion, cjm goes back and fixes the XP file. Through this iterative process both the XPs and core GO are improved. Where external ontologies are involved, issues may remain unresolved for extended periods.
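As a rough illustration of how XP defs can expose missing links, the sketch below suggests an is_a link between two XP-defined terms when both use the same relation, and the genus and differentium of one are subsumed by those of the other. This is not the OE reasoner's actual algorithm. The function names, the data layout, and the term and relation identifiers are all assumptions made up for this example.

```python
def ancestors(term, is_a):
    """Return every term reachable from `term` by following is_a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumed_by(child, parent, is_a):
    """True if `child` is `parent` or one of its is_a descendants."""
    return child == parent or parent in ancestors(child, is_a)


def missing_links(xp_defs, is_a):
    """Suggest is_a links implied by the XP defs but not yet entailed.

    xp_defs maps a term to a (genus, relation, differentium) triple.
    is_a maps a term to the list of its asserted is_a parents.
    """
    suggestions = []
    for a, (genus_a, rel_a, diff_a) in xp_defs.items():
        for b, (genus_b, rel_b, diff_b) in xp_defs.items():
            if a == b or rel_a != rel_b:
                continue
            implied = (subsumed_by(genus_a, genus_b, is_a)
                       and subsumed_by(diff_a, diff_b, is_a))
            if implied and not subsumed_by(a, b, is_a):
                suggestions.append((a, b))
    return sorted(suggestions)


# Hypothetical identifiers, for illustration only.
xp_defs = {
    "EX:neuron_development": ("EX:development", "results_in_development_of", "EX:neuron"),
    "EX:cell_development": ("EX:development", "results_in_development_of", "EX:cell"),
}
is_a = {"EX:neuron": ["EX:cell"]}
print(missing_links(xp_defs, is_a))
```

Each suggested pair would still need curator review before becoming a tracker item, since a rejected suggestion usually points to a problem in the XP def itself.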

Results so far show that large parts of ontology maintenance can be automated using XPs, eliminating certain classes of error, and speeding the creation of trivial is_a links. This is also borne out by experience with SO, which has for some time used XPs in conjunction with the reasoner for automated management.

The next stage is to move to curator ownership of the XP sets.

Proposed new workflow

Intermediate/Transition Stage

Each set of XPs will have one person from the ontology group as designated curator, in association with cjm. That person will be responsible for the content of the set of XPs. This includes:

  • evaluating the content of the XP files
    • This could be in the form of a report following a standard template
    • Complete?
    • Biologically accurate?
    • Understandable?
    • Logically coherent?
    • Consistent with GO?
  • using the XPs and the reasoner to find errors and missing links in GO
    • submitting tracker items for GO, and external ontologies if required
    • discussion with domain experts
    • following up on tracker items
  • helping define the relations used in those XPs
  • keeping the XPs up to date with changes in GO
  • recommending a date for incorporation into core ontology

During this stage, the XPs will continue to live outside the core GO file. They will be loaded together with GO for checking, and filtered out again when saving back to the core GO file. The XP defs themselves may be edited by hand or in OE2 and then filtered back into the XP file in CVS.
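The filtering step can be sketched as follows. This is a minimal example that assumes the files are in OBO-style text and that an XP def consists of the `intersection_of` lines of a term stanza. The function name and the sample identifiers are hypothetical, and OE2's own filtering may work differently.

```python
def split_xp_lines(obo_text):
    """Separate XP lines from the rest of an OBO-style file.

    Assumption: XP defs are exactly the lines tagged `intersection_of`.
    Returns the text without those lines, plus the removed lines keyed
    by the id of the stanza they came from.
    """
    core_lines = []
    xp_lines_by_term = {}
    current_id = None
    for line in obo_text.splitlines():
        if line.startswith("id: "):
            current_id = line[len("id: "):].strip()
        if line.startswith("intersection_of:"):
            xp_lines_by_term.setdefault(current_id, []).append(line)
        else:
            core_lines.append(line)
    return "\n".join(core_lines), xp_lines_by_term


# Made-up stanza, for illustration only.
sample = """[Term]
id: EX:0000003
name: example regulation term
intersection_of: EX:0000001
intersection_of: regulates EX:0000002
is_a: EX:0000001
"""

core_text, xp_lines = split_xp_lines(sample)
print(core_text)
print(xp_lines)
```

The first return value is what would be saved back to the core file; the second holds what would go back into the separate XP file.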

Live XP Stage

Eventually XP sets will migrate from being external files to parts of the core GO. This will likely happen gradually, starting with the simplest cases (e.g. regulation).

Once an XP set becomes part of the core GO, responsibility for that set is shared by all ontology editors. For example, once bp_xp_cc goes live, whoever adds a new cellular component organization term is responsible for making its XP def.

In fact this will be a time saver rather than an impediment. The curator will simply select the two terms plus a relation, and everything will be auto-generated: graph placement, synonyms, standard def etc.
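To make the idea of auto-generation concrete, here is a minimal sketch of deriving a term's name, synonyms, definition and graph placement from two terms plus a relation. The templates, the function name, the relation name and all identifiers are assumptions invented for this example; they do not describe how OE2 actually generates these fields.

```python
def generate_xp_term(new_id, genus, relation, differentium,
                     name_template, def_template,
                     differentium_synonyms=()):
    """Build the auto-generated parts of a term from its XP def.

    genus and differentium are (id, name) pairs; relation is a relation
    name. The templates may use {genus} and {differentium} placeholders.
    """
    genus_id, genus_name = genus
    diff_id, diff_name = differentium
    return {
        "id": new_id,
        "name": name_template.format(genus=genus_name, differentium=diff_name),
        # One synonym per synonym of the differentium, same template.
        "synonyms": [
            name_template.format(genus=genus_name, differentium=synonym)
            for synonym in differentium_synonyms
        ],
        "def": def_template.format(genus=genus_name, differentium=diff_name),
        # Graph placement: is_a the genus, related to the differentium.
        "is_a": [genus_id],
        "relationship": [(relation, diff_id)],
        "intersection_of": [genus_id, f"{relation} {diff_id}"],
    }


# Hypothetical ids and relation name, for illustration only.
term = generate_xp_term(
    "EX:0000003",
    genus=("EX:0000001", "cellular component organization"),
    relation="results_in_organization_of",
    differentium=("EX:0000002", "nucleus"),
    name_template="{differentium} organization",
    def_template="A {genus} that results in the organization of a {differentium}.",
    differentium_synonyms=["cell nucleus"],
)
print(term["name"])
print(term["def"])
```

The curator supplies only the genus, the relation and the differentium; everything else follows from per-set templates, which is why the step should save time.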

The procedure for making an XP set live will of course include notification of the community with significant lead time.

Preparation

There is already significant detail on the wiki, including background material, an index of the XP sets, and a page for each XP set. Each page follows a standard template, with synopsis, results and discussion sections. It also shows example XPs and has links through which the XPs can be viewed outside OE. There are also links to outstanding tracker items that have arisen from running the reasoner over that XP set.

See

However, what is lacking is any plan or prioritization.

We can add an additional wiki table, in which people can sign themselves up for an XP set. The table can also include current status, "hardness" of the XP set etc.

Alternatively we can use the tracker here. We can assign a tracker item for an initial review of each XP set and use the tracker mechanisms to assign responsible people, prioritize etc.

After the initial review, when everyone is more comfortable with the concepts and content, we will jointly come up with a timetable for final implementation. (This will be easier when we have a better handle on an OE2 release date.)

There should be a reasonable degree of parallelism and fluidity here. For example, a recent focus of ontology development has been on cellular component organization terms. It makes sense to do the evaluation of the bp_xp_cc set simultaneously. [[1]]

At the same time, the bp_xp_cell effort has been ongoing for some time, and there is a long-standing open tracker item concerning inconsistencies between GO and CL. Its priority needs to be increased. [[2]]

I recognize that OE2 could be improved in terms of both documentation and capabilities. This is particularly true for the "Live XP Stage". However, we have some time before even the regulation terms move to this stage, and the best way to improve the documentation and capabilities is for people to start using what is there and giving feedback.