Sensu Metrics

I am working out some sensu metrics for David and noting down here how I did the analysis.

The metrics that we want are:

1) For each sensu term what are the old and new names?
2) How many terms were merged?
3) How many terms were redefined?


Before and after the edits

For some of the work I am comparing the before and after files that flank the period of the sensu overhaul. The pre-overhaul file that I am using is version 5.123 of go/ontology/gene_ontology_edit.obo.

obodiff

Running obodiff on the two files shows that this period saw 77642 individual edits to the file.

sensu term ids

I extracted the ids of the old sensu terms from the sensu term table that is in scratch. (I just did that in a text editor, using search and replace to get rid of everything but the GO ids.)
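The id extraction was done by hand in a text editor, but the same step could be scripted. The following is only a minimal Python sketch, not the method actually used: it relies on GO ids having the form GO: followed by seven digits, and the function name extract_go_ids is made up for illustration.

```python
import re

# GO ids are "GO:" followed by seven digits, e.g. GO:0008150.
GO_ID = re.compile(r"GO:\d{7}")

def extract_go_ids(text):
    """Return the unique GO ids found in text, in order of first appearance."""
    seen = set()
    ids = []
    for go_id in GO_ID.findall(text):
        if go_id not in seen:
            seen.add(go_id)
            ids.append(go_id)
    return ids
```

Passing the whole text of the table to extract_go_ids would give a list of ids that can be written out one per line.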

sensu lines in obodiff output

I then extracted all the lines from the obodiff output file that described edits to sensu terms. This was done using a new script, now in cvs, called extract-from-obodiff.pl. Please see: /go/software/utilities/extract-from-obodiff.pl

The script loads the list of sensu term ids, puts them in a hash, and then searches the file for any lines with any of those ids, and prints only those lines.
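For readers who do not have the Perl script to hand, the filtering logic described above can be sketched in Python roughly as follows. This is an illustration under assumptions, not a copy of extract-from-obodiff.pl: the function name filter_lines is hypothetical, and the sketch assumes that ids appear in the obodiff output in the usual GO:0000000 form. A Python set plays the role of the Perl hash.

```python
import re

# GO ids are "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")

def filter_lines(lines, wanted_ids):
    """Keep only the lines that mention at least one of the wanted GO ids."""
    wanted = set(wanted_ids)  # set lookup, like the hash in the Perl script
    kept = []
    for line in lines:
        # Compare whole ids rather than substrings to avoid partial matches.
        if any(go_id in wanted for go_id in GO_ID.findall(line)):
            kept.append(line)
    return kept
```

Matching whole ids rather than searching for substrings is a design choice of this sketch; the original script may do the search differently.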

3040 of the 77642 lines in the obodiff output pertain to sensu edits.

I wrote a script that parses the obodiff output file format and counts how many edits of each type were carried out.


For the sensu terms it shows:

415 definition dbxrefs were added.
98 term merges took place.
650 new synonyms were added.
358 names were changed.
604 synonyms were assigned a scope.
63 new part_of relationships were made.
51 new is_a relationships were made.
11 definitions were added.
77 general dbxrefs were added.
11 terms were added to subsets.
67 comments were changed.
371 definitions were changed.
10 new terms were made.
37 definition dbxrefs were deleted.
21 part_of relationships were deleted.
113 is_a relationships were deleted.
46 synonyms were deleted.

The script is checked into cvs in go/software/utilities and it is called 'count_edits.pl'.
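
The counting step could be sketched in Python along these lines. This is not count_edits.pl itself, and the obodiff output format is not reproduced on this page, so the sketch takes the rule for reading the edit type from a line as a parameter. The helper first_field assumes a purely hypothetical tab-separated layout with the edit type in the first column; it would need replacing with a rule that matches the real format.

```python
from collections import Counter

def count_edits(lines, classify):
    """Count lines per edit type, where classify(line) names the edit type."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            counts[classify(line)] += 1
    return counts

def first_field(line):
    # HYPOTHETICAL layout: tab-separated, edit type in the first column.
    # The real obodiff format may differ; swap in a matching rule.
    return line.split("\t", 1)[0]
```

Calling count_edits(lines, first_field) on the filtered sensu lines would then give one total per edit type, which could be printed in the style of the list above.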