Merging Ontology Terms

From GO Wiki

Revision as of 04:23, 25 January 2022

 See Ontology_Editors_Daily_Workflow for creating branches and basic Protégé instructions.

Principles for merging terms

Evaluate the potential consequences of the merge

Before performing a merge, make sure that you know all of the consequences that the merge will cause.

  1. Ontology usage: Check whether the term is used elsewhere in the ontology.
    • In Protégé, go to the Class Usage tab to see whether that ID is used elsewhere. Search for the term name or the term IRI (i.e., with an underscore between GO and the numerical part of the ID, for example: 'GO_0030722').
    • Be sure to look at child terms and any other terms that refer to the ‘deprecated’ term. In many cases a simple merge of two terms is not sufficient because it will result in equivalent classes for child terms. For example, if deprecated term X is going to be merged into target term Y and both ‘regulation of X’ and ‘regulation of Y’ exist, then you will need to merge the regulation terms in addition to the primary terms. You will also need to edit any terms that refer to the deprecated term so that their names and definitions remain consistent.
    • Be sure that the term has not been used with a replaced_by or consider tag. If it has, update these tags if possible, or remove them. Note that the consider tag does not appear in the Usage tab; you need to search with the GO ID and look in the results section for 'consider' entities using the ID.
  2. Subset usage: Check whether the term is used in a subset. In Protégé, look for any values present in the in_subset tag. Usually the same subsets can be kept after merges.
  3. Taxon Constraints: Check whether the term has been used in a taxon constraint. If it has, the source file needs to be modified and the taxon constraint imports need to be regenerated. See: Adding_Taxon_Restrictions.
  4. External mappings: Check whether there are automatic mappings to external resources (you need to use QuickGO for this, since it has a lot more electronic annotations, or look at individual files here: http://current.geneontology.org/ontology/external2go/index.html). Any IEAs to the following resources should be reported to the contact person from that source:
    • HAMAP
    • InterPro
    • Rfam
    • Unirule
    • UNIPROTKB_SL (subcellular locations)
    • UNIPROTKB_KW (keywords)

  • Once any changes required by the checks above have been merged, run `git pull origin master`, and only then start working on the merges.
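The usage check above depends on searching for both written forms of an ID: the colon form (GO:0030722) and the IRI form with an underscore (GO_0030722). As a rough illustration only, the sketch below scans lines of text for either form. The function names and the sample lines are hypothetical and are not part of Protégé or any GO tooling; a plain-text scan like this supplements the Class Usage tab and does not replace it.

```python
import re


def id_forms(go_id):
    """Return (colon form, underscore form) for a GO ID given in either form."""
    match = re.fullmatch(r"GO[:_](\d{7})", go_id.strip())
    if match is None:
        raise ValueError(f"not a GO ID: {go_id!r}")
    digits = match.group(1)
    return f"GO:{digits}", f"GO_{digits}"


def find_mentions(lines, go_id):
    """Yield (line_number, line) for each line that mentions the ID in either form."""
    colon_form, underscore_form = id_forms(go_id)
    pattern = re.compile(
        rf"\b(?:{re.escape(colon_form)}|{re.escape(underscore_form)})\b"
    )
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


# Made-up sample lines for illustration; real ontology files will look different.
sample = [
    "SubClassOf(obo:GO_0000001 obo:GO_0030722)",
    'AnnotationAssertion(oboInOwl:consider obo:GO_0000002 "GO:0030722")',
    "SubClassOf(obo:GO_0000003 obo:GO_0000004)",
]

for number, line in find_mentions(sample, "GO:0030722"):
    print(number, line)
```

The same scan could be pointed at a downloaded external2go file to spot mappings that mention the ID, assuming the file lists GO IDs in the colon form.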

Choosing which term to merge into ("Winning Term")

The ID of any of the terms being merged can become the primary ID. Often, one of the terms will have a better class label, a better definition, or a more correct placement in the ontology; the ID of that term should be favored as the primary ID. When choosing the primary ID, you can also consider how long each term has existed (the older term is given preference) or the number of annotations associated with each term (the term with the most annotations is given preference).

Procedure for merging terms

  1. Find the ID of the term into which the deprecated term will be merged and navigate to that ‘winning’ term using the Search box. Copy the ID of the winning term somewhere, so you know which ID to keep.
  2. For the term being deprecated, click on the Class Usage tab to see all uses of the term throughout the ontology. If the term is used in other classes, e.g. as part of an equivalence axiom, you must first deal with those other classes (maybe with merges, maybe by renaming, maybe moving parents, etc.).
  3. Remove annotations from the deprecated term
    • Navigate to the term to be deprecated.
    • Remove the logical definition by clicking on the x on the right-hand side.
    • Remove any remaining subclasses by clicking on the x on the right-hand side.
    • Look at the definition; if it does not seem relevant, remove it by clicking on the x on the right-hand side; otherwise copy/paste it somewhere to refer to when reviewing the definition for the winning term.
    • Note down the created_by and created_date (there can only be one value per term for each of these fields; this will be useful if you need to pick one after the merge is done).
    • Check existing list of synonyms to see if they need to be moved to the new term, otherwise delete them by clicking on the x on the right-hand side.
    • If there is an annotation for dcterms:conformsTo, remove it by clicking on the x on the right-hand side.
    • Change the ID of the term to be deprecated to the winning term’s ID
      • In the term to be deprecated, click on ‘Refactor > Rename entity’ in the Protégé menu (shortcut: command-U).
      • Copy the ID of the winning term (obtained in Step 1).
      • Be sure to use the underscore _ in the identifier instead of the colon :, for example: GO_1234567.
      • Make sure that the ‘change all entities with this URI’ box is checked.
    • Make the deprecated ID an alternative ID
      • Navigate to the winning term. In the Annotations box, locate the ID of the deprecated term. Click the o to change the ID type.
      • In the resulting pop-up window, make sure the Literal tab is selected in the top right side box, then select has_alternative_id from the list on the left side. Double check that the entry corresponds to the GO ID of the deprecated term.
      • Click OK. The deprecated term identifier should now have the label has_alternative_id instead of id.
  4. Change deprecated term label to a synonym
      • In the annotations box of the winning term there are now two rdfs:label annotations. Click the o next to the label that came from the deprecated term to change its type.
      • In the resulting pop-up window, select the appropriate synonym label from the list on the left:
        • has_broad_synonym
        • has_exact_synonym
        • has_narrow_synonym
        • has_related_synonym (if unsure, this is the safest choice)
  5. Fix synonyms: In the annotations box of the winning term, check the list of synonyms to see if they are all still appropriate.
  6. If needed, fix the definition, using information from the deprecated term as appropriate.
  7. Synchronize the reasoner and make sure there are no terms that have identical definitions as a result of the merge. These are displayed with an ‘equivalent’ sign in the class hierarchy view on the left hand panel.
  8. Save changes.
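To make the intended end state of the procedure concrete, here is a minimal sketch that models a merge on a plain Python dictionary. It is not how Protégé performs the merge; the function name `merge_terms`, the dictionary layout, and the placeholder labels are assumptions made for illustration. It covers only the two outcomes described above: the deprecated ID becomes an alternative ID of the winning term, and the deprecated label becomes a synonym. Definitions, remaining synonyms, and created_by values still need the editor's judgment.

```python
SYNONYM_SCOPES = {
    "has_broad_synonym",
    "has_exact_synonym",
    "has_narrow_synonym",
    "has_related_synonym",
}


def merge_terms(terms, deprecated_id, winning_id, scope="has_related_synonym"):
    """Return a new term table with deprecated_id merged into winning_id."""
    if scope not in SYNONYM_SCOPES:
        raise ValueError(f"unknown synonym scope: {scope}")
    if deprecated_id == winning_id:
        raise ValueError("a term cannot be merged into itself")

    # Copy every term except the deprecated one; the input table is left untouched.
    result = {tid: dict(term) for tid, term in terms.items() if tid != deprecated_id}
    winner = result[winning_id]

    # The deprecated ID survives only as an alternative ID of the winning term.
    winner["alt_ids"] = list(winner.get("alt_ids", [])) + [deprecated_id]

    # The deprecated label is kept as a synonym with the chosen scope.
    deprecated_label = terms[deprecated_id]["label"]
    winner["synonyms"] = list(winner.get("synonyms", [])) + [(scope, deprecated_label)]
    return result


# Placeholder labels; the IDs are reused from the troubleshooting example on this page.
terms = {
    "GO:0030722": {"label": "term B"},
    "GO:0007312": {"label": "term C"},
}
merged = merge_terms(terms, "GO:0030722", "GO:0007312")
print(merged["GO:0007312"])
```

The default scope is has_related_synonym, following the note above that it is the safest choice when unsure.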

Troubleshooting: Travis/Jenkins errors

  • Merging a term that is used as ‘replaced by’ for an obsolete term:
    :: ERROR: ID-mentioned-twice:: GO:0048126 
      GO:0030722 :: ERROR: has-definition: missing definition for id

The cause of this error is that Term A (GO:0048126) was obsolete and had a replaced_by tag pointing to Term B (GO:0030722). The GO editor then tried to merge Term B into a third term, Term C (GO:0007312). The Jenkins check failed because ‘Term A replaced_by Term B’ now references an alternative ID (GO:0030722) rather than a primary ID (now GO:0007312). Solution: In the ontology, go to the obsolete Term A and change the replaced_by tag from the secondary Term B identifier (GO:0030722) to the Term C identifier (GO:0007312).
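The fix described above can be pictured as a lookup from alternative IDs to primary IDs. The sketch below uses a simplified dictionary model; the function name `repoint_replacements` and the data layout are hypothetical and do not correspond to the Jenkins check or any GO tool. It only illustrates which replaced_by and consider values need to change after a merge.

```python
def repoint_replacements(terms):
    """Return a copy of terms in which replaced_by/consider values that name
    an alternative ID are rewritten to the corresponding primary ID."""
    # Map each alternative ID to the primary ID of the term that carries it.
    primary_for = {
        alt_id: term_id
        for term_id, term in terms.items()
        for alt_id in term.get("alt_ids", [])
    }
    fixed = {}
    for term_id, term in terms.items():
        new_term = dict(term)
        for tag in ("replaced_by", "consider"):
            if tag in term:
                new_term[tag] = [primary_for.get(value, value) for value in term[tag]]
        fixed[term_id] = new_term
    return fixed


# State after merging Term B (GO:0030722) into Term C (GO:0007312),
# with obsolete Term A (GO:0048126) still pointing at Term B.
after_merge = {
    "GO:0048126": {"obsolete": True, "replaced_by": ["GO:0030722"]},
    "GO:0007312": {"alt_ids": ["GO:0030722"]},
}
repaired = repoint_replacements(after_merge)
print(repaired["GO:0048126"]["replaced_by"])
```

In practice the edit is made by hand on the obsolete term, as described in the solution above.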

 See Ontology_Editors_Daily_Workflow for commit, push and merge instructions.

Review Status

Last reviewed: January 25, 2022

Back to: Editing the Ontology