Taxon Editing Workflow
This page describes the full workflow used to create or edit links between GO terms and the NCBI taxon slim.
Before June 1st 2009 the checked links were marked with a definition dbxref GOC:mtg_taxon. After this date the dbxrefs were removed and the scheme changed so that only checked links were left in the file.
Loading the files into OBO-Edit
Check out these files and load them into OBO-Edit:
go/ontology/editor/gene_ontology_write.obo
go/scratch/go-taxon/ncbitax-slim.obo
go/scratch/go-taxon/TaxonGOLinksFile.obo
go/scratch/go-taxon/UnionTerms.obo
(Alternatively, if these files have not been regenerated since someone last edited all_files_mid_edit.obo, just check out all_files_mid_edit.obo and load that into OBO-Edit instead.)
Making edits
Relationships are edited in the usual way in OBO-Edit.
Once a new relationship has been made you should check transitivity in both the taxon graph and the GO graph.
For example, in the image below you would check that all gene products involved in chloroplast-type photosynthesis (a made-up example term) would also be involved in all the processes that are ancestors of this term. You would also check that chloroplast-type photosynthesis does not occur in any taxon outside Viridiplantae and Cyanobacteria.
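The GO side of this check amounts to listing every ancestor of the newly linked term so that each one can be reviewed by hand. The following is a minimal sketch of that idea, assuming the parent links have already been read into a dictionary. The function name and the toy graph are hypothetical and are not part of OBO-Edit or the GO scripts.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical toy graph (child -> list of parents), for illustration only.
toy_parents = {
    "chloroplast-type photosynthesis": ["photosynthesis"],
    "photosynthesis": ["metabolic process"],
    "metabolic process": ["biological_process"],
}

# Each printed ancestor is a term whose taxon range should be reviewed.
for name in sorted(ancestors("chloroplast-type photosynthesis", toy_parents)):
    print(name)
```

The same traversal could be applied to the taxon graph to list the taxa that sit above or below a linked taxon.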
Saving the files
Save out the entire dataset including all ontologies, and without any filtering. Save the file as:
go/scratch/go-taxon/all_files_mid_edit.obo
Temporary Storage between editing sessions
If you do not wish to make the new links public immediately, just commit all_files_mid_edit.obo to cvs; you can then check it out later and resume editing from the same point. In this case you would load only all_files_mid_edit.obo into OBO-Edit next time.
If you wish to make the links public, you will need to process the file to extract the links as described below.
File processing to create release-format files
Scripts have been developed to process the edited file. Ultimately this produces two files, one containing only the union terms and one containing only the GO-taxon links. These can then be checked into cvs for distribution. Run the Perl scripts as follows.
Check out:
go/software/utilities
Move the following scripts to go/scratch/go-taxon:
go/software/utilities/FindTaxonLinkedTerms.pl
go/software/utilities/RetainTaxonLines.pl
go/software/utilities/Taxonlinks_TabDel2OBO.pl
go/software/utilities/Taxonlinks_OBO2TabDel.pl
Run the scripts in this directory as follows:
cd go/scratch/go-taxon
perl [script name]
The scripts will have the following effects:
FindTaxonLinkedTerms.pl
Input: all_files_mid_edit.obo (The complete file containing all GO and taxon ontologies and relationship types.)
Output: TaxonLinkedTerms.obo (File containing only GO terms with links to taxon terms.)
Output: UnionTerms.obo (File containing only the union terms)
Extracts the stanzas of GO terms that have links to taxon terms, and also any union terms. The two sets of terms are written to the two separate files described above.
The only_in_taxon and never_in_taxon relationships are written to TaxonLinkedTerms.obo.
TaxonLinkedTerms.obo will contain stanzas like this:
[Term]
id: GO:0048749
name: compound eye development
namespace: biological_process
alt_id: GO:0007456
def: "The process whose specific outcome is the progression of the compound eye over time, from its formation to the mature structure." [GOC:jic, GOC:mtg_sensu]
synonym: "compound eye development (sensu Endopterygota)" EXACT []
synonym: "eye development (sensu Endopterygota)" EXACT []
synonym: "insect-type retina development" EXACT [PMID:11735386]
is_a: GO:0001654 ! eye development
relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota
UnionTerms.obo will contain stanzas like this:
[Term]
id: JD:0000002
name: Viridiplantae and Cyanobacteria
namespace: union_terms
def: "The union of the taxa Viridiplantae and Cyanobacteria." [GOC:mtg_go-taxon]
union_of: NCBITaxon:1117 ! Cyanobacteria
union_of: NCBITaxon:33090 ! Viridiplantae
created_by: Jennifer Deegan
creation_date: 2008-07-16T03:33:38Z
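The split performed by this step can be sketched in Python as follows. This is an illustrative approximation only, not the logic of FindTaxonLinkedTerms.pl itself: it assumes stanzas are separated by blank lines, that union terms are recognised by a union_of line, and that taxon links use the two relationship names given above. All function names are hypothetical.

```python
import re

# Relationship names taken from the text above.
TAXON_RELATIONS = ("only_in_taxon", "never_in_taxon")


def split_stanzas(obo_text):
    """Split OBO text on blank lines and keep only the [Term] stanzas."""
    blocks = re.split(r"\n\s*\n", obo_text.strip())
    return [block.strip() for block in blocks if block.strip().startswith("[Term]")]


def classify(stanza):
    """Return 'union', 'taxon_linked', or None for a single stanza."""
    lines = [line.strip() for line in stanza.splitlines()]
    if any(line.startswith("union_of:") for line in lines):
        return "union"
    for line in lines:
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "relationship:" and parts[1] in TAXON_RELATIONS:
            return "taxon_linked"
    return None


def partition(obo_text):
    """Return (taxon-linked stanzas, union stanzas) found in obo_text."""
    taxon_linked, union = [], []
    for stanza in split_stanzas(obo_text):
        kind = classify(stanza)
        if kind == "taxon_linked":
            taxon_linked.append(stanza)
        elif kind == "union":
            union.append(stanza)
    return taxon_linked, union


# Abbreviated sample input, for illustration only.
sample = """[Term]
id: GO:0048749
name: compound eye development
relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota

[Term]
id: JD:0000002
name: Viridiplantae and Cyanobacteria
union_of: NCBITaxon:1117 ! Cyanobacteria
union_of: NCBITaxon:33090 ! Viridiplantae

[Term]
id: GO:0001654
name: eye development
"""

linked, unions = partition(sample)
print(len(linked), "taxon-linked stanza(s),", len(unions), "union stanza(s)")
```

In this sketch the third stanza has neither a taxon relationship nor a union_of line, so it is written to neither output.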
N.B. Before releasing to the public, open union.obo and delete the stanza for JD:1. Currently this term is needed to allow ID generation to work in OBO-Edit, but the term is logically incorrect and should not be there. The JD prefix will also need to be replaced, but we have not thought of an alternative yet.
Once created, commit the output files to cvs.
RetainTaxonLines.pl
Input: TaxonLinkedTerms.obo (File containing only those GO terms with links to taxon terms.)
Output: TaxonGOLinksFile.obo (File containing same stanzas but with extraneous information stripped out, as below.)
Takes the file of taxon-linked GO terms and extracts only the lines needed to show the relationship between the GO term and the taxon term, as below:
[Term]
id: GO:0048749
name: compound eye development
namespace: biological_process
alt_id: GO:0007456
def: "The process whose specific outcome is the progression of the compound eye over time, from its formation to the mature structure." [GOC:jic, GOC:mtg_sensu]
synonym: "compound eye development (sensu Endopterygota)" EXACT []
synonym: "eye development (sensu Endopterygota)" EXACT []
synonym: "insect-type retina development" EXACT [PMID:11735386]
is_a: GO:0001654 ! eye development
relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota
becomes:
[Term]
id: GO:0048749
name: compound eye development
relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota
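The line filtering shown above can be approximated in a few lines of Python. This is a sketch under assumptions, not the code of RetainTaxonLines.pl: it assumes that only the [Term] header, the id and name tags, and the only_in_taxon or never_in_taxon relationship lines are kept, as in the example. The function name is hypothetical.

```python
KEEP_TAGS = ("id:", "name:")
TAXON_RELATIONS = ("only_in_taxon", "never_in_taxon")


def retain_taxon_lines(stanza):
    """Keep the header, id, name, and taxon relationship lines of one stanza."""
    kept = []
    for line in stanza.splitlines():
        line = line.strip()
        if line == "[Term]" or line.startswith(KEEP_TAGS):
            kept.append(line)
        elif line.startswith("relationship:"):
            parts = line.split()
            # Keep the relationship only if it is one of the taxon links.
            if len(parts) > 1 and parts[1] in TAXON_RELATIONS:
                kept.append(line)
    return "\n".join(kept)


# Abbreviated version of the stanza shown above.
full_stanza = """[Term]
id: GO:0048749
name: compound eye development
namespace: biological_process
alt_id: GO:0007456
is_a: GO:0001654 ! eye development
relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota"""

print(retain_taxon_lines(full_stanza))
```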
Once created, commit the output files to cvs.
Taxonlinks_OBO2TabDel.pl
Input: TaxonGOLinksFile.obo (File containing the taxon-linked stanzas with extraneous information stripped out, as produced by RetainTaxonLines.pl.)
Output: TaxonGOLinksTabDelimited.txt (Tab-delimited version of the same.)
Converts the file of minimalist taxon-linked GO terms to tab-delimited format.
[Term]
id: GO:0048749
name: compound eye development
relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota
becomes:
GO:0048749 compound eye development only_in_taxon Endopterygota NCBITaxon:33392
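The conversion illustrated above can be sketched as follows. The column order (GO ID, term name, relationship, taxon name, taxon ID) is read off the example row; the parsing approach and the function name are assumptions for illustration and do not reproduce Taxonlinks_OBO2TabDel.pl.

```python
def stanza_to_row(stanza):
    """Convert one minimal taxon-link stanza to a tab-delimited row."""
    fields = {}
    for line in stanza.splitlines():
        line = line.strip()
        if line.startswith("id: "):
            fields["id"] = line[len("id: "):]
        elif line.startswith("name: "):
            fields["name"] = line[len("name: "):]
        elif line.startswith("relationship: "):
            # "only_in_taxon NCBITaxon:33392 ! Endopterygota"
            body, _, comment = line[len("relationship: "):].partition("!")
            relation, taxon_id = body.split()
            fields["relation"] = relation
            fields["taxon_id"] = taxon_id
            fields["taxon_name"] = comment.strip()
    return "\t".join([
        fields["id"],
        fields["name"],
        fields["relation"],
        fields["taxon_name"],
        fields["taxon_id"],
    ])


minimal_stanza = """[Term]
id: GO:0048749
name: compound eye development
relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota"""

print(stanza_to_row(minimal_stanza))
```

This sketch handles one taxon relationship per stanza; a stanza with several links would need one row per relationship line.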
Once created, commit the output files to cvs.
Taxonlinks_TabDel2OBO.pl
Input: TaxonGOLinksTabDelimited.txt (Tab-delimited version of the GO terms with links to taxon terms.)
Output: RegeneratedTaxonGOLinks.obo (OBO format version of the same, with only_in_taxon and never_in_taxon relationship Typedef stanzas added.)
GO:0048749 compound eye development only_in_taxon Endopterygota NCBITaxon:33392
becomes:
[Term]
id: GO:0048749
name: compound eye development
relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota
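The reverse conversion can be sketched in the same way. This assumes the five-column layout shown in the example row (GO ID, term name, relationship, taxon name, taxon ID). The function name is hypothetical, and the sketch does not add the Typedef stanzas that Taxonlinks_TabDel2OBO.pl writes.

```python
def row_to_stanza(row):
    """Convert one tab-delimited row back to a minimal OBO [Term] stanza."""
    go_id, name, relation, taxon_name, taxon_id = row.rstrip("\n").split("\t")
    return "\n".join([
        "[Term]",
        f"id: {go_id}",
        f"name: {name}",
        f"relationship: {relation} {taxon_id} ! {taxon_name}",
    ])


example_row = "GO:0048749\tcompound eye development\tonly_in_taxon\tEndopterygota\tNCBITaxon:33392"
print(row_to_stanza(example_row))
```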
Once created, commit the output files to cvs.