Taxon Editing Workflow

This page describes the full workflow for creating or editing links between GO terms and the NCBI taxon slim.

Before June 1st 2009, checked links were marked with the definition dbxref GOC:mtg_taxon. After this date the dbxrefs were removed and the scheme changed so that only checked links remain in the file.

Loading the files into OBO-Edit

Check out these files and load them into OBO-Edit:

go/ontology/editor/gene_ontology_write.obo
go/scratch/go-taxon/ncbitax-slim.obo
go/scratch/go-taxon/TaxonGOLinksFile.obo
go/scratch/go-taxon/UnionTerms.obo

(Or, if these files have not been regenerated since someone last edited all_files_mid_edit.obo, just check out all_files_mid_edit.obo and load that into OBO-Edit.)

Making edits

Relationships are edited in the usual way in OBO-Edit.

Once a new relationship has been made you should check transitivity in both the taxon graph and the GO graph.

For example, in the image below you would check that all gene products involved in chloroplast-type photosynthesis (a made-up example term) would also be involved in all the processes that are ancestors of this term. You would also check that chloroplast-type photosynthesis would not occur in any taxa outside Viridiplantae and Cyanobacteria.
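
The two checks can be made less error-prone by listing the terms to review. The following is a minimal sketch, not part of the GO tooling: the graphs are hypothetical toy dictionaries with placeholder names rather than real GO or NCBI identifiers, and the function names are invented for illustration.

```python
def ancestors(term, parents):
    """Return every ancestor of `term` reachable through the `parents` mapping."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def taxon_allowed(taxon, allowed, taxon_parents):
    """True if `taxon` is one of the `allowed` taxa or a descendant of one."""
    return taxon in allowed or bool(ancestors(taxon, taxon_parents) & set(allowed))


# Hypothetical toy graphs (child -> list of parents); assumptions for illustration only.
go_parents = {
    "chloroplast-type photosynthesis": ["photosynthesis"],
    "photosynthesis": ["metabolic process"],
}
taxon_parents = {
    "Arabidopsis": ["Viridiplantae"],
    "Homo sapiens": ["Metazoa"],
}
allowed = {"Viridiplantae", "Cyanobacteria"}

# GO ancestors whose gene products should be reviewed after adding the link.
to_review = ancestors("chloroplast-type photosynthesis", go_parents)
```

Here `to_review` lists the ancestor processes to inspect in the GO graph, and `taxon_allowed` answers whether a given taxon falls inside the permitted union in the taxon graph.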


Saving the files

Save out the entire dataset including all ontologies, and without any filtering. Save the file as:

go/scratch/go-taxon/all_files_mid_edit.obo

Temporary Storage between editing sessions

If you do not wish to make the new links public immediately, just commit all_files_mid_edit.obo to cvs; you can then check it out later and resume editing from the same point. In this case you would load only all_files_mid_edit.obo into OBO-Edit next time.

If you wish to make the links public then you will need to process the file to extract the links as described below.

File processing to create release-format files

Scripts have been developed to process the edited file. They ultimately produce two files containing only the union terms and the go-taxon links. These can then be checked into cvs for distribution. Run the Perl scripts as follows.

Check out:

go/software/utilities

Move the following scripts to go/scratch/go-taxon:

go/software/utilities/FindTaxonLinkedTerms.pl
go/software/utilities/RetainTaxonLines.pl
go/software/utilities/Taxonlinks_TabDel2OBO.pl 
go/software/utilities/Taxonlinks_OBO2TabDel.pl

Run the scripts in this directory as follows:

cd go/scratch/go-taxon
perl [script name]
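
If you prefer to run all four scripts in one go, a small wrapper can do it. This is only a sketch under two assumptions: that the scripts take no command-line arguments (as the `perl [script name]` form suggests), and that the order below, inferred from the input and output files each script lists, is the intended one. The function name `run_pipeline` and its parameters are hypothetical.

```python
import subprocess

# Order inferred from each script's documented input and output files (an assumption).
SCRIPTS = [
    "FindTaxonLinkedTerms.pl",
    "RetainTaxonLines.pl",
    "Taxonlinks_OBO2TabDel.pl",
    "Taxonlinks_TabDel2OBO.pl",
]


def run_pipeline(workdir="go/scratch/go-taxon", runner=subprocess.run):
    """Run each Perl script in `workdir`, stopping at the first failure."""
    for script in SCRIPTS:
        # check=True makes subprocess.run raise if perl exits with a non-zero status.
        runner(["perl", script], cwd=workdir, check=True)


# Usage (requires perl and the scripts to be present in the working directory):
# run_pipeline()
```

The `runner` parameter exists only so the wrapper can be exercised without Perl installed.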

The scripts will have the following effects:

FindTaxonLinkedTerms.pl

Input: all_files_mid_edit.obo (The complete file containing all GO and taxon ontologies and relationship types.)
Output: TaxonLinkedTerms.obo (File containing only GO terms with links to taxon terms.)
Output: UnionTerms.obo (File containing only the union terms)

This script extracts the stanzas of GO terms that have links to taxon terms, and also any union terms. These are written to two separate files as described above.

The only_in_taxon and never_in_taxon relationships are written to TaxonLinkedTerms.obo.

TaxonLinkedTerms.obo will contain stanzas like this:

[Term]
id: GO:0048749
name: compound eye development
namespace: biological_process
alt_id: GO:0007456
def: "The process whose specific outcome is the progression of the compound eye over time, from its formation to the mature structure." [GOC:jic, GOC:mtg_sensu]
synonym: "compound eye development (sensu Endopterygota)" EXACT []
synonym: "eye development (sensu Endopterygota)" EXACT []
synonym: "insect-type retina development" EXACT [PMID:11735386]
is_a: GO:0001654 ! eye development
relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota

UnionTerms.obo will contain stanzas like this:

[Term]
id: JD:0000002
name: Viridiplantae and Cyanobacteria
namespace: union_terms
def: "The union of the taxa Viridiplantae and Cyanobacteria." [GOC:mtg_go-taxon]
union_of: NCBITaxon:1117 ! Cyanobacteria
union_of: NCBITaxon:33090 ! Viridiplantae
created_by: Jennifer Deegan
creation_date: 2008-07-16T03:33:38Z
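
The split that FindTaxonLinkedTerms.pl performs can be pictured with the sketch below. It is not the Perl script's actual logic, which is not shown on this page. It assumes that union terms can be recognised by a `union_of:` tag and taxon-linked terms by an only_in_taxon or never_in_taxon relationship; the function names are hypothetical.

```python
TAXON_RELATIONS = ("only_in_taxon", "never_in_taxon")


def split_stanzas(obo_text):
    """Split OBO text into stanzas; each stanza is a list of lines starting at its [Header]."""
    stanzas, current = [], None
    for line in obo_text.splitlines():
        line = line.rstrip()
        if line.startswith("[") and line.endswith("]"):
            current = [line]
            stanzas.append(current)
        elif not line:
            current = None  # a blank line closes the current stanza
        elif current is not None:
            current.append(line)  # lines before the first stanza are skipped
    return stanzas


def has_taxon_link(stanza):
    """True if the stanza carries an only_in_taxon or never_in_taxon relationship."""
    for line in stanza:
        parts = line.split()
        if len(parts) > 1 and parts[0] == "relationship:" and parts[1] in TAXON_RELATIONS:
            return True
    return False


def classify(stanzas):
    """Return (taxon_linked, union_terms); stanzas matching neither are dropped."""
    taxon_linked, union_terms = [], []
    for stanza in stanzas:
        if any(line.startswith("union_of:") for line in stanza):
            union_terms.append(stanza)
        elif has_taxon_link(stanza):
            taxon_linked.append(stanza)
    return taxon_linked, union_terms


sample = """format-version: 1.2

[Term]
id: GO:0048749
name: compound eye development
relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota

[Term]
id: GO:0001654
name: eye development

[Term]
id: JD:0000002
name: Viridiplantae and Cyanobacteria
union_of: NCBITaxon:1117 ! Cyanobacteria
union_of: NCBITaxon:33090 ! Viridiplantae
"""

taxon_linked, union_terms = classify(split_stanzas(sample))
```

In this sample the stanza for eye development has no taxon link and is dropped, so one stanza lands in each output list.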


N.B. Before releasing to the public, open UnionTerms.obo and delete the stanza for JD:1. Currently this term is needed to allow ID generation to work in OBO-Edit, but the term is logically incorrect and should not be there. The JD prefix will also need to be replaced, but we have not thought of an alternative yet.

Once created, commit the output files to cvs.

RetainTaxonLines.pl

Input: TaxonLinkedTerms.obo (File containing only those GO terms with links to taxon terms.)
Output: TaxonGOLinksFile.obo (File containing same stanzas but with extraneous information stripped out, as below.)

This script takes the file of taxon-linked GO terms and extracts only the lines needed to show the relationship between the GO term and the taxon term, as below:

[Term]
id: GO:0048749
name: compound eye development
namespace: biological_process
alt_id: GO:0007456
def: "The process whose specific outcome is the progression of the compound eye over time, from its formation to the mature structure." [GOC:jic, GOC:mtg_sensu]
synonym: "compound eye development (sensu Endopterygota)" EXACT []
synonym: "eye development (sensu Endopterygota)" EXACT []
synonym: "insect-type retina development" EXACT [PMID:11735386]
is_a: GO:0001654 ! eye development
relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota

becomes:

[Term]
id: GO:0048749
name: compound eye development
relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota
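
The stripping step amounts to a line filter. The sketch below illustrates the idea and is not the code of RetainTaxonLines.pl. It assumes a stanza is held as a list of lines, and that only the stanza header, `id:`, `name:` and the taxon relationship lines are kept, as in the example above.

```python
TAXON_RELATIONS = ("only_in_taxon", "never_in_taxon")
KEPT_TAGS = ("id:", "name:")  # "namespace:" does not match because of the colon


def retain_taxon_lines(stanza):
    """Keep the header, id, name and taxon relationship lines of one stanza."""
    kept = []
    for line in stanza:
        if line == "[Term]" or line.startswith(KEPT_TAGS):
            kept.append(line)
        elif line.startswith("relationship:"):
            parts = line.split()
            if len(parts) > 1 and parts[1] in TAXON_RELATIONS:
                kept.append(line)
    return kept


full_stanza = [
    "[Term]",
    "id: GO:0048749",
    "name: compound eye development",
    "namespace: biological_process",
    "alt_id: GO:0007456",
    'synonym: "insect-type retina development" EXACT [PMID:11735386]',
    "is_a: GO:0001654 ! eye development",
    "relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota",
]

minimal_stanza = retain_taxon_lines(full_stanza)
```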

Once created, commit the output files to cvs.

Taxonlinks_OBO2TabDel.pl

Input: TaxonGOLinksFile.obo (File containing same stanzas but with extraneous information stripped out, as below)
Output: TaxonGOLinksTabDelimited.txt (Tab-delimited version of the same.)

Converts the file of minimalist taxon-linked GO terms to tab-delimited format.

[Term]
id: GO:0048749
name: compound eye development
relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota

becomes:

GO:0048749	compound eye development	only_in_taxon	Endopterygota	NCBITaxon:33392
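
As a rough illustration of this conversion (not the Perl script itself), the sketch below turns one minimal stanza into tab-delimited rows. The column order follows the example above: GO ID, GO name, relationship, taxon name, taxon ID. The function name and the regular expression are assumptions made for the example.

```python
import re

# Matches lines such as: relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota
REL_RE = re.compile(r"^relationship:\s+(\S+)\s+(\S+)\s+!\s+(.*)$")


def stanza_to_rows(stanza):
    """Return one tab-delimited row per relationship line in the stanza."""
    term_id = name = ""
    links = []
    for line in stanza:
        if line.startswith("id: "):
            term_id = line[4:].strip()
        elif line.startswith("name: "):
            name = line[6:].strip()
        else:
            match = REL_RE.match(line)
            if match:
                links.append(match.groups())
    # Rows are built after the loop so that tag order inside the stanza does not matter.
    return [
        "\t".join([term_id, name, relation, taxon_name.strip(), taxon_id])
        for relation, taxon_id, taxon_name in links
    ]


rows = stanza_to_rows([
    "[Term]",
    "id: GO:0048749",
    "name: compound eye development",
    "relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota",
])
```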

Once created, commit the output files to cvs.

Taxonlinks_TabDel2OBO.pl

Input: TaxonGOLinksTabDelimited.txt (Tab-delimited version of the GO terms with links to taxon terms.)
Output: RegeneratedTaxonGOLinks.obo (OBO format version of the same, with only_in_taxon and never_in_taxon relationship Typedef stanzas added.)

GO:0048749	compound eye development	only_in_taxon	Endopterygota	NCBITaxon:33392

becomes:

[Term]
id: GO:0048749
name: compound eye development
relationship: only_in_taxon NCBITaxon:33392 ! Endopterygota
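
The reverse conversion can be sketched in the same spirit. This is an illustration under stated assumptions, not the code of Taxonlinks_TabDel2OBO.pl: each row is assumed to have exactly the five columns shown above, each row becomes its own stanza, and the Typedef stanza shown is a minimal placeholder because the exact Typedef content written by the real script is not given on this page.

```python
def row_to_stanza(row):
    """Convert one five-column tab-delimited row into a minimal OBO [Term] stanza."""
    go_id, go_name, relation, taxon_name, taxon_id = row.rstrip("\n").split("\t")
    return "\n".join([
        "[Term]",
        f"id: {go_id}",
        f"name: {go_name}",
        f"relationship: {relation} {taxon_id} ! {taxon_name}",
    ])


def typedef_stanza(relation):
    """Minimal placeholder Typedef stanza (assumed form, not the script's real output)."""
    return "\n".join(["[Typedef]", f"id: {relation}", f"name: {relation}"])


row = "GO:0048749\tcompound eye development\tonly_in_taxon\tEndopterygota\tNCBITaxon:33392"
stanza = row_to_stanza(row)

# Assemble the terms followed by the two relationship Typedef stanzas.
document = "\n\n".join(
    [stanza, typedef_stanza("only_in_taxon"), typedef_stanza("never_in_taxon")]
)
```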

Once created, commit the output files to cvs.