Taxon-GO Implementation April 2009 onwards
1st April 2009
This work has been on hold for a while as we were waiting for more sophisticated filtering in OBO-Edit to let all editors edit the file while doing normal live file editing. However, Chris has asked that we push on without the filtering for a while.
Loaded the files in OE2 and found some file problems. The UnionTerm stanzas were a bit messed up, but were fixed by adding a Typedef for the union_of relationship and removing the top term.
Other file errors found and fixed:
Two entire taxa had gone missing from the taxon slim, and we had also lost a union_of term from the UnionTerms file. I do not know how that happened but I have replaced them.
I have recommitted the edited source files and also saved out and recommitted the all_files_mid_edit.obo file. I did not make any edits so the perl scripts do not need to be run.
Fixed this malformed tag 'name: synonym: "synonym: "synonym: "' in both the edit file and the source.
Set up WinXP laptop to run the perl scripts to generate the tab delimited file that the users need to act on these links.
TODO: There are a bunch of terms that currently have two only_in_taxon links, and these mess up the conversion to tab-delimited. Need to resolve these relationships. Have deleted them from the file for now. This is the list:
GO:0048494 chromatophore ribulose bisphosphate carboxylase complex
GO:0030075 plasma membrane-derived thylakoid
GO:0030094 plasma membrane-derived photosystem I
GO:0030096 plasma membrane-derived thylakoid photosystem II
GO:0031676 plasma membrane-derived thylakoid membrane
GO:0031979 plasma membrane-derived thylakoid lumen
GO:0048493 plasma membrane-derived thylakoid ribulose bisphosphate carboxylase complex
GO:0009521 photosystem
GO:0030077 plasma membrane light-harvesting complex
GO:0042716 chromatophore
GO:0009760 C4 photosynthesis
GO:0009761 CAM photosynthesis
GO:0016168 chlorophyll binding
GO:0030093 chloroplast photosystem I
GO:0030095 chloroplast photosystem II
GO:0030089 phycobilisome
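A check like the following could flag such terms before the conversion step is run. This is only a minimal sketch in Python, not the perl scripts mentioned above: the function name, the sample stanzas and the EX: identifiers are invented for illustration, and it assumes the links are stored as plain 'relationship: only_in_taxon ...' lines inside [Term] stanzas.

```python
from collections import defaultdict


def terms_with_multiple_links(obo_text, relation="only_in_taxon"):
    """Return {term_id: [targets]} for terms with more than one `relation` link."""
    links = defaultdict(list)
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are inspected.
            in_term = (line == "[Term]")
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif in_term and current_id and line.startswith("relationship:"):
            # Drop the trailing "! comment" and split into relation and target.
            parts = line[len("relationship:"):].split("!")[0].split()
            if len(parts) >= 2 and parts[0] == relation:
                links[current_id].append(parts[1])
    return {term: taxa for term, taxa in links.items() if len(taxa) > 1}


# Hypothetical stanzas, made up for this example (not taken from the real file).
SAMPLE = """\
[Term]
id: EX:0000001
name: example term with two links
relationship: only_in_taxon NCBITaxon:2 ! Bacteria
relationship: only_in_taxon NCBITaxon:2759 ! Eukaryota

[Term]
id: EX:0000002
name: example term with one link
relationship: only_in_taxon NCBITaxon:2759 ! Eukaryota

[Typedef]
id: only_in_taxon
name: only_in_taxon
"""

duplicates = terms_with_multiple_links(SAMPLE)
for term_id, taxa in sorted(duplicates.items()):
    # One tab-delimited row per offending term.
    print(term_id, "\t".join(taxa), sep="\t")
```

Only the first example term is reported, since it is the only one carrying two links.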
I am wondering what the violations.txt file is in cvs. There is a readme but it is a dead end when you follow the urls.
9th April 2009
Looked through the rest of Michelle's taxon spreadsheet and sent a few questions to her before doing the final edits:
> 2. regarding "flagellin-based flagella...." terms - I left these as prok, but Archaeal flagella have "flagellin-based" flagella too, but the flagellins are different than bacterial flagellins and also they have different overall flagellar structures. Therefore, the "flagellin-based" terms do not provide a way to distinguish between the Arch and Bact types. These will likely need to be revised.
Oh that's good to know. So for now I should make these prokaryote, but later we might like to make child terms then?
> 3. Archaea do not have peptidoglycan, but they have other cell wall materials that are similar we don't have terms to cover those.
Good to know. So I'll leave these as Bacterial for now and we can add Archaeal terms later.
> 4. I notice that there are terms like "plant-type cell wall" and "fungal-type cell wall". Can't we then have "Bacterial-type cell wall" and "Archaeal-type cell wall"? Or even "bacterial-type flagella" and "archaeal-type flagella"? That would be more clear I think.
Yes that would be fine. Which ones would you like me to add? If you put a column in the attached spreadsheet and write the names in I can do the edits.
The prokaryotic terms are now finished and a list of questions has been sent to SGD for the fungal terms.
A plan has been formed to finish the sensu terms and release the file in time to submit a paper to this conference:
IEEE International Conference
on Bioinformatics and Biomedicine (BIBM09)
Washington DC, USA, Nov. 1-4, 2009
Electronic submission of full papers: July 10, 2009
I sent this list to Kimberly to get feedback:
GO:0060111 alae of collagen and cuticulin-based cuticle extracellular matrix
GO:0060110 basal layer of collagen and cuticulin-based cuticle extracellular matrix
GO:0060102 collagen and cuticulin-based cuticle extracellular matrix
GO:0060106 cortical layer of collagen and cuticulin-based cuticle extracellular matrix
GO:0042715 dosage compensation complex assembly during dosage compensation by hypoactivation of X chromosome
GO:0042464 dosage compensation, by hypoactivation of X chromosome
GO:0060105 epicuticle of collagen and cuticulin-based cuticle extracellular matrix
GO:0060104 surface coat of collagen and cuticulin-based cuticle extracellular matrix
This list was sent to Eurie to get feedback:
Sensu Fungi terms
GO:0000754 adaptation to pheromone during conjugation with cellular fusion
GO:0007569 cell aging
GO:0030466 chromatin silencing at silent mating-type cassette
GO:0000747 conjugation with cellular fusion
GO:0000755 cytogamy
GO:0030473 nuclear migration along microtubule
GO:0000750 pheromone-dependent signal transduction during conjugation with cellular fusion
GO:0034306 regulation of sexual sporulation
GO:0000749 response to pheromone during conjugation with cellular fusion
GO:0009847 spore germination
Sensu Saccharomyces terms
GO:0000754 adaptation to pheromone during conjugation with cellular fusion
GO:0007571 age-dependent general metabolic decline
GO:0000752 agglutination during conjugation with cellular fusion
GO:0007569 cell aging
GO:0000751 cell cycle arrest in response to pheromone
GO:0030466 chromatin silencing at silent mating-type cassette
GO:0000747 conjugation with cellular fusion
GO:0000501 flocculation via cell wall protein-carbohydrate interaction
GO:0001403 invasive growth in response to glucose limitation
GO:0030473 nuclear migration along microtubule
GO:0007576 nucleolar fragmentation
GO:0007323 peptide pheromone maturation
GO:0000750 pheromone-dependent signal transduction during conjugation with cellular fusion
GO:0030163 protein catabolic process
GO:0000321 re-entry into mitotic cell cycle after pheromone arrest
GO:0000749 response to pheromone during conjugation with cellular fusion
The current up to date version is in the edit file.
1st May 2009
Finished Nematode terms
TODO: return to dosage compensation terms and put a note in the def e.g. 'for example in Nematodes'. Do the same for the other terms. These terms need a taxon note so people know which one is which, but they are not suitable to be restricted on what can be annotated to them. This needs to be done in the live file. The taxon links are sorted out.
N.B. The dosage compensation terms do not need a taxon link even though they have a sensu synonym.
TODO: Michelle Gwinn-Giglio has requested the following change in the live file:
I'd say that the following terms should all have "flagellin-based" changed to "bacterial-type":
flagellin-based flagellum
flagellin-based flagellum filament
flagellin-based flagellum filament cap
flagellin-based flagellum hook-filament junction
flagellin-based flagellum hook
flagellin-based flagellum basal body
flagellin-based flagellum basal body, distal rod
flagellin-based flagellum basal body, distal rod, L ring
flagellin-based flagellum basal body, distal rod, P ring
flagellin-based flagellum basal body, proximal rod
flagellin-based flagellum basal body, MS ring
flagellin-based flagellum basal body, C ring
flagellin-based flagellum basal body, rod
flagellin-based flagellum part
We could add the phrase "as found in bacteria" [to the def] - would that work? Then we should add a term "archaeal-type flagellum".
15th May 2009
Reminder that we still have feedback to act on from SGD.
We had a discussion with Jen Deegan, Chris Mungall and GO-Top on email about whether we could publish the trigger system. This led to further discussion of the ideas that would be covered. The discussion was broken off prematurely as Jen went on vacation, but some useful feedback came out of it. It has been proposed that the sensu terms that are difficult to interpret because of very technical or esoteric names and definitions should be clarified by putting an example in the gloss of the definitions field, or in a special examples field. Jen is in favour of this, but we have not yet had a chance to discuss the idea fully. Following on from other parts of the discussion, Jen Deegan and Chris Mungall are working together to figure out whether the presence of union terms is useful in terms of number of errors found, or if we could do without these to avoid complexity. The full discussion was very detailed and lengthy but Jen has it archived.
Changed the 'and's to 'or's in the union term names. Put explanation of the file in cvs into the readme file.
Realised there was an error in the editing workflow. I had regenerated the all_edits file by using a recent live file and the TaxonGOLinksFile.obo, and so had lost all of the unchecked links that did not have the def dbxref GOC:mtg_taxon. I had retained all of the checked links but they were no longer marked with the dbxref, as incorporation of the new live file had overwritten the definitions and obliterated the dbxrefs. No checked links have been lost and I have now edited the all_edits file so that all of the checked links are once more marked with the dbxref.
Chris Mungall has adapted a script so that it can now be used to check for ontology and annotation errors using the union terms as well as the links to monophyletic groupings. He found a number of issues, and we are going to make a file in cvs in go/scratch/go-taxon in which to collect metrics, so the data are kept even though the errors have been fixed. We are also checking the output of the checking script into cvs.
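The kind of check described here can be pictured roughly as follows. This is a simplified Python sketch and not Chris's script: the function names, the toy taxonomy and the union term name "Bacteria or Archaea" are assumptions made for the example. An annotation passes if the annotated species falls under the taxon named in the only_in_taxon link, or under any member of the union term when the link points to one.

```python
def lineage(taxon, parent_of):
    """Return the taxon followed by all of its ancestors in the toy taxonomy."""
    seen = []
    while taxon is not None and taxon not in seen:
        seen.append(taxon)
        taxon = parent_of.get(taxon)
    return seen


def annotation_allowed(species, constraint, parent_of, union_of):
    """True if `species` lies within `constraint` (a taxon or a union term)."""
    # A union term stands for any of its members; a plain taxon stands for itself.
    members = union_of.get(constraint, [constraint])
    ancestors = set(lineage(species, parent_of))
    return any(member in ancestors for member in members)


# Toy data, invented for illustration only.
PARENT_OF = {
    "Homo sapiens": "Metazoa",
    "Metazoa": "Eukaryota",
    "Escherichia coli": "Bacteria",
    "Sulfolobus solfataricus": "Archaea",
}
UNION_OF = {"Bacteria or Archaea": ["Bacteria", "Archaea"]}

for species in ("Escherichia coli", "Sulfolobus solfataricus", "Homo sapiens"):
    ok = annotation_allowed(species, "Bacteria or Archaea", PARENT_OF, UNION_OF)
    print(species, "ok" if ok else "VIOLATION", sep="\t")
```

Without the union term, a prokaryote-wide constraint would have to be written as two separate links, which is the situation that caused trouble for the tab-delimited conversion earlier.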
Changed editing workflow so that dbxrefs are not used to make checked links. From now on only checked links are retained in the file. The all_edits file is still the master copy, but regeneration of the all_edits file by combining the OBO file of links with the current live file will not cause loss of data.
Jen has been working mainly on fixing the sensu terms so that the sensu synonyms can be stripped out as soon as possible. However, Chris is also keen to have triggers for the non-sensu terms that are also obviously taxon-specific. These can easily be found by using the GOOSE query that Chris made. Jen has proposed that she should start working on this list of terms after completing the last few difficult sensu terms.
Currently a high priority remains to figure out whether the union terms show up a lot of errors that could not be found without them.
[Term]
id: GO:0031424
name: keratinization
namespace: biological_process
def: "The process in which the cytoplasm of the outermost cells of the vertebrate epidermis is replaced by keratin. Keratinization occurs in the stratum corneum, feathers, hair, claws, nails, hooves, and horns." [GOC:ebc]
is_a: GO:0032502 ! developmental process
relationship: part_of GO:0009913 ! epidermal cell differentiation
Suggestion: keratinization only_in vertebrate
Suggestion: organ development only_in metazoa
sorocarp development only_in_taxon slime molds. No: slime molds are polyphyletic. What about the annotation from Magnaporthe grisea? Is this right? Not sure this is a slime mold.
10th June 2009
Chris and Jennifer met at the EBI and worked on the taxon workflow. The results are minuted here:
I have made an import file in go-taxon to facilitate loading of the files.
Edits for today:
I have sent this to Alex and Tanya:
Hi Alex and Tanya,
Do you think it would be possible for us to agree a better def for this term:
GO:0002218 activation of innate immune response
def: Any process that initiates an innate immune response.
It used to have a sensu Magnoliophyta synonym so I am assuming that it is a plant thing, but the definition is not terribly informative. I have added a hint: "An example of this is the activation of innate immune response in Arabidopsis thaliana", but I think maybe there could be something more explanatory. The pubmed ids in the dbxref field suggest that it may not be a plant term.
Thanks,
Jen
I am working through the old sensu terms in alphabetical order, checking the links, removing the sensu synonyms, and putting examples in the defs.
I added a new taxon link, saved out and ran the scripts. I have decided not to commit the mid file after all, as it takes ages, and I need to change editing topic too frequently to easily allow for this. I updated the editing workflow document.
29th June 2009
We agreed a definition for 'activation of innate immune response' and I have made the edits. I also worked through about 30 more terms. I managed to filter out the links file but am having some problems filtering out the live file to commit back to the repository. I am continuing to work on that.
30th June 2009
Was not able to get the live file filtered out and committed back as the filtering turned out to be tricky and in the meantime a very large changeset was committed. I will make those synonym changes again without the links loaded, and just commit back without requiring the filtering step.
3rd July 2009
I have redone those changes and committed the live file.
5th July 2009
Resolved further sensu synonyms and links. 464 left to go.
8th July 2009
Resolved further sensu synonyms and links. 424 left to go.
9th July 2009
Resolved further sensu synonyms and links. 380 left to go.
5th August 2009
Resolved further sensu synonyms and links. 333 left to go.
11th August 2009
Resolved further sensu synonyms and links. 275 left to go.
13th August 2009
Resolved further sensu synonyms and links. 174 left to go.
14th August 2009
Noted that for annotation checking we will need the XP from biological process to cellular component to be in place. The terms 'mitochondrial process XXX' have their only_in_taxon link via the relationship to 'mitochondrion' in component, and then from there up to 'Eukaryota'. There are quite a number of terms like this (e.g. mitochondrial ATP synthesis coupled proton transport ; GO:0042776). Several other processes are defined in terms of organelles that are taxon-specific. I have reported this to Chris and he has modified the checking script to take this XP file into account.
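The reasoning can be sketched like this in Python. It is a toy illustration, not the checking script itself: the function name and the two dictionaries are assumptions, and the link from 'mitochondrion' to 'Eukaryota' is written as a direct link for simplicity. The point is that a process term with no taxon link of its own picks one up by following its cross-ontology link to a component term.

```python
def inherited_taxon_constraints(term, links, only_in_taxon):
    """Collect every only_in_taxon target reachable from `term` via `links`."""
    found = set()
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Constraints asserted directly on this term.
        found.update(only_in_taxon.get(current, ()))
        # Follow is_a, part_of and cross-ontology (XP) links outward.
        stack.extend(links.get(current, ()))
    return found


# Toy data: the process term has no direct taxon link of its own.
LINKS = {
    "GO:0042776": ["GO:0005739"],  # process -> mitochondrion, via the XP file
}
ONLY_IN_TAXON = {
    "GO:0005739": ["Eukaryota"],   # simplified: constraint placed on mitochondrion
}

with_xp = inherited_taxon_constraints("GO:0042776", LINKS, ONLY_IN_TAXON)
without_xp = inherited_taxon_constraints("GO:0042776", {}, ONLY_IN_TAXON)
print("with XP links:", sorted(with_xp))
print("without XP links:", sorted(without_xp))
```

Without the XP links the process term carries no constraint at all, so an annotation to it from a prokaryote would go unnoticed.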
Resolved further sensu synonyms and links. 144 left to go.
24th August 2009
Asked Midori about the XPs between process and cellular component. She confirms that these links should be in the process to cellular component file and that she is in charge of this file. She has suggested that I check in the file, and that if I find any links missing I could forward them on, which I will do.
25th August 2009
There are a number of cell differentiation terms where the cell type is specific to a given taxon. I have written to Chris to ask if we should label these terms in GO and transfer the links electronically to the cell type ontology, or if I should wait and label the cell type ontology directly.
Chris suggests that I make the links directly to the cell type ontology, but check the xp links are in place at the same time. I will plan to do this after I have worked through the list that came from Chris's GOOSE query.
Current priority list is:
Finish remaining terms with sensu synonyms
Fix false positives that the checking script shows
Do GOOSE results
Do cell type ontology
27th August 2009
All of the sensu synonyms have now been removed and taxon checking lines have been added to the file for these terms, where appropriate. In some cases, examples have been added to the definition to provide additional clarity of meaning for the term.