Taxon-GO Implementation April 2009 onwards
- 1 1st April 2009
- 2 9th April 2009
- 3 27th/28th April
- 4 1st May 2009
- 5 15th May 2009
- 6 Late May
- 7 28th May
- 8 29th May
- 9 30th May
- 10 1st June
- 11 9th June
- 12 10th June 2009
- 13 12th June 2009
- 14 29th June 2009
- 15 30th June 2009
- 16 3rd July 2009
- 17 5th July 2009
- 18 8th July 2009
- 19 9th July 2009
- 20 5th August 2009
- 21 11th August 2009
- 22 13th August 2009
- 23 14th August 2009
- 24 24th August 2009
- 25 25th August 2009
- 26 27th August 2009
- 27 28th August 2009
1st April 2009
This work has been on hold for a while as we were waiting for more sophisticated filtering in OBO-Edit to let all editors edit the file while doing normal live file editing. However, Chris has asked that we push on without the filtering for a while.
Loaded the files into OBO-Edit 2 (OE2) and found some file problems. The UnionTerm stanzas were malformed, but they were fixed by adding a Typedef for the union_of relationship and removing the top term.
Other file errors found and fixed:
Two entire taxa had gone missing from the taxon slim, and we had also lost a union_of term from the UnionTerms file. I do not know how that happened, but I have replaced them.
I have recommitted the edited source files and also saved out and recommitted the all_files_mid_edit.obo file. I did not make any edits, so the Perl scripts do not need to be run.
Fixed the malformed tag 'name: synonym: "synonym: "synonym: "' in both the edit file and the source.
Set up a Windows XP laptop to run the Perl scripts that generate the tab-delimited file users need in order to act on these links.
TODO: There are a number of terms that currently have two only_in_taxon links, and they break the conversion to tab-delimited format. These relationships need to be resolved. I have deleted them from the file for now. This is the list:
- GO:0048494 chromatophore ribulose bisphosphate carboxylase complex
- GO:0030075 plasma membrane-derived thylakoid
- GO:0030094 plasma membrane-derived photosystem I
- GO:0030096 plasma membrane-derived thylakoid photosystem II
- GO:0031676 plasma membrane-derived thylakoid membrane
- GO:0031979 plasma membrane-derived thylakoid lumen
- GO:0048493 plasma membrane-derived thylakoid ribulose bisphosphate carboxylase complex
- GO:0009521 photosystem
- GO:0030077 plasma membrane light-harvesting complex
- GO:0042716 chromatophore
- GO:0009760 C4 photosynthesis
- GO:0009761 CAM photosynthesis
- GO:0016168 chlorophyll binding
- GO:0030093 chloroplast photosystem I
- GO:0030095 chloroplast photosystem II
- GO:0030089 phycobilisome
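Terms like these could be re-found before each conversion by scanning the OBO file for [Term] stanzas that carry more than one only_in_taxon relationship. The following is a minimal Python sketch, not one of the existing Perl scripts; it assumes the links are written as 'relationship: only_in_taxon NCBITaxon:NNNN ! name' lines, and the function names are hypothetical.

```python
"""Hypothetical helper: flag [Term] stanzas with more than one only_in_taxon link."""


def parse_term_stanzas(obo_text):
    """Return {term_id: [tag lines]} for [Term] stanzas, ignoring other stanza types."""
    stanzas = {}
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            in_term = line == "[Term]"
            current_id = None
            continue
        if not in_term or not line:
            continue
        if line.startswith("id:"):
            current_id = line[3:].strip()
            stanzas[current_id] = []
        elif current_id is not None:
            stanzas[current_id].append(line)
    return stanzas


def taxon_links(tag_lines):
    """Extract taxon ids from lines like 'relationship: only_in_taxon NCBITaxon:2759 ! Eukaryota'."""
    taxa = []
    for line in tag_lines:
        fields = line.split("!")[0].split()  # drop the trailing '! comment'
        if len(fields) >= 3 and fields[0] == "relationship:" and fields[1] == "only_in_taxon":
            taxa.append(fields[2])
    return taxa


def terms_with_multiple_taxon_links(obo_text):
    """Return {term_id: [taxon ids]} for every term with two or more only_in_taxon links."""
    flagged = {}
    for term_id, lines in parse_term_stanzas(obo_text).items():
        taxa = taxon_links(lines)
        if len(taxa) > 1:
            flagged[term_id] = taxa
    return flagged
```

Run over the edit file, an empty result would suggest that the conversion to tab-delimited can go ahead without hitting this problem.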
I am wondering what the violations.txt file in CVS is. There is a readme, but following its URLs leads to a dead end.
9th April 2009
Looked through the rest of Michelle's taxon spreadsheet and sent a few questions to her before doing the final edits:
> 2. regarding "flagellin-based flagella...." terms - I left these as prok, but Archaeal flagella have "flagellin-based" flagella too, but the flagellins are different than bacterial flagellins and also they have different overall flagellar structures. Therefore, the "flagellin-based" terms do not provide a way to distinguish between the Arch and Bact types. These will likely need to be revised.
Oh, that's good to know. So for now I should make these prokaryote, but later we might like to make child terms then?
> 3. Archaea do not have peptidoglycan, but they have other cell wall materials that are similar; we don't have terms to cover those.
Good to know. So I'll leave these as Bacterial for now, and we can add Archaeal terms later.
> 4. I notice that there are terms like "plant-type cell wall" and "fungal-type cell wall". Can't we then have "Bacterial-type cell wall" and "Archaeal-type cell wall"? Or even "bacterial-type flagella" and "archaeal-type flagella"? That would be more clear I think.
Yes, that would be fine. Which ones would you like me to add? If you put a column in the attached spreadsheet and write the names in, I can do the edits.
The prokaryotic terms are now finished and a list of questions has been sent to SGD for the fungal terms.
A plan has been formed to finish the sensu terms and release the file in time to submit a paper to this conference:
IEEE International Conference on Bioinformatics and Biomedicine (BIBM09)
Washington DC, USA, Nov. 1-4, 2009
Electronic submission of full papers: July 10, 2009
I sent this list to Kimberly to get feedback:
- GO:0060111 alae of collagen and cuticulin-based cuticle extracellular matrix
- GO:0060110 basal layer of collagen and cuticulin-based cuticle extracellular matrix
- GO:0060102 collagen and cuticulin-based cuticle extracellular matrix
- GO:0060106 cortical layer of collagen and cuticulin-based cuticle extracellular matrix
- GO:0042715 dosage compensation complex assembly during dosage compensation by hypoactivation of X chromosome
- GO:0042464 dosage compensation, by hypoactivation of X chromosome
- GO:0060105 epicuticle of collagen and cuticulin-based cuticle extracellular matrix
- GO:0060104 surface coat of collagen and cuticulin-based cuticle extracellular matrix
This list was sent to Eurie to get feedback:
Sensu Fungi terms:
- GO:0000754 adaptation to pheromone during conjugation with cellular fusion
- GO:0007569 cell aging
- GO:0030466 chromatin silencing at silent mating-type cassette
- GO:0000747 conjugation with cellular fusion
- GO:0000755 cytogamy
- GO:0030473 nuclear migration along microtubule
- GO:0000750 pheromone-dependent signal transduction during conjugation with cellular fusion
- GO:0034306 regulation of sexual sporulation
- GO:0000749 response to pheromone during conjugation with cellular fusion
- GO:0009847 spore germination
Sensu Saccharomyces terms:
- GO:0000754 adaptation to pheromone during conjugation with cellular fusion
- GO:0007571 age-dependent general metabolic decline
- GO:0000752 agglutination during conjugation with cellular fusion
- GO:0007569 cell aging
- GO:0000751 cell cycle arrest in response to pheromone
- GO:0030466 chromatin silencing at silent mating-type cassette
- GO:0000747 conjugation with cellular fusion
- GO:0000501 flocculation via cell wall protein-carbohydrate interaction
- GO:0001403 invasive growth in response to glucose limitation
- GO:0030473 nuclear migration along microtubule
- GO:0007576 nucleolar fragmentation
- GO:0007323 peptide pheromone maturation
- GO:0000750 pheromone-dependent signal transduction during conjugation with cellular fusion
- GO:0030163 protein catabolic process
- GO:0000321 re-entry into mitotic cell cycle after pheromone arrest
- GO:0000749 response to pheromone during conjugation with cellular fusion
The current up-to-date version is in the edit file.
1st May 2009
Finished Nematode terms
TODO: return to the dosage compensation terms and put a note in the definition, e.g. 'for example in Nematodes'. Do the same for the other terms. These terms need a taxon note so that people know which is which, but they are not suitable for restricting what can be annotated to them. This needs to be done in the live file. The taxon links have already been sorted out.
N.B. The dosage compensation terms do not need a taxon link even though they have a sensu synonym.
TODO: Michelle Gwinn-Giglio has requested the following change in the live file:
I'd say that the following terms should all have "flagellin-based" changed to "bacterial-type":
- flagellin-based flagellum
- flagellin-based flagellum filament
- flagellin-based flagellum filament cap
- flagellin-based flagellum hook-filament junction
- flagellin-based flagellum hook
- flagellin-based flagellum basal body
- flagellin-based flagellum basal body, distal rod
- flagellin-based flagellum basal body, distal rod, L ring
- flagellin-based flagellum basal body, distal rod, P ring
- flagellin-based flagellum basal body, proximal rod
- flagellin-based flagellum basal body, MS ring
- flagellin-based flagellum basal body, C ring
- flagellin-based flagellum basal body, rod
- flagellin-based flagellum part
We could add the phrase "as found in bacteria" [to the def] - would that work? Then we should add a term "archaeal-type flagellum".
15th May 2009
Reminder that we still have feedback to act on from SGD.
We had an email discussion with Jen Deegan, Chris Mungall and GO-Top about whether we could publish the trigger system. This led to further discussion of the ideas that would be covered. The discussion was broken off prematurely as Jen went on vacation, but some useful feedback came out of it. It has been proposed that sensu terms that are difficult to interpret because of very technical or esoteric names and definitions should be clarified by putting an example in the gloss of the definition field, or in a special examples field. Jen is in favour of this, but we have not yet had a chance to discuss the idea fully. Following on from other parts of the discussion, Jen Deegan and Chris Mungall are working together to figure out whether the union terms are useful in terms of the number of errors they reveal, or whether we could do without them to avoid complexity. The full discussion was very detailed and lengthy, but Jen has it archived.
Changed the 'and's to 'or's in the union term names. Put an explanation of the file in CVS into the readme file.
Realised there was an error in the editing workflow. I had regenerated the all_edits file by using a recent live file and the TaxonGOLinksFile.obo, and so had lost all of the unchecked links that did not have the def dbxref GOC:mtg_taxon. I had retained all of the checked links but they were no longer marked with the dbxref, as incorporation of the new live file had overwritten the definitions and obliterated the dbxrefs. No checked links have been lost and I have now edited the all_edits file so that all of the checked links are once more marked with the dbxref.
Chris Mungall has adapted a script so that it can now be used to check for ontology and annotation errors using the union terms as well as the links to monophyletic groupings. He found a number of issues, and we are going to make a file in CVS in go/scratch/go-taxon in which to collect metrics, so that the data are kept even after the errors have been fixed. We are also checking the output of the checking script into CVS.
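The idea behind this kind of check can be illustrated with a small Python sketch. It is not Chris's script; the taxonomy fragment, the constraint table and the gene product names are all hypothetical. A term restricted to one monophyletic group accepts an annotated taxon if that group appears in the taxon's lineage, while a union term (here a prokaryote term covering Bacteria and Archaea) is modelled as a set of groups, any one of which is enough.

```python
"""Illustrative only_in_taxon check; all data below are hypothetical examples."""

# Hypothetical taxonomy fragment: child -> parent. Top-level groups have no entry.
PARENT = {
    "Homo sapiens": "Mammalia",
    "Mammalia": "Vertebrata",
    "Gallus gallus": "Aves",
    "Aves": "Vertebrata",
    "Vertebrata": "Metazoa",
    "Metazoa": "Eukaryota",
    "Escherichia coli": "Bacteria",
}

# Term -> set of allowed groups. A single group stands for a link to a
# monophyletic grouping; a set of several groups stands in for a union term.
CONSTRAINTS = {
    "lactation": {"Mammalia"},
    "nucleus": {"Eukaryota"},
    "flagellin-based flagellum": {"Bacteria", "Archaea"},  # union term: prokaryotes
}


def lineage(taxon, parent=PARENT):
    """Yield the taxon itself, then each ancestor up to the top of the fragment."""
    while taxon is not None:
        yield taxon
        taxon = parent.get(taxon)


def is_violation(term, taxon, constraints=CONSTRAINTS, parent=PARENT):
    """An annotation is suspect if none of the allowed groups is in the taxon's lineage."""
    allowed = constraints.get(term)
    if allowed is None:
        return False  # term carries no taxon constraint
    return not any(group in allowed for group in lineage(taxon, parent))


def find_violations(annotations):
    """Filter (gene_product, term, taxon) tuples down to the suspect ones."""
    return [a for a in annotations if is_violation(a[1], a[2])]
```

With these tables, a chicken annotation to 'lactation' or an E. coli annotation to 'nucleus' would be reported, while an E. coli annotation to the prokaryote union term would pass.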
Changed the editing workflow so that dbxrefs are no longer used to mark checked links. From now on only checked links are retained in the file. The all_edits file is still the master copy, but regenerating the all_edits file by combining the OBO file of links with the current live file will not cause loss of data.
Jen has been working mainly on fixing the sensu terms so that the sensu synonyms can be stripped out as soon as possible. However, Chris is also keen to have triggers for the non-sensu terms that are obviously taxon-specific. These can easily be found using the GOOSE query that Chris made. Jen has proposed that she start working on this list of terms after completing the last few difficult sensu terms.
A high priority remains to figure out whether the union terms reveal many errors that could not be found without them.
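One way to keep the regeneration step safe is to merge the links file into the live file additively, appending each term's links to its stanza rather than replacing anything. This is a minimal Python sketch, not the tooling actually used in this workflow; it assumes both files are plain OBO text and that the links are 'relationship: only_in_taxon ...' lines, and the function names are hypothetical.

```python
"""Hypothetical additive merge of an OBO links file into a live OBO file."""


def collect_links(links_text):
    """Map term id -> list of 'relationship: only_in_taxon ...' lines in the links file."""
    links, current = {}, None
    for raw in links_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            current = None  # new stanza; wait for its id line
        elif line.startswith("id:"):
            current = line[3:].strip()
        elif current and line.startswith("relationship: only_in_taxon"):
            links.setdefault(current, []).append(line)
    return links


def merge_links(live_text, links_text):
    """Append each term's taxon links to the end of its stanza in the live file.

    Lines already present in a stanza are not added again, so running the
    merge twice gives the same result and nothing in the live file is dropped.
    """
    links = collect_links(links_text)
    out, pending, seen = [], [], set()
    for raw in live_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("["):
            # End of the previous stanza: add its missing links before moving on.
            out.extend(link for link in pending if link not in seen)
            pending, seen = [], set()
        elif line.startswith("id:"):
            pending = links.get(line[3:].strip(), [])
        seen.add(line)
        out.append(raw)
    out.extend(link for link in pending if link not in seen)  # last stanza
    return "\n".join(out)
```

Because the merge only ever adds lines, a fresh live file can be combined with the links file repeatedly without losing the checked links.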
[Term]
id: GO:0031424
name: keratinization
namespace: biological_process
def: "The process in which the cytoplasm of the outermost cells of the vertebrate epidermis is replaced by keratin. Keratinization occurs in the stratum corneum, feathers, hair, claws, nails, hooves, and horns." [GOC:ebc]
is_a: GO:0032502 ! developmental process
relationship: part_of GO:0009913 ! epidermal cell differentiation
Suggestion: keratinization only_in vertebrate
Suggestion: organ development only_in metazoa
sorocarp development only_in_taxon slime molds? No: slime molds are polyphyletic. What about the annotation from Magnaporthe grisea? Is this right? I am not sure this is a slime mold.
10th June 2009
Chris and Jennifer met at the EBI and worked on the taxon workflow. The results are minuted here:
I have made an import file in go-taxon to facilitate loading of the files.
Edits for today:
I have sent this to Alex and Tanya:
Hi Alex and Tanya, Do you think it would be possible for us to agree a better def for this term: GO:0002218 activation of innate immune response def: Any process that initiates an innate immune response. It used to have a sensu Magnoliophyta synonym so I am assuming that it is a plant thing, but the definition is not terribly informative. I have added a hint: "An example of this is the activation of innate immune response in Arabidopsis thaliana", but I think maybe there could be something more explanatory. The pubmed ids in the dbxref field suggest that it may not be a plant term. Thanks, Jen
I am working through the old sensu terms in alphabetical order, checking the links, removing the sensu synonyms, and putting examples in the defs.
I added a new taxon link, saved out and ran the scripts. I have decided not to commit the mid file after all, as it takes a long time and I need to switch editing topics too frequently for this to be practical. I updated the editing workflow document.
29th June 2009
We agreed a definition for 'activation of innate immune response' and I have made the edits. I also worked through about 30 more terms. I managed to filter out the links file but am having some problems filtering out the live file to commit back to the repository. I am continuing to work on that.
30th June 2009
I was not able to filter out the live file and commit it back, as the filtering turned out to be tricky and, in the meantime, a very large changeset was committed. I will make those synonym changes again without the links loaded, and commit back without the filtering step.
3rd July 2009
I have redone those changes and committed the live file.
5th July 2009
Resolved further sensu synonyms and links. 464 left to go.
8th July 2009
Resolved further sensu synonyms and links. 424 left to go.
9th July 2009
Resolved further sensu synonyms and links. 380 left to go.
5th August 2009
Resolved further sensu synonyms and links. 333 left to go.
11th August 2009
Resolved further sensu synonyms and links. 275 left to go.
13th August 2009
Resolved further sensu synonyms and links. 174 left to go.
14th August 2009
Noted that for annotation checking we will need the cross-products (XPs) from biological process to cellular component to be in place. The terms 'mitochondrial process XXX' get their only_in_taxon link via their relationship to 'mitochondrion' in component, and from there up to 'Eukaryota'. There are quite a number of terms like this (e.g. mitochondrial ATP synthesis coupled proton transport; GO:0042776). Several other processes are defined in terms of organelles that are taxon-specific. I have reported this to Chris, and he has modified the checking script to take this XP file into account.
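The effect of the XP file on checking can be sketched as a walk from a process term to the component terms it is cross-referenced to, collecting any only_in_taxon constraints found along the way; the annotated taxon's lineage must then satisfy every one of them. This Python sketch is illustrative only: the XP and constraint tables are hypothetical stand-ins for the real process-to-component file, and it is not the modified checking script.

```python
from collections import deque


def reachable_constraints(term, direct, xp_targets):
    """Collect only_in_taxon constraints on a term and on every term reachable via XP links.

    direct:     {term: set of allowed taxa}  (explicit only_in_taxon links)
    xp_targets: {term: [terms it is cross-referenced to]}  (e.g. process -> component)
    """
    found = []
    queue = deque([term])
    visited = {term}
    while queue:
        current = queue.popleft()
        if current in direct:
            found.append(direct[current])
        for target in xp_targets.get(current, []):
            if target not in visited:  # guard against cycles
                visited.add(target)
                queue.append(target)
    return found


def satisfies(lineage, constraints):
    """True if the taxon lineage (taxon plus ancestors) meets every constraint found."""
    return all(any(taxon in allowed for taxon in lineage) for allowed in constraints)


# Hypothetical tables mirroring the mitochondrial example above.
DIRECT = {"mitochondrion": {"Eukaryota"}}
XP = {"mitochondrial ATP synthesis coupled proton transport": ["mitochondrion"]}
```

Under these tables, a bacterial annotation to the mitochondrial process would be flagged even though the process term itself carries no direct taxon link.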
Resolved further sensu synonyms and links. 144 left to go.
24th August 2009
Asked Midori about the XPs between process and cellular component. She confirms that these links should be in the process to cellular component file and that she is in charge of this file. She has suggested that I check the file and forward on any links I find missing, which I will do.
25th August 2009
There are a number of cell differentiation terms where the cell type is specific to a given taxon. I have written to Chris to ask whether we should label these terms in GO and transfer the links electronically to the cell type ontology, or whether I should wait and label the cell type ontology directly.
Chris suggests that I make the links directly to the cell type ontology, but check that the XP links are in place at the same time. I plan to do this after I have worked through the list that came from Chris's GOOSE query.
The current priority list is:
- Finish remaining terms with sensu synonyms
- Fix false positives that the checking script shows
- Do GOOSE results
- Do cell type ontology
27th August 2009
All of the sensu synonyms have now been removed and taxon checking lines have been added to the file for these terms, where appropriate. In some cases, examples have been added to the definition to provide additional clarity of meaning for the term.
28th August 2009
I am looking through the file of errors that Chris Mungall's checking script has produced. It is called gaf-taxon-gaffes.txt and is in go/scratch/go-taxon.
Here are the things that I have found:
- Senescence: Human annotations to the plant senescence term (GO:0010149). Senescence in plants is the process whereby cells die just before the shedding of an organ, whereas senescence in humans is the more general aging of cells that remain alive in the organism. We may need to make a new term for the human process, or add a synonym to the existing aging (GO:0007568) term.
- 8 manual annotations by BHF-UCL to plant senescence. I have reported this to GOA, and they will liaise with BHF-UCL and send back any term changes that need to be made.
- 7 RGD ISO annotations to the same term. Wrote to Simon Twigger.
- Lactation: mammalian reproduction in chickens. There are 6 AgBase ISS annotations to GO:0001553 "luteinization" or GO:0007595 "lactation", which are both mammalian reproductive processes. Wrote to Fiona McCarthy.
- phycobilisome: I will come back to phycobilisome GO:0030089 later as it needs some research.
- gastrulation with mouth forming first: 67 WormBase annotations to 'gastrulation with mouth forming first' GO:0001703. All are IEA. I have written to Kimberly van Auken to ask if this is right.
- nucleus: 14 EcoCyc IEA annotations of E. coli gene products to 'nucleus' GO:0005634. Wrote to Jim Hu to ask about these.
- Emily is looking at the file too, and is making notes of the number of annotation errors found and the fixes that she has applied. She is feeding this information back, along with any ontology errors that she finds.
- sensory perception GO:0007600 seems to be much less granular than its parent GO:0050890 cognition
- I need a taxon link for neurological system process, which is the ancestor that indicates involvement of the brain. TODO
- I need to add the cognitive part to the def of sensory perception. DONE
sensory perception old def: The series of events required for an organism to receive a sensory stimulus, convert it to a molecular signal, and recognize and characterize the signal.
new def: The series of events required for an organism to receive a sensory stimulus, convert it to a molecular signal, and recognize and characterize the signal. This is a neurological process.
cognition def: The operation of the mind by which an organism becomes aware of objects of thought or perception; it includes the mental activities associated with thinking, learning, and memory.