Taxon-GO Implementation April 2009 onwards

From GO Wiki
Revision as of 05:29, 28 August 2009 by Jdeegan

1st April 2009

This work has been on hold for a while, as we were waiting for more sophisticated filtering in OBO-Edit that would let all editors edit the file during normal live-file editing. However, Chris has asked that we push on without the filtering for a while.


Loaded the files in OE2 and found some file problems. The UnionTerm stanzas were a bit messed up, but were fixed by adding a Typedef for the union_of relationship and removing the top term.

Other file errors found and fixed:

Two entire taxa had gone missing from the taxon slim, and we had also lost a union_of term from the UnionTerms file. I do not know how that happened, but I have restored them.

I have recommitted the edited source files and also saved out and recommitted the all_files_mid_edit.obo file. I did not make any edits, so the perl scripts do not need to be run.

Fixed this malformed tag in both the edit file and the source: 'name: synonym: "synonym: "synonym: "'.

Set up a WinXP laptop to run the perl scripts that generate the tab-delimited file users need in order to act on these links.

TODO: There are a number of terms that currently have two only_in_taxon links, and they break the conversion to tab-delimited format. These relationships need to be resolved. I have deleted them from the file for now. This is the list:

GO:0048494	chromatophore ribulose bisphosphate carboxylase complex
GO:0030075	plasma membrane-derived thylakoid
GO:0030094	plasma membrane-derived photosystem I
GO:0030096	plasma membrane-derived thylakoid photosystem II
GO:0031676	plasma membrane-derived thylakoid membrane
GO:0031979	plasma membrane-derived thylakoid lumen
GO:0048493	plasma membrane-derived thylakoid ribulose bisphosphate carboxylase complex
GO:0009521	photosystem
GO:0030077	plasma membrane light-harvesting complex
GO:0042716	chromatophore
GO:0009760	C4 photosynthesis
GO:0009761	CAM photosynthesis
GO:0016168	chlorophyll binding
GO:0030093	chloroplast photosystem I
GO:0030095	chloroplast photosystem II
GO:0030089	phycobilisome
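The perl scripts are not reproduced on this page. As a rough illustration of why a term with two only_in_taxon links breaks a one-row-per-term tab-delimited export, here is a minimal Python sketch. It assumes each link is written in the term stanza as `relationship: only_in_taxon <taxon id>`; the function names and the three output columns are hypothetical, not taken from the real scripts.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) form of a taxon link inside a [Term] stanza:
#   relationship: only_in_taxon NCBITaxon:<id> ! <taxon name>
ONLY_IN_TAXON = re.compile(r"^relationship:\s+only_in_taxon\s+(\S+)")


def collect_taxon_links(obo_text):
    """Return ({term_id: [taxon, ...]}, {term_id: name}) for [Term] stanzas."""
    links = defaultdict(list)
    names = {}
    term_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are inspected.
            in_term = line == "[Term]"
            term_id = None
        elif not in_term:
            continue
        elif line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        elif line.startswith("name:") and term_id:
            names[term_id] = line[len("name:"):].strip()
        elif term_id:
            match = ONLY_IN_TAXON.match(line)
            if match:
                links[term_id].append(match.group(1))
    return dict(links), names


def to_tab_delimited(links, names):
    """One 'id<TAB>name<TAB>taxon' row per term; terms with >1 link are set aside."""
    rows, conflicts = [], []
    for term_id in sorted(links):
        taxa = links[term_id]
        if len(taxa) > 1:
            # A single row cannot hold two taxa; a curator must resolve these.
            conflicts.append(term_id)
            continue
        rows.append("\t".join([term_id, names.get(term_id, ""), taxa[0]]))
    return rows, conflicts
```

Terms with more than one link are set aside rather than written out, which mirrors the stop-gap above of deleting them from the file until the relationships are resolved.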

I am wondering what the violations.txt file in cvs is. There is a README, but following its URLs leads to a dead end.

9th April 2009

Looked through the rest of Michelle's taxon spreadsheet and sent a few questions to her before doing the final edits:

> 2.  regarding "flagellin-based flagella...." terms - I left these as prok,
> but Archaeal flagella have "flagellin-based" flagella too, but the
> flagellins are different than bacterial flagellins - and also they have
> different overall flagellar structures.  Therefore, the "flagellin-based"
> terms do not provide a way to distinguish between the Arch and Bact types.
> These will likely need to be revised.

Oh, that's good to know. So for now I should make these prokaryote,
but later we might want to make child terms?

> 3.  Archaea do not have peptidoglycan, but they have other cell wall
> materials that are similar - we don't have terms to cover those.

Good to know. So I'll leave these as Bacterial for now and we can add Archaeal terms later.

> 4.  I notice that there are terms like "plant-type cell wall" and
> "fungal-type cell wall".  Can't we then have "Bacterial-type cell wall" and
> ³Archaeal-type cell wall²?  Or even "bacterial-type flagella" and
> "archaeal-type flagella"?  That would be more clear I think.

Yes, that would be fine. Which ones would you like me to add? If you
put a column in the attached spreadsheet and write the names in, I can do the edits.

27th/28th April

The prokaryotic terms are now finished and a list of questions has been sent to SGD for the fungal terms.

A plan has been formed to finish the sensu terms and release the file in time to submit a paper to this conference:

IEEE International Conference
on Bioinformatics and Biomedicine (BIBM09)
Washington DC, USA, Nov. 1-4, 2009
Electronic submission of full papers: July 10, 2009

I sent this list to Kimberly to get feedback:

GO:0060111 alae of collagen and cuticulin-based cuticle extracellular matrix
GO:0060110 basal layer of collagen and cuticulin-based cuticle extracellular matrix
GO:0060102 collagen and cuticulin-based cuticle extracellular matrix
GO:0060106 cortical layer of collagen and cuticulin-based cuticle extracellular matrix
GO:0042715 dosage compensation complex assembly during dosage compensation by hypoactivation of X chromosome
GO:0042464 dosage compensation, by hypoactivation of X chromosome
GO:0060105 epicuticle of collagen and cuticulin-based cuticle extracellular matrix
GO:0060104 surface coat of collagen and cuticulin-based cuticle extracellular matrix 

This list was sent to Eurie to get feedback:

Sensu Fungi terms	
GO:0000754	adaptation to pheromone during conjugation with cellular fusion
GO:0007569	cell aging
GO:0030466	chromatin silencing at silent mating-type cassette
GO:0000747	conjugation with cellular fusion
GO:0000755	cytogamy
GO:0030473	nuclear migration along microtubule
GO:0000750	pheromone-dependent signal transduction during conjugation with cellular fusion
GO:0034306	regulation of sexual sporulation
GO:0000749	response to pheromone during conjugation with cellular fusion
GO:0009847	spore germination 

Sensu Saccharomyces terms	 

GO:0000754	adaptation to pheromone during conjugation with cellular fusion
GO:0007571	age-dependent general metabolic decline
GO:0000752	agglutination during conjugation with cellular fusion
GO:0007569	cell aging
GO:0000751	cell cycle arrest in response to pheromone
GO:0030466	chromatin silencing at silent mating-type cassette
GO:0000747	conjugation with cellular fusion
GO:0000501	flocculation via cell wall protein-carbohydrate interaction
GO:0001403	invasive growth in response to glucose limitation
GO:0030473	nuclear migration along microtubule
GO:0007576	nucleolar fragmentation
GO:0007323	peptide pheromone maturation
GO:0000750	pheromone-dependent signal transduction during conjugation with cellular fusion
GO:0030163	protein catabolic process
GO:0000321	re-entry into mitotic cell cycle after pheromone arrest
GO:0000749	response to pheromone during conjugation with cellular fusion

The current up-to-date version is in the edit file.

1st May 2009

Finished Nematode terms

TODO: Return to the dosage compensation terms and put a note in the def, e.g. 'for example in Nematodes'. Do the same for the other terms. These terms need a taxon note so that people can tell them apart, but they are not suitable for restricting what can be annotated to them. This needs to be done in the live file. The taxon links are already sorted out.

N.B. The dosage compensation terms do not need a taxon link even though they have a sensu synonym.

TODO: Michelle Gwinn-Giglio has requested the following change in the live file:

I'd say that the following terms should all have "flagellin-based" changed to
flagellin-based flagellum
flagellin-based flagellum filament
flagellin-based flagellum filament cap
flagellin-based flagellum hook-filament junction
flagellin-based flagellum hook
flagellin-based flagellum basal body
flagellin-based flagellum basal body, distal rod
flagellin-based flagellum basal body, distal rod, L ring
flagellin-based flagellum basal body, distal rod, P ring
flagellin-based flagellum basal body, proximal rod
flagellin-based flagellum basal body, MS ring
flagellin-based flagellum basal body, C ring
flagellin-based flagellum basal body, rod
flagellin-based flagellum part

We could add the phrase "as found in bacteria" [to the def] - would that work?

Then we should add a term "archaeal-type flagellum".

15th May 2009

Reminder that we still have feedback to act on from SGD.

Late May

We had an email discussion with Jen Deegan, Chris Mungall and GO-Top about whether we could publish the trigger system. This led to further discussion of the ideas that would be covered. The discussion was broken off prematurely when Jen went on vacation, but some useful feedback came out of it. It has been proposed that sensu terms that are difficult to interpret because of very technical or esoteric names and definitions should be clarified by putting an example in the gloss of the definition field, or in a special examples field. Jen is in favour of this, but we have not yet had a chance to discuss the idea fully. Following on from other parts of the discussion, Jen Deegan and Chris Mungall are working together to figure out whether the union terms are useful, in terms of the number of errors they help find, or whether we could do without them and avoid the extra complexity. The full discussion was very detailed and lengthy, but Jen has it archived.

28th May

Changed the 'and's to 'or's in the union term names. Put an explanation of the file in cvs into the readme file.

29th May

Realised there was an error in the editing workflow. I had regenerated the all_edits file by using a recent live file and the TaxonGOLinksFile.obo, and so had lost all of the unchecked links that did not have the def dbxref GOC:mtg_taxon. I had retained all of the checked links but they were no longer marked with the dbxref, as incorporation of the new live file had overwritten the definitions and obliterated the dbxrefs. No checked links have been lost and I have now edited the all_edits file so that all of the checked links are once more marked with the dbxref.

30th May

Chris Mungall has adapted a script so that it can now be used to check for ontology and annotation errors using the union terms as well as the links to monophyletic groupings. He found a number of issues, and we are going to make a file in cvs in go/scratch/go-taxon in which to collect metrics, so that the data are kept even after the errors have been fixed. We are also checking the output of the checking script into cvs.
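A minimal Python sketch of the core check, assuming it works roughly like this (it is not Chris's script): an annotation is flagged when none of the taxa allowed for the GO term appears in the annotated organism's lineage. A plain only_in_taxon link is modelled as a one-element set and a union term such as 'Bacteria or Archaea' as a larger set, which is how a constraint can cover a group that is not monophyletic. The toy lineages and the constraint table are simplified, hypothetical examples.

```python
# Simplified toy lineages (child -> parent). A real check would walk the
# full NCBI Taxonomy; intermediate ranks are omitted here.
PARENT = {
    "Homo sapiens": "Mammalia",
    "Gallus gallus": "Aves",
    "Mammalia": "Vertebrata",
    "Aves": "Vertebrata",
    "Vertebrata": "Eukaryota",
    "Escherichia coli": "Bacteria",
}

# Hypothetical constraints for illustration: one-element sets stand for a
# plain only_in_taxon link, larger sets for a union term.
CONSTRAINTS = {
    "GO:0005634": {"Eukaryota"},                            # nucleus
    "GO:0001553": {"Mammalia"},                             # luteinization
    "flagellin-based flagellum": {"Bacteria", "Archaea"},   # union term
}


def lineage(taxon, parent=PARENT):
    """Yield `taxon` followed by each of its ancestors."""
    while taxon is not None:
        yield taxon
        taxon = parent.get(taxon)


def is_violation(go_id, taxon, constraints=CONSTRAINTS, parent=PARENT):
    """True if annotating `go_id` in `taxon` falls outside every allowed taxon."""
    allowed = constraints.get(go_id)
    if not allowed:
        return False  # no taxon constraint recorded for this term
    return not any(ancestor in allowed for ancestor in lineage(taxon, parent))
```

With these toy tables, an E. coli annotation to nucleus and a chicken annotation to luteinization are both flagged, while an E. coli annotation to the union-constrained flagellum term passes.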

1st June

Changed the editing workflow so that dbxrefs are no longer used to mark checked links. From now on, only checked links are retained in the file. The all_edits file is still the master copy, but regenerating it by combining the OBO file of links with the current live file will no longer cause loss of data.
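One way to see why regeneration no longer loses data is to treat the links file as the source of truth and re-attach its lines to the live file on every rebuild. The Python sketch below assumes both files have already been parsed into dictionaries keyed by term id; the function name and data layout are hypothetical.

```python
def regenerate_all_edits(live_terms, checked_links):
    """Re-attach every checked taxon link to its term in the current live file.

    live_terms:    {term_id: [stanza lines from the live file]}
    checked_links: {term_id: [only_in_taxon lines from the links file]}
    Returns (merged stanzas, ids of linked terms missing from the live file).
    """
    merged = {}
    for term_id, lines in live_terms.items():
        # Links come from their own file, so a newer live file cannot
        # overwrite them the way it overwrote the def dbxrefs.
        # Lines already present are skipped, so re-running is harmless.
        extra = [link for link in checked_links.get(term_id, []) if link not in lines]
        merged[term_id] = list(lines) + extra
    # Links whose term id is absent from the live file are reported, not dropped.
    missing = sorted(set(checked_links) - set(live_terms))
    return merged, missing
```

Checked links whose term is absent from the live file are returned for follow-up instead of disappearing silently.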

Jen has been working mainly on fixing the sensu terms so that the sensu synonyms can be stripped out as soon as possible. However, Chris is also keen to have triggers for non-sensu terms that are obviously taxon-specific. These can easily be found using the GOOSE query that Chris made. Jen has proposed that she start working on this list of terms after completing the last few difficult sensu terms.

Figuring out whether the union terms reveal many errors that could not be found without them remains a high priority.

9th June

From Chris:

id: GO:0031424
name: keratinization
namespace: biological_process
def: "The process in which the cytoplasm of the outermost cells of the vertebrate epidermis is replaced by keratin. Keratinization occurs in the stratum corneum, feathers, hair, claws, nails, hooves, and horns." [GOC:ebc]
is_a: GO:0032502 ! developmental process
relationship: part_of GO:0009913 ! epidermal cell differentiation

Suggestion: keratinization only_in vertebrate
Suggestion: organ development only_in metazoa 


sorocarp development only_in_taxon slime molds? No: slime molds are polyphyletic. What about the annotation from Magnaporthe grisea? Is this right? I am not sure this organism is a slime mold.

10th June 2009

Chris and Jennifer met at the EBI and worked on the taxon workflow. The results are minuted here:


12th June 2009

I have made an import file in go-taxon to facilitate loading of the files.

Edits for today:

I have sent this to Alex and Tanya:

Hi Alex and Tanya,

Do you think it would be possible for us to agree a better def for this term: 

activation of innate immune response
def: Any process that initiates an innate immune response.

It used to have a sensu Magnoliophyta synonym, so I am assuming that it is a plant thing, but the definition is not terribly informative. I have added a hint: "An example of this is the activation of innate immune response in Arabidopsis thaliana", but I think maybe there could be something more explanatory. The pubmed ids in the dbxref field suggest that it may not be a plant term.



I am working through the old sensu terms in alphabetical order, checking the links, removing the sensu synonyms, and putting examples in the defs.

I added a new taxon link, saved the file out and ran the scripts. I have decided not to commit the mid-edit file after all, as it takes ages and I need to switch editing topics too frequently for this to be practical. I updated the editing workflow document.

29th June 2009

We agreed a definition for 'activation of innate immune response' and I have made the edits. I also worked through about 30 more terms. I managed to filter out the links file but am having some problems filtering out the live file to commit back to the repository. I am continuing to work on that.

30th June 2009

I was not able to get the live file filtered and committed back, because the filtering turned out to be tricky and in the meantime a very large changeset was committed. I will make those synonym changes again without the links loaded, and commit the file back without the filtering step.

3rd July 2009

I have redone those changes and committed the live file.

5th July 2009

Resolved further sensu synonyms and links. 464 left to go.

8th July 2009

Resolved further sensu synonyms and links. 424 left to go.

9th July 2009

Resolved further sensu synonyms and links. 380 left to go.

5th August 2009

Resolved further sensu synonyms and links. 333 left to go.

11th August 2009

Resolved further sensu synonyms and links. 275 left to go.

13th August 2009

Resolved further sensu synonyms and links. 174 left to go.

14th August 2009

Noted that for annotation checking we will need the cross-product (XP) links from biological process to cellular component to be in place. The 'mitochondrial XXX' process terms get their only_in_taxon link via their relationship to 'mitochondrion' in component, and from there up to 'Eukaryota'. There are quite a number of terms like this (e.g. mitochondrial ATP synthesis coupled proton transport; GO:0042776). Several other processes are defined in terms of organelles that are taxon-specific. I have reported this to Chris, and he has modified the checking script to take this XP file into account.
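A small Python sketch of why the XP file matters, assuming constraints are inherited by a breadth-first walk over is_a/part_of edges plus the process-to-component XP edges. The example graph, and placing the Eukaryota link directly on 'mitochondrion', are simplifications for illustration.

```python
from collections import deque

# Toy graph: the first edge is is_a within process; the second is the XP
# edge from the process to the component it is defined in terms of.
EXAMPLE_PARENTS = {
    "GO:0042776": ["ATP synthesis coupled proton transport", "mitochondrion"],
}
EXAMPLE_DIRECT = {"mitochondrion": "Eukaryota"}


def inherited_constraints(term, parents, direct):
    """Collect only_in_taxon constraints reachable from `term`.

    parents: {term: [related terms]} covering is_a/part_of edges plus XP edges.
    direct:  {term: taxon} for terms that carry their own only_in_taxon link.
    """
    seen, found = {term}, set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in direct:
            found.add(direct[current])
        for nxt in parents.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found
```

Without the XP edge from GO:0042776 to 'mitochondrion', the walk never reaches a term carrying a constraint, so the process would look unconstrained and the checker could not flag annotations to it.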

Resolved further sensu synonyms and links. 144 left to go.

24th August 2009

Asked Midori about the XPs between process and cellular component. She confirms that these links should be in the process to cellular component file and that she is in charge of this file. She has suggested that I check in the file, and that if I find any links missing I could forward them on, which I will do.

25th August 2009

There are a number of cell differentiation terms where the cell type is specific to a given taxon. I have written to Chris to ask whether we should label these terms in GO and transfer the links electronically to the cell type ontology, or whether I should wait and label the cell type ontology directly.

Chris suggests that I make the links directly to the cell type ontology, but check the xp links are in place at the same time. I will plan to do this after I have worked through the list that came from Chris's GOOSE query.

Current priority list is:

Finish remaining terms with sensu synonyms
Fix false positives that the checking script shows
Do GOOSE results
Do cell type ontology

27th August 2009

All of the sensu synonyms have now been removed and taxon checking lines have been added to the file for these terms, where appropriate. In some cases, examples have been added to the definition to provide additional clarity of meaning for the term.

28th August 2009

I am looking through the file of errors that Chris Mungall's checking script has produced. It is called gaf-taxon-gaffes.txt and is in go/scratch/go-taxon.

Here are the things that I have found:

  • Human annotations to the plant senescence (GO:0010149) term. Senescence in plants is the process whereby cells die just before the shedding of an organ, whereas senescence in humans is the more general aging of cells that remain alive in the organism. We may need to make a new term for the human process, or add a synonym to the existing aging (GO:0007568) term.

There are 8 manual annotations by BHF-UCL to plant senescence. I have reported this to GOA, and they will liaise with BHF-UCL and send back any term changes that need to be made.

  • Mammalian reproduction in chickens: There are 6 AgBase ISS annotations to GO:0001553 "luteinization" or GO:0007595 "lactation", which are both mammalian reproductive processes. Written to Fiona McCarthy.
  • I will come back to phycobilisome GO:0030089 later as it needs some research.
  • 67 WormBase annotations to 'gastrulation with mouth forming first' GO:0001703. All are IEA. I have written to Kimberly van Auken to ask if this is right.
  • 14 EcoCyc IEA annotations of E. coli gene products to 'nucleus' GO:0005634. Written to Jim Hu to ask about these.
  • Emily is looking at the file too, and is making notes of the number of annotation errors found and the fixes that she has applied. She is feeding this information back, along with any ontology errors that she finds.
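For the metrics file mentioned on 30th May, a tally per source and per term could be as simple as the Python sketch below. It assumes flagged annotations have already been extracted from gaf-taxon-gaffes.txt as (assigned_by, GO id) pairs; the column layout of that file is not described here, so the parsing step is left out and the function name is hypothetical.

```python
from collections import Counter


def tally_flagged(rows):
    """Count flagged annotations per (source, GO term) and per source.

    rows: iterable of (assigned_by, go_id) pairs, one per flagged annotation.
    """
    per_pair = Counter(rows)
    per_source = Counter()
    for (source, _go_id), count in per_pair.items():
        per_source[source] += count
    return per_pair, per_source
```

Keeping both counts means the per-term figures survive in the metrics file even after the individual annotations have been corrected.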