Taxon-GO Proposal 2007
Taxon in GO - 27th April 2007
Participants: Waclaw Kusnierczyk, Chris Mungall, David Hill, Jennifer Clark, Midori Harris, Jane Lomax, Melissa Haendel, Judy Blake.
Minutes: Jennifer Clark
Prior to the meeting we all read Waclaw's paper presenting a way in which GO could capture relationships between GO terms and taxon. We called this meeting to discuss whether this or a modified form should be implemented.
Should taxon be there at all?
Strong opinions were expressed on several sides:
Judy felt that taxonomic information should not be included as part of the GO file itself. She suggested that this would make more sense as a community project, just as creation of the plant GO slim was a project carried out by TAIR for their own purposes.
David suggested that the best way to find relations between taxa and GO terms would be to extract the data from the annotation files. In response to this Jane pointed out that while this would give us a start, not all terms have annotation, so the data set created would be very incomplete. Melissa agreed with this and said that annotation file data would be a good start but that it would require manual curation to make it complete and accurate.
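As a rough illustration of the extraction David suggested, the sketch below collects the taxa each GO term has been annotated in from gene_association lines. It relies on the gene_association column layout (GO id in column 5, taxon in column 13); the function name and the sample row are made up for illustration, and terms with no annotation simply do not appear in the result, which is the gap Jane described.

```python
from collections import defaultdict

GO_ID_COL = 4    # column 5 of a gene_association line
TAXON_COL = 12   # column 13, e.g. "taxon:3702"


def taxa_per_term(lines):
    """Map each GO id to the set of taxon ids it has been annotated in."""
    seen = defaultdict(set)
    for line in lines:
        # Skip blank lines and "!" comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= TAXON_COL:
            continue  # malformed or truncated row
        # The taxon column may hold two ids separated by "|";
        # the first one is the annotated organism.
        taxon = cols[TAXON_COL].split("|")[0].replace("taxon:", "")
        seen[cols[GO_ID_COL]].add(taxon)
    return dict(seen)


def _made_up_row(go_id, taxon_id):
    """Build a placeholder 15-column row; not a real annotation."""
    cols = [""] * 15
    cols[0], cols[1] = "EX", "EX:0001"
    cols[GO_ID_COL] = go_id
    cols[TAXON_COL] = "taxon:" + taxon_id
    return "\t".join(cols)


SAMPLE = ["!made-up sample, not real data", _made_up_row("GO:0009553", "3702")]
```

Calling `taxa_per_term(SAMPLE)` gives one entry for GO:0009553; a real run would read the lines of a gene_association file instead.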
Chris said that his main concern was that many OBO groups were keen to represent links between ontologies and taxa, and that this was an opportunity for GO to figure out a good system that the other groups could follow.
Melissa was keen to go ahead with implementing Waclaw's system as it is, and said that figuring out a standard way to represent taxon links would be very helpful for the other ontology that she edits.
Jennifer was concerned that, without any taxon labelling, the sensu terms were now extremely cryptic and hard for annotators and ontology editors to interpret.
Waclaw was helpful in explaining the technical aspects of his paper. He was keen to make it clear that although he is very flexible on the particular details of any work done, he would be very supportive of any attempt to actually implement the idea. Waclaw supports the idea that GO-TS links should not be placed within the structure of the GO, but rather be provided as an additional taxonomic annotation database.
As Judy felt strongly that taxon information should not be included in the ontology file, we also discussed whether the prokaryote category should be allowed to:
a) continue to exist
b) continue to be maintained as part of the ontology file.
This subject then expanded to include the taxon specific slims for plant and yeast. These are taxon specific but are regular slims, in that they are a set of more high-level terms. Following this the discussion broadened to the question of whether slims should be maintained in the GO file at all.
1) Prokaryote Category. Jane said that this category has been very helpful for certain communities who want to be able to view a smaller set of terms and find the terms they need more easily. She said that this set could not be made automatically just from the annotation files as a lot of the prokaryote terms did not have any annotation.
Judy felt that groups should not look at a category like this as the GO is for all species and that all terms should be viewed at all times. The point was made in response that many users found the ontologies unmanageably large and that they really wanted these categories. Several people agreed with this.
Judy suggested that the category could be maintained externally by the group that needs it. However, Jane and Midori pointed out that it made a lot of sense to have the category within the file so that it automatically adjusted for changes in the structure of the ontologies. Chris said this would be fairly trivial to do automatically even with an external file.
2) Plant slim/Yeast slim. There was some discussion of whether these slims, which are specific to taxon groups, should also be maintained outside of the file by the communities that use them. Judy felt that this might be beneficial in removing taxon data from the file. However, others pointed out that the plant and yeast slims were not like the prokaryote category as they were regular slims, not really capturing taxon data specifically.
3) Other slims. It was suggested that in clearing out all taxon data from the file we could stop including slim data in the file at all. It was also pointed out again that maintaining all slims external to the file would be difficult as they would not adjust with the ontology structure. As before, Chris said this would be fairly trivial to do automatically even with an external file.
The majority of participants felt that taxonomic data for the GO would be very helpful. Judy (and David I think) felt that it should not be integral to the GO files. Midori did not give a view in either direction, but felt that whatever we do should entail an automated means of keeping the taxon data, slims, etc. in sync with the GO files. As a constructive response to this, the last half hour of the call was spent in figuring out the specifications for a separate community project that would enable taxonomic data to be captured but kept separate from the GO file itself.
It was proposed that the taxonomic data could be captured in a file resembling the gene_association files. Each line of the file could capture a relationship between a GO term and a taxon and contain the following pieces of tab-delimited data:
GO term name
evidence for connection (pubmed id or curators id)
[Notes added later by Judy: what would the evidence for the connection be other than the annotation from experimental data from a given organism? There is no other, so this would be, I think, the gene association row from a given organism. It can't be extended beyond the organisms being annotated since we don't have the knowledge to do so. Our whole annotation paradigm is that we seek to capture experimental data from a specific organism. Annotations collected by orthology or sequence similarity are at the molecular, not organismal level. So, anytime anyone wants to know knowledge from, say, mouse, and they want to consider using this knowledge in some way, they might just as usefully get the mouse annotation files. ]
There was some discussion of the relationships in Waclaw's paper. In general we were very happy with those that he had proposed, although there was some discussion of how the names could be made more intuitive. Waclaw was entirely supportive of any name change that would make the system more intuitive to users.
It was further proposed that an example file with a few relationships could be made on the wiki for discussion.
Here are a couple to start:
GO:id       GO term name             Taxon id  Taxon name     evidence         Relationship
GO:0009553  embryo sac development   3398      Magnoliophyta  ISBN:047186840X  specific_to
GO:0048229  gametophyte development  33090     Viridiplantae  ISBN:047186840X  specific_to
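To make the proposed file easier to discuss, here is a minimal sketch of how such a file might be read. It assumes tab-delimited fields in the column order of the example above and the three relationship names summarized in these minutes; the class and function names are hypothetical, and nothing here is an agreed format.

```python
from typing import List, NamedTuple


class TaxonLink(NamedTuple):
    """One line of the proposed file (column order assumed from the example)."""
    go_id: str
    go_name: str
    taxon_id: str
    taxon_name: str
    evidence: str
    relationship: str


# Relationship names as given in these minutes.
RELATIONSHIPS = {"valid_for", "specific_to", "relevant_to"}


def parse_links(text: str) -> List[TaxonLink]:
    """Parse tab-delimited lines into TaxonLink records, skipping the header."""
    links = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("GO:id"):
            continue  # blank line or header row
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) != 6:
            raise ValueError("expected 6 tab-delimited fields: %r" % line)
        link = TaxonLink(*fields)
        if link.relationship not in RELATIONSHIPS:
            raise ValueError("unknown relationship: %r" % link.relationship)
        links.append(link)
    return links


EXAMPLE = "\n".join([
    "GO:id\tGO term name\tTaxon id\tTaxon name\tevidence\tRelationship",
    "GO:0009553\tembryo sac development\t3398\tMagnoliophyta\tISBN:047186840X\tspecific_to",
    "GO:0048229\tgametophyte development\t33090\tViridiplantae\tISBN:047186840X\tspecific_to",
])
```

`parse_links(EXAMPLE)` returns two records; rejecting unknown relationship names is a design choice of this sketch, made so that typos in a hand-edited file surface early.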
Here is a brief summary of the relationships:
Note: a feature can be a biological process, cellular component or molecular function.
Validity (valid_for) The feature can be found in all species subsumed under this taxon but not in all individuals. For example, all mammal species exhibit suckling, but only female individuals of these species do so.
[Notes added later by Judy: we will never know this until we have determined this for all species. Again, this presumes that the taxonomies for species correspond to molecular characteristics exactly, and we do not know this. This is why this approach is so dangerous. The only validity we have is what we have specifically annotated for a given species.]
Specificity (specific_to) The feature cannot be found outside of this taxon. For example leaf development is not found outside of Viridiplantae.
[Notes added later by Judy: I think we will find in general that specificity can only be stated at very gross levels of taxonomy of species].
Relevance (relevant_to) The feature can be found in organisms of some of the species subsumed by the taxon but not necessarily all. The feature may also be present in organisms outside of the taxon. For example, hatching is relevant to bird species and to the platypus. This relationship would be used to label such examples.
[Notes added later by Judy: and how are you going to know this? It seems to me that all features are potentially relevant to all species at this point in our annotation of information. We simply do not have the knowledge to make these statements in more than a very few cases and only in specific reference to the 10 or so model organisms.]
These three relationships can be propagated in various ways up or down the graph. Please see Waclaw's paper for details. For a copy of the paper please contact Waclaw on waku at idi.ntnu.no and please bear in mind that the paper is not yet published and that the contents should be kept confidential.
[Notes added later by Judy: I am not in favor of incorporating presumptions of knowledge in the GO. If a researcher is interested in knowing what parts of the GO have been shown to be relevant to a specific taxon, they should work directly with the annotations we have. I think this effort to provide relevance, specificity and validity is very misguided in this case because, and I know I repeat myself, we do not have the knowledge yet to apply molecular distinctions at a taxonomic level.]
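One way to read the specific_to definition above is as a check that can be run against annotations: a term declared specific to a taxon should not be annotated in an organism outside that taxon. The sketch below assumes a toy child-to-parent taxonomy map with intermediate ranks omitted, and the annotation pairs passed to it are hypothetical. It does not attempt the propagation rules from Waclaw's paper.

```python
# Toy child -> parent map; intermediate ranks are omitted (a simplification).
PARENT = {
    "3702": "3398",    # Arabidopsis thaliana -> Magnoliophyta
    "3398": "33090",   # Magnoliophyta -> Viridiplantae
    "10090": "40674",  # Mus musculus -> Mammalia
}


def is_within(taxon, ancestor, parent=PARENT):
    """True if `taxon` is `ancestor` itself or lies below it in the map."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = parent.get(taxon)  # None once we run off the top of the toy map
    return False


def specificity_conflicts(specific_to, annotations, parent=PARENT):
    """Return (go_id, taxon_id) pairs lying outside the taxon that the
    term is declared specific_to."""
    return [
        (go_id, taxon_id)
        for go_id, taxon_id in annotations
        if go_id in specific_to
        and not is_within(taxon_id, specific_to[go_id], parent)
    ]
```

For instance, with `{"GO:0009553": "3398"}` as the specific_to links, a hypothetical annotation of GO:0009553 in taxon 10090 would be reported, while one in taxon 3702 would not. A conflict could mean either a wrong annotation or a wrong specific_to link, so a check like this would only flag cases for a curator to look at.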
Discussion point 1: specificity can be stated at very gross levels of taxonomy of species
Judy: I think we will find in general that specificity can only be stated at very gross levels of taxonomy of species.
Jennifer 03:20, 8 May 2007 (PDT): I agree that specificity can only be stated at very gross levels of taxonomy of species. Would you have any objection to stating specificity at this gross level, either within the GO or in a separate file? For example:
embryo sac development specific_to Angiosperms.
chloroplast specific_to Viridiplantae.