Draft Proposal

From GO Wiki
Revision as of 12:52, 23 April 2014 by Gail (talk | contribs)

Community approach

Background

Since the sensu overhaul, a small working group has been considering whether we should retain and maintain any taxonomic information associated with GO terms. For example, when curators now use a term like 'wing development', it should be obvious from the term and its definition which type or types of wing the term refers to. However, some terms, such as 'chitin- and beta-glucan-containing cell wall', may still confuse curators and users whose knowledge of the differences among taxa is limited to specific areas. In addition, as emerging genome projects use computational methods to infer gene-term associations, they will not want to infer annotations to GO terms that are inappropriate for their organism. Taxon-term associations may be useful for filtering out nonsensical annotations in these efforts. Do we want to record this information more explicitly, so that it is clear which species a given term can be used with?

The discussion was aided by reading a paper by Waclaw Kusnierczyk that is available on request but not yet published. If you would like to read it, please contact him at 'waku at idi.ntnu.no'. The paper outlines a method for capturing how the various terms intersect with the different taxa. The method would let us record only the information that is easy to come by, without having to capture all species information immediately.

The minutes of the meeting are at http://gocwiki.geneontology.org/index.php/Taxon_Main_Page.
As a result of the discussion we have produced the following proposal.

Proposal

In this proposal we take the following factors into account:
1) Taxonomic data associated with GO terms would be very helpful.
2) Taxonomic data should not be integral to the GO files.
3) Taxonomic data can be captured by looking at the gene_association files.
4) We need an automated means of keeping the taxon data in sync with the GO files.
5) This system would enable us to spot errors in annotation, where a gene product had erroneously been annotated to a term that was not appropriate for that species.

To fit all these requirements we have come up with a proposal that outlines the specifications for a separate community project file that would enable taxonomic data to be captured but kept as a mapping file, separate from the GO file itself.

It was proposed that the taxonomic data could be captured in a file resembling the gene_association files. Each line of the file could capture a relationship between a GO term and a taxon and contain the following pieces of tab-delimited data:

GO:id
GO term name
Taxon id
Taxon name
Evidence for connection (PubMed id or curator's id)
Relationship

There was some discussion of the relationships in Waclaw's paper. In general we were very happy with those he had proposed, although we considered how the names could be made more intuitive. Waclaw was entirely supportive of any name change that would make the system more intuitive to users.

Here is a brief summary of the proposed relationships as described in Waclaw's paper:

Note: a feature can be a biological process, cellular component or molecular function.

Validity (valid_for)
The feature can be found in all species subsumed under this taxon but not in all individuals. For example, all mammal species exhibit suckling, but only female individuals of these species.
Specificity (specific_to)
The feature cannot be found outside of this taxon. For example, leaf development is not found outside of Viridiplantae.
Relevance (relevant_to)
The feature can be found in organisms of some of the species subsumed by the taxon, but not necessarily all. The feature may also be present in organisms outside of the taxon. For example, hatching is relevant for bird species and platypus. This relationship would be used to label such examples.
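
As an illustration of how a specific_to link could help spot an annotation to a term that is not appropriate for a species, here is a minimal Python sketch. It is not part of the proposal: the parent map is a hand-written, heavily simplified stand-in for a real taxonomy (intermediate ranks are skipped), and the function and variable names are hypothetical.

```python
# Toy parent map (child taxon id -> parent taxon id).
# Assumption: hand-written and heavily simplified for illustration;
# a real check would load the full taxonomy instead.
PARENT = {
    3702: 3398,    # Arabidopsis thaliana -> Magnoliophyta
    3398: 33090,   # Magnoliophyta -> Viridiplantae
    33090: 2759,   # Viridiplantae -> Eukaryota
    7227: 33208,   # Drosophila melanogaster -> Metazoa
    33208: 2759,   # Metazoa -> Eukaryota
}

# specific_to links: GO id -> taxon id the feature is confined to.
SPECIFIC_TO = {
    "GO:0009553": 3398,  # embryo sac development, Magnoliophyta
}


def is_within(taxon_id, ancestor_id):
    """Return True if taxon_id equals ancestor_id or descends from it."""
    current = taxon_id
    while current is not None:
        if current == ancestor_id:
            return True
        current = PARENT.get(current)  # None once the root is passed
    return False


def violates_specific_to(go_id, species_taxon_id):
    """Flag an annotation whose species lies outside the term's taxon."""
    taxon = SPECIFIC_TO.get(go_id)
    if taxon is None:
        return False  # no specific_to link recorded for this term
    return not is_within(species_taxon_id, taxon)
```

Under these assumptions, an annotation of a Drosophila gene product to 'embryo sac development' would be flagged, while the same annotation for an Arabidopsis gene product would pass. Terms without a recorded link are never flagged, which matches the idea of capturing only the information that is easy to come by.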

To help clarify our proposal we are collecting some examples of lines for the new file.

Here are a couple to start:

GO:id         GO term name               Taxon id    Taxon name       Evidence           Relationship
GO:0009553    embryo sac development     3398        Magnoliophyta    ISBN:047186840X    specific_to
GO:0048229    gametophyte development    33090       Viridiplantae    ISBN:047186840X    specific_to
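
To make the proposed line format concrete, the following Python sketch reads one tab-delimited line into a record with the six fields listed earlier. The class and function names are hypothetical, and the sketch assumes exactly six fields per line, in the order given in this proposal.

```python
from typing import NamedTuple


class TaxonLink(NamedTuple):
    """One line of the proposed mapping file (field names are assumptions)."""
    go_id: str
    go_name: str
    taxon_id: int
    taxon_name: str
    evidence: str
    relationship: str


def parse_line(line):
    """Split one tab-delimited line into a TaxonLink record."""
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-delimited fields, got {len(fields)}")
    go_id, go_name, taxon_id, taxon_name, evidence, relationship = fields
    return TaxonLink(go_id, go_name, int(taxon_id), taxon_name,
                     evidence, relationship)


# One of the example lines above, written with explicit tab separators.
EXAMPLE = ("GO:0009553\tembryo sac development\t3398\tMagnoliophyta"
           "\tISBN:047186840X\tspecific_to")

link = parse_line(EXAMPLE)
print(link.go_id, link.taxon_id, link.relationship)
```

Storing the taxon id as an integer is a design choice for this sketch only; the proposal does not say whether ids would be bare numbers or carry a prefix.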

Implementation ideas

1) The ontology editors would add any relationships that they deem to be useful as they edit the terms.
2) The file would be held in GO cvs.
3) All manual GO annotations would be included as relevant_for links. To see an example of this please see the relevant_for page.

The members of the taxon working group are: Waclaw Kusnierczyk, Chris Mungall, David Hill, Jennifer Clark, Midori Harris, Jane Lomax, Melissa Haendel, Judy Blake. (Please note that they have not yet signed off on this proposal. It is being written here in advance of sending to them for discussion/approval.)