GO website page GO history
History page removed from the GO site and being kept here for the time being, until its fate is decided.
The development node
Process Ontology Discussion
From before the meeting actually started through to the end of the first day, discussion kept ranging back to the Process Ontology. By 9AM, Michael and Judith were discussing how the term 'heart development' might exist in both GO and MGI DAGs, but with a different meaning in each. By the end of the day, it appeared that a consensus had been reached whereby 'heart development (sensu Mammalia)' in GO might be a conceptual equivalent to what MGI and its users might need under the heading 'heart development.' Needless to say, much additional discussion needs to follow this, but there was a definite feeling that progress was made in tackling the 'anatomy of process' problem (my term, not introduced at the meeting). It appeared that this was very much a fly/mouse oriented discussion, and that the granularity of the worm, plant and fungal anatomical issues was not deeply addressed. That being said, the 'solution' arrived at was to 'embrace the explosion' in such a manner that any organism might be handled. It also appeared that addressing mouse anatomy was a generally accepted surrogate for mammals in general, as a first pass.
Embracing the Explosion
What does it mean to include anatomy? Better yet, what happens if you do not include it, or include it 'incorrectly'? It was thought at one time that anatomy could indeed be excluded from the GO, but this runs into a number of problems, chief among them being that anyone looking at a Process Ontology might well expect to see terms such as 'spermatogenesis' or 'heart development'. In the absence of such 'important' and pervasive terms, the utility of the GO is significantly diminished.
One possible solution, discussed at length and eventually abandoned, was that of combinatorial terms. Such Combi-Terms (my phrase, not used at the meeting) would take components from existing ontologies, creating a new term that does not exist in either, effectively creating a term set that is not itself hierarchical but bridges two hierarchies, thereby allowing entry and tracing of each one touched. For example, the notion of encoding the concept 'heart development' as 'heart (anatomical ontology)' + 'development (process ontology)' was discussed. In such a situation, a term could be either a literal (an explicit part of one of the ontologies) or a symbol combination (a combination of literals, a Combi-Term). The combination inherits the lexicon and the hierarchical position of each component term, but not the definitions of the components; a definition and GO ID would be assigned to the Combi-Term. However, this solution is prone to drift error, meaning that the Combi-Terms might not keep up with changes in the anatomical and process ontologies. Further, there are problems inherent in using only symbols and divorcing those from the meanings of the symbols while retaining their hierarchy positional information, chief among these being suspicious hierarchy tracing. For instance, given 'heart development' applied to a fly, there is no constraint preventing hierarchy tracing to 'heart valve development,' which is inherently nonsensical for the fly.
Sensu to the rescue
The solution that seemed communally acceptable was to invoke the existing 'sensu' notation. In the case of 'heart development,' the sub-graph might look like:
heart development
 %heart development sensu Insecta
 %heart development sensu Mammalia
  <atrium development
It makes sense for 'atrium development' to appear only under the appropriate 'sensu branch'. The model for this solution was the 'chitin' example (see Usage Guide).
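The effect of the sensu branches on hierarchy tracing can be illustrated with a small sketch. This is only an illustration, not part of any GO tooling: the dictionary layout and the `descendants` helper are assumptions, with "is_a" standing for '%' and "part_of" standing for '<' in the sub-graph above.

```python
# Hypothetical in-memory form of the sub-graph above.
# Each entry maps a child term to its (relationship, parent) links;
# "is_a" stands for '%' and "part_of" stands for '<'.
PARENTS = {
    "heart development sensu Insecta": [("is_a", "heart development")],
    "heart development sensu Mammalia": [("is_a", "heart development")],
    "atrium development": [("part_of", "heart development sensu Mammalia")],
}


def descendants(term, parents=PARENTS):
    """Return every term reachable by walking down the graph from `term`."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, links in parents.items():
            if child not in found and any(p == current for _, p in links):
                found.add(child)
                frontier.append(child)
    return found


insecta = descendants("heart development sensu Insecta")
mammalia = descendants("heart development sensu Mammalia")
generic = descendants("heart development")
```

Tracing down from the Insecta branch never reaches 'atrium development', while tracing from the Mammalia branch or from the generic term does, which is the constraint the Combi-Term approach lacked.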
A discussion ran during the pre-meeting around context sensitivity of Terms. In the pre-meeting it was conceded that the goal should be making Terms context independent; the example was 'Suzie' vs. 'Suzanna', which are used in different contexts but both refer to the same person. It became clear during the main meeting, though, that such context independence is a difficult thing to achieve. This issue was not aired in the main meeting.
Note that it appears that David and Heather will be heavily involved in the addition of new high-level terms to the Process Ontology.
Anatomical Lexicons and GO
The issue of incorporating anatomical information into the GO is intimately linked with the discussion ranging around the Process Ontology (see Process Ontology Discussion section). There is a clear-cut boundary in the Process Ontology between Terms that can be abstracted from Anatomy and those that cannot. Practically speaking, few processes peculiar to multicellular organisms have meaning when abstracted from anatomy, a particularly good example being developmental processes. The general consensus is that 'when it gets multicellular, it gets complicated'. As was pointed out above, a decision was made to incorporate Anatomies in the following manner:
- by using the pre-existing 'sensu' notation
- by restricting (initially) to class-level distinctions. For instance, mouse will represent the class Mammalia; Drosophila will represent the class Insecta.
This is quite different from the conclusion reached in Hinxton, which was to eliminate anatomy from the existing GO Terms and introduce a 4th Anatomical Ontology. Another major shift in this meeting vs. Hinxton is that the sentiment emerging from Hinxton was to eliminate multicellularity from GO; the sentiment emerging from this meeting is that multicellular processes should be incorporated into GO.
Status of DAG-based Anatomical Lexicons:
- fly - available [DAG]
- mouse - available [DAG] (both developmental [public] and adult [not yet released])
- worm - available
- Arabidopsis - being worked on by Leonore; plan to extend as a 'standard plant anatomy'
- yeast - not available
A proposed solution for representing anatomies in GO was made by Michael:
GO ID: 12345
Process: Heart Development sensu Insecta
Anatomical Component: Heart, with Anatomy ID (from Flybase Anatomical Lexicon)
Reference ID
Definition ID
In this solution, each time a 'sensu' reference is made, an explicit reference to an Anatomical Lexicon is made, so that there need be nothing implicit in the definition of 'heart as in insects'. By providing an Anatomical Lexicon cross-reference, a chief concern, drift, is addressed.
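As a rough sketch of how such a record could be checked, the fields of the proposal can be modelled as a small data structure. Everything beyond the field names in the proposal is an assumption made for illustration: the `SensuTerm` class, the `missing_anatomy_xref` helper, the second GO ID and the anatomy identifier are placeholders, not real GO or FlyBase values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SensuTerm:
    """One record in the proposed layout; all field values below are placeholders."""
    go_id: str
    process: str
    anatomical_component: str
    anatomy_id: Optional[str]  # ID in the organism's anatomical lexicon
    reference_id: Optional[str] = None
    definition_id: Optional[str] = None


def missing_anatomy_xref(terms):
    """Return sensu terms that lack an explicit anatomical lexicon cross-reference."""
    return [t for t in terms if "sensu" in t.process and not t.anatomy_id]


terms = [
    # "FLY-ANATOMY:0001" is a made-up identifier, not a real lexicon ID.
    SensuTerm("12345", "Heart Development sensu Insecta", "Heart", "FLY-ANATOMY:0001"),
    # "12346" is a made-up GO ID used only to show a failing record.
    SensuTerm("12346", "Heart Development sensu Mammalia", "Heart", None),
]
flagged = missing_anatomy_xref(terms)
```

A check of this kind is one way the explicit cross-reference could guard against drift: a sensu term with no lexicon ID is flagged rather than left with an implicit meaning.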
A goal that emerged late in the meeting was to work toward an 'Anatomy SLIM' that could represent high level anatomical structures across classes, though it is unclear where this will be taken.
4. Sort Process Ontology into component parts and other Process considerations.
We discussed whether the process ontology should be separated into two parts: cellular and multicellular. This discussion is not new. We recognized the utility of having a complete unit of the process ontology representing cellular-level processes since this is needed and practical for the unicellular organisms. Thus we will work towards a robust representation of cellular processes that will be useful to all. This decision led further to a recognition that the process ontology is sorting into 4 major components. These are: cellular processes, developmental processes, physiological processes, and behavioral processes. We agreed to break out cellular processes and to specifically represent them at the top of the Process ontology. Most of the discussion over the rest of the meeting then focused on developmental processes.
a.) Differentiation is a cellular process; morphogenesis is a multicellular process
5. "Determination", "Differentiation" and "Development". Definition of 'Determination' and definition of 'Differentiation'. How shall we represent these concepts? This reflects a 50 year debate in developmental biology. We need to rewrite these definitions so that they are less experimentally based. Consider, throughout, 'has the definition been written in terms of the experimental method?' If so, consider revising the definition.
Sound bites from this interesting discussion:
- determination when the decision has been made to adopt a developmental stage (tricky because it is often before the actual differentiation occurs)
- differentiation when you actually express a set of characteristics...process whereby relatively unspecialized cells acquire
- so is 'cell specification' a synonym for determination? Or is it that specification is the same as establishing an identity but not yet determined? It is a temporal thing. You are getting signal. It is not the same as determination.
- autonomous specification. Specification produced by an inheritance of molecules. A type of cell specification.
- conditional specification is the specification determined by the relative position of cells in an organism. A type of cell specification.
- not the same as competence which is a characteristic of a cell.
Conclusion: Competence is the 'ability' to do something. Competence is not a process, it's a state. So we throw it out. But, if useful, we could have 'establishment of competence' or 'maintenance of competence'.
Conclusion: the term 'Development' as a high level process will be used to consider the whole history of the organism.
This generated a lot of discussion as we considered 'embryonic development' and 'post-embryonic development'. This is a hard distinction to support for plants and for larval development. Different communities use these terms in different ways. Post-embryonic development is useful for fly...keep it in??? What is covered by the term 'brain development'? It continues throughout life. What do we mean by a term like 'heart development'? Does that mean the developmental process up until you have a heart? Or does it include the further development of the heart after a recognizable organ is formed? Embryogenesis, morphogenesis, organogenesis are all DAGs...some things that are part of embryogenesis will be part of morphogenesis as well. So...
development morphogenesis aggregation differentiation maturation aging senescence
Option 1
global heart development
formation of the heart
beyond formation of the heart

===Major Divisions of the Process Ontology===
% cell
% development
% physiology
% behavior

===Report on Narrative vs. Combinatorial approach re anatomy in biological process terms.===
This was the major event of this meeting. For many meetings, we have come back to the issue of species-specific anatomies and the incorporation of anatomical terms in the process ontology. Over a year ago, Joel Richardson proposed a combinatorial approach wherein a process term combined with an anatomical term would be used to annotate knowledge about a gene product. At the Hinxton meeting, the group agreed that this was a sensible and powerful approach. However, subsequent implementation efforts revealed difficulties in incorporating such biologically useful concepts as 'gametogenesis'. Also, the management of the combinatorial approach would be harder than the further development of what is now called the 'narrative' approach. The narrative approach is the current paradigm of building up the ontology incrementally as we describe the process in biological terms. Yet, in following discussions, the issue of whether or not to incorporate anatomies, which are themselves highly developed and precise ontologies, in the process ontology kept arising. Finally, at the human annotation meeting at Banbury last summer, we agreed that David Hill would 'do the experiment' and give a presentation at this meeting for the group to consider. David used the example of "Heart Development". He developed an ontology for heart development in both the narrative and the combinatorial manners. A copy of that presentation is available. (This is now in cvs, in the teaching resources/presentations folder). The end result was that the group was overwhelmed with the power of the combinatorial approach both to provide self-structured cross-product terms and to reveal new information and avenues for experimentation.
#Do we leave it up to each group to decide whether to use this approach to process annotation? A resounding NO from the group.
#Can we separate out subtrees that can be used to generate cross-products? Yes, could use GO-SLIM or other subtrees. In fact, the GO-SLIM set may be the mechanism for grouping annotations across species.
#There could be cross-products of cross-products....how far do we want to break this down? Don't have to go all the way down as long as the representation of the biology is correct.
#Works as long as the two concepts are orthogonal, can't do with just anything and get the consistency needed.
#Big worry...if each group is incorporating combined terms relative to their particular anatomy, we lose the power of the combination of all annotations. One approach is to ask the query...'give all products in heart development', and have query go out against all cross-products. We will have to work on this.
#Can we have a join of the anatomies? then have a single anatomy to use in the cross-product with developmental processes? don't know...right now, we think the combinatorial approach is the right way to go, we will have to work on the implementation.
#Some concern about ripping out anatomical terms from process right now. Can the primary process ontology be made more amenable to cross-species specific anatomical parts?
#If we have multiple anatomies, then the search needs to go against anatomies...this can be done.
Summary....Issues
#There is general consensus to go forward with the combinatorial approach.
#Do we need to have a shared anatomy?
#How will others be able to use the ontologies to annotate if we have this complicated approach?
#Parser...need to put into better language...earlier we tackle the problem of language, the better we can promote this for ourselves and others.
#GOAL...write definitions for common developmental process terms.
#Start working on further experiments with this approach...write definitions, work out mathematical properties.
#Each group needs to provide an anatomy.
#The anatomies needn't have GO:IDs, but the cross-products should have IDs.
#We will use the developmental process as a demonstration of this approach.
#Immediate action items include:
##schema changes (Joel and Suzi)
##editor will work fine for now.
#More about cross products and anatomy.
a). Cross products- David Hill expanded on the idea of cross products. We are in agreement to strip anatomy from process terms and proceed with cross products. For making cross products, it is important that the ontologies be orthogonal. We can expand the concept of cross products to many areas and it would be good to have a general tool for doing this that allows you to select specific nodes to create cross products (see d below). With respect to making anatomy ontologies for generating cross products with the developmental process node in the process ontology, we need to first make orthogonal ontologies of anatomy and developmental stage, then take the appropriate cross products from these to make the cross product with development. It is essential to take the time component out from staging (e.g. days post-fertilization, post-germination are not useful as there is a lot of variation in how rapidly development occurs within a species). An example: [stage (organism specific, internal ID) X anatomy (organism specific, internal ID)] X [developmental process (GO-generic, GO ID)] = GO ID. Also, the cross-product of stage X anatomy will have an internal ID.
b). Anatomy browsing- Pavel from BDGP demonstrated a browser being developed at BDGP to display gene expression patterns and fly anatomy. Uses both images and text display.
c). Each organism database contributes their anatomy/developmental stage ontologies and definitions to the GO and it will go into the CVS.
Each group should be responsible (and responsive) for updates to their anatomy ontology.
d). Developing a tool to generate cross products. John Richter thinks he can adapt the editor to have this function. Realistically there will not be a tool till after October for generating cross products.

===10. Erich Swartz, WormBase ===
IEA has done 1/3 of 19,000 genes. Creating new parsing of ontologies and expanding automatic annotation. Building anatomy and developmental timing ontologies. Erich has been trying to finish RNAi ... 52 phenotypic types. Currently limited by not having a full GO curator. Ideally by next meeting... Proteome has asked WormBase to help them with 'GO-izing' the standard vocabulary that they use for all the phenotypes. Erich has been in contact with WIT2 annotations group. Have set up a collaboration with them to do that.

TAIR has developed anatomy (1000) and development (120 terms) ontologies.

* WormBase - Eimear Kenny - goal to have detailed descriptions for genes by mid-2003 - Andre P. - Erich working on more extensive gene descriptions - 2 new WB curators, 1 is in the process of moving to CA, already doing lit curation - now 3 WB curators working on GO - WB is developing ontologies: due for release soon o cell lineage ontology (Raymond Lee) o developmental ontology (Wen Chen) life stages

Can we come up with a way to tell whether a given area within an ontology has been extensively reviewed? There was an unfortunate incident recently where a change was made to a bit of the newly revamped 'development' portion of the process ontology. We'd like to avoid this sort of fumble in the future, but it's impossible to tell just by looking at the ontology which bits have been reviewed thoroughly and which parts still look much as they did two or three years ago.
There's a lot of information socked away in CVS log files, the email archive, and meeting notes, but it would be much more convenient for curators if the excavation of ontology content history could be streamlined.

===Cellular differentiation vs cell fate commitment and cell type development vs cell type differentiation ===
David Hill outlined a suggestion: cell differentiation can be broken down into the following steps: cell fate commitment, where a cell senses its location and begins to specialize, but can still switch types; cell type determination, where a cell switches irreversibly to a specific type; and cell development, where a cell physiologically matures into its type. Should we use these divisions in GO? The group agreed that we should.

Conclusion: Cell differentiation and its children will have the following structure:
cellular process
[i] cell differentiation
[p] cell fate commitment (exact synonym: cell fate specification)
[p] cell fate determination
[p] cell development (exact synonyms: cell morphogenesis, cell maturation)

===Appendix 4A. Email from Tanya Berardini ===
Cellular process issues (from Tanya):
Subject: Cellular process issues for St.Croix
Hi everyone, Here are a few issues that I think would be good to address at the meeting. David will be attending, while I won't be able to make it.

1. cell differentiation vs. cell fate commitment
right now, these terms are siblings
cell differentiation: The process whereby relatively unspecialized cells, e.g. embryonic or regenerative cells, acquire specialized structural and/or functional features that characterize the cells, tissues, or organs of the mature organism or some other relatively stable phase of the organism's life history. ref:ISBN:0198506732
cell fate commitment: The commitment of cells to specific cell fates and their capacity to differentiate into particular kinds of cells.
Positional information is established through protein signals that emanate from a localized source within a cell (the initial one-cell zygote) or within a developmental field. ref: ISBN:0716731185

2. response to endogenous stimulus and response to exogenous stimulus
Move to be children of physiological process/add physiological process as additional parent? Right now, they are children of cell communication.
response to endogenous stimulus: The change in state or activity of a cell or an organism as a result of the perception of an endogenous stimulus. ref: TAIR:sm
response to exogenous stimulus: The change in state of activity of an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of the perception of an external stimulus. ref: FB:hb

3. cell_type development vs. cell_type differentiation
Do we need both terms? Are they meant to describe different things? (e.g. pole cell development vs. pole cell differentiation) Check out the children of cell differentiation for a sample.

==Entry of EC numbers in the Gene ontology==
Enzymes in the molecular function ontology carry a cross-reference to an EC number from the enzyme nomenclature database http://www.chem.qmw.ac.uk/iubmb/enzyme/ This allows automated cross-referencing by other databases and in the past was used for automatic transfer of definitions from the enzyme nomenclature database to the gene ontology. The EC numbers are not entered into the cellular component ontology.

===Incomplete EC numbers: ===
The EC numbers all follow the pattern <b>EC n.N.N.N</b>, for example EC 1.1.1.5 which corresponds to acetoin dehydrogenase. Each of the individual digits indicates a more precise level of definition of enzyme function. e.g.
EC 1 Oxidoreductases
EC 1.1 Acting on the CH-OH group of donors
EC 1.1.1 With NAD or NADP as acceptor
EC 1.1.1.1 Alcohol dehydrogenase

In some cases the function of an enzyme has not been completely characterized and so the EC number will also be incomplete and would take the format <b>EC n.N.-.-</b>. Incomplete EC entries have all been removed from the molecular function ontology, but they have been retained in the EC to GO flatfile. In the future we will work through these numbers to distinguish between the EC numbers corresponding to uncharacterized enzymes and those that simply correspond to enzymes whose functions require less than four digits for a full definition.

==Enzyme pH ontology==
Some similar enzymes are found to act at different pHs. In this case a separate entry will be made only if the two enzymes carry different EC numbers.

===Coupled and uncoupled enzymatic reactions: ATPase ===
EC 3.6.3.N and EC 3.6.4.N all correspond to ATPases. EC 3.6.1.3 is the number for the enzyme that can hydrolyse ATP without being coupled to any other reaction, whilst all other ATPases have another reaction coupled to their activity, as in the case of EC 3.6.3.8, which is the Ca2+-transporting ATPase. For the purposes of GO, the enzymes are categorised according to whether they perform coupled or uncoupled reactions.

==What is the distinction between the <b>membrane fraction</b> term and a subset of the <b>membrane term</b>?==
These two concepts are very different because one is derived from a type of experimental data (cellular fractionation by centrifugation) and the other corresponds to a cellular component (the membrane):
1) Membrane term GO:0016020: Double layer of lipid molecules that encloses all cells, and, in eukaryotes, many organelles; may be a single or double lipid bilayer, also includes associated proteins.
2) Membrane fraction term GO:0005624: That fraction of cells, prepared by disruptive biochemical methods, that includes the plasma and other membranes.

The experiments that give rise to <b>membrane fraction</b> data involve the complete disruption of cellular structure followed by the mixing of cellular components and their division into soluble and insoluble fractions by centrifugation. There are various other <b>fraction</b> term names to encompass data from similar kinds of experiment which distinguish between substances with different solubilities. It is clear that the <b>xxx fraction</b> term is not synonymous with a <b>part of</b> child of <b>xxx</b>.

==Do we derive annotations to the <b>xxx</b> term from annotations to the <b>xxx fraction</b> term?==
The experiments that give rise to <b>xxx fraction</b> data involve the complete disruption of cellular structure followed by the mixing of cellular components and their division into soluble and insoluble fractions by centrifugation. Therefore, whilst we can derive information about the solubility of components from the result, we cannot draw any conclusions about their original cellular location. For this reason, the <b>xxx fraction</b> term is retained only as a repository for experimental data, and no interpretation of the data is carried over to the children of the <b>xxx</b> term.

==Synonyms==
Where more than one expression can be used to refer to the same concept, a single term will be created and the other names will be entered as synonyms. This may also include expressions that are in common usage, and that are likely to be used by scientists searching for the particular term that pertains to their work. It may also include concepts that are a more general or more specific version of an existing term. In this case it is likely that the synonym will eventually be added as a term in its own right, and that the addition of the synonym is simply a temporary measure.
In order to distinguish these different types of synonyms, a set of symbols was created to show the different relationships of synonyms to their respective terms:
<pre>
Synonym types
=   the term is an exact synonym
~   the terms are related (e.g. XXX complex and XXX)
<   the synonym is broader than the term name
>   the synonym is more precise than the term name
!=  the term is not the same as the synonym (no precise relationship assigned)
</pre>
For more information about synonyms and their documentation and use please read the <a href="GO.usage.shtml#synonyms">Guide to Synonyms</a>.
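The symbol set lends itself to a simple lookup. The sketch below is illustrative only: the relationship labels, the `parse_synonym` helper and the assumed "SYMBOL text" entry layout are not taken from the GO documentation.

```python
# Mapping from the synonym symbols listed above to short relationship labels.
# The labels themselves are assumptions chosen for this sketch.
SYNONYM_TYPES = {
    "=": "exact",
    "~": "related",
    "<": "broader",
    ">": "narrower",
    "!=": "not the same",
}


def parse_synonym(entry):
    """Split an entry of the assumed form 'SYMBOL text' into (relationship, text)."""
    symbol, _, text = entry.strip().partition(" ")
    text = text.strip()
    if symbol not in SYNONYM_TYPES or not text:
        raise ValueError(f"unrecognised synonym entry: {entry!r}")
    return SYNONYM_TYPES[symbol], text


exact = parse_synonym("= cell fate specification")
related = parse_synonym("~ XXX complex")
```

Keeping the symbols in one table means that a new synonym type only needs a new dictionary entry, and unknown symbols are rejected rather than silently accepted.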
The Distinction between Function and Process
A gene product can have one or more functions and can be used in one or more processes; a gene product may be a component of one or more supra-molecular complexes. Function is what something does. It describes only what it can do without specifying where or when this usage actually occurs. Process is a biological objective. A process is accomplished via one or more ordered assemblies of functions.
How granular should we get in the ontologies- what belongs?
In general, the answer is that the GO should be as granular as possible, within the bounds already defined. Community feedback into the ontologies is especially important as a means to improve granularity. The general sentiment is that annotation should be done to the highest level of granularity available in order to provide an ontology that is useful for querying domain-specific databases. Note that the stated utility here is in the GO's use in interrogating domain-specific databases, not across domains. One example of the currently appropriate level of granularity was found in a discussion of the sensu Magnoliophyta terms (see next item).
A problem was discovered with the sensu Magnoliophyta terms. Many of these terms seem misleading because they actually refer to phenomena that also occur more broadly outside Magnoliophyta. However, it was pointed out that 'sensu Magnoliophyta' just means 'in the sense of Magnoliophyta' and so does not exclude annotation of non-flowering plant gene products to such a term.
To get rid of the Magnoliophyta term and add a sensu term covering all groups that could actually be annotated to a term would be quite time consuming, because we would have to check, in the case of each term, whether it applied to all plants, whether all green algae were included, etc.
At the moment there are no non-flowering plant species being annotated and so there is not an urgent need for terms to be created for the annotation of non-flowering plants.
With these points in mind it was decided that we should concentrate on making the flowering plant terms exhaustive and stick to sensu Magnoliophyta and then we can do the rest once we have non-flowering plants being annotated.
This illustrates some of the forces governing the level of granularity of the GO. For plant annotation, it is currently appropriate to make terms to this level of granularity, but this will change in the future when other plant species are being annotated to the GO. Therefore, the granularity of the GO may be assumed to be constantly changing in response to the various forces at work.
Very often, similar processes operate in markedly different ways between organisms. The sensu terms were created to address this problem.
An example of the use of 'sensu':
- embryonic development ; GO:0009790 The development of an organism from zygote formation until the end of its embryonic life stage. The end of the embryonic stage is organism-specific and may be somewhat arbitrary. For example, it would be at birth for mammals, larval hatching for insects and seed dormancy in plants.
- embryonic development (sensu Magnoliophyta) ; GO:0009793 The embryonic development of flowering plants that ends with seed dormancy.
- embryonic development (sensu Animalia) ; GO:0009792 The embryonic development of an animal from zygote formation until the end of its embryonic life stage. The end of the embryonic life stage is organism specific and may be somewhat arbitrary; for mammals it is usually considered to be birth; for insects the hatching of the first instar larva from the eggshell.
If more than one taxonomic group is included then the names are separated by commas, e.g.:
%female meiotic spindle assembly (sensu Drosophila, sensu Mus) ; GO:0007056
One risk is that the many differences between processes in different groups might lead to the excessive proliferation of sensu terms. It was decided that sensu terms should only be created in the case of homonym terms: those which have the same text string but a unique meaning for each organism. To help enforce this, each time a 'sensu' term is made, an explicit reference is included, so that there need be nothing implicit in the definition.
In addition, the creation of sensu terms was restricted to those with class-level distinctions. For instance, mouse will represent the class Mammalia; Drosophila will represent the class Insecta.
In spite of the creation of these rules to minimise the number of sensu terms, the word 'sensu' has found its place as one of the most frequently used words in the ontologies, alongside 'of' and 'and'.
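The naming convention for sensu terms, including the comma-separated form for more than one taxonomic group, can be sketched as a small parser. The regular expression and the `parse_sensu` function are assumptions written for this illustration; they cover only the parenthesised form shown in the examples above.

```python
import re

# Matches "<base name> (sensu X)" or "<base name> (sensu X, sensu Y)".
_QUALIFIER = re.compile(r"^(?P<base>.+?)\s*\((?P<sensu>sensu [^()]+)\)$")


def parse_sensu(term_name):
    """Return (base name, list of taxa); the list is empty without a sensu qualifier."""
    name = term_name.strip()
    match = _QUALIFIER.match(name)
    if not match:
        return name, []
    taxa = []
    for part in match.group("sensu").split(","):
        part = part.strip()
        if part.startswith("sensu "):
            part = part[len("sensu "):]
        taxa.append(part.strip())
    return match.group("base"), taxa


single = parse_sensu("embryonic development (sensu Magnoliophyta)")
double = parse_sensu("female meiotic spindle assembly (sensu Drosophila, sensu Mus)")
plain = parse_sensu("embryonic development")
```

Grouping terms by the returned base name would be one way to spot homonym sets, since the rule above allows sensu terms only where the same text string carries different meanings.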
Should GO include behavior terms or are there too few that are proven to be directly affected by gene activity? Peter Midford, in Arizona, is already working on behaviour ontologies for loggerhead turtles and jumping spiders, and we feel this level of detail seems to be beyond the scope of GO. However, there still needs to be some descriptive capability for behaviour within GO, both for Drosophila and maybe for mouse, to be able to annotate certain genes, since both species have genes known to directly affect behavior. The essential questions relate to what should be included in Process. It is clear in Drosophila that one can pin certain genes to behaviours like walking or circadian rhythms because these are hard-wired. Conversely, there is a need for an auxiliary ontology developed specifically to deal with behaviors in mouse, since much knowledge in this area is not tied directly to specific gene activity.
Conclusion - we do want behaviour in GO, but there may be other ontologies, for groups like mouse, that will extend these. In these cases we'll recommend that these auxiliary ontologies be consistent with GO and include any necessary cross-references to GO terms. To support this the GO terms should be at a level that can be used for many organisms for behaviours that have a genetically defined component.