GO textbook approach for annotation

From GO Wiki

In the renewal we are going to take a much more focused, topic-oriented approach. We will enumerate the areas of biology, like the chapters and sections in a biology textbook, and proceed through them progressively (and if possible in parallel). The different areas will be our targets for ontology development. Moreover, we will ensure that gene product annotation is closely coupled to ontology development. (The following section originally drafted by E. Dimmer). Any selected textbook process should be broken down into a list of subprocesses, each with a manageable target list that we will aim to annotate completely within a few months. Smaller, well-focused target sets will ensure that curators are annotating the same pathway at the same time. This will also simplify measuring our progress.

Metabolic processes

Some biological processes are carried out in all or most organisms, with a central biochemical pathway that differs relatively little between species. For these processes it is important to involve domain experts to produce a consistent annotation set for all species, in which any species-specific differences are fully highlighted. Such areas include:

  • glycolysis
  • anaerobic respiration
  • citric acid cycle
  • oxidative phosphorylation
  • pentose phosphate pathway
  • fatty acid oxidation
  • gluconeogenesis
  • porphyrin/heme synthesis
  • urea cycle
  • HMG-CoA reductase pathway
  • carbohydrate metabolism:
    • lactose synthesis
    • starch/amylopectin/glycogen synthesis
    • sucrose synthesis

The annotations in metabolic processes offered by the GO Consortium should be consistent with those found in pathway databases such as KEGG, BioCyc and UniProt UniPathway. We will take advantage of our existing collaborations with EBI resources and Reactome, and extend this approach to other resources to improve the GO and associated annotations. Doing this requires including representatives from these resources among the external experts we bring in. Such collaborations minimize our efforts, avoid duplication, and ensure that our annotations are as consistent as possible with these other resources. At the Montreal meeting we agreed that it was important that all such pathway databases get together and agree on a common definition of what a metabolic pathway involves: where it begins and ends. The software group would need to contribute to automate as much of this as possible. The ontology development group would work with them to bring as many of these external terms into the GO as possible. This would be done in collaboration with the annotation and reference genomes group, who would direct GOC curators to completely annotate such targets for GO, ensuring that the gene products which regulate these pathways differently in different organisms are included. Using existing external resources could help us move quickly through such metabolic 'book chapters', and GO’s manual annotation of the gene products which regulate such pathways would provide 'added value' for our users.

Cellular processes:

  • Molecule transport
  • Exocytosis
  • Endocytosis (pinocytosis/phagocytosis)
  • Homeostasis
  • DNA replication
  • Reproduction
    • mitosis
    • meiosis
  • Protein synthesis
  • DNA transcription
  • RNA translation
  • Cellular signaling
    • chemical signaling (local-chemical mediator, hormone, neurotransmitter)
    • receptor-mediated signaling

All of the above are subject areas relevant to all or most MODs in the GO. Again, some of the core mechanisms may differ only slightly between MODs, so it would make sense to have a group of curators with domain knowledge, working with outside experts and Reactome as appropriate, make a comprehensive annotation set that would be integrated across all species.

Taking mitosis as a possible annotation target: it is a central biological process whose genes are important for many research communities. For instance, targets have a high degree of disease relevance: “Tumor cell proliferation is frequently associated with genetic or epigenetic alterations in key regulators of the cell cycle. Most known oncogenes and tumor suppressors target entry into the cell cycle and control the G1/S transition.” (PMID: 17259655 e.g. http://carcin.oxfordjournals.org/cgi/content-nw/full/28/5/899/TBL1). However, there are interesting differences in the mechanisms used by different species, so the comprehensive annotation of mitosis may nicely capture the similarities and differences that exist between organisms (for instance, animals undergo an "open" mitosis, where the nuclear envelope breaks down before the chromosomes separate, while fungi such as S. cerevisiae undergo a "closed" mitosis, where chromosomes divide within an intact nucleus). It is also quite a challenging area, where ontology development is needed. It is likely that any target set could also be broken down into a range of subprocesses, for example in mitosis:

  • DNA replication
  • microtubule dynamics
  • centrosome/spindle formation
  • mitotic chromosome condensation
  • cytokinesis

For most of these topics we will already have some MOD expertise, such as SGD for the cell cycle. Curators could also consider subcellular structures that could additionally be fully curated from this target set, e.g. mitotic spindle, condensed chromosome, kinetochore.

Environmental Interaction and maintenance

  • abiotic stresses: response to cold/heat/oxidative/osmotic stress
  • response to nutrient deprivation
  • response to biotic stresses; pathogen attack (viral, bacterial, toxins from predators)
  • apoptosis, programmed cell death
  • aging
  • DNA damage repair
  • cell signaling

For example, oxidative stress is involved in numerous human diseases, including atherosclerosis, Parkinson's disease, heart failure, myocardial infarction, Alzheimer's disease, Fragile X syndrome[1] and chronic fatigue syndrome. In addition, oxidative stress will encompass some responses to pharmaceutical drugs.

Multi-cellular organism development, cell/organ differentiation

These will be addressed after the basic cellular processes have been covered. Many (most?) of these projects will be annotated through collaborative work between curators with specific MOD expertise, as structures can differ considerably between species. This work will provide an excellent opportunity for the ontology and cell ontology groups to ensure that the cell type and anatomy ontologies are used appropriately, and concurrently refined. We expect the work done in these subject areas to result in quite interesting papers for comparative biology/function, as orthologous genes in different species may produce very different organ structures. This is also an area where external expertise is essential, and which may need additional support from external grants (analogous to the current Kidney Research UK and British Heart Foundation grants), where dedicated curators from these grants use expertise from the Reference Genome group to improve their ability to confidently make an impact on complex subjects such as immunology, eye development, nervous system development, nutrition, embryogenesis?

Operations

The decision of which topics to commit to annotating will be taken carefully, with extensive planning and guidelines for establishing priorities. We estimate that completing a topic adequately takes approximately six months. We may be able to speed this up considerably by making the process more efficient, in particular by having concurrent annotation projects, but we should be conservative in our promises for what we will be able to annotate in a 5 year period. For instance, to annotate mitosis comprehensively we might need to annotate 250 genes, and this may take us up to 1 year to complete.

We could ask individual members of the GO consortium to make proposals (in areas where they have expertise) which would include an assessment of current status, the likely amount of work required for completion, specific lists of genes and subprocesses, and the scientific impact and benefits that could result. GO-top and managers would review these proposals carefully, looking at them with various criteria in mind, such as:

  1. The process is of importance to groups studying certain well-studied/funded human diseases (e.g. cancer, diabetes, asthma, neurological disorders)
  2. The process is one that GO could describe better than any other existing resource (i.e. we leverage the work done by existing metabolic pathway databases to rapidly improve GO annotation for the well-characterized enzymes, thereby giving us more time to spend on biological processes where GO is the only publicly-available option)
  3. The process has an active, enthusiastic and cohesive group of researchers. If a particular research community for a species has a very active focus on one particular type of biological process, then by deciding to annotate it we may be able to obtain more time from non-NIH funded curators. We benefit here from increased curator-power, and the involvement of external expertise that MODs may be able to draw into a particular topic.
  4. What combination of gene lists could be annotated that together would provide us with the most exploitable annotation set? That is, by combining the different gene sets, can we satisfy a large number of our users and better generate awareness of the value of GO? For instance, the mitosis set could be exploited to look at genes involved in the phases of cell division, at the different pathways involved in cancer, or at the composition of specific cellular components (e.g. mitotic spindle, kinetochore). Could we then combine this gene list with another to provide annotations to a different research group? For example, including annotations to biotic stress might enable us to look at the targets implicated in other diseases, or those involved in aging.
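
As a rough illustration of criterion 4, the sketch below shows one way a proposal could report how much annotation work two or more target lists share. It is only a sketch under stated assumptions: the target set names, the placeholder gene identifiers, and the helper functions `combined_workload` and `pairwise_overlap` are hypothetical and do not come from any GO resource.

```python
from itertools import combinations

# Hypothetical target lists. The gene identifiers are placeholders,
# not real annotations or real gene names.
target_sets = {
    "mitosis": {"gene_A", "gene_B", "gene_C", "gene_D"},
    "biotic_stress": {"gene_C", "gene_E"},
    "aging": {"gene_D", "gene_E", "gene_F"},
}


def combined_workload(names, sets=target_sets):
    """Return the distinct genes to annotate if the named target sets are combined."""
    genes = set()
    for name in names:
        genes |= sets[name]
    return genes


def pairwise_overlap(sets=target_sets):
    """Map each pair of target set names to the genes the two sets share."""
    return {
        (a, b): sets[a] & sets[b]
        for a, b in combinations(sorted(sets), 2)
    }


if __name__ == "__main__":
    combo = ("mitosis", "biotic_stress")
    print(len(combined_workload(combo)), "distinct genes for", combo)
    for pair, shared in pairwise_overlap().items():
        print(pair, sorted(shared))
```

Genes shared between two lists only need to be annotated once, so a large overlap suggests that a second user community could be served at little extra cost.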

The objective is to involve as many other projects and individuals as we can for each annotation topic. Users rarely use our GO annotations or the ontologies in isolation, so by synergising with other efforts being carried out in research, text-mining, analysis tool development or complementary ontology projects, we could ensure our annotation efforts are well targeted, authoritative and fully connected to our users' needs.