Noctua

From GO Wiki

Revision as of 16:04, 18 July 2018

This documentation is still under construction as of June 2018. Feedback is welcome; please email Kimberly with your suggestions.

Introduction

What is Noctua?

Noctua is a web-based, collaborative GO annotation editor. While creation of simple GO annotations is supported, Noctua was designed to enable connecting GO annotations, thus enriching the expressivity of the annotations and presenting a more complete picture of biology. Models produced with Noctua are called GO-CAM models. The overall goal is for each model to represent a unit that corresponds to a biological pathway. This document describes how to make GO-CAM models using Noctua.

What is a simple GO annotation?

A simple GO annotation is a gene product associated with a GO term, using an evidence code and a supporting reference (a primary research article, for example). The GO term may come from any of the three aspects of the GO: Molecular Function (MF), Biological Process (BP), or Cellular Component (CC). Gene products can correspond to proteins, complexes, or non-coding RNAs, and must be represented by a stable identifier. Gene identifiers may serve as proxies for one or more gene products.
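To make the four required parts concrete, a simple annotation can be pictured as a flat record. The sketch below is a hypothetical Python data structure: the class and field names are ours, not Noctua's, and the field values are made up for illustration and do not describe a real curated annotation.

```python
from dataclasses import dataclass

# The three GO aspects a simple annotation may use.
ASPECTS = {"MF", "BP", "CC"}


@dataclass(frozen=True)
class SimpleAnnotation:
    """Hypothetical record for a simple GO annotation (not a Noctua API)."""
    gene_product: str  # stable identifier, e.g. "UniProtKB:P56704"
    go_term: str       # GO identifier, e.g. "GO:0005102"
    aspect: str        # one of MF, BP, CC
    evidence: str      # evidence code, e.g. "IDA"
    reference: str     # e.g. "PMID:29802214"

    def __post_init__(self):
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect}")
        # Identifiers are assumed to be CURIE-style "PREFIX:LOCAL_ID" strings.
        for value in (self.gene_product, self.go_term, self.reference):
            prefix, sep, local = value.partition(":")
            if not (prefix and sep and local):
                raise ValueError(f"not a PREFIX:ID identifier: {value}")
        if not self.go_term.startswith("GO:"):
            raise ValueError("the GO term must be a GO identifier")


# Illustrative values only (GO:0005102 is "signaling receptor binding").
example = SimpleAnnotation(
    gene_product="UniProtKB:P56704",
    go_term="GO:0005102",
    aspect="MF",
    evidence="IDA",
    reference="PMID:29802214",
)
```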

What is a GO-CAM model?

A GO-CAM model is a combination of simple GO annotations that produces a network of annotations (a "model"). At a minimum, a model connects two simple annotations. The primary unit of biological modeling, or annotation, in GO-CAM is a molecular activity, e.g. protein kinase activity, of a specific gene product or complex. A molecular activity is an activity carried out at the molecular level by a gene product; it is specified by a term from the GO MF ontology. GO-CAM models are thus networks of GO MF annotations, each enriched with the context in which that function occurs. All connections in a GO-CAM model, e.g. between a gene product and an activity, between two activities, or between an activity and additional contextual information, are made using clearly defined semantic relations from the Relations Ontology.

Providing context for molecular activities

One major difference between simple annotations and GO-CAM models is that the former lack explicit relations between the gene product being annotated and the GO term. In GO-CAM models, using defined, semantic relations allows us to capture how a gene product’s molecular function relates to other aspects of GO. GO-CAM explicitly defines the relationships between: 1) different aspects (MF, BP, CC) of each gene product as defined in GO, 2) the combined functions of different gene products (“pathways”), and 3) different systems of interacting functions (“modules”).

  • a molecular activity may act upon another “target” molecule; this can be specified using a gene product identifier (for a protein or a gene) or a term from the ChEBI ontology (for a small molecule). In this case the MF is qualified with "has input [target_id]".
  • a molecular activity occurs in a location: this includes cellular structures, described by a GO CC class (i.e. term) outside the “macromolecular complex” branch, which can be further nested within larger structures using the appropriate cell and anatomy ontologies.
  • a molecular activity is part of (i.e. helps to accomplish) a biological process, i.e. a biological program that also includes other molecular activities, which is described by a GO BP class (i.e. term). In turn, a biological process can be nested inside an even larger biological process.
  • if the molecular activity occurs during a particular biological phase (e.g. a particular stage in organism development), this can be specified using a term from an appropriate ontology, i.e. any descendant of the term “GO:0044848 biological phase”.
  • a molecular activity may also have an upstream causal role with respect to a process ("acts upstream of"), or its precise relationship to a process may not yet be known, in which case the relation is "acts upstream of or within". Each of these relations may be further qualified with a positive or negative effect, e.g. "acts upstream of, positive effect".
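One way to picture the contextual relations listed above is as subject-relation-object triples hanging off a single activity node. The sketch below is illustrative only: the helper name `activity_unit` and the plain-string node labels are hypothetical, and relations are written as Relations Ontology labels rather than identifiers.

```python
from typing import Optional

Triple = tuple[str, str, str]


def activity_unit(
    activity: str,
    enabled_by: str,
    has_input: Optional[str] = None,
    occurs_in: Optional[str] = None,
    part_of: Optional[str] = None,
    happens_during: Optional[str] = None,
) -> list[Triple]:
    """Return (subject, relation, object) triples for one activity and its context."""
    triples = [(activity, "enabled by", enabled_by)]
    # Each optional piece of context becomes one edge from the activity node.
    for relation, target in (
        ("has input", has_input),
        ("occurs in", occurs_in),
        ("part of", part_of),
        ("happens during", happens_during),
    ):
        if target is not None:
            triples.append((activity, relation, target))
    return triples


unit = activity_unit(
    activity="protein kinase activity",
    enabled_by="gene product X",   # hypothetical gene product
    has_input="gene product Y",    # hypothetical target protein
    occurs_in="cytosol",
    part_of="signal transduction",
)
```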

Linking different molecular activities

Once the GO-CAM unit has been created (MF+BP+CC), these different units can be linked to each other to represent a causal activity model. The most common relations are directly (positively/negatively) regulates and provides input for, but there are other relations of greater and lesser specificity, depending on what is known. Regulates should be used to denote biological control of a downstream activity. Provides input for should be used when there is no control, but an upstream function creates a molecular entity that is the target of the downstream function, such as in a metabolic pathway.
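The guidance on choosing between the two most common linking relations can be read as a small decision rule. This is a hedged sketch of the paragraph above, not logic taken from Noctua; the function name and boolean parameters are hypothetical, and the fallback to the more general "causally upstream of" relation follows the section on indirect mechanisms.

```python
def link_relation(controls_downstream: bool, positive: bool = True,
                  output_is_downstream_input: bool = False) -> str:
    """Suggest a relation label for an upstream -> downstream activity link."""
    if controls_downstream:
        # Biological control of the downstream activity.
        return ("directly positively regulates" if positive
                else "directly negatively regulates")
    if output_is_downstream_input:
        # No control, but the upstream product is the downstream target,
        # e.g. consecutive steps of a metabolic pathway.
        return "provides input for"
    # Neither case applies: fall back to a less specific causal relation.
    return "causally upstream of"
```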

Incomplete GO-CAMs

We recognize that the knowledge of biology is incomplete; in cases where some or most of these aspects are unknown, a model may still be constructed with details added as more information becomes available. Users should attempt to specify functions as fully as possible, but partial models are expected and still contribute to the GO knowledgebase.

System Requirements

  • A web browser; Chrome is recommended.

Account Setup

Fill out the online new user form and contact sjcarbon at lbl dot gov once complete. Propagating the metadata information may take a little time, so please do this as early as possible.

To fill out this form, you need to have three things:

If you don't already have a GitHub or ORCID account, please obtain these before continuing (note that in exceptional circumstances, it is possible to use Noctua without these).

Entities and Ontologies for Annotation

Genes and Gene Products

Gene Ontology

Contextual Ontologies

Cell and Anatomy Ontologies

Chemical Ontologies

Life Stage Ontologies

Sequence Ontologies

Using Noctua

Noctua URL

Login

  • GO-CAM models may be viewed without logging in to Noctua, but if you wish to create new annotations or make edits to a model, you must be logged in.
  • To log in, click on the Login button in the upper right corner, and on the resulting page click on "Sign in with Github." When you are signed in, press the "Return" button to return to the Noctua landing page.

Filtering and Searching Models

  • The existing models list can be filtered using the search box just above the list of available models.
  • Currently, models may be searched by title or by the ORCID iD of anyone who has contributed to the model.
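Conceptually, the search box matches a query against each model's title or its contributors' ORCID iDs. The sketch below mimics that behaviour over an in-memory list; the `ModelSummary` record, the case-insensitive substring matching, and the sample data are assumptions for illustration, not a description of how Noctua's search is implemented. The ORCID iD used is ORCID's public example iD.

```python
from dataclasses import dataclass, field


@dataclass
class ModelSummary:
    title: str
    contributors: list[str] = field(default_factory=list)  # ORCID iDs


def filter_models(models: list[ModelSummary], query: str) -> list[ModelSummary]:
    """Keep models whose title or any contributor ORCID iD contains the query."""
    q = query.strip().lower()
    if not q:
        return list(models)  # an empty query keeps every model
    return [
        m for m in models
        if q in m.title.lower() or any(q in c.lower() for c in m.contributors)
    ]


models = [
    ModelSummary("Wnt signaling (mouse)", ["0000-0002-1825-0097"]),
    ModelSummary("Glycolysis"),
]
```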

Navigating the Noctua landing page

    Tip: You must be logged in to see the two options for starting a new model.
  • For starting a new GO-CAM model, you can begin entering annotations in two ways (Figure 1):
    Figure 1. Starting a new model
    • Create new model in Form
      • The Form editor provides three task-specific forms for entering annotations:
        • Default - use this form to make a statement about a gene product's activity when that activity is an integral 'part of' a biological process and that activity 'occurs in' a specific cellular component.
        • CC only - use this form to make cellular component annotations only for a gene product when you do not want to make a statement about the gene product's activity in those locations.
        • BP only - use this form to relate a gene product to a biological process when its activity is not an integral part of that process, e.g. the gene product affects a biological process but the underlying mechanism for its action is not known.
    • Create new model in Graph Editor
      • The Graph Editor allows curators to make any type of annotation, but does not provide task-specific 'templates' and does not have all of the annotation shortcut and search capabilities of the Form editor.
      • The Graph Editor is used to link individual annotations entered via the Form interface to create more connected, or complex, GO-CAM models.

Navigating between editors

  • While working on a GO-CAM model, curators may wish to go back and forth between the Form editor and the Graph Editor; this is done through each editor's menus.
  • From the Form editor, select View > Graph editor to enter the Graph Editor (Figure 2). The Graph Editor will open in a new browser window.
  • From the Graph Editor, select Workbench > Noctua form (Figure 3). The Form editor will open in a new browser window.
Figure 2. Navigating from the Form to the Graph Editor
Figure 3. Navigating from the Graph Editor to the Form

Editing an existing model

Note: This section will need updating as the new Form for annotation review and editing comes on board.

  • From the Noctua homepage, click on the blue "Edit" button in the rightmost column of the model table (Figure 1).
  • This will take you to the Graph Editor view of the model, where you may make changes to your annotations.
  • More information about how to edit an existing model can be found in the section on Editing a model below.
 Curation note: Wherever possible, curators should try to build on an existing GO-CAM model, rather than create separate models for each paper curated.  This will allow the set of GO-CAM models to provide the most accurate, up-to-date view of a given Biological Process.

Creating a new model using the Noctua form

  • To start a new model using the Noctua form, click on the 'Create new model in Form' link found on the Noctua homepage.
  • This will take you to the annotation Form where you can add model metadata and create GO annotations (Figure 4).
Figure 4. Noctua form.

Step 1. Adding model metadata

  • In addition to the GO annotations that you'll make, each GO-CAM model has specific metadata associated with it, e.g. a title, production state, and curator group.
  • Model metadata is added at the top of the form. From left to right, add the following metadata to your model:

1.a Enter a title for your model

Curation note: Currently, curators can add any text to create a meaningful title to their model, e.g. species, biological process, PMID, etc.  In the future, however, we may converge upon minimal standards for model naming.

1.b Select a model state

  • By default, all models begin in a "Development" state.
  • When ready, models may be moved to a "Production" state for publication by clicking on the grey downward-facing arrowhead, and selecting "Production" from the dropdown list.
  • Development models are not published on geneontology.org, and "conventional" annotations derived from them are NOT included in the derived GPAD file.

1.c If needed, change your annotation group

  • Some GO curators perform annotation for more than one group, e.g. if they are funded by more than one project.
  • Annotation groups are associated with curators in the users metadata file described above.
  • By default, the first group associated with your entry in the metadata file is the group listed in the form.
  • If you belong to multiple groups, you can select the appropriate group for your current work by clicking on the grey downward-facing arrowhead and selecting the appropriate group from the dropdown list.
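The metadata defaults described in Step 1 can be sketched as a small record. The `ModelMetadata` class, its field names, and the group names are hypothetical; the sketch encodes only two rules stated above: every new model starts in the Development state, and the default group is the first one listed for the curator.

```python
from dataclasses import dataclass
from typing import Optional

STATES = ("development", "production")


@dataclass
class ModelMetadata:
    title: str
    curator_groups: list[str]       # groups from the users metadata file, in order
    state: str = "development"      # every new model starts here
    group: Optional[str] = None

    def __post_init__(self):
        if not self.curator_groups:
            raise ValueError("curator must belong to at least one group")
        if self.group is None:
            # Default to the first group associated with the curator.
            self.group = self.curator_groups[0]
        elif self.group not in self.curator_groups:
            raise ValueError(f"curator is not a member of {self.group}")
        if self.state not in STATES:
            raise ValueError(f"unknown state: {self.state}")


# Hypothetical group names, for illustration only.
meta = ModelMetadata("Wnt3a signaling (mouse)", ["GroupA", "GroupB"])
```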

Step 2. Selecting a curation template

  • Currently, there are three different curation templates available in the Form: Default, BP only, and CC only.
  • The Default template is used to create annotations when the curator is confident that the molecular activity (Molecular Function, MF) of the gene product is an integral part of a Biological Process (BP) and that the Cellular Component (CC) is the location in which the activity occurs.
  • The Default template can also be used to annotate combinations of MF, BP, and CC, but the curator must always enter an MF annotation and understand that they are making explicit statements about the relationship between that activity and the BP and CC, i.e. the MF is an integral 'part of' the BP or the MF 'occurs in' the CC. If you're not sure that this is the statement you want to make, you can use the BP only or CC only forms to make your annotations.
  • To help guide curators in understanding the statement that they're making, relations between ontology terms are shown in the field where that term is added.
  • The CC only template is used to make cellular component annotations for a gene product specifically when you do NOT want to make a statement about the gene product's activity in those locations.
  • The BP only template is used to make biological process annotations when you wish to relate a gene product to a biological process but its activity is not an integral 'part of' that process, e.g. the gene product affects a biological process but the underlying mechanism for its action is not known.
Curation note: When making annotations in the form, try to fill in as many fields as possible: type in each field, then select from the autocomplete suggestions by clicking on your choice.
Tip: In the autocomplete, enter a space after a complete word, to narrow down the choices.
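The template guidance in Step 2 amounts to a short decision procedure. The sketch below is one hedged reading of it; the function name and its inputs are hypothetical, and the returned strings are simply the template names shown in the Form.

```python
def choose_template(statement_about_activity: bool, aspect: str) -> str:
    """Pick a Form template.

    statement_about_activity: True if you are asserting that the gene product's
        activity (MF) is part of the BP and/or occurs in the CC.
    aspect: the aspect you are mainly annotating: "MF", "BP" or "CC".
    """
    if aspect == "MF" or statement_about_activity:
        return "Default"   # an MF must always be entered with this template
    if aspect == "CC":
        return "CC only"   # location only, no claim about activity there
    if aspect == "BP":
        return "BP only"   # e.g. the mechanism of the effect is not known
    raise ValueError(f"unknown aspect: {aspect}")
```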

Step 3. Creating annotations using the Default template

3.a. Enter gene product or macromolecular complex to be annotated

By default, the form allows you to enter a single gene product. Start typing; when choices appear, select the gene product.

Tip:
You can type in the gene symbol, e.g. Wnt3a, or the unique identifier or accession, e.g. UniProtKB:P56704. If necessary to narrow down the choices, type a space after the symbol and enter the three-letter code for the species (the first letter of the genus and the first two letters of the species name, e.g. mmu for Mus musculus). Each entry in the autocomplete also shows the associated unique database identifier or accession, so curators can confirm that they are selecting the appropriate entity for annotation.
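The species-code rule from the tip above can be written as a one-line function. This is a minimal sketch with hypothetical function names; it encodes only the stated naming pattern, and we have not checked whether every organism offered in Noctua follows it.

```python
def species_code(scientific_name: str) -> str:
    """Genus initial plus the first two letters of the species epithet."""
    parts = scientific_name.split()
    if len(parts) < 2:
        raise ValueError("expected 'Genus species'")
    genus, species = parts[0], parts[1]
    return (genus[0] + species[:2]).lower()


def autocomplete_query(symbol: str, scientific_name: str) -> str:
    # The symbol, a space, then the species code, as typed into the gene product field.
    return f"{symbol} {species_code(scientific_name)}"
```

For example, `autocomplete_query("Wnt3a", "Mus musculus")` gives the text `Wnt3a mmu`.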

3.b. Enter the molecular function, evidence, and reference

  • For the default version of the form, these three fields are required.
  • If the Molecular Function is known, enter the appropriate GO term, evidence code, and reference (you can add multiple pieces of evidence by clicking on the "..." button to the right of the fields, and selecting "More evidence").
    • Alternatively, you can select an existing annotation+evidence from the GO annotation database, by clicking on the "..." button to the right of the fields and choosing "Search database". With this option, you can select multiple annotations to the same term, which will add evidence from all selected annotations.
 Curation note: Wherever possible, curators should use a PMID as a reference, e.g. PMID:29802214. If a PMID is not available, curators may use, in order of preference, a DOI or an internal database paper identifier. Curators may also use a reference from the GO Reference Collection.
  • If the Molecular Function is unknown, but you are also making a Biological Process annotation, enter "molecular_function" and the same evidence and reference as the Biological Process annotation.
  • If the Molecular Function is unknown, and there is no evidence for what the Molecular Function might be, enter "molecular_function" and the ND evidence code.
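The rules above for filling in the Molecular Function fields can be expressed as a small helper. This is a sketch under assumptions: the `Evidence` tuple and function name are hypothetical, the root term is entered by its label `molecular_function` (GO:0003674), and the reference for an ND annotation is left as a placeholder because this section does not say which one to use.

```python
from typing import NamedTuple, Optional


class Evidence(NamedTuple):
    code: str        # evidence code, e.g. "IMP" or "ND"
    reference: str   # e.g. "PMID:29802214"


ROOT_MF = "molecular_function"  # GO:0003674


def mf_entry(mf_term: Optional[str], mf_evidence: Optional[Evidence],
             bp_evidence: Optional[Evidence]) -> tuple[str, Evidence]:
    """Return the (term, evidence) pair to enter in the Molecular Function fields."""
    if mf_term is not None:
        if mf_evidence is None:
            raise ValueError("a known MF needs its own evidence and reference")
        return mf_term, mf_evidence
    if bp_evidence is not None:
        # MF unknown, but a BP annotation is being made: reuse its evidence.
        return ROOT_MF, bp_evidence
    # MF unknown and no evidence for it: use the ND evidence code.
    return ROOT_MF, Evidence("ND", "<reference placeholder>")
```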

3.c. Enter other fields (optional)

  • For Molecular Function, the following "extensions" can optionally be added:
    • has_input(molecule): fill in the "has input" field, evidence, and reference.
    • happens_during(biological phase): fill in the "happens during" field, evidence, and reference.
  • In addition, curators can add annotations to:
    • Biological Process (the form assumes a part_of relation between the Molecular Function and Biological Process)
      • Existing annotations+evidence can be imported as described in 3.b. above for Molecular Function
      • Additional BP part_of "extensions" can be made to provide contextual information to the BP term.
    • Cellular Component (the form assumes an occurs_in relation between the Molecular Function and Cellular Component)
      • Existing annotations+evidence can be imported as described in 3.b. above for Molecular Function
      • Additional part_of "extensions" can be made to provide contextual information about cell and/or tissue type.
We recommend that you fill in as many fields as possible before creating the activity, because once it is created you will need to edit it from the graph canvas, which takes more steps.
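Each field of the Default form implies a fixed relation, as described in 3.b and 3.c. The mapping below is a hypothetical sketch of those implied relations (enabled_by for the gene product, part_of for BP, occurs_in for CC, part_of for the extra BP and CC context). The dictionary keys are invented field names, not Noctua's internal ones, and the example term labels are illustrative only.

```python
# Which node each optional field attaches to, and with which relation (hypothetical keys).
FIELD_RELATIONS = {
    "has_input": ("mf", "has_input"),
    "happens_during": ("mf", "happens_during"),
    "bp": ("mf", "part_of"),
    "cc": ("mf", "occurs_in"),
    "bp_part_of": ("bp", "part_of"),
    "cc_part_of": ("cc", "part_of"),   # cell and/or tissue type
}


def default_form_triples(form: dict[str, str]) -> list[tuple[str, str, str]]:
    """Turn filled-in Default form fields into (subject, relation, object) triples."""
    if not form.get("gene_product") or not form.get("mf"):
        raise ValueError("gene product and molecular function are required")
    triples = [(form["mf"], "enabled_by", form["gene_product"])]
    for key, (subject_key, relation) in FIELD_RELATIONS.items():
        value = form.get(key)
        subject = form.get(subject_key)
        if value and subject:
            triples.append((subject, relation, value))
    return triples


form = {
    "gene_product": "UniProtKB:P56704",
    "mf": "signaling receptor binding",
    "bp": "canonical Wnt signaling pathway",
    "cc": "extracellular space",
}
```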

Step 4. Add the new activity to a model

Press the CREATE button. A new activity will appear on the graph canvas (the main window).

Tips:
1. Each new activity will appear on the same part of the canvas, so if you add more than one activity you will need to move them around on the canvas (by clicking and dragging) to see the ones underneath.
2. If the CREATE button is grayed out, some information that you still need to fill in is missing from the form. You can press the "why is the save button disabled?" button for a list of missing fields.
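Tip 2 can be mirrored by a simple required-field check. The field names below are hypothetical stand-ins for the Default form's required inputs (gene product, plus the MF term, evidence, and reference listed in 3.b); the list that Noctua itself reports may differ.

```python
REQUIRED_DEFAULT_FIELDS = ("gene_product", "mf", "mf_evidence", "mf_reference")


def missing_fields(form: dict[str, str]) -> list[str]:
    """List required fields that are absent or blank, in form order."""
    return [name for name in REQUIRED_DEFAULT_FIELDS
            if not form.get(name, "").strip()]


def can_create(form: dict[str, str]) -> bool:
    # CREATE is only enabled when nothing required is missing.
    return not missing_fields(form)
```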

Specifying the causal ordering of the activities

Once you have created at least two activities, you can specify the causal relations between them. This is done on the graph canvas, by dragging from the blue circle of the upstream activity box onto the downstream activity box (Fig. 3). You can then select the relation. A “direct” relation means that there is a physical interaction mediating the effect on the downstream activity.
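Once causal edges are drawn, reading the activities from upstream to downstream amounts to a topological ordering of the graph. The sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the activity labels are illustrative, not from a real model. A feedback loop would make `static_order()` raise `graphlib.CycleError`; we do not know whether Noctua itself checks for cycles, so treat this only as a reading aid.

```python
from graphlib import TopologicalSorter

# (upstream activity, relation, downstream activity); illustrative labels only.
edges = [
    ("receptor activity", "directly positively regulates", "kinase activity"),
    ("kinase activity", "directly positively regulates", "transcription factor activity"),
]

# TopologicalSorter expects a mapping of node -> set of predecessor (upstream) nodes.
graph: dict[str, set[str]] = {}
for upstream, _relation, downstream in edges:
    graph.setdefault(downstream, set()).add(upstream)
    graph.setdefault(upstream, set())

order = list(TopologicalSorter(graph).static_order())
```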

Fig. 3 Making causal relations between activities.

Choosing the right causal relation between activities (MFs)

Direct Relations

Activities mediated by small molecule concentration

Small molecules can be substrates (inputs) of activities, products of activities (outputs), or modulators of activities (regulators). In these cases, GO-CAM models make explicit nodes representing small molecule concentrations. To add a small molecule to a model, use the "Add Individual" item on the left of the graph canvas. These nodes should use ChEBI identifiers.

  • a small molecule in a metabolic pathway: in this case, connect the upstream activity (e.g. hexokinase activity) to its output (glucose-6-phosphate) using the has_output relation. Then connect the small molecule to the downstream activity (e.g. phosphoglucose isomerase activity) using the has_input relation.
  • regulation via a small molecule intermediate: in this case the downstream activity must be a compound function, i.e. you will need to create TWO DISTINCT activities for the same gene product. The first activity must be X binding, where X is the small molecule. The second activity is the regulated activity. Connect the upstream activity to the small molecule using has_output, and the small molecule to the X binding activity using has_input. Then connect the first activity of the compound activity to the second one using a directly positively regulates or directly negatively regulates relation.
    • ADCYA1 creates cAMP, which is an input to the cAMP binding function of PKCR1. The cAMP binding function of PKCR1 then directly negatively regulates the protein kinase inhibitor activity of PKCR1.
    • ADCHE1 breaks down acetylcholine, which directly binds to ACHR1 (acetylcholine binding) and activates its GPCR activity.
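The two small-molecule patterns above can be written out as the edges a curator would draw. The sketch reuses the hexokinase and ADCYA1/PKCR1 examples from the text; node names are plain labels rather than identifiers (the "adenylate cyclase activity" label for ADCYA1 is our assumption), and both helper functions are hypothetical. Triples are written in subject-relation-object order, so "the small molecule is the input of the downstream activity" becomes `(downstream, "has_input", molecule)`.

```python
Edge = tuple[str, str, str]


def metabolic_step_edges(upstream: str, molecule: str, downstream: str) -> list[Edge]:
    # Upstream activity produces the molecule; the downstream activity consumes it.
    return [
        (upstream, "has_output", molecule),
        (downstream, "has_input", molecule),
    ]


def compound_function_edges(upstream: str, molecule: str, binding_activity: str,
                            regulated_activity: str, positive: bool) -> list[Edge]:
    # The regulated gene product gets two activities: "X binding" and the regulated one.
    relation = ("directly positively regulates" if positive
                else "directly negatively regulates")
    return [
        (upstream, "has_output", molecule),
        (binding_activity, "has_input", molecule),
        (binding_activity, relation, regulated_activity),
    ]


glycolysis = metabolic_step_edges(
    "hexokinase activity", "glucose-6-phosphate", "phosphoglucose isomerase activity")
camp = compound_function_edges(
    "adenylate cyclase activity (ADCYA1)", "cAMP",
    "cAMP binding (PKCR1)", "protein kinase inhibitor activity (PKCR1)",
    positive=False)
```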

Activities mediated by an intervening biological process

  • Similarly to mediation by small molecule concentration, the effects of some molecular activities on other activities are not strictly direct, but are mediated by a biological process. Key examples are transcriptional regulation, regulation by ubiquitination and degradation, and regulation via membrane depolarization.
  • In these cases, link the upstream activity (e.g. RNA polymerase II transcription factor activity, sequence-specific DNA binding (GO:0000981)) to the mediating process (e.g. positive regulation of transcription by RNA polymerase II (GO:0045944)) with part of, and link the mediating process to the downstream activity (the activity of the transcribed gene product) with causally upstream of, positive effect.
  • If the transcription factor activity instead negatively regulated transcription, the equivalent model would use the appropriate GO Biological Process term and the "causally upstream of, negative effect" relation between the transcription BP term and the downstream target.
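The transcriptional-regulation pattern can likewise be spelled out as two edges. The GO identifiers are the ones cited above; the helper function is hypothetical and the downstream activity is a placeholder label.

```python
def mediated_by_process_edges(upstream_activity: str, mediating_process: str,
                              downstream_activity: str, positive: bool = True):
    """Edges for an effect that is mediated by an intervening biological process."""
    effect = "positive effect" if positive else "negative effect"
    return [
        (upstream_activity, "part of", mediating_process),
        (mediating_process, f"causally upstream of, {effect}", downstream_activity),
    ]


tf_edges = mediated_by_process_edges(
    "GO:0000981",  # RNA pol II transcription factor activity, sequence-specific DNA binding
    "GO:0045944",  # positive regulation of transcription by RNA polymerase II
    "activity of the transcribed gene product",  # placeholder
)
```

For the negative case, pass the appropriate negative-regulation BP term and `positive=False`.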

Indirect and unknown causal mechanisms

  • If the mechanism of the causal relation is not known, use the more general causally upstream of relations (these can include a positive/negative effect, if known).
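The four cases in this section (direct, small molecule, intervening process, unknown) can be summarised as a lookup from mechanism to the relations to draw. This is a hedged summary of the guidance above, not Noctua logic; the `Mechanism` enum and `causal_pattern` function are hypothetical. Because this section only describes direct regulation with a known positive or negative effect, the sketch requires a sign in that case.

```python
from enum import Enum
from typing import Optional


class Mechanism(Enum):
    DIRECT = "direct physical interaction"
    SMALL_MOLECULE = "small molecule intermediate"
    PROCESS = "intervening biological process"
    UNKNOWN = "unknown or indirect"


def upstream_of(sign: Optional[str]) -> str:
    # The general causal relation, qualified by an effect only when one is known.
    return "causally upstream of" if sign is None else f"causally upstream of, {sign} effect"


def causal_pattern(mechanism: Mechanism, positive: Optional[bool] = None) -> list[str]:
    """Relations to draw, in order, for an upstream -> downstream causal link."""
    sign = None if positive is None else ("positive" if positive else "negative")
    if mechanism is Mechanism.DIRECT:
        if sign is None:
            raise ValueError("direct regulation needs a known positive/negative effect")
        return [f"directly {sign}ly regulates"]
    if mechanism is Mechanism.SMALL_MOLECULE:
        # upstream has_output molecule; downstream (or its binding subfunction) has_input molecule
        return ["has_output", "has_input"]
    if mechanism is Mechanism.PROCESS:
        # upstream activity is part of the process; the process is causally upstream
        return ["part of", upstream_of(sign)]
    return [upstream_of(sign)]
```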

Subfunctions: specifying more detail about molecular activities

Sometimes, molecular activities are composed of distinct subfunctions, and those subfunctions may even be carried out in distinct locations, or by distinct subunits of a complex. For example you may want to specify “hormone binding” in the “cytosol” as a subfunction of a nuclear receptor, that then activates (directly positively regulates) “transcription factor activity” in the “nucleus”. To specify subfunctions, you will create new activities and link them to an activity that you have previously created that describes the overall function of the gene product (e.g. “nuclear receptor activity”). Subfunctions (e.g. “hormone binding”) can be created using the Noctua form, but do not fill in the biological process field as it is the same as for the overall function. Once the new activity is created, link it to the overall molecular function you created earlier, by dragging (on the graph canvas) from the subfunction activity (blue circle) to the overall activity, and selecting the “part of” relation. You will then need to add evidence by clicking on the "part of" edge; a box will pop up; fill in the evidence fields and press the "Add" button.
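The nuclear receptor example above can be written as a list of edges in which each subfunction is linked to the overall activity by part of. The helper and record are hypothetical, and the evidence values are placeholders. Evidence is a required argument because, as described above, the part of edge needs its own evidence.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    subject: str
    relation: str
    obj: str
    evidence: tuple[str, ...] = ()   # e.g. (evidence code, reference)


def add_subfunction(overall: str, subfunction: str, evidence: tuple[str, ...]) -> Edge:
    """Link a subfunction activity to the overall activity with 'part of'."""
    if not evidence:
        raise ValueError("the 'part of' edge needs its own evidence")
    return Edge(subfunction, "part of", overall, evidence)


placeholder = ("IDA", "PMID:placeholder")
nuclear_receptor = [
    add_subfunction("nuclear receptor activity", "hormone binding (cytosol)", placeholder),
    add_subfunction("nuclear receptor activity",
                    "transcription factor activity (nucleus)", placeholder),
    Edge("hormone binding (cytosol)", "directly positively regulates",
         "transcription factor activity (nucleus)", placeholder),
]
```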

Editing a model

Editing can currently be performed only on the graph canvas (the simple annoton editor form does not pick up any operations you have performed on the graph canvas).

Note that only one edit operation can be done at a time.  To change something on the canvas, you will need to first ADD the correct part, and then DELETE the incorrect part, as separate operations.  We recommend that you add first, so that you can transfer evidence from the incorrect part if necessary, by using the “clone other” operation.
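The add-first, delete-second workflow can be sketched as separate operations on an in-memory set of edges. This is a hypothetical model of the procedure, not Noctua's API; `clone_evidence` stands in for the "clone other" operation. Adding first matters because the old edge must still exist when its evidence is copied.

```python
Edge = tuple[str, str, str]


class Model:
    """Hypothetical stand-in for a model: each edge maps to its evidence list."""

    def __init__(self):
        self.evidence: dict[Edge, list[str]] = {}

    def add(self, edge: Edge, evidence=None) -> None:
        self.evidence[edge] = list(evidence or [])

    def delete(self, edge: Edge) -> None:
        del self.evidence[edge]

    def clone_evidence(self, source: Edge, target: Edge) -> None:
        # Copy evidence from the old (incorrect) edge onto the new one.
        self.evidence[target].extend(self.evidence[source])


def replace_edge(model: Model, old: Edge, new: Edge) -> None:
    model.add(new)                   # operation 1: add the correct part
    model.clone_evidence(old, new)   # carry evidence over while the old edge still exists
    model.delete(old)                # operation 2: delete the incorrect part
```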

Editing relations

Relations can be removed by dragging the end of the relation arrow away from the box it connects to, into an empty part of the canvas. Relations can be added by clicking on the blue circle inside the upstream box, and dragging to the downstream box. Evidence for a relation can be edited by clicking on the relation arrow.

Editing the type/label on a graph node

To edit a simple box on the graph (one without colored bars indicating that multiple parts are folded together for easy viewing), click on the green square. To change the term, first add the new term by filling in the field under “add type” and clicking add. Then reopen the box and delete the old term by clicking on the red “x” next to it.

Editing types/labels that are inside a graph node

  • To edit properties of an activity that are “folded” into the molecular activity box on the canvas, click on the green square in the corner of the box. Note that only one edit operation can be done at a time, so do not make more than one edit before pressing a button to save the edit. To change part of the annoton, you will need to first ADD the corrected part, and then DELETE the incorrect part, as separate operations.
  • To remove a property of the annoton, click the “x” next to it.
  • To edit the evidence, click on the “E” next to the part for which you want to edit evidence (e.g., the “E” next to enabled by is the evidence that the molecular function is enabled by the gene product).

Making "traditional" (single aspect) GO annotations using Noctua

Molecular function annotation

  • Use the "default" form
  • Fill in the gene product field
  • Fill in the molecular function field, including evidence
  • Optionally, the following "extensions" can be added:
    • has_input(molecule): fill in the "has input" field and evidence
    • happens_during(biological phase): fill in the "happens during" field and evidence
    • occurs_in(cellular component): fill in the "cellular component" field and evidence
    • part_of(biological_process): fill in the "biological process" field and evidence
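In conventional annotation files, extensions like these are written as `relation(ID)` pairs; the GAF 2.x annotation extension column uses this syntax, with commas joining conditions that all apply. The helper below is a hypothetical sketch that formats one such set; the identifiers are examples (GO:0005829 is cytosol), and Noctua's own export may order or encode extensions differently.

```python
def format_extensions(extensions: list[tuple[str, str]]) -> str:
    """Join (relation, id) pairs as 'rel(ID),rel(ID)', i.e. one set of conditions."""
    for relation, target in extensions:
        if not relation or ":" not in target:
            raise ValueError(f"bad extension: {relation}({target})")
    return ",".join(f"{relation}({target})" for relation, target in extensions)
```

For example, `format_extensions([("has_input", "UniProtKB:P56704"), ("occurs_in", "GO:0005829")])` yields `has_input(UniProtKB:P56704),occurs_in(GO:0005829)`.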

Cellular component annotation

This is for annotations of where a gene product has been observed (but is not known to be active). Note that these annotations have a different meaning than using the default form: the gene product has been observed in the CC, but may or may not be active there.

  • Use the "CC only" version of the form (select by clicking on the drop-down on the right that says "DEFAULT").
  • Fill in the gene product field.
  • Fill in the cellular component field with the desired GO term, and evidence.
  • Optionally, one or more of the following "extensions" can be added:
    • part_of (a larger cellular component)
    • part_of (cell type)
    • part_of (anatomy)

Biological process annotation

This is for annotations that assert a relationship to a BP other than part_of, e.g. for regulates or causally upstream of relations.

  • Use the "BP only" version of the form (select by clicking on the drop-down on the right that says "DEFAULT").
  • Fill in the gene product field.
  • Choose the relation between the gene product and the BP.
  • Fill in the biological process field with the desired GO term, and evidence.
  • Optionally, one or both of the following "extensions" can be added:
    • part_of (cell type)
    • part_of (anatomy)
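Because the BP only form is meant for relations other than part_of, a curator-side sanity check might look like the sketch below. The allowed set is an assumption assembled from the "acts upstream of" relations named earlier in this document; it is not the Form's actual dropdown list.

```python
# Gene product -> BP relations named in this document (assumed, not exhaustive).
BP_ONLY_RELATIONS = {
    "acts upstream of",
    "acts upstream of, positive effect",
    "acts upstream of, negative effect",
    "acts upstream of or within",
    "acts upstream of or within, positive effect",
    "acts upstream of or within, negative effect",
}


def check_bp_only(relation: str) -> str:
    """Reject part_of (use the Default form) and anything outside the assumed set."""
    if relation == "part_of":
        raise ValueError("use the Default form for part_of; BP only is for other relations")
    if relation not in BP_ONLY_RELATIONS:
        raise ValueError(f"unexpected relation: {relation}")
    return relation
```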

Naming your Model and Saving your Work Using the Graphical Editor

While you create or edit your model, you will see an asterisk appear around the "Untitled" text in your browser tab. The asterisk indicates that your work is not yet saved, and the "Untitled" indicates that you have not yet named your model. To name your model and save your work, click on the drop-down menu under the Model heading and select the "Edit Annotations" option. In the "Title" section, add a title for your model. The beginning of the title will now appear in the browser tab. To save your work, click on the Model heading again and select the "Save" option. Your work is now saved and the asterisk in the tab will disappear. Save your work often while editing!

Tip:
If your model already has a name, you will need to delete the name before you can rename it. Follow the instructions above, but press the Delete button next to the name instead.

How to Make a Model Public (Production)

  • By default, new models are considered under "development", meaning that curators may work on the model, but the model and any GO annotations derived from it are not available for public consumption.
  • This allows curators to work on a model over a period of time, perhaps review it with colleagues or experts in the field, and then publish it to the GO or other websites.
  • When ready, curators have the ability to explicitly change the production status of their model.
  • To do this in the Simple annoton editor, select "Production" from the State drop-down to the right of the model title.
  • To do this in the graphical editor, click on the Model drop-down menu and select "Edit annotations" from the list.
    • Under the "Annotation state" section, delete the "Development" status.
    • Return to the Model drop-down, select "Edit annotations" from the list and under "Annotation state" select "Production" from the drop-down list.
    • Production: the model will be available for viewing on the GO website, and its annotation files will be available for consumption.
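The effect of model state on published output can be sketched as a filter: only Production models contribute annotations to the derived GPAD file. The record shape and function names are hypothetical; the sketch only illustrates the Development/Production rule described in this document.

```python
from dataclasses import dataclass


@dataclass
class Model:
    title: str
    state: str                # "development" or "production"
    annotations: list[str]


def publishable_annotations(models: list[Model]) -> list[str]:
    """Collect annotations only from models in the production state."""
    return [a for m in models if m.state == "production" for a in m.annotations]


def promote(model: Model) -> None:
    # Development -> Production is the only transition modeled in this sketch.
    if model.state != "development":
        raise ValueError(f"cannot promote a model in state {model.state!r}")
    model.state = "production"
```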

Noctua Output Files

GPAD

OWL

Providing Feedback

  • Bug reports and requests for new features should be entered on the GO's Noctua issue tracker on GitHub.
  • Before entering a new ticket, please search the tracker to check whether the bug or feature request has already been reported.