Annotation Conf. Call 2015-11-10

From GO Wiki

Bluejeans URL: https://bluejeans.com/993661940

Agenda

Deprecated Annotation Extension Relations

Entity IDs in Annotation Extensions

  • There is varied entity ID usage in the annotation extensions.
  • We'd like to review what currently exists and reach agreement on what we should use going forward
  • IDs used should be compatible with the source group submitting the GAF
  • The goal is to enable seamless curation and query
  • Suzi: Will alt_ids for these entities be supplied in the GAF?
    • David H.: No, we don't currently supply synonyms in the GAF for entities used in Col. 16/AEs, but each group could supply all valid IDs in a gpad file
  • Paul T.: UniProt has mapping files and we should take advantage of this
    • Midori: Who is ultimately responsible for the mappings? The MODs? Not all groups maintain mappings for all possible IDs.
    • Suzi: With advice from the MODs, UniProt has agreed to provide mapping files
    • Kimberly, Midori: This is okay for MOD and UniProt IDs, but what about other sources?
  • Suzi: What other ID spaces are in use?
    • David H.: This is part of the purpose of this exercise - to see what ID space is being used.

Entity IDs in Annotation Extensions

Goals

For this meeting

To understand the scope of the project by discussing what types of objects annotators would like to use in annotation extensions.

Final Goals

  • To agree on a set of ID spaces that will be used in annotation extensions that provides both manual and computational consistency.
  • Ideally, a user would be able to query on ANY type of ID recognized by the GOC and get back seamless results across those IDs.
  • To use the results of this discussion for the creation of annotation documentation.

Initial Assumptions

  • We will not be able to mandate the types of IDs that are used by annotation groups, but we should be able to mandate that the IDs used are compatible/translatable to the ID spaces used by the group (MOD) that is primarily responsible for submission of annotations to the GOC.
  • Upon processing of submitted annotations, primary responsible groups (MODs) will translate IDs into the objects used for curation by that group. That group will then provide the translated (normalized) annotations to the GOC.
  • The GOC may then translate the submitted IDs for the purpose of data integration across species.
  • Before submission of annotations, the submitter and the primary responsible group (MOD) will work together to make sure that all ID spaces can be normalized. This should become an SOP for any new group wishing to submit annotations.
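The normalization step described in these assumptions can be sketched as a simple lookup. This is only an illustration: the mapping table, the placeholder IDs, the choice of UniProtKB as the preferred ID space, and the function names `normalize` and `check_submission` are all hypothetical and not part of any agreed GOC pipeline.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping table with placeholder IDs. A real table would come
# from the mapping files maintained by the responsible group.
ID_MAP: Dict[str, str] = {
    "Ensembl:EXAMPLE0001": "UniProtKB:EXAMPLE1",
    "HGNC:EXAMPLE42": "UniProtKB:EXAMPLE2",
}

# Assumed ID space used for curation by the primary responsible group.
PREFERRED_PREFIX = "UniProtKB"


def split_curie(curie: str) -> Tuple[str, str]:
    """Split 'PREFIX:local' on the first colon only (so 'MGI:MGI:1' works)."""
    prefix, _, local = curie.partition(":")
    if not prefix or not local:
        raise ValueError(f"not a prefixed ID: {curie!r}")
    return prefix, local


def normalize(curie: str) -> Optional[str]:
    """Return the ID in the preferred space, or None if no mapping is known."""
    prefix, _ = split_curie(curie)
    if prefix == PREFERRED_PREFIX:
        return curie
    return ID_MAP.get(curie)


def check_submission(curies: List[str]) -> List[str]:
    """List the IDs that cannot be normalized and need follow-up."""
    return [c for c in curies if normalize(c) is None]
```

Running `check_submission` before a group submits annotations would flag any extension ID that the submitter and the responsible group still need to reconcile.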

Types of IDs used in Annotation Extensions

  • IDs used to represent genes
    • MOD gene identifiers (MGI:MGI:, WB:, ZFIN:ZDB-GENE-, TAIR:locus: etc)
    • Generic UniProtKB IDs (UniProtKB:)
    • ENSEMBL gene IDs (Ensembl:)
    • NCBI gene IDs (NCBI_gene:)
    • RNAcentral IDs (RNAcentral:)
    • HGNC IDs (HGNC:)
  • IDs used to represent cell types
    • Cell Ontology IDs
    • WormBase anatomy and cell IDs
    • Plant ontology IDs
  • IDs used to represent chemicals
    • ChEBI
  • IDs used to represent gene products
    • Proteins/Proteoforms
      • Protein ontology IDs (PR:)
      • UniProt isoform-specific IDs (UniProtKB:######-#)
      • MOD gene identifiers
    • Transcripts
      • EMBL IDs
      • MOD gene identifiers
  • IDs used to represent protein domains
    • InterPro IDs
  • IDs used to represent biological processes
    • GO IDs
  • IDs used to represent molecular functions
    • GO IDs
  • IDs used to represent cellular components
    • GO IDs
  • IDs used to represent anatomical structures
    • EMAPA IDs
    • UBERON IDs
    • WormBase anatomy and cell IDs
    • Plant ontology IDs
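As a rough illustration of how the prefixes listed above could be used to decide what kind of entity an extension ID represents, the sketch below classifies an ID by its prefix. It covers only the prefixes spelled out in the list; the isoform pattern is an assumed reading of "UniProtKB:######-#", and the function name `entity_type` and the IDs in the usage note are placeholders.

```python
import re
from typing import Optional

# Prefixes listed above as representing genes (MOD, UniProtKB, and others).
GENE_PREFIXES = {
    "MGI", "WB", "ZFIN", "TAIR", "UniProtKB",
    "Ensembl", "NCBI_gene", "RNAcentral", "HGNC",
}

# Assumption: an isoform-specific ID is an accession followed by '-<number>'.
ISOFORM_RE = re.compile(r"^UniProtKB:[A-Za-z0-9]+-\d+$")


def entity_type(curie: str) -> Optional[str]:
    """Guess the entity type from the ID prefix; None if the prefix is unknown."""
    # Check the isoform pattern first, since it shares the UniProtKB prefix.
    if ISOFORM_RE.match(curie):
        return "protein isoform"
    prefix = curie.split(":", 1)[0]
    if prefix == "PR":
        return "protein"
    if prefix in GENE_PREFIXES:
        return "gene"
    return None
```

For example, `entity_type("UniProtKB:X00000-2")` returns "protein isoform", while `entity_type("UniProtKB:X00000")` returns "gene", matching the view that a generic UniProtKB ID stands in for a gene.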

Next Curation Consistency Exercise

  • 2015-11-24
  • TAIR is up next to select a paper
  • Continue with consistency exercises in 2016?
  • Suggestions for changes or improvements?
    • Model each paper in LEGO
  • Groups still to select paper: dictyBase, EBI/UniProt, BBOP (Moni?), NextProt, USC, AgBase, anyone else?

Minutes

  • On call: Aleks, Alex, David H., Edith, Kimberly, Li, Melanie, Midori, Paul T., Petra, Rachael, Ruth, Shur-Jen, Stacia, Stan, Suzi, Tanya

Deprecated Annotation Extension Relations

Entity IDs in Annotation Extensions

  • There is varied ID usage in annotation extensions
  • We'd like to review what currently exists and agree on what we should use going forward
  • IDs used should be compatible with the ID space of the submitting source of the GAF
  • The goal is to have seamless curation and querying of the annotations
  • Suzi: Will alt_ids be supplied in the GAFs?
    • David H.: No, ID synonyms are not provided in the GAFs for entities used in Col. 16, but each group could provide all valid IDs in a gpi file
  • Paul T.: UniProt has mapping files; we should take advantage of this
    • Midori: Who is responsible for maintaining the mappings? MODs? Not all groups maintain mappings for all possible namespaces.
    • Suzi: With advice from the MODs, UniProt will provide the mapping files.
    • Kimberly, Midori: This is okay for MOD and UniProt IDs, but what about other IDs?
  • Suzi: What other ID spaces are in use?
    • David H.: This is one of the goals of this work - to determine what ID space is currently used and what we should use in the future.
  • Ruth, Rachael: Have been using ENSEMBL gene IDs for AEs, but could change to UniProt IDs to represent a gene. Also have been using RNAcentral IDs, but this is okay for representing ncRNAs.
  • David H.: The submitting group needs to make sure all ID spaces can be normalized; the MOD, UniProtKB, and the annotator all need to be in agreement
  • Review of specific types of IDs used
    • Gene IDs
      • HGNC - the human MOD?
      • Paul T.: UniProt Reference Proteomes map to HGNC identifiers as the preferred source of human gene identifier
      • Ruth: Can see why UniProt uses HGNC as this allows them to retrieve the correct gene symbol, but there won't necessarily be a 1:1 relationship between HGNC gene and UniProt protein IDs
      • David H.: MGI leaves UniProtKB IDs as is - should this continue? This question touches on the issue of what identifiers should be used with what relation
      • Rachael: If we change the ENSEMBL gene IDs to UniProtKB IDs, will we then have to change back to a gene ID if we decide that the AE entity should be a gene?
      • David H.: Parent UniProtKB IDs are thought of as the equivalent of gene IDs in GO, so they should be okay as a gene ID.
    • Gene Product IDs
      • Existing IDs used here are okay, but we need to have guidelines on how to treat isoforms
      • Please send examples to David and Kimberly for AEs where you'd like to represent isoforms
    • InterPro IDs
      • David H.: Some discussion of this wrt protein binding annotations at the DC meeting. See meeting minutes for proposal.
      • Would like more examples for this, too.
    • Anatomical Structures
      • Mappings between different ontologies still need to be done
      • What anatomy ontologies do groups use?
      • Suzi: Groups should coordinate with Chris to make sure that their cell and anatomy ontologies are compatible with UBERON; some work still needs to be done here.
    • Midori: Need to also include SO identifiers and Pfam identifiers

Curation Consistency Exercises

  • Tanya: Will choose a paper from TAIR for the next exercise and collate annotations
  • David H., Ruth, Paul T. expressed support for continuing with the consistency exercise
  • Ruth: If papers chosen are particularly complicated, we could always decide to focus on a subset of Figures/experiments
  • Tanya: Would be nice to have a consistent template for collating annotations
  • Ruth: From these exercises, we've reached some agreements on how to annotate, but we need to provide better documentation, ideally linked to the papers, to illustrate how to annotate and why
  • Paul T.: Agrees that we need to work on the documentation; this will be a formidable task