Annotation Conf. Call 2015-11-10

From GO Wiki

Bluejeans URL: https://bluejeans.com/993661940

Agenda

Deprecated Annotation Extension Relations

Entity IDs in Annotation Extensions

  • There is varied entity ID usage in the annotation extensions.
  • We'd like to review what currently exists and reach agreement on what we should use going forward
  • IDs used should be compatible with the source group submitting the GAF
  • The goal is to enable seamless curation and query
  • Suzi: Will alt_ids for these entities be supplied in the GAF?
    • David H.: No, we don't currently supply synonyms in the GAF for entities used in Col. 16/AEs, but each group could supply all valid IDs in a gpad file
  • Paul T.: UniProt has mapping files and we should take advantage of this
    • Midori: Who is ultimately responsible for the mappings? The MODs? Not all groups maintain mappings for all possible IDs.
    • Suzi: With advice from the MODs, UniProt has agreed to provide mapping files
    • Kimberly, Midori: This is okay for MOD and UniProt IDs, but what about other sources?
  • Suzi: What other ID spaces are in use?
    • David H.: This is part of the purpose of this exercise - to see what ID space is being used.

Entity IDs in Annotation Extensions

Goals

For this meeting

To understand the scope of the project by discussing what types of objects annotators would like to use in annotation extensions.

Final Goals

  • To agree on a set of ID spaces that will be used in annotation extensions that provides both manual and computational consistency.
  • Ideally, a user would be able to query on ANY type of ID recognized by the GOC and get results from a seamless query across those IDs.
  • To use the results of this discussion for the creation of annotation documentation.

Initial Assumptions

  • We will not be able to mandate the types of IDs that are used by annotation groups, but we should be able to mandate that the IDs used are compatible/translatable to the ID spaces used by the group (MOD) that is primarily responsible for submission of annotations to the GOC.
  • Upon processing of submitted annotations, primary responsible groups (MODs) will translate IDs into the objects used for curation by that group. That group will then provide the translated (normalized) annotations to the GOC.
  • The GOC may then translate the submitted IDs for the purpose of data integration across species.
  • Before submission of annotations, the submitter and the primary responsible group (MOD) will work together to make sure that all ID spaces can be normalized. This should become an SOP for any new group wishing to submit annotations.
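The translation step described in these assumptions can be sketched roughly as follows. This is only an illustration, not an agreed GOC procedure: the function name, the mapping table, and the placeholder identifiers are all hypothetical, and it assumes extensions are written as relation(ID) expressions joined by commas or pipes. IDs already in an accepted ID space are left alone, IDs found in the mapping are translated, and anything else is reported so the submitter and the MOD can resolve it before submission.

```python
import re

# Hypothetical mapping table; these identifiers are placeholders, not real records.
EXAMPLE_MAPPING = {
    "Ensembl:EXAMPLE0001": "UniProtKB:EXAMPLE1",
    "HGNC:EXAMPLE2": "UniProtKB:EXAMPLE1",
}

# Matches one relation(ID) expression in an annotation extension string.
EXTENSION_PATTERN = re.compile(r"(?P<relation>[A-Za-z_]+)\((?P<entity>[^()]+)\)")


def normalize_extensions(extensions, mapping, accepted_prefixes):
    """Translate entity IDs in an annotation extension string.

    Returns the rewritten string and a list of IDs that were neither in an
    accepted ID space nor present in the mapping.
    """
    unresolved = []

    def replace(match):
        entity = match.group("entity")
        prefix = entity.split(":", 1)[0]
        if prefix in accepted_prefixes:
            return match.group(0)  # already in an accepted ID space
        if entity in mapping:
            return f"{match.group('relation')}({mapping[entity]})"
        unresolved.append(entity)  # needs attention before submission
        return match.group(0)

    return EXTENSION_PATTERN.sub(replace, extensions), unresolved


normalized, unresolved = normalize_extensions(
    "occurs_in(Ensembl:EXAMPLE0001),part_of(CL:EXAMPLE3)|has_input(HGNC:UNKNOWN)",
    EXAMPLE_MAPPING,
    accepted_prefixes={"UniProtKB", "CL"},
)
print(normalized)
print(unresolved)
```

Returning the unresolved IDs rather than raising an error is a design choice for this sketch: it lets a group see every ID that still lacks a mapping in one pass.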

Types of IDs used in Annotation Extensions

  • IDs used to represent genes
    • MOD gene identifiers (MGI:MGI:, WB:, ZFIN:ZDB-GENE-, TAIR:locus: etc)
    • Generic UniProtKB IDs (UniProtKB:)
    • ENSEMBL gene IDs (Ensembl:)
    • NCBI gene IDs (NCBI_gene:)
    • RNAcentral IDs (RNAcentral:)
    • HGNC IDs (HGNC:)
  • IDs used to represent cell types
    • Cell Ontology IDs
    • WormBase anatomy and cell IDs
    • Plant ontology IDs
  • IDs used to represent chemicals
    • ChEBI
  • IDs used to represent gene products
    • Proteins/Proteoforms
      • Protein ontology IDs (PR:)
      • UniProt isoform-specific IDs (UniProtKB:######-#)
      • MOD gene identifiers
    • Transcripts
      • EMBL IDs
      • MOD gene identifiers
  • IDs used to represent protein domains
    • InterPro IDs
  • IDs used to represent biological processes
    • GO IDs
  • IDs used to represent molecular functions
    • GO IDs
  • IDs used to represent cellular components
    • GO IDs
  • IDs used to represent anatomical structures
    • EMAPA IDs
    • UBERON IDs
    • WormBase anatomy and cell IDs
    • Plant ontology IDs
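One way to see why this inventory matters for querying is that a prefix alone does not always say what kind of entity an ID represents. The sketch below is a hypothetical lookup built from the list above: the function name and table are illustrative only, the prefixes marked as assumed (CL, CHEBI, GO) are not spelled out in the list, and the isoform pattern simply follows the UniProtKB:######-# shape given there.

```python
import re

# Entity types each prefix was listed under in the inventory above.
PREFIX_ENTITY_TYPES = {
    "MGI": ["gene", "gene product"],
    "WB": ["gene", "gene product"],
    "ZFIN": ["gene", "gene product"],
    "TAIR": ["gene", "gene product"],
    "UniProtKB": ["gene"],
    "Ensembl": ["gene"],
    "NCBI_gene": ["gene"],
    "RNAcentral": ["gene"],
    "HGNC": ["gene"],
    "PR": ["gene product"],
    # Assumed prefixes: the list names these resources but not their prefixes.
    "CL": ["cell type"],
    "CHEBI": ["chemical"],
    "GO": ["biological process", "molecular function", "cellular component"],
}

# Isoform-specific UniProtKB IDs carry a "-<number>" suffix.
ISOFORM_PATTERN = re.compile(r"^UniProtKB:[A-Za-z0-9]+-\d+$")


def possible_entity_types(entity_id):
    """Return the entity types an ID could represent, judged by prefix only."""
    if ISOFORM_PATTERN.match(entity_id):
        return ["gene product"]
    prefix = entity_id.split(":", 1)[0]
    return PREFIX_ENTITY_TYPES.get(prefix, [])


# Placeholder identifiers, not real records.
print(possible_entity_types("MGI:MGI:EXAMPLE"))
print(possible_entity_types("UniProtKB:EXAMPLE-2"))
print(possible_entity_types("XYZ:1"))
```

A prefix that returns more than one type, such as a MOD gene identifier, is exactly the case where curators and query tools need an agreed convention.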

Next Curation Consistency Exercise

  • 2015-11-24
  • TAIR is up next to select a paper
  • Continue with consistency exercises in 2016?
  • Suggestions for changes or improvements?
    • Model each paper in LEGO
  • Groups still to select a paper: dictyBase, EBI/UniProt, BBOP (Moni?), NextProt, USC, AgBase, anyone else?

Minutes

  • On call: Aleks, Alex, David H., Edith, Kimberly, Li, Melanie, Midori, Paul T., Petra, Rachael, Ruth, Shur-Jen, Stacia, Stan, Suzi, Tanya

Deprecated Annotation Extension Relations

Entity IDs in Annotation Extensions

  • There is varied ID usage in annotation extensions
  • We'd like to review what currently exists and agree on what we should use going forward
  • IDs used should be compatible with the ID space of the source submitting the GAF
  • The goal is to have seamless curation and querying of the annotations
  • Suzi: Will alt_ids be supplied in the GAFs?
    • David H.: No, ID synonyms are not provided in the GAFs for entities used in Col. 16, but each group could provide all valid IDs in a gpad file
  • Paul T.: UniProt has mapping files; we should take advantage of this
    • Midori: Who is responsible for maintaining the mappings? MODs? Not all groups maintain mappings for all possible namespaces.
    • Suzi: With advice from the MODs, UniProt will provide the mapping files.
    • Kimberly, Midori: This is okay for MOD and UniProt IDs, but what about other IDs?
  • Suzi: What other ID spaces are in use?
    • David H.: This is one of the goals of this work - to determine which ID spaces are currently used and what we should use in the future.
  • Ruth, Rachael: Have been using ENSEMBL gene IDs for AEs, but could change to UniProt IDs to represent a gene. Have also been using RNAcentral IDs, which is okay for representing ncRNAs.
  • David H.: The submitting group needs to make sure all ID spaces can be normalized; the MOD, UniProtKB, and the annotator all need to be in agreement
  • Review of specific types of IDs used
    • HGNC - the human MOD?
      • Paul T.: UniProt Reference Proteomes map to HGNC identifiers as the preferred source of human gene identifiers
      • Ruth: Can see why UniProt uses HGNC as this allows them to retrieve the correct gene symbol, but there won't necessarily be a 1:1 relationship between HGNC gene and UniProt protein IDs