Extension of Protein2GO to non-UniProtKB Identifiers

From GO Wiki
Revision as of 14:29, 9 December 2013 by Vanaukenk

Conference Call Agenda

Google Spreadsheet

https://docs.google.com/spreadsheet/ccc?key=0Aiei4RvoiQdqdHBFVEcwXzRvcW94V2JOLVFSNjJaTHc&usp=drive_web#gid=0


What types of entity identifiers might be needed?

  • Proteins not in UniProtKB
  • ncRNAs
    • Examples
      • C. elegans gene lin-4 encodes a miRNA that regulates gene expression during larval development; currently, annotations are made to the WB gene ID
  • Orphan genes
    • Examples
      • C. elegans gene abc-1 is an uncloned locus defined by a variation that results in defective chromosome segregation; currently, annotations are made to the WB gene ID
  • Protein complexes

Knowledge Representation

  • What kind of biological statements do we want to make?
  • Given these statements, what is the appropriate resource for the entity IDs?
  • How will this be represented in the GAFs/GPADs?
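As a minimal sketch of where a non-UniProtKB entity could sit in a GAF, the Python below assumes the 17-column GAF 2.x layout: the entity's source database goes in column 1 (DB), its identifier in column 2 (DB Object ID), and its kind, e.g. a miRNA or a protein complex, in column 12 (DB Object Type). The helper names `gaf_line` and `parse_gaf_line` are hypothetical, the gene ID and GO term are placeholders, and required columns such as evidence, reference and date are left empty for brevity, so the output is not a valid annotation.

```python
"""Hypothetical sketch: a GAF 2.x line for an entity that is not a UniProtKB
protein. Identifiers are placeholders; required columns are left empty."""

# The 17 GAF 2.x column names, in file order.
GAF_COLUMNS = (
    "DB", "DB Object ID", "DB Object Symbol", "Qualifier", "GO ID",
    "DB:Reference", "Evidence Code", "With (or) From", "Aspect",
    "DB Object Name", "DB Object Synonym", "DB Object Type", "Taxon",
    "Date", "Assigned By", "Annotation Extension", "Gene Product Form ID",
)


def gaf_line(values: dict) -> str:
    """Join values (keyed by column name) into one tab-separated GAF line.

    Columns missing from `values` are written as empty fields.
    """
    unknown = set(values) - set(GAF_COLUMNS)
    if unknown:
        raise KeyError(f"not GAF columns: {sorted(unknown)}")
    return "\t".join(values.get(col, "") for col in GAF_COLUMNS)


def parse_gaf_line(line: str) -> dict:
    """Split one GAF line back into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GAF_COLUMNS):
        raise ValueError(f"expected {len(GAF_COLUMNS)} columns, got {len(fields)}")
    return dict(zip(GAF_COLUMNS, fields))


# A miRNA gene annotated to its MOD (WormBase) gene ID instead of a UniProtKB accession.
row = {
    "DB": "WB",
    "DB Object ID": "WBGene_PLACEHOLDER",  # placeholder, not a real WB gene ID
    "DB Object Symbol": "lin-4",
    "GO ID": "GO:PLACEHOLDER",             # placeholder GO term
    "DB Object Type": "miRNA",
    "Taxon": "taxon:6239",                 # NCBI taxon ID for C. elegans
}

line = gaf_line(row)
parsed = parse_gaf_line(line)
print(parsed["DB"], parsed["DB Object ID"], parsed["DB Object Type"])
```

The design point is that nothing in the line layout is tied to UniProtKB: a different source database and object type change only the values in columns 1, 2 and 12.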

Practical Considerations

  • How many of each type?
  • ID stability: if there is churn, can IDs be mapped forward so that they do not go stale?
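One way IDs could be "mapped forward" is for the source database to publish a table from retired or merged IDs to their replacements, which a consumer follows until it reaches a live ID. The Python below is a minimal sketch of that idea under stated assumptions: the table format, the `resolve` helper and all identifiers are hypothetical, and each retired ID is assumed to have at most one successor (an ID split into several would need curator review).

```python
"""Hypothetical sketch: following an ID-history table forward so that stored
annotations do not point at stale identifiers."""

from typing import Dict, Optional

# Retired or merged ID -> replacement ID; None means withdrawn with no successor.
# All IDs are placeholders.
HISTORY: Dict[str, Optional[str]] = {
    "DB:0001": "DB:0002",  # 0001 merged into 0002
    "DB:0002": "DB:0005",  # 0002 later merged into 0005
    "DB:0003": None,       # 0003 withdrawn outright
}


def resolve(identifier: str, history: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the current ID for `identifier`, or None if it was withdrawn.

    IDs absent from the history table are assumed to be current.
    Raises ValueError if the table contains a cycle.
    """
    seen = set()
    current = identifier
    while current in history:
        if current in seen:
            raise ValueError(f"cycle in ID history at {current}")
        seen.add(current)
        current = history[current]
        if current is None:
            return None
    return current


for old in ["DB:0001", "DB:0003", "DB:0004"]:
    print(old, "->", resolve(old, HISTORY))
```

Chained merges (0001 to 0002 to 0005) resolve in one call, and withdrawn IDs come back as `None` so they can be flagged rather than silently kept.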

Overview of the representation of complexes in the ontology

Minutes

In attendance: Chris, Fiona, Harold, Judy, Kimberly, Paul S., Petra, Rama, Sandra

Unfortunately, UK participants were unable to dial in to the conference line, so Rachael and Tony could not attend.

  • Summary of the issue: some entities that curators would like to use for GO annotation cannot currently be used in Protein2GO
    • Examples:
      • Proteins that don't have UniProtKB IDs
      • Gene IDs
      • ncRNA IDs
      • Protein complex IDs
  • There are two parallel issues here with respect to curation:
    • What entities are needed and how to get them into Protein2GO
    • How to represent annotations to other entities (e.g., protein complexes) consistently across the consortium, both in the annotation files (e.g., perhaps a separate GAF for protein complex annotations) and in how annotations are propagated to protein complex members
  • Issue #1
    • There appears to be agreement on which database IDs would be most helpful to start with:
      • MOD gene IDs
      • NCBI RefSeq IDs
      • MOD protein IDs
      • Protein complex IDs: IntAct, RACE-PRO
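If the consortium opted for a separate GAF for protein complex annotations, as floated in the minutes, the split could key on the DB Object Type column (column 12 in GAF 2.x). The following is a minimal Python sketch under that assumption; `split_by_object_type` and `make_row` are hypothetical helpers, file I/O is omitted, and the rows are placeholders with only the DB, ID and type columns filled in.

```python
"""Hypothetical sketch: routing GAF 2.x rows into separate outputs by DB Object
Type, e.g. a dedicated file for protein complex annotations."""

from collections import defaultdict
from typing import Dict, Iterable, List

N_COLUMNS = 17        # GAF 2.x has 17 tab-separated columns
OBJECT_TYPE_COL = 11  # 0-based index of column 12, DB Object Type


def split_by_object_type(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group GAF data lines by DB Object Type; header/comment lines ('!') are skipped."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # GAF header and comment lines start with '!'
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= OBJECT_TYPE_COL:
            raise ValueError(f"too few columns ({len(fields)}) in: {line!r}")
        groups[fields[OBJECT_TYPE_COL]].append(line)
    return dict(groups)


def make_row(db: str, obj_id: str, obj_type: str) -> str:
    """Build a placeholder row with only DB, DB Object ID and DB Object Type set."""
    fields = [""] * N_COLUMNS
    fields[0], fields[1], fields[OBJECT_TYPE_COL] = db, obj_id, obj_type
    return "\t".join(fields)


rows = [
    "!gaf-version: 2.0",
    make_row("WB", "GENE_PLACEHOLDER", "gene"),
    make_row("IntAct", "COMPLEX_PLACEHOLDER", "protein_complex"),
]
groups = split_by_object_type(rows)
for obj_type, members in sorted(groups.items()):
    print(obj_type, len(members))
```

Splitting on an existing column keeps a single annotation model; whether complex annotations should also be propagated to member proteins remains the separate question raised above.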