Ontology meeting 2015-01-13

Attendees: Chris, Heiko, DavidOS, DavidH, Paola, Tanya, Jane, PaulT

Minutes: Tanya

Pending AIs for the ontology group from the Barcelona meeting

"Ontology group look into creating intermediate templates for complexes".

Examples? Where are we with this? What needs to be done?

  Enzyme complex ones?  MF - complex (capable of) is already in place. 
  Same complex in different locations? template for that? Shuttling complexes depending on different modifications (phosphorylation, myristoylation)
  Not clear what we would add. Continue to use free form.
  Create guidelines for annotation vs. ontology development.  Annotate instead of creating a new term.
  What is cc?  Where does it function vs. where is it found?  Many times location = place of function, but not in all cases.
  Protein complexes != cell anatomy, they are in the cell but that's about it.  
  Can we tag the protein complexes with a location, i.e. where they are sublocalized within the cell?
  Aim for cellular component = cell anatomy?
  May run into taxon issues, different localization in different taxa.

On a related note, a reminder that we put this down: "GO annotations for IntAct complexes: Harold: GO, IntAct, and PRO need to coordinate better so that they are aware of the new and planned developments on each project, to ensure there is no duplicated effort."

  GO and IntAct do coordinate, how about PRO?  What happens when direct protein complex annotation goes live?  Complex annotations live in the PRO, 
  we (GO) want the PRO annotations to be incorporated into GO.  Right now, PRO does not submit a GAF to GO.  First step would be to have it in the
  SVN.  Concern: coordinating species-specific annotations by PRO curators with their responsible MODs.  Need to play by the GO rules for annotation.
  Work with these people to get automated checks set up.

  Harold to give overview of PRO during an annotation call? Complexes and their annotation.
  Look into getting IntAct ids as xrefs in GO? Look into this more.
  Something needs to be automated for complex creation via TermGenie.
  Make a Trello item to address this.

"Readability of term labels:"

Chris’s 3 proposals based on David OS’s slides: If TG detects the term is overly composed (automatically detected based on a profile of ‘overly composed’), the requester is forced to enter an example of usage in the comment (i.e. a free-text description of their annotation).

Is this check already in place in TG? If not, do we need to make a specific ticket?

  What does it mean to be overly composed?  Bunch of criteria?  Term length to start with?  Build into TermGenie or have guidelines
  for TG gatekeepers to use. Perhaps a check/reminder/pop-up/warning for submitter if there is regulation+involved+regulation in the term
  name.
  Need a ticket to define the criteria for complexity.  SPARQL query?  Will attack retrospective stuff too.

Pascale: The automated check, based on term string length (100 characters?) and other characteristics, would trigger a mandatory comment. In practice we recommend ALL new GO terms should have at least ONE example of a correct annotation (PubMed ID + figure included). That would help understand what the term means, and perhaps help people annotate to an already existing term.
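
A rough sketch of what such a check could look like, for illustration only. This is not the TermGenie implementation; the function names, the 100-character threshold (floated above with a question mark) and the regulation/involved/regulation pattern are assumptions taken from the discussion, and the real criteria still need to be defined in a ticket.

```python
import re

# Assumed threshold: the minutes suggest "100 characters?" as a starting point.
MAX_LABEL_LENGTH = 100

# Assumed pattern: "regulation ... involved in ... regulation" in the term name.
NESTED_REGULATION = re.compile(
    r"\bregulation\b.*\binvolved in\b.*\bregulation\b", re.IGNORECASE
)


def overly_composed_reasons(label, max_length=MAX_LABEL_LENGTH):
    """Return the reasons (possibly none) why a label looks overly composed."""
    reasons = []
    if len(label) > max_length:
        reasons.append("label longer than %d characters" % max_length)
    if NESTED_REGULATION.search(label):
        reasons.append("regulation + involved in + regulation pattern")
    return reasons


def needs_usage_example(label):
    """True if the requester should be asked for an example annotation."""
    return bool(overly_composed_reasons(label))


if __name__ == "__main__":
    # Hypothetical labels, used only to exercise the check.
    for label in (
        "mitotic spindle assembly",
        "positive regulation of X involved in negative regulation of Y",
    ):
        print(label, "->", overly_composed_reasons(label))
```

Returning the list of reasons, rather than a bare yes/no, would let a pop-up or a TG gatekeeper see which criterion fired.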

A sub-editor can tag a term as being overly composed; such a term will be auto-unfolded and shown in a decomposed fashion in AmiGO.

If a term is auto-detected as being overly composed, the curator is forced to enter a Noctua model ID before the term is generated.

Guidelines: If there is a commonly used name then that is the first choice for a term/class label. The labels of the parental terms may be helpful, but there is no requirement to reiterate the parental labels in the labels of the child classes.

Complicated terms (e.g. positive regulation of X involved in Y during Z) - editors should tell curators to annotate to individual terms and handle it later. Add a new tag called LEGO. Curator should make a LEGO model in Noctua.

  Very important for that information (in Noctua) to be integrated into the rest of the resource, so that data is not lost for a long time.
  Some basic conversion exists but that information is minimal.
  Action item for Noctua call?  How to interconvert Noctua and 'conventional' annotations with minimal loss of information.
  Get Kimberly and Rama on board, LEGO call?

Follow-ups from last week's call

Protege 4 vs 5

  DOS using P5 after reinstalling, using latest snapshot of ELK, bit buggy but seems to be working.
  Heiko - likes search better than the old one. Ongoing issue: unsatisfiable classes detected by the reasoner create long lists which take
  ages to render; this blocks Protege, stuck waiting for P to finish rendering. Existing performance issue. P5 is not worse than P4.
  DOS: can get around this by turning off 'show inferences', workaround is odd but can be done. P4 is fine.
  DPH: P4 is fine.
  CJM: Likes views in P5.
  Other people try P5.  Need explicit instructions on where to get it and what things to add.

NEXT WEEK - Remaining axioms in CHEBI that overclassify amino acids

(see http://wiki.geneontology.org/index.php/Ontology_meeting_2015-01-06).

What to do about glyco- and lipid-modified AAs that, chemically, are alpha amino acids?

NEXT WEEK - Membrane Issues

Logical definition issues with partial membranes. Many membranes are classified as 'membrane part', but have no further classification or, like 'dendrite membrane', have no indication that they are part of a dendrite.

https://www.ebi.ac.uk/panda/jira/browse/GO-186

NEXT WEEK - Organization branch

We could get a lot more automated classification if we relax genus to something very general - probably 'cellular process'. There doesn't seem to be any over-inference that results - although we may need to stick to a convention of using has_output in the cellular_component biogenesis branch. Should we do this?