Ontology meeting 2015-10-15

From GO Wiki

Attendees:

Minutes: David OS

Discussion of MF refactoring

Repository now on GitHub: https://github.com/geneontology/molecular_function_refactoring
- Ticket tracker not yet fully populated from Paul T's requests.
 Tickets to add:
   Ticket for discussion of axiomatisation patterns for single functions
    

Discussion of transcription factor activity (again, again, again...)

Confusion still reigns over nomenclature and the possibility, or not, of a grouping term. Paul feels strongly that we don't sufficiently reflect the nomenclature of biologists. The term 'TF activity' is almost universally used for DNA binding, with the exception of some repressors, where it is typically combined with binding to proteins (e.g. other TFs).

 AI: Paul T to write a nice clear ticket on this issue with a proposal for how we might refactor. 
   Any name changes in the DNA binding branch need to be combined with consideration of what changes 
   might also need to happen in the protein binding branch to avoid confusion.  
   Need to consider what currently distinguishes a cofactor from other DNA binding TF activity in GO.
   

Wikidata

Background - We discussed this with WD a month ago. They are mapping lexically, but also (intend to) pull from xrefs to WP. They are expecting a proposal from us on populating GO in WD / Wikipedia term info etc. ASAP.

Still some confusion over what WD expect us to do.  
Re review of representation: core elements of representation should be the same for all OBO ontologies.
The difference is primarily in how xrefs (and maybe some axioms?) might be used to link resources.
While the code they are using to convert is not great (parsing of OWL as XML), it is probably OK for now.
We may be able to help with alternatives (via JSON-LD conversion?), but the solution really should be generic across OBO ontologies.
AI: David and Melanie (& perhaps Chris too) to go over notes from last meeting ASAP
and put together a rough proposal to be discussed at next week's Editors' call.
AI: Melanie: Email Wikidata to say when we will come back with response.
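
For reference, a minimal sketch of what "parsing of OWL as XML" looks like when pulling Wikipedia xrefs. This is not the Wikidata code, which we have not reproduced here. The input fragment, the example.org IRIs and the function name `wikipedia_xrefs` are all made up for illustration; only the namespace IRIs are the standard ones. It also shows why the approach is brittle: it only matches classes written as direct `owl:Class` children, and the same axioms can legally be serialised in other RDF/XML shapes (e.g. `rdf:Description` with an `rdf:type`).

```python
import xml.etree.ElementTree as ET

# Standard namespace IRIs used in OBO-style OWL (RDF/XML) files.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
}

# Hand-written illustrative fragment, NOT real GO content.
SNIPPET = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Class rdf:about="http://example.org/EX_0000001">
    <rdfs:label>microtubule</rdfs:label>
    <oboInOwl:hasDbXref>Wikipedia:Microtubule</oboInOwl:hasDbXref>
    <oboInOwl:hasDbXref>OTHER:123</oboInOwl:hasDbXref>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/EX_0000002">
    <rdfs:label>term without a Wikipedia xref</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def wikipedia_xrefs(rdf_xml):
    """Map class IRI -> (label, [Wikipedia xrefs]) for classes that have any.

    Only looks at owl:Class elements that are direct children of the root,
    which is exactly the kind of assumption that makes XML-level parsing fragile.
    """
    root = ET.fromstring(rdf_xml)
    about_attr = "{%s}about" % NS["rdf"]
    result = {}
    for cls in root.findall("owl:Class", NS):
        label = cls.findtext("rdfs:label", default="", namespaces=NS)
        xrefs = [
            x.text
            for x in cls.findall("oboInOwl:hasDbXref", NS)
            if x.text and x.text.startswith("Wikipedia:")
        ]
        if xrefs:
            result[cls.get(about_attr)] = (label, xrefs)
    return result


if __name__ == "__main__":
    for iri, (label, xrefs) in wikipedia_xrefs(SNIPPET).items():
        print(iri, label, xrefs)
```

A generic solution (JSON-LD or an OWL-aware parser) would avoid depending on one particular XML layout, which is the point made above about keeping this generic across OBO ontologies.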

P5 vs P4

Melanie reports: "I have been using P5 for editing as discussed, but keep running into the error:

[...] java.lang.ArrayIndexOutOfBoundsException

Uncaught Exception in thread AWT-EventQueue-0

java.lang.ArrayIndexOutOfBoundsException

Error logged [...]

I am not sure what causes this as it seems to happen somewhat randomly, but until this is fixed I have to revert to P4."

DOS: Should file bug ticket. But do we need to discuss further?

Ticket: https://github.com/protegeproject/protege/issues/226#issuecomment-148435087

Supramolecular complex

Should we add the term? Can we agree on the def?

Proposed def: A cellular component that consists of an indeterminate number of proteins or macromolecular complexes, organized into a regular, higher-order structure such as a polymer, sheet, network or a fiber. Examples: microtubule, collagen sheet, collagen fibril, ribosome subunit (?), chromatin

DOS: Having this will allow us to separate out content that is out of scope for IntAct and so is not under the purview of the planned refactoring under discussion in the protein complex working group. I think it is also better aligned with biologist terminology - I doubt most biologists would refer to the examples listed above as protein complexes. Supramolecular complex has precedent in the ECM lit.

General agreement to add, but Harold points out that ribosome subunits 
 have a determinate number of components, so don't belong under here, 
 given the current def.
Paola was worried this covers secondary protein structures, 
but that is not the intent - this is at the level of complexes of many 
proteins/subcomplexes. There is also some worry about the relationship to the potential PRO term
 'protein aggregate'. We should at least inform PRO that we've added
 the supramolecular complex term.

Related / Follow-up: protein fibril

Should we have a generic term for fiber-shaped protein polymers? (For details, see email thread 'protein fibril' on the ontology mailing list.)


 Protein fibril:
 This term is quite general.  DOS: would prefer something more specific if poss for Stan's needs, 
 but no-one has any suggestions. If added - certainly covers microtubules and actin filaments, 
 so would want to ensure these and related terms went underneath.   
 It may be possible to define this using PATO qualities - polymeric and fiber-shaped.
AI DONE: Paola to make ticket - assigned to DOS to make sure term ends up in right place.
GH ticket: https://github.com/geneontology/go-ontology/issues/12099

Design patterns

DOS to present his design pattern system & how we might use design patterns to maintain evolving, consistent patterns across at least some ontology branches.

Slides here: https://docs.google.com/presentation/d/1zXfHOOdPASAPQ9f0EkUYuM5dQLTl4f_Hvmlo85Kh9_4/edit#slide=id.ge41ba2242_0_194

Discussion of future steps
 We should get the validation code running on Travis.
 Can use Maven imports to get dependencies.  
 Then either run Jython scripts from Jython Jar, or package jython + imports as jar and run from that.
 One possible issue:  Need to see if we can find versions of Brain and Owltools that use same OWL-API version.
 AI: DOS, Chris and Heiko to meet and work out the best options for this.
 
Synonym generation:
Adding a sprintf field is fine.  Combinatorial synonym generation is potentially more problematic.
Heiko: Synonym generation can get very complicated.   We have rules for excluding particular
synonym types from consideration, and these vary with source ontology. 
DOS & CM:  Perhaps out of scope for design patterns.  Too complex.  Can we generalise rules for 
 what synonym types to include for each ontology and specify those outside of the design patterns?
Generating a complete set of patterns:
  May be able to pull some information out of TG JS specs.  But not all.  
  In the end, quite a bit of manual work is going to be required.
  Once this is done - design patterns should be central doc for TG templates.  
  Should be updated if TG templates are.
 Proposal to auto-populate select branches using sets of related design patterns:  
   No objections in general.  There may be more for concrete examples.
 AI: DOS to make some proposals along these lines for specific branches.  
  Eds can then decide if OK.
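
To make the synonym generation point above concrete, here is a rough sketch of a sprintf-style field and of why combinatorial generation grows quickly. This is not DOS's design pattern system: the record layout (`text` / `vars`), the function names and the two-variable example pattern are all hypothetical, chosen only to illustrate the idea.

```python
from itertools import product

# Hypothetical pattern records: a sprintf-style text plus the variables that fill it.
NAME_FIELD = {"text": "%s proliferation", "vars": ["cell"]}
TWO_VAR_FIELD = {"text": "%s import into %s", "vars": ["cargo", "target"]}


def fill(field, labels):
    """Fill one sprintf field using a single label per variable."""
    return field["text"] % tuple(labels[v] for v in field["vars"])


def combinatorial_synonyms(field, synonyms):
    """Fill one sprintf field with every combination of the variables' synonyms.

    The number of results is the product of the synonym counts per variable,
    which is why this gets out of hand without rules for which synonym
    types to include.
    """
    choices = [synonyms[v] for v in field["vars"]]
    return [field["text"] % combo for combo in product(*choices)]


if __name__ == "__main__":
    print(fill(NAME_FIELD, {"cell": "epithelial cell"}))
    # Made-up synonym lists, for illustration only.
    syns = {
        "cargo": ["calcium ion", "Ca2+"],
        "target": ["mitochondrion", "mitochondria"],
    }
    for s in combinatorial_synonyms(TWO_VAR_FIELD, syns):
        print(s)
```

With two synonyms per variable the two-variable field already yields four strings. Any per-ontology rules for which synonym types to include would be applied when building the `synonyms` lists, i.e. outside the pattern itself, in line with the DOS & CM suggestion.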

Adding logical defs to cellular response terms

DOS has started to do this. What's the syntax, and should we look into filling all gaps?

'cell proliferation' terms

Most have logical def, e.g. 'epithelial cell proliferation' is_a 'cell proliferation' that acts_on_population_of 'epithelial cell'. Did we have plans for a TG template? (Not in Jira.)
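
As a sketch of how the gaps might be found, the pattern above can be written as a small template and applied to terms that lack a logical def. Assumptions: the output is a Manchester-syntax-like string with an existential (`some`) restriction, the cell type is derived lexically from the term label, and the function names and sample labels are made up for illustration. Anything proposed this way would still need checking by an editor, since lexical matching says nothing about whether the cell type term actually exists.

```python
GENUS = "cell proliferation"
RELATION = "acts_on_population_of"
SUFFIX = " proliferation"


def logical_def(genus, relation, filler):
    """Render a genus-differentia definition as a Manchester-style string."""
    return "'%s' and (%s some '%s')" % (genus, relation, filler)


def proposed_defs(term_labels, already_defined):
    """Propose logical defs for '<cell type> proliferation' labels lacking one.

    term_labels: iterable of term labels to scan.
    already_defined: set of labels that already have a logical def.
    """
    proposals = {}
    for label in term_labels:
        if label == GENUS or label in already_defined:
            continue
        if not label.endswith(SUFFIX):
            continue
        cell_label = label[: -len(SUFFIX)]
        proposals[label] = logical_def(GENUS, RELATION, cell_label)
    return proposals


if __name__ == "__main__":
    labels = [
        "cell proliferation",
        "epithelial cell proliferation",
        "fibroblast proliferation",
    ]
    for label, expr in proposed_defs(labels, {"epithelial cell proliferation"}).items():
        print(label, "=", expr)
```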

Note: this is a case where we have a semi-random distribution of terms in the ontology vs annotation extensions. Should we just make terms for all AE usage and retrofit AEs?

MAYBE: cell to cell mobile