2008-10 SAB Meeting Minutes
Introductions
Michael Schroeder, Richard Scheuermann, Barry Smith, Phil Bourne, Michael Tyers, Simon Tavare, Larry Hunter (not here)
Board opening comment: would like to have seen the old report from the last SAB meeting, and a report on how any suggestions have been addressed, before today.
Professional software tools that track projects should be used?
We use Sourceforge which was actually designed for software projects; should we change?
Midori: We use SF for term requests, not for software development
Time/cost in switching. It works
Suzi: We consciously separate content development from software development
Board: Basecamp for project management.
Board: is there an evolving project tracker for specific policies (like how regulation is always handled, etc.)?
Midori did part of Chris's presentation on cross products. Building complex term defs from terms of lesser complexity.
- development of cross products to other ontologies such as ChEBI, Cell, etc.
- Use of cross product within GO
- extend annotation with cross-product development
Board: suggestion of using cross products to populate binding (ChEBI x binding)
David: Chris has done a similar thing with Cell
Annotation cross products: compound annotation terms.
Use of column 16 to point to other ontologies. Midori talks about the extended one.
The relationship comes from the Relation Ontology, plus a term from another ontology.
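A minimal sketch of how such a column-16 entry might be read, assuming a `relation(ONTOLOGY:ID)` syntax with "," joining the parts of one compound statement and "|" separating alternatives. The function name, the relation names and the IDs used with it are illustrative assumptions, not something agreed at the meeting.

```python
import re

# Assumed syntax (hypothetical for these minutes): relation(ONTOLOGY:ID)
EXTENSION_RE = re.compile(r"^\s*([A-Za-z_]+)\(([A-Za-z_]+:[A-Za-z0-9_]+)\)\s*$")


def parse_extension(column16):
    """Return a list of alternatives; each is a list of (relation, term_id) pairs."""
    if not column16.strip():
        return []
    alternatives = []
    for alternative in column16.split("|"):
        pairs = []
        for part in alternative.split(","):
            match = EXTENSION_RE.match(part)
            if match is None:
                raise ValueError(f"cannot parse extension: {part!r}")
            # group(1) is the relationship, group(2) the term in the other ontology
            pairs.append((match.group(1), match.group(2)))
        alternatives.append(pairs)
    return alternatives
```

Each pair is a relationship plus a term from another ontology, so a tool that ignores column 16 still sees the plain GO annotation unchanged.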
Board: could this be made available for an RDF triple store (semantic web technology)?
Suzi: technically feasible, but do we have the time? RDF files are very large. GAF files were made at a time when tab delimited seemed like a good thing (RDF not available at the time).
Board: will you make cross products explicit to expand ontology logarithmically or keep it in the annotation?
Suzi: keep it in the annotations unless it comes up a lot, then perhaps make it a part of the ontology.
Board: other software developers will need to be informed, as all of this will break tools
(concern with microarray analysis tools that use GO, etc.).
David: trade off, make old stuff available.
Mike C: we have old flat files for people that use old tools, etc.
Regulates relationship: aided in finding missing terms (since there is a regulation of x, is there a process X, etc.).
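A minimal sketch of that missing-term check, assuming it is done on term names and that regulation terms follow a "[positive |negative ]regulation of X" naming pattern. The pattern, the function name and the toy names in the usage note are assumptions for illustration; the real check may work on relationships rather than names.

```python
import re

# Assumed naming pattern for regulation terms (hypothetical simplification).
REGULATION_RE = re.compile(r"^(?:positive |negative )?regulation of (.+)$")


def missing_regulated_processes(term_names):
    """Return process names implied by regulation terms but absent from the set."""
    names = set(term_names)
    missing = set()
    for name in names:
        match = REGULATION_RE.match(name)
        # "regulation of X" exists: is there a process X?
        if match and match.group(1) not in names:
            missing.add(match.group(1))
    return sorted(missing)
```

For example, given "regulation of cell growth" and "cell growth" plus a "negative regulation of" term whose target is absent, only the absent target is reported.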
OBO-Edit built-in checks: spelling, spaces, is_a path must be there, reasoner check: disjoints in the ontology
QC Jane and PAMGO, Tanya and David: regulates relationship.
Up to date on report
OBO-Edit itself will do the QC eventually
QC reports include:
1. missing links (is_a, part_of), inferred links.
if QC doesn't make sense, it is a problem in the ontology.
2. internal consistency reports: regulates part of a graph should mirror process part of the graph.
3. parsing report:
a. unexpected parse (wrong place in ontology)
b. hierarchy: parentage check
c. missing_links
d. terms that don't parse: the way biologists would find
4. all terms in GO process part of two parents: can this happen?
distinguish has_part from part_of
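A minimal sketch of the "is_a path must be there" check from the QC list above, assuming the ontology is available as a mapping from each term to its is_a parents only (part_of and other links left out). The function name and the data layout are assumptions for illustration, not the OBO-Edit implementation.

```python
def terms_without_is_a_path(is_a_parents, root):
    """Return terms that cannot reach `root` by following is_a links only."""
    reaches_root = {root}

    def reaches(term, seen):
        if term in reaches_root:
            return True
        if term in seen:
            # already visited in this search (also guards against cycles)
            return False
        seen.add(term)
        for parent in is_a_parents.get(term, ()):
            if reaches(parent, seen):
                reaches_root.add(term)  # remember the result for later terms
                return True
        return False

    return sorted(term for term in is_a_parents if not reaches(term, set()))
```

Terms reported here are the ones a missing-links (is_a) report would list; whether the fix is a new is_a link or a moved term is still a curator decision.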
279 terms, added 40, changed 270 names, 272 dbxrefs added.
jan: 969 new is_a based on the reasoner
see wiki
Board: does this help biologists
yes; information content increases (Suzi)
take old paper and redo (Barry wants to see if now we can do better)
Board: Proof to convince that all ontologies should be done like this
This HAS helped with DNA recombination: the multiple part_of check fixed confusion.
Regulates went live, will further expand: first cross products will be regulates: processes that regulate MF and BP, and MFs that regulate MFs and BPs, etc.
Board: no attempt to order?
no we are not creating a pathway mirror, we are just flagging participants in a process
Do we incorporate sensu information?
David: split: but some biologists don't want
Barry suggested to David and me that we could also have "occurs-before", etc. relationships for pathways in GO, so we can order the steps timewise
sometimes_has_parts
Low-hanging fruit for part_of
David: people slim over relationships: NOT right!
single step process are easy (part_of)
note to me: get MetaCyc, Reactome, and KEGG
Barry: don't call it "sometimes part of", call it in_some_cases_part_of
Board: doesn't this belong in the annotation db?
Judy: the source of the ontology is actually due to annotation (experiments)
David: structured annotation? Make the best part_of between all of MF to BP
what if you have an MF where you don't know: David: change the link
Barry: annotate like this: "this MF, in_context of a pathway, is part_of this process"
This is a 3 place statement vs a 2 place statement
Suzi: importance of an annotation aid
Board: Keep annotation for those things not ready to be promoted to the ontology
Mentions: feature freeze in April; 14 updates since April
Rule base reasoner, config manager (no command line)
Prior: feature requests from curators;
Obo-editors make the call
to be released at end of year
some users run off of Sourceforge (SEN)
Board: to gain new users, compare and contrast with Protege
Judy: write document as part of release.
Michael: get a third party to do.
Michael: no point making it if it isn't used
Help-Desk
average 50 calls/month
response time short for most
query types: annotation is consistently higher
Board: how do you handle tool requests?
Mike: tool pages
FAQ pages: big and can get out of date
Web documentation BIG
AmiGO help docs
Tools
Thinking about: RSS feed: new items, technical announcements, software/db release, new term additions, general stats?
Board agrees on RSS feed
Newsletter:
8 editor team; released quarterly.
Board: don't like “gene of the quarter”; not valuable
Judy: utility of the newsletter. Is it worth it? We can take a few issues for handouts, etc. for exposure at meetings; can we outsource the formatting, etc.?
Board: time sensitive handouts are not good idea for meeting handouts
Board: have you had a measure of utility? Survey?
Timely info on web more useful
Board: who IS your user community?
Suzi: curators who annotate
Jane: microarray: no, they use a program that uses GO
Board: tool developers
users (people who made the annotation) vs beneficiaries (larger scientific community)
How do you evaluate:
beneficiaries: how GOOD is GO
Board: how many pubs reference GO in abstract and/or citation?
Spotfire users (that use GO)
Richard: for DNA repair, use amazon-like tool: “you visited this term, you might be interested in this other term”
IS there a public forum for GO?
GO-friends, GO-list
Board (Simon): the new generation assumes there always was GO
web tutorial videos?
Eurie: no one cites GO even if they use it, just like no one cites BLAST when they use it.
Board: what is our relationship with tool developers?
we publish their tools
Do we ask for feedback from them?
Mike C: people use GenBank, but we don't get contacted even if there are hundreds of tools.
Peter Good: how much info we provide for the tools?
we need to increase: ask developers to write documentation we can provide
Michael A: there is a good way to get developers on, but no way to get them off. We will find out: if no reply, remove the tool
We are asked to review papers on tools: do they use GO properly?
Judy: we have a standard set of things to look for
GO in pubmed going up as a % of biomed ontologies used
by putting a tool on the GO site we become the authority on the tool (recommended)
Board: allow user comments on tools as part of the display page?
Suzi: user meetings:
too unfocused (Midori)
mostly tool developers
Mike: but it depends on WHERE the meeting is
Midori: first one was good;
first GOC paper (Nature paper): cited 2608 times
anyone dealing with > 10 genes seems to use GO
Stanford: site of AmiGO, wiki, CVS, mailing lists, GO DB, GAF filters
Suzi's group makes software for the loads
new: dual taxon features
db for Amigo gets loaded 2-3 days
web stats: Google Analytics (geneontology.org)
April to Oct.: 101,154 unique hits
amigo not included
amigo uses (oct 10-19)
22628 for term-details
9418 for amino search
http: index: 22722
9641 referred from Google
37563 from NCBI
Board: would like to see progress graphs as % of total genes/organism
Mike C: hard to do because some groups annotate transcripts, etc.
Judy: consistent gene indexes
Board: want to see a timeline within a species
> NDs: the community should look at. < (my idea)
Barry: IEA and connection to cross products.
we need to be careful (low hanging fruit ok).
Barry: complaints he hears about GO: annotations are incomplete; is there more known that doesn't get in?
Barry: group comparisons of annotation progress?, correctness
byproduct of refgenome projects?
MikeC: training issues, background, experimental model, number of curators available to each database,
Suzi
Fully (breadth AND depth) and reliably annotated genomes
- empower scientific research
- need for use in automatic inferences
Comprehensively capture the exp. data from the most active research communities producing high-confidence functional descriptions, to leverage the power of the comparative method for inference. 12 genomes
deliverables of reference genome effort:
- Proteome sets
- annotation best practice documentation
- annotation software tool
- reference annotations for inference of function in other species.
Paul: seminar about pipelines to get equivalogs
largest part of the sequence of each gene (all of the exons).
Board: you need to check with other resources
inference is first made to common ancestor, then to extant
QC: gp2protein file has 1 representative per gene. fill in?
UniProt record includes all alternatively spliced exons (but not attributed to the canonical protein in all cases).
write white paper to describe needs; approach Amos with this
Board: How do you determine duplication node vs speciation node?
Paul: multiple copies beyond that point
slower evolution: more conserved
lead refgenome curator and protein family curator work together to define sets of genes to be annotated concurrently
no need to review by modification
homology inference is actually 2 inferences
- common ancestor has same annotation
- annotation propagated forward to another (unannotated) downstream gene
MF propagation easy, BP needs intervention.
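A minimal sketch of the two inferences on a toy gene tree, assuming the ancestral annotation is simply the union of the experimental annotations on its descendants. That union rule, the function name and the node names are simplifying assumptions for illustration; in practice a curator decides what the ancestor gets, especially for BP.

```python
def propagate_annotations(children, experimental, ancestor):
    """Two-step homology inference on a toy gene tree.

    children: node -> list of child nodes (nodes without children are extant genes)
    experimental: gene -> set of experimentally supported term IDs
    Returns (terms inferred for the ancestor, gene -> terms inferred by propagation).
    """
    def leaves(node):
        kids = children.get(node, [])
        if not kids:
            return [node]
        return [leaf for kid in kids for leaf in leaves(kid)]

    all_leaves = leaves(ancestor)

    # Inference 1: the common ancestor is assumed to have the same annotation.
    ancestral = set()
    for leaf in all_leaves:
        ancestral |= experimental.get(leaf, set())

    # Inference 2: propagate forward to descendants with no experimental annotation.
    inferred = {
        leaf: set(ancestral) for leaf in all_leaves if not experimental.get(leaf)
    }
    return ancestral, inferred
```

Keeping the two steps separate makes it possible to review the ancestral annotation before anything is propagated forward.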
Board: how sensitive to?
Rex: help annotate emerging genomes that have no resources.
Board: worry about multiple duplications that end up with new/different function: need curation
Richard: Is it foolhardy to generate tree with bacteria and human; how much could you propagate.
For MF, will be useful, other slowly evolving genes
Board: concept of “reference genome” (Richard).
Someone: propagation controlled , it's the organisms that you can easily DO experiments in
AmiGO as best way to display this information; it is the web interface of the GO
Search and browse
BLAST
term enrichment and slimming
visualizations
friendly SQL interface.
new features of 1.6
homolog sets (ref genome data)
visualization
integration
homology details
Community interactions
integration of wiki resources
homolog sets:
integrate new info with old info
gene product searchable
gene product details
who is the user: final version (don't show what is considered bad data).
anybody can make comments
amigo uses a live connection
users can affect GO pages in real time
new possibilities:
different info from different communities
make sure curators get feedback from community
see, e.g., post-transcriptional gene silencing in RNA (GO:0035194) in GONUTS
Future:
term requests built into AmiGO (replace SF)
speed and clarity improvements
complex searches
text based (when on AmiGO: Google toolbar is now searching the GO)
and ontology based (shared parents, etc.)
filtering out what you don't want vs what you do want
amigo as a resource
term completion
Board: do you have web services? not yet
Board: what genes are missing vs present in the ref genome sets (list: important to know; implied deletion)
Board: versioning: ?
date of GO, date of annotations
Discussed:
- generation of sets
- make literature based annotation
- annotator of protein families
- propagation of the annotations back
- consistency
monthly conference
electronic jamborees
identifies problem areas in ontology content as well as curator inconsistencies (interpretation of guidelines).
6. improving annotation documentation
7. new annotators
Board: has the GO project led to other things?
P: Has generated Biocurator Society
Board: publish more on
and then....... dum de dum dum.. dum de dum dum... dum de dum de dah.. (Dragnet):
We want to see:
1. want to see more effective (e.g. faster) adoption of best practices, representation improvements, etc.
e.g., cross product development is too slow
more active support from leadership
2. value in consistency checks, SOP development, etc.; incorporate these checks into the day-to-day editing system, for example in OBO-Edit itself
3. get a publication out of OBO-Edit 2.0 development to establish community presence; get an independent group of ontology developers to compare OBO-Edit vs Protege
4. get more of the human model organism community involved (human is the best model organism for disease). Would have liked to have seen the human MOD represented; need to encourage that community
CSI community cpsa programme, medical informatics, so may not be the best group, but there may be members that are more bioinformatics types that would be good to bring in.
(Emily mentions new kidney grant and London group.)
How about the Gates foundation? Maybe if could be related to malaria or similar.
5. outreach: involve graduate students in annotation as part of qualifying exams, etc.
They feel newsletters have limited value. Standard pamphlet that is more time proof would be better.
Opportunities to use statistics, esp. usage stats, to figure out who uses which GO things for what
Talk to journals to get GO into articles: community outreach through journal publishers would be good. Richard Scheuermann thinks there are opportunities to ask journals to mark up their articles with GO terms.
How to get authors to supply GO terms, etc.;
how to get journals to comply
Midori: Royal Soc. for chem. has come to us to mark up their journals with GO terms, SO, etc.
Board: get more journals doing what RSC does, as the journals are all going online now. Might be a good thing.
Michael A thinks this might be a waste of our time as the landscape is changing quickly and there are tools coming online that would allow authors to mark up at the time of publication.
Board: do this instead of newsletter.
Board: automatic mark up, community mark up?
8. On MFxBP: start simple to get low-hanging fruit; accept impact on all of the tools; all tools MUST examine relationships. This will need community outreach, and GO should provide help on how tools can be modified
They all agree that it is important to be able to make the links where the links are clear. Start simple with part_of, universal and non-context-dependent. Do that first before doing fancy context-dependent links. Also need to assess the effect of these links on the hierarchical structure of the graph and on tools. May really screw up the inferences that they make. Tools need to distinguish between types of links. Assessing effects of these links on tools is going to be important community outreach. If we screw up tools then that will be bad community karma.
David mentions that we always provide the file without the links too. When we announce changes we should provide an example of software that does this properly.
Genome Biology brought this up. We also had major editorial in Nature the other week on good integration practices.
Board: integration of reactome (quick);
MA: the joint GO/KEGG/MetaCyc/Reactome part got cut from the grant
Suzi: need more continuity of SAB: Suzi - hoping for perspective on user communities; asks for suggestions for new SAB members
Phil suggests biologist [maybe "guest" member?? not sure I caught that exactly] in targeted area, e.g. for 2009 signal transduction
If we are looking at signal transduction then the next SAB should include a signal transduction person to comment on that. Would be able to give targeted feedback. Integration of pathway databases should be pushed and used as a consistency check. That way we could nudge them to get consistent.