2008-10 SAB Meeting Minutes

SAB Thursday

Introductions

Michael Schroeder, Richard Scheuermann, Barry Smith, Phil Bourne, Michael Tyers, Simon Tavare, Larry Hunter (not here)

Board Opening Comment: would like to have seen the old report from the last SAB meeting, and a report on how any suggestions have been addressed, before today.


Intro: Midori:

Professional software tools that track projects should be used?

We use Sourceforge which was actually designed for software projects; should we change?

Midori: We use SF for term requests, not for software development

Time/cost in switching. It works

Suzi: We consciously separate content development from software development


Board: base camp for project management.

Board: is there an evolving project tracker for specific policies (like how regulation is always handled, etc.)?

Midori did part of Chris's presentation on cross products. Building complex term defs from terms of lesser complexity.


  1. development of cross products to other ontologies such as Chebi, Cell, etc.
  2. Use of cross product within GO
  3. extend annotation with cross-product development
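
A minimal sketch of the cross-product idea described above (a complex term built from terms of lesser complexity). The class, field names, relation name and labels are all hypothetical and chosen for illustration; they are not the format Chris or Midori presented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossProduct:
    """A compound term: a genus term refined by a relation to another term."""
    genus: str        # simpler term being refined
    relation: str     # relation linking the genus to the differentia
    differentia: str  # term from the same or another ontology

    def name(self) -> str:
        # Compose a label from the two simpler terms.
        return f"{self.differentia} {self.genus}"

    def definition(self) -> str:
        # Compose a textual definition from the parts.
        return f"A {self.genus} that {self.relation} {self.differentia}."


# Hypothetical labels, for illustration only.
xp = CrossProduct(genus="binding", relation="has_participant", differentia="example chemical")
print(xp.name())
print(xp.definition())
```

Keeping the parts separate is what lets the same pattern serve all three uses listed above: the differentia can come from another ontology, from GO itself, or from an annotation.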

Board: suggestion of using cross products to populate binding (ChEBI x binding)

David: Chris has done a similar thing with cell

Annotation cross products: compound annotation terms.

use of column 16 to point to other ontologies. Midori talks about extended one

Each entry combines a relationship from the relationship ontology with a term from another ontology.
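
A rough sketch of how a tool might read such column 16 entries. It assumes the `relation(ID)` syntax, with `,` joining parts of one compound statement and `|` separating independent statements, as later written into the GAF 2.0 format; the minutes do not record the exact syntax discussed. The function name and the IDs in the usage line are illustrative only.

```python
import re

# One extension looks like relation(ID), e.g. part_of(CL:0000084).
_EXTENSION = re.compile(r"^\s*([A-Za-z_]\w*)\(([^()\s]+)\)\s*$")


def parse_extension(column16: str):
    """Return a list of groups; each group is a list of (relation, term_id)."""
    groups = []
    for alternative in column16.split("|"):
        if not alternative.strip():
            continue  # empty column or stray separator
        group = []
        for item in alternative.split(","):
            match = _EXTENSION.match(item)
            if match is None:
                raise ValueError(f"unparseable extension: {item!r}")
            group.append((match.group(1), match.group(2)))
        groups.append(group)
    return groups


# Illustrative value, not taken from a real annotation file.
print(parse_extension("part_of(CL:0000084),occurs_in(GO:0005739)|has_input(CHEBI:15422)"))
```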


Board: could this be made available for an RDF triple store (semantic web technology)?

Suzi: technically feasible, but do we have the time? RDF files are very large. GAF files were made at a time when tab delimited seemed like a good thing (RDF not available at the time).


Board: will you make cross products explicit, expanding the ontology logarithmically, or keep them in the annotation?

Suzi: keep it in the annotations unless it comes up a lot, then perhaps make it a part of the ontology.


Board: other sw developers will need to be informed, as all of this will break tools

(concern with micro-array analysis that use GO, etc.).

David: trade off, make old stuff available.

Mike C: we have old flat files for people that use old tools, etc.


David and QC


Regulates relationship: aided in finding missing terms (since there is a regulation of x, is there a process X, etc.).
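
A small sketch of the missing-term check mentioned above. It works on term names only, which is a simplifying assumption: the check described in the meeting used the regulates relationship itself. The function name and example names are hypothetical.

```python
# Longer prefixes first, so "positive regulation of X" yields X.
PREFIXES = ("positive regulation of ", "negative regulation of ", "regulation of ")


def missing_regulated_terms(term_names):
    """Map each 'regulation of X' term to X when X is not itself a term."""
    names = set(term_names)
    missing = {}
    for name in names:
        for prefix in PREFIXES:
            if name.startswith(prefix):
                target = name[len(prefix):]
                if target not in names:
                    missing[name] = target
                break  # only the first matching prefix applies
    return missing


# Made-up term names for illustration.
example = ["cell growth", "regulation of cell growth", "positive regulation of process X"]
print(missing_regulated_terms(example))
```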

OBO-Edit built-in checks: spelling, spaces, is_a path must be there; reasoner check: disjoints in the ontology

QC Jane and Pamgo, Tanya and David: regulates relationship.

Up to date on report

OBO_Edit itself will do the QC eventually

QC reports include:

1. missing links (is_a, part_of), inferred links.

If the QC report doesn't make sense, it is a problem in the ontology.

2. internal consistency reports: regulates part of a graph should mirror process part of the graph.

3. parsing report:

a. unexpected parse (wrong place in ontology)

b. hierarchy: parentage check

c. missing_links

d. terms that don't parse : the way biologists would find

4. all terms in GO process part of two parents: can this happen

distinguish has_part from part_of
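
The internal consistency report in item 2 (the regulates part of the graph should mirror the process part) could be sketched roughly as follows. The data layout is an assumption made for illustration: direct is_a links as (child, parent) pairs, and one regulating term per process. Real GO has several regulators per process (positive, negative), so this is a simplification and not the actual QC script.

```python
def mirror_gaps(is_a, regulates):
    """Find is_a links expected among regulation terms but absent.

    is_a: set of (child, parent) direct links.
    regulates: dict mapping a regulation term to the process it regulates.
    """
    regulator_of = {target: reg for reg, target in regulates.items()}
    gaps = set()
    for reg, target in regulates.items():
        for child, parent in is_a:
            # If the regulated process has a parent that is also regulated,
            # the two regulation terms should be linked the same way.
            if child == target and parent in regulator_of:
                expected = (reg, regulator_of[parent])
                if expected not in is_a:
                    gaps.add(expected)
    return sorted(gaps)


# Illustrative labels only.
links = {("mitotic cell cycle", "cell cycle")}
regs = {
    "regulation of mitotic cell cycle": "mitotic cell cycle",
    "regulation of cell cycle": "cell cycle",
}
print(mirror_gaps(links, regs))
```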


279 terms , added 40, changed 270 names, 272 dbxrefs added.

jan: 969 new is_a based on the reasoner

see wiki


Board: does this help biologists

yes; information content increases (Suzi)

take old paper and redo (Barry wants to see if now we can do better)

Board: Proof to convince that all ontologies should be done like this

This HAS helped with DNA recombination: the multiple part_of check fixed confusion.


Regulates went live, will further expand: first cross products will be regulates: processes that regulate MF, and BP, and MF's that regulate mfs and bp, etc.


Harold on FXP Metabolism

Board: no attempt to order?

no we are not creating a pathway mirror, we are just flagging participants in a process


Do we incorporate sensu information???

DAVID: split: but some biologists don't want

Barry asked David and me whether we should also have “occurs-before”, etc. relationships for pathways in GO so that we can order the steps timewise


Jen on FXP

sometimes_has_parts


Low-hanging fruit for part_of


David: people slim over relationships: NOT right!


single step process are easy (part_of)

note to me: get MetaCyc, Reactome, and KEGG


Barry: don't call it “sometimes part of” call it in_some_cases_part_of


Board: doesn't this belong in the annotation db?

Judy: the source of the ontology is actually due to annotation (experiments)


David: structured annotation? Make the best part_of between all of MF to BP

what if you have an MF where you don't know: David: change the link


Barry: annotate like this: “this MF, in_context of a pathway, is part_of this process”

This is a 3 place statement vs a 2 place statement


Suzi: importance of an annotation aid


Board: Keep annotation for those things not ready to be promoted to the ontology


Amina on OE2

Mentions: feature freeze in April; 14 updates since April

Rule base reasoner, config manager (no command line)


Prior: feature requests from curators;

Obo-editors make the call

to be released at end of year


some users run off of Sourceforge (SEN)


Board: to gain new users to compare and contrast to Protege

Judy: write document as part of release.


Michael: get a third party to do.


Jane on Advocacy

Michael: no point making it if it isn't used


Help-Desk

average 50 calls /month

response time short for most

query types: annotation is consistently higher


Board: how do you handle tool requests?

Mike: tool pages


FAQ pages: big and can get out of date

Web documentation BIG

AmiGO help docs

Tools

Thinking about: RSS feed: new items, technical announcements, software/db release, new term additions, general stats?

Board agrees on RSS feed


Newsletter:

8 editor team; released quarterly.


Board: don't like “gene of the quarter”; not valuable

Judy: utility of the newsletter. Is it worth it? We can take a few issues as handouts, etc. for exposure at meetings; can we outsource the formatting, etc.?


Board: time sensitive handouts are not good idea for meeting handouts


Board: have you had a measure of utility? Survey?

Timely info on web more useful


Board: who IS your user community:

Suzi: curators who annotate

Jane: micro-array: no, they use a program that uses GO

Board: tool developers


users (people who make the annotations) vs beneficiaries (larger scientific community)


How do you evaluate:

beneficiaries: how GOOD is GO


Board: how many pubs reference go in abstract and/or citation?


Spotfire users (that use GO)


Richard: for DNA repair, use amazon-like tool: “you visited this term, you might be interested in this other term”


IS there a public forum for GO


Go-friends, GO-list


Board: Simon: GO:new generation assuming there always was GO

web tutorial videos?


Eurie: no one cites GO even if they use it, just like no one cites BLAST when they use it.


Board: what is our relationship with tool developers

we publish their tools

DO we ask for feedback from them?

Mike C: people use genbank but ; we don't get contacted even if there are hundreds of tools.

Peter Good: how much info we provide for the tools?

we need to increase: ask developers to write documentation we can provide


Michael A: good way to get developers on, no way to get off: We will find out if-no replies remove tool


We are asked to review papers on tools: do they use go properly?


Judy we have a standard set of things to look for


GO in pubmed going up as a % of biomed ontologies used

by putting tool on go site we become the authority on the tool (recommended)

Board: allow user comments on tools as part of the display page?


Suzi: user meetings:

too unfocused (Midori)

mostly tool developers

mike: but depends on WHERE the meeting

Midori: first one was good;


Mike on Annotation stats


first GOC paper: Nature paper, 2608 citations


anyone dealing with > 10 genes seems to use go

Stanford: site of AmiGO, wiki, CVS, mailing lists, GO DB, GAF filters


Suzi group makes software for the loads


new: dual taxon features


db for Amigo gets loaded 2-3 days


web stats: Google Analytics (geneontology.org)

April to Oct.: 101,154 unique hits

amigo not included


amigo uses (oct 10-19)

22628 for term-details

9418 for AmiGO search


http: index: 22722

(9641 referred from Google)

37563 from NCBI


Board: would like to see progress graphs as % of total genes/organism

Mike C: hard to do because some groups annotate transcripts, etc.


Judy: consistent gene indexes


Board: want to see a timeline within a species


> NDs: the community should look at. < (my idea)


Barry: IEA and connection to cross products.

we need to be careful (low hanging fruit ok).

Barry: complaints he hears about go: annotations are incomplete; is there more known that doesn't get in.


Barry: group comparisons of annotation progress?, correctness


byproduct of refgenome projects?


MikeC: training issues, background, experimental model, number of curators available to each database,


Reference Genome Project

Suzi

Fully (breadth AND depth) and reliably annotated genomes

  • empower scientific research
  • need for use in automatic inferences

Comprehensively capture the experimental data from the most active research communities, producing high-confidence functional descriptions to leverage the power of the comparative method for inference. 12 genomes.


deliverables of reference genome effort:

  1. Proteome sets
  2. annotation best practice documentation
  3. annotation software tool
  4. reference annotations for inference of function in other species.

Paul: seminar about pipelines to get equivalogs


largest part of the sequence of each gene (all of the exons).


Board: you need to check with other resources


inference is first made to common ancestor, then to extant


QC: gp2protein file has 1 representative per gene. Fill in?


UniProt record includes all alternatively spliced exons (but not attributed to the canonical protein in all cases).


write white paper to describe needs; approach Amos with this


Board: How do you determine duplication node vs speciation node?

Paul: multiple copies beyond that point


slower evolution: more conserved


lead refgenome curator and protein family curator work together to define sets of genes to be annotated concurrently

no need to review by modification


homology inference is actually 2 inferences

  1. common ancestor has same annotation
  2. from the ancestor to another (unannotated) downstream gene (propagated forward)

MF propagation easy, BP needs intervention.
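
The two inference steps above can be sketched as follows. This is only an illustration under stated assumptions: the tree is a plain dict of node to children, the ancestor is given the union of its descendants' experimental annotations, and every unannotated descendant inherits all of them. In the process described here a curator makes these calls, which is the intervention BP needs. All names and IDs below are hypothetical.

```python
def propagate(tree, experimental, ancestor):
    """Two-step homology inference over one subtree.

    tree: dict mapping a node to its list of children (leaves have none).
    experimental: dict mapping a leaf gene to its set of annotated terms.
    Returns (ancestral_terms, inferred), where inferred maps each
    unannotated leaf to the terms propagated forward to it.
    """
    def leaves(node):
        children = tree.get(node, [])
        if not children:
            return [node]
        found = []
        for child in children:
            found.extend(leaves(child))
        return found

    members = leaves(ancestor)

    # Step 1: infer the common ancestor's annotation from extant genes.
    ancestral = set()
    for gene in members:
        ancestral |= experimental.get(gene, set())

    # Step 2: propagate forward to genes with no experimental annotation.
    inferred = {gene: set(ancestral) for gene in members if gene not in experimental}
    return ancestral, inferred


# Hypothetical family tree and annotation.
family = {"ancestor": ["gene_a", "node_1"], "node_1": ["gene_b", "gene_c"]}
known = {"gene_b": {"GO:0000001"}}
print(propagate(family, known, "ancestor"))
```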


Board: how sensitive to?

Rex: help annotate emerging genomes that have no resources.


Board: worry about multiple duplications that end up with new/different function: need curation


Richard: Is it foolhardy to generate tree with bacteria and human; how much could you propagate.


For MF, will be useful, other slowly evolving genes


Board: concept of “reference genome” (Richard).


Someone: propagation controlled , it's the organisms that you can easily DO experiments in


Seth

Amigo as best way to display this information; it is the web interface of the GO

Search and browse

blast

term enrichment and slimming

visualizations

friendly sql interface.


new features of 1.6

homolog sets (ref genome data)

visualization

integration

homology details

Community interactions

integration of wiki resources


homolog sets:

integrate new info with old info

gene product searchable

gene product details


who is the user: final version (don't show what is considered bad data).


Community interaction with Wikis

anybody can make comments

amigo uses a live connection

users can affect GO pages in real time

new possibilities:

different info from different communities

make sure curators get feedback from community


see, e.g., post-transcriptional gene silencing in RNA (GO:0035194) in GONUTS


Future:

term requests built into AmiGO (replace SF)

speed and clarity improvements

complex searches:

text based (when on amigo: google tool bar is now searching the GO )

and ontology based (shared parents, etc.)

filtering out what you don't want vs what you do want

amigo as a resource

term completion


Board: do you have web services? not yet

Board: what genes are missing vs present in the ref genome sets (the list is important to know; implied deletion)


Board: versioning: ?

date of GO, date of annotations


Pascale and Consistency and Documentation

Discussed:

  1. generation of sets
  2. make literature based annotation
  3. annotator of protein families
  4. propagation of the annotations back
  5. consistency

monthly conference

electronic jamborees

identifies problem areas in ontology content as well as curator inconsistencies (interpretation of guidelines).

6. improving annotation documentation

7. new annotators


Board: has the GO project led to other things?

P: Has generated Biocurator Society

Board: publish more on


and then....... dum de dum dum.. dum de dum dum... dum de dum de dah.. (Dragnet):


The Verdict:

We want to see:

1. want to see more effective (e.g. faster) adoption of best practices, representation improvements, etc.

e.g., cross product development is too slow

more active support from leadership

2. value in consistency checks, SOP development, etc.; incorporate these checks into the day-to-day editing system, for example OBO-Edit itself

3. get a publication out of OBO-Edit 2.0 development to establish community presence; get an independent group of ontology developers to compare OBO-Edit vs Protege

4. get more of the human model organism community involved (human is the best model organism for disease). Would have liked to have seen the Human MOD better represented; need to encourage that community.

CSI community cpsa programme, medical informatics, so may not be the best group, but there may be members that are more bioinformatics types that would be good to bring in.

( Emily mentions new kidney grant and london group.)

How about the Gates foundation? Maybe if could be related to malaria or similar.


5. outreach: involve graduate students in annotation as part of qualifying exams, etc.

They feel newsletters have limited value. Standard pamphlet that is more time proof would be better.

Opportunities to use statistics, esp. usage stats, to figure out who uses which GO things for what

Talk to journals to get GO into articles: community outreach through journal publishers would be good. Richard Scheuermann thinks there are opportunities to ask journals to mark up their articles with GO terms.

How to get authors to supply go terms, etc;

how to get Journals to comply


Midori: Royal Soc. for chem. has come to us to mark up their journals with GO terms, SO, etc.


Board: get more journals doing what RSC does, as the journals are all going online now. Might be a good thing.

Michael A thinks this might be a waste of our time as the landscape is changing quickly and there are tools coming online that would allow authors to mark up at the time of publication.

Board: do this instead of newsletter.


Board: automatic mark up , community mark up ?


8. On MFxBP: start simple to get the low-hanging fruit; accept the impact on all of the tools; all tools MUST examine relationships. This will need community outreach, and GO should provide help on how tools can be modified.

They all agree that it is important to be able to make the links where the links are clear. Start simple with universal, non-context-dependent part_of. Do that first before doing fancy context-dependent links. Also need to assess the effect of these links on the hierarchical structure of the graph and on tools; they may really screw up the inferences that tools make. Tools need to distinguish between types of links. Assessing effects of these links on tools is going to be important community outreach. If we screw up tools then that will be bad community karma.

David mentions that we always provide the file without the links too. When we announce changes we should provide an example of software that does this properly.


Genome Biology brought this up. We also had major editorial in Nature the other week on good integration practices.


Board: integration of reactome (quick);

MA: joint GO Kegg Metacyc Reactome grant part got cut from grant

Suzi: need more continuity of SAB: Suzi - hoping for perspective on user communities; asks for suggestions for new SAB members

Phil suggests biologist [maybe "guest" member?? not sure I caught that exactly] in targeted area, e.g. for 2009 signal transduction


If we are looking at signal transduction then the next SAB should include a signal transduction person to comment on that; they would be able to give targeted feedback. Integration of pathway databases should be pushed and used as a consistency check. That way we could nudge them to get consistent.