Talk:2010 Bar Harbor Agenda

Minutes

Tuesday Sept 7, Day 1 (Morning until lunch)

Introductions
- Mindy Dwinell from RGD, taking over Simon's position
- Sven's first GO meeting

Grant Aims (Suzi)

Suzi went over the Specific Aims that the PIs have put together for the grant renewal. Please download the current draft of the specific aims from CVS to see the details.

Aim 1:
- Building the ontology- regulates relationship, is_a, connecting to external ontologies (ChEBI, Cell Ontology)
- Ref. genome- a lot of progress
- General annotation- MODs and WGs coming together, coming up with standards; good progress.

So to some extent our aims will remain the same.

- How can we get detailed/rich annotations?
- Quality control group
- Rate limiting step- literature curation. Improve on item e. Richness is vague. Elaborate on that item.

Aim 2: Ref.genome
- improve software

Aim 3:
3b- Infrastructure to approach bacterial folks to make annotations. Do we have a better relationship with JGI now? Suzi- Yes. She has been invited to give a talk. Jonathan Eisen is involved.
How to find out who is doing what- there are a lot of C. elegans-pathogen papers. Where do you send the pathogen information? There is no infrastructure for that. It is not that we are having difficulty identifying genes to annotate.
We want to build the ontology automatically. ChEBI is not requesting terms from us. It is the use of other ontologies to build GO.

Aim 4:
How are we doing? Metrics? Are we meeting our goals? We should provide some sense of completeness, breadth of annotation, etc. Scientists who use the GO should provide metrics on how GO worked. Some papers say GO is great, some papers use other resources, and so on.
Just because a paper used KEGG doesn't mean the authors did not like GO. They probably did not get the results they wanted with KEGG. If people are finding GO to be complex, we should provide cuts of GO (GO slims). There are two kinds of consumers: MODs and groups like Reactome, and consumers who stay at a distance and plug their data into GO. Many people go away because they don't find the data in GO (not annotated). Karen's comment- a user said 'I can't remember if the term is in the MF or BP'.
Mary- metrics with 'us'. How do we (MODs) use it? Metrics on how many papers have been curated by the MODs are important as well.
How about we pair up with a specific group (say, cancer biologists) and do analysis with them (now and after 3 months)?
If we got more mappings to external databases, then users wouldn't have to learn a new system.
Talk about IEA mappings (parking lot discussion)
Leverage existing tools (PAINT and PANTHER families) and what we have in GO and do a first pass. Get broad coverage. Depth coverage is costly.
Some genes have too many GO annotations; it is hard to say which is the main function, etc.
Identify areas that are underannotated based on which parts of the ontology are well developed, and have curators work on those areas.
NOTCH signaling pathway IEA- from Kimberly. A lot of organismal annotations, but not a single one talks about Notch signaling.

Tanya- We need to give an idea on % of effort for each item.

Another thought - ask the program officer if we get funding from 2 different agencies.

Going through list of Tasks from the Cambridge meeting (the long table on http://gocwiki.geneontology.org/index.php/Cambridge_GO_Consortium_Meeting).
- talk about Gold standard set so that we can compare annotations
- did not do a usability study

- ACTION ITEM:
  HOMEWORK-everybody should write specific aims for this project. Download the Word document, turn on "track changes", add your comments, and email back to the GO-tops.

Annotation Advocacy Group Report (Rama)

Slides- File:BHB2010.pdf
Rama went through the recent accomplishments of the annotation advocacy group. One of the main activities of this group was the annotation camp. Conclusions from the working groups were presented just as an FYI to all consortium members.

Ruth gave report on binding working group (slides are part of Rama's presentation)

Discussion on cross-species expression. Should reciprocal annotations be made for these? Discussion on how to annotate these, what should be in column 8 versus 16.
Discussion of whether reciprocal annotations are mandatory, even within the same species, e.g. calmodulin and myosin.

- should GO still be making protein binding annotations - David brought up idea that "protein binding" is isolated within GO, and cannot be linked to specific processes. Kara brought up idea that maybe protein-protein interactions should be left to some other database (IMEX, BioGrid, etc) rather than directly manually annotated by GO curators. There was debate about whether GO should automatically make "protein binding" annotations (either to that term directly or to children) or just leave them separate to be incorporated via a separate resource for enrichment analysis. This remains a topic of discussion. In many cases while there is experimental evidence for protein binding, that might not be the main function of the protein. One should look at annotating the biological process the protein is involved in and not just the experiment.

Rama gave report on SGD's analysis of IPI for catalytic activity terms.

- chains of evidence - Peter gave a hypothetical example and Karen a real example. It came down to the fact that often the conclusion the researchers make, which the annotation is trying to represent, is not based on any single piece of evidence, but on a combination of multiple lines of evidence. Everybody agreed that in order to increase annotation expressivity, we should come up with a way to represent the chain of evidence.

ACTION ITEM:
Rama et al will form a working group and come up with a proposal to move forward with this idea. She will consult/include Michelle.

Rachael's report - downstream effects (slides are part of Rama's presentation)

Pascale's report - response_to terms (slides are part of Rama's presentation)

Kimberly's report - regulation vs the process itself (slides are part of Rama's presentation)

Question about whether you can distinguish between whether a curator judged that a gp was part of the process, or whether they couldn't tell whether it was part of the process or just regulated it.
Caution should be applied when annotating to regulating terms (Judy's comment that annotators need to be skilled and highly trained biologists)
David suggested that when a curator comes across a case where they know that a function also regulates a process, that they should request an ontology change to capture that.
A ligand in a signaling pathway should not be annotated to regulation.

Rama - summarized some maintenance details

consolidation of mailing lists
preview of changes to GO website (separation of documentation for manual vs IEA annotation)
QC checks (hard means automatically removed)
- InterPro is considering the proposed removal of annotations to "protein binding" (proposed for Hard QC)
plans
- QC's
  - Hard QC's will be run every week as part of regular filtering script
  - Questions:
    - What is plan on soft QC's?
    - Could we keep some stats on these (good for grant to have this documentation)
- new evidence system that allows composite lines of evidence, chains, etc.
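
As an illustration of how a hard QC check could sit inside a weekly filtering script while also keeping the stats asked about above, here is a minimal Python sketch. The names `iea_protein_binding`, `HARD_CHECKS` and `run_hard_qc` are hypothetical, and the rule shown (IEA annotations made directly to protein binding, GO:0005515) is only an example of the kind of check under discussion, not the consortium's actual script. Column positions assume the tab-delimited GAF layout (column 5 is the GO ID, column 7 the evidence code).

```python
from collections import Counter

PROTEIN_BINDING = "GO:0005515"  # GO ID for "protein binding"


def iea_protein_binding(cols):
    """Hypothetical hard check: IEA annotation made directly to protein binding."""
    return cols[4] == PROTEIN_BINDING and cols[6] == "IEA"


# Table of hard checks; each maps a name to a predicate over GAF columns.
HARD_CHECKS = {"iea_protein_binding": iea_protein_binding}


def run_hard_qc(gaf_lines):
    """Return (kept_lines, removal_counts) for an iterable of GAF lines."""
    kept, removed = [], Counter()
    for line in gaf_lines:
        if line.startswith("!"):
            kept.append(line)  # header/comment lines pass through untouched
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            kept.append(line)  # too short to evaluate; leave for other checks
            continue
        failed = [name for name, check in HARD_CHECKS.items() if check(cols)]
        if failed:
            removed.update(failed)  # tally per check, for reporting
        else:
            kept.append(line)
    return kept, removed
```

The per-check counts in `removed` are the sort of number that could be logged each week to document improvement for the grant.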

Rachael gave a demo of the new QuickGO interface (www.ebi.ac.uk/QuickGO). Please give it a try; it just went live and they'd love feedback on it.

Rama's initial thoughts for grant renewal

There are so many new genomes being sequenced. We should somehow use orthology information and enable annotations for all these genomes in some speedy way.
Idea that users could submit suggestions for papers to be annotated; Rachael mentioned that GOA is working on having a form where researchers can submit annotations via a web interface.

Ref.genome Project (Pascale)

Pascale's slides: http://gocwiki.geneontology.org/index.php/File:2010-09-BHB-RefG-progress-report-Pascale.pdf
Kara slides: http://gocwiki.geneontology.org/index.php/File:GO_BarHarbor2010Kara.pdf

12 families are in the CVS repository, 48 species
Wnt signaling- there was data for worm, fly and mouse genes that were not in some sub-families
The 2-week thing is a little tricky; Susan- if I have one dedicated person doing ref.genome curation, I could make faster progress.
Pascale- can you get to data from a review? Sometimes papers say mammals and don't mention the species.
A lot of details get discussed, but they are very hard to find. Can we put these details in GONuts?
Where to post comments- Wiki or SF? Monitoring?
Tanya- After MikeL sends an email out for a given family, I go to the Wiki; how do I easily identify what the genes and annotations are? When? Both at the beginning and after the inferences are made. For example, if you have 4 genes in the family, then you want to see the annotations for all 4, or have check boxes to select the genes and see their annotations in one click. Sven's family view just lists the genes. Please add such feature requests to the PAINT SF tracker.

Rather than picking gene families, pick a couple of papers to discuss for electronic jamborees, as part of the monthly Annotation conference call.

alternate site for PAINT files- directly available from GOC website- http://www.geneontology.org/gene-associations/submission
Do we reuse these Homolset pages for Panther families? OR do we need a different visualization? Panther families are already available - need to discuss
No way to download the Ref.genome annotations. QuickGO has a filter for these genes.
Pascale- these are random set of genes- why create a filter? What do you want to do with it?

Rachael- publicity?
Judy- comprehensiveness of the annotations across species. We have put these genes through the inferencing pipeline. Look at it at the genome level.

Publicity- Publicize it and put it at a confidence level next to the manual annotations. Does the end user need to know this came out of the ref.genome/Paint project? The user needs to know the function of the gp.

MikeL's slides- http://gocwiki.geneontology.org/index.php/File:Livstone_Bar_Harbor_Sept_2010.pdf
Judy- there was a lot of concern about MikeL spending a lot of time inferring via PAINT. But it looks like MikeL has figured out a pipeline and things are faster now. This is a tremendous advance since the Geneva camp.
Li- sometimes the annotations have to go just one level down. This might not be high priority for the MODs. Is it valuable to go one level down? Make recommendations only if annotating one level down is important for the tree annotation.
Suzi- these annotations should be sent out irrespective.

Harold- do you disregard IMP for process?
MikeL- it depends. I judge depending on what is available.

Amigo slides: http://gocwiki.geneontology.org/index.php/File:Amigo.pdf
P-POD slides: http://gocwiki.geneontology.org/index.php/File:Ppod.pdf
Panther families/subfamilies annotation status report (Mary) Slides: http://gocwiki.geneontology.org/index.php/File:pantherGOreport.pdf

ACTION ITEM:
Put some priority on the suggestions- Critical/Non-critical. MikeL has already been doing this judgment in some ways.

ACTION ITEM for Sven:

Add link in his reporting tool to say which genes have exp. annotations and so on: 'Has Exp annotations/ # of EXP'.

Wednesday Sept 8, Day 2

Ontology (Morning)

Aligning GO with ChEBI (David)

Noticed that in addition to not aligning with ChEBI, there are inconsistencies in GO where the structure in one area is not parallel to the structure supposedly representing the same thing in another area
Created a chemical ontology within GO called GOche. Chris created this ontology from what already exists in GO.
Made sure everything is complete. Representation is based on chemical structure and not on the roles of the chemicals
Complex chemicals, e.g. a nucleotide is a phosphatidic ester with has_part nucleoside and has_part 1-3 phosphates, and specifies the position of the phosphate attachment.
GO will merge the representation of acids and conjugate bases (ascorbate and ascorbic acid). Biologists don't care about acids/bases, but chemists do. Chemists can get to the actual form by going from GO to ChEBI.
There will not be a transport cross-product term for every biosynthesis cross-product term.
Amelia is working with EC and KEGG about aligning reactions with GOche etc.

TermGenie demo (Jane)

Ontology editors are already using the TermGenie.
Terms entered via this method will be in the obo file within a day.
For the example that Jane demo'd there is an issue with the placement in the ontology. It was too high. Chris, David et al will look into it
What happens if you just check positive and negative regulation terms (without checking the regulation term)? Genie is smart. It can do some checks.
Next type of terms to be incorporated into TermGenie should be the 'involved_in' terms that Mike L often requests.
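
To make the question about checking only the positive and negative regulation boxes concrete, here is a small Python sketch of templated term-name generation. It is not TermGenie's code; `regulation_terms` and `REGULATION_PREFIXES` are hypothetical names, and the rule that selecting a positive or negative term also pulls in the parent regulation term is an assumption about the kind of check meant by "Genie is smart".

```python
# Name templates for the three regulation term types (insertion order matters:
# the parent "regulation" term is listed first).
REGULATION_PREFIXES = {
    "regulation": "regulation of ",
    "positive": "positive regulation of ",
    "negative": "negative regulation of ",
}


def regulation_terms(process_name, wanted):
    """Return regulation term names for a process, adding the parent if needed."""
    wanted = set(wanted)
    unknown = wanted - set(REGULATION_PREFIXES)
    if unknown:
        raise ValueError(f"unknown term types: {sorted(unknown)}")
    # Assumed check: a positive/negative term needs its parent regulation term.
    if wanted & {"positive", "negative"}:
        wanted.add("regulation")
    return [
        prefix + process_name
        for kind, prefix in REGULATION_PREFIXES.items()
        if kind in wanted
    ]
```

For example, `regulation_terms("kidney development", ["positive", "negative"])` would return three names, with "regulation of kidney development" first.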

ACTION ITEMS:

Look into, and if needed fix, the placement of the term Jane used for the Term Genie demo (Ontology Dev: Chris, David)

Signaling group update (Becky)

slides: Media:Signaling.pdf.pdf
Signaling process- This term used to be the highest level grouping term, but now there is a higher one and a sibling term for signaling pathway. Add a comment to this term to consider the other 'signaling pathway' term. What criteria do curators look at for annotating to signaling terms? The evidence presented in papers is sometimes/usually tricky to interpret.
Mary (RGD)- should we concentrate on annotating one pathway at a time?
Fix documentation on signaling, the part that Mary read out
Receptor ligand slide- add synonyms containing 'ligand' for the receptor agonist and antagonist terms
Cytokines can be both agonists and antagonists. Jane and Becky will take care of this.
Peter- agonist is not the natural ligand?
EGF receptor activity- it is actually the receptor that binds to EGF. The way the term is worded is not very clear. Make sure the logic is right in the computable definitions for these terms.

ACTION ITEMS:

Add a comment to the signaling term to also consider annotating to its sibling term signaling pathway (Signaling group)
Update documentation on signaling (Signaling group)
Add synonyms using word 'ligand' for the receptor agonist and antagonist terms (Signaling group)
Cytokines can be both agonists and antagonists (Signaling group: Jane and Becky)
Reword the term epidermal growth factor receptor activity so that it is clear (Signaling group)

Transcription overhaul (Karen)

TxnOverhaulKChristie.ppt
Are you going to give related synonyms for many of these terms? The term names are complex, and it will be easier to identify terms if there are many synonyms
Suzi- can we link pictures to these terms, display on AmiGO? because it is easier to understand.
How do you annotate to sigma factor activity? Do they have to show binding to both RNAP and the promoter? There might not be a single experiment that shows both at the same time. We need to redefine our annotating paradigm. Composite evidence.
Paul - like how the Role is described and also the mechanism by which it does its role
David - mechanism of functions-this serves as a model for how things should be defined in GO and I hope Karen will want to write this up as a paper
Li- Asked how annotations will be affected now that the transcription factor activity term (GO:0003700) has been renamed and redefined. Annotations to this term are not accurate currently and should be reviewed anyway.

ACTION ITEMS:

Make sure that common use phrases get put in as synonyms for complex transcription factor terms (txnOH group)
Decide whether or not this refinement to how GO defines functions, using transcription as a model, should be written up as a paper (?)

Software

Slides- Media:software-group-report-bar-harbor-2010.pdf
Chris started with a summary of software over the history of GO, nicely broken down by decades

Development in progress
- MIREOT (Amina) - bring in subsets of other ontologies so you can make cross-product terms
- Becoming Obo1.4 OWL compatible
Ontology checks
- Run the taxon triggers and F-->P weekly (on a cron tab)
- MGI and Val already have loaded the M-P inferences.
Judy- have we outgrown the GAF file, in the context of increased expressivity with annotations (col-16, chain of evidence etc)? Maybe! If we are going to create a new annotation paradigm, we need to define a new format.
Paul presented ideas on LEGO - building "stories" (complex annotations) from simple ones
- Build simple statements from annotations.
- Kimberly - have some simple descriptions for genes, hard to maintain. LEGO will work to keep these up-to-date, to come up with statements.
IndiGO - How would curators use it? How will it work? Will it go into a common DB? Then the annotations can be taken out by each MOD. Details need to be worked out.
Galaxy environment - nice workflow and we should look into integrating/wrapping tools with galaxy and allowing a pipeline for multiple analyses.
deployment - machine systems are different, so deployment is hard. Cloud computing. Use Amazon cloud resources or VM for displays.
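
The weekly F-->P check listed under Ontology checks could work roughly as sketched below in Python: where a function term is linked to a process term, a gene product annotated to the function gets an inferred annotation to the process. This is a sketch under assumptions, not the consortium's pipeline; `FUNCTION_TO_PROCESS` and `infer_process_annotations` are hypothetical names, and the single link shown (protein kinase activity to protein phosphorylation) is included for illustration only.

```python
# Illustrative function-to-process link table (MF term -> BP term).
FUNCTION_TO_PROCESS = {
    "GO:0004672": "GO:0006468",  # protein kinase activity -> protein phosphorylation
}


def infer_process_annotations(annotations):
    """Given (gene, go_id) pairs, return inferred (gene, process_id) pairs.

    Pairs that are already present in the input are not reported again.
    """
    existing = set(annotations)
    inferred = set()
    for gene, go_id in existing:
        process_id = FUNCTION_TO_PROCESS.get(go_id)
        if process_id and (gene, process_id) not in existing:
            inferred.add((gene, process_id))
    return sorted(inferred)
```

Run from a scheduled job, the output could be loaded by a MOD as a separate set of inferred annotations.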

ACTION ITEMS:

Develop new annotation paradigm (chains of evidence, LEGO, etc.) in order to determine needs for (new) file format (?)

Inter-Operational issues

On the white board with possible stories/priorities of next 6 months File:Goals Sheet1.pdf

ChEBI align story - paper is circulating

Broad coverage via phylogenetic annotation methods

ACTION ITEM: come up with a couple pages on what would be needed to support this (Pascale, Kara, Suzi, Mike, Paul)
need to understand how much person-power would be needed to be effective in making progress in this

Annotation Paradigm (Chain of Evidence)

ACTION ITEM: Come up with a couple pages on new paradigm for next grant (Rama, Paul, Alex, Suzi)
- GO has to go from describing experiments to describing stories

- ideas for how to implement this:
- Alex's idea to give each annotation an ID, and then reference these in composite annotations, i.e. "story" annotations
- can these links be inferred automatically from the existing annotations to terms like "DNA binding" and "regulation of transcription", e.g. MGI fox3A
- when you are doing ref genome type annotations (comprehensive view of literature), curator is in a good place to understand story and know what type of "story" annotation should be made
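
A minimal Python sketch of the annotation-ID idea follows: each conventional annotation carries its own ID, and a composite "story" annotation refers to those IDs. The class names, the `ANN:`/`STORY:` ID schemes, and the gene product `geneX` are all made up for illustration; nothing here reflects a format that was decided at the meeting.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """One conventional annotation, given its own ID so it can be referenced."""
    ann_id: str    # hypothetical ID scheme, e.g. "ANN:1"
    gene: str
    go_id: str
    evidence: str


@dataclass
class StoryAnnotation:
    """A composite annotation: a conclusion supported by several annotations."""
    story_id: str
    conclusion: str
    supporting_ids: list = field(default_factory=list)


def resolve_story(story, annotations_by_id):
    """Return the Annotation objects a story refers to, in the order listed."""
    missing = [i for i in story.supporting_ids if i not in annotations_by_id]
    if missing:
        raise KeyError(f"unknown annotation IDs: {missing}")
    return [annotations_by_id[i] for i in story.supporting_ids]


# Example with a made-up gene product "geneX".
annotations = {
    a.ann_id: a
    for a in [
        Annotation("ANN:1", "geneX", "GO:0003677", "IDA"),  # DNA binding
        Annotation("ANN:2", "geneX", "GO:0006355", "IMP"),  # regulation of transcription
    ]
}
story = StoryAnnotation(
    "STORY:1",
    "geneX binds DNA to regulate transcription",
    ["ANN:1", "ANN:2"],
)
```

Because the story only stores IDs, the underlying annotations stay in their existing form and the story layer can be added or dropped independently.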

Progress to highlight completion of/progress in ontology development

transcription factor story; derives from MF-BP links and interconnects with need for chain of evidence
write a paper on this?
signaling can follow some of the methods developed for transcription factors

Software priorities

Term Genie - will be needed in order to support increased requests for composite terms, so that ontology editors don't have to manually do all of these

PAINT GAF files

stability of Panther IDs -
- operationally, people will need to be able to construct an url using a Panther ID in order to go to a web page

- first step of evidence is for a Panther family, e.g. PANTHER:PTHR10046_AN2

- Paul will work out having stable IDs

- we'll need to be able to identify when a tree/node changes and there are no longer experimental annotations to support some of the inferences that were made and transmit to annotation groups

- what do we need to do to help groups that are not importing these files, many groups have already implemented pipeline and others are in progress; most groups will probably be able to do this by the end of the year
reference issue PAINT_REF ID goes to a stable file that is in cvs, file can be prettified if that is needed; for groups that are not incorporating the PAINT_REF's into their databases, the files can be processed out of the database in order to include the PAINT_REF's in the full GAF files those MODs submit and which get loaded into AmiGO
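
To show what constructing a URL from a stable Panther ID might involve, here is a short Python sketch that splits a reference such as PANTHER:PTHR10046_AN2 into its family and node parts. The regular expression is inferred from that single example, and the URL template is supplied by the caller because no real address was given in the discussion; `parse_panther_ref` and `family_url` are hypothetical names.

```python
import re

# Pattern inferred from the one example in the minutes: PANTHER:PTHR10046_AN2
PANTHER_REF = re.compile(r"^PANTHER:(PTHR\d+)_(AN\d+)$")


def parse_panther_ref(ref):
    """Split 'PANTHER:PTHRnnnnn_ANn' into (family_id, node_id)."""
    match = PANTHER_REF.match(ref)
    if match is None:
        raise ValueError(f"not a recognised PANTHER reference: {ref!r}")
    return match.group(1), match.group(2)


def family_url(ref, template):
    """Fill a caller-supplied template containing {family} and {node} fields."""
    family, node = parse_panther_ref(ref)
    return template.format(family=family, node=node)
```

A group could call `family_url("PANTHER:PTHR10046_AN2", template)` with whatever template the stable-ID scheme ends up defining.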

User issues

How do we get all this data to users in ways that make sense
ways to get files (OBO or OWL), to see things (DB-web, AmiGO, Galaxy)
What do "regular biologist" users need?
- GO Term Finder (via GO help), e.g. enrichment,
Use cases for users
- gene set evaluation

- e.g. GWAS + annotations (for genes and their orthologs)

- Paul's question: What would they do next with this file
ID mapping
- want this to be automatic without having to do some extra pre-processing
slims vs anti-slims
types of users
- experimentalists
- computational/systems biologists
- ontology/semantic web/NLP
- new genome annotations based on new sequencing (NextGen etc)
- Predictions

How do we assess success of GO?

Survey?
testimonials: "The GO is the cornerstone of modern computational biology" - Matt Hibbs
disable services and measure how many emails we get in a 24 hour period, after we get support letters ;)
via GO help desk

Grant mechanism

assuming P41
assuming 30 pages

ACTION ITEMS:

develop plan for broad coverage via phylogenetic annotation methods (Pascale, Kara, Suzi, Mike, Paul)
develop plan for new annotation paradigm for next grant (Rama, Paul, Alex, Suzi)
decide whether to write a paper on transcription overhaul and development of new method for representing functions (?)
further development of TermGenie (software group)
development of stable IDs for Panther families (Paul)

Thursday Sept 9, Day 3

Highlights of the last 5 years

Paul's LEGO

Modular GO annotations
- Relations in annotations rather than only in ontology. Add relations in annotation - process to process or component and so on.
- Visualization needs
- It is post-composing. You are not creating GO ids for this complex story. You are computing on these terms to come up with this story.
- Judy- keep these thoughts in mind while we are developing new Annotation tools.
- Tool- Curator can markup the annotations somehow to make the connections and a tool will put things together to make a story.
- Standardization of generic modules is role for ontology development.

Alex's proposal

Chain of evidence. Breakdown annotations. Each annotation gets its own ID and then link them up using relationships.
Suzi- these 2 systems are complementary. There is a nesting.
Chris- they are equivalent?

Discussion

All these story telling systems need some manual annotation. Yes, curator should have control about which blocks go in which order. When do these become a term in the ontology?

The hope is that these annotations will be fed into a central system, connections made, and then fed back to the MODs, rather than having each group make its own tool.

Karen's statement- functions are different in different species. [fill in]. One function involved in multiple processes. You don't want to be annotating to all the multiple processes. You want to show the user something simpler rather than the whole thing.

Chris- Are we embracing post-composing? Are we creating too many IDs by precomposing these terms? Is it simpler for a curator to deal with a smaller # of terms and use them to make a story? The more pieces of biology there are to capture, the longer it takes for curators.
Mary (RGD)- post-composed terms that have already been made should be available for curators to see so modules can be reused

Developing To Do List

Managers: Aims thoughts (next manager's call)
Term Genie extensions (team: Chris, Seth)
Goche/ChEBI story (team: David, Jane, Amelia, Chris, Harold)
Chain of Evidence proposal (team?, LEGO, annotation ID, ...)
Broad coverage proposal (team: Pascale, Mike, Kara, Paul, Suzi, Mary, Judy)
Wnt/PAINT

Timelines

Annotation timeline

change annotation jamborees
1st phase of QC checks is almost in place, by October probably
still working out with Chris how to do soft QC checks, e.g. communicating with groups what they need to do, including keeping track of things that have already been checked; tracking things that have actually been removed (so that we can give some stats for the grant showing actual improvement)
improvements to documentation on the website
evidence codes: proof of principle for chain of evidence
dealing with column 16 in AmiGO
loading GAF files - how can we help groups identify appropriate files, procedures
educate annotators; new monthly annotation conference call

Reference genome timeline

PAINT is now usable, but no programmer on it specifically
Panther protein families - last one came out in June
MOD curation - need to determine next family, and how to choose
Protein family annotation - Mike showed progress yesterday
curation report tool - almost working, no discussion yet on where this will live, whether it will be available to users
Papers - hope to have a paper(s) submitted before end of year

Ontology timeline

ChEBI - David showed yesterday, in process of writing paper
internal cross products, put in as sets, regulates, cell component part of done, next: involved in; more sets slated for 2011
F-P links - now ongoing, get added as part of various projects
Generic GO slim - almost ready to be released, planning to write paper by end of year
taxon checks - done, paper accepted
virus terms - draft done, hopefully terms in by end of year
signalling - Becky discussed yesterday
txn
kidney terms - nearly done
PAMGO - paper later this year
SF items - ongoing
Cognition - done
protein complexes in GO- ongoing (Harold) - going in for mouse and human in a pathway specific manner
neurological processes and components - in progress, some people have agreed to work on annotation
protein modification - new effort, with John Garavelli

Software timeline

MIREOT - closure for what Amina's been working on for OBO-Edit
transitioning to OWL - ongoing for year or so
full text indexing
column 16 - still need to hire someone with ARRA funding
column 17 - need better relations filtering in AmiGO before we load there
integration of AmiGO & QuickGO - still working out details on how far this will go and how much will be integrated
Ref genome data management reporting - Seth & Mary, saw some earlier at the meeting
IndiGO annotation editor - will be doing feasibility study in a few months
GO Galaxy - work on map2slim, should come up with formal spec for slimming so that users aren't confused by getting different results from different MOD tools
Taxon - heard about already
increased expressivity of annotations -
generalized rule engine - to integrate taxon constraints, F-P inferences etc.
incremental database loading - so that users have access to more up-to-date info
virtualization - Seth will be working on this
Schema overhaul - will be proceeding with this now that we have ARRA funding
OBO-Edit - currently on beta6; 2.1 is planned for Oct 29 with a release candidate 3 weeks before; dates are already on OBO-Edit page (links to page will be made more directly)
Rama asked question about whether there will be any future demand that all groups use the full file. Chris felt that we will continue to support/provide many cuts of the file for different purposes
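
On the formal spec for slimming mentioned under GO Galaxy: one candidate rule is to map each term to the most specific slim terms among itself and its ancestors. The Python sketch below implements only that rule, on a toy ontology with made-up IDs; it is not map2slim itself, and `ancestors` and `map_to_slim` are hypothetical names.

```python
def ancestors(term, parents):
    """Return all ancestors of term, given a child -> list-of-parents mapping."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_slim(term, parents, slim):
    """Map term to its most specific slim terms (assumed slimming rule)."""
    candidates = ({term} | ancestors(term, parents)) & set(slim)
    # Drop any candidate that is an ancestor of another candidate.
    return sorted(
        c for c in candidates
        if not any(c in ancestors(other, parents) for other in candidates if other != c)
    )


# Toy ontology with made-up IDs: D has parents B and C; B and C have parent A.
toy_parents = {"T:D": ["T:B", "T:C"], "T:B": ["T:A"], "T:C": ["T:A"]}
toy_slim = {"T:A", "T:B"}
```

Writing the rule down this explicitly is what would let different MOD tools be checked against the same expected results.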

Comments

Some groups have put links on their pages to info about Ref Genome. Rama was wondering if others have ideas for how to publicize this project better. Discussion on the perception of Reference Genome: whether it is perceived as a separate project, or whether the tag is conveying any value to users. No dispute on the importance of the annotations, but multiple perspectives on how to convey it. Pascale wants to focus on entire genomes, rather than labelling specific genes as part of the project.

We're going in for another grant renewal for 5 more years. We'll need to be clear on what resources we'll need and what we can realistically do in the given time frame, so that we can write the grant to justify what we're asking for.
