Talk:2010 Bar Harbor Agenda: Difference between revisions

From GO Wiki

Revision as of 08:42, 9 September 2010

Minutes

Day 1

-Mindy Dwinell from RGD- taking over Simon's position
-Sven's first GO meeting

Grant Aims (Suzi)

  • Aim 1:
    • building the ontology: regulates relationship, is_a, connecting to external ontologies (ChEBI, Cell Ontology)
    • ref. genome- lot of progress
    • general annotation, MODs, WGs coming together, coming up with standards- good progress.

So to some extent our aims will remain the same.

    • How can we get detailed/rich annotations
    • Quality control group
    • rate limiting step- literature curation. Improve on item e. Richness is vague. Elaborate on that item.
  • Aim 2: Ref.genome
    • improve software
  • Aim 3:
  • 2b-Infrastructure to approach these bacterial folks to make annotations. Do we have better relation with JGI now? Suzi- Yes. She has been invited to give a talk. Jonathan Eisen is involved.
  • How to find out who is doing what- lot of C. elegans-pathogen papers. Where do you send the pathogen information? No infrastructure for that. It is not that we are having difficulty identifying genes to annotate.
  • All these items available on the wiki. Please use the tracker and add your thoughts.
  • We want to build the ontology automatically. ChEBI is not requesting terms from us. It is the use of other ontologies to build GO.
  • How are we doing? Metrics? Are we meeting our goals? It would help if we could provide some sense of completeness, breadth of annotation, etc. Scientists who use the GO should provide metrics on how GO worked for them. Some papers say GO is great, some papers use other resources, and so on. Just because a paper used KEGG doesn't mean the authors did not like GO. They probably did not get the results they wanted with KEGG. If people are finding GO to be complex, we should provide cuts of GO (GO slims). There are 2 kinds of consumers: MODs and groups like Reactome, and consumers who stay at a distance and plug their data into GO. Many people go away because they don't find the data in GO (not annotated). Karen's comment- a user said 'I can't remember if the term is in the MF or BP'.
  • Mary-metrics with 'us'. How do we use it (Mods), metrics on how many papers have been curated by the MODs, these metrics are important as well.
  • How about we pair up with a specific group (say cancer biologist). Do analysis with them (now and after 3 months)
  • if we got more mappings to external databases, then users would not have to learn a new system. They could, say, use just MetaCyc.
  • Talk about IEA mappings (parking lot discussion)
  • leverage existing tools (PAINT and PANTHER families) and what we have in GO and do a first pass. Get broad coverage. Depth coverage is costly.
  • so many GO annotations, hard to say which is the main function etc.
  • identify areas that are underannotated based on which parts of the ontology are well developed, and have curators work on those areas.
  • NOTCH signaling pathway IEA- from Kimberly. Lot of organismal annotations, but not a single one talks about Notch signaling
  • Tanya- Idea on % of effort for each item.
  • One of the ideas - ask the program officer if we get funding from 2 different agencies.
  • Going through list of Tasks from the Cambridge meeting (the long table).
    • talk about Gold standard set so that we can compare annotations
    • did not do a usability study
    • HOMEWORK-everybody should write specific aims for this project. Download the Word document, turn on "track changes", add your comments, and email back to the GO-tops.

Annotation Advocacy Group Report

Ruth gave report on binding working group.

  • Discussion on cross-species expression. Should reciprocal annotations be made for these? Discussion on how to annotate these, what should be in column 8 versus 16.
  • Discussion of whether reciprocal annotations are mandatory, even within the same species, e.g. calmodulin and myosin.
    • Should GO still be making protein binding annotations? David brought up the idea that "protein binding" is isolated within GO and cannot be linked to specific processes. Kara brought up the idea that maybe protein-protein interactions should be left to some other database (IMEx, BioGRID, etc.) rather than directly manually annotated by GO curators. There was debate about whether GO should automatically make "protein binding" annotations (either to that term directly or to children) or just leave them separate to be incorporated via a separate resource for enrichment analysis. This remains a topic of discussion.

Rama gave report on SGD's analysis of IPI for catalytic activity terms.

    • chains of evidence - Peter gave a hypothetical example and Karen a real example. It came down to the fact that often the conclusion the researchers make, which the annotation is trying to represent, is not based on any single piece of evidence, but a combination of multiple lines of evidence.

Rachael's report -

Pascale's report - downstream effects

Kimberly's report - regulation vs the process itself

  • Question about whether you can distinguish between whether a curator judged that a gp was part of the process, or whether they couldn't tell whether it was part of the process or just regulated it.
  • Caution should be applied when annotating to regulating terms (Judy's comment that annotators need to be skilled and highly trained biologists)
  • David suggested that when a curator comes across a case where they know that a function also regulates a process, that they should request an ontology change to capture that.
  • A ligand in a signalling pathway should not be annotated to regulation.

Rama - summarizing some maintenance details

  • consolidation of mailing lists
  • preview of changes to GO website (separation of documentation for manual vs IEA annotation)
  • QC checks (hard means automatically removed)
    • InterPro is considering the proposed removal of annotations to "protein binding" (proposed for Hard QC)
  • plans
    • QC's
      • Hard QC's will be run every week as part of regular filtering script
      • Questions:
        • What is plan on soft QC's?
        • Could we keep some stats on these (good for grant to have this documentation)
    • new evidence system that allows composite lines of evidence, chains, etc.
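
As an illustration of what one hard QC in the weekly filtering script might look like, here is a minimal Python sketch. It assumes tab-separated GAF 2.0 lines (GO ID in column 5, evidence code in column 7) and assumes the rule is "drop IEA annotations made directly to protein binding (GO:0005515)". The function name and the sample rows are hypothetical; this is not the actual GOC filtering script. Returning the removed lines as well as the kept ones makes it easy to keep the stats asked about above.

```python
PROTEIN_BINDING = "GO:0005515"  # GO ID for "protein binding"


def hard_qc_filter(gaf_lines):
    """Split GAF lines into (kept, removed) under one assumed hard QC rule.

    Assumed rule: remove IEA annotations made directly to protein binding.
    Header/comment lines (starting with '!') and malformed lines are kept.
    """
    kept, removed = [], []
    for line in gaf_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("!") or len(cols) < 7:
            kept.append(line)
            continue
        go_id, evidence = cols[4], cols[6]  # GAF columns 5 and 7
        if go_id == PROTEIN_BINDING and evidence == "IEA":
            removed.append(line)
        else:
            kept.append(line)
    return kept, removed


# Made-up sample rows, for illustration only.
sample = [
    "!gaf-version: 2.0",
    "\t".join(["UniProtKB", "P00001", "GENE1", "", "GO:0005515",
               "GO_REF:0000002", "IEA", "InterPro:IPR000001", "F"]),
    "\t".join(["UniProtKB", "P00002", "GENE2", "", "GO:0005515",
               "PMID:1", "IPI", "UniProtKB:P00001", "F"]),
]
kept, removed = hard_qc_filter(sample)
print(f"kept {len(kept)} lines, removed {len(removed)}")
```

Only the IEA row is dropped in this sketch; the IPI annotation to the same term is left alone, since whether manual "protein binding" annotations should stay is still under discussion.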

Rachael demo on new QuickGO interface (www.ebi.ac.uk/QuickGO). Please give it a try; it just went live and they'd love feedback on it.

Rama's initial thoughts for grant renewal

  • idea that users could submit suggestions for papers to be annotated; Rachael mentioned that GOA is working on having a form where researchers can submit annotations via a web interface.

Ref.genome Project (Pascale)

  • 12 families are in the CVS repository, 48 species
  • Wnt signaling- there was data for worm, fly and mouse genes that were not in some sub-families
  • 2 week thing is a little tricky; Susan- if I had one dedicated person doing ref.genome curation, I could make faster progress.
  • Pascale- can you get to data from a review? Sometimes papers say mammals and don't mention the species.
  • lot of details get discussed. But very hard to find it. Can we put these details in GONuts?
  • Where to post comments- Wiki or SF? Monitoring?
  • Tanya- After MikeL sends an email out for a given family, I go to the Wiki. How do I easily identify what the genes and annotations are? When? Both at the beginning and after the inferences are made. For example, if you have 4 genes in the family, then you want to see the annotations for all 4, or have check boxes to select the genes and see their annotations in one click. Sven's family view just lists the genes. Add a link to say 'Has EXP annotations / # of EXP'.

Such feature requests, please add them to the PAINT SF tracker.

  • rather than picking gene families, pick couple of papers to discuss for electronic jamborees- part of the Annotation monthly conference call.
  • alternate site for PAINT files- directly available from GOC website- geneontology.org/go/gene_association/submission
  • do we reuse these Homolset pages for Panther families? OR do we need a different visualization? Panther families are already available - need to discuss
  • no way to download the Ref.genome annotations. QuickGO has a filter for these genes.
    Pascale- these are random set of genes- why create a filter? what do you want to do with it?


Rachael- publicity?
Judy- comprehensiveness of the annotations across species. We have put these genes through the inferencing pipeline. look at it at the genome level.

  • Publicity- Publicize it and put it at a confidence level next to the manual annotations. Does the end user need to know this came out of the ref.genome/PAINT project? The user needs to know the function of the gp.
  • Judy- there was lot of concern about MikeL spending lot of time in inferring via PAINT. But looks like MikeL has figured out a pipeline and things are faster now. This is a tremendous advance since Geneva camp.
    Li- sometimes the annotations have to go just one level down. This might not be high priority for the MODs. Is it valuable to go one level down? Make recommendations only if annotating one level down is important for the tree annotation.
    Suzi- these annotations should be sent out irrespective. Put some priority on the suggestions- Critical/Non-critical. MikeL has already been making this judgment in some ways.


Harold- do you disregard IMP for process?
MikeL- it depends. I judge depending on what is available.

Day 2

Ontology

  • David- CheBI
    • noticed that, even before aligning with ChEBI, there are inconsistencies in GO.
    • created a chemical ontology within GO called GOche. Chris created this ontology from what already exists in GO.
    • made sure everything is complete. Representation is based on structure and not the roles of the chemicals
    • complex chemicals: a nucleotide is a phosphatidic ester with has_part nucleoside and has_part 1-3 phosphates, and the representation specifies the position of the phosphate attachment.
    • GO will merge the representation of acids and conjugate bases (ascorbate and ascorbic acid). Biologists don't care about acids/bases, but chemists do. Chemists can get to the actual form by going from GO to ChEBI.
    • there will not be a transport cross-product term for every biosynthesis cross-product term.
    • Amelia is working with EC and KEGG about aligning reactions with GOche etc.
  • Jane
    • ontology editors are already using the TermGenie.
    • will be in the obo file within a day.
    • For the example that Jane demo'd there is an issue with the placement in the ontology. It was too high. Chris, David et al will look into it
    • what happens if you just check positive and negative regulation terms (without checking the regulation term)? Genie is smart. It can do some checks.
    • next set of terms should be the 'involved_in' terms that MikeL usually requests.
  • Becky-Signaling
    • signaling process- high level grouping term. Add a comment to this term to consider the other 'signaling pathway' term. What criteria do curators look at for annotating to signaling terms? The evidence presented in papers is sometimes/usually tricky to interpret.
    • Mary (RGD)- should we concentrate on annotating one pathway at a time?
    • Fix documentation on signaling, the part that Mary read out
    • receptor ligand slide- add synonyms with 'ligand' in it for the receptor agonist and antagonist terms
    • cytokines can be both agonist and antagonists. Jane and Becky will take care of this.
    • Peter- agonist is not the natural ligand?
    • EGF receptor activity- it is actually the receptor that binds to the EGF. The way the term is worded is not super clear. make sure the logic is right in the computable definitions for these terms.
  • Karen- Transcription
    • are you going to give related synonyms for many of these terms? The term names are complex and it will be easy to identify terms if there are as many synonyms
    • Suzi- can we link pictures to these terms, display on AmiGO? because it is easier to understand.
    • How do you annotate to sigma factor activity? Do they have to show binding to both RNAP and the promoter? May not be possible. We need to redefine our annotating paradigm. Composite evidence.
    • Paul- Role, and the mechanism by which it carries out its role
    • David- mechanism of functions: this serves as a model for how things should be defined in GO
    • Li- GO:0003700, the transcription factor activity term: annotations to this term are not accurate currently. Should be reviewed anyway.

Software

  • Chris- nicely broken down by decades
    • MIREOT- bring in subsets of other ontologies so you can make terms.
    • Obo1.4 OWL compatible.
    • Run the taxon triggers and F-->P weekly (from a crontab). MGI and Val have already loaded the M-P inferences.
    • Judy- have we outgrown the GAF file? In the context of increased expressivity with annotations (col-16, chain of evidence etc). Maybe! If we are going to create a new annotation paradigm, we need to define a new format.
    • LEGO-build simple statements from annotations.
    • Kimberly- have some simple descriptions for genes, hard to maintain. LEGO will work to keep these up-to-date, to come up with statements.
    • IndiGO- How would curators use it? How will it work? Will it go into a common DB? Then the annotations can be taken out by each MOD. Details need to be worked out.
    • Chris- Galaxy environment- nice workflow and we should look into integrating/wrapping tools with galaxy and allowing a pipeline for multiple analyses.
    • deployment- machine systems are different, so deployment is hard. Cloud computing. Use Amazon cloud resources or VM for displays.

Inter-Operational issues

board with possible stories/priorities for the next 6 months

ChEBI align story - paper is circulating

Broad coverage via phylogenetic annotation methods

  • action item come up with a couple pages on what would be needed to support this (Pascale, Kara, Suzi, Mike, Paul)
    need to understand how much person-power would be needed to be effective in making progress in this

Annotation Paradigm (Chain of Evidence)

  • action item come up with a couple pages on new paradigm for next grant (Rama, Paul, Alex, Suzi)
    - GO has to go from describing experiments to describing stories
    - ideas for how to implement this:
    • Alex's idea to give each annotation an ID, and then reference these in composite annotations, i.e. "story" annotations
    • can these links be inferred automatically from the existing annotations to terms like "DNA binding" and "regulation of transcription", e.g. MGI fox3A
    • when you are doing ref genome type annotations (comprehensive view of literature), curator is in a good place to understand story and know what type of "story" annotation should be made
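
Alex's idea above could be sketched as a small data structure: each annotation carries its own ID, and a "story" annotation lists the IDs of the annotations it builds on. This is only a sketch of the idea as minuted; the class names, field names, ID scheme, gene name, and references below are all hypothetical. The GO IDs used are the ones for "DNA binding" (GO:0003677) and regulation of transcription (GO:0006355).

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    ann_id: str      # hypothetical per-annotation identifier
    gene: str
    go_id: str
    evidence: str
    reference: str


@dataclass
class StoryAnnotation:
    story_id: str
    gene: str
    go_id: str
    supported_by: list = field(default_factory=list)  # IDs of supporting annotations


def resolve(story, annotations):
    """Return the supporting annotations of a story, in the order listed."""
    by_id = {a.ann_id: a for a in annotations}
    missing = [i for i in story.supported_by if i not in by_id]
    if missing:
        raise KeyError(f"unknown annotation IDs: {missing}")
    return [by_id[i] for i in story.supported_by]


# Made-up example: two single-experiment annotations combined into one story.
annotations = [
    Annotation("ANN:1", "geneX", "GO:0003677", "IDA", "PMID:1"),  # DNA binding
    Annotation("ANN:2", "geneX", "GO:0006355", "IMP", "PMID:2"),  # regulation of transcription
]
story = StoryAnnotation("STORY:1", "geneX", "GO:0003700", ["ANN:1", "ANN:2"])
for ann in resolve(story, annotations):
    print(ann.ann_id, ann.go_id, ann.evidence)
```

Keeping the links as IDs rather than copies means a story can be flagged for review when any annotation it depends on is removed, which is what the `KeyError` path stands in for here.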

Progress to highlight completion of/progress in ontology development

  • transcription factor story; derives from MF-BP links and interconnects with need for chain of evidence
    write a paper on this?
  • signalling can follow some of the methods developed for transcription factors

Software priorities

  • Term Genie - will be needed in order to support increased requests for composite terms, so that ontology editors don't have to manually do all of these

PAINT GAF files

  • stability of Panther IDs -
    - operationally, people will need to be able to construct a URL using a Panther ID in order to go to a web page
    - first step of evidence is for a Panther family, e.g. PANTHER:PTHR10046_AN2
    - Paul will work out having stable IDs
    - we'll need to be able to identify when a tree/node changes and there are no longer experimental annotations to support some of the inferences that were made and transmit to annotation groups
    - what do we need to do to help groups that are not importing these files, many groups have already implemented pipeline and others are in progress; most groups will probably be able to do this by the end of the year
  • reference issue PAINT_REF ID goes to a stable file that is in cvs, file can be prettified if that is needed; for groups that are not incorporating the PAINT_REF's into their databases, the files can be processed out of the database in order to include the PAINT_REF's in the full GAF files those MODs submit and which get loaded into AmiGO
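
To make the URL point above concrete, here is a minimal Python sketch of splitting an ID such as PANTHER:PTHR10046_AN2 into family and node before building a link. It assumes every ID follows the pattern PANTHER:<family>_<node> seen in that one example. The URL template is a placeholder on example.org, not a real PANTHER address, and the function names are hypothetical.

```python
import re

# Assumed ID shape, based only on the example PANTHER:PTHR10046_AN2.
PANTHER_ID = re.compile(r"^PANTHER:(PTHR\d+)_(AN\d+)$")

# Placeholder template; the real address depends on the stable-ID scheme.
URL_TEMPLATE = "https://example.org/panther/family/{family}#{node}"


def parse_panther_id(identifier):
    """Return (family, node) for an ID like 'PANTHER:PTHR10046_AN2'."""
    match = PANTHER_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a recognised PANTHER node ID: {identifier!r}")
    return match.group(1), match.group(2)


def panther_url(identifier):
    """Build a link for a PANTHER node ID using the placeholder template."""
    family, node = parse_panther_id(identifier)
    return URL_TEMPLATE.format(family=family, node=node)


print(parse_panther_id("PANTHER:PTHR10046_AN2"))
print(panther_url("PANTHER:PTHR10046_AN2"))
```

A link built this way stays valid only while family and node IDs are stable, which is why the stable-ID work comes first.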


User issues

  • How do we get all this data to users in ways that make sense
  • ways to get files (OBO or OWL), to see things (DB-web, AmiGO, Galaxy)
  • What do "regular biologist" users need?
    - GO Term Finder (via GO help), e.g. enrichment,
  • Use cases for users
    - gene set evaluation
    - e.g. GWAS + annotations (for genes and their orthologs)
    - Paul's question: What would they do next with this file
  • ID mapping
    - want this to be automatic without having to do some extra pre-processing
  • slims vs anti-slims
  • types of users
    • experimentalists
    • computational/systems biologists
    • ontology/semantic web/NLP
    • new genome annotations based on new sequencing (NextGen etc)
    • Predictions
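
For the enrichment use case listed above, term-finder style tools typically rest on a hypergeometric test: given a gene list, how surprising is the number of genes annotated to a term? A minimal sketch using only the Python standard library follows; the function and parameter names are hypothetical and the numbers are made up. Real tools also correct for testing many terms at once, which is omitted here.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) for X ~ Hypergeometric(population, annotated, sample).

    population: genes in the background set
    annotated:  background genes annotated to the term
    sample:     genes in the user's list
    hits:       genes in the user's list annotated to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Made-up numbers: 6000 genes, 40 annotated to the term, 100 in the list, 5 hits.
p = enrichment_p_value(6000, 40, 100, 5)
print(f"p = {p:.3g}")
```

The result depends on the background set as much as on the gene list, which ties back to the point about breadth of annotation: genes missing from the annotation set cannot contribute hits.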

How do we assess success of GO?

  • Survey?
  • testimonials: "The GO is the cornerstone of modern computational biology" - Matt Hibbs
  • disable services and measure how many emails we get in a 24 hour period, after we get support letters ;)
  • via GO help desk

Grant mechanism

  • assuming P41
  • assuming 30 pages

Day 3