Talk:2010 Bar Harbor Agenda: Difference between revisions

From GO Wiki

Revision as of 11:55, 8 September 2010

Minutes

Day 1

-Mindy Dwinell from RGD- taking over Simon's position
-Sven's first GO meeting

Grant Aims (Suzi)

  • Aim 1:
    • building the ontology- regulates relationship, is_a, connecting to external ontologies (ChEBI, Cell Ontology)
    • ref. genome- lot of progress
    • general annotation, MODS, WGs coming together, coming up with standards- good progress.

So to some extent our aims will remain the same.

    • How can we get detailed/rich annotations
    • Quality control group
    • rate limiting step- literature curation. Improve on item e. Richness is vague. Elaborate on that item.
  • Aim 2: Ref.genome
    • improve software
  • Aim 3:
  • 2b-Infrastructure to approach these bacterial folks to make annotations. Do we have better relation with JGI now? Suzi- Yes. She has been invited to give a talk. Jonathan Eisen is involved.
  • How to find out who is doing what- lot of C. elegans-pathogen papers. Where do you send the pathogen information? No infrastructure for that. It is not that we are having difficulty identifying genes to annotate.
  • All these items available on the wiki. Please use the tracker and add your thoughts.
  • We want to build the ontology automatically. ChEBI is not requesting terms from us. It is the use of other ontologies to build GO.
  • How are we doing? Metrics? Are we meeting our goals? If we can provide some sense of completeness, breadth of annotation etc. Scientists who use the GO should provide metrics on how GO worked for us. Some papers say GO is great, some papers use other resources and so on. Just because a paper used KEGG doesn't mean they did not like GO. They probably did not get the results they wanted with KEGG. If people are finding GO to be complex, we should provide cuts of GO (GO-slims). 2 kinds of consumers (MODs and groups like Reactome, and then consumers who stay at a distance and plug their data into GO). Many people who go away do so because they don't find the data in GO (not annotated). Karen's comment- a user said 'I can't remember if the term is in the MF or BP'.
  • Mary-metrics with 'us'. How do we use it (Mods), metrics on how many papers have been curated by the MODs, these metrics are important as well.
  • How about we pair up with a specific group (say cancer biologist). Do analysis with them (now and after 3 months)
  • if we got more mappings to ext. databases, then users don't have to learn a new system. They could, say, use just MetaCyc.
  • Talk about IEA mappings (parking lot discussion)
  • leverage existing tools (PAINT and PANTHER families) and what we have in GO and do a first pass. Get broad coverage. Depth coverage is costly.
  • so many GO annotations, hard to say which is the main function etc.
  • identify areas that are underannotated based on which parts of the ontology are well developed, and have curators work on those areas.
  • NOTCH signaling pathway IEA- from Kimberly. Lot of organismal annotations, but not a single one talks about Notch signaling
  • Tanya- Idea on % of effort for each item.
  • One of the ideas - ask the program officer if we get funding from 2 different agencies.
  • Going through list of Tasks from the Cambridge meeting (the long table).
    • talk about Gold standard set so that we can compare annotations
    • did not do a usability study
    • HOMEWORK-everybody should write specific aims for this project. Download the Word document, turn on "track changes", add your comments, and email back to the GO-tops.

Annotation Advocacy Group Report

Ruth gave report on binding working group.

  • Discussion on cross-species expression. Should reciprocal annotations be made for these? Discussion on how to annotate these, what should be in column 8 versus 16.
  • Discussion of whether reciprocal annotations are mandatory, even within the same species, e.g. calmodulin and myosin.
    • Should GO still be making protein binding annotations? David brought up the idea that "protein binding" is isolated within GO, and cannot be linked to specific processes. Kara brought up the idea that maybe protein-protein interactions should be left to some other database (IMEx, BioGRID, etc.) rather than directly manually annotated by GO curators. There was debate about whether GO should automatically make "protein binding" annotations (either to that term directly or to children) or just leave them separate to be incorporated via a separate resource for enrichment analysis. This remains a topic of discussion.
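
As a rough illustration of the reciprocity question above, a check for missing reciprocal annotations might look like the sketch below. It assumes GAF-style tab-separated rows in which column 2 holds the annotated object ID, column 5 the GO ID, column 7 the evidence code and column 8 the With/From partner. The function name, the sample identifiers (CaM1, Myo5, Act1) and the placeholder reference are made up for the example, and real With/From values usually carry a database prefix that would need normalising first.

```python
PROTEIN_BINDING = "GO:0005515"  # GO term "protein binding"


def missing_reciprocals(gaf_lines):
    """Return (object, partner) pairs that still lack an IPI annotation.

    If A is annotated to protein binding "with" B, but B has no matching
    annotation "with" A, the pair (B, A) is reported as missing.
    """
    seen = set()
    for line in gaf_lines:
        if line.startswith("!"):  # GAF header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue
        obj_id, go_id, evidence, with_from = cols[1], cols[4], cols[6], cols[7]
        if go_id != PROTEIN_BINDING or evidence != "IPI":
            continue
        for partner in with_from.split("|"):
            if partner:
                seen.add((obj_id, partner))
    return sorted((b, a) for (a, b) in seen if (b, a) not in seen)


# Hypothetical sample rows (truncated to the first 8 GAF columns).
SAMPLE_ROWS = [
    "DB\tCaM1\tcam1\t\tGO:0005515\tREF:placeholder\tIPI\tMyo5",
    "DB\tMyo5\tmyo5\t\tGO:0005515\tREF:placeholder\tIPI\tCaM1",
    "DB\tCaM1\tcam1\t\tGO:0005515\tREF:placeholder\tIPI\tAct1",
]

if __name__ == "__main__":
    for obj_id, partner in missing_reciprocals(SAMPLE_ROWS):
        print(f"{obj_id} has no protein binding annotation with {partner}")
```

Whether such a report should trigger mandatory reciprocal annotations, or just be passed to curators as a suggestion, is the open question from the discussion; the sketch only lists the gaps.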

Rama gave report on SGD's analysis of IPI for catalytic activity terms.

    • Chains of evidence - Peter gave a hypothetical example and Karen a real example. It came down to the fact that often the conclusion the researchers make, which the annotation is trying to represent, is not based on any single piece of evidence, but a combination of multiple lines of evidence.

Rachael's report -

Pascale's report - downstream effects

Kimberly's report - regulation vs the process itself

  • Question about whether you can distinguish between whether a curator judged that a gp was part of the process, or whether they couldn't tell whether it was part of the process or just regulated it.
  • Caution should be applied when annotating to regulating terms (Judy's comment that annotators need to be skilled and highly trained biologists)
  • David suggested that when a curator comes across a case where they know that a function also regulates a process, that they should request an ontology change to capture that.
  • A ligand in a signalling pathway should not be annotated to regulation.

Rama - summarizing some maintenance details

  • consolidation of mailing lists
  • preview of changes to GO website (separation of documentation for manual vs IEA annotation)
  • QC checks (hard means automatically removed)
    • InterPro is considering the proposed removal of annotations to "protein binding" (proposed for Hard QC)
  • plans
    • QC's
      • Hard QC's will be run every week as part of regular filtering script
      • Questions:
        • What is plan on soft QC's?
        • Could we keep some stats on these (good for grant to have this documentation)
    • new evidence system that allows composite lines of evidence, chains, etc.
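
The hard/soft distinction and the request to keep stats could be sketched roughly as follows. This is only an illustration under assumptions: both rules, their names and the row layout (GAF-style columns, GO ID in column 5, evidence code in column 7, With/From in column 8) are hypothetical stand-ins, not the actual checks in the filtering script.

```python
from collections import Counter


def iea_protein_binding(cols):
    """Hypothetical hard rule: electronic annotation directly to 'protein binding'."""
    return cols[4] == "GO:0005515" and cols[6] == "IEA"


def ipi_without_partner(cols):
    """Hypothetical soft rule: IPI evidence with an empty With/From column."""
    return cols[6] == "IPI" and not cols[7].strip()


HARD_RULES = {"iea_protein_binding": iea_protein_binding}
SOFT_RULES = {"ipi_without_partner": ipi_without_partner}


def run_qc(gaf_lines):
    """Drop rows failing a hard rule, keep rows failing a soft rule, count both."""
    kept, stats = [], Counter()
    for line in gaf_lines:
        if line.startswith("!"):  # keep header/comment lines untouched
            kept.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        cols += [""] * (8 - len(cols))  # pad short rows so indexing is safe
        failed_hard = [name for name, rule in HARD_RULES.items() if rule(cols)]
        if failed_hard:
            stats.update("hard:" + name for name in failed_hard)
            continue  # hard failure: row is removed automatically
        stats.update(
            "soft:" + name for name, rule in SOFT_RULES.items() if rule(cols)
        )
        kept.append(line)  # soft failure: row is kept, only counted
    return kept, stats


# Hypothetical sample input.
SAMPLE_ROWS = [
    "!gaf-version: 2.0",
    "DB\tP1\tp1\t\tGO:0005515\tREF:placeholder\tIEA\t",
    "DB\tP2\tp2\t\tGO:0005515\tREF:placeholder\tIPI\t",
    "DB\tP3\tp3\t\tGO:0005515\tREF:placeholder\tIPI\tP2",
]

if __name__ == "__main__":
    kept_rows, counts = run_qc(SAMPLE_ROWS)
    print(len(kept_rows), dict(counts))
```

The per-rule counts returned alongside the filtered rows are the kind of running statistics that could be logged after each weekly run and reused as documentation for the grant.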

Rachael gave a demo of the new QuickGO interface (www.ebi.ac.uk/QuickGO). Please give it a try; it just went live and they'd love feedback on it.

Rama's initial thoughts for grant renewal

  • idea that users could submit suggestions for papers to be annotated; Rachael mentioned that GOA is working on having a form where researchers can submit annotations via a web interface.

Ref.genome Project (Pascale)

  • 12 families are in the CVS repository, 48 species
  • Wnt signaling- there was data for worm, fly and mouse genes that were not in some sub-families
  • 2 week thing is a little tricky; Susan- if I have one dedicated person doing ref.genome curation, I could make faster progress.
  • Pascale- can you get to data from a review? Sometimes papers say mammals and don't mention the species.
  • lot of details get discussed. But very hard to find it. Can we put these details in GONuts?
  • Where to post comments- Wiki or SF? Monitoring?
  • Tanya- After MikeL sends an email out for a given family, I go to the Wiki. How do I easily identify the genes and annotations? When? Both at the beginning and after the inferences are made. For example, if you have 4 genes in the family, then you want to see the annotations for all 4, or have check boxes to select the genes and see their annotations in one click. Sven's family view just lists the genes. Add a link to say 'Has EXP annotations/ # of EXP'.

Such feature requests, please add them to the PAINT SF tracker.

  • rather than picking gene families, pick couple of papers to discuss for electronic jamborees- part of the Annotation monthly conference call.
  • alternate site for PAINT files- directly available from GOC website- geneontology.org/go/gene_association/submission
  • do we reuse these Homolset pages for Panther families? OR do we need a different visualization? Panther families are already available - need to discuss
  • no way to download the Ref.genome annotations. QuickGO has a filter for these genes.
    Pascale- these are random set of genes- why create a filter? what do you want to do with it?


Rachael- publicity?
Judy- comprehensiveness of the annotations across species. We have put these genes through the inferencing pipeline. look at it at the genome level.

  • Publicity- Publicize it and put it at a confidence level next to the manual annotations. Does the end user need to know this came out of the ref.genome/Paint project? The user needs to know the function of the gp.
  • Judy- there was lot of concern about MikeL spending lot of time in inferring via PAINT. But looks like MikeL has figured out a pipeline and things are faster now. This is a tremendous advance since Geneva camp.
    Li- sometimes the annotations have to go just one level down. This might not be high priority for the MODS. is it valuable to go one level down. Make recommendations only if annotating one level down is important for the tree annotation.
    Suzi- these annotations should be sent out irrespective. Put some priority on the suggestions- Critical/Non-critical. MikeL has already been doing this judgment in some ways.


Harold- do you disregard IMP for process?
MikeL- it depends. I judge depending on what is available.

Day 2

Ontology

  • David- CheBI
    • noticed that, even without aligning with ChEBI, there are inconsistencies in GO.
    • created a chemical ontology within GO called GOche. Chris created this ontology from what already exists in GO.
    • made sure everything is complete. Representation is based on structure and not the roles of the chemicals
    • complex chemicals- nucleotide is a phosphatidic ester with has_part nucleoside and has_part 1-3 phosphates, and specifies the position of the phosphate attachment.
    • GO will merge the representation of acids and conjugate bases (ascorbate and ascorbic acid). Biologists don't care about acids/bases, but chemists do. Chemists can get to the actual form by going from GO to ChEBI.
    • there will not be a transport cross-product term for every biosynthesis cross-product term.
    • Amelia is working with EC and KEGG about aligning reactions with GOche etc.
  • Jane
    • ontology editors are already using the TermGenie.
    • will be in the obo file within a day.
    • For the example that Jane demo'd there is an issue with the placement in the ontology. It was too high. Chris, David et al will look into it
    • what happens if you just check positive and negative regulation terms (without checking the regulation term)? Genie is smart. It can do some checks.
    • next set of terms should be the 'involved_in' terms that MikeL usually requests.
  • Becky-Signaling
    • signaling process- high level grouping term. Add a comment to this term to consider the other 'signaling pathway' term. What criteria do curators look at for annotating to signaling terms? The evidence presented in papers is sometimes/usually tricky to interpret.
    • Mary (RGD)- should we concentrate on annotating one pathway at a time?
    • Fix documentation on signaling, the part that Mary read out
    • receptor ligand slide- add synonyms with 'ligand' in it for the receptor agonist and antagonist terms
    • cytokines can be both agonist and antagonists. Jane and Becky will take care of this.
    • Peter- agonist is not the natural ligand?
    • EGF receptor activity- it is actually the receptor that binds to the EGF. The way the term is worded is not super clear. make sure the logic is right in the computable definitions for these terms.
  • Karen- Transcription
    • are you going to give related synonyms for many of these terms? The term names are complex, and it will be easier to identify terms if there are many synonyms
    • Suzi- can we link pictures to these terms, display on AmiGO? because it is easier to understand.
    • How do you annotate to sigma factor activity? Do they have to show binding to both RNAP and the promoter? May not be possible. We need to redefine our annotating paradigm. Composite evidence.
    • Paul- Role, and the mechanism by which it performs its role
    • David- mechanism of functions- this serves as a model for how things should be defined in GO
    • Li- GO:0003700, transcription factor activity term- annotations to this term are not accurate currently. Should be reviewed anyway.

Software

  • Chris- nicely broken down by decades
    • MIREOT- bring in subsets of other ontologies so you can make terms.
    • Obo1.4 OWL compatible.
    • Run the taxon triggers and F-->P weekly (on a crontab). MGI and Val already have loaded the M-P inferences.
    • Judy- have we outgrown the GAF file? In the context of increased expressivity with annotations (col-16, chain of evidence etc). Maybe! If we are going to create a new Annotation paradigm, we need to define a new format.
    • LEGO-build simple statements from annotations.
    • Kimberly- have some simple descriptions for genes, hard to maintain. LEGO will work to keep these up-to-date, to come up with statements.
    • IndiGO- How would curators use it? How will it work? Will it go into a common DB? Then the annotations can be taken out by each MOD. Details need to be worked out.
    • Chris- Galaxy environment- nice workflow, and we should look into integrating tools with Galaxy and allowing a pipeline for multiple analyses.
    • deployment- machine systems are different, so deployment is hard. Cloud computing. Use Amazon cloud resources or VM for displays.