RefG Bar Harbor May 20-21 2010

From GO Wiki

Action items from May BHB-RefGen meeting

Ref genome Annotation priorities

  • See suggestions for 'annotation projects' on Strategy_for_establishing_RefG_annotation_priorities: conserved, general processes of wide interest to the research community, to which many groups can contribute.
  • Ideas to rapidly increase the number of families annotated:
    • One suggestion is conserved genes (includes genes involved in cellular metabolism, but it can be wider than that)
    • Another suggestion is to do the genes with no data; we can probably compute that: if there are 0 or 1 members with EXP annotations, that family should be considered done.
    • Perform automated triangulation?
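The "we can probably compute that" idea could look roughly like the following minimal Python sketch. It assumes annotations are available as (family ID, gene ID, evidence code) tuples and reads "EXP" as any of the GO experimental evidence codes (EXP, IDA, IPI, IMP, IGI, IEP); the input shape and the function name are hypothetical, not part of PAINT.

```python
from collections import defaultdict

# Assumption: "EXP" in the notes is read as any GO experimental evidence code.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

def families_considered_done(annotations, families):
    """Return the families that have 0 or 1 members with experimental annotations.

    annotations: iterable of (family_id, gene_id, evidence_code) tuples (hypothetical shape)
    families: all family IDs, so families with no annotations at all are still reported
    """
    exp_members = defaultdict(set)  # family_id -> genes with experimental evidence
    for family_id, gene_id, code in annotations:
        if code in EXPERIMENTAL_CODES:
            exp_members[family_id].add(gene_id)
    return sorted(f for f in families if len(exp_members.get(f, ())) <= 1)
```

For example, with annotations [("FAM1", "g1", "IDA"), ("FAM1", "g2", "IMP"), ("FAM2", "g3", "IEA")] and families ["FAM1", "FAM2", "FAM3"], only FAM1 has two experimentally annotated members, so FAM2 and FAM3 are returned.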

Action items for Ontology development

  • Transcription factor activity part of the ontology is wrong. High priority.
  • Ontology check: David will look at dual-specificity phosphatases (serine/threonine and tyrosine phosphatase activity).

Action items for PAINT developers

  • Can PAINT give a summary report (stats on #_of_genes/species, #_of_annotations)? (added to the PAINT SF tracker, PG, May 25, 2010)
  • Ref. genome SF tracker: we need to send digests to the ref. genome mailing lists.
  • PAINT: improve the FIND feature.
  • Use the EXT obo file.
  • Propagation to process terms: an MF may be involved in a particular BP. If there are multiple BPs, how do you propagate? Think about it.
  • PAINT: sometimes when you click on a protein name on the left, the display on the right side doesn't refresh.
  • Issues with protein names in the PAINT table: these names are the UniProt names. It should display the names from the GO database. Make sure the IDs are right.
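A summary report of the kind requested could be computed from PAINT's GAF output. This is a minimal sketch, assuming the output follows GAF 2.0 (tab-separated, "!" comment lines, column 1 = DB, column 2 = DB object ID, column 13 = taxon); `summary_report` is a hypothetical helper, not an existing PAINT feature.

```python
from collections import Counter, defaultdict

def summary_report(gaf_lines):
    """Return {taxon: (number_of_genes, number_of_annotations)} for GAF lines.

    Hypothetical helper; assumes GAF 2.0 column order.
    """
    genes = defaultdict(set)   # taxon -> distinct DB:object IDs
    n_annotations = Counter()  # taxon -> number of annotation lines
    for line in gaf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.split("\t")
        if len(cols) < 13:
            continue  # malformed line; a real report should flag it
        db, object_id = cols[0], cols[1]
        taxon = cols[12].split("|")[0]  # first taxon is the annotated organism
        genes[taxon].add(f"{db}:{object_id}")
        n_annotations[taxon] += 1
    return {taxon: (len(genes[taxon]), n_annotations[taxon]) for taxon in sorted(genes)}
```

Counting genes as distinct DB:object ID pairs keeps a gene with several annotations from being counted more than once per species.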

Action items for Annotation group

  • Improve "NOT guidelines"
    • NOT annotations: how do we propagate from a NOT MF to BP terms? When PAINT curators see NOT annotations, they should ask the respective MOD curators to confirm or clarify.
    • When a paper reports, for example, that an MSH gene is NOT involved in GACC mismatch repair, be careful about making a NOT annotation. In this case a specific term should be requested to highlight that substrate specificity.
  • QC check for references: RGD has "RGD:null" as a reference. We should have a check to make sure there is a valid reference.
    • Push for PubMed IDs in the reference column. If a PubMed ID is not available, other IDs are okay, but there should be a check to make sure there is some legitimate ID.
    • Internal references, for example the ND reference: make sure those references are in the GO_REF collection, and educate groups about this process.
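The reference QC check could be sketched as follows, assuming it runs over GAF column 6 (DB:Reference, with multiple references separated by "|"). The returned labels and the optional `known_go_refs` set are hypothetical; a real check would load the GO_REF collection itself rather than take it as a parameter.

```python
def check_reference(ref_field, known_go_refs=None):
    """Classify a GAF column-6 value as 'ok', 'ok-no-pmid' or 'invalid'.

    Hypothetical helper: a reference counts as legitimate if it has the form
    DB:ID with a non-empty ID that is not 'null' (e.g. rejects 'RGD:null').
    If known_go_refs is given, GO_REF IDs must also appear in that set.
    """
    valid_prefixes = []
    for ref in (r.strip() for r in ref_field.split("|")):
        prefix, sep, local_id = ref.partition(":")
        if not (sep and prefix and local_id) or local_id.lower() == "null":
            continue  # empty, malformed, or placeholder reference
        if prefix == "GO_REF" and known_go_refs is not None and ref not in known_go_refs:
            continue  # internal reference missing from the GO_REF collection
        valid_prefixes.append(prefix)
    if not valid_prefixes:
        return "invalid"
    # PubMed IDs are preferred; other legitimate IDs are accepted.
    return "ok" if "PMID" in valid_prefixes else "ok-no-pmid"
```

Returning "ok-no-pmid" separately from "ok" lets the check both reject placeholders such as "RGD:null" and report where a PubMed ID could still be pushed for.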

Agenda

May 20th, morning

Introduction

  • 8:15 - 8:45 Arrive: coffee
    • Rm ERB 2535
  • 8:45 - 9:00 Welcome: Judy
    • Introductions
  • 9:00 - 9:30 Overview: Pascale

Goals for this Meeting

  1. This meeting is the first result from a 'topic' approach to selection of ref genome target sets
  2. We want a method that results in a 'set' of annotations that could result in a 'story' with potential new information and a 'publication'
  3. We want to evaluate and confirm the PAINT annotation process that generates inferential annotations based on the MOD experimental annotations
    • Description of the rationale and strategy for carrying out integrated biocuration projects

(see http://gocwiki.geneontology.org/index.php/Strategy_for_establishing_RefG_annotation_priorities)


How the GO represents lung branching morphogenesis

  • 9:30 - 10:15 David
    • Will include live review of GO structures in several areas
    • Including epithelial-mesenchymal transition

Seminar: Zena Werb

  • 10:30 - 11:30
  • CC Little Auditorium
  • "Of mice and women: How studying mammary development gives us insights into breast cancer"


Rationale for target selection

  • 11:30 - 12:15 Target Selection: Carol
    • Review of basic biology involved in branching morphogenesis, specifically in the lung.
    • Why/How was this particular set of genes chosen?
    • How do the selected targets relate to one another biologically? (viz. a diagram showing their relationships to one another)

Lunch 12:15 - 1:30 Roscoe's

May 20th, afternoon

Status of the completed MOD experiment-based annotations

  • 1:30 - 2:15 Li, wiki pages
    • Question from Mike L.: How do we interpret lung-specific annotations when it comes to propagating annotations to zebrafish?

Action item: we need a lung page similar to the heart page.

Tree review: FOXF1A & related members

Mike

Possible topics for discussion:

Action Items:

  • Is there TPV for transcription factor complex? How are transcription factors represented in bacteria? Is there a mitochondrial TF complex?
  • Arjun's question: how do you know a TF complex is in the nucleus? Arjun is questioning the experimental data. If there is an inconsistency in the experimental data, MikeL contacts the curators.
  • Hernan (Krasnow lab): these are stronger inferences than those based on sequence similarity. Should there be a new evidence code called 'Inferred from Phylogenetic analysis'? Good question.
  • Transcription factor activity part of the ontology is wrong. High priority.
  • If you use external information (for example, posum), document it.
  • Chris: taxon triggers. While propagating, look at the taxon data to check which terms are applicable to which species, and if you find something new that is not already in the taxon file, feed it back to Chris.
  • Perhaps incorporate these taxon constraints into PAINT.
  • Make sure the bugs in the multiple sequence alignment are fixed.
  • Ossification- Mike will ask for clarification
  • Typos in IDs (PubMed IDs, GO term IDs, etc.) are a common type of error observed so far.
  • Has a new (or more granular) term been added since an annotation was made? Can this check be automated?
  • Priority: how do we speed up this process of inferring? Hopefully MikeL and others will work out the details in PAINT, and then we can figure out how to reach a steady state for inferences. Do we need to give a large number of annotations to the community? Maybe curators at MODs could spend time doing inferences rather than reading papers (just a proposal!)?
  • Should we look at genes that don't have a lot of experimental data and run through the inference quickly, just so we have coverage?
  • Can PAINT give a Summary report (Stats on #_of_genes/species #_of_annotations)?

May 21st, morning

Minutes

  • David Hill: he and Karen are working on the transcription factor branch of the ontology. Karen is talking to Jim Hu et al. about prokaryotic TFs. Karen is going to the transcription meeting at the end of July; they will have a rough draft of the ontology by then, and any unresolved issues will be taken care of there. TF-related propagation issues will be taken care of after July.
  • Lessons learnt from the lung targets: a very wide set of genes. We wish for a smaller module: a smaller set of genes, hopefully all connected.
  • What about Heart targets? Should we focus on a set of genes within this list?
  • Harold: can we use IEAs to direct us? SGD did a pilot project and will send numbers to the group.
  • Ontology check: David will look at dual-specificity phosphatases (serine/threonine and tyrosine phosphatase activity).
  • Ref. genome SF tracker: we need to send digests to the ref. genome mailing lists.
  • PAINT: improve the FIND feature.
  • Use the EXT obo file
  • Propagation to process terms: an MF may be involved in a particular BP. If there are multiple BPs, how do you propagate? Think about it.
  • PAINT: sometimes when you click on a protein name on the left, the display on the right side doesn't refresh.
  • Move bugs to the SF PAINT bug tracker.
  • Issues with protein names in the PAINT table: these names are the UniProt names. It should display the names from the GO database. Make sure the IDs are right.
  • NOT annotations: how do we propagate from a NOT MF to BP terms? When PAINT curators see NOT annotations, they should ask the respective MOD curators to confirm or clarify.
  • When a paper reports, for example, that an MSH gene is NOT involved in GACC mismatch repair, be careful about making a NOT annotation. In this case a specific term should be requested to highlight that substrate specificity.
  • Action items for the Annotation group
    • Improve "NOT guidelines"
    • Likely QC check: RGD has "RGD:null" as a reference. We should have a check to make sure there is a valid reference.
    • Push for PubMed IDs in the reference column. If a PubMed ID is not available, other IDs are okay, but there should be a check to make sure there is some legitimate ID.
    • Internal references, for example the ND reference: make sure those references are in the GO_REF collection, and educate groups about this process.
  • ABC transporters: two subunits, an ATP-binding subunit and a transporting subunit? Issue of annotating complexes vs. the functions of individual subunits.
  • Protein binding: won't be propagated using PAINT. Not worth the time, and this is in line with what we decided about not making protein binding annotations with ISS.
  • Long-term solution: curators should request specific terms, such as Ras binding.

Tree review: DUSP6 & related members

PTHR10159

Pascale

Tree review: TAP2 & related members

Li

May 21st, afternoon

Summary

Paul

  • If this process is to be effective then what changes and improvements do we need to make?
  • File structure must be simpler for MODs to import (i.e., not one GAF per family, but one concatenated file per MOD)
  • New evidence codes must be accepted
  • References must be publicly available
  • It must be clear what data each MOD needs to import
  • Reach out beyond MODs for importing PAINT-generated GAF files
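The "one file per MOD" change could be sketched like this, assuming each per-family PAINT file is standard GAF 2.0 and that the target MOD is identified by GAF column 1 (DB). `concatenate_by_mod` is a hypothetical helper; it drops each per-family header and emits a single `!gaf-version: 2.0` header per MOD file.

```python
from collections import defaultdict

GAF_HEADER = "!gaf-version: 2.0"

def concatenate_by_mod(family_files):
    """Merge per-family GAF contents into one list of lines per MOD.

    family_files: mapping of family ID -> list of GAF lines (one file per family).
    Returns a mapping of DB (GAF column 1, e.g. 'MGI') -> lines for one per-MOD file.
    """
    per_mod = defaultdict(list)
    for family_id in sorted(family_files):  # deterministic output order
        for line in family_files[family_id]:
            line = line.rstrip("\n")
            if not line or line.startswith("!"):
                continue  # drop per-family headers; one header is written per MOD file
            db = line.split("\t", 1)[0]  # column 1 names the source database / MOD
            per_mod[db].append(line)
    return {db: [GAF_HEADER] + lines for db, lines in per_mod.items()}
```

Each returned list can then be written out with `"\n".join(lines)`, one file per MOD, so a MOD imports a single file instead of one per family.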

Minutes

  • We need some way to track the status of a proposal (a matrix). This will allow us to reallocate resources.
  • Genes that are conserved in most organisms, or processes in vertebrates (e.g., olfactory response)? Something that happens within a single cell?
  • We should demonstrate that this project works so that we can say more about how we leverage the ref. genome project; hence we should pick an old, well-established pathway such as metabolism. We will do this until the project is established.
  • What do we need to show for the grant?
  • We will make sure any conserved process is represented in vertebrates as well.
  • Metabolic diseases?
  • For every topic listed on the Strategies page, the person associated with that topic should write a proposal similar to the one Varsha has put together for heart targets.
  • Arjun and Hernan suggested the retinoic acid, Wnt, EGF and FGF pathways, which are part of lung and heart development. Pulmonary HP and asthma are related to lung development. Consult pulmonologists.
  • Arjun: you should also look into the front end for these data (AmiGO?). For a given set of genes, I would like to know what the GO annotations are.

Preparatory materials

Background reading

Attendees

from away

  • Pascale Gaudet
  • Suzi Lewis
  • Mike Livstone
  • Paul Thomas
  • Chris Mungall
  • Rama Balakrishnan
  • Hernan Espinoza [Krasnow lab]
  • Arjun Guha [Cardoso lab]


local

  • Li Ni
  • Judy Blake
  • Carol Bult
  • David Hill
  • Mary Dolan
  • Randy Babiuk

Others from Jax have been invited to attend as they wish.