RefG Princeton April 12-13 2010

From GO Wiki

Revision as of 14:21, 13 April 2010

Propagation Rules/SOP

  • When a PAINT curator is not familiar with a family, it is very useful to spend a few minutes looking at a review, GeneWiki, etc. for an overview.
  • Look at the tree topology to see if it makes sense. For example, use OrthoMCL mapping to do a reality check on the tree. If it does not, contact Paul and the tree will be edited as appropriate.
  • Generally easiest to start with Mol. Function, then Cell. Component, then Biol. Process
  • In general, we will annotate to the most specific term possible and propagate as far back as possible.
  • We will curate exhaustively by examining every experimental annotation
  • It can be useful (it leads to improvements to the GO structure) to download the terms and view the DAG for all terms (possible future feature request).
  • Every NOT must have a manual note added in the Evidence pane. Add notes below the generic paragraph that pops up.
  • When a PAINT curator finds a possible experimental annotation that has not yet been added, the SOP is to contact the MOD curator to request that the annotation be added, but they do not need to wait to do the PAINT curation. They can just add the note to the Evidence entry that the annotation exists and the tree will be revisited.
  • NOT + rapid divergence = the line will not be in the GAF provided to the MOD but will be retained in the PAINT GAF. This makes it possible to say "do not propagate" to a particular clade, as distinct from adding an explicit NOT. For a "real" NOT, we will use a different qualifier; these will be exported in the GAF. This SOP was discussed for quite some time. Alternative solutions that we did not like as well: 1) "Do Not Propagate" pruning automatically based on branch length; 2) manually examining the Ref Genome proteins, but not looking at every other protein for other species.
  • Use common sense and keep the big picture of the tree and knowledge about the family in mind (e.g. LON family: propagation of the mito., light strand promoter anti-sense binding annotation to the base of euks), i.e. we should not always limit ourselves to the bare minimum triangulation. Always include an evidence note when doing so.
  • Closely related genes with opposite annotations: look at the PMIDs and see whether they are really contradictory. If so, don't propagate. If not, contact the MODs to correct the annotation.
  • Still do the multiple annotations in cases where we make sourceforge requests for new links in the ontology.
  • Do not propagate GO:0005515 protein binding (will be suppressed from PAINT), GO:0005488 binding, and enzyme binding.
  • We will only propagate children of protein binding when the term is specific enough to indicate a specific protein family and/or it provides useful biological information to a biologist wanting to learn more about the term, i.e. when that molecular function is related to the biological process(es) annotated in this family.
  • We will propagate small molecule binding terms.
  • We can expand the scope of a BP term to reflect that of related MF and/or CC terms. E.G. (LONP1): the MF and CC mitochondrial terms apply to the entire LONP1 euk. clade, so we can apply mitochondrial organization to the entire euk. clade.
  • For something that looks indirect, is there something that looks more direct? (IMP's may be more indirect.) We look for something that could be explaining it and use it if we can.
  • If a process (e.g. a "response to x stimulus") involves a mechanism and a target, the scope of an inferred annotation should not extend beyond the phylogenetic distribution of the target.
  • For questionable terms ("ATP catabolic process"), do not use to annotate, and send a question to see if the terms should be fixed.
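
The binding-term and NOT export rules above can be sketched as a small filter. This is only an illustration under stated assumptions: the `Annotation` record, the function names and the `rapid_divergence` marker are hypothetical (the notes do not fix a qualifier name), and "enzyme binding" is omitted from the ID set because the notes give it by name only.

```python
from dataclasses import dataclass

# IDs named in the notes: GO:0005515 (protein binding), GO:0005488 (binding).
# "enzyme binding" is listed by name only, so its ID is deliberately left out.
NON_PROPAGATED_TERMS = {"GO:0005515", "GO:0005488"}

# Hypothetical marker for "NOT + rapid divergence"; not an agreed qualifier name.
DO_NOT_PROPAGATE = "rapid_divergence"


@dataclass(frozen=True)
class Annotation:
    """Hypothetical minimal record for one inferred annotation."""
    gene: str
    go_id: str
    qualifiers: frozenset = frozenset()


def can_propagate(go_id):
    """True unless the term is on the do-not-propagate list."""
    return go_id not in NON_PROPAGATED_TERMS


def in_mod_gaf(annotation):
    """False for 'do not propagate' lines, which stay only in the PAINT GAF."""
    quals = annotation.qualifiers
    return not ("NOT" in quals and DO_NOT_PROPAGATE in quals)


def split_export(annotations):
    """Return (paint_gaf_lines, mod_gaf_lines) for a list of annotations."""
    paint = list(annotations)
    mod = [a for a in paint if in_mod_gaf(a)]
    return paint, mod
```

Under this sketch a "real" NOT (one carrying NOT with any other qualifier) passes through to the MOD GAF, while a NOT combined with the rapid-divergence marker is kept only on the PAINT side.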

Misc Notes/Action items/Still pending questions

  • Missing MOD annotation to 'sequence-specific DNA binding', will request this: LON family
  • ser-dependent (parent) -> atp-dependent peptidase (child), need this link, check up to endopeptidase-> sourceforge item: LON family.
  • Read document of proposal about binding terms
  • write to Emily, remove ADP binding from human annotation in LON family
  • DNA polymerase binding: ask for a new term, DNA polymerase gamma binding, and ask that the human annotation be changed to the new term. LON family
  • Request Sequence Specific RNA Binding as a new term, and request annotation is changed. LON family.
  • Request that RGD change rat LONP1 annotation (Q924S5) from peroxisome to mitochondrion
  • Ask MOD for an opinion regarding a human IMP annotation (in addition to or instead of the IEP) for LONP1 for "response to hypoxia." Also, should the annotation be to the more specific term "cellular response to hypoxia" instead of "response to hypoxia"?
  • Request to change name and lineage of GO:0070407 to be a child of GO:0006515 with new name "protein catabolic process of proteins misfolded due to oxidative damage" per PMID 12198491 . Request that this new annotation be made for human and cow.
  • Question for Annotation Camp: under what circumstances should "ATP catabolic process" really be "ATPase activity"? see PMID 12657466. Note that "ATP hydrolysis" is an exact synonym. Pascale has sent this question to Rama and Emily.
  • Consider downgrading annotations based only on IEP's.
  • Annotations topic: aging

PAINT feature requests/bugs

  • Down the road feature: be able to launch a DAG viewer to see all annotations in context of GO structure
  • Add domain information
  • Radio buttons color coded based on GO aspect
  • Scrolling in MSA view alters the residue number (bug), enable search to go to specific residues
  • Remove GO:0005515 (protein binding) from the list of terms we see in PAINT

Quick tour for new PAINT users (Li and Mary)

Ed gave a quick tour of the latest version of PAINT.

Review protein families, see: http://wiki.geneontology.org/index.php/GAFs_for_trees-based_annotations

While reviewing protein families, we can generate a list of propagation rules. We can pick up lunch in our cafe, and work through lunch.

LONP1/2

  • Annotate root to 'ATP-dependent peptidase activity' based on the span of experimental annotations across species
  • NOT to radA clade, we know that they do not have this activity, use the missing_residues qualifier
  • Scrolled through rest of alignment to identify others that do not have the active site
  • Missing MOD annotation to 'sequence-specific DNA binding', will request this of MOD, and annotate to root
  • Annotate mito., light strand promoter anti-sense binding annotation to base of eukaryotes. Based simply on data, would go to human-mouse base, but when given some thought about where this happened, should go to the base of eukaryotes.
  • see notes in the abstract generated by this family for more details


CPS

HPRT

GO annotation camp discussion

The SOP will be presented at GO annotation camp. The LONP family will be used as an illustrative "easy" example, perhaps along with a "hard" one, for example PGM5 (duplication at base of vertebrates, good example of NOT annotation). Advance discussion with Alan Bridge, Compara.

misc. discussion items

  • (Mike): Is the PANTHER to P-POD OrthoMCL mapping using the most recent data? Can we add InParanoid results soon, too?
  • (Mike): Should we fix the dates in the new GAF files to reflect when the annotations were actually made?
  • (Mike): Could we re-generate the statistics from the GAF files using a script (rather than manually)?
    • Probably easy enough, once we discuss what stats we're interested in capturing (Ed)
  • (Mike): Pascale noticed a problem with the literature linkouts to Wormbase, and I just had some trouble with ZFIN
    • Fixed (Ed)
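
For the question above about regenerating statistics with a script, a minimal sketch might look like the following. The choice of statistics is an assumption, since the notes say the stats of interest still need to be discussed, and `gaf_stats` is a hypothetical name. It relies only on the standard GAF layout: tab-delimited lines, `!` comment lines, DB and object ID in columns 1 and 2, evidence code in column 7 and aspect in column 9.

```python
import csv
from collections import Counter


def gaf_stats(lines):
    """Count annotated genes and annotations per aspect and evidence code.

    `lines` is any iterable of GAF text lines (an open file works).
    """
    by_aspect = Counter()
    by_evidence = Counter()
    genes = set()
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, "!" header/comment lines and truncated rows.
        if not row or row[0].startswith("!") or len(row) < 15:
            continue
        genes.add((row[0], row[1]))   # (DB, DB object ID)
        by_evidence[row[6]] += 1      # evidence code
        by_aspect[row[8]] += 1        # aspect: F, P or C
    return {
        "genes": len(genes),
        "by_aspect": dict(by_aspect),
        "by_evidence": dict(by_evidence),
    }
```

It could be run as `with open(path) as handle: print(gaf_stats(handle))`, where `path` points at one of the tree-based GAF files.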

Annotation tracker

Sven was able to join us Tuesday at 1:30

  • Need "Date comprehensively annotated" from MOD's
  • How do we deal with pending protein families?
  • Subfamilies: can we/should we deal with these?
  • Should be used not just to manage PAINT annotation, but to manage the monthly RefGenome lists
  • Would be great to get the number of papers per gene, can MODs
  • Should link tree icon directly to PANTHER beta web site

Work Flow/usage

  • Pascale/Kara are given a set of UniProt IDs as monthly targets from a MOD. They go to the AnnotationTracker form, which takes a list of UniProt IDs and returns the list of PANTHER families.
  • Pascale/Kara send the list of PANTHER families to the MODs as the monthly target
  • Idea: Give each MOD a list of genes in a target family and ask them to decide which are important to curate. This would help uncouple primary annotation from PAINT annotation. This also makes "Date comprehensively annotated" less important than "Date most recent members annotated."
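
The lookup step in this workflow can be pictured as grouping target IDs by family. The sketch below is not the AnnotationTracker implementation: `families_for_targets` and the `family_of` mapping are hypothetical stand-ins for whatever the form uses internally, and the family labels in any example are placeholders, not real PANTHER IDs.

```python
from collections import defaultdict


def families_for_targets(uniprot_ids, family_of):
    """Group monthly target IDs by family.

    `family_of` is a hypothetical dict from UniProt ID to family ID.
    Returns (families, unmapped): a dict of family ID -> member IDs,
    and the IDs that have no family in the mapping.
    """
    families = defaultdict(list)
    unmapped = []
    for uid in uniprot_ids:
        family = family_of.get(uid)
        if family is None:
            unmapped.append(uid)
        else:
            families[family].append(uid)
    return dict(families), unmapped
```

Returning the per-family member lists, not just the family names, would also support the idea above of giving each MOD the genes in a target family to prioritise.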

Paper

  • Title: ???
  • Authors: as on this mailing list, and possibly adding CJM, Seth, and Sven if we add a section on the DB and the other GO-top PIs
  • Affiliations: obviously
  • Abstract: Paul?
  • Author Summary: Suzi will take a crack at this
  • Introduction: Pascale&Paul
  • Results: as below, with possible addition of DB section and web interface, although this could be a different paper. Ed and Suzi can write #1, Paul&Mike for #2?
  • Discussion: #3 (Mike & Kara)
  • Materials and Methods: cut and dry, write at the end
  • Acknowledgments: all the curators at the MODs, the grant...
  • References, Figure Legends, Tables: as they fall out from the above.