RefG Princeton April 12-13 2010

Latest revision as of 09:53, 15 April 2019

Propagation Rules/SOP

General Rules

  • In general, we will annotate to the most specific term possible and propagate as far back as possible.
  • We will curate exhaustively. For every experimental annotation, either:
    • Use it for a propagation
    • Write to the MOD to either refute it or suggest a better term, or
    • Explain why you didn't use it


Initial Steps

  • Look at the tree topology to see if it makes sense. For example, use OrthoMCL mapping to do a reality check on the tree. If it does not, contact Paul and the tree will be edited as appropriate.
  • It is very useful to spend a few minutes looking at a review, Gene Wiki page, etc. for an overview of the family when the PAINT curator is not familiar with it.
  • It is generally easiest to start with Molecular Function, then Cellular Component, then Biological Process.
  • It can be useful (it leads to improvements to the GO structure) to download the terms and view the DAG for all terms (possible future feature request).


Annotation Rules

  • When a PAINT curator finds a possible experimental annotation that has not yet been added, the SOP is to contact the MOD curator to request that the annotation be added. The PAINT curator does not need to wait before doing the PAINT curation: just add a note to the Evidence entry that the annotation exists, and the tree will be revisited.
  • For closely related genes with opposite annotations, look at the papers and see whether they are really contradictory. If so, don't propagate. If not, contact the MODs to correct the annotation.
  • Still do the multiple annotations in cases where we make SourceForge requests for new links in the ontology.
  • For an annotation that looks indirect, is there one that looks more direct? (IMPs may be more indirect.) We look for something that could explain it and use that if we can.
  • Scoping
    • Use common sense and keep the big picture of the tree and knowledge about the family in mind (e.g. LON family: propagation of the mitochondrial light strand promoter anti-sense binding annotation to the base of eukaryotes), i.e. we should not always limit ourselves to the bare minimal triangulation. Always include an evidence note when doing so.
    • We can expand the scope of a BP term to reflect that of related MF and/or CC terms. E.g. (LONP1): the MF and CC mitochondrial terms apply to the entire LONP1 eukaryotic clade, so we can apply mitochondrial organization to the entire eukaryotic clade.
    • If a process (e.g. a "response to x stimulus") involves a mechanism and a target, the scope of an inferred annotation should not extend beyond the phylogenetic distribution of the target.


Term-specific notes

  • Do not propagate GO:0005515 protein binding (will be suppressed from PAINT), GO:0005488 binding, and enzyme binding.
  • We will only propagate children of protein binding when the term is specific enough to indicate a specific protein family and/or provides useful biological information to a biologist wanting to learn more about it, i.e. the molecular function is related to the biological process(es) annotated in this family.
  • We will propagate small molecule binding terms.
  • For questionable terms (e.g. "ATP catabolic process"), do not use them to annotate, and send a question to see if the terms should be fixed.
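
The generic binding rule above can be sketched as a simple blocklist check. This is only an illustration, not how PAINT implements the suppression: the name `may_propagate` is made up here, and GO:0019899 is assumed to be the ID for enzyme binding, which the notes give by name only.

```python
# Terms the notes say should never be propagated.
DO_NOT_PROPAGATE = {
    "GO:0005515",  # protein binding
    "GO:0005488",  # binding
    "GO:0019899",  # enzyme binding (assumed ID; the notes give only the name)
}


def may_propagate(go_id: str) -> bool:
    """Return False for the generic binding terms listed in the notes."""
    return go_id not in DO_NOT_PROPAGATE
```

Children of protein binding still need a curator's judgement, as described above, so a check like this can only rule out the three generic terms.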


Dealing with NOTs

  • Every NOT must have a manual note added in the Evidence pane. Add notes below the generic paragraph that pops up.
  • NOT + rapid divergence = the line will not be in the GAF provided to the MOD but will be retained in the PAINT GAF. This lets us say "do not propagate" for a particular clade, as distinct from adding an explicit NOT. For a "real" NOT, we will use a different qualifier; these will be exported in the GAF. This SOP was discussed for quite some time. Alternative solutions that we did not like as well: 1) "Do Not Propagate" pruning automatically based on branch length; 2) manually examining the Ref Genome proteins, but not looking at every single other protein for other species.
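
The export rule for NOT + rapid divergence could look roughly like the sketch below. It assumes the qualifiers sit pipe-separated in the fourth GAF column and that rapid divergence is marked by a token called `rapid_divergence`. That token name and the function name are hypothetical; the notes do not say how the marker is stored.

```python
def split_for_export(gaf_lines, divergence_tag="rapid_divergence"):
    """Split GAF lines into (mod_export, paint_only).

    Lines qualified with both NOT and the (hypothetical) divergence tag
    are withheld from the MOD export; everything else is exported.
    """
    mod_export, paint_only = [], []
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip GAF comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        # Column 4 (index 3) holds the pipe-separated qualifiers.
        qualifiers = set(cols[3].split("|")) if len(cols) > 3 and cols[3] else set()
        if "NOT" in qualifiers and divergence_tag in qualifiers:
            paint_only.append(line)
        else:
            mod_export.append(line)
    return mod_export, paint_only
```

A "real" NOT carries a different qualifier, so it falls through to the exported list.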

Misc Notes/Action items/Still pending questions

  • Read document of proposal about binding terms
  • Consider downgrading annotations based only on IEPs.
  • Make sure the PANTHER to P-POD OrthoMCL mapping is using the most recent data sets on each end. Also add P-POD InParanoid info and a column in PAINT to show it.


Ontology requests

  • ser-dependent (parent) -> ATP-dependent peptidase (child): need this link; check up to endopeptidase -> SourceForge item. LON family.
  • DNA polymerase binding: ask for a new term, DNA polymerase gamma binding, and ask that the human annotation be changed to the new term. LON family.
  • Request "sequence-specific RNA binding" as a new term, and request that the annotation be changed. LON family.
  • Request to change the name and lineage of GO:0070407 so that it is a child of GO:0006515, with the new name "protein catabolic process of proteins misfolded due to oxidative damage", per PMID 12198491. Request that this new annotation be made for human and cow.


MOD requests

  • Missing MOD annotation to 'sequence-specific DNA binding'; will request this. LON family.
  • Write to Emily: remove ADP binding from the human annotation in the LON family.
  • Request that RGD change rat LONP1 annotation (Q924S5) from peroxisome to mitochondrion
  • Ask MOD for an opinion regarding a human IMP annotation (in addition to or instead of the IEP) for LONP1 for "response to hypoxia." Also, should the annotation be to the more specific term "cellular response to hypoxia" instead of "response to hypoxia"?


Annotation tracker

  • Whole family vs. subfamily issue: Bring up the file that we talked about before the Bar Harbor meeting
  • Get template URLs to generate links to PANTHER beta.

PAINT feature requests/bugs

  • Down the road feature: be able to launch a DAG viewer to see all annotations in context of GO structure
  • Add domain information
  • Radio buttons color coded based on GO aspect
  • Scrolling in the MSA view alters the residue number (bug); enable search to jump to specific residues
  • Remove GO:0005515 (protein binding) from the list of terms we see in PAINT

Quick tour for new PAINT users (Li and Mary)

Ed gave a quick tour of the latest version of PAINT.

Review protein families; see: http://wiki.geneontology.org/index.php/GAFs_for_trees-based_annotations

While reviewing protein families, we can generate a list of propagation rules. We can pick up lunch in our cafe and work through lunch.

LONP1/2

  • Annotate root to 'ATP-dependent peptidase activity' based on the span of experimental annotations across species
  • NOT to radA clade: we know that they do not have this activity; use the missing_residues qualifier
  • Scrolled through the rest of the alignment to identify others that do not have the active site
  • Missing MOD annotation to 'sequence-specific DNA binding'; will request this of the MOD, and annotate to root
  • Annotate the mitochondrial light strand promoter anti-sense binding annotation to the base of eukaryotes. Based simply on the data, it would go to the human-mouse base, but given some thought about where this happened, it should go to the base of eukaryotes.
  • see notes in the abstract generated by this family for more details


CPS

HPRT

GO annotation camp discussion

SOP will be presented at GO annotation camp. The LONP family will be used as an illustrative "easy" example, and perhaps a "hard" one as well, for example PGM5 (duplication at base of vertebrates, good example of NOT annotation). Advance discussion with Alan Bridge, Compara.

  • Question for Annotation Camp: under what circumstances should "ATP catabolic process" really be "ATPase activity"? See PMID 12657466. Note that "ATP hydrolysis" is an exact synonym. Pascale has sent this question to Rama and Emily.
  • Annotation topic: aging
  • Annotation topic: "Response to" terms


misc. discussion items

  • (Mike): Is the PANTHER to P-POD OrthoMCL mapping using the most recent data? Can we add InParanoid results soon, too?
  • (Mike): Should we fix the dates in the new GAF files to reflect when the annotations were actually made?
    • No need to do this: we'll update these annotations and the date will reflect the update.
  • (Mike): Could we re-generate the statistics from the GAF files using a script (rather than manually)?
    • Probably easy enough, once we discuss what stats we're interested in capturing (Ed)
    • Ed is working on this, based on Mike's slide #3 from the GO meeting
  • (Mike): Pascale noticed a problem with the literature linkouts to Wormbase, and I just had some trouble with ZFIN
    • Fixed (Ed)
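
A script for the GAF statistics could start from something like the sketch below. Which stats to capture was still under discussion, so counting annotations per aspect and evidence code is only an assumed example, and `gaf_stats` is a made-up name. It relies on the standard GAF column order (evidence code in column 7, aspect in column 9).

```python
from collections import Counter


def gaf_stats(lines):
    """Count annotations per (aspect, evidence code) in GAF-formatted lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip GAF comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # malformed line; a real script might report it instead
        evidence, aspect = cols[6], cols[8]
        counts[(aspect, evidence)] += 1
    return counts
```

Usage would be along the lines of `gaf_stats(open(path))` for each GAF file, with `path` supplied by the caller.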

Annotation tracker

Sven was able to join us Tuesday at 1:30

  • Need "Date comprehensively annotated" from MODs
    • Maybe not. See note below.
  • How do we deal with pending protein families?
  • Subfamilies: can we/should we deal with these?
  • Should be used not just to manage PAINT annotation, but to manage the monthly RefGenome lists
  • Would be great to get the number of papers per gene, can MODs
  • Should link tree icon directly to PANTHER beta web site

Work Flow/usage

  • Pascale/Kara are given a set of UniProt IDs as monthly targets from a MOD, and go to the AnnotationTracker form, which takes a list of UniProt IDs and returns the list of PANTHER families
  • Pascale/Kara send the list of PANTHER families to the MODs as the monthly target
  • Idea to propose: Do not use subfamilies any more. Give each MOD the complete list of genes in a target family and ask them to decide which are important to curate. This would help uncouple primary annotation from PAINT annotation. This also makes "Date comprehensively annotated" less important than "Date most recent members annotated."
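
The lookup step in this workflow (a list of UniProt IDs in, a list of PANTHER families out) can be sketched as below. This is not the AnnotationTracker code: the function name and the `uniprot_to_family` mapping are hypothetical, and the mapping would have to come from PANTHER data.

```python
def families_for_targets(uniprot_ids, uniprot_to_family):
    """Return (sorted unique PANTHER families, UniProt IDs with no family).

    `uniprot_to_family` is a hypothetical dict from UniProt ID to
    PANTHER family ID.
    """
    families = set()
    unmapped = []
    for uid in uniprot_ids:
        family = uniprot_to_family.get(uid)
        if family is None:
            unmapped.append(uid)  # keep track of targets we could not place
        else:
            families.add(family)
    return sorted(families), unmapped
```

Returning the unmapped IDs separately makes it easy to follow up on targets that do not fall into any family.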

Paper(s)

Topics:

  • PAINT Software
  • DB Infrastructure, Tracking
  • Annotation
    • Catches errors in annotation
    • Corrections to ontology
    • Family annotation itself: what can we learn with the evolutionary context


Outline

  • Title: ???
  • Authors: as on this mailing list, and possibly adding CJM, Seth, and Sven if we add a section on the DB and the other GO-top PIs
  • Affiliations: obviously
  • Abstract: Paul?
  • Author Summary: Suzi will take a crack at this
  • Introduction: Pascale&Paul
  • Results: as below, with possible addition of DB section and web interface, although this could be a different paper. Ed and Suzi can write #1, Paul&Mike for #2?
  • Discussion: #3 (Mike & Kara)
  • Materials and Methods: cut and dry, write at the end
  • Acknowledgments: all the curators at the MODs, the grant...
  • References, Figure Legends, Tables: as they fall out from the above.

  1. PAINT app note
  2. How PAINT can be used to make phylogenomics high throughput
  3. How this can improve the GO itself

Should 2 and 3 be combined? 3 could potentially be a paper by itself

Possible future papers on individual families? MSH2 as an example.

1 paper to PLoS Comp Bio, plus an app note to Bioinformatics "so it doesn't get lost."

#2 should include SOPs

3 papers - final thoughts to summarize above:

  1. Phylogenomics as laid out in Paul's abstract (Paul, Pascale, Chris, Stan, & Suzi)
  2. App note (Ed, Chris, Mike, Pascale, Paul, & Suzi)
    1. Very techy, add Sven & Seth if we do tracking db/AmiGO stuff
  3. Central GO Curation paper (Pascale, Mike, Kara, ref genome folks)