9 FEB 2010 RefGen Phone Conference (Archived)

Latest revision as of 08:20, 16 January 2018

ACTION ITEMS

ALL GROUPS: Please fill three sections here:

  1. Annotation status of lung curation targets: 9_FEB_2010_RefGen_Phone_Conference#Lung_curation_targets
  2. Who captures the 'comprehensive annotation' status? 9_FEB_2010_RefGen_Phone_Conference#Plan_for_annotation_status_reporting_tool
  3. Please describe how your group is publicizing Ref.Genome annotation here: Ideas_for_publicizing_Ref.Genome_Annotation_Data#How_is_your_group_Publicizing_Ref.Genome_Data

Present

  • GOA: Emily, Yasmin
  • BHF-UCL: Ruth, Varsha
  • MGI: Li, Mary
  • RGD: Stan
  • Flybase:
  • dictyBase: Pascale, Petra
  • TAIR: Donghui
  • AgBase: Lakshmi Pillai
  • SGD: Julie, Stacia, Rama
  • pombe
  • Ecoli:
  • Wormbase: Kimberly Van Auken, Ranjana Kishore
  • ZFIN: Doug
  • PPOD: Kara Dolinski, Mike Livstone
  • Berkeley: Suzi, Ed, Seth

Lung curation targets

  1. Discuss the priorities (Dec, Jan, and fgf10 from November) and goal (March 1st)
  2. Please answer prior to the call: Can everyone have the curation targets annotated by then? If not, can you write down which families are problematic?
  • GOA: We will have a good stab at getting most of them done. The most problematic will be those with many publications: DAG1, GSTT1, SERPINE1 and SERPINE2. We will try to get at least some annotations for each aspect on all of them by March 1st, though.
  • BHF-UCL:
  • MGI: We will try our best to complete the Dec lung targets by March 1st. The Dec genes Bmp2, Bmp4 and Ctnnb1 have about 800 pubs, so they may not be done by then.
  • RGD: The genes on the lists have been curated.
  • FlyBase: The Dec genes BMP4, CTNNB1 and HOXA5 are a problem for me, as the fly orthologs (decapentaplegic, Antennapedia, bicoid and armadillo) are among the most intensively studied fly genes, having 500 to <1000 pubs each. I'll do my best to curate the key features. I haven't started the Jan targets yet, but there doesn't seem to be much info for most of the 32 genes, so I hope it will be possible to complete these by March 1st. The other problem we have is that it takes a month or two for my annotations to make it into the public version of FlyBase and then into the ga file, so even if the annotations are done by March 1st they won't necessarily be visible to the PAINT annotators.
  • dictyBase: We expect that we will be able to complete the lung targets by March 1st.
  • TAIR: We expect that we will be able to complete the lung targets by March 1st.
  • AgBase: From the ones that have not already been signed off as completed in the google document spreadsheet, we will be able to annotate PLEKHA2, BMP4, CTNNB1, FOXF1, WNT2, WNT2B.
  • SGD: The HMGB2 targets may not be done by then. The curators who are working on them have said that even if they are not done by March 1st, they will be done within a week or two after that.
  • pombe: Not sure, but I don't think there are many lung targets.
  • Ecoli: EcoliWiki didn't have any lung targets, so we are catching up on our other RefGenome genes.
  • Wormbase: We expect that we will be able to complete the lung targets by March 1st.
  • ZFIN: It is unlikely that I will get ALL the curation done by the end of Feb. I will most likely be able to have a reasonable amount done on all the genes in the set though if I cherry pick the papers carefully.

Minutes

Lung Curation Targets

Checking curation status of different groups finds that a few families are problematic for some groups.

  • General Strategy:

Suzi - only annotate papers up to the point where there is no new information

Pascale - get GO terms that are most relevant for that gene

not everything needs to be done on March 1st, some genes will be of higher priority so the tree curators can get started

more difficult genes can be given more time

Li and Pascale will check lists on wiki and indicate which genes will be of higher/lower priority

  • Group-specific concerns

MGI will try to get papers that at least cover lung branching morphogenesis for genes with hundreds of pubs

FLYBASE - upload schedule may prevent annotations from being available in PAINT

Emily - Susan can put together a file for PAINT if needed, so it's possible to get a FLYBASE GA file earlier for this project

this course of action is applicable to any group

GO database gets updated once a week

PAINT demo

Ed Lee, Berkeley

Minutes

PAINT demo

Ed gave a demo of tool

annotations for gene families

family info comes from Panther database

links out to protein pages in UniProtKB

tab to MSA - multiple sequence alignment

look at one ontology at a time, radio buttons to change view

bold, red gene names indicate experimental data

click on gene, get annotations in bottom window

Annotating ancestral nodes

annotation matrix tab in bottom window brings up overall view of all associations

GO terms and ancestors associated with leaves

each row represents a gene in the list

black squares - annotation, white squares - indirect associations (parents)
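
The black/white distinction in the annotation matrix can be illustrated with a minimal sketch. It assumes a toy ontology with made-up term IDs (not real GO terms) and hypothetical function names; it is not PAINT's actual code. A cell is "direct" when the gene carries the term itself and "indirect" when the term is only reached through a parent relationship.

```python
from typing import Dict, List, Set

# Toy ontology: term -> direct parents. IDs are invented for illustration.
PARENTS: Dict[str, List[str]] = {
    "T:lung_branching": ["T:lung_development"],
    "T:lung_development": ["T:organ_development"],
    "T:organ_development": [],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every ancestor of `term`, excluding the term itself."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def annotation_matrix(
    direct: Dict[str, Set[str]], parents: Dict[str, List[str]]
) -> Dict[str, Dict[str, str]]:
    """Build one row per gene; each cell is 'direct', 'indirect' or 'none'."""
    matrix: Dict[str, Dict[str, str]] = {}
    for gene, gene_terms in direct.items():
        implied: Set[str] = set()
        for term in gene_terms:
            implied |= ancestors(term, parents)
        row: Dict[str, str] = {}
        for term in sorted(parents):
            if term in gene_terms:
                row[term] = "direct"      # black square
            elif term in implied:
                row[term] = "indirect"    # white square
            else:
                row[term] = "none"
        matrix[gene] = row
    return matrix
```

Calling annotation_matrix({"geneA": {"T:lung_branching"}}, PARENTS) marks the branching term as direct for geneA and its two ancestors as indirect.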

associations tab shows a view of specific annotations

red = annotations from GO database

clicking on ancestral node, automatically see children, bold in table and annotation matrix

select annotation term you want, drag directly on to the tree

if annotating to ancestral node, node will be black

PAINT annotations show in blue background

possible to change view of annotation matrix, use layout option to customize appearance

nodes start as white - orange = direct annotation, black = propagated annotation to descendants

can now see new ISS annotations in associations tab
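
As a rough sketch of what annotating an ancestral node implies, the example below propagates a term from a node to every leaf beneath it and records each result as an ISS annotation. The tree, node names, record fields and term ID are all hypothetical; PAINT's internal data model is not described in these minutes.

```python
from typing import Dict, List

# Hypothetical family tree: node -> children. Nodes with no entry are leaves (genes).
TREE: Dict[str, List[str]] = {
    "root": ["nodeA", "gene4"],
    "nodeA": ["gene1", "nodeB"],
    "nodeB": ["gene2", "gene3"],
}


def leaves_under(node: str, tree: Dict[str, List[str]]) -> List[str]:
    """Return all leaf genes below `node`, in left-to-right order."""
    children = tree.get(node, [])
    if not children:
        return [node]
    leaves: List[str] = []
    for child in children:
        leaves.extend(leaves_under(child, tree))
    return leaves


def propagate(node: str, go_term: str, tree: Dict[str, List[str]]) -> List[dict]:
    """Create one inferred (ISS) annotation per leaf under the annotated node."""
    return [
        {"gene": leaf, "term": go_term, "evidence": "ISS", "inherited_from": node}
        for leaf in leaves_under(node, tree)
    ]
```

With this toy tree, propagate("nodeA", "T:example", TREE) yields annotations for gene1, gene2 and gene3 but not gene4, which sits outside the annotated subtree.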

Use of NOT qualifier

trash can allows for removal of any annotation, can only delete terms on node you annotated to

remove annotations from subfamilies - use qualifier option (e.g., rapid divergence)

Kara: how will rapid_divergence be handled in the GAF?

Suzi: included in GAF but uses the NOT qualifier

Emily: would be good to keep the notes on why the NOT annotation was made

will there be a huge number of NOT annotations from this kind of annotation?

Kara: did a small number of these, found only one case where she didn't want to transfer the annotation

move forward with this version, but if curators find examples where the annotation shouldn't even be included without a NOT, we can then think about revising the software

triangles in annotation matrix - quickly visualize what has been annotated with NOT
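
To get a feel for how many NOT annotations an export contains, a small filter over the GAF can help. This is a sketch only: it assumes the tab-delimited GAF layout in which column 4 is the Qualifier and multiple qualifiers are separated by a pipe, and the function names are invented for illustration.

```python
from typing import Iterable, List

QUALIFIER_COL = 3  # zero-based index of the Qualifier column (GAF column 4)


def has_not_qualifier(gaf_line: str) -> bool:
    """True if the Qualifier column of a GAF row contains NOT."""
    fields = gaf_line.rstrip("\n").split("\t")
    if len(fields) <= QUALIFIER_COL:
        return False
    qualifiers = [q.strip().upper() for q in fields[QUALIFIER_COL].split("|")]
    return "NOT" in qualifiers


def not_annotations(lines: Iterable[str]) -> List[str]:
    """Return the rows carrying a NOT qualifier, skipping '!' comment lines."""
    return [
        line for line in lines
        if not line.startswith("!") and has_not_qualifier(line)
    ]
```

Comparing len(not_annotations(lines)) against the total number of rows gives a quick answer to whether NOT annotations are becoming numerous.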

GO_Refs - How many?

Evidence tab - for curators to record method of annotation, will be associated with every annotated tree

can use this to explain rationale for annotations

will be public in the GO_Ref file

this is a free text field

Emily: how many GO_Refs?

Suzi: could use generic GO_Ref

Pascale: create a new reference if generic GO_Ref has been edited?

Suzi: thought processes should be captured, space is not a problem

Pascale: every time you curate a family, you want to record comments

Emily: use generic GO_Ref, use WITH column to link out to method

Suzi: as long as we agree to capture this information, we can decide later where the links should go

id'd through the family name

will be recorded in the Reference column of the GAF

Emily: might be nice for users to pull together all of these annotations via common GO_Ref

Rama: technical issues vs are we agreed that we want to create a separate GO_Ref

Pascale: PAINT annotators really felt there was more information they wanted to capture

Evidence Codes

Emily: why using ISS vs ISO?

Pascale: because some of the proteins are paralogs

transferring amongst family members - some may be paralogs

on tree, paralogs are indicated by squares, orthologs by circles

Paul advocates an evidence of Inferred by Family Relationship, Inferred from Evolutionary Relationship

will this method only use a single evidence code?

one code makes changes/updates easy, multiple codes would be trickier

Emily: want something better than ISS

WITH column

Suzi: two identifiers in WITH column, protein with experimental annotation, ancestor node

Rama: if multiple proteins with experimental evidence, which is chosen?

Pascale: all, multiple IDs will go in WITH column, will be visible on AmiGO page
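
A minimal sketch of how the WITH value could be assembled is shown below. It assumes the GAF convention of separating multiple With/From entries with a pipe. The ordering (ancestor node first, then the experimentally annotated proteins, sorted and de-duplicated) and the identifier formats are assumptions for illustration, not something decided on the call.

```python
from typing import Iterable


def build_with_field(experimental_ids: Iterable[str], ancestor_node_id: str) -> str:
    """Join the ancestor node ID and all supporting protein IDs with '|'."""
    proteins = sorted(set(experimental_ids))  # drop duplicates, keep output stable
    return "|".join([ancestor_node_id] + proteins)
```

For example, build_with_field(["DB:P2", "DB:P1"], "FAMILY:node7") returns a single string holding the node ID followed by both protein IDs, which keeps every experimentally supported protein visible in one column.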

Save and Export

Can Save annotations and Export

Save - can come back and see all work

Export - export GAF file for MODs to import with annotations to leaves

Where does the GAF file go after export?

Date comprehensively annotated

Please answer prior to the call: who currently captures the 'date comprehensively annotated' and could generate a file with such information.

  • GOA: Yes
  • BHF-UCL: Yes (via GOA)
  • MGI: We capture this status and can generate a file with such information.
  • RGD: We don't capture that information.
  • Flybase: At present I only have dates for the ref genome genes. I have requested a database slot to store this info for all genes but this hasn't been implemented yet.
  • dictyBase: Yes (we capture curation status for the entire 'gene' curation, but could use that for now)
  • TAIR: We have just started to capture this status but do not yet generate a file at this time.
  • AgBase: Yes
  • SGD: We capture this date, but cannot generate a file at this time. We would be able to provide this date eventually.
  • pombe: Not yet, but plan to capture shortly.
  • Ecoli: EcoliWiki keeps track of the complete list of RefGenome genes in E. coli, the number of annotations for BP, MF & CC separately as well as the completion status (List). We have a category tag that goes on all genes from the RefGenome list as well (Category).
  • Wormbase: We capture the date at which we've comprehensively annotated experimental data for RefGenome genes.
  • ZFIN: I have 'comprehensively curated' dates for the ref. genome genes. However, this is becoming a less useful date since we recently stopped trying to curate ALL the papers we enter into our DB. Until we get a literature indexer (job search on now) who will associate pubs with genes, I can't really know the true status of the literature for a gene; I can only know the status of the literature we have already associated with the gene. It's still a useful date I guess, just not as airtight as it used to be.


  • Alerts:
    • Tree curators should be alerted when the Date in Column 7 (protein annotations by MODs) is more recent than the date in Column 3 (tree annotation).
    • When the Panther families are modified (once or twice per year), there should be a check to verify that the members are still the same. If there are additions or deletions, curators should be alerted so they can review the affected annotations. We may be able to automate part of the required modifications to annotations.
  • Annotations to nodes: All the previous part of the script is just a report based on the trees and the GAF files. However, in some cases we annotate to nodes rather than to entire trees. There needs to be a way to request a report for a specific tree node. This will be done as a second step.
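
The two alerts above could be sketched roughly as follows. This assumes the report's Column 3 (tree annotation date) and Column 7 (most recent MOD protein annotation date) have already been parsed into dates. The class, field and function names are hypothetical, and the choice to raise no alert when either date is missing is an assumption.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Optional, Set


class FamilyStatus(NamedTuple):
    family_id: str
    tree_annotated: Optional[date]         # report Column 3
    member_last_annotated: Optional[date]  # report Column 7


def needs_tree_review(status: FamilyStatus) -> bool:
    """Alert when a member was annotated after the tree was last annotated."""
    if status.tree_annotated is None or status.member_last_annotated is None:
        return False  # assumption: no alert without both dates
    return status.member_last_annotated > status.tree_annotated


def membership_changes(old: Set[str], new: Set[str]) -> Dict[str, List[str]]:
    """List members added to or removed from a family between two releases."""
    return {"added": sorted(new - old), "removed": sorted(old - new)}
```

A family would be flagged for tree curators when needs_tree_review returns True, or when either list returned by membership_changes is non-empty.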

Minutes

Not enough time to discuss this issue on this call. Will discuss next month.

Branding Ref.genome annotation data in GOC and within each group

Rama started a wiki page to collect ideas. Use this page to describe how your group is branding reference genomes and to provide suggestions.
Please also take a look at the ref.genome annotation data in AmiGO and provide feedback to improve it.

http://gocwiki.geneontology.org/index.php/Ideas_for_publicizing_Ref.Genome_Annotation_Data

Minutes

Branding Ref Genome Genes

Rama started a wiki page

please enter what your group is doing

GO annotation camp

  1. Overview: The annotation camp will take place over 3 days, from June 16-18 (Wednesday-Friday).
  2. Goals: This annotation camp will be focused on updating and refining the skills of existing GO biocurators, including new GO biocurators in existing annotation groups and the Swiss-Prot curation team. We hope each MOD will be represented.
  3. Structure: There will be three (3) ‘focused annotation sessions’ where specific annotation issues will be discussed. Suggestions for discussion topics should be added to the GO camp agenda: 2010_GO_camp_Meeting_Logistics#Suggestions_for_annotation_issues_to_be_discussed
  4. Deliverables: (1) final annotation documentation for each of the three annotation topics.
    • There will be a special emphasis on the reference genome project. Deliverable: (2) annotation propagation rules for the reference genome project.
  5. Registration: If you already know whether you'll be able to attend or not, please fill in the wiki meeting logistics page: 2010_GO_camp_Meeting_Logistics

Minutes

GO Annotation Camp

3 days in Geneva

focused camp where expert curators can discuss curation issues - goal is to get solid documentation

page on wiki for suggestions of issues to discuss, e.g. protein binding, development

first morning might be more general annotation issues

have three focused discussions - working groups can prepare agendas in advance


