9 FEB 2010 RefGen Phone Conference (Archived)
ALL GROUPS: Please fill in these three sections:
- Annotation status of lung curation targets: 9_FEB_2010_RefGen_Phone_Conference#Lung_curation_targets
- Who captures the 'comprehensive annotation' status? 9_FEB_2010_RefGen_Phone_Conference#Plan_for_annotation_status_reporting_tool
- How is your group publicizing Ref. Genome annotation? Please describe it here: Ideas_for_publicizing_Ref.Genome_Annotation_Data#How_is_your_group_Publicizing_Ref.Genome_Data
Attendees:
- GOA: Emily, Yasmin
- BHF-UCL: Ruth, Varsha
- MGI: Li, Mary
- RGD: Stan
- dictyBase: Pascale, Petra
- TAIR: Donghui
- AgBase: Lakshmi Pillai
- SGD: Julie, Stacia, Rama
- Wormbase: Kimberly Van Auken, Ranjana Kishore
- ZFIN: Doug
- PPOD: Kara Dolinski, Mike Livstone
- Berkeley: Suzi, Ed, Seth
Lung curation targets
- Discuss the priorities (the December and January targets, and FGF10 from November) and the goal (March 1st)
- Please answer prior to the call: Can everyone have the curation targets annotated by then? If not, can you write down which families are problematic?
- GOA: We will have a good stab at getting most of them done. The most problematic will be those with many publications: DAG1, GSTT1, SERPINE1 and SERPINE2. We will still try to get at least some annotations for each aspect on all of them by March 1st.
- MGI: We will try our best to complete the Dec lung targets by March 1st. The Dec genes Bmp2, Bmp4 and Ctnnb1 have about 800 publications, so they may not be done by then.
- RGD: The genes on the lists have been curated.
- FlyBase: The Dec genes BMP4, CTNNB1 and HOXA5 are a problem for me, as the fly orthologs (decapentaplegic, Antennapedia, bicoid and armadillo) are among the most intensively studied fly genes, with 500 to <1000 pubs each. I'll do my best to curate the key features. I haven't started the Jan targets yet, but there doesn't seem to be much info for most of the 32 genes, so I hope it will be possible to complete these by March 1st. The other problem is that it takes a month or two for my annotations to make it into the public version of FlyBase and then into the GA file, so even if the annotations are done by March 1st they won't necessarily be visible to the PAINT annotators.
- dictyBase: We expect that we will be able to complete the lung targets by March 1st.
- TAIR: We expect that we will be able to complete the lung targets by March 1st.
- AgBase: Of the genes that have not already been signed off as completed in the Google document spreadsheet, we will be able to annotate PLEKHA2, BMP4, CTNNB1, FOXF1, WNT2 and WNT2B.
- SGD: The HMGB2 targets may not be done by then. The curators who are working on them have said that even if they are not done by March 1st, they will be done within a week or two after that.
- pombe: Not sure, but I don't think there are many lung targets.
- Ecoli: EcoliWiki didn't have any lung targets, so we are catching up on our other RefGenome genes.
- Wormbase: We expect that we will be able to complete the lung targets by March 1st.
- ZFIN: It is unlikely that I will get ALL the curation done by the end of Feb. I will most likely be able to get a reasonable amount done on all the genes in the set, though, if I cherry-pick the papers carefully.
Lung Curation Targets
Checking curation status of different groups finds that a few families are problematic for some groups.
- General Strategy:
Suzi - annotate papers only until they stop yielding new information
Pascale - capture the GO terms that are most relevant for that gene
not everything needs to be done by March 1st; some genes will be of higher priority so the tree curators can get started
more difficult genes can be given more time
Li and Pascale will check lists on wiki and indicate which genes will be of higher/lower priority
- Group-specific concerns
MGI will try to get papers that at least cover lung branching morphogenesis for genes with hundreds of pubs
FlyBase - upload schedule may prevent annotations from being available in PAINT
Emily - Susan can put together a file for PAINT if needed, so it's possible to get a FlyBase GA file earlier for this project
this course of action is applicable to any group
GO database gets updated once a week
Ed Lee, Berkeley
Ed gave a demo of the PAINT tool
annotations for gene families
family info comes from the PANTHER database
links out to protein pages in UniProtKB
tab for MSA - multiple sequence alignment
look at one ontology at a time, radio buttons to change view
bold, red gene names indicate experimental data
click on gene, get annotations in bottom window
Annotating ancestral nodes
annotation matrix tab in bottom window brings up overall view of all associations
GO terms and ancestors associated with leaves
each row represents a gene in the list
black squares - annotation, white squares - indirect associations (parents)
associations tab shows a view of specific annotations
red = annotations from GO database
clicking on an ancestral node automatically highlights its children in bold in the table and in the annotation matrix
select the annotation term you want and drag it directly onto the tree
if annotating to ancestral node, node will be black
PAINT annotations show in blue background
possible to change view of annotation matrix, use layout option to customize appearance
nodes start as white - orange = direct annotation, black = propagated annotation to descendants
can now see new ISS annotations in associations tab
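The annotation matrix described above marks direct annotations in black and terms reached only through parent terms in white. A minimal Python sketch of how such a matrix could be derived, assuming a toy ontology given as a hypothetical term-to-parents map (is_a links only); the term and gene names are placeholders and this is not PAINT's own code:

```python
from typing import Dict, List, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every ancestor of `term` (not including the term itself)."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def annotation_matrix(
    direct: Dict[str, Set[str]], parents: Dict[str, List[str]]
) -> Dict[str, Dict[str, str]]:
    """Build one row per gene: 'black' for direct annotations,
    'white' for terms reached only through a parent (indirect)."""
    matrix: Dict[str, Dict[str, str]] = {}
    for gene, terms in direct.items():
        row: Dict[str, str] = {}
        for term in terms:
            for anc in ancestors(term, parents):
                row.setdefault(anc, "white")
        for term in terms:
            row[term] = "black"  # a direct annotation overrides an indirect one
        matrix[gene] = row
    return matrix
```

For example, with hypothetical terms where C is_a B and B is_a A, a gene annotated directly to C gets a black cell for C and white cells for B and A.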
Use of NOT qualifier
trash can removes annotations, but you can only delete terms on the node you annotated to
remove annotations from subfamilies - use qualifier option (e.g., rapid divergence)
Kara: how will rapid_divergence be handled in the GAF?
Suzi: included in GAF but uses the NOT qualifier
Emily: would be good to keep the notes on why the NOT annotation was made
will there be a huge number of NOT annotations from this kind of annotation?
Kara: did a small number of these, found only one case where she didn't want to transfer the annotation
move forward with this version, but if curators find examples where the annotation shouldn't even be included without a NOT, we can then think about revising the software
triangles in annotation matrix - quickly visualize what has been annotated with NOT
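Annotating an ancestral node propagates the term to its descendants, while a NOT qualifier on a subfamily (e.g. for rapid divergence) excludes that subtree. A minimal Python sketch of that behaviour, assuming the tree is a hypothetical node-to-children map and that a NOT on a node applies to every leaf beneath it; node names are placeholders and this is not PAINT's implementation:

```python
from typing import Dict, List, Set, Tuple


def propagate(
    children: Dict[str, List[str]], ancestor: str, not_nodes: Set[str]
) -> Dict[str, str]:
    """For a term annotated to `ancestor`, return leaf -> qualifier.

    Leaves (extant genes) below any node in `not_nodes` get 'NOT';
    all other leaves inherit the positive annotation (empty qualifier).
    """
    result: Dict[str, str] = {}
    stack: List[Tuple[str, str]] = [(ancestor, "")]
    while stack:
        node, qualifier = stack.pop()
        if node in not_nodes:
            qualifier = "NOT"  # assumed: NOT applies to the whole subtree
        kids = children.get(node, [])
        if not kids:
            result[node] = qualifier
        else:
            stack.extend((kid, qualifier) for kid in kids)
    return result
```

With a hypothetical tree AN1 -> {AN2, geneA} and AN2 -> {geneB, geneC}, annotating AN1 and marking AN2 with NOT leaves geneA positively annotated and geneB and geneC as NOT.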
GO_Refs - How many?
Evidence tab - for curators to record method of annotation, will be associated with every annotated tree
can use this to explain rationale for annotations
will be public in the GO_Ref file
this is a free text field
Emily: how many GO_Refs?
Suzi: could use generic GO_Ref
Pascale: create a new reference if generic GO_Ref has been edited?
Suzi: thought processes should be captured, space is not a problem
Pascale: every time you curate a family, you want to record comments
Emily: use generic GO_Ref, use WITH column to link out to method
Suzi: as long as we agree to capture this information, we can decide later where the links should go
identified through the family name
will be recorded in the Reference column of the GAF
Emily: might be nice for users to pull together all of these annotations via common GO_Ref
Rama: separate the technical issues from the question of whether we agree to create a separate GO_Ref
Pascale: PAINT annotators really felt there was more information they wanted to capture
Emily: why using ISS vs ISO?
Pascale: because some of the proteins are paralogs
transferring amongst family members - some may be paralogs
on tree, paralogs are indicated by squares, orthologs by circles
Paul advocates an evidence code such as Inferred by Family Relationship or Inferred from Evolutionary Relationship
will this method only use a single evidence code?
one code makes changes/updates easy, multiple codes would be trickier
Emily: want something better than ISS
Suzi: two identifiers in WITH column, protein with experimental annotation, ancestor node
Rama: if multiple proteins with experimental evidence, which is chosen?
Pascale: all, multiple IDs will go in WITH column, will be visible on AmiGO page
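To make the discussion concrete, a minimal Python sketch of one exported line, assuming the GAF 2.0 layout (17 tab-separated columns): NOT goes in the Qualifier column, the reference (a GO_Ref, still to be decided) in the Reference column, and multiple WITH identifiers are pipe-separated. The evidence code is fixed to ISS only because that is the code discussed here; all IDs and the assigned_by value are hypothetical placeholders:

```python
from datetime import date
from typing import List


def paint_gaf_line(
    *, db: str, object_id: str, symbol: str, go_id: str, aspect: str,
    reference: str, with_ids: List[str], taxon: int,
    annotated_on: date, assigned_by: str, negated: bool = False,
) -> str:
    """Format one GAF 2.0 row for a tree-based (ISS) annotation."""
    columns = [
        db,                               # 1  DB
        object_id,                        # 2  DB Object ID
        symbol,                           # 3  DB Object Symbol
        "NOT" if negated else "",         # 4  Qualifier
        go_id,                            # 5  GO ID
        reference,                        # 6  DB:Reference (e.g. a GO_Ref)
        "ISS",                            # 7  Evidence code
        "|".join(with_ids),               # 8  With/From, pipe-separated
        aspect,                           # 9  Aspect: P, F or C
        "",                               # 10 DB Object Name
        "",                               # 11 DB Object Synonym
        "protein",                        # 12 DB Object Type
        f"taxon:{taxon}",                 # 13 Taxon
        annotated_on.strftime("%Y%m%d"),  # 14 Date (YYYYMMDD)
        assigned_by,                      # 15 Assigned By
        "",                               # 16 Annotation Extension
        "",                               # 17 Gene Product Form ID
    ]
    return "\t".join(columns)
```

The WITH column here carries both the experimentally annotated protein and the ancestor node, as Suzi suggested, so a user could trace each inferred annotation back to its source.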
Save and Export
Can save annotations and export them
Save - can come back and see all work
Export - export GAF file for MODs to import with annotations to leaves
Where does the GAF file go after export?
Date comprehensively annotated
Please answer prior to the call: who currently captures the 'date comprehensively annotated' and could generate a file with such information.
- GOA: Yes
- BHF-UCL: Yes (via GOA)
- MGI: We capture this status and can generate a file with such information.
- RGD: We don't capture that information.
- FlyBase: At present I only have dates for the ref genome genes. I have requested a database slot to store this info for all genes, but this hasn't been implemented yet.
- dictyBase: Yes (we capture curation status for the entire 'gene' curation, but could use that for now)
- TAIR: We have just started to capture this status but do not yet generate a file.
- SGD: We capture this date, but cannot generate a file at this time. We would be able to provide this date eventually.
- pombe: Not yet, but we plan to capture it shortly.
- Ecoli: EcoliWiki keeps track of the complete list of RefGenome genes in E. coli, the number of annotations for BP, MF & CC separately as well as the completion status (List). We have a category tag that goes on all genes from the RefGenome list as well (Category).
- Wormbase: We capture the date on which we've comprehensively annotated experimental data for RefGenome genes.
- ZFIN: I have 'comprehensively curated' dates for the ref. genome genes. However, this date is becoming less useful since we recently stopped trying to curate ALL the papers we enter into our DB. Until we get a literature indexer (job search on now) who will associate pubs with genes, I can't really know the true status of the literature for a gene; I can only know the status of the literature we have already associated with the gene. It's still a useful date, just not as airtight as it used to be.
- Tree curators should be alerted when the date in Column 7 (protein annotations by MODs) is more recent than the date in Column 3 (tree annotation).
- When the PANTHER families are modified (once or twice per year), there should be a check to verify that the members are still the same. If members are added or deleted, curators should be alerted to review the affected annotations. We may be able to automate some of the required changes to annotations.
- Annotations to nodes: The parts of the script described above only produce a report based on the trees and the GAF files. However, in some cases we annotate to nodes rather than to entire trees, so there needs to be a way to request a report for a specific tree node. This will be done as a second step.
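The two checks proposed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the report's Column 3 (tree annotation date) and Column 7 (latest MOD protein annotation date) are modelled as hypothetical per-family dictionaries, and family membership is a set of gene IDs from each PANTHER release; the family and gene names are placeholders:

```python
from datetime import date
from typing import Dict, List, Set, Tuple


def stale_trees(
    tree_dates: Dict[str, date], mod_dates: Dict[str, date]
) -> List[str]:
    """Families whose most recent MOD annotation (Column 7) is newer
    than the tree annotation date (Column 3), sorted by family ID."""
    return sorted(
        family
        for family, tree_date in tree_dates.items()
        if family in mod_dates and mod_dates[family] > tree_date
    )


def membership_changes(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) members of one family between two releases."""
    return new - old, old - new
```

Any family returned by either check would be flagged for curator review; only the node-level report is left for the second step.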
Not enough time to discuss this issue on this call. Will discuss next month.
Branding Ref.genome annotation data in GOC and within each group
Rama started a wiki page to collect ideas. Use this page to describe how your group is branding reference genomes and to provide suggestions.
Please also take a look at ref.genome annotation data in AmiGO and provide feedback to improve it.
Branding Ref Genome Genes
Rama started a wiki page
please enter what your group is doing
GO annotation camp
- Overview: The annotation camp will take place over 3 days, from June 16-18 (Wednesday-Friday).
- Goals: This annotation camp will focus on updating and refining the skills of existing GO biocurators, including new biocurators in existing annotation groups and the Swiss-Prot curation team. We hope members of each MOD will be represented.
- Structure: There will be three (3) ‘focused annotation sessions’ where specific annotation issues will be discussed. Suggestions for discussion topics should be added to the GO camp agenda: 2010_GO_camp_Meeting_Logistics#Suggestions_for_annotation_issues_to_be_discussed
- Deliverables: (1) final annotation documentation for each of the three annotation topics.
- There will be a special emphasis on the reference genome project. Deliverable: (2) annotation propagation rules for the reference genome project.
- Registration: If you already know whether you'll be able to attend, please fill in the wiki meeting logistics page: 2010_GO_camp_Meeting_Logistics
GO Annotation Camp
3 days in Geneva
focused camp where expert curators can discuss curation issues - goal is to get solid documentation
page on wiki for suggestions of issues to discuss, e.g. protein binding, development
first morning might be more general annotation issues
have three focused discussions - working groups can prepare agendas in advance