9 FEB 2010 RefGen Phone Conference (Archived)

From GO Wiki
Revision as of 11:20, 16 January 2018 by Pascale (Pascale moved page 9 FEB 2010 RefGen Phone Conference to 9 FEB 2010 RefGen Phone Conference (Archived))


ALL GROUPS: Please fill three sections here:

  1. Annotation status of lung curation targets (see 9_FEB_2010_RefGen_Phone_Conference#Lung_curation_targets)
  2. Who captures the 'comprehensive annotation' status? (see 9_FEB_2010_RefGen_Phone_Conference#Plan_for_annotation_status_reporting_tool)
  3. Please describe how your group is publicizing Ref. Genome annotation here: Ideas_for_publicizing_Ref.Genome_Annotation_Data#How_is_your_group_Publicizing_Ref.Genome_Data


  • GOA: Emily, Yasmin
  • BHF-UCL: Ruth, Varsha
  • MGI: Li, Mary
  • RGD: Stan
  • Flybase:
  • dictyBase: Pascale, Petra
  • TAIR: Donghui
  • AgBase: Lakshmi Pillai
  • SGD: Julie, Stacia, Rama
  • pombe
  • Ecoli:
  • Wormbase: Kimberly Van Auken, Ranjana Kishore
  • ZFIN: Doug
  • PPOD: Kara Dolinski, Mike Livstone
  • Berkeley: Suzi, Ed, Seth

Lung curation targets

  1. Discuss the priorities (Dec, Jan, and fgf10 from November) and goal (March 1st)
  2. Please answer prior to the call: Can everyone have the curation targets annotated by then? If not, can you write down which families are problematic?
  • GOA: We will have a good stab at getting most of them done, the most problematic will be those with many publications; DAG1, GSTT1, SerpineE1 and E2. We will try to get at least some annotations for each aspect on all of them by March 1st though.
  • BHF-UCL:
  • MGI: We will try our best to complete the Dec lung targets by March 1st. The Dec genes Bmp2, Bmp4 and Ctnnb1 have about 800 pubs, so they may not be done by then.
  • RGD: The genes on the lists have been curated.
  • FlyBase: The Dec genes BMP4, CTNNB1 and HOXA5 are a problem for me, as the fly orthologs (decapentaplegic, Antennapedia, bicoid and armadillo) are among the most intensively studied fly genes, with 500 to <1000 pubs each. I'll do my best to curate the key features. I haven't started the Jan targets yet, but there doesn't seem to be much info for most of the 32 genes, so I hope it will be possible to complete these by March 1st. The other problem is that it takes a month or two for my annotations to make it into the public version of FlyBase and then into the GA file, so even if the annotations are done by March 1st they won't necessarily be visible to the PAINT annotators.
  • dictyBase: We expect that we will be able to complete the lung targets by March 1st.
  • TAIR: We expect that we will be able to complete the lung targets by March 1st.
  • AgBase: Of the targets not already signed off as completed in the Google Docs spreadsheet, we will be able to annotate PLEKHA2, BMP4, CTNNB1, FOXF1, WNT2 and WNT2B.
  • SGD: The HMGB2 targets may not be done by then. The curators working on them have said that even if they are not done by March 1st, they will be done within a week or two after that.
  • pombe: Not sure, but I don't think there are many lung targets.
  • Ecoli: EcoliWiki didn't have any lung targets, so we are catching up on our other RefGenome genes.
  • Wormbase: We expect that we will be able to complete the lung targets by March 1st.
  • ZFIN: It is unlikely that I will get ALL the curation done by the end of Feb. However, I will most likely be able to have a reasonable amount done on all the genes in the set if I cherry-pick the papers carefully.


Lung Curation Targets

A check of each group's curation status shows that a few families are problematic for some groups.

  • General Strategy:

Suzi - annotate papers only until they stop yielding new information

Pascale - get GO terms that are most relevant for that gene

not everything needs to be done by March 1st; some genes will be of higher priority so the tree curators can get started

more difficult genes can be given more time

Li and Pascale will check lists on wiki and indicate which genes will be of higher/lower priority

  • Group-specific concerns

MGI will try to get papers that at least cover lung branching morphogenesis for genes with hundreds of pubs

FLYBASE - upload schedule may prevent annotations from being available in PAINT

Emily - Susan can put together a file for PAINT if needed, so it's possible to get a FLYBASE GA file earlier for this project

this course of action is applicable to any group

GO database gets updated once a week

PAINT demo

Ed Lee, Berkeley


PAINT demo

Ed gave a demo of tool

annotations for gene families

family info comes from the Panther database

links out to protein pages in UniProtKB

tab to MSA - multiple sequence alignment

look at one ontology at a time, radio buttons to change view

bold, red gene names indicate experimental data

click on gene, get annotations in bottom window

Annotating ancestral nodes

annotation matrix tab in bottom window brings up overall view of all associations

GO terms and ancestors associated with leaves

each row represents a gene in the list

black squares - annotation, white squares - indirect associations (parents)

associations tab shows a view of specific annotations

red = annotations from GO database

clicking on an ancestral node automatically shows its children in bold in the table and annotation matrix

select annotation term you want, drag directly on to the tree

if annotating to ancestral node, node will be black

PAINT annotations show in blue background

possible to change view of annotation matrix, use layout option to customize appearance

nodes start as white; orange = direct annotation, black = annotation propagated to descendants

can now see new ISS annotations in associations tab

Use of NOT qualifier

trash can allows removal of an annotation, but you can only delete terms on the node you annotated to

remove annotations from subfamilies - use qualifier option (e.g., rapid divergence)

Kara: how will rapid_divergence be handled in the GAF?

Suzi: included in GAF but uses the NOT qualifier

Emily: would be good to keep the notes on why the NOT annotation was made

will there be a huge number of NOT annotations from this kind of annotation?

Kara: did a small number of these, found only one case where she didn't want to transfer the annotation

move forward with this version, but if curators find examples where the annotation shouldn't even be included without a NOT, we can then think about revising the software

triangles in annotation matrix - quickly visualize what has been annotated with NOT

GO_Refs - How many?

Evidence tab - for curators to record method of annotation, will be associated with every annotated tree

can use this to explain rationale for annotations

will be public in the GO_Ref file

this is a free text field

Emily: how many GO_Refs?

Suzi: could use generic GO_Ref

Pascale: create a new reference if generic GO_Ref has been edited?

Suzi: thought processes should be captured, space is not a problem

Pascale: every time you curate a family, you want to record comments

Emily: use generic GO_Ref, use WITH column to link out to method

Suzi: as long as we agree to capture this information, we can decide later where the links should go

identified by the family name

will be recorded in the Reference column of the GAF

Emily: might be nice for users to pull together all of these annotations via common GO_Ref

Rama: separate the technical issues from the question of whether we agree that we want to create a separate GO_Ref

Pascale: PAINT annotators really felt there was more information they wanted to capture

Evidence Codes

Emily: why use ISS rather than ISO?

Pascale: because some of the proteins are paralogs

transferring amongst family members - some may be paralogs

on tree, paralogs are indicated by squares, orthologs by circles

Paul advocates an evidence code such as Inferred by Family Relationship or Inferred from Evolutionary Relationship

will this method only use a single evidence code?

one code makes changes/updates easy, multiple codes would be trickier

Emily: want something better than ISS

WITH column

Suzi: two identifiers in WITH column, protein with experimental annotation, ancestor node

Rama: if multiple proteins with experimental evidence, which is chosen?

Pascale: all, multiple IDs will go in WITH column, will be visible on AmiGO page

Save and Export

Annotations can be saved and exported

Save - can come back and see all work

Export - export GAF file for MODs to import with annotations to leaves

Where does the GAF file go after export?

Date comprehensively annotated

Please answer prior to the call: who currently captures the 'date comprehensively annotated' and could generate a file with this information?

  • GOA: Yes
  • BHF-UCL: Yes (via GOA)
  • MGI: We capture this status and can generate a file with such information.
  • RGD: We don't capture that information.
  • Flybase: At present I only have dates for the ref genome genes. I have requested a database slot to store this info for all genes but this hasn't been implemented yet.
  • dictyBase: Yes (we capture curation status for the entire 'gene' curation, but could use that for now)
  • TAIR: We have just started to capture this status but do not yet generate a file at this time.
  • AgBase: Yes
  • SGD: We capture this date, but cannot generate a file at this time. We would be able to provide this date eventually.
  • pombe: Not yet, but we plan to capture it shortly.
  • Ecoli: EcoliWiki keeps track of the complete list of RefGenome genes in E. coli, the number of annotations for BP, MF & CC separately as well as the completion status (List). We have a category tag that goes on all genes from the RefGenome list as well (Category).
  • Wormbase: We capture the date at which we've comprehensively annotated experimental data for RefGenome genes.
  • ZFIN: I have 'comprehensively curated' dates for the ref. genome genes. However, this date is becoming less useful since we recently stopped trying to curate ALL the papers we enter into our DB, so until we get a literature indexer (job search on now) who will associate pubs with genes, I can't really know the true status of the literature for a gene; I can only know the status of the literature we have already associated with the gene. It's still a useful date, I guess, just not as airtight as it used to be.

  • Alerts:
    • Tree curators should be alerted when the Date in Column 7 (protein annotations by MODs) is more recent than the date in Column 3 (tree annotation).
    • When the Panther families are modified (once or twice per year), there should be a check to verify that the members are still the same. If there are additions or deletions, curators should be alerted to review their annotations. We may be able to automate part of the required modifications to annotations.
  • Annotations to nodes: All of the preceding part of the script is just a report based on the trees and the GAF files. However, in some cases we annotate to nodes rather than to entire trees. There needs to be a way to request a report for a specific tree node. This will be done as a second step.
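The two alert rules can be sketched in Python. This is a minimal, hypothetical illustration only: the function names, the per-protein date mapping, and the protein identifiers are assumptions and are not part of the actual reporting script.

```python
from datetime import date


def proteins_newer_than_tree(tree_annotated: date,
                             protein_dates: dict[str, date]) -> list[str]:
    """Alert rule 1 (hypothetical sketch): list member proteins whose MOD
    annotation date (Column 7 in the report) is more recent than the tree
    annotation date (Column 3). A non-empty result would trigger an alert."""
    return sorted(p for p, d in protein_dates.items() if d > tree_annotated)


def membership_changes(old_members: set[str],
                       new_members: set[str]) -> dict[str, list[str]]:
    """Alert rule 2 (hypothetical sketch): after a Panther release, report
    family members that were added or removed. Any change would trigger
    an alert for curators."""
    return {
        "added": sorted(new_members - old_members),
        "removed": sorted(old_members - new_members),
    }


if __name__ == "__main__":
    # Hypothetical identifiers and dates, for illustration only.
    tree_date = date(2010, 2, 1)
    dates = {"protein_A": date(2010, 1, 15), "protein_B": date(2010, 2, 5)}
    print(proteins_newer_than_tree(tree_date, dates))  # ['protein_B']
    print(membership_changes({"protein_A", "protein_B"},
                             {"protein_B", "protein_C"}))
    # {'added': ['protein_C'], 'removed': ['protein_A']}
```

A node-level report (the planned second step) could apply the same date comparison restricted to the leaves under one node.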


Not enough time to discuss this issue on this call. Will discuss next month.

Branding Ref.genome annotation data in GOC and within each group

Rama started a wiki page to collect ideas. Use this page to describe how your group is branding reference genomes and to provide suggestions.
Please also take a look at ref.genome annotation data in AmiGO and provide feedback on how to improve it.



Branding Ref Genome Genes

Rama started a wiki page

please enter what your group is doing

GO annotation camp

  1. Overview: The annotation camp will take place over 3 days, from June 16-18 (Wednesday-Friday).
  2. Goals: This annotation camp will focus on updating and refining the skills of existing GO biocurators, including new GO biocurators in existing annotation groups and the Swiss-Prot curation team. We hope members of each MOD will be represented.
  3. Structure: There will be three (3) ‘focused annotation sessions’ where specific annotation issues will be discussed. Suggestions for discussion topics should be added to the GO camp agenda: 2010_GO_camp_Meeting_Logistics#Suggestions_for_annotation_issues_to_be_discussed
  4. Deliverables: (1) final annotation documentation for each of the three annotation topics.
    • There will be a special emphasis on the reference genome project. Deliverable: (2) annotation propagation rules for the reference genome project.
  5. Registration: If you already know whether or not you'll be able to attend, please fill in the wiki meeting logistics page: 2010_GO_camp_Meeting_Logistics


GO Annotation Camp

3 days in Geneva

focused camp where expert curators can discuss curation issues - goal is to get solid documentation

page on wiki for suggestions of issues to discuss, e.g. protein binding, development

first morning might be more general annotation issues

have three focused discussions - working groups can prepare agendas in advance
