RefGenome12Feb08 Phone Conference (Archived)

Tuesday February 12, 10 AM CST (8 AM PST, 4 PM GMT)

Present

  • Pascale
  • Seth
  • Susan
  • Judy
  • David
  • Stacia
  • Emily
  • Rachael
  • Rex
  • Kimberly
  • Val
  • Suzi
  • Chris
  • Tanya
  • Mary

Next Reference Genome Meeting

April 20-21, Salt Lake City

This will be followed by a GO Consortium Meeting on April 22 and 23 in Salt Lake City.

Karen Eilbeck: host

Orthology determination

  • Kara: update:

As of Wed Feb 12, I have the following species still to go:

    • gp2protein_input/gp2protein.rgd
    • gp2protein_input/gp2protein.sgd
    • gp2protein_input/gp2protein.tair
    • gp2protein_input/gp2protein.wb
    • gp2protein_input/gp2protein.zfin

We did manage to get a local copy of UniProt downloaded and installed, so going forward, things should speed up a bit.

For the ones that have finished, I've gotten a few errors (bad IDs) but not a huge number, so I think we're ok.

  • Should we re-run Stan's reports? Chris says no; she (Kara) will find the problems that matter to her when she loads the sequences.

Curation tool update

  • Chris, Siddhartha, Seth, Mary, Pascale, David, Doug
  • Met last week to define the requirements; good meeting
  • Programmers are now working on the login screen
  • expect progress to be faster
  • Berkeley: loaded P-POD locally so they can look at data structure

Updated graphs

Mary

  • I have posted new refG graphs at:

http://www.geneontology.org/images/RefGenomeGraphs/

To simplify comparison of organism annotations (and following Chris' lead on detecting outliers) I have modified the comparison matrix to show only high order GO terms, e.g. PEX1 http://www.geneontology.org/images/RefGenomeGraphs/5189.html#Slim or POLA http://www.geneontology.org/images/RefGenomeGraphs/5422.html#Slim

Please review the graphs and let me know if you notice missing orthologs or annotations -- I am still doing a lot of data editing manually  :( [ACTION ITEM]: All: please check and comment

Annotation Quality Control

  • Issues:
    • We have no QC measures
    • Nobody follows up on annotation issues brought up on the ref genome or annotation email lists.
  • See Annotation_QC

Suzi, Pascale, Val, and Emily propose that each curator be assigned one orthology set, check its curation status and possible mistakes in its annotations, and make sure the ortholog set gets completely curated. There is a new SF tracker https://sourceforge.net/tracker/?group_id=36855&atid=1040173 where each ref genome gene will be assigned to a curator. As a first step, each curator will do one gene as an experiment, and we'll discuss at the next call and at the Salt Lake City meeting how things went and how to improve the process.

  • We suggest documenting the conclusions of any discussions in an 'Annotation Handbook'

New action items

[ACTION ITEM]: All: please check and comment on the new version of the graphs http://proto.informatics.jax.org/prototypes/GOgraphEX/RefGenomeGraphs/

[ACTION ITEM]: All: Annotation Quality control: Please pick an ortholog set from the Curation Targets table http://spreadsheets.google.com/ccc?key=pwOksMOra5uq4vIYjPgefPw

Enter your name in Column K, and open a new item in the SF tracker http://sourceforge.net/tracker/?group_id=36855&atid=1040173

Contact Suzi if you need to be added to this tracker.

More instructions will follow by email.

Review action items

[ACTION ITEM] Suzi, Pascale, Emily, Val [and others]: [in progress] Go over the issues relating to quality control. We have set up a SF tracker where each curator will examine an ortholog set and comment on whether it is completely curated and on any problems they find in the annotations.

[ACTION ITEM]: (Chris/AmiGO) Look into loading IEAs for reference genome set into AmiGO [in progress]

  • The new loading cycle will incorporate IEAs from everything except GOA/Uniprot. Human is loaded separately.

[ACTION ITEM]: (Amelia): Fix the web page that lists the number of annotations so that it gives an estimated number of protein-coding genes; problems: unmapped genes; splice variants; etc. Maybe this should also be on the ref genome page. Use the count from the gp2protein file, so that it is all consistent.

In progress. Amelia had some questions: what should be taken as the correct number, the number of unique IDs in the first column [the db that produced the file], or the number in the second column [the UniProt or NCBI ID]? I just checked with Dan and he says that the mapping may not necessarily be one-to-one.

  • Chris/Judy: that may not be a reliable number anyway. At least for human, the proteome is not well documented.
  • Best would be the total number of gene predictions.
  • Judy: look at Sue Rhee's recent paper
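
To make Amelia's question concrete, here is a minimal sketch of how the two candidate counts could be compared. It assumes a gp2protein file is tab-delimited with the database ID in the first column and one or more protein IDs (separated by ";") in the second, and that comment lines start with "!". The function name and the sample rows are hypothetical and are not taken from any real file.

```python
from collections import defaultdict


def count_gp2protein_ids(lines):
    """Count unique IDs per column and flag mappings that are not one-to-one.

    Assumed layout (not confirmed by the minutes): column 1 is the database
    ID, column 2 holds one or more protein IDs separated by ';'.
    """
    db_to_protein = defaultdict(set)
    protein_to_db = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # database ID with no mapped protein
        db_id = fields[0].strip()
        for protein_id in fields[1].split(";"):
            protein_id = protein_id.strip()
            if protein_id:
                db_to_protein[db_id].add(protein_id)
                protein_to_db[protein_id].add(db_id)
    return {
        "unique_db_ids": len(db_to_protein),
        "unique_protein_ids": len(protein_to_db),
        "db_ids_with_multiple_proteins": sum(
            1 for ids in db_to_protein.values() if len(ids) > 1
        ),
        "protein_ids_with_multiple_db_ids": sum(
            1 for ids in protein_to_db.values() if len(ids) > 1
        ),
    }


# Hypothetical rows for illustration only.
sample = [
    "! hypothetical example rows, not real data",
    "DB:gene1\tUniProtKB:P00001",
    "DB:gene2\tUniProtKB:P00002;UniProtKB:P00003",
    "DB:gene3\tUniProtKB:P00001",
]
counts = count_gp2protein_ids(sample)
print(counts)
```

If the last two counts are non-zero, the column totals can differ from each other and from the true gene count, which is the concern Dan raised.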

[ACTION ITEM]: DONE: Mary will include IC in the graphs

[ACTION ITEM]: ADDED TO GOC meeting agenda. Discuss at the GOC meeting whether it would be useful to add the 'comprehensively annotated' tag to all genes, somehow? Either in the gene association file or in the database somehow

[ACTION ITEM]: REJECTED. The two lists do not have that much overlap. Mike (Pascale): merge the two email lists (reference genome and annotation) into 'annotation'

Ongoing action items

[ACTION ITEM]: Can Mary show the date completed on the index page? Possibly; she will try.

[ACTION ITEM]: Chris: generate new report that would show errors that need fixing for the Orthology determination project

[ACTION ITEM]: Chris will provide a date on the ISS outliers query so that we don't always review the same annotations.

[ACTION ITEM] (Tanya Berardini, Emily Dimmer, Pascale Gaudet, David Hill, Chris Mungall, Kimberly Van Auken): Write up recommendations for usage of ISS, IEA, IC

[ACTION ITEM]: Mike will set up 'annotation' calls?

[ACTION ITEM]: all: look at Stan's error reports: http://www.geneontology.org/internal-reports/gp2protein/

  • Not updated since October

Next conference call

Tuesday March 11, 2008, 1 PM CDT, 11 AM PDT, 7 PM GMT

Return to Reference_Genome_Annotation_Project