RefGenome8Apr08 Phone Conference (Archived)

Tuesday April 8, 10 AM CDT (8 AM PDT, 4 PM BST)


Pascale dictyBase
Emily EBI
Rachael EBI
Chris NCBO
Val pombe
Stacia SGD
Doug zfin
Victoria RGD
Ranjana wormbase
Kimberly WormBase
Mary MGI
David MGI
Tanya TAIR


  1. All: Annotation Quality control: Please pick an ortholog set from the Curation Targets table

  1. All: Annotation Quality control: Have a look at the SF items and see if the ortholog from your organism is correctly annotated ("comprehensive"). Let the lead curator for that set know when you're done.
  2. Seth: send the URL of the ortholog tool prototype sometime this week


We were supposed to use the WebEx "raise hand" feature. We didn't set that up because we expected too many people to attend. Pascale logged in to Skype; hopefully people can Skype to get attention if needed.

Review action items

1. Chris/Emily: figure out the secondary ID problems (many sequences were not loaded because the IDs were secondary). Maybe a script can be generated to map IDs? [in progress] A new gp2protein file will be provided by UniProt. Also, Dan Barrell will provide a mapping of secondary IDs. But generally all databases have secondary ID issues, so we need to figure out how best to deal with them.

2. IN PROGRESS. All: Annotation Quality control: Please pick an ortholog set from the Curation Targets table

Enter your name in Column K, and open a new item in the SF tracker

Contact Suzi if you need to be added to this tracker.

3. Fix problems in annotations and graphs pointed out in the SF "ref genome completion set" tracker. [DONE] David, Chris: David fixed some defs. There is still the problem that 'anything to do with a heart' cannot all be pulled out from the same branch in the graph. Chris will demo how to use cross products to do that at the next GO meeting.

4. (Chris/AmiGO) Look into loading IEAs for reference genome set into AmiGO [in progress]

  • The new loading cycle will incorporate IEAs from everything except GOA/UniProt. Human is loaded separately.

5. (Amelia): Fix the web page that shows the number of annotations so that it also gives an estimated number of protein-coding genes; problems: unmapped genes, splice variants, etc. Maybe this should also be on the ref genome page. Use the count from the gp2protein file; then it's all consistent.

In progress. Amelia had some questions: what should be taken as the correct number, the number of unique IDs in the first column [the db that produced the file], or the number in the second column [the UniProt or NCBI ID]? I just checked with Dan and he says that the mapping may not necessarily be one to one.

  • Chris/Judy: that may not be a reliable number anyway. At least for human, the proteome is not well documented.
  • best would be total number of gene predictions.
  • Judy: look at Sue Rhee's recent paper
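To make Amelia's question concrete, here is a minimal Python sketch that counts the unique IDs in each column of a gp2protein-style file. It assumes a tab-separated layout with '!' comment lines and ';' between multiple protein IDs in the second column; the sample rows and the function name count_gp2protein_ids are made up for illustration, not taken from any real file.

```python
import io


def count_gp2protein_ids(lines):
    """Return (unique IDs in column 1, unique IDs in column 2).

    Assumed layout (not confirmed by these notes): tab-separated rows,
    '!' starts a comment line, and column 2 may hold several protein IDs
    separated by ';'.
    """
    db_ids = set()
    protein_ids = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) < 2:
            continue  # row has no mapped protein ID
        db_ids.add(cols[0])
        for pid in cols[1].split(";"):
            if pid:
                protein_ids.add(pid)
    return len(db_ids), len(protein_ids)


# Hypothetical rows: two database IDs share one protein ID, and one
# database ID maps to two protein IDs, so the mapping is not one to one.
sample = io.StringIO(
    "! made-up example, not real data\n"
    "DB:G0001\tUniProtKB:P00001\n"
    "DB:G0002\tUniProtKB:P00001\n"
    "DB:G0003\tUniProtKB:P00002;UniProtKB:P00003\n"
    "DB:G0004\tUniProtKB:P00003\n"
)
n_db, n_protein = count_gp2protein_ids(sample)
print(n_db, n_protein)  # prints: 4 3
```

The two counts differ as soon as the mapping is not one to one, which is why the choice of column matters for the number reported on the web page.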

6. Annotation summary Graphs:

a. Show the date completed on the index page of the graphs

[Mary] Because each group has its own way of entering date information, and since we will soon have a better way of entering the data without using the Google spreadsheets, I am not sure it is worth the effort to extract the dates now.

b. Distinguish 'not yet annotated' from 'no ortholog' in the graphs

[Mary] The graphs currently distinguish four cases for entries in the comparison matrix for high level terms:

  • if there is no ortholog, the entry is 'X';
  • if an ortholog exists but there is no annotation, the entry is 'organism';
  • if there is experimental annotation, the entry is a color-coded 'organism';
  • if there is only ISS annotation, the entry is color-coded and enclosed in parentheses: '(organism)'.

For example, see:

Reference Genome Meeting

April 20-21, Salt Lake City

  • Discuss agenda


Orthology determination

  • Kara: update (by email):
    • the ClustalW alignments of all the families have finished, and we are on the final analysis/computational step (PHYLIP) needed to generate the pretty phylogenetic graphs. The data thus far have been loaded into the database. If anyone is chomping at the bit to check things out, let me know and I can send you our development URL, though note that the interface is not there yet: our developer is making some improvements to the web display, and right now it is *very* bare bones and in debugging mode. But you can at least see the members of the orthologous groups.
    • we started the protein list that consists of proteins/families prone to erroneous results with these types of ortholog identification methods. It's on the wiki so please feel free to add your favorite (or dreaded, depending how you look at it) proteins.

Curation tool update

[ACTION ITEM] Seth: send URL sometime this week

Annotation Pipeline document

Please have a look: Annotation_pipeline. People like it.

Annotation Quality Control

  • See Annotation_QC
  • SF tracker: HPRT1 (Emily)
  • there were some ortholog call issues (pombe/cerevisiae); settled now
  • if there are experimental annotations, ISS should preferably be made to those (dictyBase is still referring to some InterPro)
  • be careful about what to ISS to: AVOID
    • Homodimerization/tetramerization
    • grooming behavior, etc.
  • generally good; people should fix ISS and mark the gene 'comprehensively annotated'

Next conference call

Tuesday May 13, 2008, 1 PM CDT, 11 AM PDT, 7 PM BST

