RefGenome8Apr08 Phone Conference (Archived)
Tuesday April 8, 10 AM CDT (8 AM PDT, 4 PM BST)
- All: Annotation Quality control: Please pick an ortholog set from the Curation Targets table
- All: Annotation Quality control: Have a look at the SF items and see if the ortholog from your organism is correctly annotated ("comprehensive"). Let lead curator for that set know that you're done.
- Seth: send the URL of the ortholog tool prototype sometime this week
We were supposed to use the WebEx "raise hand" feature. We didn't set that up because we expected too many people to attend. Pascale logged in to Skype; hopefully people can Skype to get attention if needed.
Review action items
1. Chris/Emily: figure out the secondary ID problem (many sequences were not loaded because their IDs were secondary). Maybe a script can be generated to map IDs? [in progress] A new gp2protein file will be provided by UniProt. Also, Dan Barrell will provide a mapping of secondary IDs. But generally all databases have secondary ID issues, so we need to figure out how best to deal with them.
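A mapping script of the kind suggested in item 1 might look roughly like the sketch below. It is only an illustration: the function name `resolve_ids`, the shape of the mapping (a dictionary from secondary to primary ID), and the example accessions are all assumptions, not the format of the mapping UniProt or GOA will actually provide.

```python
def resolve_ids(ids, primary_ids, secondary_to_primary):
    """Map each ID to its primary ID; collect IDs that cannot be resolved.

    primary_ids: set of IDs known to be primary (hypothetical input).
    secondary_to_primary: dict mapping secondary ID -> primary ID
    (hypothetical input; the real mapping file format is not known here).
    """
    resolved = {}
    unmapped = []
    for identifier in ids:
        if identifier in primary_ids:
            # Already a primary ID: keep it unchanged.
            resolved[identifier] = identifier
        elif identifier in secondary_to_primary:
            # Secondary ID: replace it with its primary ID.
            resolved[identifier] = secondary_to_primary[identifier]
        else:
            # Neither primary nor a known secondary: report for manual review.
            unmapped.append(identifier)
    return resolved, unmapped


if __name__ == "__main__":
    # Made-up accessions, for illustration only.
    primary = {"P00001"}
    mapping = {"Q00002": "P00001"}
    print(resolve_ids(["P00001", "Q00002", "X00003"], primary, mapping))
```

Reporting the unmapped IDs separately, rather than dropping them silently, would make it visible which sequences still fail to load.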
2. IN PROGRESS. All: Annotation Quality control: Please pick an ortholog set from the Curation Targets table http://spreadsheets.google.com/ccc?key=pwOksMOra5uq4vIYjPgefPw
Enter your name in Column K, and open a new item in the SF tracker http://sourceforge.net/tracker/?group_id=36855&atid=1040173
Contact Suzi if you need to be added to this tracker.
3. Fix problems in annotations and graphs pointed out in the SF "ref genome completion set" tracker. [DONE] David, Chris: David fixed some defs. There is still the problem that not everything 'to do with a heart' can be pulled out from the same branch of the graph. Chris will demo how to use cross products to do that at the next GO meeting.
4. (Chris/AmiGO) Look into loading IEAs for reference genome set into AmiGO [in progress]
- The new loading cycle will incorporate IEAs from everything except GOA/Uniprot. Human is loaded separately.
5. (Amelia): Fix the web page that lists the number of annotations so that it gives an estimated number of protein-coding genes; problems: unmapped genes; splice variants; etc. Maybe this should also be on the ref genome page. USE the count from the gp2protein file, so that it's all consistent.
[in progress] Amelia had some questions: what should be taken as the correct number, the number of unique IDs in the first column [the db that produced the file], or the number in the second column [the UniProt or NCBI ID]? I just checked with Dan and he says that the mapping may not necessarily be one to one.
- Chris/Judy: that may not be a reliable number anyway. At least for human, the proteome is not well documented.
- best would be total number of gene predictions.
- Judy: look at Sue Rhee's recent paper
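Amelia's question about the two columns can be made concrete with a small counting sketch. It assumes a gp2protein-style file that is tab-delimited, with comment lines starting with `!`, the database object ID in the first column, and one or more protein IDs separated by `;` in the second; the function name and the sample IDs are made up for illustration.

```python
def count_gp2protein_ids(lines):
    """Count unique IDs in column 1 and unique protein IDs in column 2.

    Assumes tab-delimited lines, '!' comment lines, and ';'-separated
    protein IDs in the second column (an assumption about the format).
    """
    object_ids = set()
    protein_ids = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comments
        columns = line.split("\t")
        object_ids.add(columns[0].strip())
        if len(columns) > 1:
            for protein in columns[1].split(";"):
                protein = protein.strip()
                if protein:
                    protein_ids.add(protein)
    return len(object_ids), len(protein_ids)


if __name__ == "__main__":
    # Made-up IDs: two objects share one protein, one object has two
    # proteins, and one object has no protein mapping at all.
    sample = [
        "!hypothetical example",
        "DB:G1\tUniProtKB:P1",
        "DB:G2\tUniProtKB:P1",
        "DB:G3\tUniProtKB:P2;UniProtKB:P3",
        "DB:G4\t",
    ]
    print(count_gp2protein_ids(sample))
```

When the two counts differ, the mapping is not one to one, which is the situation Dan described.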
6. Annotation summary Graphs:
a. Show the date completed on the index page of the graphs
[Mary] Because each group has its own way of entering date information, and since we will soon have a better way of entering the data without using the Google spreadsheets, I am not sure it is worth the effort to extract the dates now.
b. Distinguish 'not yet annotated' from 'no ortholog' in the graphs
[Mary] The graphs currently distinguish four cases for entries in the comparison matrix for high level terms:
- if there is no ortholog the entry is 'X';
- if an ortholog exists but there is no annotation the entry is 'organism';
- if there is experimental annotation the entry is a color-coded 'organism';
- if there is only ISS annotation the entry is color-coded and enclosed in parentheses '(organism)'.
For example, see: http://www.geneontology.org/images/RefGenomeGraphs/43.html#Slim
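The four cases Mary describes can be summarized as a small decision function. This is a sketch of the rule as stated in these notes, not the code that generates the graphs; the function name, its parameters, and the `(text, color_coded)` return value are assumptions made for illustration.

```python
def matrix_entry(organism, has_ortholog, has_experimental=False, has_iss=False):
    """Return (text, color_coded) for one cell of the comparison matrix.

    Follows the four cases listed above; the actual colors used in the
    graphs are not modeled here.
    """
    if not has_ortholog:
        return ("X", False)                # no ortholog
    if has_experimental:
        return (organism, True)            # experimental annotation
    if has_iss:
        return (f"({organism})", True)     # only ISS annotation
    return (organism, False)               # ortholog, but no annotation


if __name__ == "__main__":
    print(matrix_entry("organism", has_ortholog=False))
    print(matrix_entry("organism", has_ortholog=True))
    print(matrix_entry("organism", has_ortholog=True, has_experimental=True))
    print(matrix_entry("organism", has_ortholog=True, has_iss=True))
```

Experimental annotation is checked before ISS so that the parenthesized form appears only when ISS is the sole evidence, matching the "only ISS" wording above.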
Reference Genome Meeting
April 20-21, Salt Lake City
- Discuss agenda
- Kara: update (by email):
- the ClustalW alignments of all the families have finished, and we are on the final analysis/computational step (PHYLIP) needed to generate the pretty phylogenetic graphs. The data thus far have been loaded into the database. If anyone is chomping at the bit to check things out, let me know and I can send you our development URL, though note that the interface is not there yet: our developer is making some improvements to the web display, and right now it is *very* bare bones and in debugging mode. But you can at least see the members of the orthologous groups.
- we started the protein list that consists of proteins/families prone to erroneous results with these types of ortholog identification methods. It's on the wiki so please feel free to add your favorite (or dreaded, depending how you look at it) proteins.
Curation tool update
- Some requirements are here (David, Doug, Pascale): http://wiki.geneontology.org/index.php/Image:Refgene_Database_V3.ppt
- Chris, Siddhartha, Seth, Mary, Pascale, David, Doug
- Should have something to demo some time this week
[ACTION ITEM] Seth: send URL sometime this week
Annotation Pipeline document
Please have a look: Annotation_pipeline. People like it.
Annotation Quality Control
- See Annotation_QC
- SF tracker: HPRT1 (Emily)
- there were some ortholog call issues (pombe/cerevisiae); settled now
- if there are experimental annotations, preferably ISS to those (dictyBase is still referring to some InterPro)
- be careful about what to ISS to: AVOID
- grooming behavior, etc
- generally good, people should fix ISS and mark the gene 'comprehensively annotated'
Next conference call
Tuesday May 13, 2008, 1 PM CDT, 11 AM PDT, 7 PM BST