RefGenome11Mar08 Phone Conference (Archived)
Tuesday March 11, 1 PM CDT, 11 AM PDT, 6 PM GMT
Dong Hui TAIR
- Kara: update:
We launched the all-vs.-all BLAST on Feb. 18. I generated FASTA files from the gp2protein files that everyone provided, and I saved everything to an FTP site here:
with the subdirectories:
gp2protein: contains the gp2protein files used to generate the protein FASTA files for the analysis
error: contains IDs from the gp2protein files that could not be retrieved from NCBI or UniProt
- Issue: many proteins were identified by their secondary IDs (especially human!)
[ACTION ITEM]: Chris/Emily: figure out the secondary IDs. Maybe a script could map secondary IDs to primary IDs?
fasta: contains the FASTA files generated from the gp2protein files
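A minimal Python sketch of the FASTA-generation step, including the secondary-ID mapping from the action item. It assumes gp2protein lines have two tab-separated columns (gene product ID, then one or more `DB:accession` protein IDs separated by `;`). `parse_gp2protein`, `build_fasta`, the `sequences` dict (standing in for retrieval from NCBI or UniProt) and the `secondary_to_primary` table are all hypothetical; in practice such a table could be built from UniProt's published list of secondary accessions.

```python
from typing import Dict, Iterable, List, Tuple

# (gene product ID from column 1, protein IDs from column 2)
Row = Tuple[str, List[str]]


def parse_gp2protein(lines: Iterable[str]) -> List[Row]:
    """Parse gp2protein lines, assuming two tab-separated columns and
    several protein IDs in column 2 separated by ';'."""
    rows: List[Row] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and '!' header comments
        fields = line.split("\t")
        proteins = fields[1].split(";") if len(fields) > 1 else []
        rows.append((fields[0], [p.strip() for p in proteins if p.strip()]))
    return rows


def build_fasta(
    rows: List[Row],
    sequences: Dict[str, str],             # hypothetical: accession -> sequence
    secondary_to_primary: Dict[str, str],  # hypothetical: secondary -> primary accession
    width: int = 60,
) -> Tuple[List[str], List[str]]:
    """Return (FASTA lines, protein IDs that could not be retrieved)."""
    fasta: List[str] = []
    errors: List[str] = []
    for gene_id, protein_ids in rows:
        for prefixed in protein_ids:
            db, _, acc = prefixed.partition(":")          # e.g. "UniProtKB:SEC1"
            primary = secondary_to_primary.get(acc, acc)  # remap secondary IDs
            seq = sequences.get(primary)
            if seq is None:
                errors.append(prefixed)  # analogous to the error/ subdirectory
                continue
            fasta.append(f">{db}:{primary} {gene_id}")
            # wrap the sequence at a fixed line width
            fasta.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return fasta, errors


# Toy data only; real sequences would come from NCBI or UniProt.
gp2protein = [
    "!toy gp2protein file",
    "MODA:gene1\tUniProtKB:SEC1",
    "MODA:gene2\tUniProtKB:PRI2;UniProtKB:MISSING3",
]
sequences = {"PRI1": "MKTAYIAKQR", "PRI2": "MSDNE"}
secondary_to_primary = {"SEC1": "PRI1"}

fasta, errors = build_fasta(parse_gp2protein(gp2protein), sequences, secondary_to_primary)
print("\n".join(fasta))
print(errors)  # ['UniProtKB:MISSING3']
```

Here the secondary accession `SEC1` is resolved to `PRI1` before lookup, and anything still unresolved ends up in the error list instead of silently disappearing.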
The BLAST is just about done (a bit ahead of schedule!), and the next step is to start OrthoMCL. Rough timeline, depending on cluster usage: in about two weeks we'll be able to view the OrthoMCL families in simple list form (query by a gene name, get a list of orthologous genes back). In a month, we will have phylogenetic trees and other handy info available, as shown in the current production version of our web interface:
For this first run, the same basic features will be available for the Reference Genome data. The plan is to send the results around to everyone, collect feedback and suggestions, and go from there.
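To illustrate the "query by a gene name, get a list of orthologous genes back" view, here is a rough Python sketch. It assumes a simplified groups file with one family per line, shaped like `GROUP: taxon|protein taxon|protein ...`; actual OrthoMCL output differs between versions, so the parsing would need adjusting. All function names and the toy IDs are hypothetical.

```python
from typing import Dict, Iterable, List


def load_groups(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse lines shaped like 'GROUP: taxon|protein taxon|protein ...'."""
    groups: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        groups[group_id.strip()] = members.split()
    return groups


def index_by_gene(groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each bare ID (the text after the last '|') to its family."""
    index: Dict[str, str] = {}
    for group_id, members in groups.items():
        for member in members:
            index[member.rpartition("|")[2]] = group_id
    return index


def orthologs(gene: str, groups: Dict[str, List[str]], index: Dict[str, str]) -> List[str]:
    """Return the other members of the gene's family, or [] if it is ungrouped."""
    group_id = index.get(gene)
    if group_id is None:
        return []
    return [m for m in groups[group_id] if m.rpartition("|")[2] != gene]


groups = load_groups([
    "GRP1: hsa|HUMAN_A mmu|MOUSE_A dme|FLY_A",
    "GRP2: hsa|HUMAN_B sce|YEAST_B",
])
index = index_by_gene(groups)
print(orthologs("MOUSE_A", groups, index))  # ['hsa|HUMAN_A', 'dme|FLY_A']
print(orthologs("WORM_Z", groups, index))   # []
```

Building the gene-to-family index once keeps each lookup a single dictionary access, which suits a simple list-style query page.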
What we need from you:
For curators' convenience, we'd like every protein ID to link to its MOD (rather than to Ensembl, as the current P-POD does in several cases).
Links to the MODs will be generated. I'm assuming we should use the IDs in the first column of the gp2protein files for this; if that is not the case, please let me know.
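A possible way to build those links from the first-column IDs, sketched in Python. The prefixes and URL templates are placeholders (each MOD's real URL pattern would have to be filled in, e.g. from GO.xrf_abbs), and `mod_link` is a hypothetical helper. Splitting on the first colon only keeps local IDs that themselves contain a colon intact.

```python
from typing import Dict, Optional

# Hypothetical templates: the prefixes and URLs are placeholders, not real MOD pages.
LINK_TEMPLATES: Dict[str, str] = {
    "MODA": "https://moda.example.org/gene/{id}",
    "MODB": "https://modb.example.org/locus?id={id}",
}


def mod_link(gp_id: str, templates: Dict[str, str] = LINK_TEMPLATES) -> Optional[str]:
    """Build a MOD URL from a first-column gp2protein ID such as 'MODA:gene1'."""
    # Split on the first colon only, so local IDs containing a colon stay intact.
    prefix, sep, local_id = gp_id.partition(":")
    template = templates.get(prefix)
    if not sep or template is None:
        return None  # no prefix, or no template for this database
    return template.replace("{id}", local_id)


print(mod_link("MODA:gene1"))  # https://moda.example.org/gene/gene1
print(mod_link("MODB:X:42"))   # https://modb.example.org/locus?id=X:42
print(mod_link("OTHER:123"))   # None
```

A real version would probably also URL-encode the local ID before substituting it.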
Curation tool update
Annotation Quality Control
[ACTION ITEM]: DONE All: please check and comment on the new version of the graphs http://www.geneontology.org/images/RefGenomeGraphs/
[ACTION ITEM]: All: Annotation quality control: please pick an ortholog set from the Curation Targets table http://spreadsheets.google.com/ccc?key=pwOksMOra5uq4vIYjPgefPw
Enter your name in column K and open a new item in the SourceForge tracker http://sourceforge.net/tracker/?group_id=36855&atid=1040173
Contact Suzi if you need to be added to this tracker.
More instructions will follow by email.
Review action items
[ACTION ITEM]: (Chris/AmiGO) Look into loading IEAs for the reference genome set into AmiGO [in progress]
- The new loading cycle will incorporate IEAs from all sources except GOA/UniProt. Human annotations are loaded separately.
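A rough sketch of the IEA selection described above, assuming GAF 1.0 input (evidence code in column 7, assigned_by in column 15). Which `assigned_by` values identify GOA/UniProt rows is an assumption here, as are the function names; this is not the actual AmiGO loader.

```python
from typing import Iterable, Iterator, List, Set

# Assumption: GOA/UniProt rows are recognised by these assigned_by values.
EXCLUDED_ASSIGNERS: Set[str] = {"UniProt", "UniProtKB", "GOA"}


def iea_rows(lines: Iterable[str], excluded: Set[str] = EXCLUDED_ASSIGNERS) -> Iterator[List[str]]:
    """Yield GAF 1.0 rows with evidence IEA whose assigned_by is not excluded."""
    for raw in lines:
        if raw.startswith("!") or not raw.strip():
            continue  # header comments and blank lines
        cols = raw.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # not a complete 15-column GAF 1.0 row
        evidence, assigned_by = cols[6], cols[14]  # GAF columns 7 and 15
        if evidence == "IEA" and assigned_by not in excluded:
            yield cols


def gaf_row(db: str, obj_id: str, evidence: str, assigned_by: str) -> str:
    """Build a toy 15-column GAF 1.0 line for the example below."""
    return "\t".join([db, obj_id, "sym", "", "GO:0008150", "REF:1", evidence, "",
                      "P", "", "", "protein", "taxon:1", "20080301", assigned_by])


lines = [
    "!gaf-version: 1.0",
    gaf_row("MODA", "gene1", "IEA", "MODA"),
    gaf_row("UniProtKB", "P00001", "IEA", "UniProtKB"),
    gaf_row("MODA", "gene2", "IDA", "MODA"),
]
print([cols[1] for cols in iea_rows(lines)])  # ['gene1']
```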
[ACTION ITEM]: (Amelia): Fix the web page showing the number of annotations so that it gives an estimated number of protein-coding genes; problems include unmapped genes, splice variants, etc. Maybe this should also be on the Reference Genome page. Use the count from the gp2protein file; then it's all consistent.
In progress. Amelia had a question: which number should be taken as correct, the number of unique IDs in the first column [the database that produced the file] or the number in the second column [the UniProt or NCBI ID]? I just checked with Dan, and he says the mapping is not necessarily one-to-one.
- Chris/Judy: that may not be a reliable number anyway. At least for human, the proteome is not well documented.
- Best would be the total number of gene predictions.
- Judy: look at Sue Rhee's recent paper
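For Amelia's question, a quick Python sketch that reports both counts from a gp2protein file, along with the cases that make them disagree (unmapped genes, genes with several protein IDs, protein IDs shared by several genes). It assumes the same two-column, `;`-separated layout as the files used for the BLAST; the function name and toy IDs are hypothetical. As noted above, neither count is a substitute for the total number of gene predictions.

```python
from typing import Dict, Iterable, Set


def gp2protein_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Count column-1 genes and column-2 proteins, plus signs of non-1:1 mapping."""
    gene_to_proteins: Dict[str, Set[str]] = {}
    protein_to_genes: Dict[str, Set[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        # setdefault also registers genes that have no protein ID at all
        proteins = gene_to_proteins.setdefault(fields[0], set())
        if len(fields) > 1:
            for protein in (p.strip() for p in fields[1].split(";")):
                if protein:
                    proteins.add(protein)
                    protein_to_genes.setdefault(protein, set()).add(fields[0])
    return {
        "genes_col1": len(gene_to_proteins),
        "proteins_col2": len(protein_to_genes),
        "unmapped_genes": sum(1 for ps in gene_to_proteins.values() if not ps),
        "genes_with_several_proteins": sum(1 for ps in gene_to_proteins.values() if len(ps) > 1),
        "proteins_shared_by_genes": sum(1 for gs in protein_to_genes.values() if len(gs) > 1),
    }


counts = gp2protein_counts([
    "MODA:gene1\tUniProtKB:P1",
    "MODA:gene2\tUniProtKB:P2;UniProtKB:P3",
    "MODA:gene3\tUniProtKB:P1",
    "MODA:gene4",
])
print(counts)
# {'genes_col1': 4, 'proteins_col2': 3, 'unmapped_genes': 1,
#  'genes_with_several_proteins': 1, 'proteins_shared_by_genes': 1}
```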
Ongoing action items
[ACTION ITEM]: Can Mary show the completion date on the index page? Possibly; she will try.
[ACTION ITEM]: Chris: generate a new report showing the errors that need fixing for the orthology determination project
[ACTION ITEM]: Chris will add dates to the ISS outliers query so that we don't keep reviewing the same annotations.
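One way dates could keep the review from repeating itself, sketched in Python: keep only ISS annotations dated after the last review. The actual ISS outliers query isn't described here, so this assumes GAF 1.0 rows (evidence code in column 7, date as YYYYMMDD in column 14); `iss_since`, `gaf_row` and the toy rows are hypothetical.

```python
from datetime import date, datetime
from typing import Iterable, Iterator, List


def iss_since(lines: Iterable[str], reviewed_through: date) -> Iterator[List[str]]:
    """Yield GAF 1.0 ISS rows annotated after the last review date."""
    for raw in lines:
        if raw.startswith("!") or not raw.strip():
            continue
        cols = raw.rstrip("\n").split("\t")
        if len(cols) < 15 or cols[6] != "ISS":  # column 7: evidence code
            continue
        annotated = datetime.strptime(cols[13], "%Y%m%d").date()  # column 14: YYYYMMDD
        if annotated > reviewed_through:
            yield cols


def gaf_row(obj_id: str, evidence: str, yyyymmdd: str) -> str:
    """Build a toy 15-column GAF 1.0 line for the example below."""
    return "\t".join(["MODA", obj_id, "sym", "", "GO:0008150", "REF:1", evidence, "",
                      "P", "", "", "protein", "taxon:1", yyyymmdd, "MODA"])


lines = [
    gaf_row("gene1", "ISS", "20080115"),
    gaf_row("gene2", "ISS", "20080305"),
    gaf_row("gene3", "IDA", "20080305"),
]
print([cols[1] for cols in iss_since(lines, date(2008, 2, 1))])  # ['gene2']
```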
[ACTION ITEM]: (Tanya Berardini, Emily Dimmer, Pascale Gaudet, David Hill, Chris Mungall, Kimberly Van Auken): Write up recommendations for the usage of ISS, IEA, and IC
- Started; see IEA,_ISS,_IC_Usage_Discussion
- This depends on whether IEAs can be shown in AmiGO, at least for the reference genomes.
[ACTION ITEM]: Mike will set up 'annotation' calls?
[ACTION ITEM]: All: look at Stan's error reports: http://www.geneontology.org/internal-reports/gp2protein/
- Not updated since October.
Next conference call
Tuesday April 8, 2008, 10 AM CDT, 8 AM PDT, 4 PM BST