RefGenome12Feb08 Phone Conference (Archived)
Latest revision as of 11:30, 16 January 2018
Tuesday February 12, 10 AM CDT (8 AM PDT, 4 PM BST)
Present
- Pascale
- Seth
- Susan
- Judy
- David
- Stacia
- Emily
- Rachael
- Rex
- Kimberly
- Val
- Suzi
- Chris
- Tanya
- Mary
Next Reference Genome Meeting
April 20-21, Salt Lake City
This will be followed by a GO Consortium Meeting on April 22 and 23 in Salt Lake City.
Karen Eilbeck: host
Orthology determination
- Kara: update:
as of Wed Feb 12: I have the following species still to go:
- gp2protein_input/gp2protein.rgd
- gp2protein_input/gp2protein.sgd
- gp2protein_input/gp2protein.tair
- gp2protein_input/gp2protein.wb
- gp2protein_input/gp2protein.zfin
We did manage to get a local copy of UniProt downloaded and installed, so going forward, things should speed up a bit.
For the ones that have finished, I've gotten a few errors (bad IDs) but not a huge number, so I think we're OK.
- Should we re-run Stan's reports? Chris says no; she will find the problems that matter for loading the sequences.
Curation tool update
- Chris, Siddhartha, Seth, Mary, Pascale, David, Doug
- Met last week to define the requirements; good meeting
- Programmers are now working on the login screen
- expect progress to be faster
- Berkeley: loaded P-POD locally so they can look at data structure
Updated graphs
Mary
- I have posted new refG graphs at:
http://www.geneontology.org/images/RefGenomeGraphs/
To simplify comparison of organism annotations (and following Chris' lead on detecting outliers) I have modified the comparison matrix to show only high order GO terms, e.g. PEX1 http://www.geneontology.org/images/RefGenomeGraphs/5189.html#Slim or POLA http://www.geneontology.org/images/RefGenomeGraphs/5422.html#Slim
Please review the graphs and let me know if you notice missing orthologs or annotations -- I am still doing a lot of data editing manually :(
[ACTION ITEM]: All: please check and comment
Annotation Quality Control
- Issues:
- We have no QC measures
- Nobody follows up on annotation issues brought up on the ref genome or annotation email lists.
- See Annotation_QC
Suzi, Pascale, Val, and Emily propose that each curator be assigned one orthology set to check its curation status and possible mistakes in annotations, and to make sure the ortholog set gets completely curated. There is a new SF tracker https://sourceforge.net/tracker/?group_id=36855&atid=1040173 where each ref genome gene will be assigned to a curator. As a first step, each curator will do one gene as an experiment, and we'll discuss at the next call and at the Salt Lake City meeting how things went and how to improve the process.
- We suggest documenting the conclusions of any discussions in an 'Annotation Handbook'
New action items
[ACTION ITEM]: All: please check and comment on the new version of the graphs http://proto.informatics.jax.org/prototypes/GOgraphEX/RefGenomeGraphs/
[ACTION ITEM]: All: Annotation Quality control: Please pick an ortholog set from the Curation Targets table http://spreadsheets.google.com/ccc?key=pwOksMOra5uq4vIYjPgefPw
Enter your name in Column K, and open a new item in the SF tracker http://sourceforge.net/tracker/?group_id=36855&atid=1040173
Contact Suzi if you need to be added to this tracker.
More instructions will follow by email.
Review action items
[ACTION ITEM] Suzi, Pascale, Emily, Val [and others]: [in progress] Go over the issues relating to quality control. We have set up a SF tracker where each curator will examine an ortholog set and comment on whether it is completely curated, and what problems they find in the annotations.
[ACTION ITEM]: (Chris/AmiGO) Look into loading IEAs for reference genome set into AmiGO [in progress]
- The new loading cycle will incorporate IEAs from everything except GOA/UniProt. Human is loaded separately.
[ACTION ITEM]: (Amelia): Fix the web page that lists the number of annotations so that it gives an estimated number of protein-coding genes. Problems: unmapped genes, splice variants, etc. Maybe this should also be on the ref genome page. Use the count from the gp2protein file; then it's all consistent.
In progress. Amelia had some questions: what should be taken as the correct number, the number of unique IDs in the first column [the db that produced the file], or the number in the second column [the UniProt or NCBI ID]? I just checked with Dan and he says that the mapping may not necessarily be one to one.
- Chris/Judy: that may not be a reliable number anyway. At least for human, the proteome is not well documented.
- Best would be the total number of gene predictions.
- Judy: look at Sue Rhee's recent paper
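To make Amelia's question concrete, here is a minimal sketch of counting both columns and checking whether the mapping is one to one. It assumes a gp2protein file is tab-delimited, with the database ID in the first column, one or more protein IDs separated by ';' in the second, and comment lines starting with '!'. The function name and the sample IDs are hypothetical, for illustration only.

```python
from collections import defaultdict


def count_gp2protein_ids(lines):
    """Count distinct IDs per column in gp2protein-style lines.

    Assumed layout (not confirmed by these minutes): tab-delimited,
    column 1 = database ID, column 2 = protein IDs separated by ';',
    comment lines start with '!'.
    """
    gene_to_proteins = defaultdict(set)
    protein_to_genes = defaultdict(set)

    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comments
        columns = line.split("\t")
        gene_id = columns[0].strip()
        # Touch the entry so that unmapped genes are still counted.
        mapped = gene_to_proteins[gene_id]
        protein_field = columns[1] if len(columns) > 1 else ""
        for protein_id in protein_field.split(";"):
            protein_id = protein_id.strip()
            if protein_id:
                mapped.add(protein_id)
                protein_to_genes[protein_id].add(gene_id)

    return {
        "unique_first_column": len(gene_to_proteins),
        "unique_second_column": len(protein_to_genes),
        "unmapped_genes": sum(not p for p in gene_to_proteins.values()),
        "genes_with_multiple_proteins": sum(
            len(p) > 1 for p in gene_to_proteins.values()
        ),
        "proteins_with_multiple_genes": sum(
            len(g) > 1 for g in protein_to_genes.values()
        ),
    }


if __name__ == "__main__":
    # Hypothetical sample lines; real files live under gp2protein_input/.
    sample = [
        "! hypothetical example",
        "SGD:S000001\tUniProtKB:P11111",
        "SGD:S000002\tUniProtKB:P22222;UniProtKB:P33333",
        "SGD:S000003\tUniProtKB:P11111",
        "SGD:S000004",
    ]
    for key, value in count_gp2protein_ids(sample).items():
        print(f"{key}: {value}")
```

If the two unique counts differ, or either "multiple" count is non-zero, the mapping is not one to one, and the choice of column changes the reported number.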
[ACTION ITEM]: DONE: Mary will include IC in the graphs
[ACTION ITEM]: ADDED TO GOC meeting agenda. Discuss at the GOC meeting whether it would be useful to add the 'comprehensively annotated' tag to all genes, either in the gene association file or in the database.
[ACTION ITEM]: REJECTED. The two lists do not have that much overlap. Mike (Pascale): merge the two email lists (reference genome and annotation) into 'annotation'
Ongoing action items
[ACTION ITEM]: Can Mary show the completion date on the index page? Possibly; she will try.
[ACTION ITEM]: Chris: generate a new report that shows the errors that need fixing for the Orthology determination project
[ACTION ITEM]: Chris will provide a date on the ISS outliers query so that we don't always review the same annotations.
[ACTION ITEM] (Tanya Berardini, Emily Dimmer, Pascale Gaudet, David Hill, Chris Mungall, Kimberly Van Auken): Write up recommendations for usage of ISS, IEA, IC
- Started: see IEA,_ISS,_IC_Usage_Discussion
- This depends on whether the IEAs can be shown in AmiGO, at least for ref genomes.
[ACTION ITEM]: Mike will set up 'annotation' calls?
[ACTION ITEM]: all: look at Stan's error reports: http://www.geneontology.org/internal-reports/gp2protein/
- Not updated since October
Next conference call
Tuesday March 11, 2008, 1 PM CDT, 11 AM PDT, 7 PM GMT
Return to Reference_Genome_Annotation_Project