RefGenome8Apr08 Phone Conference (Archived)

Latest revision as of 11:30, 16 January 2018

Tuesday April 8, 10 AM CDT (8 AM PDT, 4 PM BST)

Present

Pascale dictyBase
Emily EBI
Rachael EBI
Chris NCBO
Val pombe
Stacia SGD
Doug ZFIN
Seth BBOP
Victoria RGD
Ranjana WormBase
Kimberly WormBase
Mary MGI
David MGI
Tanya TAIR

ACTION ITEMS

  1. All: Annotation Quality control: Please pick an ortholog set from the Curation Targets table: http://spreadsheets.google.com/ccc?key=pwOksMOra5uq4vIYjPgefPw
  2. All: Annotation Quality control: Have a look at the SF items and see if the ortholog from your organism is correctly annotated ("comprehensive"). Let the lead curator for that set know when you're done.
  3. Seth: send the URL of the ortholog tool prototype sometime this week


Webex

We were supposed to try the WebEx "raise hand" feature. We didn't set that up because we expected too many people to attend. Pascale logged in to Skype; hopefully people can use Skype to get attention if needed.


Review action items

1. Chris/Emily: figure out the secondary ID problems (many sequences were not loaded because their IDs were secondary). Maybe a script can be written to map the IDs? [in progress] A new gp2protein file will be provided by UniProt. Also, Dan Barell will provide a mapping of secondary IDs. But generally all databases have secondary ID issues, so we need to figure out how best to deal with them.
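
A minimal sketch of the kind of ID-mapping script suggested above, assuming the mapping of secondary IDs is available as a simple secondary-to-primary lookup. The function name and the accessions are hypothetical and are not taken from any actual file:

```python
def resolve_ids(ids, secondary_to_primary):
    """Replace secondary IDs with their primary IDs.

    Returns the resolved list plus a dict of every ID that was remapped,
    so that curators can review what changed before loading.
    """
    resolved = []
    remapped = {}
    for identifier in ids:
        # IDs missing from the mapping are assumed to be primary already.
        primary = secondary_to_primary.get(identifier, identifier)
        if primary != identifier:
            remapped[identifier] = primary
        resolved.append(primary)
    return resolved, remapped


# Hypothetical accessions, for illustration only.
mapping = {"SEC0001": "PRI0001"}
resolved, remapped = resolve_ids(["PRI0002", "SEC0001"], mapping)
print(resolved)
print(remapped)
```

Reporting the remapped IDs separately keeps the substitution visible rather than silent, which may help when a secondary ID turns out to point at a merged or split entry.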

2. IN PROGRESS. All: Annotation Quality control: Please pick an ortholog set from the Curation Targets table http://spreadsheets.google.com/ccc?key=pwOksMOra5uq4vIYjPgefPw

Enter your name in Column K, and open a new item in the SF tracker http://sourceforge.net/tracker/?group_id=36855&atid=1040173

Contact Suzi if you need to be added to this tracker.

3. Fix problems in annotations and graphs pointed out in the SF "ref genome completion set" tracker. [DONE] David, Chris: David fixed some definitions. There is still the problem that 'anything to do with a heart' cannot be pulled out from a single branch in the graph. Chris will demo how to use cross products to do that at the next GO meeting.

4. (Chris/AmiGO) Look into loading IEAs for reference genome set into AmiGO [in progress]

  • The new loading cycle will incorporate IEAs from everything except GOA/UniProt. Human is loaded separately.

5. (Amelia): Fix the web page that lists the number of annotations so that it also gives an estimated number of protein-coding genes. Problems: unmapped genes, splice variants, etc. Maybe this should also be on the ref genome page. Use the count from the gp2protein file; then it's all consistent.

[in progress] Amelia had some questions: what should be taken as the correct number, the number of unique IDs in the first column [the db that produced the file], or the number in the second column [the UniProt or NCBI ID]? I just checked with Dan and he says that the mapping may not necessarily be one to one.

  • Chris/Judy: that may not be a reliable number anyway. At least for human, the proteome is not well documented.
  • Best would be the total number of gene predictions.
  • Judy: look at Sue Rhee's recent paper
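
To illustrate why the two columns can give different counts, here is a rough sketch that counts the unique IDs in each column. It assumes a tab-delimited file with the source database ID in the first column, one or more ';'-separated protein IDs in the second, and comment lines starting with '!'. The sample rows and the function name are hypothetical:

```python
import io


def count_gp2protein_ids(handle):
    """Count unique IDs in column 1 and column 2 of a gp2protein-style file."""
    source_ids = set()
    protein_ids = set()
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (assumed to start with '!').
        if not line or line.startswith("!"):
            continue
        columns = line.split("\t")
        if len(columns) < 2:
            continue
        source_ids.add(columns[0])
        # Assumption: several protein IDs may share one cell, split by ';'.
        for protein in columns[1].split(";"):
            protein = protein.strip()
            if protein:
                protein_ids.add(protein)
    return len(source_ids), len(protein_ids)


# Hypothetical rows, for illustration only.
sample = io.StringIO(
    "! example file\n"
    "DB:gene1\tUniProtKB:X00001\n"
    "DB:gene2\tUniProtKB:X00002;UniProtKB:X00003\n"
    "DB:gene3\tUniProtKB:X00001\n"
    "DB:gene4\tUniProtKB:X00002\n"
)
n_source, n_protein = count_gp2protein_ids(sample)
print(n_source, n_protein)
```

In this made-up sample the first column holds four unique IDs and the second only three, because the mapping is not one to one.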

6. Annotation summary Graphs:

a. Show the date completed on the index page of the graphs

[Mary] Because each group has its way of entering date information and since we will soon have a better way of entering the data without using the google spreadsheets, I am not sure it is worth the effort to extract the dates now.

b. Distinguish 'not yet annotated' from 'no ortholog' in the graphs

[Mary] The graphs currently distinguish four cases for entries in the comparison matrix for high level terms:

  • if there is no ortholog, the entry is 'X';
  • if an ortholog exists but there is no annotation, the entry is 'organism';
  • if there is experimental annotation, the entry is a color-coded 'organism';
  • if there is only ISS annotation, the entry is color-coded and enclosed in parentheses: '(organism)'.

For example, see: http://www.geneontology.org/images/RefGenomeGraphs/43.html#Slim
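
The four cases can be written down as a small decision function. This is only a sketch of the rule as described in these minutes, not the code that generates the graphs. The function name, its arguments, and the (text, color_coded) return shape are assumptions, and evidence codes other than experimental and ISS are not covered:

```python
def matrix_entry(organism, has_ortholog, has_experimental, has_iss):
    """Return (text, color_coded) for one cell of the comparison matrix."""
    if not has_ortholog:
        # Case 1: no ortholog in this organism.
        return "X", False
    if has_experimental:
        # Case 3: experimental annotation exists.
        return organism, True
    if has_iss:
        # Case 4: only ISS annotation exists.
        return f"({organism})", True
    # Case 2: an ortholog exists but has no annotation yet.
    return organism, False


print(matrix_entry("mouse", False, False, False))
print(matrix_entry("mouse", True, False, False))
print(matrix_entry("mouse", True, True, True))
print(matrix_entry("mouse", True, False, True))
```

Checking for experimental annotation before ISS reflects the wording "only ISS annotation": a gene with both kinds of evidence is shown as experimentally annotated.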

Reference Genome Meeting

April 20-21, Salt Lake City

  • Discuss agenda

SLC_GO_Reference_Genome_Project_Meeting#Draft_Agenda

Orthology determination

  • Kara: update (by email):
    • the ClustalW alignments of all the families have finished, and we are on the final analysis/computational step (PHYLIP) needed to generate the pretty phylogenetic graphs. The data thus far have been loaded into the database. If anyone is chomping at the bit to check things out, let me know and I can send you our development URL, though note that the interface is not there yet: our developer is making some improvements to the web display, and right now it is *very* bare bones and in debugging mode. But you can at least see the members of the orthologous groups.
    • we started the protein list that consists of proteins/families prone to erroneous results with these types of ortholog identification methods. It's on the wiki, so please feel free to add your favorite (or dreaded, depending on how you look at it) proteins.

Curation tool update

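  • Some requirements are here (David, Doug, Pascale): http://wiki.geneontology.org/index.php/Image:Refgene_Database_V3.ppt
  • Chris, Siddhartha, Seth, Mary, Pascale, David, Doug

[ACTION ITEM] Seth: send URL sometime this week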

Annotation Pipeline document

Please have a look: Annotation_pipeline

People like it.

Annotation Quality Control

  • See Annotation_QC
  • SF tracker: HPRT1 (Emily)
  • there were some ortholog call issues (pombe/cerevisiae); settled now
  • if there are experimental annotations, preferably ISS to those (dictyBase is still referring to some InterPro)
  • be careful about what to ISS to: AVOID
    • Homodimerization/tetramerization
    • grooming behavior, etc.
  • generally good; people should fix ISS annotations and mark the gene 'comprehensively annotated'

Next conference call

Tuesday May 13, 2008, 1 PM CDT, 11 AM PDT, 7 PM GMT

Return to Reference_Genome_Annotation_Project