Manager 17Nov10

Revision as of 18:25, 16 November 2010

Agenda

Progress Reports from managers and annotation groups (Judy)

I have requested Manager reports already.

I would like to request MOD and AnnotationProject progress reports as we did last year. This would be in addition to getting necessary annotation and other data directly from the GOdb on Dec 1. These reports would be static and would be referred to in the formal Progress Report that we submit to NIH. This is important for documenting the variety of contributions to GOC from annotation groups.

See: http://wiki.geneontology.org/index.php/Grant_Progress_Reports_December_2009

I know that there was discussion about a ‘template’ for MOD reports, and that the annotation data for reference genome and other purposes can be derived directly from the database. I want to make sure we understand the difference between these yearly reports and the ongoing GOC-generated reports that are produced on a regular schedule from GOC resources.

Report from Annotation group

  • Annotation jamboree went well last week. Format was different this time. All curators annotated from two papers on transcription and we went over the annotations.
  • Rama and Karen had a conference call with Michelle and Marcus about ECO. We talked about what should go in the first version, definitions for some codes, etc. We should have the next version sometime this week.
  • Soft QC checks: http://gocwiki.geneontology.org/index.php/Annotation_Quality_Control_Checks#Soft_QC_checks
  • MikeC has implemented the first phase of hard QC checks. I have reviewed the annotations that were flagged for SGD to make sure the checks are right.
    • A summary of the errors is shown below (this is just to give you an idea of how many errors there are). You can skip most of them and look at the # of errors for rat/mouse/SGD.
    • Question: Do we need to give time to the MODs before we move this to production? Please note that committing the script with these checks will result in removing hundreds/thousands of annotations from several GAFs.
1) The NOT qualifier cannot be used with protein binding (GO:0005515).

2) Annotations to protein binding (GO:0005515) can be made *only* with IPI and *should* have an identifier in the 'with' column.
ISS, ISA, ISM, IEA, NAS, TAS, IDA, IMP, IGC, IEP, ND, IC, RCA, EXP and IGI are all not allowed for annotating to this term.

3) The IEP evidence code is allowed only for BP annotations.
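
To make the three rules concrete, here is a minimal Python sketch of how they could be applied to one tab-separated GAF row. It is not MikeC's script: the function names `check_row` and `check_gaf_lines` are hypothetical, and the sketch assumes the usual GAF column order (qualifier in column 4, GO ID in column 5, evidence in column 7, 'with' in column 8, aspect in column 9, where aspect `P` means BP).

```python
# Hypothetical sketch of the three hard QC rules; not the production script.
PROTEIN_BINDING = "GO:0005515"


def check_row(cols):
    """Return a list of rule violations for one GAF row given as a list of columns."""
    # Assumed GAF layout (0-based indexes): 3 qualifier, 4 GO ID,
    # 6 evidence code, 7 with/from, 8 aspect (P, F or C).
    qualifier, go_id, evidence = cols[3], cols[4], cols[6]
    with_from, aspect = cols[7], cols[8]
    errors = []
    if go_id == PROTEIN_BINDING:
        # Rule 1: NOT cannot be combined with protein binding.
        if "NOT" in qualifier.split("|"):
            errors.append("rule 1: NOT qualifier used with protein binding")
        # Rule 2: only IPI is allowed, and the 'with' column must be filled in.
        if evidence != "IPI":
            errors.append("rule 2: protein binding requires IPI evidence")
        elif not with_from.strip():
            errors.append("rule 2: IPI protein binding needs a 'with' identifier")
    # Rule 3: IEP is allowed only for biological process annotations.
    if evidence == "IEP" and aspect != "P":
        errors.append("rule 3: IEP is allowed only for BP annotations")
    return errors


def check_gaf_lines(lines):
    """Map 1-based line numbers to violations, skipping '!' comment lines."""
    flagged = {}
    for number, line in enumerate(lines, 1):
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            flagged[number] = ["row has fewer than 9 columns"]
            continue
        errors = check_row(cols)
        if errors:
            flagged[number] = errors
    return flagged
```

Passing an open GAF file object to `check_gaf_lines` would list the flagged rows without removing anything, which is one way a MOD could review its own file before the checks go to production.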

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.GeneDB_Spombe.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	12
Evidence		7	11

TOTAL ERRORS = 23
TOTAL ROWS with no issues = 36022
TOTAL ROWS in file = 36070

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.GeneDB_Tbrucei.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	15
Evidence		7	44

TOTAL ERRORS = 59
TOTAL ROWS with no issues = 10330
TOTAL ROWS in file = 10414

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.aspgd.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	39
With			8	3

TOTAL ERRORS = 42
TOTAL ROWS with no issues = 38747
TOTAL ROWS in file = 38824

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.dictyBase.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	427
With			8	1

TOTAL ERRORS = 428
TOTAL ROWS with no issues = 31292
TOTAL ROWS in file = 31750

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.ecocyc.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	116
Evidence		7	327
With			8	14

TOTAL ERRORS = 457
TOTAL ROWS with no issues = 99795
TOTAL ROWS in file = 100278

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.fb.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	632

TOTAL ERRORS = 632
TOTAL ROWS with no issues = 76346
TOTAL ROWS in file = 77007

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.goa_chicken.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	1
Evidence		7	3038

TOTAL ERRORS = 3039
TOTAL ROWS with no issues = 72143
TOTAL ROWS in file = 75207

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.goa_cow.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	4348

TOTAL ERRORS = 4348
TOTAL ROWS with no issues = 103977
TOTAL ROWS in file = 108350

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.goa_human.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	7
Evidence		7	1598

TOTAL ERRORS = 1605
TOTAL ROWS with no issues = 196596
TOTAL ROWS in file = 198226

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.goa_pdb.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	3054

TOTAL ERRORS = 3054
TOTAL ROWS with no issues = 798487
TOTAL ROWS in file = 801578

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.goa_uniprot_noiea.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	5
Evidence		7	4
With			8	2

TOTAL ERRORS = 11
TOTAL ROWS with no issues = 49984
TOTAL ROWS in file = 49999

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.gramene_oryza.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	65
Evidence		7	4

TOTAL ERRORS = 69
TOTAL ROWS with no issues = 49967
TOTAL ROWS in file = 50065

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.jcvi_Aphagocytophilum.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	8

TOTAL ERRORS = 8
TOTAL ROWS with no issues = 3462
TOTAL ROWS in file = 3497

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.jcvi_Banthracis.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	10

TOTAL ERRORS = 10
TOTAL ROWS with no issues = 13046
TOTAL ROWS in file = 13083

 [snip] I removed the reports for other JCVI files (they are similar in number)
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.mgi.gz

gzip: gene_association.mgi.gz: decompression OK, trailing garbage ignored

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	387
Evidence		7	1536
With			8	21
Date			14	98581
General errors		-	42

TOTAL ERRORS = 100567
TOTAL ROWS with no issues = 62232
TOTAL ROWS in file = 162829

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.rgd.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Qualifier		4	1
GOID			5	41
Evidence		7	11715

TOTAL ERRORS = 11757
TOTAL ROWS with no issues = 234494
TOTAL ROWS in file = 246277

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.sgd.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	13
Evidence		7	208
With			8	23

TOTAL ERRORS = 244
TOTAL ROWS with no issues = 89775
TOTAL ROWS in file = 90047

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.sgn.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	33

TOTAL ERRORS = 33
TOTAL ROWS with no issues = 271
TOTAL ROWS in file = 331

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.tair.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	67
Evidence		7	651
With			8	21

TOTAL ERRORS = 739
TOTAL ROWS with no issues = 132845
TOTAL ROWS in file = 133610

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.wb.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	1146
With			8	5

TOTAL ERRORS = 1151
TOTAL ROWS with no issues = 108799
TOTAL ROWS in file = 109980

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.zfin.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	1798
With			8	4

TOTAL ERRORS = 1802
TOTAL ROWS with no issues = 108164
TOTAL ROWS in file = 109995
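
For anyone who wants to compare files without reading each block by eye, the per-file summaries above could be parsed with a short Python sketch like the one below. The names `parse_report` and `flagged_fraction` are hypothetical and not part of the QC script; the sample text is the SGD block copied from this page. The sketch only reads the numbers as printed and makes no assumption about why the per-column error total and the count of rows with issues can differ.

```python
import re

# A per-column line looks like "Evidence<tabs>7<tab>208" or "General errors<tabs>-<tab>42".
COLUMN_LINE = re.compile(
    r"^(?P<name>\S.*?)[ \t]+(?P<col>\d+|-)[ \t]+(?P<count>\d+)[ \t]*$",
    re.MULTILINE,
)
TOTALS = {
    "total_errors": re.compile(r"^TOTAL ERRORS = (\d+)", re.MULTILINE),
    "clean_rows": re.compile(r"^TOTAL ROWS with no issues = (\d+)", re.MULTILINE),
    "total_rows": re.compile(r"^TOTAL ROWS in file = (\d+)", re.MULTILINE),
}


def parse_report(block):
    """Parse one per-file report block into a dict of counts."""
    summary = {
        "column_errors": {
            m["name"]: int(m["count"]) for m in COLUMN_LINE.finditer(block)
        }
    }
    for key, pattern in TOTALS.items():
        match = pattern.search(block)
        summary[key] = int(match.group(1)) if match else None
    return summary


def flagged_fraction(summary):
    """Share of rows in the file that are not counted as having no issues."""
    return (summary["total_rows"] - summary["clean_rows"]) / summary["total_rows"]


# The SGD block from this page, used as sample input.
SAMPLE = (
    "gene_association.sgd.gz\n\n"
    "NUMBER of ERRORS by COLUMN\n\n"
    "Column\t\t\tCol#\tNumber of Errors\n"
    "GOID\t\t\t5\t13\n"
    "Evidence\t\t7\t208\n"
    "With\t\t\t8\t23\n\n"
    "TOTAL ERRORS = 244\n"
    "TOTAL ROWS with no issues = 89775\n"
    "TOTAL ROWS in file = 90047\n"
)

report = parse_report(SAMPLE)
print(report["column_errors"], flagged_fraction(report))
```

Running the same function over every block would give a quick ranking of which GAFs lose the largest share of rows, which may help with the question of how much notice the MODs need.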

Reference Genome

  • Ref Genome Gene Targets / WNT Pathway (Kara: unfortunately, I might be late/absent from this call, but please discuss in my absence)

For the Ref Genome gene targets, we hatched a plan to shift away from the WNT pathway targets to targets that are more lower-euk heavy, so that some of the curators of higher euks could help out with PAINT annotation. After thinking about this a bit, I wonder if it would make more sense to enlist the lower-euk curators for PAINT, so that the primary WNT annotation can continue. My concern about our original plan is that we will lose momentum on the WNTs and won't be able to finish that project. I do think it would be very beneficial to wrap up at least one of these focused areas for the renewal. I'm happy to stick to the original plan if others don't think it'll be a problem to go back to the WNTs, but from a curation standpoint I think it's more efficient to knock out all the WNT genes in consecutive months, on the assumption that curation will go faster when everyone has WNT on the brain.  ;)