Manager 17Nov10

Revision as of 12:48, 18 November 2010

Agenda

Progress Reports from managers and annotation groups (Judy)

I have requested Manager reports already.

I would like to request MOD and AnnotationProject progress reports as we did last year. This would be in addition to getting necessary annotation and other data directly from the GOdb on Dec 1. These reports would be static and would be referred to in the formal Progress Report that we submit to NIH. This is important to reference the variety of contributions to GOC from annotation groups.

See: http://wiki.geneontology.org/index.php/Grant_Progress_Reports_December_2009

I know that there was discussion about a 'template' for MOD reports, and that the annotation data can be effectively derived for reference genome and other purposes directly from the database. I want to make sure we understand the difference between these yearly reports and the ongoing GOC reports that are generated on a regular schedule from GOC resources.

Discussion

  • Please get the report done by Dec 2nd. Judy will also send an email out about MOD reports. They are also due on Dec 2nd.
  • Also remember to add publications, posters, tutorials to the wiki page.

Report from Annotation group

  • Annotation jamboree went well last week. Format was different this time. All curators annotated from two papers on transcription and we went over the annotations.
  • Rama and Karen had a phone conference call with Michelle and Marcus about ECO. We talked about what should go in the first version, definitions for some codes etc. We should have the next version sometime this week.
  • Soft QC checks: http://gocwiki.geneontology.org/index.php/Annotation_Quality_Control_Checks#Soft_QC_checks
  • MikeC has implemented the first phase of hard QC checks. I have reviewed the annotations that were flagged for SGD to make sure the checks are right.
    • A summary of the errors is shown below (this is just to give you an idea of how many errors there are). You can skip most of them and look at the # of errors for rat/mouse/SGD.
    • Question: Do we need to give time to the MODs before we move this to production? Please note that committing the script with these checks will result in removing hundreds/thousands of annotations for several GAFs.

Discussion

  • Soft QC: Rama will work with Amelia on the soft QC checks. Taxon triggers are currently listed as a soft QC check, but they are of the kind 'resolution deferred'. Rama will fix the specs page to reflect this difference.
  • Hard QC: MikeC will fix his script (regarding the Date issue), rerun it and send the errors to all the groups for review. We want the new checks in production in the new year. The time between now and the new year should be enough for groups to raise any concerns.
1) The NOT qualifier cannot be used with protein binding (GO:0005515).

2) Annotations to protein binding (GO:0005515) can be made *only* with IPI and *should* have an identifier in the 'with' column.
ISS, ISA, ISM, IEA, NAS, TAS, IDA, IMP, IGC, IEP, ND, IC, RCA, EXP and IGI are all not allowed for annotating to this term.

3) The IEP evidence code is allowed only for BP annotations.
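
As a rough illustration of the three checks above, here is a minimal Python sketch. It assumes tab-delimited GAF rows with the column positions used in the reports (Qualifier = 4, GOID = 5, Evidence = 7, With = 8) and Aspect in column 9. The function names, and the choice of which column each error is counted under, are assumptions made for illustration; this is not MikeC's script.

```python
from collections import Counter

PROTEIN_BINDING = "GO:0005515"


def check_row(cols):
    """Return (column_name, message) pairs for one tab-split GAF row."""
    cols = cols + [""] * (9 - len(cols))  # pad short rows so indexing is safe
    qualifier, go_id, evidence = cols[3], cols[4], cols[6]
    with_from, aspect = cols[7], cols[8]
    errors = []
    if go_id == PROTEIN_BINDING:
        # Check 1: NOT cannot be used with protein binding.
        if "NOT" in qualifier.split("|"):
            errors.append(("Qualifier", "NOT used with protein binding"))
        # Check 2: only IPI is allowed, and 'with' should be filled in.
        if evidence != "IPI":
            errors.append(("Evidence", "protein binding without IPI"))
        elif not with_from.strip():
            errors.append(("With", "IPI to protein binding, empty 'with'"))
    # Check 3: IEP only for Biological Process (aspect P).
    if evidence == "IEP" and aspect != "P":
        errors.append(("Evidence", "IEP on a non-BP annotation"))
    return errors


def summarize(lines):
    """Count errors per column, plus clean rows and total rows."""
    by_column = Counter()
    clean_rows = total_rows = 0
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        total_rows += 1
        errors = check_row(line.rstrip("\n").split("\t"))
        if errors:
            by_column.update(column for column, _ in errors)
        else:
            clean_rows += 1
    return by_column, clean_rows, total_rows
```

Calling summarize() on the lines of an unzipped gene_association file would give per-column counts in the same spirit as the reports below, though the numbers will not necessarily match the real script's output.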

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.GeneDB_Spombe.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	12
Evidence		7	11

TOTAL ERRORS = 23
TOTAL ROWS with no issues = 36022
TOTAL ROWS in file = 36070

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.GeneDB_Tbrucei.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	15
Evidence		7	44

TOTAL ERRORS = 59
TOTAL ROWS with no issues = 10330
TOTAL ROWS in file = 10414

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.aspgd.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	39
With			8	3

TOTAL ERRORS = 42
TOTAL ROWS with no issues = 38747
TOTAL ROWS in file = 38824

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.dictyBase.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	427
With			8	1

TOTAL ERRORS = 428
TOTAL ROWS with no issues = 31292
TOTAL ROWS in file = 31750

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.ecocyc.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	116
Evidence		7	327
With			8	14

TOTAL ERRORS = 457
TOTAL ROWS with no issues = 99795
TOTAL ROWS in file = 100278

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.fb.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	632

TOTAL ERRORS = 632
TOTAL ROWS with no issues = 76346
TOTAL ROWS in file = 77007

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.goa_chicken.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	1
Evidence		7	3038

TOTAL ERRORS = 3039
TOTAL ROWS with no issues = 72143
TOTAL ROWS in file = 75207

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.goa_cow.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	4348

TOTAL ERRORS = 4348
TOTAL ROWS with no issues = 103977
TOTAL ROWS in file = 108350

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.goa_human.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	7
Evidence		7	1598

TOTAL ERRORS = 1605
TOTAL ROWS with no issues = 196596
TOTAL ROWS in file = 198226

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.goa_pdb.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	3054

TOTAL ERRORS = 3054
TOTAL ROWS with no issues = 798487
TOTAL ROWS in file = 801578

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.goa_uniprot_noiea.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	5
Evidence		7	4
With			8	2

TOTAL ERRORS = 11
TOTAL ROWS with no issues = 49984
TOTAL ROWS in file = 49999

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.gramene_oryza.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	65
Evidence		7	4

TOTAL ERRORS = 69
TOTAL ROWS with no issues = 49967
TOTAL ROWS in file = 50065

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.jcvi_Aphagocytophilum.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	8

TOTAL ERRORS = 8
TOTAL ROWS with no issues = 3462
TOTAL ROWS in file = 3497

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.jcvi_Banthracis.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	10

TOTAL ERRORS = 10
TOTAL ROWS with no issues = 13046
TOTAL ROWS in file = 13083

 [snip] I removed the reports for other JCVI files (they are similar in number)
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.mgi.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	983
With			8	12

TOTAL ERRORS = 995
TOTAL ROWS with no issues = 275874
TOTAL ROWS in file = 276900

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.rgd.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Qualifier		4	1
GOID			5	41
Evidence		7	11715

TOTAL ERRORS = 11757
TOTAL ROWS with no issues = 234494
TOTAL ROWS in file = 246277

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.sgd.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	13
Evidence		7	208
With			8	23

TOTAL ERRORS = 244
TOTAL ROWS with no issues = 89775
TOTAL ROWS in file = 90047

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.sgn.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	33

TOTAL ERRORS = 33
TOTAL ROWS with no issues = 271
TOTAL ROWS in file = 331

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.tair.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
GOID			5	67
Evidence		7	651
With			8	21

TOTAL ERRORS = 739
TOTAL ROWS with no issues = 132845
TOTAL ROWS in file = 133610

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.wb.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	1146
With			8	5

TOTAL ERRORS = 1151
TOTAL ROWS with no issues = 108799
TOTAL ROWS in file = 109980

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
gene_association.zfin.gz

NUMBER of ERRORS by COLUMN

Column			Col#	Number of Errors
Evidence		7	1798
With			8	4

TOTAL ERRORS = 1802
TOTAL ROWS with no issues = 108164
TOTAL ROWS in file = 109995
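
To compare files at a glance, the summary blocks above can be read mechanically. A minimal sketch, assuming each block starts with a gene_association.* filename line followed by the three TOTAL lines; parse_report and flagged_rows are hypothetical names, and the sample values are copied from the RGD and SGD blocks.

```python
import re

TOTAL_LINE = re.compile(
    r"TOTAL (ERRORS|ROWS with no issues|ROWS in file) = (\d+)$"
)


def parse_report(text):
    """Map each filename to its three TOTAL values."""
    reports = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("gene_association."):
            current = reports.setdefault(line, {})
            continue
        match = TOTAL_LINE.match(line)
        if match and current is not None:
            current[match.group(1)] = int(match.group(2))
    return reports


def flagged_rows(entry):
    """Rows in the file that are not counted as having no issues."""
    return entry["ROWS in file"] - entry["ROWS with no issues"]


SAMPLE = """\
gene_association.rgd.gz
TOTAL ERRORS = 11757
TOTAL ROWS with no issues = 234494
TOTAL ROWS in file = 246277

gene_association.sgd.gz
TOTAL ERRORS = 244
TOTAL ROWS with no issues = 89775
TOTAL ROWS in file = 90047
"""

reports = parse_report(SAMPLE)
# List files with the most errors first.
for name, entry in sorted(reports.items(), key=lambda kv: -kv[1]["ERRORS"]):
    share = flagged_rows(entry) / entry["ROWS in file"]
    print(f"{name}\terrors={entry['ERRORS']}\t"
          f"flagged rows={flagged_rows(entry)} ({share:.2%})")
```

Note that rows in file minus rows with no issues need not equal TOTAL ERRORS (compare the RGD block), so the sketch reports both rather than assuming one from the other.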

Reference Genome

  • Ref Genome Gene Targets / WNT Pathway (Kara: unfortunately, I might be late/absent from this call, but please discuss in my absence)

For the Ref Genome gene targets, we hatched a plan to shift away from the WNT pathway targets to targets that are more lower euk heavy, to enable some of the curators of higher euks to help out with PAINT annotation. After thinking about this a bit, I wonder if it would make more sense to enlist the lower euk curators for PAINT, so that the primary WNT annotation can continue? My concern about our original plan is that we will lose momentum on the WNTs and won't be able to finish that project up. I do think it would be very beneficial to wrap up at least one of these focused areas for the renewal. I'm happy to stick to the original plan if others don't think it'll be a problem to go back to the WNTs--but I think from a curation standpoint, it's more efficient to knock out all the WNT genes in consecutive months, on the assumption that curation will go faster when everyone has WNT on the brain.  ;)

Discussion

  • We will focus on training curators to do PAINT inferencing. A small number of curators (Rama, Li, Dimitri) will start right away. Depending on how it goes, more curators will be trained in Jan.
  • We won't continue with WNT and won't have new targets in the coming months. Our focus is to do more propagation.


Other items

  • Pascale: not everybody can commit PAINT annotation files into CVS. Send email to go-admin@genome.stanford.edu if you want access to check into CVS.
  • Chris requested that Paul Thomas give him admin privileges to moderate the PAINT-users mailing list. Chris should send an email to Paul about this.
  • The login system should be fixed in the PAINT interface. Currently everybody logs in as 'gouser'. PaulT will look into fixing this.