Manager 17Nov10
Agenda
Progress Reports from managers and annotation groups (Judy)
I have already requested the manager reports.
I would like to request MOD and AnnotationProject progress reports, as we did last year. This would be in addition to getting the necessary annotation and other data directly from the GOdb on Dec 1. These reports would be static and would be referred to in the formal Progress Report that we submit to NIH. They are important for documenting the variety of contributions that annotation groups make to the GOC.
See: http://wiki.geneontology.org/index.php/Grant_Progress_Reports_December_2009
I know that there was discussion about a ‘template’ for MOD reports, and that the annotation data for the Reference Genome and other purposes can be derived directly from the database. I want to make sure we understand the difference between these yearly reports and the ongoing reports that the GOC generates from its resources on a regular schedule.
Discussion
- Please get your report done by Dec 2nd. Judy will also send out an email about the MOD reports, which are also due on Dec 2nd.
- Also remember to add publications, posters and tutorials to the wiki page.
Report from Annotation group
- The annotation jamboree last week went well. The format was different this time: all curators annotated from the same two papers on transcription, and then we went over the annotations.
- Rama and Karen had a conference call with Michelle and Marcus about ECO. We discussed what should go into the first version, definitions for some of the codes, etc. We should have the next version sometime this week.
- Soft QC checks: http://gocwiki.geneontology.org/index.php/Annotation_Quality_Control_Checks#Soft_QC_checks
- MikeC has implemented the first phase of the hard QC checks. I have reviewed the annotations flagged for SGD to make sure the checks are correct.
- A summary of the errors is shown below (just to give you an idea of how many errors there are). You can skip most of it and look at the number of errors for rat, mouse and SGD.
- Question: Do we need to give the MODs time before we move this into production? Please note that committing the script with these checks will remove hundreds to thousands of annotations from several GAFs.
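The files differ greatly in size, so raw error counts are hard to compare. The following Python sketch normalises the totals reported in the error summary for the rat (RGD), mouse (MGI) and SGD files to errors per 1,000 rows. This metric is only an illustration and is not something the QC script produces. A single row can fail more than one check, so the rate is not the same as the fraction of rows affected.

```python
# Totals copied from the error summary: (total errors, total rows in file).
# "Errors per 1,000 rows" is an illustrative normalisation, not a report metric.
summary = {
    "gene_association.rgd.gz": (11757, 246277),  # rat
    "gene_association.mgi.gz": (995, 276900),    # mouse
    "gene_association.sgd.gz": (244, 90047),     # yeast (SGD)
}


def errors_per_thousand(errors: int, rows: int) -> float:
    """Scale an error count to a per-1,000-rows rate (one row may have >1 error)."""
    return 1000 * errors / rows


for name, (errors, rows) in summary.items():
    rate = errors_per_thousand(errors, rows)
    print(f"{name}: {errors} errors / {rows} rows = {rate:.1f} per 1,000 rows")
```

On these totals, RGD's rate is more than ten times that of MGI or SGD, and almost all of its errors are in the Evidence column.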
Discussion
- Soft QC: Rama will work with Amelia on the soft QC checks. Taxon triggers are currently listed as a soft QC check, but they are of the 'resolution deferred' kind; Rama will fix the specs page to reflect this difference.
- Hard QC: Mike C will fix his script (regarding the date issue), rerun it, and send the errors to all the groups for review. We want the new checks in production in the new year, which leaves enough time for groups to raise any concerns.
Hard QC checks (first phase):
1) The NOT qualifier cannot be used with protein binding (GO:0005515).
2) Annotations to protein binding (GO:0005515) may use *only* IPI, and *should* have an identifier in the 'with' column. ISS, ISA, ISM, IEA, NAS, TAS, IDA, IMP, IGC, IEP, ND, IC, RCA, EXP and IGI are all disallowed for this term.
3) The IEP evidence code is allowed only for BP annotations.

Summary of errors by GAF file. Errors are counted by GAF column: Qualifier (col 4), GOID (col 5), Evidence (col 7), With (col 8). A "-" means no errors in that column.

File                                       Qualifier  GOID Evidence  With  Total errors  Rows with no issues  Rows in file
gene_association.GeneDB_Spombe.gz                  -    12       11     -            23                36022         36070
gene_association.GeneDB_Tbrucei.gz                 -    15       44     -            59                10330         10414
gene_association.aspgd.gz                          -     -       39     3            42                38747         38824
gene_association.dictyBase.gz                      -     -      427     1           428                31292         31750
gene_association.ecocyc.gz                         -   116      327    14           457                99795        100278
gene_association.fb.gz                             -     -      632     -           632                76346         77007
gene_association.goa_chicken.gz                    -     1     3038     -          3039                72143         75207
gene_association.goa_cow.gz                        -     -     4348     -          4348               103977        108350
gene_association.goa_human.gz                      -     7     1598     -          1605               196596        198226
gene_association.goa_pdb.gz                        -     -     3054     -          3054               798487        801578
gene_association.goa_uniprot_noiea.gz              -     5        4     2            11                49984         49999
gene_association.gramene_oryza.gz                  -    65        4     -            69                49967         50065
gene_association.jcvi_Aphagocytophilum.gz          -     -        8     -             8                 3462          3497
gene_association.jcvi_Banthracis.gz                -     -       10     -            10                13046         13083
gene_association.mgi.gz                            -     -      983    12           995               275874        276900
gene_association.rgd.gz                            1    41    11715     -         11757               234494        246277
gene_association.sgd.gz                            -    13      208    23           244                89775         90047
gene_association.sgn.gz                            -    33        -     -            33                  271           331
gene_association.tair.gz                           -    67      651    21           739               132845        133610
gene_association.wb.gz                             -     -     1146     5          1151               108799        109980
gene_association.zfin.gz                           -     -     1798     4          1802               108164        109995

[snip] I removed the reports for the other JCVI files (their error counts are similar).
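The three checks above are simple per-row rules on GAF columns, so they can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual QC script. Rows are assumed to be tab-separated GAF lines. Column numbers follow the report (4 = Qualifier, 5 = GO ID, 7 = Evidence, 8 = With), plus column 9 for aspect (P = biological process). The helper names and sample rows are hypothetical. How each failure is attributed to a report column is also an assumption; in particular, this sketch never reports GOID errors, although the real summary contains them.

```python
from collections import Counter

# Assumed GAF column positions (1-indexed in the report): 4 = Qualifier,
# 5 = GO ID, 7 = Evidence, 8 = With, 9 = Aspect. Stored 0-indexed here.
QUALIFIER, GO_ID, EVIDENCE, WITH, ASPECT = 3, 4, 6, 7, 8
PROTEIN_BINDING = "GO:0005515"


def check_row(cols):
    """Return the report column names that fail a hard QC check (hypothetical helper)."""
    errors = []
    if cols[GO_ID] == PROTEIN_BINDING:
        # Check 1: NOT may not be used with protein binding.
        if "NOT" in cols[QUALIFIER].split("|"):
            errors.append("Qualifier")
        # Check 2: protein binding is IPI-only...
        if cols[EVIDENCE] != "IPI":
            errors.append("Evidence")
        # ...and an IPI annotation should carry an identifier in 'with'.
        elif not cols[WITH].strip():
            errors.append("With")
    # Check 3: IEP is only allowed for biological process (aspect P).
    if cols[EVIDENCE] == "IEP" and cols[ASPECT] != "P":
        errors.append("Evidence")
    return errors


def summarize(lines):
    """Tally errors by column and count clean rows (hypothetical helper)."""
    by_column = Counter()
    clean = total = 0
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        total += 1
        errors = check_row(line.rstrip("\n").split("\t"))
        by_column.update(errors)
        if not errors:
            clean += 1
    return by_column, clean, total


def gaf_row(qualifier, go_id, evidence, with_from, aspect):
    """Build a made-up GAF row; columns not checked here hold placeholders."""
    cols = ["DB", "OBJ1", "sym1", qualifier, go_id, "PMID:0",
            evidence, with_from, aspect]
    return "\t".join(cols)


rows = [
    gaf_row("", "GO:0005515", "IPI", "UniProtKB:PLACEHOLDER", "F"),     # passes
    gaf_row("NOT", "GO:0005515", "IPI", "UniProtKB:PLACEHOLDER", "F"),  # check 1
    gaf_row("", "GO:0005515", "IDA", "", "F"),   # check 2: wrong evidence code
    gaf_row("", "GO:0005515", "IPI", "", "F"),   # check 2: empty 'with'
    gaf_row("", "GO:0003677", "IEP", "", "F"),   # check 3: IEP on an MF term
]
by_column, clean, total = summarize(rows)
print(dict(by_column), f"rows with no issues = {clean}", f"rows = {total}")
```

Keeping each rule as an independent test means that one row can be counted under several columns. For example, an IEP annotation to protein binding fails checks 2 and 3. This is why totals like these count errors rather than rows.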
Reference Genome
- Ref Genome Gene Targets / WNT Pathway (Kara: unfortunately, I might be late/absent from this call, but please discuss in my absence)
For the Ref Genome gene targets, we hatched a plan to shift away from the WNT pathway targets to targets that are heavier on lower euks, so that some of the higher-euk curators could help out with PAINT annotation. After thinking about this a bit, I wonder if it would make more sense to enlist the lower-euk curators for PAINT, so that the primary WNT annotation can continue. My concern about our original plan is that we will lose momentum on the WNTs and won't be able to finish that project. I do think it would be very beneficial to wrap up at least one of these focused areas for the renewal. I'm happy to stick to the original plan if others don't think going back to the WNTs will be a problem. From a curation standpoint, though, I think it's more efficient to knock out all the WNT genes in consecutive months, on the assumption that curation goes faster when everyone has WNT on the brain. ;)
Discussion
- We will focus on training curators to do PAINT inferencing. A small number of curators (Rama, Li, Dimitri) will start right away; depending on how that goes, more curators will be trained in January.
- We won't continue with WNT and won't have new targets in the coming months. Our focus will be on doing more propagation.
Other items
- Pascale: not everybody can commit PAINT annotation files to CVS. Send an email to go-admin@genome.stanford.edu if you want CVS commit access.
- Chris asked Paul Thomas for admin privileges so that he can moderate the PAINT-users mailing list. Chris should send Paul an email about this.
- The login system in the PAINT interface needs to be fixed: currently everybody logs in as 'gouser'. PaulT will look into fixing this.