Reference Genome August 2010 (Archived)
PAINT-generated GAF files are now available per species
From Suzi: The first cut of the script is written and committed to CVS, and the GAF files are now available under CVS at gene_associations/submission/gene_association.paint_XXX, where XXX is the organism name. Be aware that there are two new evidence codes (as discussed) for NOT annotations.
- Link to the CVS location of the species-specific files:
- Individual families are available from the same location as before:
- Curator/MOD folk please feel free to grab these and use them.
Issues with PAINT GAFs
Mike C: filter-gene-association.pl script changes. Mike, I had to make two changes to the filtering script. First, I added two more evidence codes: IRD (rapid divergence) and IMD (motif divergence). These are used in conjunction with NOT to indicate why the protein didn't inherit the function. Second, I also needed to add 'rapid_divergence' as a qualifier. This seems redundant, so I'd like Ed and the PAINT group to check the PAINT code to see whether it is really necessary; if not, it can be eliminated later.
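As a rough illustration of the rule described above (IRD and IMD are meant to accompany NOT), here is a minimal Python sketch of a per-line check. It is not the logic of filter-gene-association.pl; the function name and the sample line are hypothetical, and it assumes the usual tab-separated GAF layout with the qualifier in column 4 and the evidence code in column 7.

```python
# Hypothetical check, not the actual filter-gene-association.pl logic.
# Assumes tab-separated GAF lines: column 4 = Qualifier, column 7 = Evidence Code.
DIVERGENCE_CODES = {"IRD", "IMD"}  # the two new evidence codes mentioned above


def divergence_code_problems(gaf_line):
    """Return a list of problems found on one GAF annotation line."""
    cols = gaf_line.rstrip("\n").split("\t")
    if len(cols) < 7:
        return ["fewer than 7 columns"]
    # Qualifiers may be pipe-separated, e.g. "NOT|rapid_divergence".
    qualifiers = {q for q in cols[3].split("|") if q}
    evidence = cols[6]
    problems = []
    if evidence in DIVERGENCE_CODES and "NOT" not in qualifiers:
        problems.append(f"{evidence} used without NOT")
    return problems


# Made-up example line (only the first seven columns are filled in).
example = "\t".join(["DB", "ID0001", "sym1", "", "GO:0003674", "REF:1", "IMD"])
print(divergence_code_problems(example))
```

A check like this only flags lines for review; whether such lines should be rejected or merely reported is a decision for the filtering script's maintainers.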
Paulo: gene symbol error. One gene symbol contains a pipe character: "GPM1a|GPM1b". This causes an error because in the GAF file the pipe is reserved for separating multiple values within a column. Perhaps there is a way to escape this character when parsing, but someone else would need to answer this. In this case, though, I think the gene symbol should perhaps simply be GPM1, or something else.
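To make the problem concrete, the sketch below shows what happens when a parser splits a value on the pipe, and a simple way to flag symbols that contain one. The function name is hypothetical, and it assumes the symbol sits in column 3 of a tab-separated GAF line.

```python
# Hypothetical helper; assumes column 3 (index 2) holds the gene symbol.
def symbol_contains_pipe(gaf_line):
    """Return True if the symbol column of a GAF line contains a pipe."""
    cols = gaf_line.rstrip("\n").split("\t")
    return len(cols) > 2 and "|" in cols[2]


# Any code that splits on the pipe will see two values instead of one symbol.
print("GPM1a|GPM1b".split("|"))  # ['GPM1a', 'GPM1b']

# Made-up line with only the first three columns filled in.
line = "\t".join(["DB", "ID0001", "GPM1a|GPM1b"])
print(symbol_contains_pipe(line))  # True
```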
Ed & Paulo: PAINT error in generating GAF. In the protein family PTHR10202, three genes (from FB, MGI, and TAIR) had "colocalizes_with" appearing in the synonym column! Something funny was going on here, but I couldn't debug it because I couldn't connect to the PANTHER database (PAUL, what is the new location for this?). Compare the two versions under CVS (I manually fixed the second version). I don't think this is anything Mike L may have done, because it is too systematic. Can we dig into this next week?
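Until the cause is found, a small scan could catch this kind of leak in generated files. The following is only a sketch: the function name and the qualifier list are assumptions, and it takes the synonym column to be column 11 of a tab-separated GAF line with pipe-separated values.

```python
# Hypothetical scan; the qualifier list here is an assumption, not taken from PAINT.
KNOWN_QUALIFIERS = {"NOT", "contributes_to", "colocalizes_with"}


def qualifiers_in_synonyms(gaf_line):
    """Return qualifier strings that appear as entries in the synonym column."""
    cols = gaf_line.rstrip("\n").split("\t")
    if len(cols) < 11:
        return []
    # Column 11 (index 10): synonyms, pipe-separated.
    return [s for s in cols[10].split("|") if s in KNOWN_QUALIFIERS]


# Made-up line: ten placeholder columns followed by a synonym column.
line = "\t".join(["x"] * 10 + ["syn1|colocalizes_with"])
print(qualifiers_in_synonyms(line))  # ['colocalizes_with']
```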
Mike C: gzip needed? Another question is whether or not these generated files need to be gzipped before committing. I didn't do it this time, but can easily switch.
All: auto CVS commit. Right now, I've just manually committed the generated GAF files, but an automated commit step will need to be added soon.
All: filter-paint_association.pl. Still could use further testing and polishing. I'd like to have a report generated and mailed out (not yet done). The debug mode could also be tighter. I'm not sure how it will run as a cron job in the SGD environment, or whether we should run it here and just commit to CVS afterward (since that needs to be done in any case). Thoughts?
Communication on tree annotation
- Mike is now creating wiki pages from the evidence.txt files (see below) so that the discussion about annotations is public.