Reference Genome Database Requirements Discussion 2007 (Retired)

From GO Wiki

Revision as of 16:03, 11 July 2007

This is the place to discuss features and requirements for the Reference Genome Database being designed to replace the Google Spreadsheet system currently in use.

(here's one to get us started --chris):

Ensures consistent use of identifiers

Identifiers must unambiguously identify a single entry in a database.

Identifiers should conform to the following syntax:

 DBAuthority : LocalID

DBAuthority should be one of the abbreviations in the GO xrefs metadata list.

E.g.

 FB:FBgn0000001

Curators should not be expected to memorise identifiers, so a data entry system should allow them to enter symbols etc. and have these resolved to an ID, e.g. using some automatic lookup mechanism.
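
A minimal sketch of how the identifier rules above might be checked at data entry. The authority set is only an illustrative subset standing in for the GO xrefs metadata list, and the symbol table, the symbol `example-symbol`, and the function names are all hypothetical.

```python
# Illustrative subset only; the real list of authorities is the GO xrefs
# metadata list mentioned above.
KNOWN_AUTHORITIES = {"FB", "SGD", "MGI"}

# Hypothetical lookup table standing in for an automatic lookup mechanism.
# The symbol is a made-up placeholder, not a real gene symbol.
SYMBOL_TO_ID = {("FB", "example-symbol"): "FB:FBgn0000001"}


def parse_identifier(text):
    """Split 'DBAuthority:LocalID' into its two parts, or raise ValueError."""
    authority, sep, local_id = text.partition(":")
    authority, local_id = authority.strip(), local_id.strip()
    if not sep or not authority or not local_id:
        raise ValueError(f"not of the form DBAuthority:LocalID: {text!r}")
    if authority not in KNOWN_AUTHORITIES:
        raise ValueError(f"unknown database authority: {authority!r}")
    return authority, local_id


def resolve(entry, authority):
    """Turn what a curator typed (an ID or a symbol) into a full identifier."""
    if ":" in entry:
        db, local_id = parse_identifier(entry)
        return f"{db}:{local_id}"
    try:
        return SYMBOL_TO_ID[(authority, entry)]
    except KeyError:
        raise ValueError(
            f"no identifier known for symbol {entry!r} in {authority}"
        ) from None
```

With this sketch, a curator could type either the full ID or a symbol plus their database, and the system would store the same normalised `DBAuthority:LocalID` string in both cases.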

Should allow loading of MOD reports

The database should allow MODs to submit their metrics via a tab-delimited file that can be automatically downloaded from their FTP site. The file should contain columns for the Reference gene, the organism's ortholog(s), the date each gene was completed, and reference counts (total number of papers associated with a gene, etc.).
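
As a rough illustration, a loader for such a file could look like the sketch below. The column names, the `|` separator for multiple orthologs, the ISO date format, and the sample identifiers are all assumptions made for the example; none of them has been agreed. Fetching the file from the FTP site is left out.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ModReportRow:
    reference_gene: str
    orthologs: List[str]            # empty list: no ortholog reported
    date_completed: Optional[date]  # None: curation not yet completed
    paper_count: int


def load_mod_report(handle):
    """Read a tab-delimited MOD report with a header row into ModReportRow objects."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        completed = record["date_completed"].strip()
        rows.append(ModReportRow(
            reference_gene=record["reference_gene"].strip(),
            orthologs=[o for o in record["orthologs"].split("|") if o],
            date_completed=date.fromisoformat(completed) if completed else None,
            paper_count=int(record["paper_count"]),
        ))
    return rows


# Made-up sample data in the assumed layout.
SAMPLE = (
    "reference_gene\torthologs\tdate_completed\tpaper_count\n"
    "REF:gene1\tMOD:orth1|MOD:orth2\t2007-07-11\t12\n"
    "REF:gene2\t\t\t0\n"
)

if __name__ == "__main__":
    for row in load_mod_report(io.StringIO(SAMPLE)):
        print(row)
```

Keeping the loader separate from the download step would let the same code handle a file submitted by hand, which matters if we decide against record-at-a-time editing.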

In the future we may want to add the capability to detect genes that have been completed but have since had new references associated with them.

References should be compiled in a central location, so that once a paper is curated it is flagged in some way as having been done.

We need to decide whether we will allow individual users to modify the database a record at a time or whether the database should only be populated with files from each MOD.

Should track that no ortholog was found

There should be a mechanism for indicating that a curator has looked and no ortholog could be located as of a certain date. For genomes that are not yet completely sequenced, we will want to revisit these when a new genome build is released. It would be nice to have a free text note field associated as well so we can leave notes regarding the analysis that was performed. -Doug
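
One possible shape for such a record, as a sketch only. The field names are hypothetical, and storing the genome build that was searched is an assumption added so that records can be revisited when a new build comes out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class NoOrthologRecord:
    reference_gene: str   # identifier of the Reference gene
    species: str          # species in which no ortholog was found
    checked_on: date      # date the curator looked
    genome_build: str     # build that was searched (assumed field)
    note: str = ""        # free text about the analysis performed


def needs_revisit(record, current_build):
    """True when the current genome build differs from the one that was searched."""
    return record.genome_build != current_build
```

A report over these records for incompletely sequenced genomes would then just be the subset where `needs_revisit` is true for the latest build.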

Should provide reports to focus curation effort

The interface should provide reports that will help focus curator effort. One example might be a facility to search the data for species-specific orthologs where curation is not 'comprehensive'; these are the genes we should be working on. Another example might be a report of genes for which no ortholog has been determined yet. It would be nice to be able to change the sort order of the results in such reports by the following parameters: date the Human Gene was added to the Ref. Genome set, ortholog ID, OMIM ID, most papers associated, biggest difference between papers associated with a gene and papers read for GO for that gene, and maybe others. It would also be good to have a report generated by a query on OMIM ID that shows which species have 'comprehensive' annotation done for their ortholog(s). -Doug
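
The three reports described here could be sketched roughly as follows. The record layout, the function names, and the sort-key names are hypothetical; the sort options follow the parameters listed above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class OrthologStatus:
    ortholog_id: Optional[str]  # None: no ortholog determined yet
    omim_id: str
    species: str
    date_added: date            # date the Human Gene joined the Ref. Genome set
    papers_associated: int
    papers_read: int            # papers read for GO for this gene
    comprehensive: bool


# Sort options from the list above; counts are negated so larger values come first.
SORT_KEYS = {
    "date_added": lambda r: r.date_added,
    "ortholog_id": lambda r: r.ortholog_id or "",
    "omim_id": lambda r: r.omim_id,
    "most_papers": lambda r: -r.papers_associated,
    "paper_backlog": lambda r: -(r.papers_associated - r.papers_read),
}


def not_comprehensive(rows, species, sort_by="paper_backlog"):
    """Orthologs in one species whose curation is not yet 'comprehensive'."""
    selected = [r for r in rows
                if r.species == species and r.ortholog_id and not r.comprehensive]
    return sorted(selected, key=SORT_KEYS[sort_by])


def no_ortholog_yet(rows, sort_by="date_added"):
    """Entries for which no ortholog has been determined yet."""
    return sorted((r for r in rows if r.ortholog_id is None), key=SORT_KEYS[sort_by])


def comprehensive_species(rows, omim_id):
    """Species with 'comprehensive' annotation for orthologs of one OMIM ID."""
    return sorted({r.species for r in rows
                   if r.omim_id == omim_id and r.comprehensive})
```

Adding another sort option would only mean adding one more entry to `SORT_KEYS`, which fits the "maybe others" in the list.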

Should record that orthology is 'comprehensive' as of a certain date

Curators should be able to mark that curation of an ortholog is 'comprehensive' as of a certain date. It would be good to be able to generate a report to look for cases where the 'comprehensive' curation date is getting old. These may need to be reviewed and updated. -Doug
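
A sketch of the "getting old" report, assuming the 'comprehensive' dates are available as a mapping from ortholog ID to date. The 365-day threshold and the function name are placeholders; what counts as old would need to be agreed.

```python
from datetime import date, timedelta


def stale_comprehensive(marks, today, max_age_days=365):
    """Return (ortholog_id, marked_on) pairs older than max_age_days, oldest first.

    marks maps an ortholog ID to the date its curation was marked
    'comprehensive'. The default threshold is an arbitrary assumption.
    """
    cutoff = today - timedelta(days=max_age_days)
    stale = [(oid, marked_on) for oid, marked_on in marks.items()
             if marked_on < cutoff]
    return sorted(stale, key=lambda pair: pair[1])
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps the report reproducible for a given date.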