GO Metadata tags proposal
These tags are proposed to help synchronize the formats used by the bibliography, GO.xrf_abbs, GO.references and the soon-to-be-created GO tools file. It may be preferable to store the tools data in some kind of database, for example an SQLite database.
Global tags
- comment
- is_obsolete
- replaced_by
- consider
Bibliography / GO.references
I've tried to base this on the consensus fields in the citation microformat document at http://microformats.org/wiki/citation-formats#Implied_Schema and on the information shared by the references and bibliography files.
Current format
GO.references
- go_ref_id: [mandatory; cardinality 1; GO_REF:nnnnnnn]
- alt_id: [not mandatory; cardinality 0,1,>1; GO_REF:nnnnnnn]
- title: [mandatory; cardinality 1; free text]
- author: [mandatory; cardinality 1; free text (or should this be cardinality 1,>1, with one entry per author?)]
- year: [mandatory; cardinality 1]
- external_accession: [not mandatory; cardinality 0,1,>1; DB:id]
- citation: [not mandatory; cardinality 0,1; use for published refs]
- abstract: [mandatory; cardinality 1; free text]
- comment: [not mandatory; cardinality 0,1; free text]
- is_obsolete: [not mandatory; cardinality 0,1; 'true' denotes a reference no longer used by the contributing database; if the tag is not present, the value 'false' is assumed]
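As an illustration of how the cardinality rules above could be enforced, here is a minimal Python sketch. It assumes a stanza has already been parsed into a dict mapping each tag to a list of values. The names RULES, validate_reference and is_obsolete are hypothetical, comment is treated as optional with at most one value, and author follows the current single-entry rule.

```python
import re

# Hypothetical encoding of the current GO.references rules:
# tag -> (minimum count, maximum count, or None for unbounded)
RULES = {
    "go_ref_id": (1, 1),
    "alt_id": (0, None),
    "title": (1, 1),
    "author": (1, 1),
    "year": (1, 1),
    "external_accession": (0, None),
    "citation": (0, 1),
    "abstract": (1, 1),
    "comment": (0, 1),
    "is_obsolete": (0, 1),
}

# GO_REF:nnnnnnn (seven digits)
GO_REF_PATTERN = re.compile(r"^GO_REF:\d{7}$")


def validate_reference(stanza):
    """Return a list of problems found in a stanza (dict of tag -> list of values)."""
    problems = []
    for tag, (low, high) in RULES.items():
        count = len(stanza.get(tag, []))
        if count < low:
            problems.append(f"missing mandatory tag: {tag}")
        if high is not None and count > high:
            problems.append(f"too many values for {tag}: {count}")
    # Both ID tags must use the GO_REF:nnnnnnn form
    for tag in ("go_ref_id", "alt_id"):
        for value in stanza.get(tag, []):
            if not GO_REF_PATTERN.match(value):
                problems.append(f"badly formed {tag}: {value}")
    # If present, is_obsolete may only carry the value 'true'
    for value in stanza.get("is_obsolete", []):
        if value != "true":
            problems.append(f"is_obsolete must be 'true' if present: {value}")
    return problems


def is_obsolete(stanza):
    """An absent is_obsolete tag is read as 'false'."""
    return stanza.get("is_obsolete", ["false"])[0] == "true"
```

Unknown tags are ignored here; a stricter checker could report them as well.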
Bibliography
Tab-delimited text. Column headers are:
- year
- author
- title
- publication
- issue
- link
- extra
- pmid
- doi
- category
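A minimal sketch of reading the current tab-delimited file with Python's standard csv module. It assumes that the first row holds the column headers listed above and that fields are not quoted (hence csv.QUOTE_NONE, so quotation marks in titles are kept literally). read_bibliography is a hypothetical name.

```python
import csv
import io

# Column headers of the current tab-delimited bibliography, in order.
BIBLIO_COLUMNS = ["year", "author", "title", "publication", "issue",
                  "link", "extra", "pmid", "doi", "category"]


def read_bibliography(text):
    """Hypothetical reader: parse tab-delimited bibliography text into
    a list of dicts keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    # Reading fieldnames consumes the header row
    missing = [c for c in BIBLIO_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [dict(row) for row in reader]
```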
Proposal
- id (GO.references only at the moment; should the biblio have it too?)
- alt_id (GO.references only)
- author
- title
- year (should this be split into day/month/year, or left as just the year?)
- journal | volume | issue | page
- abstract (GO.references only)
- url (plus URL type, e.g. full text, abstract, pdf)
- xref
- category (biblio only)
- comment
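One possible migration from the current bibliography columns to the proposed tags is sketched below. The mapping is an assumption, not something agreed here: publication is taken to be the journal name, link becomes url, extra becomes comment, and pmid/doi become xref values with PMID:/DOI: prefixes. Volume and page are not separate columns in the current file, so they are not produced. biblio_row_to_tags, to_stanza and the mapping tables are hypothetical.

```python
# Hypothetical mapping from current bibliography columns to proposed tags.
COLUMN_TO_TAG = {
    "author": "author",
    "title": "title",
    "year": "year",
    "publication": "journal",  # assumption: the column holds the journal name
    "issue": "issue",
    "link": "url",
    "category": "category",
    "extra": "comment",        # assumption: free-text notes
}
# Identifier columns become xrefs with a database prefix.
XREF_PREFIXES = {"pmid": "PMID", "doi": "DOI"}
# Output order follows the proposal list.
PROPOSED_ORDER = ["author", "title", "year", "journal", "issue",
                  "url", "xref", "category", "comment"]


def biblio_row_to_tags(row):
    """Convert one bibliography row (column -> value) into (tag, value) pairs."""
    values = {tag: [] for tag in PROPOSED_ORDER}
    for column, raw in row.items():
        if not isinstance(raw, str) or not raw.strip():
            continue  # skip empty or missing values
        value = raw.strip()
        if column in COLUMN_TO_TAG:
            values[COLUMN_TO_TAG[column]].append(value)
        elif column in XREF_PREFIXES:
            values["xref"].append(f"{XREF_PREFIXES[column]}:{value}")
    return [(tag, v) for tag in PROPOSED_ORDER for v in values[tag]]


def to_stanza(pairs):
    """Render (tag, value) pairs as one 'tag: value' line each."""
    return "\n".join(f"{tag}: {value}" for tag, value in pairs)
```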
Sample stanzas
GO.references
id: GO_REF:0000027
alt_id: GO_REF:0000032
year: 2007
author: Midori Harris, PAMGO_GAT curators
title: BLAST search criteria for ISS assignment in PAMGO_GAT
abstract: This GO reference describes the... [snip the rest of the abstract]
xref: PMID:192381742
xref: DDB_REF:10158
comment: This is made up.
GO.biblio
year: 2005
author: Beckett P, Bancroft I, Trick M
title: Computational tools for Brassica-Arabidopsis comparative genomics.
journal: Comp Funct Genomics
issue: 6
page: 147-152
url: http://www3.interscience.wiley.com/cgi-bin/abstract/110473755/ABSTRACT [abs]
comment: Brassica genomic survey sequence
xref: PMID:024812674
xref: DOI:001.32395.1041483
category: 6
category: 11
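The sample stanzas use one "tag: value" pair per line, and tags such as xref and category may repeat. A minimal parser for that layout might look like this. It assumes stanzas are separated by blank lines and splits each line at the first colon, so values such as PMID:192381742 or URLs survive intact. parse_stanzas is a hypothetical helper, not an existing GO tool.

```python
from collections import defaultdict


def parse_stanzas(text):
    """Hypothetical parser: split tag-value text into stanzas separated by
    blank lines and return each stanza as a dict of tag -> list of values,
    so repeated tags keep every value in order."""
    stanzas = []
    current = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current stanza, if any
            if current:
                stanzas.append(dict(current))
                current = defaultdict(list)
            continue
        # Split at the first colon only; the value may contain more colons
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a tag-value line: {line!r}")
        current[tag.strip()].append(value.strip())
    if current:
        stanzas.append(dict(current))
    return stanzas
```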
Comments: (Midori 2008-07-24) At present, the biblio data format doesn't seem broken, but if there are behind-the-scenes reasons to change it, I have no objections. I would appreciate at least a week's notice before it changes. Also, there's that web interface for additions from an age or two ago; would someone code up a successor? The outside world doesn't need IDs in the biblio data, so add them only if they'll be useful for GO internally. I don't think we need finer-grained dates than the year (and it would be inconvenient to fill in day and month for the biblio).
(Amelia 2008-07-24) I've written a new, more generic system that I hope to use for the biblio, GO.references, GO.xrf_abbs and the tools [will post URL when I've sorted out somewhere to host it]. I would like to incorporate a submission system for the biblio that would let people enter some information about a paper (probably an xref), pull the rest of the details from PubMed and have it all ready to submit to the biblio. Unfortunately, I don't think I will have time to do that, as AmiGO is beckoning...
Database info (xrf_abbs)
Current format
- abbreviation
- shorthand_name (for the database)
- database
- object (the object in the database to be represented in the ID)
- synonym (of the abbreviation)
- example_id (abbreviation + local_id)
- generic_url
- url_syntax
- url_example
- is_obsolete
- consider
- replaced_by
Comments are written as follows:
! This is a comment
Proposal
- prefix (instead of abbreviation)
- synonym
- name (i.e. database name)
- description
- url (i.e. database URL)
- object
- example_id
- local_id_syntax (a regular expression describing the syntax of the local ID)
- url_syntax
- url_example
I'm not all that keen on 'synonym'; it would be better to say explicitly what it is a synonym of. Maybe something like alt_prefix instead?
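If the current GO.xrf_abbs entries were converted mechanically, the renamings implied by the proposal (abbreviation to prefix, database to name, generic_url to url) could be applied as in the sketch below, which also skips '!' comment lines. It assumes one "tag: value" pair per line. Tags with no renaming (including shorthand_name, which has no counterpart in the proposal, and the global tags) pass through unchanged, and the new description and local_id_syntax tags would still have to be written by hand. The synonym_tag parameter lets synonym be emitted as alt_prefix if that name is adopted. RENAMED_TAGS and migrate_xrf_record are hypothetical names.

```python
# Hypothetical renaming table implied by the proposal; other tags keep their names.
RENAMED_TAGS = {
    "abbreviation": "prefix",
    "database": "name",
    "generic_url": "url",
}


def migrate_xrf_record(text, synonym_tag="synonym"):
    """Convert one current GO.xrf_abbs record into (tag, value) pairs using
    the proposed tag names. Blank lines and '!' comments are skipped."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a tag-value line: {line!r}")
        tag = tag.strip()
        if tag == "synonym":
            tag = synonym_tag  # e.g. 'alt_prefix', if that name is adopted
        else:
            tag = RENAMED_TAGS.get(tag, tag)
        pairs.append((tag, value.strip()))
    return pairs
```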
Sample stanza
prefix: FB
synonym: FlyBase
name: FlyBase
description: The database concerned with fruit flies.
url: http://flybase.org/
object: Gene identifier
example_id: FB:FBgn0000024
local_id_syntax: ^FBgn\d{7}$
url_syntax: http://flybase.org/reports/[local_id].html
url_example: http://flybase.org/reports/FBgn0000024.html
comment: NCBI use 'FLYBASE' instead of FB
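To show how local_id_syntax and url_syntax could work together, this sketch checks a prefixed ID against the regular expression and substitutes the local part for the [local_id] placeholder. It assumes the stanza has been parsed into a dict of single string values; expand_url is a hypothetical name.

```python
import re


def expand_url(record, identifier):
    """Hypothetical helper: turn a prefixed ID such as 'FB:FBgn0000024' into
    a URL using the record's local_id_syntax and url_syntax tags."""
    # Split at the first colon: prefix before it, local ID after it
    prefix, sep, local_id = identifier.partition(":")
    if not sep or prefix != record["prefix"]:
        raise ValueError(f"{identifier!r} does not use prefix {record['prefix']!r}")
    # The whole local ID must match the declared syntax
    if not re.fullmatch(record["local_id_syntax"], local_id):
        raise ValueError(f"{local_id!r} does not match {record['local_id_syntax']!r}")
    return record["url_syntax"].replace("[local_id]", local_id)
```

Applied to the sample stanza's example_id, this reproduces its url_example, so running it over every entry would make a simple consistency check.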
Comments: (Midori 2008-07-24) Looks fine; 'prefix' and 'alt_prefix' sound OK.
Tools
Current format
Hard-coded HTML pages
Proposed syntax
- name
- common_name (with type? e.g. abbr / acronym / nickname)
- url
- email (contact for the tool)
- description
- submission_date
- developer:
  - name
  - url
  - any extra address elements, such as locality (city, etc.), region, country-name
- publication (use the xref only)
There are more tools-specific tags, which I'll omit here; see the tools spec for details.
Sample stanza
[Tool]
name: MadeUpTool
submission_date: 20080217
url: http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
email: gohelp@geneontology.org
is_online_tool: true
is_standalone_tool: true
compatible_os: win
compatible_os: mac
developer: ORG:000008
publication: PMID:14962934
feature: ont_view
feature: annot_view
license: free_academic
description: [description omitted here]

[Org]
id: ORG:000008
name: Max Planck Institute for Molecular Genetics
url: http://www.molgen.mpg.de
locality: Berlin
country-name: Germany
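In the sample, the tool refers to its developer by ID instead of repeating the name, URL and address elements inline. The sketch below parses the bracketed stanza headers and resolves those references. It assumes one "tag: value" pair per line and that each [Org] stanza has an id; parse_typed_stanzas and resolve_developers are hypothetical helpers.

```python
from collections import defaultdict


def parse_typed_stanzas(text):
    """Hypothetical parser for stanzas headed by [Tool], [Org], etc.
    Returns a list of (stanza_type, {tag: [values]}) in file order."""
    stanzas = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            # A header line starts a new stanza of that type
            stanzas.append((line[1:-1], defaultdict(list)))
            continue
        if not stanzas:
            raise ValueError(f"tag before any stanza header: {line!r}")
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a tag-value line: {line!r}")
        stanzas[-1][1][tag.strip()].append(value.strip())
    return [(kind, dict(tags)) for kind, tags in stanzas]


def resolve_developers(stanzas):
    """Map each tool name to the [Org] stanzas named in its developer tags."""
    orgs = {tags["id"][0]: tags for kind, tags in stanzas
            if kind == "Org" and "id" in tags}
    return {tags["name"][0]: [orgs[d] for d in tags.get("developer", []) if d in orgs]
            for kind, tags in stanzas if kind == "Tool" and "name" in tags}
```

Keeping organisations in their own stanzas means one institute can be referenced by many tools without duplicating its address details.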