GO Metadata tags proposal

From GO Wiki

These tags have been proposed to help synchronize the format used by the bibliography, GO.xrf_abbs, GO.references and the soon-to-be-created GO tools file. It may be preferable to store the latter in some kind of database, for example, an SQLite database.

Global tags

  • comment
  • is_obsolete
  • replaced_by
  • consider

Bibliography / GO.references

I've tried to base this on the consensus information in the citation microformat document at http://microformats.org/wiki/citation-formats#Implied_Schema and the shared info in the references and bibliography files.

Current format

GO.references

  • go_ref_id: [mandatory; cardinality 1; GO_REF:nnnnnnn]
  • alt_id: [not mandatory; cardinality 0,1,>1; GO_REF:nnnnnnn]
  • title: [mandatory; cardinality 1; free text]
  • author: [mandatory; cardinality 1; free text?? or cardinality 1,>1 and one entry per author?]
  • year: [mandatory; cardinality 1]
  • external_accession: [not mandatory; cardinality 0,1,>1; DB:id]
  • citation: [not mandatory; cardinality 0,1; use for published refs]
  • abstract: [mandatory; cardinality 1; free text]
  • comment: [not mandatory; cardinality 0,1; free text]
  • is_obsolete: [not mandatory; cardinality 0,1; 'true'; if the tag is not present, the value 'false' is assumed; denotes a reference no longer used by the contributing database]
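
The mandatory/cardinality constraints listed above can be checked mechanically. The following is a minimal sketch, not part of any existing GO tooling: the rule table is transcribed from the list, the function name check_cardinality is made up for illustration, and two open points are resolved by assumption (author is treated as a single free-text value, and comment as optional with at most one value).

```python
from collections import Counter

# (mandatory, multiple_allowed) per tag, transcribed from the list above.
# Assumptions: author is single-valued; comment is optional, at most one.
GO_REF_RULES = {
    "go_ref_id": (True, False),
    "alt_id": (False, True),
    "title": (True, False),
    "author": (True, False),
    "year": (True, False),
    "external_accession": (False, True),
    "citation": (False, False),
    "abstract": (True, False),
    "comment": (False, False),
    "is_obsolete": (False, False),
}


def check_cardinality(pairs):
    """Return a list of problems for one record given as (tag, value) pairs."""
    counts = Counter(tag for tag, _ in pairs)
    problems = []
    for tag, (mandatory, multiple) in GO_REF_RULES.items():
        n = counts.get(tag, 0)
        if mandatory and n == 0:
            problems.append(f"missing mandatory tag: {tag}")
        if not multiple and n > 1:
            problems.append(f"tag used {n} times, at most 1 allowed: {tag}")
    # Tags that are not in the rule table are reported rather than ignored.
    for tag in counts:
        if tag not in GO_REF_RULES:
            problems.append(f"unknown tag: {tag}")
    return problems
```

An empty list means the record satisfies the constraints as transcribed here; if author ends up with one entry per author, its rule would change to (True, True).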

Bibliography

Tab-delimited text. Column headers are:

  • year
  • author
  • title
  • publication
  • issue
  • link
  • extra
  • pmid
  • doi
  • category

Proposal

  • id (GO.references only at the moment; should the biblio have it too?)
  • alt_id (GO.references only)
  • author
  • title
  • year --> should this be split into day/month/year or just left as year?
  • journal | volume | issue | page
  • abstract (GO.references only)
  • url (plus URL type, e.g. full text, abstract, pdf)
  • xref
  • category (biblio only)
  • comment
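
To give an idea of what moving the bibliography to these tags might involve, here is a rough sketch that converts the current tab-delimited format into tag: value lines. The column-to-tag mapping is an assumption made for illustration (publication becomes journal, link becomes url, extra becomes comment, pmid and doi become xrefs with PMID and DOI prefixes), as is the idea that several categories share one cell separated by commas. The current columns have no counterpart for volume or page, so those tags are not produced.

```python
import csv
import io

# Assumed renames from current bibliography columns to proposed tags.
RENAMES = [
    ("year", "year"),
    ("author", "author"),
    ("title", "title"),
    ("publication", "journal"),
    ("issue", "issue"),
    ("link", "url"),
    ("extra", "comment"),
]
# Assumed database prefixes for the two identifier columns.
XREF_COLUMNS = [("pmid", "PMID"), ("doi", "DOI")]


def biblio_row_to_stanza(row):
    """Turn one bibliography row (a dict keyed by column header) into tag: value lines."""
    lines = []
    for column, tag in RENAMES:
        value = (row.get(column) or "").strip()
        if value:
            lines.append(f"{tag}: {value}")
    for column, prefix in XREF_COLUMNS:
        value = (row.get(column) or "").strip()
        if value:
            lines.append(f"xref: {prefix}:{value}")
    # Assumption: several categories share one cell, separated by commas.
    for category in (row.get("category") or "").split(","):
        if category.strip():
            lines.append(f"category: {category.strip()}")
    return lines


def convert_biblio(tsv_text):
    """Convert tab-delimited bibliography text (with a header row) into stanzas."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return "\n\n".join("\n".join(biblio_row_to_stanza(row)) for row in reader)
```

Empty cells are skipped, so optional tags simply do not appear in the output stanza.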

Sample stanzas

GO.references

id: GO_REF:0000027
alt_id: GO_REF:0000032
year: 2007
author: Midori Harris, PAMGO_GAT curators
title: BLAST search criteria for ISS assignment in PAMGO_GAT
abstract: This GO reference describes the... [snip the rest of the abstract]
xref: PMID:192381742
xref: DDB_REF:10158
comment: This is made up.
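
A stanza like the one above is straightforward to read programmatically. The sketch below is only an illustration under some assumptions that the proposal does not state: stanzas are separated by blank lines, lines starting with '!' are comments (as in the current xrf_abbs file), and a tag may repeat, so every tag maps to a list of values. The function name parse_stanzas is made up.

```python
def parse_stanzas(text):
    """Split text into stanzas; each stanza is a dict mapping tag -> list of values."""
    stanzas = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Assumption: a blank line ends the current stanza.
            if current:
                stanzas.append(current)
                current = {}
            continue
        if line.startswith("!"):
            # Assumption: '!' introduces a comment line.
            continue
        # Split at the first colon only, so values such as PMID:123 stay intact.
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'tag: value' line: {line!r}")
        current.setdefault(tag.strip(), []).append(value.strip())
    if current:
        stanzas.append(current)
    return stanzas
```

Keeping every value in a list means that repeated tags such as xref need no special handling, and cardinality can be checked separately.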

GO.biblio

year: 2005
author: Beckett P, Bancroft I, Trick M
title: Computational tools for Brassica-Arabidopsis comparative genomics.
journal: Comp Funct Genomics
issue: 6
page: 147-152
url: http://www3.interscience.wiley.com/cgi-bin/abstract/110473755/ABSTRACT [abs]
comment: Brassica genomic survey sequence
xref: PMID:024812674
xref: DOI:001.32395.1041483
category: 6
category: 11


Comments: (Midori 2008-07-24) At present, the biblio data format doesn't seem broken, but if there are behind-the-scenes reasons to change it, no objections. I would appreciate at least a week's notice before it changes; also, there's that web interface from an age or two ago for additions -- would someone code up a successor? The outside world doesn't need IDs in the biblio data, so add them only if they'll be useful for GO internally. I don't think we need finer-grained dates than just the year (and it would be inconvenient to fill in day & month for the biblio).

(Amelia 2008-07-24) I've written a new, more generic system that I hope to use for the biblio, GO.references, GO.xrf_abbs and the tools [will post URL when I've sorted out somewhere to host it]. I would like to incorporate a submission system for the biblio that would allow people to enter some information about the paper (probably an xref); it would then pull the info from PubMed and have it all ready to submit to the biblio. Unfortunately, I don't think I will have time to do that, as AmiGO is beckoning...

Database info (xrf_abbs)

Current format

  • abbreviation
  • shorthand_name (for the database)
  • database
  • object (the object in the database to be represented in the ID)
  • synonym (of the abbreviation)
  • example_id (abbreviation + local_id)
  • generic_url
  • url_syntax
  • url_example
  • is_obsolete
  • consider
  • replaced_by

Comments are written as follows:

! This is a comment

Proposal

  • prefix (instead of abbreviation)
  • synonym
  • name (i.e. database name)
  • description
  • url (i.e. database URL)
  • object
  • example_id
  • local_id_syntax (a regexp with the ID syntax in it)
  • url_syntax
  • url_example

I'm not really all that keen on 'synonym'; it would be better if the tag made clear what it is a synonym of. Maybe something like alt_prefix instead?

Sample stanza

prefix: FB
synonym: FlyBase
name: FlyBase
description: The database concerned with fruit flies.
url: http://flybase.org/
object: Gene identifier
example_id: FB:FBgn0000024
local_id_syntax: ^FBgn\d{7}$
url_syntax: http://flybase.org/reports/[local_id].html
url_example: http://flybase.org/reports/FBgn0000024.html
comment: NCBI use 'FLYBASE' instead of FB
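
One benefit of having local_id_syntax and url_syntax in the same stanza is that an entry can be checked for internal consistency. The sketch below assumes the entry has already been read into a plain dict of single values and that [local_id] is the only placeholder in url_syntax; the function name check_xref_entry is hypothetical.

```python
import re


def check_xref_entry(entry):
    """Return a list of inconsistencies found in one database stanza (a dict)."""
    problems = []
    # example_id is expected to be prefix + ':' + local_id.
    db, sep, local_id = entry["example_id"].partition(":")
    if not sep or db != entry["prefix"]:
        problems.append("example_id does not start with the prefix")
    if not re.search(entry["local_id_syntax"], local_id):
        problems.append("local part of example_id does not match local_id_syntax")
    # Assumption: [local_id] is the only placeholder used in url_syntax.
    expanded = entry["url_syntax"].replace("[local_id]", local_id)
    if expanded != entry["url_example"]:
        problems.append("url_example differs from url_syntax expanded with example_id")
    return problems


flybase = {
    "prefix": "FB",
    "example_id": "FB:FBgn0000024",
    "local_id_syntax": r"^FBgn\d{7}$",
    "url_syntax": "http://flybase.org/reports/[local_id].html",
    "url_example": "http://flybase.org/reports/FBgn0000024.html",
}
print(check_xref_entry(flybase))
```

For the FlyBase stanza above, all three checks pass, so the printed list is empty.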

Comments: (Midori 2008-07-24) Looks fine; 'prefix' and 'alt_prefix' sound OK.

Tools

Current format

Hard-coded HTML pages

Proposed syntax

  • name
  • common_name (with type? e.g. abbr / acronym / nickname)
  • url
  • email (contact for the tool)
  • description
  • submission_date
  • developer:
    • name
    • url
    • any extra address elements, such as locality (city, etc.), region, country-name
  • publication - use the xref only

There are more tools-specific tags, which I'll omit here -- see the tools spec for more details.

Sample stanza

[Tool]
name: MadeUpTool
submission_date: 20080217
url: http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
email: gohelp@geneontology.org
is_online_tool: true
is_standalone_tool: true
compatible_os: win
compatible_os: mac
developer: ORG:000008
publication: PMID:14962934
feature: ont_view
feature: annot_view
license: free_academic
description: [description omitted here]
[Org]
id: ORG:000008
name: Max Planck Institute for Molecular Genetics
url: http://www.molgen.mpg.de
locality: Berlin
country-name: Germany
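
The sample above links a [Tool] stanza to an [Org] stanza through the developer tag. A rough sketch of how a reader of this format might resolve that link follows; it assumes that a [Type] line on its own starts a new stanza and that developer values refer to the id of an [Org] stanza. Both function names are made up for illustration.

```python
def parse_typed_stanzas(text):
    """Return a list of (type, {tag: [values]}) for stanzas introduced by [Type] lines."""
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Assumption: a line consisting only of [Type] starts a new stanza.
        if line.startswith("[") and line.endswith("]"):
            stanzas.append((line[1:-1], {}))
            continue
        if not stanzas:
            raise ValueError(f"tag line before any stanza header: {line!r}")
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'tag: value' line: {line!r}")
        stanzas[-1][1].setdefault(tag.strip(), []).append(value.strip())
    return stanzas


def resolve_developers(stanzas):
    """Map each tool name to the names of the organisations its developer tags point to."""
    orgs = {tags["id"][0]: tags for kind, tags in stanzas if kind == "Org" and "id" in tags}
    resolved = {}
    for kind, tags in stanzas:
        if kind == "Tool":
            resolved[tags["name"][0]] = [
                orgs[ref]["name"][0] for ref in tags.get("developer", []) if ref in orgs
            ]
    return resolved
```

Storing organisations once and referring to them by ID avoids repeating address details for every tool from the same developer; developer references with no matching [Org] stanza are silently skipped in this sketch and could be reported instead.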