File Description: go-stats

From GO Wiki
  IN PROGRESS 

File usage

Primary stat file computed.

Input data

Annotation stats are obtained by querying GOlr[1]. ***IS THIS THE RIGHT LINK???***
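
As a rough illustration of how such counts could be requested, the sketch below builds a Solr facet query of the kind a GOlr instance accepts. The base URL is a placeholder, and the field names document_category and aspect are assumptions about the GOlr schema that this page does not confirm. The request is only constructed, not sent.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): replace with the actual GOlr URL.
GOLR_BASE = "https://golr.example.org/solr/select"

def build_facet_query(facet_field, base=GOLR_BASE):
    """Return a Solr URL that counts annotation documents per value of facet_field."""
    params = [
        ("q", "*:*"),
        ("fq", 'document_category:"annotation"'),  # assumed GOlr field name
        ("rows", "0"),              # only facet counts are needed, not the documents
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.limit", "-1"),      # do not cap the number of facet values
        ("wt", "json"),
    ]
    return base + "?" + urlencode(params)

# Example: a query whose facet counts would give annotations per aspect.
url = build_facet_query("aspect")
```

Setting rows to 0 keeps the response small, since the statistics only need the facet counts.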

File format(s)

JSON

File description

release_date

  • release_date: Obtained from release/metadata/release-date.json (or snapshot/metadata/release-date.json).

ontology

  • valid_terms: total number of valid terms (non-obsolete) in the ontology.
  • obsolete_terms: total number of terms with "obsolete" status (i.e., term_ids for which the is_obsolete field is true in the go.obo file). This excludes merges.
  • merged_terms: total number of merged terms (calculated by counting the term_ids for which the is_obsolete field is true in the go.obo file and that are also listed as alt_ids of a valid term).
  • biological_process_terms: total number of valid terms for the biological_process aspect.
  • molecular_function_terms: total number of valid terms for the molecular_function aspect.
  • cellular_component_terms: total number of valid terms for the cellular_component aspect.
  • meta_statements: total number of meta statements, counting the identifier, alternative identifiers, namespace, term label, comments, synonyms, definitions and subsets of each valid term.
  • cross_references: total number of cross-references, corresponding to the "xref" field in the go.obo file.
  • terms_relations: total number of relations, counted from the is_a, intersection_of and relationship fields of the go.obo file.
  • changes_created_terms: number of created terms since the previous release.
  • changes_obsolete_terms: number of terms obsoleted since the previous release.
  • changes_merged_terms: number of terms merged since the previous release.
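
The definitions of valid_terms, obsolete_terms and merged_terms above can be sketched as follows. This is a minimal illustration that follows the wording of this page, not the script that actually produces the file. The tiny OBO sample, its identifiers and the helper names parse_obo_terms and ontology_counts are all invented for the example.

```python
# Invented sample in OBO format; the IDs and names are not real GO terms.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:9000001
name: example process A
namespace: biological_process
alt_id: GO:9000004

[Term]
id: GO:9000002
name: example function B
namespace: molecular_function

[Term]
id: GO:9000003
name: obsolete example C
namespace: cellular_component
is_obsolete: true

[Term]
id: GO:9000004
name: obsolete example D
namespace: biological_process
is_obsolete: true

[Typedef]
id: part_of
"""

def parse_obo_terms(text):
    """Return one dict per [Term] stanza, mapping each tag to a list of values."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.setdefault(tag, []).append(value)
    return terms

def ontology_counts(terms):
    """Count valid, obsolete and merged terms as described in the list above."""
    def is_obsolete(term):
        return term.get("is_obsolete") == ["true"]

    valid = [t for t in terms if not is_obsolete(t)]
    alt_ids = {a for t in valid for a in t.get("alt_id", [])}
    obsolete_ids = {t["id"][0] for t in terms if is_obsolete(t)}
    merged = obsolete_ids & alt_ids  # obsolete and also an alt_id of a valid term
    counts = {
        "valid_terms": len(valid),
        "obsolete_terms": len(obsolete_ids - merged),  # merges are excluded
        "merged_terms": len(merged),
    }
    for aspect in ("biological_process", "molecular_function", "cellular_component"):
        counts[aspect + "_terms"] = sum(1 for t in valid if t.get("namespace") == [aspect])
    return counts

counts = ontology_counts(parse_obo_terms(SAMPLE_OBO))
```

In this sample, GO:9000004 is obsolete and is also an alt_id of a valid term, so it is counted as merged rather than obsolete.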

annotations

  • total: The total number of annotations.
  • by_aspect
    • P: all annotations in the database for biological_process.
    • F: all annotations in the database for molecular_function.
    • C: all annotations in the database for cellular_component.
  • by_bioentity_type
    • all (same as bioentities > total > by_type > all)
    • cluster (same as bioentities > total > by_type > by_bioentity_type_cluster)
    • by_taxon: Number of annotations for each of the annotated species in the database.
  • by_evidence
    • all
    • cluster
  • by_model_organism: for each species, the number of annotations is reported as follows:
    • by_evidence: number of annotations for each individual evidence code, detailed by aspect.
    • by_evidence_cluster: number of annotations for each evidence cluster, detailed by aspect.
  • by_group: Number of annotation by each group, obtained using the 'assigned_by' field.
  • taxa
    • total: number of species with annotations.
    • filtered: number of species with > 1,000 annotations.
  • bioentities
    • total: total number of annotated bioentities
    • by_type
      • all (see list of bioentity types)
      • cluster
    • by_filtered_taxon
      • all: number of annotations for each species, by bioentity_type and by aspect.
      • cluster: number of annotations for each species, by bioentity_type_cluster and by aspect.
  • references
    • all
      • total: total number of distinct annotated references (includes PMIDs, GO_REFs, DOIs, and internal IDs for Model Organism Databases and Reactome). Note that a paper with both a PMID and an internal reference ID is counted twice.
      • by_filtered_taxon
      • by_group
    • pmid: same as above, filtered for pmids.
      • total: total number of distinct pmids.
      • by_filtered_taxon
      • by_group
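
A short sketch of how a consumer might read these fields is given below. The numbers are invented, the nesting of the sample is an assumption based on the list above, and the helper names are hypothetical. It also assumes that every annotation belongs to exactly one aspect, so that the by_aspect counts add up to total. The second helper applies the "more than 1,000 annotations" criterion described under taxa.

```python
import json

# Invented numbers; key names follow the list above, the nesting is an assumption.
SAMPLE_JSON = """
{
  "annotations": {
    "total": 3500,
    "by_aspect": {"P": 2000, "F": 1000, "C": 500}
  }
}
"""

# Hypothetical per-taxon annotation counts (labels are placeholders).
SAMPLE_BY_TAXON = {"taxon_a": 2400, "taxon_b": 1000, "taxon_c": 100}

def aspect_counts_match_total(stats):
    """Check that the P, F and C counts sum to the overall annotation total."""
    annotations = stats["annotations"]
    return sum(annotations["by_aspect"].values()) == annotations["total"]

def filtered_taxa(by_taxon, threshold=1000):
    """Return the taxa with strictly more than `threshold` annotations."""
    return sorted(taxon for taxon, n in by_taxon.items() if n > threshold)

stats = json.loads(SAMPLE_JSON)
consistent = aspect_counts_match_total(stats)
kept = filtered_taxa(SAMPLE_BY_TAXON)
```

The comparison is strict, so a taxon with exactly 1,000 annotations is not kept.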

Review Status

Last reviewed: October 17, 2018