File Description: go-stats-summary

From GO Wiki
Revision as of 05:16, 18 October 2019 by Pascale

File usage

This file provides a summary of the statistics for both the ontology and the annotations. The data presented on the GO website is obtained from this file.

Input data

Ontology stats are obtained from the go.obo file[1]; data on changes since the previous release is extracted from the go-ontology-changes.json file. Annotation stats are extracted from the go-stats.json file.

File format(s)

JSON
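As a minimal sketch of working with the file, the Python below flattens the nested JSON into dotted paths (for example ontology.valid_terms), which makes it easy to list every statistic or to line up two releases side by side. It assumes the top-level keys are named after the sections below (ontology, annotations, and so on); the excerpt and all of its values are invented for illustration, so check key names against an actual release.

```python
import json
from typing import Any


def flatten(node: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested JSON objects into {"section.field": value} pairs.

    Only objects (dicts) are expanded; lists and scalars become leaf values.
    """
    if not isinstance(node, dict):
        return {prefix: node}
    flat: dict[str, Any] = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(value, path))
    return flat


# Hypothetical excerpt: field names come from this page, but the nesting
# and all values are invented for illustration.
sample = json.loads("""
{
  "ontology": {"valid_terms": 10, "obsolete_terms": 2, "merged_terms": 1},
  "annotations": {"total": 100, "total_no_pb": 90}
}
""")

flat = flatten(sample)
for path, value in flat.items():
    print(f"{path}\t{value}")
```

For a real release, the excerpt can be replaced with json.load() on a downloaded copy of the file.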

File description

release date

Obtained from release/metadata/release-date.json (or snapshot/metadata/release-date.json).

ontology

  • valid_terms: Total number of valid terms (non-obsolete) in the ontology.
  • obsolete_terms: Total number of terms with "obsolete" status (i.e., term IDs for which the is_obsolete field is true in the go.obo file); merged terms are excluded.
  • merged_terms: Total number of merged terms, calculated by counting the term IDs for which the is_obsolete field is true in the go.obo file and that also appear as alt_ids of a valid term.
  • biological_process_terms: Total number of valid terms for the biological_process aspect.
  • molecular_function_terms: Total number of valid terms for the molecular_function aspect.
  • cellular_component_terms: Total number of valid terms for the cellular_component aspect.
  • meta_statements: Total number of metadata statements (identifiers, alternative identifiers, namespaces, term labels, comments, synonyms, definitions and subsets), summed over all valid terms.
  • cross_references: Total number of cross_references, from the xref field of the go.obo file.
  • terms_relations: Total number of relations, counted from the is_a, intersection_of and relationship fields of the go.obo file.
  • changes_created_terms: Number of terms created since the previous release.
  • changes_obsolete_terms: Number of terms obsoleted since the previous release.
  • changes_merged_terms: Number of terms merged since the previous release.
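The counting rules above can be sketched in Python. This is a minimal, hypothetical implementation, not the script that generates go-stats-summary: it reads only [Term] stanzas, follows the obsolete/merged definitions above literally, and counts xrefs, relations and meta statements over valid terms only, which is an assumption. As a sanity check, valid_terms should equal the sum of the three aspect counts, since every valid term belongs to exactly one aspect.

```python
from collections import Counter

# OBO tags behind each statistic. Which tags count as meta statements is an
# interpretation of the meta_statements description above.
META_TAGS = {"id", "alt_id", "namespace", "name", "comment", "synonym", "def", "subset"}
RELATION_TAGS = {"is_a", "intersection_of", "relationship"}


def parse_terms(obo_text: str) -> list[dict[str, list[str]]]:
    """Return one {tag: [values]} dict per [Term] stanza; other stanzas are skipped."""
    terms: list[dict[str, list[str]]] = []
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            current.setdefault(tag.strip(), []).append(value.strip())
    return terms


def ontology_stats(obo_text: str) -> dict[str, int]:
    terms = parse_terms(obo_text)
    obsolete = [t for t in terms if t.get("is_obsolete") == ["true"]]
    valid = [t for t in terms if t.get("is_obsolete") != ["true"]]
    # Obsolete IDs that are also listed as alt_ids of a valid term count as merged.
    valid_alt_ids = {a for t in valid for a in t.get("alt_id", [])}
    merged = [t for t in obsolete if t["id"][0] in valid_alt_ids]
    aspects = Counter(t["namespace"][0] for t in valid)

    def count_tags(tags: set[str]) -> int:
        # Number of tag lines across valid terms (valid-only is an assumption).
        return sum(len(values) for t in valid for tag, values in t.items() if tag in tags)

    return {
        "valid_terms": len(valid),
        "obsolete_terms": len(obsolete) - len(merged),  # merges excluded
        "merged_terms": len(merged),
        "biological_process_terms": aspects["biological_process"],
        "molecular_function_terms": aspects["molecular_function"],
        "cellular_component_terms": aspects["cellular_component"],
        "meta_statements": count_tags(META_TAGS),
        "cross_references": count_tags({"xref"}),
        "terms_relations": count_tags(RELATION_TAGS),
    }


# Invented go.obo-style fragment; EX: IDs are placeholders, not real GO terms.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0001
name: example process
namespace: biological_process
alt_id: EX:0003
def: "An invented term." [GOC:example]
xref: EX:xref1
is_a: EX:0002 ! parent process

[Term]
id: EX:0002
name: parent process
namespace: biological_process

[Term]
id: EX:0003
name: merged example
namespace: biological_process
is_obsolete: true

[Term]
id: EX:0004
name: obsolete function
namespace: molecular_function
is_obsolete: true

[Typedef]
id: part_of
name: part of
"""

stats = ontology_stats(SAMPLE_OBO)
aspect_sum = sum(stats[k] for k in ("biological_process_terms",
                                    "molecular_function_terms",
                                    "cellular_component_terms"))
assert stats["valid_terms"] == aspect_sum  # each valid term has one aspect
print(stats)
```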

annotations

  • total: total number of annotations.
  • total_no_pb: total number of annotations, excluding direct annotations to GO:0005515 protein binding.
  • by_aspect: annotations grouped by aspect
    • B
  • by_bioentity_type_cluster: annotations by clusters of bioentity types, as defined below:
    • protein
    • gene_product
    • gene
    • RNA (RNA cluster)
    • protein_complex
    • pseudogene
  • by_bioentity_type_cluster_no_pb: annotations by clusters of bioentity types, excluding direct annotations to GO:0005515 protein binding.
  • by_evidence_cluster: annotations grouped by clusters of evidence.
  • by_evidence_cluster_no_pb: annotations grouped by clusters of evidence, excluding direct annotations to GO:0005515 protein binding.
  • by_model_organism: annotations for human and each of the 10 model organisms.

For each species, the number of annotations is shown as by_evidence_cluster: the number of annotations for each evidence code cluster (PHYLO, IEA, OTHER, EXP, ND, HTP), detailed by aspect (A, P, F, C, B).

  • taxa
    • total: number of species with annotations
    • filtered: number of species with > 1,000 annotations
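A minimal sketch of the annotation counts in Python, assuming annotations have already been parsed into (taxon, GO ID, evidence code) records. The evidence-code-to-cluster mapping is hypothetical (this page lists the cluster names but not their members), and the aspect breakdown is omitted. "Direct" annotations to protein binding are taken to be those whose GO ID is exactly GO:0005515.

```python
from collections import Counter
from typing import NamedTuple

PROTEIN_BINDING = "GO:0005515"

# Hypothetical evidence-code-to-cluster mapping; codes not listed fall into
# OTHER. The grouping actually used for go-stats-summary may differ.
EVIDENCE_CLUSTERS = {
    **dict.fromkeys(["EXP", "IDA", "IPI", "IMP", "IGI", "IEP"], "EXP"),
    **dict.fromkeys(["HTP", "HDA", "HMP", "HGI", "HEP"], "HTP"),
    **dict.fromkeys(["IBA", "IBD", "IKR", "IRD"], "PHYLO"),
    "IEA": "IEA",
    "ND": "ND",
}


class Annotation(NamedTuple):
    taxon: str
    go_id: str
    evidence: str


def by_cluster(annotations: list[Annotation]) -> Counter:
    return Counter(EVIDENCE_CLUSTERS.get(a.evidence, "OTHER") for a in annotations)


def annotation_stats(annotations: list[Annotation], min_per_taxon: int = 1000) -> dict:
    # Direct annotations to protein binding: GO ID exactly GO:0005515.
    no_pb = [a for a in annotations if a.go_id != PROTEIN_BINDING]
    per_taxon = Counter(a.taxon for a in annotations)
    return {
        "total": len(annotations),
        "total_no_pb": len(no_pb),
        "by_evidence_cluster": by_cluster(annotations),
        "by_evidence_cluster_no_pb": by_cluster(no_pb),
        "taxa": {
            "total": len(per_taxon),
            # "filtered" keeps species with more than min_per_taxon annotations.
            "filtered": sum(1 for n in per_taxon.values() if n > min_per_taxon),
        },
    }


# Invented records; min_per_taxon is lowered so the toy data shows the filter.
annotations = [
    Annotation("NCBITaxon:9606", "GO:0005515", "IPI"),
    Annotation("NCBITaxon:9606", "GO:0005634", "IDA"),
    Annotation("NCBITaxon:9606", "GO:0005634", "IEA"),
    Annotation("NCBITaxon:10090", "GO:0005634", "ISO"),
]
stats = annotation_stats(annotations, min_per_taxon=2)
print(stats)
```

Per-species figures such as by_model_organism would apply the same counting after grouping the records by taxon.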

bioentities

  • total: total number of annotated bioentities
  • total_no_pb: total number of annotated bioentities, excluding direct annotations to GO:0005515 protein binding.
  • by_type_cluster: number of annotated bioentities for each cluster of bioentities:
    • protein
    • gene_product
    • gene
    • RNA cluster
    • protein_complex
    • pseudogene
  • by_type_cluster_no_pb: number of annotated bioentities for different clusters of bioentities as described above, excluding direct annotations to GO:0005515 protein binding.
  • by_model_organism: number of annotated bioentities for human and each of the 10 model organisms. For each species, the number of annotated bioentities is grouped by bioentity cluster and detailed by aspect (A, P, F, C).
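Bioentities are counted once each, however many annotations they carry, so a bioentity annotated only to GO:0005515 drops out of the _no_pb counts. A minimal sketch in Python, assuming (bioentity ID, bioentity type, GO ID) triples and a hypothetical mapping of RNA subtypes onto the RNA cluster; the type names and IDs in the example are illustrative only.

```python
from collections import Counter

PROTEIN_BINDING = "GO:0005515"

# Hypothetical type-to-cluster mapping: RNA subtypes collapse into the RNA
# cluster; every other type (protein, gene, pseudogene, ...) maps to itself.
RNA_TYPES = {"RNA", "mRNA", "ncRNA", "tRNA", "rRNA", "snRNA", "snoRNA", "miRNA", "lnc_RNA"}


def type_cluster(bioentity_type: str) -> str:
    return "RNA" if bioentity_type in RNA_TYPES else bioentity_type


def bioentity_stats(annotations: list[tuple[str, str, str]]) -> dict:
    """Count bioentities from (bioentity_id, bioentity_type, go_id) triples."""
    # Sets, so each bioentity is counted once however many annotations it has.
    entities = {(bid, btype) for bid, btype, _ in annotations}
    # Keep only bioentities with at least one annotation to a term other than
    # GO:0005515, i.e. drop those annotated solely to protein binding.
    entities_no_pb = {(bid, btype) for bid, btype, go in annotations if go != PROTEIN_BINDING}
    return {
        "total": len(entities),
        "total_no_pb": len(entities_no_pb),
        "by_type_cluster": Counter(type_cluster(t) for _, t in entities),
        "by_type_cluster_no_pb": Counter(type_cluster(t) for _, t in entities_no_pb),
    }


# Invented bioentity IDs.
annotations = [
    ("EX:prot1", "protein", "GO:0005515"),
    ("EX:prot1", "protein", "GO:0005634"),
    ("EX:prot2", "protein", "GO:0005515"),
    ("EX:rna1", "miRNA", "GO:0005634"),
]
stats = bioentity_stats(annotations)
print(stats)
```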

references

all:

  • total: total number of distinct annotated references (includes PMIDs, GO_REFs, DOIs, and internal IDs for Model Organism Databases and Reactome). Note that a paper with both a PMID and an internal reference ID is counted twice.
  • total_no_pb: total number of annotated references (includes PMIDs, GO_REFs, DOIs, and internal IDs for Model Organism Databases and Reactome), excluding papers only directly annotated to "GO:0005515".
  • by_model_organism: total number of annotated references (includes PMIDs, GO_REFs, DOIs, internal IDs for Model Organism Databases and Reactome), for human and each of the 10 model organisms.

pmid:

  • total: total number of annotated PMIDs
  • total_no_pb: total number of annotated PMIDs, excluding papers only directly annotated to "GO:0005515".
  • by_model_organism: total number of annotated PMIDs, for human and each of the 10 model organisms.
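A minimal sketch of the reference counts in Python, assuming each annotation carries a GAF-style reference field in which several references are separated by "|". Splitting that field is one way a paper with both a PMID and an internal ID ends up counted twice, as noted under all: above. PMIDs are identified here by the "PMID:" prefix; all IDs in the example are invented.

```python
PROTEIN_BINDING = "GO:0005515"


def reference_ids(ref_field: str) -> list[str]:
    # "PMID:1|MGI:MGI:10" yields two IDs, so the same paper is counted twice.
    return [r for r in ref_field.split("|") if r]


def reference_stats(annotations: list[tuple[str, str]]) -> dict:
    """Count distinct references from (reference_field, go_id) pairs."""
    refs: set[str] = set()
    refs_no_pb: set[str] = set()
    for ref_field, go_id in annotations:
        ids = reference_ids(ref_field)
        refs.update(ids)
        if go_id != PROTEIN_BINDING:
            # A reference survives the filter if it supports at least one
            # annotation to a term other than GO:0005515.
            refs_no_pb.update(ids)

    def pmids(ids: set[str]) -> set[str]:
        return {r for r in ids if r.startswith("PMID:")}

    return {
        "all": {"total": len(refs), "total_no_pb": len(refs_no_pb)},
        "pmid": {"total": len(pmids(refs)), "total_no_pb": len(pmids(refs_no_pb))},
    }


# Invented references.
annotations = [
    ("PMID:1|MGI:MGI:10", "GO:0005634"),
    ("PMID:2", "GO:0005515"),
    ("GO_REF:0000002", "GO:0003677"),
]
stats = reference_stats(annotations)
print(stats)
```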

Review Status

Last reviewed: October 17, 2018