Difference between revisions of "File Description: go-stats"

From GO Wiki
m (release_date)
m (Review Status)
 
(38 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
+
=Usage=
  IN PROGRESS
+
Primary stats file computed.
 
 
=File usage=
 
Primary stat file computed.
 
  
 
=Input data =
 
=Input data =
Annotation stats are obtained by querying GOlr[http://golr-aux.geneontology.io/].  ***IS THIS THE RIGHT LINK???***
+
Annotation stats are obtained by querying the GOlr (GO Solr instance).
  
=File format(s)=
+
=Format(s)=
 
json
 
json
  
 
=File description=
 
=File description=
 +
The <code>go-stats</code> file contains the following information:
 
==release_date==
 
==release_date==
*'''release_date:''' Obtained from <code>release/metadata/release-date.json</code> (or <code>snapshot/metadata/release-date.json</code>).
+
*'''release_date:''' Obtained from <code>release/metadata/release-date.json</code> or <code>snapshot/metadata/release-date.json</code>.
  
 
==ontology==
 
==ontology==
* '''valid_terms:''' total number of valid terms (non-obsolete) in the ontology.
+
* '''valid_terms:''' Total number of valid terms (non-obsolete) in the ontology.
* '''obsolete_terms:''' total number of terms with "obsolete" status (ie, term_ids for which the is_obsolete field is true in the go.obo file) (this excludes merges).
+
* '''obsolete_terms:''' Total number of terms with <code>obsolete</code> status (ie, <code>term_ids</code> for which the <code>is_obsolete</code> field is true in the <code>go.obo</code> file) (this excludes merges).
* '''merged_terms:''' total number of merged terms (calculated by counting the term_ids for which the field is_obsolete is true in the go.obo file, and that also are are as alt_ids of a valid term).
+
* '''merged_terms:''' Total number of merged terms (calculated by counting the <code>term_ids</code> for which the field <code>is_obsolete</code> is true in the <code>go.obo</code> file, and that also are are as <code>alt_ids</code> of a valid term).
* '''biological_process_terms:''' total number of valid terms for the biological_process aspect.  
+
* '''biological_process_terms:''' Total number of valid terms for the biological_process aspect.  
* '''molecular_function_terms:''' total number of valid terms for the molecular_function aspect.
+
* '''molecular_function_terms:''' Total number of valid terms for the molecular_function aspect.
* '''cellular_component_terms:''' total number of valid terms for the cellular_component aspect.
+
* '''cellular_component_terms:''' Total number of valid terms for the cellular_component aspect.
* '''meta_statements:''' total number of identifiers, alternative identifiers, namespace, term label, comments, synonyms, definitions, subsets, for each valid term.
+
* '''meta_statements:''' Total number of identifiers, alternative identifiers, namespace, term label, comments, synonyms, definitions, subsets, for each valid term.
* '''cross_references:''' corresponds to the "xref" field in the go.obo file.  
+
* '''cross_references:''' Total number of cross_references, from the <code>xref</code> field of the <code>go.obo</code> file.  
* '''terms_relations:''' number of relations; the count of all relations, using the fields is_a, intersection_of and relationship of the go.obo file.  
+
* '''terms_relations:''' Total number of relations; the count of all relations, using the fields <code>is_a</code>, <code>intersection_of</code> and <code>relationship</code> of the <code>go.obo</code> file.
* '''changes_created_terms:''' number of created terms since the previous release.  
+
* '''changes_created_terms:''' Number of created terms since the previous release.
* '''changes_obsolete_terms:''' number of terms obsoleted since the previous release.  
+
* '''changes_valid_terms:''' Number of valid terms since the previous release.  
* '''changes_merged_terms:''' number of created merged since the previous release.
+
* '''changes_obsolete_terms:''' Number of terms obsoleted since the previous release.  
 +
* '''changes_merged_terms:''' Number of created merged since the previous release.
 +
* '''changes_biological_process_terms:''' Changes in the number of BP terms.
 +
* '''changes_molecular_function_terms":''' Changes in the number of MF terms.
 +
* '''changes_cellular_component_terms":'''Changes in the number of CC terms.
  
 
==annotations==
 
==annotations==
 
* '''total:''' The total number of annotations.
 
* '''total:''' The total number of annotations.
* '''by_aspect'''
+
* '''[[GO_stats-glossary#aspect |by_aspect]]''': P, F, C.
** P: all annotations in the database for biological_process.
+
* '''[[GO_stats-glossary#bioentity_type |by_bioentity_type]]:'''
** F:all annotations in the database for molecular_function.
+
** '''[[GO_stats-glossary#bioentity_type| all]]''': Number of annotations for each bioentity type.
** C:all annotations in the database for cellular_component.
+
** '''[[GO_stats-glossary#bioentity_type_cluster| cluster]]''': Number of annotations for each [[GO_stats-glossary#bioentity_type_cluster|bioentity type cluster]].
* '''by_bioentity_type'''
+
* '''by_qualifier:''' contributes_to, colocalizes_with, NOT
** all (same as bioentities > total > by_type>all)
+
* '''by_taxon''': Number of annotations for each of the annotated species in the database.
** cluster (same as bioentities > total > by_type>by_bioentity_type_cluster)
 
** by_taxon: Number of annotations for each of the annotated species in the database.
 
 
* '''by_evidence'''
 
* '''by_evidence'''
** '''all'''
+
** '''[http://geneontology.org/docs/guide-go-evidence-codes/ all]'''
** '''cluster'''
+
** '''[[GO_stats-glossary#evidence_cluster|by_evidence_cluster]]'''
* '''by_model_organism:''' For each species, the number of annotations are shown:  
+
* '''[[GO_stats-glossary#model_organism|by_model_organism]]:''' For each species, the number of annotations are shown:  
* '''by evidence:''' number of annotations for each individual evidence code, detailed by aspect  
+
** '''[http://geneontology.org/docs/guide-go-evidence-codes/ by evidence]:''' number of annotations for each individual evidence code, detailed by [[GO_stats-glossary#aspect |aspect]].
* '''by_evidence_cluster:''' number of annotations for each evidence cluster, detailed by aspect.
+
** '''[[GO_stats-glossary#evidence_cluster|by_evidence_cluster]]''': Number of annotations for each [[GO_stats-glossary#evidence_cluster|evidence cluster]] (PHYLO, IEA, OTHER, EXP, ND, HTP), detailed by [[GO_stats-glossary#aspect |aspect]].
* '''by_group:''' Number of annotation by each group, obtained using the 'assigned_by' field.
+
** '''by_qualifier:''' contributes_to, colocalizes_with, NOT
* '''taxa'''
+
* '''by_group:''' Number of annotation for each contributing group, obtained using the <code>assigned_by</code> field of each input file.
** total: number of species with annotations.
+
 
** filtered: number of species with > 1,000 annotations.
+
==taxa==
* '''bioentities'''
+
* '''total:''' Number of species with annotations.  
** total: total number of annotated bioentities
+
* '''filtered:''' Number of species with at least 1,000 annotations.
** by_type
+
 
*** all (see list of bioentity types)
+
==bioentities==
*** cluster
+
*'''total: ''' Total number of annotated bioentities.
** by_filtered_taxon
+
* '''by_type'''
*** all: number of annotations for each species, by bioentity_type and by aspect.
+
** '''[[GO_stats-glossary#bioentity_type |all]]:''' Number of annotated bioentities by [[GO_stats-glossary#bioentity_type_cluster |bioentity type]].
*** cluster: number of annotations for each species, by bioentity_type_cluster and by aspect.
+
** '''[[GO_stats-glossary#bioentity_type_cluster |by_type_cluster]]:''' Number of annotated bioentities grouped by [[GO_stats-glossary#bioentity_type_cluster |clusters]].
* '''references'''
+
* '''[[GO_stats-glossary#filtered_taxa |by_filtered_taxon]]:
** all
+
** '''[[GO_stats-glossary#bioentity_type |all]]''': number of annotations for each species, by [[GO_stats-glossary#bioentity_type |bioentity type]], detailed by aspect.
*** total: total number of distinct annotated references (includes PMIDs, GO_REFs, DOIs, internal IDs for Model Organism Databases and Reactome (note that for papers with both a PMID and an internal reference ID, the paper is counted twice).
+
** '''[[GO_stats-glossary#bioentity_type_cluster |by_type_cluster]]''': number of annotations for each species, by [[GO_stats-glossary#bioentity_type_cluster |bioentity_type_cluster]], detailed by aspect.
** by_filtered_taxon
+
 
** by group
+
==references==
** pmid: same as above, filtered for pmids.  
+
*'''all'''
*** total: total number of distinct pmids.
+
**'''total:''' Total number of distinct annotated references (includes PMIDs, GO_REFs, DOIs, internal IDs for Model Organism Databases and Reactome (note that for papers with both a PMID and an internal reference ID, the paper is counted twice).
*** by_filtered_taxon
+
**'''[[GO_stats-glossary#filtered_taxa |by_filtered_taxon]]:''' Total number of annotated references by species.
*** by_group
+
**'''by_group:''' Total number of annotated references for each contributing group, obtained using the <code>assigned_by</code> field.
 +
*'''pmids'''
 +
**'''total:''' Total number of annotated PMIDs.  
 +
**'''[[GO_stats-glossary#filtered_taxa |by_filtered_taxon]]:''' Total number of annotated PMIDs by species.
 +
**'''by_group:''' Total number of annotated PMIDs for each contributing group, obtained using the <code>assigned_by</code> field.
 +
 
 +
=Direct access to files=
 +
==snapshot==
 +
http://snapshot.geneontology.org/release_stats/go-stats.json
  
 +
==current==
 +
http://current.geneontology.org/release_stats/go-stats.json
 
= Review Status =
 
= Review Status =
  
Last reviewed: October 17, 2018
+
Last reviewed: March 5, 2020
  
  
 
[[Category:Release Pipeline]]
 
[[Category:Release Pipeline]]

Latest revision as of 16:56, 5 March 2020

Usage

Primary stats file computed.

Input data

Annotation stats are obtained by querying the GOlr (GO Solr instance).
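As a minimal sketch of what such a query could look like, the snippet below builds a standard Solr facet request and converts the facet response into a dictionary of counts. The `/solr/select` path and the field names `document_category` and `aspect` are assumptions for illustration and are not confirmed by this page; only the general Solr facet parameters are standard.

```python
import json
from urllib.parse import urlencode

# Assumption: the endpoint path below is hypothetical; adjust to the real GOlr instance.
GOLR_SELECT = "http://golr-aux.geneontology.io/solr/select"


def facet_query_url(facet_field, base=GOLR_SELECT):
    """Build a Solr URL that returns only facet counts (rows=0) for one field."""
    params = {
        "q": "*:*",
        # Assumption: annotations are stored with this document_category value.
        "fq": "document_category:annotation",
        "rows": 0,
        "wt": "json",
        "facet": "true",
        "facet.field": facet_field,
        "facet.limit": -1,  # return every facet value, not just the top ones
    }
    return base + "?" + urlencode(params)


def facet_counts(response_text, facet_field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][facet_field]
    return dict(zip(flat[0::2], flat[1::2]))


# A made-up response in the default Solr JSON facet layout.
SAMPLE_RESPONSE = json.dumps(
    {"facet_counts": {"facet_fields": {"aspect": ["P", 5, "F", 3, "C", 2]}}}
)
```

Calling `facet_counts(SAMPLE_RESPONSE, "aspect")` on the made-up response gives one count per aspect, which is the shape of a `by_aspect` entry.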

Format(s)

JSON

File description

The go-stats file contains the following information:

release_date

  • release_date: Obtained from release/metadata/release-date.json or snapshot/metadata/release-date.json.

ontology

  • valid_terms: Total number of valid terms (non-obsolete) in the ontology.
  • obsolete_terms: Total number of terms with obsolete status (i.e., term_ids for which the is_obsolete field is true in the go.obo file), excluding merged terms.
  • merged_terms: Total number of merged terms (calculated by counting the term_ids for which the is_obsolete field is true in the go.obo file and that also appear as alt_ids of a valid term).
  • biological_process_terms: Total number of valid terms for the biological_process aspect.
  • molecular_function_terms: Total number of valid terms for the molecular_function aspect.
  • cellular_component_terms: Total number of valid terms for the cellular_component aspect.
  • meta_statements: Total number of identifiers, alternative identifiers, namespaces, term labels, comments, synonyms, definitions and subsets, summed over all valid terms.
  • cross_references: Total number of cross-references, counted from the xref field of the go.obo file.
  • terms_relations: Total number of relations, counted from the is_a, intersection_of and relationship fields of the go.obo file.
  • changes_created_terms: Number of created terms since the previous release.
  • changes_valid_terms: Change in the number of valid terms since the previous release.
  • changes_obsolete_terms: Number of terms obsoleted since the previous release.
  • changes_merged_terms: Number of terms merged since the previous release.
  • changes_biological_process_terms: Changes in the number of BP terms.
  • changes_molecular_function_terms: Changes in the number of MF terms.
  • changes_cellular_component_terms: Changes in the number of CC terms.
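The term counts above can be sketched as follows, assuming the definitions given in this list are applied literally to the [Term] stanzas of an OBO file. The parser is deliberately minimal, the function names are hypothetical, and the term IDs in the sample are made up; it is not the pipeline's actual implementation.

```python
def parse_terms(obo_text):
    """Collect id, namespace, is_obsolete and alt_ids for each [Term] stanza."""
    terms = []
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are recorded.
            current = None
            if line == "[Term]":
                current = {"id": None, "namespace": None,
                           "is_obsolete": False, "alt_ids": []}
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            value = value.split(" ! ")[0].strip()  # drop trailing OBO comments
            if tag == "id":
                current["id"] = value
            elif tag == "namespace":
                current["namespace"] = value
            elif tag == "alt_id":
                current["alt_ids"].append(value)
            elif tag == "is_obsolete":
                current["is_obsolete"] = value == "true"
    return terms


def ontology_counts(terms):
    """Apply the valid / obsolete / merged definitions from the list above."""
    valid = [t for t in terms if not t["is_obsolete"]]
    alt_ids = {a for t in valid for a in t["alt_ids"]}
    flagged = [t for t in terms if t["is_obsolete"]]
    merged = [t for t in flagged if t["id"] in alt_ids]
    counts = {
        "valid_terms": len(valid),
        "obsolete_terms": len(flagged) - len(merged),  # excludes merges
        "merged_terms": len(merged),
    }
    for ns in ("biological_process", "molecular_function", "cellular_component"):
        counts[ns + "_terms"] = sum(1 for t in valid if t["namespace"] == ns)
    return counts


# Made-up stanzas for illustration only.
SAMPLE_OBO = """\
[Term]
id: GO:9000001
namespace: biological_process
alt_id: GO:9000009

[Term]
id: GO:9000002
namespace: molecular_function

[Term]
id: GO:9000008
namespace: molecular_function
is_obsolete: true

[Term]
id: GO:9000009
namespace: biological_process
is_obsolete: true

[Typedef]
id: part_of
"""
```

In the sample, GO:9000009 is flagged obsolete and is also an alt_id of a valid term, so it counts as merged rather than obsolete.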

annotations

  • total: The total number of annotations.
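  • by_aspect: Number of annotations for each aspect (P for biological_process, F for molecular_function, C for cellular_component).
  • by_bioentity_type:
    • all: Number of annotations for each bioentity type.
    • cluster: Number of annotations for each bioentity type cluster.
  • by_qualifier: Number of annotations for each qualifier (contributes_to, colocalizes_with, NOT).
  • by_taxon: Number of annotations for each of the annotated species in the database.
  • by_evidence
    • all
    • by_evidence_cluster
  • by_model_organism: For each species, the number of annotations is shown:
    • by_evidence: Number of annotations for each individual evidence code, detailed by aspect.
    • by_evidence_cluster: Number of annotations for each evidence cluster (PHYLO, IEA, OTHER, EXP, ND, HTP), detailed by aspect.
    • by_qualifier: contributes_to, colocalizes_with, NOT
  • by_group: Number of annotations for each contributing group, obtained using the assigned_by field of each input file.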

taxa

  • total: Number of species with annotations.
  • filtered: Number of species with at least 1,000 annotations.
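A minimal sketch of the two taxa counts, assuming a per-species annotation count like the one in by_taxon is already available. The function name, the threshold parameter and the sample counts are hypothetical.

```python
def taxa_counts(annotations_by_taxon, threshold=1000):
    """Count species with any annotations and species reaching the threshold."""
    values = annotations_by_taxon.values()
    return {
        "total": sum(1 for n in values if n > 0),
        # "at least 1,000" is read as an inclusive bound.
        "filtered": sum(1 for n in values if n >= threshold),
    }


# Made-up counts keyed by taxon identifier.
SAMPLE_BY_TAXON = {
    "NCBITaxon:9606": 250000,
    "NCBITaxon:10090": 1000,
    "NCBITaxon:7227": 999,
}
```

With these made-up counts, a species with exactly 1,000 annotations is included in the filtered set and one with 999 is not.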

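bioentities

  • total: Total number of annotated bioentities.
  • by_type
    • all: Number of annotated bioentities by bioentity type.
    • by_type_cluster: Number of annotated bioentities grouped by bioentity type cluster.
  • by_filtered_taxon:
    • all: Number of annotations for each species, by bioentity type, detailed by aspect.
    • by_type_cluster: Number of annotations for each species, by bioentity type cluster, detailed by aspect.
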
references

  • all
    • total: Total number of distinct annotated references (includes PMIDs, GO_REFs, DOIs, and internal IDs for Model Organism Databases and Reactome). Note that a paper with both a PMID and an internal reference ID is counted twice.
    • by_filtered_taxon: Total number of annotated references by species.
    • by_group: Total number of annotated references for each contributing group, obtained using the assigned_by field.
  • pmids
    • total: Total number of annotated PMIDs.
    • by_filtered_taxon: Total number of annotated PMIDs by species.
    • by_group: Total number of annotated PMIDs for each contributing group, obtained using the assigned_by field.
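The reference counts above can be illustrated with a small sketch that counts distinct reference IDs overall, per group, and restricted to PMIDs. The input shape (pairs of reference ID and assigned_by value), the function name and the sample IDs are assumptions for illustration.

```python
from collections import defaultdict


def reference_counts(rows):
    """Count distinct references, overall and per group, plus distinct PMIDs.

    rows: iterable of (reference_id, assigned_by) pairs, one per annotation.
    """
    all_refs = set()
    pmids = set()
    refs_by_group = defaultdict(set)
    pmids_by_group = defaultdict(set)
    for ref, group in rows:
        all_refs.add(ref)
        refs_by_group[group].add(ref)
        if ref.startswith("PMID:"):
            pmids.add(ref)
            pmids_by_group[group].add(ref)
    return {
        "all": {
            "total": len(all_refs),
            "by_group": {g: len(r) for g, r in refs_by_group.items()},
        },
        "pmids": {
            "total": len(pmids),
            "by_group": {g: len(r) for g, r in pmids_by_group.items()},
        },
    }


# Made-up rows: the first two stand for the same paper cited under a PMID
# and under an internal database ID, so it contributes two distinct references.
SAMPLE_ROWS = [
    ("PMID:111", "SGD"),
    ("SGD_REF:S900001", "SGD"),
    ("PMID:111", "UniProt"),
    ("GO_REF:9000002", "UniProt"),
]
```

Because identifiers are compared as plain strings, the same paper referenced by two different IDs is counted twice, matching the note in the total entry above.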

Direct access to files

snapshot

http://snapshot.geneontology.org/release_stats/go-stats.json

current

http://current.geneontology.org/release_stats/go-stats.json
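A minimal sketch for reading the file from the URL above with the Python standard library. The helper names are hypothetical, and the exact nesting of keys in the live file may differ from the description on this page, so the lookup helper returns a default instead of raising when a key is missing.

```python
import json
from urllib.request import urlopen

GO_STATS_URL = "http://current.geneontology.org/release_stats/go-stats.json"


def fetch_stats(url=GO_STATS_URL):
    """Download and decode the go-stats JSON document (requires network access)."""
    with urlopen(url) as resp:
        return json.load(resp)


def lookup(stats, *path, default=None):
    """Walk nested dictionaries along path; return default if any key is absent."""
    node = stats
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Made-up fragment with the layout described on this page.
SAMPLE_STATS = {"annotations": {"total": 10, "by_aspect": {"P": 5, "F": 3, "C": 2}}}
```

For example, `lookup(fetch_stats(), "annotations", "total")` would read the total annotation count if the live file follows the layout described here.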

Review Status

Last reviewed: March 5, 2020