Difference between revisions of "File Description: go-stats"

From GO Wiki

Revision as of 05:07, 17 October 2019

  IN PROGRESS 

File usage

go-stats is the primary statistics file computed.

Input data

Annotation stats are obtained by querying GOlr (http://golr-aux.geneontology.io/).
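
As a rough illustration of what such a query might look like, the sketch below builds a Solr facet URL that would count annotations per aspect. Only the base URL comes from this page. The /solr/select path, the document_category filter and the aspect facet field are assumptions about the GOlr schema, and annotation_facet_url is a hypothetical helper, not part of any GO tool. The code only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

GOLR_BASE = "http://golr-aux.geneontology.io/"  # base URL given on this page


def annotation_facet_url(facet_field, base=GOLR_BASE):
    """Build a Solr facet query URL counting annotations per value of facet_field.

    Assumptions (not confirmed by this page): the select handler lives at
    /solr/select, and annotation documents are tagged with
    document_category:"annotation".
    """
    params = [
        ("q", "*:*"),                               # match every document
        ("fq", 'document_category:"annotation"'),  # assumed filter for annotations
        ("rows", "0"),                              # no documents, facet counts only
        ("wt", "json"),                             # ask Solr for a JSON response
        ("facet", "true"),
        ("facet.field", facet_field),               # e.g. "aspect" (assumed field name)
    ]
    return base.rstrip("/") + "/solr/select?" + urlencode(params)


url = annotation_facet_url("aspect")
print(url)
```

Changing the facet field (for example to a taxon or evidence field, if the schema has one) would give the other breakdowns listed in the file description.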


File format

JSON

File description

release_date: obtained from release/metadata/release-date.json (or snapshot/metadata/release-date.json).

ontology
    valid_terms: total number of valid (non-obsolete) terms in the ontology.
    obsolete_terms: total number of terms with "obsolete" status, i.e., term_ids for which the is_obsolete field is true in the go.obo file. This excludes merges.
    merged_terms: total number of merged terms, calculated by counting the term_ids for which the is_obsolete field is true in the go.obo file and that also appear as alt_ids of a valid term.
    biological_process_terms: total number of valid terms for the biological_process aspect.
    molecular_function_terms: total number of valid terms for the molecular_function aspect.
    cellular_component_terms: total number of valid terms for the cellular_component aspect.
    meta_statements: total number of identifiers, alternative identifiers, namespaces, term labels, comments, synonyms, definitions and subsets, summed over all valid terms.
    cross_references: total number of cross-references, summed over all valid terms; these are not included in meta_statements. Corresponds to the "xref" field in the go.obo file.
    terms_relations: number of relations, counted from the is_a, intersection_of and relationship fields of the go.obo file.
    changes_created_terms: number of terms created since the previous release.
    changes_obsolete_terms: number of terms obsoleted since the previous release.
    changes_merged_terms: number of terms merged since the previous release.
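
The ontology counts described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code that produces go-stats: the parser handles only simple tag: value lines, the EX: identifiers and term names in SAMPLE_OBO are made up, relations are counted for valid terms only (the description does not say whether obsolete terms are included), and parse_terms and ontology_stats are hypothetical helper names.

```python
from collections import Counter

# Made-up OBO fragment; the EX: identifiers are placeholders, not real GO terms.
SAMPLE_OBO = """\
[Term]
id: EX:0000001
name: example process
namespace: biological_process
alt_id: EX:0000009
is_a: EX:0000002 ! parent process

[Term]
id: EX:0000002
name: parent process
namespace: biological_process

[Term]
id: EX:0000003
name: example function
namespace: molecular_function
relationship: part_of EX:0000002 ! parent process

[Term]
id: EX:0000008
name: obsolete example
namespace: cellular_component
is_obsolete: true

[Term]
id: EX:0000009
name: obsolete merged example
namespace: biological_process
is_obsolete: true
"""


def parse_terms(obo_text):
    """Parse [Term] stanzas into dicts mapping each tag to a list of values."""
    terms = []
    for stanza in obo_text.split("\n\n"):
        lines = [ln.strip() for ln in stanza.strip().splitlines()]
        if not lines or lines[0] != "[Term]":
            continue
        term = {}
        for line in lines[1:]:
            tag, _, value = line.partition(": ")
            value = value.split(" ! ")[0]  # drop the trailing "! label" comment
            term.setdefault(tag, []).append(value)
        terms.append(term)
    return terms


def ontology_stats(obo_text):
    """Compute a subset of the ontology counts following the definitions above."""
    terms = parse_terms(obo_text)
    flagged = [t for t in terms if t.get("is_obsolete") == ["true"]]
    valid = [t for t in terms if t.get("is_obsolete") != ["true"]]
    # Merged: flagged obsolete AND listed as an alt_id of some valid term.
    alt_ids = {a for t in valid for a in t.get("alt_id", [])}
    merged = [t for t in flagged if t["id"][0] in alt_ids]
    by_aspect = Counter(t["namespace"][0] for t in valid)
    relation_tags = ("is_a", "intersection_of", "relationship")
    return {
        "valid_terms": len(valid),
        "obsolete_terms": len(flagged) - len(merged),  # excludes merges
        "merged_terms": len(merged),
        "biological_process_terms": by_aspect["biological_process"],
        "molecular_function_terms": by_aspect["molecular_function"],
        "cellular_component_terms": by_aspect["cellular_component"],
        # Assumption: relations are counted on valid terms only.
        "terms_relations": sum(
            len(t.get(tag, [])) for t in valid for tag in relation_tags
        ),
    }


result = ontology_stats(SAMPLE_OBO)
print(result)
```

In the sample, EX:0000009 is flagged obsolete and also listed as an alt_id of a valid term, so it is counted under merged_terms rather than obsolete_terms.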


annotations
    total: the total number of annotations.
    by_aspect
        P: all annotations in the database for biological_process.
        F: all annotations in the database for molecular_function.
        C: all annotations in the database for cellular_component.
    by_bioentity_type
        all (same as bioentities > total > by_type > all)
        cluster (same as bioentities > total > by_type > by_bioentity_type_cluster)
    by_taxon: number of annotations for each of the annotated species in the database.
    by_evidence
        all
        cluster
    by_model_organism: for each species, the number of annotations is shown:
        by_evidence: number of annotations for each individual evidence code, detailed by aspect.
        by_evidence_cluster: number of annotations for each evidence cluster, detailed by aspect.
    by_group: number of annotations by each group, obtained using the 'assigned_by' field.

taxa
    total: number of species with annotations.
    filtered: number of species with > 1,000 annotations.

bioentities
    total: total number of annotated bioentities.
    by_type
        all (see list of bioentity types)
        cluster
    by_filtered_taxon
        all: number of annotations for each species, by bioentity_type and by aspect.
        cluster: number of annotations for each species, by bioentity_type_cluster and by aspect.

references
    all
        total: total number of distinct annotated references. This includes PMIDs, GO_REFs, DOIs, and internal IDs for Model Organism Databases and Reactome. Note that a paper with both a PMID and an internal reference ID is counted twice.
        by_filtered_taxon
        by_group
    pmid: same as above, filtered for PMIDs.
        total: total number of distinct PMIDs.
        by_filtered_taxon
        by_group
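
To make the nesting concrete, the sketch below resolves the "a > b > c" path notation used in this description against a parsed stats dictionary, and derives the two taxa counts from a by_taxon mapping. It is a hedged illustration only: the values and the NCBITaxon:A/B/C keys are invented placeholders, the exact key layout of the real file is assumed from the description, and lookup and taxa_summary are hypothetical helpers.

```python
# Invented fragment shaped like the description above; numbers are placeholders.
stats = {
    "annotations": {
        "total": 3700,
        "by_aspect": {"P": 2000, "F": 1000, "C": 700},
        "by_taxon": {"NCBITaxon:A": 2500, "NCBITaxon:B": 1000, "NCBITaxon:C": 200},
    },
}


def lookup(stats, path):
    """Follow a path written as 'key > key > key' through nested dicts."""
    node = stats
    for key in path.split(">"):
        node = node[key.strip()]  # tolerate uneven spacing around '>'
    return node


def taxa_summary(by_taxon, threshold=1000):
    """Count annotated species, and those with more than `threshold` annotations."""
    return {
        "total": len(by_taxon),
        "filtered": sum(1 for count in by_taxon.values() if count > threshold),
    }


bp_annotations = lookup(stats, "annotations > by_aspect>P")
summary = taxa_summary(lookup(stats, "annotations > by_taxon"))
print(bp_annotations, summary)
```

The threshold is strict, matching "> 1,000" in the taxa description, so a species with exactly 1,000 annotations is not counted as filtered.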



Review Status

Last reviewed: October 17, 2018