File Description: go-stats

From GO Wiki

Revision as of 06:59, 17 October 2019

  IN PROGRESS 

File usage

Primary stat file computed.

Input data

Annotation stats are obtained by querying GOlr[1]. ***IS THIS THE RIGHT LINK???***

File format

json
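
Because the file is JSON, any standard JSON parser can read it. The following is a minimal sketch in Python. It assumes the top-level keys are named after the sections listed under File description and that nested keys follow the "a > b > c" paths used on this page. The sample values and the `lookup` helper are invented for illustration and are not taken from a real release.

```python
import json

# Invented sample; key layout and numbers are assumptions, not real data.
SAMPLE = """
{
  "release_date": "2019-01-01",
  "ontology": {"valid_terms": 3, "obsolete_terms": 1, "merged_terms": 1},
  "annotations": {"total": 4, "by_aspect": {"P": 2, "F": 1, "C": 1}}
}
"""

def lookup(stats, path):
    """Resolve a path such as 'annotations > by_aspect > P' in nested dicts."""
    node = stats
    for key in path.split(">"):
        node = node[key.strip()]
    return node

stats = json.loads(SAMPLE)
print(lookup(stats, "release_date"))
print(lookup(stats, "ontology > valid_terms"))
print(lookup(stats, "annotations > by_aspect>P"))
```

For a real file, `json.load` on an open file handle would replace `json.loads(SAMPLE)`.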

File description

release_date

obtained from release/metadata/release-date.json (or snapshot/metadata/release-date.json).

ontology

  • valid_terms: total number of valid terms (non-obsolete) in the ontology.
  • obsolete_terms: total number of terms with "obsolete" status, i.e., term_ids for which the is_obsolete field is true in the go.obo file. This count excludes merged terms.
  • merged_terms: total number of merged terms, calculated by counting the term_ids for which the is_obsolete field is true in the go.obo file and that also appear as alt_ids of a valid term.
  • biological_process_terms: total number of valid terms for the biological_process aspect.
  • molecular_function_terms: total number of valid terms for the molecular_function aspect.
  • cellular_component_terms: total number of valid terms for the cellular_component aspect.
  • meta_statements: total number of identifiers, alternative identifiers, namespaces, term labels, comments, synonyms, definitions and subsets, summed over all valid terms.
  • cross_references: total number of cross-references; corresponds to the "xref" field in the go.obo file.
  • terms_relations: total number of relations, counted from the is_a, intersection_of and relationship fields of the go.obo file.
  • changes_created_terms: number of created terms since the previous release.
  • changes_obsolete_terms: number of terms obsoleted since the previous release.
  • changes_merged_terms: number of terms merged since the previous release.
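
The term counts above can be illustrated with a small Python sketch that follows the definitions as worded on this page. It is not the pipeline's actual code: the function names, the simplified stanza parser and the sample OBO text are assumptions made for illustration, and the parser ignores OBO features such as escaped characters and trailing modifiers.

```python
from collections import Counter

def parse_terms(obo_text):
    """Return each [Term] stanza as a list of (tag, value) pairs."""
    terms, current = [], None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Stanza header: keep [Term], skip anything else (e.g. [Typedef]).
            current = [] if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.append((tag, value))
    return terms

def values(term, tag):
    return [v for t, v in term if t == tag]

def ontology_stats(obo_text):
    terms = parse_terms(obo_text)
    flagged = [t for t in terms if "true" in values(t, "is_obsolete")]
    valid = [t for t in terms if "true" not in values(t, "is_obsolete")]
    flagged_ids = {values(t, "id")[0] for t in flagged}
    alt_ids = {a for t in valid for a in values(t, "alt_id")}
    merged = flagged_ids & alt_ids  # flagged ids that are alt_ids of a valid term
    aspects = Counter(ns for t in valid for ns in values(t, "namespace"))
    relation_tags = ("is_a", "intersection_of", "relationship")
    return {
        "valid_terms": len(valid),
        "obsolete_terms": len(flagged_ids - merged),  # merges excluded
        "merged_terms": len(merged),
        "biological_process_terms": aspects["biological_process"],
        "molecular_function_terms": aspects["molecular_function"],
        "cellular_component_terms": aspects["cellular_component"],
        "terms_relations": sum(len(values(t, tag)) for t in terms for tag in relation_tags),
    }

# Invented sample; identifiers and names are placeholders, not real GO terms.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: GO:0000001
name: a
namespace: biological_process
alt_id: GO:0000009
is_a: GO:0000002 ! b

[Term]
id: GO:0000002
name: b
namespace: biological_process

[Term]
id: GO:0000003
name: c
namespace: molecular_function
relationship: part_of GO:0000002 ! b

[Term]
id: GO:0000008
name: obsolete d
namespace: cellular_component
is_obsolete: true

[Term]
id: GO:0000009
name: obsolete e
namespace: biological_process
is_obsolete: true

[Typedef]
id: part_of
name: part of
"""

ontology = ontology_stats(SAMPLE_OBO)
print(ontology)
```

The changes_* fields would require running the same counts on two releases and comparing the resulting id sets, which this sketch does not attempt.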


annotations

  • total: The total number of annotations.
  • by_aspect
    • P: all annotations in the database for biological_process.
    • F: all annotations in the database for molecular_function.
    • C: all annotations in the database for cellular_component.
  • by_bioentity_type
    • all (same as bioentities > total > by_type > all)
    • cluster (same as bioentities > total > by_type > by_bioentity_type_cluster)
    • by_taxon: Number of annotations for each of the annotated species in the database.
  • by_evidence
    • all
    • cluster
  • by_model_organism: for each species, the number of annotations is shown:
    • by_evidence: number of annotations for each individual evidence code, detailed by aspect.
    • by_evidence_cluster: number of annotations for each evidence cluster, detailed by aspect.
  • by_group: number of annotations for each group, obtained using the 'assigned_by' field.
  • taxa
    • total: number of species with annotations.
    • filtered: number of species with > 1,000 annotations.
  • bioentities
    • total: total number of annotated bioentities
    • by_type
      • all (see list of bioentity types)
      • cluster
    • by_filtered_taxon
      • all: number of annotations for each species, by bioentity_type and by aspect.
      • cluster: number of annotations for each species, by bioentity_type_cluster and by aspect.
  • references
    • all
      • total: total number of distinct annotated references (includes PMIDs, GO_REFs, DOIs, and internal IDs for Model Organism Databases and Reactome). Note that a paper with both a PMID and an internal reference ID is counted twice.
    • by_filtered_taxon
    • by_group
    • pmid: same as above, restricted to PMIDs.
      • total: total number of distinct PMIDs.
      • by_filtered_taxon
      • by_group
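
Several of the counts above are simple aggregations over annotation records. The sketch below illustrates the definitions on a handful of invented records. It is not how the file is produced (the stats are obtained by querying GOlr), and the record fields `taxon`, `aspect` and `reference`, the function name and the `min_annotations` parameter are assumptions for illustration; only `assigned_by` is named on this page.

```python
from collections import Counter

# Invented records; identifiers and group names are placeholders.
ANNOTATIONS = [
    {"taxon": "NCBITaxon:9606", "aspect": "P", "assigned_by": "GroupA", "reference": "PMID:1"},
    {"taxon": "NCBITaxon:9606", "aspect": "F", "assigned_by": "GroupA", "reference": "PMID:1"},
    {"taxon": "NCBITaxon:9606", "aspect": "C", "assigned_by": "GroupB", "reference": "GO_REF:0000033"},
    {"taxon": "NCBITaxon:10090", "aspect": "P", "assigned_by": "GroupB", "reference": "PMID:2"},
]

def annotation_stats(annotations, min_annotations=1000):
    per_taxon = Counter(a["taxon"] for a in annotations)
    references = {a["reference"] for a in annotations}  # distinct references
    return {
        "total": len(annotations),
        "by_aspect": dict(Counter(a["aspect"] for a in annotations)),
        "by_group": dict(Counter(a["assigned_by"] for a in annotations)),
        "taxa": {
            "total": len(per_taxon),
            # Species with strictly more than the threshold number of annotations.
            "filtered": sum(1 for n in per_taxon.values() if n > min_annotations),
        },
        "references": {
            "all_total": len(references),
            "pmid_total": sum(1 for r in references if r.startswith("PMID:")),
        },
    }

# A threshold of 2 is used here only so the toy data shows the filter at work.
summary = annotation_stats(ANNOTATIONS, min_annotations=2)
print(summary)
```

Because references are compared as plain strings, a paper cited once by PMID and once by an internal ID yields two distinct entries, which matches the double counting noted above.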

Review Status

Last reviewed: October 17, 2018