File Description: go-stats
Revision as of 09:07, 17 October 2019
IN PROGRESS
File usage
Primary stat file computed.
Input data
Annotation stats are obtained by querying GOlr (http://golr-aux.geneontology.io/).
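Since GOlr is a Solr index, a stats query can be sketched as a faceted search that returns counts only. The snippet below only builds the request URL and does not contact the server. The `solr/select` path, the `document_category:"annotation"` filter and the `aspect` facet field are assumptions about the GOlr schema, not details taken from this page, and `build_facet_query` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Host taken from the text above; the "solr/" path is an assumption.
GOLR_BASE = "http://golr-aux.geneontology.io/solr/"


def build_facet_query(facet_field, base=GOLR_BASE):
    """Build a Solr URL that asks for facet counts on one field, with no rows."""
    params = [
        ("q", "*:*"),                               # match every document
        ("fq", 'document_category:"annotation"'),   # assumed filter for annotations
        ("rows", "0"),                              # counts only, no documents
        ("wt", "json"),                             # JSON response
        ("facet", "true"),
        ("facet.field", facet_field),               # e.g. "aspect" (assumed field name)
    ]
    return base + "select?" + urlencode(params)


url = build_facet_query("aspect")
print(url)
```

Fetching the resulting URL and reading the facet counts from the JSON response would be the next step; that part is left out because the response layout depends on the server configuration.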
File format
JSON
File description
release_date: obtained from release/metadata/release-date.json (or snapshot/metadata/release-date.json).
ontology
valid_terms: total number of valid (non-obsolete) terms in the ontology.
obsolete_terms: total number of terms with "obsolete" status (i.e., term_ids for which the is_obsolete field is true in the go.obo file). This excludes merged terms.
merged_terms: total number of merged terms, calculated by counting the term_ids for which the is_obsolete field is true in the go.obo file and that are also listed as alt_ids of a valid term.
biological_process_terms: total number of valid terms for the biological_process aspect.
molecular_function_terms: total number of valid terms for the molecular_function aspect.
cellular_component_terms: total number of valid terms for the cellular_component aspect.
meta_statements: total number of identifiers, alternative identifiers, namespaces, term labels, comments, synonyms, definitions and subsets of the valid terms.
cross_references: counted in the same way as meta_statements, but for cross-references only. Corresponds to the "xref" field in the go.obo file.
terms_relations: total number of relations, counted from the is_a, intersection_of and relationship fields of the go.obo file.
changes_created_terms: number of terms created since the previous release.
changes_obsolete_terms: number of terms obsoleted since the previous release.
changes_merged_terms: number of terms merged since the previous release.
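The term counts described above can be illustrated with a minimal sketch that reads [Term] stanzas from OBO text and applies the same rules. This is not the pipeline's actual code: `parse_obo_terms` and `count_terms` are hypothetical helpers, and the `EX:` identifiers in the sample are made up for the example.

```python
def parse_obo_terms(text):
    """Return one dict per [Term] stanza, mapping each tag to its list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            # Drop the trailing "! label" comment used on is_a lines.
            current.setdefault(tag, []).append(value.split(" ! ")[0].strip())
    return terms


def count_terms(terms):
    """Count valid, obsolete and merged terms following the rules in the text."""
    def is_obsolete(term):
        return term.get("is_obsolete", ["false"])[0] == "true"

    valid = [t for t in terms if not is_obsolete(t)]
    flagged = [t for t in terms if is_obsolete(t)]
    alt_ids = {a for t in valid for a in t.get("alt_id", [])}
    # Merged: flagged as obsolete AND listed as an alt_id of a valid term.
    merged = [t for t in flagged if t["id"][0] in alt_ids]
    counts = {
        "valid_terms": len(valid),
        "obsolete_terms": len(flagged) - len(merged),  # excludes merges
        "merged_terms": len(merged),
    }
    for ns in ("biological_process", "molecular_function", "cellular_component"):
        counts[ns + "_terms"] = sum(1 for t in valid if t.get("namespace") == [ns])
    return counts


SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0000001
name: example process
namespace: biological_process
alt_id: EX:0000004

[Term]
id: EX:0000002
name: example function
namespace: molecular_function
is_a: EX:0000001 ! example process

[Term]
id: EX:0000003
name: obsolete example
namespace: cellular_component
is_obsolete: true

[Term]
id: EX:0000004
name: merged example
namespace: biological_process
is_obsolete: true

[Typedef]
id: part_of
"""

counts = count_terms(parse_obo_terms(SAMPLE_OBO))
print(counts)
```

On the made-up sample, two terms are valid, one is obsolete and one is merged, because EX:0000004 is both flagged as obsolete and listed as an alt_id of a valid term.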
annotations
total: The total number of annotations.
by_aspect
P: all annotations in the database for biological_process.
F: all annotations in the database for molecular_function.
C: all annotations in the database for cellular_component.
by_bioentity_type
all (same as bioentities > total > by_type > all)
cluster (same as bioentities > total > by_type > by_bioentity_type_cluster)
by_taxon: Number of annotations for each of the annotated species in the database.
by_evidence
all
cluster
by_model_organism: For each species, the number of annotations is shown:
by_evidence: number of annotations for each individual evidence code, detailed by aspect.
by_evidence_cluster: number of annotations for each evidence cluster, detailed by aspect.
by_group: Number of annotations by each group, obtained using the 'assigned_by' field.
taxa
total: number of species with annotations.
filtered: number of species with > 1,000 annotations.
bioentities
total: total number of annotated bioentities
by_type
all (see list of bioentity types)
cluster
by_filtered_taxon
all: number of annotations for each species, by bioentity_type and by aspect.
cluster: number of annotations for each species, by bioentity_type_cluster and by aspect.
references
all
total: total number of distinct annotated references (includes PMIDs, GO_REFs, DOIs, and internal IDs for Model Organism Databases and Reactome). Note that a paper with both a PMID and an internal reference ID is counted twice.
by_filtered_taxon
by_group
pmid: same as above, filtered for PMIDs.
total: total number of distinct PMIDs.
by_filtered_taxon
by_group
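As an illustration of how the taxa counts relate to the per-species annotation counts, here is a minimal sketch that derives `total` and `filtered` from a by_taxon mapping. The helper name `taxa_counts`, the taxon labels and the numbers are placeholders invented for the example; only the "more than 1,000 annotations" rule comes from the description above.

```python
def taxa_counts(by_taxon, threshold=1000):
    """Derive the taxa totals from a mapping of taxon -> annotation count."""
    # Only species that actually have annotations are counted.
    annotated = {taxon: n for taxon, n in by_taxon.items() if n > 0}
    return {
        "total": len(annotated),
        # Strictly more than the threshold, as stated in the description.
        "filtered": sum(1 for n in annotated.values() if n > threshold),
    }


# Placeholder data, not real statistics.
example_by_taxon = {"taxon_a": 2500, "taxon_b": 1000, "taxon_c": 12}

taxa = taxa_counts(example_by_taxon)
print(taxa)
```

With the placeholder data, all three species count toward `total`, while only `taxon_a` passes the filter, since a species with exactly 1,000 annotations does not have more than 1,000.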
Review Status
Last reviewed: October 17, 2018