Difference between revisions of "File Description: go-stats"
From GO Wiki
m (→Input data) |
m (→File description) |
Line 12: | Line 12: | ||
=File description= | =File description= | ||
− | release_date | + | ==release_date== |
obtained from release/metadata/release-date.json (or snapshot/metadata/release-date.json). | obtained from release/metadata/release-date.json (or snapshot/metadata/release-date.json). | ||
− | ontology | + | ==ontology== |
− | valid_terms: total number of valid terms (non-obsolete) in the ontology. | + | * '''valid_terms:''' total number of valid terms (non-obsolete) in the ontology. |
− | obsolete_terms:total number of terms with "obsolete" status (ie, term_ids for which the is_obsolete field is true in the go.obo file) (this excludes merges). | + | * '''obsolete_terms:''' total number of terms with "obsolete" status (ie, term_ids for which the is_obsolete field is true in the go.obo file) (this excludes merges). |
merged_terms:total number of merged terms (calculated by counting the term_ids for which the field is_obsolete is true in the go.obo file, and that also are are as alt_ids of a valid term). | merged_terms:total number of merged terms (calculated by counting the term_ids for which the field is_obsolete is true in the go.obo file, and that also are are as alt_ids of a valid term). | ||
− | biological_process_terms: total number of valid terms for the biological_process aspect. | + | * '''biological_process_terms:''' total number of valid terms for the biological_process aspect. |
− | molecular_function_terms: total number of valid terms for the molecular_function aspect. | + | * '''molecular_function_terms:''' total number of valid terms for the molecular_function aspect. |
− | cellular_component_terms: total number of valid terms for the cellular_component aspect. | + | * '''cellular_component_terms:''' total number of valid terms for the cellular_component aspect. |
− | meta_statements: total number of identifiers, alternative identifiers, namespace, term label, comments, synonyms, definitions, subsets, for each valid term. | + | * '''meta_statements:''' total number of identifiers, alternative identifiers, namespace, term label, comments, synonyms, definitions, subsets, for each valid term. |
− | cross_references: | + | * '''cross_references:''' Corresponds to the "xref" field in the go.obo file. |
− | terms_relations: number of relations; the count of all relations, using the fields is_a, intersection_of and relationship of the go.obo file. | + | * '''terms_relations:''' number of relations; the count of all relations, using the fields is_a, intersection_of and relationship of the go.obo file. |
− | changes_created_terms: number of created terms since the previous release. | + | * '''changes_created_terms:''' number of created terms since the previous release. |
− | changes_obsolete_terms: number of terms obsoleted since the previous release. | + | * '''changes_obsolete_terms:''' number of terms obsoleted since the previous release. |
− | changes_merged_terms: number of created merged since the previous release. | + | * '''changes_merged_terms:''' number of created merged since the previous release. |
+ | ==annotations== | ||
+ | * '''total:''' The total number of annotations. | ||
+ | * '''by_aspect''' | ||
+ | ** P: all annotations in the database for biological_process. | ||
+ | ** F:all annotations in the database for molecular_function. | ||
+ | ** C:all annotations in the database for cellular_component. | ||
+ | * '''by_bioentity_type''' | ||
+ | ** all (same as bioentities > total > by_type>all) | ||
+ | ** cluster (same as bioentities > total > by_type>by_bioentity_type_cluster) | ||
+ | ** by_taxon: Number of annotations for each of the annotated species in the database. | ||
+ | * '''by_evidence''' | ||
+ | ** '''all''' | ||
+ | ** '''cluster''' | ||
+ | * '''by_model_organism:''' For each species, the number of annotations are shown: | ||
+ | * '''by evidence:''' number of annotations for each individual evidence code, detailed by aspect | ||
+ | * '''by_evidence_cluster:''' number of annotations for each evidence cluster, detailed by aspect. | ||
+ | * '''by_group:''' Number of annotation by each group, obtained using the 'assigned_by' field. | ||
+ | * '''taxa''' | ||
+ | ** total: number of species with annotations. | ||
+ | ** filtered: number of species with > 1,000 annotations. | ||
+ | * '''bioentities''' | ||
+ | ** total: total number of annotated bioentities | ||
+ | ** by_type | ||
+ | *** all (see list of bioentity types) | ||
+ | *** cluster | ||
+ | ** by_filtered_taxon | ||
+ | *** all: number of annotations for each species, by bioentity_type and by aspect. | ||
+ | *** cluster: number of annotations for each species, by bioentity_type_cluster and by aspect. | ||
+ | * '''references''' | ||
+ | ** all | ||
+ | *** total: total number of distinct annotated references (includes PMIDs, GO_REFs, DOIs, internal IDs for Model Organism Databases and Reactome (note that for papers with both a PMID and an internal reference ID, the paper is counted twice). | ||
+ | ** by_filtered_taxon | ||
+ | ** by group | ||
+ | ** pmid: same as above, filtered for pmids. | ||
+ | *** total: total number of distinct pmids. | ||
+ | *** by_filtered_taxon | ||
+ | *** by_group | ||
= Review Status = | = Review Status = |
Revision as of 06:59, 17 October 2019
IN PROGRESS
File usage
Primary stat file computed.
Input data
Annotation stats are obtained by querying GOlr[1]. ***IS THIS THE RIGHT LINK???***
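As a rough illustration of what "querying GOlr" could look like, the sketch below builds a Solr facet query URL that would count annotation documents per aspect. GOlr is a Solr index, and `q`, `fq`, `rows`, `wt`, `facet` and `facet.field` are standard Solr parameters; the base URL and the field names `document_category` and `aspect` are assumptions made for this example and should be checked against the actual GOlr schema. The function only builds the URL and does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; replace with the real GOlr URL.
GOLR_SELECT_URL = "https://golr.example.org/solr/select"


def build_facet_query(facet_field, category="annotation", base_url=GOLR_SELECT_URL):
    """Build a Solr URL that counts documents of one category per facet value.

    The field name 'document_category' is an assumption about the index schema.
    """
    params = [
        ("q", "*:*"),                                  # match every document
        ("fq", f'document_category:"{category}"'),     # keep one document category
        ("rows", "0"),                                 # only counts are needed
        ("wt", "json"),                                # ask for a JSON response
        ("facet", "true"),
        ("facet.field", facet_field),                  # e.g. 'aspect' (assumed name)
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_facet_query("aspect"))
```

Setting `rows` to 0 keeps the response small, since only the facet counts matter for statistics of this kind.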
File format
JSON
File description
release_date
obtained from release/metadata/release-date.json (or snapshot/metadata/release-date.json).
ontology
- valid_terms: total number of valid terms (non-obsolete) in the ontology.
- obsolete_terms: total number of terms with "obsolete" status, i.e., term_ids for which the is_obsolete field is true in the go.obo file (this excludes merges).
- merged_terms: total number of merged terms (calculated by counting the term_ids for which the is_obsolete field is true in the go.obo file and that also appear as alt_ids of a valid term).
- biological_process_terms: total number of valid terms for the biological_process aspect.
- molecular_function_terms: total number of valid terms for the molecular_function aspect.
- cellular_component_terms: total number of valid terms for the cellular_component aspect.
- meta_statements: total number of identifiers, alternative identifiers, namespaces, term labels, comments, synonyms, definitions and subsets, summed over all valid terms.
- cross_references: number of cross-references, corresponding to the "xref" field in the go.obo file.
- terms_relations: total number of relations, counted from the is_a, intersection_of and relationship fields of the go.obo file.
- changes_created_terms: number of created terms since the previous release.
- changes_obsolete_terms: number of terms obsoleted since the previous release.
- changes_merged_terms: number of terms merged since the previous release.
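The term counts described above can be sketched in a few lines of Python. This is a minimal illustration that follows the definitions on this page, not the actual pipeline code: the parser is a deliberately simple reader of `[Term]` stanzas, the function names are invented for the example, and counting relations over every `[Term]` stanza (rather than valid terms only) is an assumption.

```python
from collections import Counter


def parse_term_stanzas(obo_text):
    """Return one {tag: [values]} dict per [Term] stanza of an OBO file."""
    stanzas, current = [], None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Only [Term] stanzas are kept; [Typedef] and others are skipped.
            current = {} if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
        elif current is not None and ": " in line and not line.startswith("!"):
            tag, value = line.split(": ", 1)
            value = value.split(" ! ")[0].strip()  # drop a trailing "! comment"
            current.setdefault(tag, []).append(value)
    return stanzas


def ontology_counts(obo_text):
    """Compute a subset of the 'ontology' statistics as defined on this page."""
    terms = parse_term_stanzas(obo_text)

    def is_obsolete(term):
        return term.get("is_obsolete", ["false"])[0] == "true"

    valid = [t for t in terms if not is_obsolete(t)]
    obsolete_ids = {t.get("id", [None])[0] for t in terms if is_obsolete(t)}
    alt_ids = {alt for t in valid for alt in t.get("alt_id", [])}
    merged_ids = obsolete_ids & alt_ids  # obsolete ids that are alt_ids of a valid term
    by_aspect = Counter(t.get("namespace", ["unknown"])[0] for t in valid)
    relation_fields = ("is_a", "intersection_of", "relationship")
    return {
        "valid_terms": len(valid),
        "obsolete_terms": len(obsolete_ids - merged_ids),  # merges are excluded
        "merged_terms": len(merged_ids),
        "biological_process_terms": by_aspect["biological_process"],
        "molecular_function_terms": by_aspect["molecular_function"],
        "cellular_component_terms": by_aspect["cellular_component"],
        "terms_relations": sum(len(t.get(f, [])) for t in terms for f in relation_fields),
    }
```

Calling `ontology_counts(open("go.obo").read())` would return a dictionary keyed like the fields listed above; a dedicated OBO parsing library would be a more robust choice for production use.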
annotations
- total: The total number of annotations.
- by_aspect
- P: all annotations in the database for biological_process.
- F: all annotations in the database for molecular_function.
- C: all annotations in the database for cellular_component.
- by_bioentity_type
- all (same as bioentities > total > by_type > all)
- cluster (same as bioentities > total > by_type > by_bioentity_type_cluster)
- by_taxon: Number of annotations for each of the annotated species in the database.
- by_evidence
- all
- cluster
- by_model_organism: For each species, the number of annotations is shown:
- by_evidence: number of annotations for each individual evidence code, detailed by aspect.
- by_evidence_cluster: number of annotations for each evidence cluster, detailed by aspect.
- by_group: Number of annotations by each group, obtained using the 'assigned_by' field.
- taxa
- total: number of species with annotations.
- filtered: number of species with > 1,000 annotations.
- bioentities
- total: total number of annotated bioentities
- by_type
- all (see list of bioentity types)
- cluster
- by_filtered_taxon
- all: number of annotations for each species, by bioentity_type and by aspect.
- cluster: number of annotations for each species, by bioentity_type_cluster and by aspect.
- references
- all
- total: total number of distinct annotated references, including PMIDs, GO_REFs, DOIs, and internal IDs for Model Organism Databases and Reactome. Note that a paper with both a PMID and an internal reference ID is counted twice.
- by_filtered_taxon
- by_group
- pmid: same as above, restricted to PMIDs.
- total: total number of distinct PMIDs.
- by_filtered_taxon
- by_group
- all
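Since the file is JSON, its fields can be read with any JSON library. The sketch below shows one way to look up nested keys safely in Python. The key names come from the description above, but the exact nesting (for instance, that `by_aspect` sits directly under `annotations`) is an assumption, the helper names are invented for the example, and the numbers in the miniature document are made up.

```python
import json


def get_path(stats, *keys):
    """Follow a chain of keys through nested dicts; return None if any is missing."""
    node = stats
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def aspect_summary(stats):
    """Return the annotation count for each aspect code (P, F, C)."""
    by_aspect = get_path(stats, "annotations", "by_aspect") or {}
    return {aspect: by_aspect.get(aspect) for aspect in ("P", "F", "C")}


# Made-up miniature document for illustration only; not real GO statistics.
example = json.loads(
    '{"annotations": {"total": 6, "by_aspect": {"P": 3, "F": 2, "C": 1}}}'
)

if __name__ == "__main__":
    print(aspect_summary(example))  # {'P': 3, 'F': 2, 'C': 1}
```

Returning `None` for a missing path, instead of raising an error, makes it easier to compare files from releases that may not contain every field.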
Review Status
Last reviewed: October 17, 2018