From GO Wiki
Revision as of 09:54, 14 April 2008 by Peterd (talk | contribs)


This item grew out of work on adding terms to the function ontology for enzyme activities, based on EC entries that don't have corresponding GO terms. EC classifies enzymes mainly on the basis of reaction mechanism, which fits nicely with GO's usual criteria for including function terms, and allows us to capture most EC entries and the EC hierarchical organization.

For peptidase and protease activities, however, all bets are off. EC includes not only various peptide-bond-cleaving mechanisms but also a rather large number of entries whose names and reactions make them seem much more like gene products than descriptions of distinct activities. Two examples that are very gene-producty:

name: thermomycolin activity
namespace: molecular_function
def: "Catalysis of the reaction: Rather nonspecific hydrolysis of proteins. Preferential cleavage:
 Ala-|-, Tyr-|-, Phe-|- in small molecule substrates." [EC:]

name: streptogrisin B activity
namespace: molecular_function
def: "Catalysis of the reaction: Hydrolysis of proteins with trypsin-like specificity."

Others are in a grey area, e.g.

name: PepB aminopeptidase activity
namespace: molecular_function
def: "Catalysis of the reaction: Release of an N-terminal amino acid, Xaa, from a peptide or
arylamide. Xaa is preferably Glu or Asp but may be other amino acids, including Leu, Met, His,
Cys and Gln." [EC:]
synonym: "PepB aminopeptidase activity" EXACT [EC:]

name: metridin activity
namespace: molecular_function
def: "Catalysis of the reaction: Preferential cleavage: Tyr-|-, Phe-|-, Leu-|-; little action on
Trp-|-." [EC:]

Question: should GO continue to add new function terms corresponding to the gene-product-ish EC entries? GO has already included quite a few, e.g.

id: GO:0004213
name: cathepsin B activity
namespace: molecular_function
def: "Catalysis of the hydrolysis of peptide bonds with a broad specificity. Preferentially
cleaves the terminal bond of -Arg-Arg-Xaa motifs in small molecule substrates (thus differing
from cathepsin L). In addition to being an endopeptidase, shows peptidyl-dipeptidase activity,
liberating C-terminal dipeptides." [EC:]
synonym: "cathepsin B1 activity" EXACT [EC:]
synonym: "cathepsin II" RELATED [EC:]
xref: EC:
xref: MetaCyc:
is_a: GO:0004197 ! cysteine-type endopeptidase activity

(and the other cathepsin terms)


From Ben Hitz:


Proteases Produce Pernicious Problems Periodically.

The reason for this is just historical: they are such an early and important class of proteins that their nomenclature is FUBAR.

I would not add the "gene_product" like activities from EC. Unfortunately this means either clean up EC or clean up GO.

I just spent a few minutes at this page:

Contents EC 3.4 to EC 3.12

I can't see that cleaning that mess up would be fun. Basically you can classify proteases by catalytic mechanism (serine, cysteine, metallo-, aspartyl-) or substrate (X-Y) where X and Y are different amino acids. Other distinguishing characteristics: endopeptidase vs. exopeptidase, D- vs. L- amino acids.

Furthermore, there is an evolutionary classification which completely overlaps these boundaries (the catalytic triad of Ser-Asp-His is _the_ classic example of convergent evolution to a common enzyme mechanism).

Perusing the GO DAG, I would say that GO would be best off dumping 90% of the substrate-specific terms. It may be worthwhile distinguishing between proteases (act on "protein") and peptidases (act only on short peptides) or endo/exo peptidases, but no further. I would also include things like EC and EC where the enzyme acts on "atypical" peptides.

I would probably go ahead and distinguish based on catalytic mechanism, if just to reduce the number.

Proposed organization

Here is a rough cut: -> (reverse is_a) == (reverse part_of)

catalytic activity -> hydrolase activity ->  peptidase activity
peptidase activity -> "regular" (i.e, L,L alpha-alpha peptide bond found in proteins) peptidase
activity (most of 3.4.-)
	-> D-D peptidase activity -> D-Ala-D-Ala peptidase activity (,
	-> Beta (L,L) peptidase activity (;
	-> Gamma (L, L) peptidase -> ( tricky because I don't want to say Gamma-glutamyl ..
	-> Gamma (D, L) peptidase -> (

"regular" peptidase activity -> endopeptidase activity
	-> exopeptidase activity -> aminopeptidase activity
	-> carboxypeptidase activity

"regular" peptidase activity -> serine peptidase activity
	-> cysteine peptidase activity
	-> aspartyl peptidase activity
	-> threonine peptidase activity (see, e.g., 3.4.25.-)
	-> metallopeptidase activity

BUT note: EC zinc D-Ala-D-Ala carboxypeptidase
and EC serine-type D-Ala-D-Ala carboxypeptidase

So you could need many "cross products", but I suppose we could add them only as needed (i.e., we don't need threonine exopeptidase until someone discovers one).
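As a rough illustration of the proposed organization, the hierarchy can be written as a child-to-parents map in which a cross-product term simply has two is_a parents. This is only a sketch: the dict layout and the `ancestors` helper are assumptions made for illustration, the term names are taken from the rough cut above, and "serine-type carboxypeptidase activity" is included solely as an example of a cross product. Placing carboxypeptidase under exopeptidase is one reading of the indented list.

```python
# Sketch of the proposed hierarchy as a child -> is_a parents map.
# The arrows in the rough cut point from parent to child; here they are inverted.
REGULAR = '"regular" peptidase activity'

IS_A = {
    "hydrolase activity": ["catalytic activity"],
    "peptidase activity": ["hydrolase activity"],
    REGULAR: ["peptidase activity"],
    "D-D peptidase activity": ["peptidase activity"],
    "D-Ala-D-Ala peptidase activity": ["D-D peptidase activity"],
    # Axis 1: position of the cleaved bond.
    "endopeptidase activity": [REGULAR],
    "exopeptidase activity": [REGULAR],
    "aminopeptidase activity": ["exopeptidase activity"],
    "carboxypeptidase activity": ["exopeptidase activity"],  # assumed placement
    # Axis 2: catalytic mechanism.
    "serine peptidase activity": [REGULAR],
    "cysteine peptidase activity": [REGULAR],
    "aspartyl peptidase activity": [REGULAR],
    "threonine peptidase activity": [REGULAR],
    "metallopeptidase activity": [REGULAR],
    # Illustrative cross product: one parent from each axis.
    "serine-type carboxypeptidase activity": [
        "serine peptidase activity",
        "carboxypeptidase activity",
    ],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    for name in sorted(ancestors("serine-type carboxypeptidase activity")):
        print(name)
```

Adding a cross product "as needed" then amounts to adding one entry with two parents; nothing else in the map has to change.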

One area I didn't cover is the ATP dependent proteases (Lon, CliP, La) - GO:0004176. The Lon family I think are all serine proteases, but it's certainly not guaranteed. Not sure it's worth splitting even up higher into "ATP-dependent" and "ATP-independent"!

Hope this helps.


Very similar proposal

From: Colin Batchelor (RSC)

I pretty much second everything Ben has to say, especially about the D-amino acids.

From a text-mining point of view I don't want to see any single words ending in -ase disappearing altogether from the part of the ontology we scoop up (names, EXACT synonyms and potentially NARROW synonyms if I can be sure that there's no duplication).

So that means I want to keep "exopeptidase", "endopeptidase", "metallopeptidase", "metalloendopeptidase" and so forth. That would ideally mean a new tree, something like "molecular function attribute" with "metal-catalysed" (or even a has_catalyst relation pointing to ChEBI) but I'm happy to wait for the revolution for that one.
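To make the text-mining concern concrete, here is a minimal sketch of collecting single words ending in -ase from the name and EXACT synonym lines of an OBO-style stanza like the ones quoted earlier on this page. The function name `ase_words` and both regular expressions are assumptions for illustration, not part of any existing tool.

```python
import re

# Matches lines such as: synonym: "cathepsin B1 activity" EXACT [EC:]
SYNONYM = re.compile(r'^synonym:\s*"([^"]*)"\s+(\w+)')
# A single word ending in "ase", e.g. "aminopeptidase".
ASE_WORD = re.compile(r"\b[A-Za-z]+ase\b")


def ase_words(stanza, scopes=("EXACT",)):
    """Collect lower-cased -ase words from name lines and selected synonyms."""
    words = set()
    for line in stanza.splitlines():
        line = line.strip()
        if line.startswith("name:"):
            text = line[len("name:"):]
        else:
            match = SYNONYM.match(line)
            if match is None or match.group(2) not in scopes:
                continue  # not a synonym line, or a scope we do not scoop up
            text = match.group(1)
        words.update(word.lower() for word in ASE_WORD.findall(text))
    return words


if __name__ == "__main__":
    stanza = '''name: PepB aminopeptidase activity
namespace: molecular_function
synonym: "PepB aminopeptidase activity" EXACT [EC:]'''
    print(ase_words(stanza))
```

Passing `scopes=("EXACT", "NARROW")` would extend the sweep to NARROW synonyms, as contemplated above.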

Likewise keep the substrate-based terms like "cyanophycinase" (though I don't see that cleavage of cyanophycin is necessarily a serine-type peptidase activity), "elastase" and "fibrolase".

On the other hand, for example, procollagen N-endopeptidase activity (GO:0017074) is, at least according to the definition, not intrinsically a metalloendopeptidase activity; that's a statement about the gene products that realize that activity.

So I'm not convinced that metalloexopeptidase, metalloendopeptidase, serine-type peptidase and so on should have any children. Does that sound fair?

I can't see the case for keeping the cathepsin terms in GO because I can't see how you would write genus--differentia definitions for them. I'd like to see the bare gene product names remain in GO as RELATED synonyms for their parents, though. Astacin activity (GO:0008533) could go, but bontoxilysin activity (GO:0033264) looks substrate-based so can stay.

A rule-of-thumb that feels right is that if something ends in -in and is qualified by a letter or a number at the end (cathepsin B activity, stromelysin 1 activity for example), it is probably naming a gene product rather than an activity.
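Read literally, the rule of thumb above (a name ending in -in, qualified by a trailing letter or number) can be sketched as a regular expression. The pattern and the function name `looks_like_gene_product` are assumptions for illustration; this is a heuristic filter, not a definition of what belongs in GO.

```python
import re

# A word ending in "in", then a qualifier that is either a capital letter
# (optionally followed by digits, e.g. "B" or "B1") or a number, then an
# optional trailing " activity".
GENE_PRODUCT_LIKE = re.compile(r"\w+in\s+(?:[A-Z]\d*|\d+)(?:\s+activity)?$")


def looks_like_gene_product(name):
    """True if `name` fits the '-in plus letter or number' rule of thumb."""
    return GENE_PRODUCT_LIKE.search(name) is not None


if __name__ == "__main__":
    for name in (
        "cathepsin B activity",
        "stromelysin 1 activity",
        "metridin activity",
        "cysteine-type endopeptidase activity",
    ):
        print(name, looks_like_gene_product(name))
```

Unqualified names such as "metridin activity" are not caught by this pattern, so it covers only part of the gene-product-like cases discussed here.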

I certainly can't see the case for adding, for example, thermomycolin activity and metridin activity.

best wishes, Colin.


From: Peter D'Eustachio (Reactome)

EC recognizes that there is a mess at the bottom of the hierarchy, but has an organization essentially identical to the one proposed here two levels up (so the issue here is mostly one of granularity rather than of enzyme classification):

clip taken from

"The nomenclature of the peptidases is troublesome. Their specificity is commonly difficult to define, depending upon the nature of several amino acid residues around the peptide bond to be hydrolysed and also on the conformation of the substrate polypeptide chain. A classification involving the additional criterion of catalytic mechanism is therefore used.

"Two sets of sub-subclasses of peptidases are recognised, those of the exopeptidases (EC 3.4.11-19) and those of the endopeptidases (EC 3.4.21-24 and EC 3.4.99). The exopeptidases act only near the ends of polypeptide chains, and those acting at a free N-terminus liberate a single amino-acid residue (aminopeptidases, EC 3.4.11), or a dipeptide or a tripeptide (dipeptidyl-peptidases and tripeptidyl-peptidases, EC 3.4.14). The exopeptidases acting at a free C-terminus liberate a single residue (carboxypeptidases, EC 3.4.16-18) or a dipeptide (peptidyl-dipeptidases, EC 3.4.15). The carboxypeptidases are allocated to four groups on the basis of catalytic mechanism: the serine-type carboxypeptidases (EC 3.4.16), the metallocarboxypeptidases (EC 3.4.17) and the cysteine-type carboxypeptidases (EC 3.4.18). Other exopeptidases are specific for dipeptides (dipeptidases, EC 3.4.13), or remove terminal residues that are substituted, cyclized or linked by isopeptide bonds (peptide linkages other than those of α-carboxyl to α-amino groups) (omega peptidases, EC 3.4.19).

"The endopeptidases are divided into sub-subclasses on the basis of catalytic mechanism, and specificity is used only to identify individual enzymes within the groups. These are the sub-subclasses of serine endopeptidases (EC 3.4.21), cysteine endopeptidases (EC 3.4.22), aspartic endopeptidases (EC 3.4.23), metalloendopeptidases (EC 3.4.24) and threonine endopeptidases (EC 3.4.25). Endopeptidases that could not be assigned to any of the sub-subclasses EC 3.4.21-25 were listed in sub-subclass EC 3.4.99."
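The sub-subclasses named in the quoted EC text can be collected into a small lookup table, which makes the two-level organization (exo- vs. endopeptidase, then mechanism or cleavage pattern) easy to see. This is a sketch only: the table name `EC_3_4`, the function `classify`, and the label for EC 3.4.99 are illustrative choices, and the entries are limited to what the quotation lists.

```python
# EC 3.4.x sub-subclass -> (exo/endo, group name), as listed in the quoted text.
EC_3_4 = {
    11: ("exopeptidase", "aminopeptidases"),
    13: ("exopeptidase", "dipeptidases"),
    14: ("exopeptidase", "dipeptidyl-peptidases and tripeptidyl-peptidases"),
    15: ("exopeptidase", "peptidyl-dipeptidases"),
    16: ("exopeptidase", "serine-type carboxypeptidases"),
    17: ("exopeptidase", "metallocarboxypeptidases"),
    18: ("exopeptidase", "cysteine-type carboxypeptidases"),
    19: ("exopeptidase", "omega peptidases"),
    21: ("endopeptidase", "serine endopeptidases"),
    22: ("endopeptidase", "cysteine endopeptidases"),
    23: ("endopeptidase", "aspartic endopeptidases"),
    24: ("endopeptidase", "metalloendopeptidases"),
    25: ("endopeptidase", "threonine endopeptidases"),
    99: ("endopeptidase", "endopeptidases not assigned to EC 3.4.21-25"),
}


def classify(ec_number):
    """Map an EC number such as '3.4.25.-' to its group, or None if unknown."""
    parts = ec_number.replace("EC", "").strip().split(".")
    if len(parts) < 3 or parts[:2] != ["3", "4"] or not parts[2].isdigit():
        return None
    return EC_3_4.get(int(parts[2]))


if __name__ == "__main__":
    print(classify("3.4.25.-"))
    print(classify("EC 3.4.11"))
```

Anything below the third level of the EC number is ignored here, which is exactly the granularity at which EC and the proposed organization agree.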