Proteases
Background
This item grew out of work on adding terms to the function ontology for enzyme activities, based on EC entries that don't have corresponding GO terms. EC classifies enzymes mainly on the basis of reaction mechanism, which fits nicely with GO's usual criteria for including function terms, and allows us to capture most EC entries and the EC hierarchical organization.
For peptidase and protease activities, however, all bets are off. EC not only includes various peptide-bond-cleaving mechanisms; it also has a rather large number of entries whose names and reactions make them seem much more like gene products than descriptions of distinct activities. Two examples that are very gene-producty:
[Term]
name: thermomycolin activity
namespace: molecular_function
def: "Catalysis of the reaction: Rather nonspecific hydrolysis of proteins. Preferential cleavage: Ala-|-, Tyr-|-, Phe-|- in small molecule substrates." [EC:3.4.21.65]

[Term]
name: streptogrisin B activity
namespace: molecular_function
def: "Catalysis of the reaction: Hydrolysis of proteins with trypsin-like specificity." [EC:3.4.21.81]
Others are in a grey area, e.g.
[Term]
name: PepB aminopeptidase activity
namespace: molecular_function
def: "Catalysis of the reaction: Release of an N-terminal amino acid, Xaa, from a peptide or arylamide. Xaa is preferably Glu or Asp but may be other amino acids, including Leu, Met, His, Cys and Gln." [EC:3.4.11.23]
synonym: "PepB aminopeptidase activity" EXACT [EC:3.4.11.23]

[Term]
name: metridin activity
namespace: molecular_function
def: "Catalysis of the reaction: Preferential cleavage: Tyr-|-, Phe-|-, Leu-|-; little action on Trp-|-." [EC:3.4.21.3]
Question: should GO continue to add new function terms corresponding to the gene-product-ish EC entries? GO has already included quite a few, e.g.
[Term]
id: GO:0004213
name: cathepsin B activity
namespace: molecular_function
def: "Catalysis of the hydrolysis of peptide bonds with a broad specificity. Preferentially cleaves the terminal bond of -Arg-Arg-Xaa motifs in small molecule substrates (thus differing from cathepsin L). In addition to being an endopeptidase, shows peptidyl-dipeptidase activity, liberating C-terminal dipeptides." [EC:3.4.22.1]
synonym: "cathepsin B1 activity" EXACT [EC:3.4.22.1]
synonym: "cathepsin II" RELATED [EC:3.4.22.1]
xref: EC:3.4.22.1
xref: MetaCyc:3.4.22.1-RXN
is_a: GO:0004197 ! cysteine-type endopeptidase activity
(and the other cathepsin terms)
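For anyone who wants to survey which existing terms carry EC cross-references, stanzas like the ones above can be read with a few lines of Python. This is only a minimal sketch: it assumes one tag per line, as in an OBO flat file, and does not handle escapes, trailing modifiers, or other stanza types. The function name parse_stanzas is hypothetical and not part of any GO tooling.

```python
from collections import defaultdict


def parse_stanzas(text):
    """Parse OBO-style [Term] stanzas into a list of tag -> [values] dicts.

    Simplified on purpose: one 'tag: value' pair per line, no escape handling.
    """
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = defaultdict(list)
            terms.append(current)
        elif line.startswith("["):
            current = None  # some other stanza type; ignore its lines
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag == "is_a":
                # Drop the trailing "! parent name" comment on is_a lines.
                value = value.split(" ! ", 1)[0]
            current[tag].append(value)
    return [dict(t) for t in terms]


example = """\
[Term]
id: GO:0004213
name: cathepsin B activity
namespace: molecular_function
xref: EC:3.4.22.1
xref: MetaCyc:3.4.22.1-RXN
is_a: GO:0004197 ! cysteine-type endopeptidase activity
"""

terms = parse_stanzas(example)
# Keep only the cross-references that point at EC entries.
ec_xrefs = [x for x in terms[0]["xref"] if x.startswith("EC:")]
print(terms[0]["name"][0], ec_xrefs, terms[0]["is_a"])
```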
Proposal
From Ben Hitz:
Comments
Proteases Produce Pernicious Problems Periodically.
The reason is purely historical: they are such an early and important class of proteins that their nomenclature is FUBAR.
I would not add the "gene_product"-like activities from EC. Unfortunately, this means cleaning up either EC or GO.
I just spent a few minutes at this page:
I can't see that cleaning that mess up would be fun. Basically you can classify proteases by catalytic mechanism (serine, cysteine, metallo-, aspartyl-) or substrate (X-Y) where X and Y are different amino acids. Other distinguishing characteristics: endopeptidase vs. exopeptidase, D- vs. L- amino acids.
Furthermore, there is an evolutionary classification which completely overlaps these boundaries (the catalytic triad of Ser-Asp-His is _the_ classic example of convergent evolution to a common enzyme mechanism).
Perusing the GO DAG, I would say that it would be best off dumping 90% of the substrate-specific terms. It may be worthwhile distinguishing between proteases (act on "protein") and peptidases (act only on short peptides), or between endo- and exopeptidases, but no further. I would also include things like EC 3.4.13.20 and EC 3.4.13.22 where the enzyme acts on "atypical" peptides.
I would probably go ahead and distinguish based on catalytic mechanism, if just to reduce the number.
Proposed organization
Here is a rough cut:

-> (reverse is_a)
== (reverse part_of)

catalytic activity
  -> hydrolase activity
    -> peptidase activity

peptidase activity
  -> "regular" (i.e., L,L alpha-alpha peptide bond found in proteins) peptidase activity (most of 3.4.-)
  -> D-D peptidase activity
    -> D-Ala-D-Ala peptidase activity (3.4.13.22, 3.4.11.19)
  -> Beta (L,L) peptidase activity (3.4.13.20; 3.4.19.5)
  -> Gamma (L,L) peptidase
    -> (3.4.19.9) tricky because I don't want to say Gamma-glutamyl ..
  -> Gamma (D,L) peptidase
    -> (3.4.19.11)

"regular" peptidase activity
  -> endopeptidase activity
  -> exopeptidase activity
    -> aminopeptidase activity
    -> carboxypeptidase activity

and/or

"regular" peptidase activity
  -> serine peptidase activity
  -> cysteine peptidase activity
  -> aspartyl peptidase activity
  -> threonine peptidase activity (see, e.g., 3.4.25.-)
  -> metallopeptidase activity

BUT note: EC 3.4.17.14 zinc D-Ala-D-Ala carboxypeptidase and EC 3.4.16.4 serine-type D-Ala-D-Ala carboxypeptidase
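One way to read the rough cut above is as a small is_a graph. The sketch below encodes part of it as a child-to-parents mapping and walks up to collect ancestors. The term names come from the proposal and are not real GO terms or ids; the nesting is one interpretation of the outline; IS_A and ancestors are hypothetical names used only for illustration.

```python
# child -> list of is_a parents. Names follow the proposal, not existing GO terms,
# and the nesting is an assumed reading of the outline.
IS_A = {
    "hydrolase activity": ["catalytic activity"],
    "peptidase activity": ["hydrolase activity"],
    "regular peptidase activity": ["peptidase activity"],
    "D-D peptidase activity": ["peptidase activity"],
    "D-Ala-D-Ala peptidase activity": ["D-D peptidase activity"],
    "endopeptidase activity": ["regular peptidase activity"],
    "exopeptidase activity": ["regular peptidase activity"],
    "aminopeptidase activity": ["exopeptidase activity"],
    "carboxypeptidase activity": ["exopeptidase activity"],
    "serine peptidase activity": ["regular peptidase activity"],
    "cysteine peptidase activity": ["regular peptidase activity"],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("aminopeptidase activity")))
```

Because each term maps to a list of parents, the same structure can hold a term that sits under both a mechanism parent and an endo/exo parent, which is what the "and/or" in the outline implies.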
So you could need many "cross products", but I suppose we could add them only as needed (i.e., we don't need threonine exopeptidase until someone discovers one).
One area I didn't cover is the ATP-dependent proteases (Lon, ClpP, La) - GO:0004176. The Lon family, I think, are all serine proteases, but it's certainly not guaranteed. Not sure it's worth splitting even higher up into "ATP-dependent" and "ATP-independent"!
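The "only as needed" idea can be sketched as lazy creation: a mechanism-by-position term comes into existence the first time something needs it, with one parent on each axis. Everything here is an assumption for illustration, including the "-type" naming pattern and the names MECHANISM_PARENT, POSITIONS, and get_cross_product.

```python
# Mechanism axis: short label -> parent term name from the proposal.
MECHANISM_PARENT = {
    "serine": "serine peptidase activity",
    "cysteine": "cysteine peptidase activity",
    "aspartyl": "aspartyl peptidase activity",
    "threonine": "threonine peptidase activity",
    "metallo": "metallopeptidase activity",
}
# Position axis: endo/exo and the two exopeptidase subtypes.
POSITIONS = {"endopeptidase", "exopeptidase", "aminopeptidase", "carboxypeptidase"}

# Cross-product terms created so far: name -> list of is_a parents.
cross_products = {}


def get_cross_product(mechanism, position):
    """Return the cross-product term name, creating the term on first use."""
    if mechanism not in MECHANISM_PARENT or position not in POSITIONS:
        raise ValueError(f"unknown axis value: {mechanism!r}, {position!r}")
    name = f"{mechanism}-type {position} activity"  # hypothetical naming pattern
    if name not in cross_products:
        cross_products[name] = [MECHANISM_PARENT[mechanism], f"{position} activity"]
    return name


get_cross_product("serine", "carboxypeptidase")
get_cross_product("serine", "carboxypeptidase")  # second request reuses the term
possible = len(MECHANISM_PARENT) * len(POSITIONS)
print(len(cross_products), "of", possible, "possible cross products exist")
```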
Hope this helps.
Ben