Predicting Protein Secondary Structure Using Stochastic Tree Grammars

Authors: Naoki Abe, Hiroshi Mamitsuka

Abstract

We propose a new method for predicting the protein secondary structure of a given amino acid sequence, based on a training algorithm for the probability parameters of a stochastic tree grammar. In particular, we concentrate on the problem of predicting β-sheet regions, which has previously been considered difficult because of the unbounded dependencies exhibited by sequences corresponding to β-sheets. To cope with this difficulty, we use a new family of stochastic tree grammars, which we call Stochastic Ranked Node Rewriting Grammars. These grammars are powerful enough to capture the type of dependencies exhibited by the sequences of β-sheet regions, such as the ‘parallel’ and ‘anti-parallel’ dependencies and their combinations. The training algorithm we use is an extension of the ‘inside-outside’ algorithm for stochastic context-free grammars, but with a number of significant modifications. We applied our method to real data obtained from the HSSP database (Homology-derived Secondary Structure of Proteins Ver 1.0) and the results were encouraging: our method was able to predict roughly 75 percent of the β-strands correctly in a systematic evaluation experiment, in which the test sequences not only have less than 25 percent identity to the training sequences, but are totally unrelated to them. This figure compares favorably to the predictive accuracy of the state-of-the-art prediction methods in the field, even though our experiment was on a restricted type of β-sheet structure and the test was done on a relatively small data set. We also stress that our method can predict the structure as well as the location of β-sheet regions, which was not possible with conventional methods for secondary structure prediction. Extended abstracts of parts of the work presented in this paper have appeared in (Abe & Mamitsuka, 1994) and (Mamitsuka & Abe, 1994).

Keywords: Stochastic tree grammars, protein secondary structure prediction, beta-sheets, maximum likelihood estimation, minimum description length principle, unsupervised learning

Paper URL: https://doi.org/10.1023/A:1007477814995