Pattern recognition methods for advanced stochastic protein sequence analysis using HMMs

Authors:

Abstract

Profile Hidden Markov Models (Profile HMMs) are currently the methodology of choice for probabilistic protein family modeling. Despite substantial progress, however, the general problem of remote homology analysis is still far from solved. In this article we propose new approaches to robust protein family modeling that consistently exploit general pattern recognition techniques. A new feature-based representation of amino acid sequences serves as the basis for semi-continuous protein family HMMs. This paradigm shift in processing biological sequences allows the complexity of family models to be reduced substantially, leaving fewer parameters to be trained. This is especially favorable when little training data is available, as is the case in most current tasks of molecular biology research. In various experiments we demonstrate the superior performance of advanced stochastic protein family modeling for remote homology analysis, which is especially relevant for applications such as drug discovery.
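
To illustrate what "semi-continuous" means here, the following is a minimal sketch rather than the authors' implementation. It assumes that each sequence has already been converted into real-valued feature vectors (the feature extraction itself is not described in this abstract), that all states share one codebook of diagonal-covariance Gaussians, and that each state only stores its own mixture weights over that codebook. The topology is a plain left-to-right chain, not a full Profile HMM with insert and delete states, and all names and numbers (`codebook`, `state_weights`, the toy values) are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at feature vector x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def emission(x, weights, codebook):
    """Semi-continuous emission: state-specific weights over a shared codebook."""
    return sum(w * gaussian_pdf(x, m, v) for w, (m, v) in zip(weights, codebook))

def forward_likelihood(features, start, trans, state_weights, codebook):
    """Forward algorithm: P(features | model), summed over all state paths."""
    n = len(start)
    alpha = [start[s] * emission(features[0], state_weights[s], codebook)
             for s in range(n)]
    for x in features[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n))
            * emission(x, state_weights[s], codebook)
            for s in range(n)
        ]
    return sum(alpha)

# Hypothetical toy model: 2 shared Gaussians in a 2-D feature space, 3 states.
codebook = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 3.0], [1.0, 1.0])]  # (mean, variance)
state_weights = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]             # one row per state
start = [1.0, 0.0, 0.0]
trans = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
features = [[0.1, -0.2], [1.4, 1.6], [2.9, 3.1]]                 # assumed feature vectors

likelihood = forward_likelihood(features, start, trans, state_weights, codebook)
```

Because the Gaussians are shared, adding a state costs only one additional weight vector, which is one way a model of this kind can keep its parameter count low. A practical version would work in log space or rescale `alpha` at each step to avoid numerical underflow on long sequences.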

Keywords: Protein sequence analysis, Probabilistic protein family modeling, HMM

Review history: Received 5 July 2005, Revised 13 September 2005, Available online 22 November 2005.

Paper URL: https://doi.org/10.1016/j.patcog.2005.10.007