Prediction of DNA-binding proteins by interaction fusion feature representation and selective ensemble

Abstract

DNA-binding proteins play important roles in various cellular processes, and identifying them is important for understanding and interpreting protein function. This manuscript presents algorithms for feature representation based on primary protein sequences and for selective ensemble classification. We first propose a multi-source interaction fusion feature representation model that simultaneously considers interactions among physicochemical properties, evolutionary information, and gap distances between residues. We also provide a selective ensemble algorithm based on gap distances that yields diverse base classifiers by selecting feature subspaces. The selective ensemble algorithm improves the generalization ability of the integrated classifiers. We then compare the proposed algorithms with several state-of-the-art methods on multiple datasets. The experimental results show that the proposed algorithms are competitive and effectively identify DNA-binding proteins. The major contributions of the present study are the establishment of a model and algorithm for feature representation that involves interaction effects and the development of a selective ensemble classification algorithm based on parameter perturbation. The proposed algorithms can also be applied to other biological questions related to amino acid sequences.
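The abstract does not give the algorithms in detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical two-class residue grouping (hydrophobic versus other) as the physicochemical property, builds one feature subspace per gap distance from property-pair frequencies, and selects base classifiers by validation accuracy before a majority vote. The grouping, the function names, and the accuracy-based selection rule are all illustrative assumptions; evolutionary information is omitted.

```python
from itertools import product

# Hypothetical reduced alphabet, for illustration only:
# H = hydrophobic residues, P = every other residue.
HYDROPHOBIC = set("AVLIMFWC")


def reduce_sequence(seq):
    """Map each residue to 'H' or 'P' (an assumed property grouping)."""
    return "".join("H" if r in HYDROPHOBIC else "P" for r in seq)


def gap_pair_features(seq, gap):
    """Frequencies of property pairs found at positions i and i + gap.

    The returned list follows the key order HH, HP, PH, PP.
    """
    reduced = reduce_sequence(seq)
    keys = ["".join(p) for p in product("HP", repeat=2)]
    counts = dict.fromkeys(keys, 0)
    n_pairs = max(len(reduced) - gap, 0)
    for i in range(n_pairs):
        counts[reduced[i] + reduced[i + gap]] += 1
    total = max(n_pairs, 1)  # avoid division by zero for short sequences
    return [counts[k] / total for k in keys]


def feature_subspaces(seq, gaps):
    """One feature subspace per gap distance; each could train one base classifier."""
    return {g: gap_pair_features(seq, g) for g in gaps}


def select_and_vote(predictions, val_accuracy, keep):
    """Keep the `keep` most accurate base classifiers, then majority-vote.

    `predictions` and `val_accuracy` are dicts keyed by gap distance;
    predictions are 0 or 1. Ranking by validation accuracy is an assumed
    selection criterion, not one stated in the abstract.
    """
    chosen = sorted(val_accuracy, key=val_accuracy.get, reverse=True)[:keep]
    votes = [predictions[g] for g in chosen]
    return int(sum(votes) * 2 > len(votes))
```

Varying the gap distance changes which residue pairs contribute to each subspace, which is one simple way to obtain base classifiers that disagree with one another before the selection step.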

Keywords: Combined fusion, Interaction fusion, Feature representation, Selective ensemble

Article history: Received 7 May 2018, Revised 12 August 2018, Accepted 13 September 2018, Available online 18 September 2018, Version of Record 21 November 2018.

Official article URL: https://doi.org/10.1016/j.knosys.2018.09.023