Predicting essential proteins based on RNA-Seq, subcellular localization and GO annotation datasets

Authors:

Highlights:

Abstract

Essential proteins are indispensable to cellular life. Identifying them is critical for understanding the survival and development of organisms and the function of cellular machinery. Experimental methods are usually costly and time-consuming. To overcome these limitations, many computational methods have been proposed that discover essential proteins from the topological features of protein-protein interaction (PPI) networks and other biological information. In this paper, a novel method named RSG is proposed to predict essential proteins based on RNA-Seq, subcellular localization and GO annotation datasets. First, experiments show that RNA-Seq data is more advantageous than traditional gene expression data for predicting essential proteins, and data analysis shows that protein essentiality is closely related to subcellular localization information and protein GO terms. Next, a new weighted PPI network is constructed that integrates GO term information with the Pearson correlation coefficient of RNA-Seq data. Then, a weighted edge clustering coefficient is developed to measure the connectivity of protein nodes. RSG determines essentiality based not only on subcellular localization information but also on the co-expression level and functional similarity characterized by RNA-Seq and GO annotation data. In experiments on two species (Saccharomyces cerevisiae and Drosophila melanogaster), RSG was compared with other centrality methods, and the results show that RSG performs better in predicting essential proteins.
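As a rough illustration of the edge-weighting idea described in the abstract, the sketch below combines a Pearson correlation coefficient over RNA-Seq profiles with a GO-term overlap score. The abstract does not give the actual formulas, so everything specific here is an assumption: the Jaccard overlap used for functional similarity, the product used to combine the two scores, the clipping of negative correlations to zero, and the function names `pearson`, `go_similarity` and `edge_weights`. It should not be read as the RSG definition.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation is undefined, treated as 0 here
    return cov / (sx * sy)


def go_similarity(terms_u, terms_v):
    """Jaccard overlap of two GO term sets (assumed stand-in for functional similarity)."""
    union = terms_u | terms_v
    return len(terms_u & terms_v) / len(union) if union else 0.0


def edge_weights(edges, expression, go_terms):
    """Weight each PPI edge by GO overlap times non-negative co-expression (assumed rule)."""
    weights = {}
    for u, v in edges:
        pcc = pearson(expression[u], expression[v])
        # Assumption: anti-correlated pairs contribute no weight.
        weights[frozenset((u, v))] = go_similarity(go_terms[u], go_terms[v]) * max(pcc, 0.0)
    return weights
```

Here `edges` is an iterable of protein-name pairs, `expression` maps each protein to a list of expression values, and `go_terms` maps each protein to a set of GO identifiers. A weighted edge clustering coefficient could then be computed over these weights, but its form is not stated in the abstract and is left out of the sketch.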

Keywords: Essential proteins, RNA-Seq data, Subcellular localization, GO annotation data, Weighted PPI network

Review history: Received 11 July 2017, Revised 25 February 2018, Accepted 20 March 2018, Available online 21 March 2018, Version of Record 11 May 2018.

Official paper URL: https://doi.org/10.1016/j.knosys.2018.03.027