Exploiting homogeneity in protein sequence clusters for construction of protein family hierarchies

Authors:

Abstract

In proteomics, protein hierarchies based on sequence analysis have been applied extensively to automate the annotation of new proteins and to facilitate the discovery and analysis of protein families. However, ambiguous similarities in large databases make it difficult to deliver protein family hierarchies with both favorable sensitivity and favorable specificity. This work develops the HomoClust algorithm, which exploits the homogeneity of protein sequences when generating protein family hierarchies. HomoClust improves on the clustering quality of traditional hierarchical clustering algorithms by adopting different clustering mechanisms for different levels of sequence similarity. By incorporating homogeneity detection into the clustering process, HomoClust increases the sensitivity of protein clusters without reducing their high specificity.

Keywords: Protein sequence clustering, Family analysis, Twilight zone, Hierarchical algorithm

Article history: Received 5 July 2005, Revised 8 December 2005, Accepted 12 December 2005, Available online 3 February 2006.

Official URL: https://doi.org/10.1016/j.patcog.2005.12.008