Exact memory–constrained UPGMA for large scale speaker clustering

Authors:

Highlights:

• We focus on exact hierarchical clustering of large sets of utterances.

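• Hierarchical clustering of such large sets is challenging due to memory constraints, since the standard matrix-based algorithm stores all pairwise similarities, whose number grows quadratically with the number of utterances.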

• We propose an efficient, exact and parallel implementation of UPGMA clustering.

• We extend the Clustering Features concept to speaker recognition scoring functions.

• We assess the efficiency of our method on datasets including 4 million utterances.
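The highlights above mention two ideas that fit together: summarizing each cluster by a Clustering Feature, so that cluster-to-cluster scores can be computed without a pairwise matrix, and running UPGMA with the Reciprocal Nearest Neighbor (nearest-neighbor chain) scheme, which gives exact results for average linkage. The sketch below illustrates these ideas under strong assumptions and is not the paper's implementation. It assumes a purely bilinear score s(x, y) = x^T W y with a symmetric matrix W, for example a dot product between length-normalized embeddings. For such a score, the average pairwise score between clusters A and B equals sum_A^T W sum_B / (n_A n_B), so a feature made of (count, vector sum) is enough. PLDA and PSVM scores also contain quadratic and linear terms, which would need additional per-cluster statistics. The names `bilinear` and `upgma_nn_chain` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def bilinear(x: Vector, W: Matrix, y: Vector) -> float:
    """Return x^T W y (hypothetical bilinear scoring function)."""
    return sum(x[i] * sum(W[i][j] * y[j] for j in range(len(y))) for i in range(len(x)))


def upgma_nn_chain(points: List[Vector], W: Matrix) -> List[Tuple[int, int, float, int]]:
    """Exact UPGMA (average linkage) on similarities s(x, y) = x^T W y, W symmetric.

    Each cluster is kept only as a feature (count, vector sum), so the average
    pairwise similarity of clusters A and B is sum_A^T W sum_B / (n_A * n_B)
    and no n x n similarity matrix is stored.
    Returns merges as (id_a, id_b, similarity, new_id); input points have ids
    0..n-1 and the k-th merge creates id n + k.
    """
    n = len(points)
    feats: Dict[int, Tuple[int, Vector]] = {i: (1, list(p)) for i, p in enumerate(points)}

    def sim(a: int, b: int) -> float:
        na, sa = feats[a]
        nb, sb = feats[b]
        return bilinear(sa, W, sb) / (na * nb)

    merges: List[Tuple[int, int, float, int]] = []
    chain: List[int] = []
    next_id = n
    while len(feats) > 1:
        if not chain:
            chain.append(next(iter(feats)))  # start a new chain from any active cluster
        top = chain[-1]
        prev: Optional[int] = chain[-2] if len(chain) > 1 else None
        # Find the most similar active cluster to `top`; ties favour `prev`.
        if prev is not None:
            best, best_sim = prev, sim(top, prev)
        else:
            best, best_sim = -1, float("-inf")
        for c in feats:
            if c != top and c != prev:
                s = sim(top, c)
                if s > best_sim:
                    best, best_sim = c, s
        if best == prev:
            # top and prev are reciprocal nearest neighbours: merge them.
            chain.pop()
            chain.pop()
            na, sa = feats.pop(top)
            nb, sb = feats.pop(prev)
            feats[next_id] = (na + nb, [u + v for u, v in zip(sa, sb)])
            merges.append((min(top, prev), max(top, prev), best_sim, next_id))
            next_id += 1
        else:
            chain.append(best)  # extend the chain towards the nearest neighbour
    return merges


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix: plain dot-product score
    pts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    for merge in upgma_nn_chain(pts, W):
        print(merge)
```

In this sketch, memory grows linearly with the number of utterances instead of with the n(n-1)/2 similarities stored by matrix-based UPGMA; the price is recomputing cluster scores on the fly. The chain is exact for average linkage because merging two clusters never makes the merged cluster more similar to a third one than the closer of its two parts was. Merges are returned in the order the chain discovers them, which is not necessarily sorted by similarity, and the parallelization mentioned in the highlights is not attempted here.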

Keywords: Clustering, UPGMA, Similarity measures, Reciprocal Nearest Neighbor, PLDA, PSVM, Silhouette, Cluster quality measures

Article history: Received 12 February 2019, Revised 7 April 2019, Accepted 24 June 2019, Available online 25 June 2019, Version of Record 28 June 2019.

Official paper page: https://doi.org/10.1016/j.patcog.2019.06.018