ChronoClust: Density-based clustering and cluster tracking in high-dimensional time-series data

Authors:

Highlights:

Abstract

In many scientific disciplines, new high-throughput technologies are giving rise to vast quantities of high-dimensional time-series data. A common requirement is to identify clusters of data points with similar characteristics in this experimental data and to track their development over time. In this article we present ChronoClust, a novel density-based clustering algorithm that processes a time series of discrete datasets, generates arbitrarily shaped clusters, and explicitly tracks their temporal evolution. We provide a conceptualisation of ChronoClust’s parameters and guidelines for selecting their values. The development of ChronoClust was motivated by the need to characterise the immune response to disease. We therefore demonstrate and evaluate ChronoClust’s operation on two immune-related datasets: (1) a synthetic dataset exhibiting the temporal evolution qualities of the immune response as they would be observed through mass cytometry, a cutting-edge high-throughput technology, and (2) a flow cytometry dataset capturing the immune response in West Nile Virus (WNV)-infected mice. Our qualitative and quantitative analyses confirm ChronoClust’s suitability for this type of problem: the temporal relationships engineered into the synthetic dataset are successfully recovered, and the cell populations and dynamics unveiled in the WNV dataset match those identified by a domain expert. ChronoClust is applicable beyond immunology, and we provide an open-source Python implementation to support its wider adoption. We additionally make our two datasets publicly available to promote reproducible research and third-party work on temporal clustering and cluster tracking.

Keywords: Density based clustering, Data mining, Temporal cluster tracking, Cytometry, Immunology, West Nile virus, Bioinformatics, Exploratory data analysis

Article history: Received 9 October 2018, Revised 17 January 2019, Accepted 15 February 2019, Available online 20 February 2019, Version of Record 18 April 2019.

Official paper URL: https://doi.org/10.1016/j.knosys.2019.02.018