Generating insights through data preparation, visualization, and analysis: Framework for combining clustering and data visualization techniques for low-cardinality sequential data

Authors:

Highlights:

• a framework is introduced for identifying patterns in the types of sequential data that are broadly present in real-world scenarios

• the purpose and details of the framework are described in a formal way, and also via a series of illustrative examples

• the framework is tested and validated through a comprehensive example using a large formally-maintained data set

• the broad applicability of the introduced framework is discussed

Abstract

In this paper, we introduce a novel approach for identifying and testing relationships and patterns in the types of sequential data that are broadly present in a number of different real-world scenarios and environments. The proposed two-phase framework combines data preparation, data visualization and clustering techniques in an innovative way. The first phase of the framework explores large amounts of sequential data in stages that can be undertaken iteratively. Those stages include data preparation; counting and value-based ordering; distribution visualization; and subsequence length determination, confirmation and re-visualization. The second phase of the framework explores motif-based sequence differences between data cohorts that are created using descriptive attributes, and visualizes how those differences change over time and across attribute values. To illustrate the analytical power of the proposed framework, we present a comprehensive example that applies the framework to a large formally-maintained research data set collected and managed by the US Census Bureau. The framework, and the presented example, utilize visualization as an analytics tool and not just a presentation accessory.
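
As a rough illustration of the counting and value-based ordering stage named above, the following minimal Python sketch counts fixed-length contiguous subsequences in a handful of low-cardinality sequences and orders them by frequency. It is not the authors' implementation: the function names, the sample sequences and the single-letter codes are hypothetical, and the fixed window length of 3 merely stands in for the subsequence length that the framework determines and confirms iteratively.

```python
from collections import Counter


def count_subsequences(sequences, length):
    """Count every contiguous subsequence of `length` symbols across all sequences."""
    counts = Counter()
    for seq in sequences:
        # Slide a fixed-size window over the sequence, one position at a time.
        for start in range(len(seq) - length + 1):
            counts[seq[start:start + length]] += 1
    return counts


def ordered_distribution(counts):
    """Return (subsequence, count) pairs, most frequent first; ties sorted alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sequences: one symbol per time period, drawn from a small alphabet.
sequences = ["EEEUEE", "EEUUNN", "NNNEEE"]

counts = count_subsequences(sequences, length=3)
top = ordered_distribution(counts)[:3]

for subsequence, count in top:
    print(subsequence, count)
```

The ordered list produced this way is the kind of input that a distribution plot could take; frequent subsequences near the top of the ordering are candidates for the motifs compared between cohorts in the second phase.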

Keywords: Data visualization, Sequential data, Low-cardinality data, Data preparation, Clustering, Motifs

Review history: Received 27 March 2019, Revised 2 July 2019, Accepted 1 August 2019, Available online 8 August 2019, Version of Record 31 August 2019.

Paper URL: https://doi.org/10.1016/j.dss.2019.113119