Synthetic sampling from small datasets: A modified mega-trend diffusion approach using k-nearest neighbors

Authors:

Highlights:

• Tackle small dataset challenges for both supervised and unsupervised learning tasks using synthetic data generation.

• A nearest neighbor-based mega-trend diffusion method is proposed in this research.

• The proposed method generates synthetic data for both supervised and unsupervised learning tasks.

• The method focuses on retaining attribute relations similar to those in the original dataset, reducing the information gap.

• The training data is used only to identify the domain ranges, which in turn improves the privacy of any sensitive data.
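
To make the idea of identifying domain ranges concrete, the following is a minimal sketch of how the classical mega-trend diffusion (MTD) technique is commonly described as estimating an extended range for a single attribute. It is not the k-nearest-neighbor variant proposed in the paper, whose details are not given here. The function names `mtd_domain_range` and `sample_in_range`, the `eps` constant of 1e-20, and the use of plain uniform sampling (instead of membership-weighted sampling) are assumptions made for illustration.

```python
import math
import random
from statistics import variance


def mtd_domain_range(values, eps=1e-20):
    """Estimate an extended (lower, upper) range for one attribute.

    Sketch of the classical MTD bounds; names and eps are illustrative.
    """
    lo, hi = min(values), max(values)
    center = (lo + hi) / 2.0
    # Count the observations on each side of the center point.
    n_lower = sum(1 for v in values if v < center)
    n_upper = sum(1 for v in values if v > center)
    if n_lower == 0 or n_upper == 0:
        # Degenerate case (e.g. all values equal): keep the observed range.
        return lo, hi
    skew_lower = n_lower / (n_lower + n_upper)
    skew_upper = n_upper / (n_lower + n_upper)
    # log(eps) is negative, so spread is non-negative.
    spread = -2.0 * variance(values) * math.log(eps)
    lower = center - skew_lower * math.sqrt(spread / n_lower)
    upper = center + skew_upper * math.sqrt(spread / n_upper)
    # The extended range should never be narrower than the observed one.
    return min(lower, lo), max(upper, hi)


def sample_in_range(values, n, seed=0):
    """Draw n synthetic values uniformly from the estimated range.

    Uniform sampling is a simplification assumed for this sketch.
    """
    rng = random.Random(seed)
    lower, upper = mtd_domain_range(values)
    return [rng.uniform(lower, upper) for _ in range(n)]
```

In this sketch the original observations are used only to compute the bounds; the synthetic values are then drawn from the range rather than copied from the data, which is the intuition behind the privacy point above.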

Keywords: Small dataset, Mega-trend diffusion, k-nearest neighbor, Artificial sample generation, Correlation difference, Sampling method

Review history: Received 7 May 2021, Revised 21 September 2021, Accepted 2 November 2021, Available online 14 November 2021, Version of Record 29 December 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.107687