Enhancing learning algorithms to support data with short sequence features by automated feature discovery

Authors:

Abstract

In this paper, we propose a VECtor DIScovery approach, called VECDIS, which enhances the learning performance of existing classifiers directly on various data types and can discover features composed of multiple feature types for explanatory purposes. The data types may be combinations of multivariate, short time-series, or short sequential data. Each feature in the dataset may hold a single item and/or a list of ordered items of varying size. The approach handles raw vector data without prior manipulation (i.e., preprocessing). The discovered features are built from vector and non-vector mathematical relations. At each iteration, the algorithm generates new vector features and mathematical-expression features, which are passed on to the next iterative step or exchanged with previously generated features. The approach can search for and automatically discover thousands of different features (sequence manipulations) computed on the sequence features. We performed a large number of experiments with various synthetic and simulated datasets and with a wide range of classifiers. The results show that VECDIS significantly enhanced the classification performance of existing classifiers on datasets that have multiple feature types with short sequence features. Nevertheless, there is no guarantee that the mathematical library presented in this paper suits all sequence datasets or leads to the discovery of a valuable feature set. Therefore, VECDIS allows the mathematical library to be expanded or exchanged as desired.
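To make the idea of applying a mathematical library to short sequence features concrete, here is a minimal Python sketch. It is not the VECDIS algorithm: the `LIBRARY` dictionary, the `expand_features` function, and the sample record are hypothetical names and data chosen for illustration, and the iterative search and feature exchange described in the abstract are omitted. It shows only how a record that mixes scalar fields with variable-length sequences might be expanded into scalar features that an ordinary classifier can consume.

```python
from statistics import mean

# Hypothetical mini-library of sequence-to-scalar operations.
# This is an assumption for illustration, not the library used in the paper.
LIBRARY = {
    "mean": mean,
    "max": max,
    "min": min,
    "range": lambda seq: max(seq) - min(seq),
    "last_minus_first": lambda seq: seq[-1] - seq[0],
    "length": len,
}


def expand_features(record, library=LIBRARY):
    """Derive scalar features from one record.

    Scalar fields are copied unchanged. Each list- or tuple-valued field
    yields one derived feature per operation in the library.
    """
    features = {}
    for name, value in record.items():
        if isinstance(value, (list, tuple)):
            for op_name, op in library.items():
                features[f"{op_name}({name})"] = op(value)
        else:
            features[name] = value
    return features


# A record with one scalar field and one short sequence field.
record = {"age": 41, "readings": [2.0, 3.5, 3.0]}
features = expand_features(record)
```

Because the library is a plain dictionary, operations can be added or swapped without touching `expand_features`, which loosely mirrors the abstract's point that the mathematical library can be expanded or exchanged.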

Keywords: Feature discovery, Preprocessing, Sequential data, Feature construction, Feature selection, Short sequence

Review history: Received 11 February 2013, Revised 21 July 2013, Accepted 21 July 2013, Available online 31 July 2013.

Official paper URL: https://doi.org/10.1016/j.knosys.2013.07.013