Sequence graph transform (SGT): a feature embedding function for sequence data mining

Authors: Chitta Ranjan, Samaneh Ebrahimi, Kamran Paynabar

Abstract

Sequence feature embedding is a challenging task because sequences are unstructured: they are arbitrary strings of arbitrary length. Existing methods extract short-term dependencies efficiently but typically suffer from computational issues when extracting long-term dependencies. This work proposes Sequence Graph Transform (SGT), a feature embedding function that can extract a varying amount of short- to long-term dependencies without increasing the computation. SGT's properties are proved analytically, for interpretation, under normal and uniform distribution assumptions. Compared with existing methods, including state-of-the-art sequence/string kernels and LSTM, SGT features yield significantly superior results in sequence clustering and classification, with higher accuracy and lower computation.

Keywords: Classification, Clustering, Feature extraction, Search, Sequence

Paper URL: https://doi.org/10.1007/s10618-021-00813-0