Unsupervised traffic classification using flow statistical properties and IP packet payload
Authors:
Abstract
In network traffic classification, “unknown applications” remain a difficult, unsolved problem. Conventional supervised classification methods assign every traffic flow to one of a set of predefined classes and therefore cannot handle unknown applications for which no labeled training data exist. Some unsupervised clustering algorithms, such as k-means, have been applied to group traffic flows automatically, but they produce a large number of clusters that do not correctly represent the small number of real applications. To address the problem of unknown applications, we propose a novel unsupervised approach that can discover application-based traffic classes and classify traffic flows according to the applications that generated them. In the proposed approach, flow statistical properties and IP packet payload are used in combination to discover traffic classes in the training stage. We introduce a bag-of-words (BoW) model to represent the content of clusters constructed from flow statistical features, and apply latent semantic analysis (LSA) to aggregate similar traffic clusters based on their payload content. In the testing stage, only flow statistical features are used to classify traffic flows, which protects user privacy and handles known encrypted applications without inspecting IP packets. A number of experiments are carried out on a real-world traffic dataset to demonstrate the effectiveness and robustness of the proposed approach.
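The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not confirm: the flow-statistics clusters are taken as already computed (for example by k-means), a "word" is a single payload byte, cluster columns are L2-normalised before the SVD, clusters are merged when the cosine similarity of their latent vectors reaches a hypothetical threshold, and test flows are assigned to the nearest cluster centroid. The function names (`bow_matrix`, `lsa_embed`, `aggregate`, `classify`) and all sample values are invented for illustration.

```python
import numpy as np


def bow_matrix(cluster_payloads):
    """Return a 256 x n_clusters matrix of byte counts, one column per cluster.

    Assumption: a "word" is a single payload byte.
    """
    counts = np.zeros((256, len(cluster_payloads)))
    for j, payloads in enumerate(cluster_payloads):
        for payload in payloads:
            for byte in payload:
                counts[byte, j] += 1
    return counts


def lsa_embed(counts, rank):
    """Project each cluster (column) into a rank-dimensional latent space."""
    norms = np.linalg.norm(counts, axis=0, keepdims=True)
    unit = counts / np.where(norms == 0, 1.0, norms)  # L2-normalise columns
    _, s, vt = np.linalg.svd(unit, full_matrices=False)
    return (s[:rank, None] * vt[:rank]).T  # shape: n_clusters x rank


def aggregate(embeddings, threshold=0.9):
    """Merge clusters whose latent vectors have cosine similarity >= threshold.

    Returns one class label per cluster, numbered in order of first appearance.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.where(norms == 0, 1.0, norms)
    sim = unit @ unit.T
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                parent[find(j)] = find(i)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def classify(flow, centroids, cluster_to_class):
    """Testing stage: nearest centroid on flow statistics only, no payload."""
    nearest = int(np.argmin(np.linalg.norm(centroids - flow, axis=1)))
    return cluster_to_class[nearest]


# Toy training data: payloads grouped by (pre-computed) flow-statistics cluster.
cluster_payloads = [
    [b"GET /index.html HTTP/1.1", b"GET /a.png HTTP/1.1"],  # HTTP-like
    [b"GET /b.css HTTP/1.1"],                               # HTTP-like
    [b"\x16\x03\x01\x02\x00\x01\x00\x01"],                  # TLS-like
    [b"\x16\x03\x01\x00\xc8\x01\x00\x00"],                  # TLS-like
]
# Hypothetical centroids: (mean packet size in bytes, flow duration in seconds).
centroids = np.array([[500.0, 0.2], [520.0, 0.3], [120.0, 1.5], [130.0, 1.4]])

labels = aggregate(lsa_embed(bow_matrix(cluster_payloads), rank=2))
print(labels)  # [0, 0, 1, 1]
print(classify(np.array([505.0, 0.2]), centroids, labels))  # 0
```

In this toy case the four flow clusters collapse into two payload-based classes, and a new flow is then labelled from its statistical features alone, which mirrors the privacy argument made for the testing stage. The rank and threshold would need tuning on real traffic.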
Keywords: Traffic classification, Network security, Unsupervised learning, Cluster aggregation
Review history: Received 17 February 2012, Revised 18 September 2012, Accepted 8 November 2012, Available online 13 December 2012.
Official URL: https://doi.org/10.1016/j.jcss.2012.11.004