Addressing the train–test gap on traffic classification combined subflow model with ensemble learning

Authors:

Abstract

Previous machine learning-based approaches to network traffic classification assume that the training and testing network environments are the same. In most real cases this assumption does not hold, because traffic features change between environments. The result is the train–test gap: a model trained in the training environment performs poorly in the testing environment. To address this gap, we propose CSA, a traffic classification approach based on packet-wise segmentation and aggregation. First, we observe that some specific fragments of network flows, called subflows, are robust against the gap. We therefore segment traffic flows into different subflows. Then, with a justification of our feature selection, we extract 26 statistical features from each subflow and feed them into the corresponding sub-classifier. Second, we develop a method that aggregates the sub-classifier results based on their classification accuracy to improve overall classification performance. We experiment on five real datasets: three collected from the Northwest Center of CERNET (China Education and Research Network) and two from public traces. Comparison with state-of-the-art baselines demonstrates the effectiveness of CSA against the gap.
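
The segment-then-aggregate idea can be illustrated with a minimal Python sketch. It is not the CSA implementation: the abstract does not specify the segmentation rule, the 26 features, or the aggregation formula. The fixed-size packet windows, the five placeholder statistics, the accuracy-weighted vote, and the function names `segment_flow`, `subflow_features`, and `aggregate` are all assumptions made here for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def segment_flow(packet_sizes, window):
    """Split a flow (a list of packet sizes) into consecutive subflows.

    Assumption: fixed-size windows of `window` packets; the paper's actual
    packet-wise segmentation rule is not given in the abstract.
    """
    return [packet_sizes[i:i + window] for i in range(0, len(packet_sizes), window)]


def subflow_features(subflow):
    """Compute a few placeholder statistics for one subflow.

    The paper extracts 26 statistical features; these five are hypothetical
    stand-ins, not the paper's feature set.
    """
    return [len(subflow), min(subflow), max(subflow), mean(subflow), pstdev(subflow)]


def aggregate(predictions, accuracies):
    """Combine sub-classifier labels with an accuracy-weighted vote.

    Assumption: each sub-classifier's vote is weighted by its validation
    accuracy; the paper's exact aggregation method may differ.
    """
    scores = defaultdict(float)
    for label, acc in zip(predictions, accuracies):
        scores[label] += acc
    return max(scores, key=scores.get)


if __name__ == "__main__":
    flow = [60, 1500, 1500, 52, 52, 1400, 60]
    for sub in segment_flow(flow, window=3):
        print(sub, subflow_features(sub))
    # One accurate sub-classifier can outweigh two weaker ones.
    print(aggregate(["web", "p2p", "p2p"], [0.9, 0.4, 0.3]))
```

Under this weighting, a single sub-classifier with high accuracy can override a majority of weaker ones, which is one plausible way to let gap-robust subflows dominate the final decision.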

Keywords: Network traffic classification, Train–test gap, Subflow, Ensemble learning

Review history: Received 27 February 2020, Revised 24 June 2020, Accepted 26 June 2020, Available online 1 July 2020, Version of Record 2 July 2020.

Paper URL: https://doi.org/10.1016/j.knosys.2020.106192