Real-time steganalysis for streaming media based on multi-channel convolutional sliding windows

Authors:

Abstract

In recent years, covert communication techniques based on Voice over Internet Protocol (VoIP) have attracted growing attention, and they pose a significant threat to the security of cyberspace. This paper focuses on improving both the accuracy and the efficiency of covert communication detection, and proposes a real-time VoIP steganalysis model for that purpose. Multi-channel convolutional sliding windows (CSW) are developed to analyze the correlations between a given frame and its neighboring frames in a VoIP signal. Within each sliding window, two feature extraction channels extract correlation features from the input signal. Each channel consists of multiple convolutional layers with a large number of convolution kernels. The extracted features are then fed to a feed-forward fully connected network for feature fusion. By analyzing the statistical distribution of these features, the discriminator determines whether the input speech signal contains covert information. Several experiments test the proposed model's detection performance under various conditions, such as different embedding rates and different speech lengths. The results show that the proposed model detects steganographic voice streams efficiently and accurately, especially at low embedding rates. Further experiments demonstrate that the model attains nearly real-time detection of VoIP speech signals and achieves state-of-the-art performance.
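The pipeline described in the abstract (sliding window, two convolutional channels, fully connected fusion, binary decision) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the scalar frame representation, the kernel widths (3 and 5), the window size and stride, the single convolutional layer per channel, the global max pooling, and the random untrained weights are all assumptions chosen only to keep the example small and runnable.

```python
import math
import random

def conv1d(seq, kernel):
    """Valid 1-D cross-correlation of a sequence with a kernel."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def relu(xs):
    return [max(0.0, x) for x in xs]

def channel_features(window, kernels):
    """One feature channel: conv -> ReLU -> global max pool, per kernel.
    (Assumption: one conv layer; the paper uses several per channel.)"""
    return [max(relu(conv1d(window, k))) for k in kernels]

def sliding_windows(frames, size, stride):
    """Yield overlapping windows of consecutive frames."""
    for start in range(0, len(frames) - size + 1, stride):
        yield frames[start:start + size]

def make_kernels(rng, count, width):
    """Random, untrained kernels (hypothetical stand-in for learned ones)."""
    return [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(count)]

def score_window(window, chan_a, chan_b, fc_weights, fc_bias):
    """Fuse both channels with one fully connected unit and a sigmoid."""
    feats = channel_features(window, chan_a) + channel_features(window, chan_b)
    z = sum(w * f for w, f in zip(fc_weights, feats)) + fc_bias
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)
chan_a = make_kernels(rng, count=4, width=3)   # channel 1: short-range correlations
chan_b = make_kernels(rng, count=4, width=5)   # channel 2: longer-range correlations
fc_weights = [rng.uniform(-1, 1) for _ in range(8)]
fc_bias = 0.0

# Assumption: each VoIP frame is reduced to a single scalar value.
frames = [rng.random() for _ in range(40)]
scores = [score_window(w, chan_a, chan_b, fc_weights, fc_bias)
          for w in sliding_windows(frames, size=10, stride=5)]
```

Each entry of `scores` is a per-window value between 0 and 1 that a trained model would interpret as the likelihood of hidden data in that window. Because a score is produced as soon as a window is full, this layout lends itself to streaming detection, which is the property the abstract emphasizes.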

Keywords: Voice over IP (VoIP), Real-time steganalysis, Convolutional sliding window, Multi-channel feature extraction

Review history: Received 31 December 2020, Revised 2 August 2021, Accepted 30 September 2021, Available online 7 December 2021, Version of Record 16 December 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.107561