Adaptive watermark generation mechanism based on time series prediction for stream processing

作者:Yang Song, Yunchun Li, Hailong Yang, Jun Xu, Zerong Luan, Wei Li

摘要

The data stream processing framework processes the stream data based on event-time to ensure that the request can be responded to in real-time. In reality, streaming data usually arrives out-of-order due to factors such as network delay. The data stream processing framework commonly adopts the watermark mechanism to address the data disorderedness. Watermark is a special kind of data inserted into the data stream with a timestamp, which helps the framework to decide whether the data received is late and thus be discarded. Traditional watermark generation strategies are periodic; they cannot dynamically adjust the watermark distribution to balance the responsiveness and accuracy. This paper proposes an adaptive watermark generation mechanism based on the time series prediction model to address the above limitation. This mechanism dynamically adjusts the frequency and timing of watermark distribution using the disordered data ratio and other lateness properties of the data stream to improve the system responsiveness while ensuring acceptable result accuracy. We implement the proposed mechanism on top of Flink and evaluate it with real-world datasets. The experiment results show that our mechanism is superior to the existing watermark distribution strategies in terms of both system responsiveness and result accuracy.

论文关键词:data stream processing, watermark, time series based prediction, dynamic adjustment

论文评审过程:

论文官网地址:https://doi.org/10.1007/s11704-020-0206-7