Text detection, localization, and tracking in compressed video

Authors:

Abstract

Video text plays an important role in semantic video analysis, indexing, and retrieval because it is closely related to the content of the video. The fundamental steps of text-based video analysis, browsing, and retrieval are typically text detection, localization, tracking, segmentation, and recognition. Video sequences are commonly stored in compressed form, often using MPEG coding. This paper proposes a unified framework for text detection, localization, and tracking in compressed video using discrete cosine transform (DCT) coefficients. A coarse-to-fine detection method finds text blocks from the block DCT texture intensity, where the texture intensity of an 8×8 block in an intra-frame is approximated by seven AC coefficients. The candidate text block regions are then verified and refined. Text block regions are localized and tracked using the horizontal and vertical projection profiles of the block texture intensity. Text tracking determines the frames in which each text line appears and disappears. Experimental results show the effectiveness of the proposed methods.
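The block texture intensity and projection profile steps can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several details are assumptions: the abstract does not say which seven AC coefficients are used or how they are combined, so the positions in `ASSUMED_AC_POSITIONS` and the sum of absolute values are illustrative choices, as are all function names and the threshold. The sketch also recomputes the DCT from pixels to stay self-contained, whereas a compressed-domain method would read the coefficients from the MPEG bitstream.

```python
import math

BLOCK = 8

# ASSUMPTION: the abstract does not list the seven AC coefficients.
# These low-frequency (row, col) positions are an illustrative choice.
ASSUMED_AC_POSITIONS = [(0, 1), (0, 2), (0, 3), (1, 0), (2, 0), (3, 0), (1, 1)]


def dct2_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (list of 8 lists of 8 numbers)."""
    def alpha(k):
        return math.sqrt(1.0 / BLOCK) if k == 0 else math.sqrt(2.0 / BLOCK)

    out = [[0.0] * BLOCK for _ in range(BLOCK)]
    for u in range(BLOCK):
        for v in range(BLOCK):
            total = 0.0
            for x in range(BLOCK):
                for y in range(BLOCK):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * BLOCK))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * BLOCK)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def texture_intensity(coeffs):
    """ASSUMPTION: intensity = sum of absolute values of the chosen AC terms."""
    return sum(abs(coeffs[u][v]) for u, v in ASSUMED_AC_POSITIONS)


def intensity_map(image):
    """Texture intensity per 8x8 block; image sides must be multiples of 8."""
    rows, cols = len(image) // BLOCK, len(image[0]) // BLOCK
    result = []
    for br in range(rows):
        row = []
        for bc in range(cols):
            block = [image[br * BLOCK + i][bc * BLOCK:(bc + 1) * BLOCK]
                     for i in range(BLOCK)]
            row.append(texture_intensity(dct2_8x8(block)))
        result.append(row)
    return result


def projection_profiles(tmap):
    """Horizontal profile (one sum per block row) and vertical profile
    (one sum per block column) of the texture intensity map."""
    horizontal = [sum(row) for row in tmap]
    vertical = [sum(col) for col in zip(*tmap)]
    return horizontal, vertical


def runs_above(profile, threshold):
    """Inclusive (start, end) index ranges where the profile exceeds threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs
```

Runs in the horizontal profile give candidate block rows of a text line, and runs in the vertical profile give its candidate block columns. Comparing these ranges across successive intra-frames is one plausible way to decide when a text line appears or disappears, though the abstract does not spell out the tracking rule.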

Keywords: Text detection, DCT coefficient, Text tracking, Compressed video, Text line, Text localization, MPEG

Review history: Received 21 July 2006, Revised 3 June 2007, Accepted 11 June 2007, Available online 22 June 2007.

Official paper URL: https://doi.org/10.1016/j.image.2007.06.005