The use of attention and spatial information for rapid facial recognition in video

Authors:

Highlights:

Abstract

Bottom-up visual attention allows primates to quickly select regions of an image that contain salient objects. In artificial systems, restricting the task of object recognition to these regions allows faster recognition and unsupervised learning of multiple objects in cluttered scenes. A problem with this approach is that objects superficially dissimilar to the target are given the same consideration in recognition as similar objects. Likewise, in video, objects recognized in previous frames at locations distant from the current fixation point are given the same consideration as objects previously recognized closer to the current target of attention. Yet, due to the continuity of smooth motion, objects recently recognized in previous frames at locations close to the current focus of attention have a high probability of matching the current target. Here we investigate rapid pruning of the facial recognition search space using the already-computed low-level features that guide attention, together with spatial information derived from previous video frames. For each video frame, Itti & Koch's bottom-up visual attention algorithm is used to select salient locations based on low-level features such as contrast, orientation, color, intensity, flicker and motion. This algorithm has been shown to be highly effective at selecting faces as salient objects. Lowe's SIFT object recognition algorithm then extracts a signature of the attended object for comparison with the facial database. The database search is prioritized toward faces that better match the low-level features used to guide attention to the current candidate, or that were previously recognized near the current candidate's location. The SIFT signatures of the prioritized faces are then checked against the attended candidate for a match. By comparing the performance of Lowe's recognition algorithm and Itti & Koch's bottom-up attention model with and without search-space pruning, we demonstrate that our pruning approach improves the speed of facial recognition in video footage.
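As a rough illustration of the pruning step described in the abstract, the sketch below ranks database faces by a weighted combination of low-level feature similarity and spatial proximity to each face's last recognized location, then checks SIFT signatures in that priority order. The function names, record fields (`features`, `last_xy`, `sift_signature`), the cosine-similarity and exponential-decay scoring, and the weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def prioritize_database(candidate_features, candidate_xy, face_db,
                        feature_weight=0.5, distance_scale=100.0):
    """Rank stored faces so that likely matches are compared first.

    candidate_features : low-level feature vector (contrast, orientation,
                         color, intensity, flicker, motion) at the attended
                         location, as produced by the saliency model.
    candidate_xy       : (x, y) of the current focus of attention.
    face_db            : list of dicts with hypothetical keys 'features',
                         'last_xy' and 'sift_signature'.
    """
    scores = []
    c = np.asarray(candidate_features, dtype=float)
    for face in face_db:
        # Similarity of the guiding low-level features (cosine similarity).
        f = np.asarray(face['features'], dtype=float)
        feat_sim = float(f @ c / (np.linalg.norm(f) * np.linalg.norm(c) + 1e-9))

        # Spatial proximity to where this face was last recognized; nearby
        # previous detections get priority (smooth-motion assumption).
        if face['last_xy'] is not None:
            dist = np.linalg.norm(np.asarray(face['last_xy']) - np.asarray(candidate_xy))
            prox = float(np.exp(-dist / distance_scale))
        else:
            prox = 0.0

        scores.append(feature_weight * feat_sim + (1.0 - feature_weight) * prox)

    # Highest combined score first: these faces are checked with SIFT first.
    order = np.argsort(scores)[::-1]
    return [face_db[i] for i in order]

def recognize(candidate_signature, ranked_faces, match_fn, threshold=0.8):
    """Walk the prioritized list and stop at the first SIFT match, so
    dissimilar or spatially distant faces are rarely reached."""
    for face in ranked_faces:
        if match_fn(candidate_signature, face['sift_signature']) >= threshold:
            return face
    return None
```

In use, `match_fn` would wrap a SIFT keypoint-matching score between the attended candidate and a stored signature; the early exit on the first match is what yields the speed-up the abstract reports.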

Keywords: Visual attention, Bottom-up, Face recognition, Video processing

Article history: Received 16 December 2004, Revised 29 August 2005, Accepted 15 September 2005, Available online 11 November 2005.

Article link: https://doi.org/10.1016/j.imavis.2005.09.008