Multimedia surrogates for video gisting: Toward combining spoken words and imagery

Abstract

Good surrogates that allow people to quickly derive the gist of a video without taking the time to view it in full are crucial to video retrieval and browsing systems. Although many kinds of textual and visual surrogates are used in video retrieval systems, few audio surrogates are used in practice. To evaluate the effectiveness of audio surrogates alone and in combination with one kind of visual surrogate, fast forwards, a user study with 48 participants was conducted. The study investigated the effects of manually and automatically generated spoken keywords and spoken descriptions, rendered with a text-to-speech synthesizer, on six specific video gisting tasks. Results demonstrate that manually generated spoken descriptions are better than both manually generated spoken keywords and fast forwards for video gisting. Spoken keywords, whether manually or automatically generated, and fast forwards are both better than automatically extracted descriptions. High-quality spoken summaries were found to be very effective for video gisting. Combining fast forwards with either type of spoken text was not significantly better than any of the individual spoken surrogates; however, the visual elements added subjective value to the user experience. Adding spoken descriptions or keywords as surrogates to video retrieval and browsing systems is recommended.

Keywords: Video retrieval, Surrogates, Video summarization, Video browsing, Human factors

Article history: Received 15 August 2008, Revised 20 April 2009, Accepted 1 May 2009, Available online 9 July 2009.

Official paper URL: https://doi.org/10.1016/j.ipm.2009.05.007