Learning, detection and representation of multi-agent events in videos
Authors:
Abstract
In this paper, we model multi-agent events in terms of a temporally varying sequence of sub-events, and propose a novel approach for learning, detecting and representing events in videos. The proposed approach has three main steps. First, in order to learn the event structure from training videos, we automatically encode the sub-event dependency graph, which is the learnt event model that depicts the conditional dependencies between sub-events. Second, we pose the problem of event detection in novel videos as clustering the maximally correlated sub-events using normalized cuts. The principal assumption made in this work is that events are composed of highly correlated chains of sub-events that have high weights (association) within a cluster and relatively low weights (disassociation) between clusters. The event detection does not require prior knowledge of the number of agents involved in an event and does not make any assumptions about the length of an event. Third, we recognize that any abstract event model should extend to representations related to human understanding of events. Therefore, we propose an extension of the CASE representation of natural language that provides a plausible means of interface between users and the computer. We show results of learning, detection, and representation of events for videos in the meeting, surveillance, and railroad monitoring domains.
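The second step casts event detection as grouping maximally correlated sub-events with normalized cuts. As a rough illustration of that idea only, and not the paper's implementation, the sketch below bipartitions a hypothetical sub-event affinity matrix `W` using the standard spectral relaxation of the normalized cut (second-smallest eigenvector of the normalized Laplacian); the affinity values, the two-way split, and the threshold at zero are all assumptions for illustration.

```python
import numpy as np

def normalized_cut_bipartition(W):
    """Two-way normalized-cut partition of sub-events.

    W: symmetric (n x n) affinity matrix whose entries encode the
    correlation between pairs of detected sub-events (an assumed
    input; the paper derives such weights from its learnt
    sub-event dependency graph).
    Returns a boolean cluster label per sub-event.
    """
    d = W.sum(axis=1)                       # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    # The eigenvector of the second-smallest eigenvalue (Fiedler vector)
    # is the relaxed real-valued cut indicator; its sign gives the split.
    fiedler = D_inv_sqrt @ eigvecs[:, 1]
    return fiedler > 0

if __name__ == "__main__":
    # Toy affinities: sub-events 0-2 strongly correlated (one event),
    # sub-events 3-4 strongly correlated (another), weak links between.
    W = np.array([
        [0.0, 0.9, 0.8, 0.1, 0.1],
        [0.9, 0.0, 0.7, 0.1, 0.1],
        [0.8, 0.7, 0.0, 0.2, 0.1],
        [0.1, 0.1, 0.2, 0.0, 0.9],
        [0.1, 0.1, 0.1, 0.9, 0.0],
    ])
    print(normalized_cut_bipartition(W))    # e.g. [ True  True  True False False]
```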
Keywords: Event learning, Event detection, Temporal logic, Edge weighted directed hypergraph, Normalized cut, Event representation, P-CASE
Article history: Received 5 July 2006, Revised 2 April 2007, Accepted 9 April 2007, Available online 14 April 2007.
DOI: https://doi.org/10.1016/j.artint.2007.04.002