Grounding language in perception

Author: Jeffrey Mark Siskind

Abstract

This paper describes an implemented computer program that recognizes the occurrence of simple spatial motion events in simulated video input. The program receives an animated line-drawing as input and produces as output a semantic representation of the events occurring in that animation. This paper suggests that the notions of support, contact, and attachment are crucial to specifying many simple spatial motion event types and presents a logical notation for describing classes of events that incorporates such notions as primitives. It then suggests that the truth values of such primitives can be recovered from perceptual input by a process of counterfactual simulation, predicting the effect of hypothetical changes to the world on the immediate future. Finally, it suggests that such counterfactual simulation is performed using knowledge of naive physical constraints such as substantiality, continuity, gravity, and ground plane. This paper describes the algorithms that incorporate these ideas in the program and illustrates the operation of the program on sample input.
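The idea of recovering a primitive such as support by counterfactual simulation can be illustrated with a minimal sketch. This is not the paper's algorithm: the `Block` type, the `rests_on`, `falls`, and `supports` functions, and the three-object scene are all hypothetical, and the world is reduced to axis-aligned rectangles with only gravity and a ground plane at y = 0. Under those assumptions, "a supports b" is taken to mean that b is stable as observed but would fall if a were hypothetically removed.

```python
from dataclasses import dataclass

EPS = 1e-9  # tolerance for treating two coordinates as equal


@dataclass(frozen=True)
class Block:
    """Hypothetical axis-aligned rectangle; (x, y) is the bottom-left corner."""
    name: str
    x: float
    y: float
    w: float
    h: float  # assumed > 0, which guarantees the recursion in falls() ends


def rests_on(upper: Block, lower: Block) -> bool:
    """True if upper's bottom edge touches lower's top edge with horizontal overlap."""
    touching = abs(upper.y - (lower.y + lower.h)) < EPS
    overlapping = upper.x < lower.x + lower.w and lower.x < upper.x + upper.w
    return touching and overlapping


def falls(block: Block, world: list) -> bool:
    """Crude prediction of the immediate future: a block falls unless it sits
    on the ground plane or rests on some block that does not itself fall."""
    if abs(block.y) < EPS:  # ground plane constraint
        return False
    return not any(
        rests_on(block, other) and not falls(other, world)
        for other in world
        if other != block
    )


def supports(a: Block, b: Block, world: list) -> bool:
    """Counterfactual test: b is stable now, but falls once a is removed."""
    without_a = [o for o in world if o != a]
    return not falls(b, world) and falls(b, without_a)


# Hypothetical scene: a ball on a box on a table.
table = Block("table", 0, 0, 4, 1)
box = Block("box", 1, 1, 1, 1)
ball = Block("ball", 1, 2, 1, 1)
world = [table, box, ball]
```

In this scene, `supports(table, ball, world)` holds even though the two never touch, because removing the table makes the box fall and the ball with it. A real system would need a far richer simulator (centers of mass, joints, substantiality, continuity) than this stacking check.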

Keywords: visual event perception, lexical semantics, motion analysis


Paper URL: https://doi.org/10.1007/BF00849726