Research in multimedia and multimodal parsing and generation

Author: Mark T. Maybury

Abstract

This overview introduces the emerging set of techniques for parsing and generating multiple media (e.g., text, graphics, maps, gestures) using multiple sensory modalities (e.g., auditory, visual, tactile). We first briefly introduce these techniques and motivate their value. Next we describe various computational methods for parsing input from heterogeneous media and modalities (e.g., natural language, gesture, gaze). We then survey complementary techniques for generating coordinated multimedia and multimodal output. Finally, we discuss systems that integrate both parsing and generation to enable multimedia dialogue in the context of intelligent interfaces. The article concludes by outlining fundamental problems that require further research.

Keywords: multimedia interfaces, multimodal interfaces, parsing, generation, intelligent interfaces

Official paper page: https://doi.org/10.1007/BF00849175