Conceptual description of visual scenes from linguistic models

Abstract

As model-based vision moves towards handling imprecise descriptions such as “a long bench is in front of the tree”, it has to confront questions involving widely variable shapes in unclear positions. Such descriptions may be said to be “conceptual” in the sense that they provide a loose set of constraints permitting a range of instantiations for the scene. One validation of a computational system's ability to handle such descriptions is provided by immediate visualization, which tells the user whether the bench is of the right shape and has been positioned correctly. Such a visualization must handle imprecision in Shape and Spatial Pose and, for dynamic vision, in Object Articulation and Motion Parameters as well. The visualization task is a concretization, which consists of generating an “instance” of the scene or action being described. The principal requirement for concretizing the conceptual model is a large visual database of objects and actions, along with a set of constraints corresponding to default dependencies in the domain. In our work, the resulting set of constraints is combined using multi-dimensional fuzzy functions called continuum fields (potentials). A set of experiments was conducted to determine the parameters of these continuum fields. An instance is generated by identifying minima in the continuum fields involved in generating the shape, position and motion. These minima are then used to create default instantiations of the objects described. The resulting image or animation may be considered the “most likely” visualization, and if it matches the linguistic description, the continuum fields selected are a good model for the conceptual content in the linguistic model of the scene. We present examples of scene reconstruction from conceptual descriptions of urban parks.
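The idea of combining constraints as continuum fields and reading off a minimum can be illustrated with a small sketch for the position of the bench in “a long bench is in front of the tree”. Everything specific here is an assumption made for illustration and is not taken from the work itself: the function names, the Gaussian-well and cosine forms of the potentials, the summation used to combine them, the parameter values (a preferred distance of 3 units, a width of 1) and the brute-force grid search. In the work described, the field parameters were determined experimentally.

```python
import math

def distance_potential(x, y, anchor, preferred, width):
    """Hypothetical field: low when (x, y) is about `preferred` units from `anchor`."""
    d = math.hypot(x - anchor[0], y - anchor[1])
    return 1.0 - math.exp(-((d - preferred) ** 2) / (2.0 * width ** 2))

def direction_potential(x, y, anchor, facing):
    """Hypothetical field: low when (x, y) lies along the unit vector `facing` from `anchor`."""
    dx, dy = x - anchor[0], y - anchor[1]
    d = math.hypot(dx, dy)
    if d == 0.0:
        return 1.0  # direction is undefined at the anchor itself; penalize it
    cos_angle = (dx * facing[0] + dy * facing[1]) / d
    return (1.0 - cos_angle) / 2.0

def combined_potential(x, y, tree, facing):
    """Combine the two constraints by summation (an assumed combination rule)."""
    return (distance_potential(x, y, tree, preferred=3.0, width=1.0)
            + direction_potential(x, y, tree, facing))

def find_minimum(potential, xs, ys):
    """Exhaustive grid search for the lowest-potential position."""
    value, x, y = min((potential(x, y), x, y) for x in xs for y in ys)
    return x, y

tree = (0.0, 0.0)      # assumed tree position
facing = (0.0, -1.0)   # assumed "front" direction of the tree
grid = [i * 0.5 for i in range(-12, 13)]  # coordinates from -6.0 to 6.0

bench_x, bench_y = find_minimum(
    lambda x, y: combined_potential(x, y, tree, facing), grid, grid
)
print(bench_x, bench_y)
```

The grid search stands in for whatever minimization procedure is actually used; it is chosen only because it is easy to verify. The position it returns plays the role of the default instantiation, that is, the “most likely” placement under the two assumed constraints.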