3D Scene Retrieval from Text with Semantic Parsing

We address the task of 3D scene retrieval: given a natural-language description and a set of 3D scenes, identify a scene matching the description. Geometric specifications of 3D scenes are part of the craft of many graphical computing applications, including computer animation, games, and simulators. Large databases of such scenes have become available in recent years as tools for 3D scene design have become easier to use. A system that can identify a 3D scene from a natural-language description helps make such databases readily accessible. Natural language has evolved to be well suited to describing our (three-dimensional) world, and it provides a convenient way of specifying the space of acceptable scenes: a description of a physical environment encodes logical propositions about the space of environments a speaker is describing. This logical structure of scene descriptions suggests that semantic parsing is an appropriate framework in which to understand the problem of retrieving scenes based on their natural-language descriptions. Semantic parsing (Zelle and Mooney, 1996; Tang and Mooney, 2001; Zettlemoyer and Collins, 2005) is the problem of extracting logical forms from natural language.
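
To make the framing concrete, the following is a minimal sketch of retrieval once a logical form is available. It is an illustration under stated assumptions, not the representation or parser used in this work: the `Scene` class, the predicate builders `exists`, `related`, and `conj`, and the toy scenes are all hypothetical, objects are identified only by category name, and the logical form is written by hand rather than produced by a semantic parser.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Scene:
    """Hypothetical, highly simplified scene: object categories plus relations."""
    name: str
    objects: FrozenSet[str]
    # (relation, subject, object) triples, e.g. ("on", "lamp", "desk")
    relations: FrozenSet[Tuple[str, str, str]]


# A logical form is modeled as a predicate that a scene satisfies or not.
LogicalForm = Callable[[Scene], bool]


def exists(category: str) -> LogicalForm:
    """True for scenes containing an object of the given category."""
    return lambda scene: category in scene.objects


def related(relation: str, subject: str, obj: str) -> LogicalForm:
    """True for scenes in which the given relation holds."""
    return lambda scene: (relation, subject, obj) in scene.relations


def conj(*forms: LogicalForm) -> LogicalForm:
    """Conjunction of logical forms."""
    return lambda scene: all(form(scene) for form in forms)


def retrieve(form: LogicalForm, scenes: List[Scene]) -> List[Scene]:
    """Return the scenes that satisfy the logical form."""
    return [scene for scene in scenes if form(scene)]


# Hand-written logical form for "There is a lamp on a desk."
# (Assumption: a semantic parser would produce something of this shape.)
form = conj(exists("lamp"), exists("desk"), related("on", "lamp", "desk"))

scenes = [
    Scene("office", frozenset({"desk", "lamp", "chair"}),
          frozenset({("on", "lamp", "desk")})),
    Scene("bedroom", frozenset({"bed", "lamp"}),
          frozenset({("on", "lamp", "floor")})),
]

matches = [scene.name for scene in retrieve(form, scenes)]
print(matches)
```

In this sketch the description picks out a subset of the scene collection, which is the sense in which a description specifies a space of acceptable scenes; the semantic parsing problem is then to map the sentence to such a logical form.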

