NCARAI ~ Interactive Systems
Linguistics of Spatial Relations
Humans rapidly learn the names of new, unknown objects through deictic gestures and explicit requests for the name of the novel object. This process is often observed when a new sailor points to a piece of equipment or a ship sailing nearby and asks a fellow crew member to help identify it. Similarly, we would like to investigate the use of a combination of deictic gestures and gaze information to label objects in the environment automatically; however, several barriers stand in the way. The first problem is that gaze and deictic gestures define a very wide cone of possibilities containing the actual object in question, since these gestures are not exact unless the person is in very close proximity to the target object. Second, we need to determine an underlying representation of the object that allows the robot's vision, cognitive, navigational, and spatial systems to reason about the newly named object. Finally, the robot needs to be able to articulate the position of the object relative to other objects using 3D spatial prepositions, much as previous models of NRL's robots did in 2D. Solving these problems will allow the various systems on the MDS robots to more fully interact with and learn from their human collaborators.
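As a rough illustration of the first barrier, the sketch below tests which candidate objects fall inside an estimated pointing cone. All names, thresholds, and the candidate representation here are our own illustrative assumptions, not part of any NRL system; in practice the cone half-angle is wide, so several objects typically survive the test and further disambiguation is needed.

```python
import numpy as np

def objects_in_pointing_cone(hand_pos, point_dir, candidates, half_angle_deg=15.0):
    """Return candidates whose centers lie inside the pointing cone, nearest first.

    hand_pos   -- 3D position of the pointing hand
    point_dir  -- vector along the estimated pointing ray
    candidates -- dict mapping object name to 3D center position
    """
    hand_pos = np.asarray(hand_pos, dtype=float)
    point_dir = np.asarray(point_dir, dtype=float)
    point_dir = point_dir / np.linalg.norm(point_dir)
    cos_limit = np.cos(np.radians(half_angle_deg))
    hits = []
    for name, center in candidates.items():
        to_obj = np.asarray(center, dtype=float) - hand_pos
        dist = np.linalg.norm(to_obj)
        if dist == 0.0:
            continue
        # Inside the cone if the angle between the pointing ray and the
        # hand-to-object vector is within the cone half-angle.
        if np.dot(point_dir, to_obj) / dist >= cos_limit:
            hits.append((name, dist))
    return sorted(hits, key=lambda hit: hit[1])

# Example: a hand at shoulder height pointing slightly downward singles out
# the nearby cup but excludes the book, which lies outside the 15-degree cone.
candidates = {"coffee cup": (1.2, 0.4, 0.9), "book": (2.0, -0.5, 0.8)}
print(objects_in_pointing_cone((0, 0, 1.4), (1, 0.3, -0.3), candidates))
```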
This research extends the capabilities of robots at the Navy Center for Applied Research in Artificial Intelligence in the domain of spatial referencing language. Humans effortlessly use two particular modalities when communicating where objects are located in the environment: deictic gestures and spatial prepositions. It is quite natural for a person to state, "The book is over there," and point in the general direction of the object. The goal of the research is to improve the current state of the art by giving the MDS robots the ability to use their vision systems to track deictic gestures and to describe, using spatial prepositions, the locations of and relationships between objects encountered in the world. Ultimately, this research will allow more natural interaction between humans and their robot collaborators and decrease the amount of training a warfighter needs in order to interact with a robot.
We use the vision system on the MDS robots to recognize gestures and determine where a person is pointing, combining 3D motion tracking with a Swiss Ranger sensor. Using either a priori object models or shapes inferred from the Swiss Ranger data (e.g., for a new coffee cup, a 3D shape is assumed and mirrored onto the side of the cup not currently in view), we place these newly learned objects into a 3D representation of the environment. From there, we leverage previous work on 3D spatial preposition language to create useful descriptions of the environment. This language system can generate basic descriptions between objects in the horizontal or vertical plane; however, it needs to be further developed to utilize 3D data from the vision systems on the MDS robots. This allows automatic generation of the appropriate spatial description between two objects, chosen from a myriad of candidates based on egocentric, exocentric, or intrinsic perspectives, and fuses the horizontal and vertical planar descriptions.
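A minimal sketch of how such a description might be selected, assuming object positions in a shared world frame (z up) and a known viewpoint; the thresholds, frame conventions, and function names are illustrative only and do not reflect the actual NRL language system.

```python
import numpy as np

def describe_relation(target, landmark, viewpoint, vertical_thresh=0.10):
    """Choose a simple spatial preposition for target relative to landmark.

    All positions are 3D points (x, y, z) in a shared world frame with z up;
    the viewpoint fixes the egocentric frame for left/right and front/behind.
    """
    target, landmark, viewpoint = (np.asarray(p, dtype=float)
                                   for p in (target, landmark, viewpoint))

    # Vertical relation, judged along the world z-axis.
    dz = target[2] - landmark[2]
    vertical = "above" if dz > vertical_thresh else "below" if dz < -vertical_thresh else None

    # Horizontal relation in the viewer's egocentric frame.
    forward = landmark[:2] - viewpoint[:2]
    norm = np.linalg.norm(forward)
    forward = forward / norm if norm > 0 else np.array([0.0, 1.0])
    right = np.array([forward[1], -forward[0]])  # viewer's right-hand direction
    offset = target[:2] - landmark[:2]
    depth, side = np.dot(offset, forward), np.dot(offset, right)
    if abs(depth) >= abs(side):
        horizontal = "behind" if depth > 0 else "in front of"
    else:
        horizontal = "right of" if side > 0 else "left of"

    # Fuse the planar judgments: a strong vertical relation wins,
    # otherwise report the dominant horizontal one.
    return vertical if vertical else horizontal

# Example: seen from the viewpoint, the target sits to the right of the landmark.
print(describe_relation((1.0, 2.0, 0.8), (0.0, 2.0, 0.8), (0.0, 0.0, 1.5)))
```

A fuller system would also weigh intrinsic object axes (e.g., the front of a chair) and exocentric frames before committing to one description, which is where the perspective choices described above come into play.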
Naval Research Laboratory
Washington, DC 20375