Multimodal Interactions with Dynamically Autonomous Robots
Information Technology Division
University of Missouri-Columbia
Introduction: Effective human-robot interaction requires that humans and robots communicate naturally, intelligently, and cooperatively to accomplish shared goals. How they interact depends on the roles each plays; in other words, a sense of teamwork needs to be built into the interface. This kind of interaction, together with the ability to act as cooperative or independent agents, is known as dynamic autonomy.
Natural interactions, such as natural language and gestures, facilitate dynamic autonomy. They effect easy communication, allowing the participants to concentrate on the task rather than on the means of communicating. Awareness of the environment is also important. Our interface therefore incorporates spoken utterances, natural gestures, and a cognitive model of spatial relations to indicate such elements as the locations of objects and other spatial information about the environment. Given this model of the environment, humans and robots have a common ground for interacting with each other and with the environment.
As robots and autonomous vehicles become more prevalent, human-robot interaction is becoming increasingly important. The current state of the art requires many humans to control a single, seemingly autonomous vehicle. For example, Global Hawk, a high-altitude, long-endurance unmanned aerial vehicle, currently requires a team of 10 operators to control it, while Predator, a medium-altitude, long-endurance unmanned aerial vehicle, requires three. Future autonomous systems must work closely and cooperatively with humans, sometimes exhibiting full autonomy while at other times collaborating with varying numbers of humans in close proximity. To facilitate collaboration and cooperation in such systems, we have designed a multimodal interface2 that incorporates natural language and gestures, touch screen modalities, and a cognitive model of spatial relations (Fig. 1).
Robot Platforms: We are using several robots—Nomad 200s, a B21r, and a number of ATRV-Jrs (Fig. 1). They are equipped with range sensors (sonars, structured light or LIDAR rangefinders, etc.) to enable environment mapping, crude object detection, and gesture detection. The robots are also equipped with a wireless microphone for speech input and an optional camera to provide the user with real-time video of the environment.
Multimodal Interface: When using the interface, human users need not conform to predetermined methods of interaction to complete a task. Speaking a command and gesturing may seem appropriate and natural at times (Fig. 2). Alternatively, the user can rely on graphical modes, such as a hand-held personal digital assistant (PDA) (shown in Fig. 1) or an end user terminal (EUT) (Fig. 3). Menu buttons on the PDA and EUT (top right-hand screen in Fig. 3) replace spoken commands and queries. An EUT satellite image (bottom right-hand screen in Fig. 3) provides an aerial view of the robot's environment. The lower left-hand screen (Fig. 3) shows a live robot-eye-view of the immediate environment, and a mapped representation of the latter appears on both the PDA and EUT (middle left-hand screen in Fig. 3). A text window (upper left-hand screen in Fig. 3) displays the human-robot dialog. Users can combine any of the various modalities to interact with the robot, e.g., speaking and clicking on a location on the robot's map.
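To make the combining of modalities concrete, the sketch below shows one way a spoken command and a map click might be fused into a single robot command. It is a minimal sketch only; the class names, fields, and fusion logic are assumptions for illustration and do not reflect the interface's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical structures for one multimodal exchange; all names are illustrative.
@dataclass
class SpokenCommand:
    verb: str                  # e.g., "go"
    object_ref: Optional[str]  # e.g., "there" -- a deictic reference needing a gesture or click

@dataclass
class MapClick:
    x: float   # map coordinates from the PDA or EUT screen
    y: float

@dataclass
class RobotCommand:
    action: str
    target: Tuple[float, float]

def fuse(speech: SpokenCommand, click: Optional[MapClick]) -> RobotCommand:
    """Combine a spoken command with an optional map click into one grounded command."""
    if click is None:
        # Without a gesture or click, the deictic reference cannot be resolved.
        raise ValueError("no gesture or click available to resolve the location")
    return RobotCommand(action=speech.verb, target=(click.x, click.y))

# Example: "Go there" spoken while clicking on the map.
cmd = fuse(SpokenCommand(verb="go", object_ref="there"), MapClick(x=12.5, y=-3.0))
print(cmd)
```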
Commands or queries are linguistically parsed,4 and the resulting representation is correlated with gesture data, with knowledge of other participating agents, and with spatial information from the robot's sensors. The result is then mapped to a robot command, which either produces the requested action or invokes a further exchange of information. Thus, humans and robots become cooperative and collaborative agents in completing a task.
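As a rough illustration of this pipeline, the sketch below strings the stages together: parse the utterance, ground it against gesture data and the spatial model, and dispatch a command or ask for clarification. The function names, data shapes, and the toy parser are assumptions made for this sketch, not the actual parser or command set.

```python
# Illustrative parse -> ground -> dispatch pipeline; all names are hypothetical.
def parse_utterance(text: str) -> dict:
    """Toy linguistic parse: split an utterance into a verb and its arguments."""
    tokens = text.lower().split()
    return {"verb": tokens[0], "args": tokens[1:]}

def ground(parse: dict, gesture: dict, spatial_map: dict) -> dict:
    """Correlate the parse with gesture data and the robot's spatial model."""
    target = gesture.get("pointed_at") or spatial_map.get(" ".join(parse["args"]))
    return {"action": parse["verb"], "target": target}

def dispatch(grounded: dict) -> str:
    """Produce a robot command, or request a further exchange if grounding failed."""
    if grounded["target"] is None:
        return "CLARIFY: which object do you mean?"
    return f"EXECUTE {grounded['action']} -> {grounded['target']}"

spatial_map = {"the crate": (4.0, 7.5)}   # object named in an earlier interaction
result = dispatch(ground(parse_utterance("approach the crate"),
                         gesture={"pointed_at": None},
                         spatial_map=spatial_map))
print(result)
```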
The spatial reasoning component3 clusters the sonar data to define discrete objects. Objects can be named for easy reference, and spatial information, such as "left of" and "behind," is derived, which can then be used for further interactions.
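A minimal sketch of this idea, assuming 2D sonar returns already expressed in the robot's frame: points are grouped by proximity into candidate objects, and a simple left/right judgment is made relative to the robot's heading. The clustering threshold and the relation test are invented for illustration and are not the actual spatial reasoning component.

```python
import math

def cluster_points(points, max_gap=0.5):
    """Greedy proximity clustering of 2D sonar returns into candidate objects."""
    clusters = []
    for p in points:
        for c in clusters:
            if any(math.dist(p, q) <= max_gap for q in c):
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

def centroid(cluster):
    xs, ys = zip(*cluster)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def left_of_robot(obj_xy, robot_xy=(0.0, 0.0), robot_heading=0.0):
    """Crude relation test: is the object to the left of the robot's heading?"""
    dx, dy = obj_xy[0] - robot_xy[0], obj_xy[1] - robot_xy[1]
    bearing = math.atan2(dy, dx) - robot_heading
    return math.sin(bearing) > 0   # positive lateral offset = left

returns = [(1.0, 2.0), (1.2, 2.1), (4.0, -1.0), (4.3, -1.2)]
objects = [centroid(c) for c in cluster_points(returns)]
for i, obj in enumerate(objects):
    side = "left of" if left_of_robot(obj) else "right of"
    print(f"object {i} at {obj} is {side} the robot")
```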
Finally, human-robot interaction is facilitated by shared cognitive models of behavior. Humans communicate, cooperate, and collaborate because they share these models. Using ACT-R, a cognitive architecture for simulating and understanding human cognition and behavior,1 the robots can reason about spatial relations and objects, and behave in ways analogous to humans. With a similar model of behavior, humans and robots can interact and communicate more effectively and efficiently.
Thus, the robot can understand complex navigational commands, such as "Go between the two buildings on your left and hide on the northwest corner behind the storage container." Not only must the robot understand what the various objects in these utterances are, but it must also be able to identify significant locations on or near those objects. With this information, it can then perform an action, such as hiding, which involves a complex set of heuristics.
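As a hedged sketch of what interpreting such a command might involve, the steps below decompose it into sub-goals over grounded objects and their significant locations. The object names, coordinates, and the two-step hiding plan are invented for illustration; the actual system uses a richer set of heuristics.

```python
# Hypothetical grounded objects from the spatial model (coordinates invented).
objects = {
    "building_1":        {"center": (10.0, 5.0)},
    "building_2":        {"center": (10.0, -5.0)},
    "storage_container": {"center": (25.0, 2.0), "nw_corner": (23.5, 3.5)},
}

def midpoint(a, b):
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)

def plan_hide_command(objs):
    """Decompose 'go between the two buildings, then hide behind the container'."""
    between = midpoint(objs["building_1"]["center"], objs["building_2"]["center"])
    hide_at = objs["storage_container"]["nw_corner"]   # a significant location on the object
    return [("GOTO", between), ("HIDE_AT", hide_at)]

for step in plan_hide_command(objects):
    print(step)
```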
Conclusions: We are concentrating on two research areas to facilitate cooperation and collaboration in human-robot interaction. The first area is the design and implementation of a multimodal interface; by providing a natural, intuitive interface, we let users concentrate on the task rather than on the modes of interaction. The second research area is the use of computational models of human cognition to facilitate spatial reasoning in robots that share information about the environment, objects, and locations with humans and with each other. By incorporating human cognitive models, we enable collaborative and cooperative interactions that enhance dynamic autonomy in robots.
[Sponsored by ONR and DARPA]