Guidelines for Bimodal Virtual Assistants

Multimodality is a widely studied topic that involves communication modes such as speech, writing, gestures, and touch. In the field of Conversational User Interfaces, researchers emphasize both the advantages and disadvantages of developing either text-based virtual assistants (writing or visual mode) or voice-based virtual assistants (speech or auditory mode). However, there is an evident gap in research on Multimodal Virtual Assistants (MVAs): MVAs are usually able to interact in both writing and speech modalities, yet few recent papers have studied the effects of combining several modalities. Moreover, although distinct guidelines for text- and voice-based assistants have been established, there are no specific guidelines for the development or design of Bimodal Virtual Assistants (BVAs), which combine both writing and speech modes. Previous research has examined the effect of modality in conversational assistants in terms of cognitive effort, memory load, satisfaction, and enjoyment. We also find well-known guidelines for GUIs that can be applied to text-based assistants, as well as specific guidelines. In voice research, studies highlight the importance of designing precise guidelines for this mode. In this paper, we review, examine, analyze, and synthesize these existing guidelines for the development of Virtual Assistants that focus on just one mode (either auditory or visual). The main purpose of our study is to provide a unified model with a comprehensive yet concise set of guidelines that helps raise awareness among designers who want to develop BVAs. In our model, we propose 22 guidelines organized into four categories concerning key design issues in the development of Virtual Assistants.

Keywords: Multimodal Virtual Assistants, Bimodal Virtual Assistants, Conversational User Interfaces, CUI, design guidelines, multimodal interaction