Alohomora: Motion-Based Hotword Detection in Head-Mounted Displays
PubDate: October 2019
Teams: Northwestern Polytechnical University;Alibaba Group
Writers: Jiaxi Gu; Zhiwen Yu; Kele Shen
With advances in multimedia and computer graphics technologies, virtual reality (VR) is attracting growing attention from both academia and industry. The head-mounted display (HMD) is the core piece of VR equipment. It fills the wearer's entire field of view and responds to specific actions, chiefly head movement. Unlike ordinary video watching or game playing, VR imposes a strict requirement of immersion, so interaction methods must be designed carefully. Hotword-based interaction, a typical hands-free method, is well suited to VR scenarios. However, traditional hotword detection methods rely on a microphone for audio signal analysis. They not only incur significant recording overhead but are also susceptible to ambient noise. Instead of using audio signals, we propose a motion-based hotword detection method called Alohomora. We formulate detection as a multivariate time series (MTS) classification problem over sensor data from multiple dimensions and types of motion sensors. A word extraction method extracts and selects discriminative patterns from the MTS of motion data. A classification model is then trained on those patterns so that the hotword can be detected in a timely manner. Alohomora relies purely on the motion sensors in HMDs and requires no extra components such as a microphone. Because VR applications already require head tracking, the additional overhead of Alohomora is nearly negligible. Extensive experiments show that the detection accuracy of Alohomora can exceed 90%.
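To make the pipeline above concrete, the following is a minimal sketch of word-based MTS classification, not the authors' implementation. The abstract does not specify the word extraction method or the classifier, so this sketch substitutes SAX-style discretization (z-normalization, piecewise averaging, symbol mapping) and a nearest-centroid classifier with cosine similarity. All names, the window parameters, and the synthetic data generator `make_sample` are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter

# Assumption: SAX-style breakpoints that split a standard normal
# distribution into 4 equiprobable bins, mapped to letters a..d.
BREAKPOINTS = (-0.6745, 0.0, 0.6745)


def znorm(window):
    """Z-normalize a window; a flat window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std < 1e-8:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def to_word(window, segments):
    """Turn one window into a word: one letter per equal-length segment."""
    w = znorm(window)
    seg_len = len(w) // segments
    letters = []
    for s in range(segments):
        chunk = w[s * seg_len:(s + 1) * seg_len]
        mean = sum(chunk) / len(chunk)
        letters.append("abcd"[sum(mean > b for b in BREAKPOINTS)])
    return "".join(letters)


def extract_words(mts, window=16, step=4, segments=4):
    """Slide a window over every channel and count channel-tagged words.

    mts is a list of channels (e.g. sensor axes), each a list of floats.
    """
    bag = Counter()
    for ch, series in enumerate(mts):
        for start in range(0, len(series) - window + 1, step):
            bag[f"{ch}:{to_word(series[start:start + window], segments)}"] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(samples):
    """Sum the word bags of all training samples per label."""
    centroids = {}
    for mts, label in samples:
        centroids.setdefault(label, Counter()).update(extract_words(mts))
    return centroids


def predict(centroids, mts):
    """Return the label whose summed bag is most similar to the sample."""
    bag = extract_words(mts)
    return max(centroids, key=lambda label: cosine(bag, centroids[label]))


def make_sample(kind, rng, length=64, channels=3):
    """Hypothetical synthetic data: 'hotword' oscillates, 'other' drifts."""
    sample = []
    for _ in range(channels):
        if kind == "hotword":
            base = [math.sin(2 * math.pi * t / 16) for t in range(length)]
        else:
            base = [0.1 * t for t in range(length)]
        sample.append([x + rng.gauss(0, 0.05) for x in base])
    return sample
```

A model could be trained with `train([(make_sample(k, rng), k) for k in ("hotword", "other") for _ in range(5)])` using a seeded `random.Random`, and then queried with `predict(model, new_sample)`. Tagging each word with its channel index keeps patterns from different sensor axes distinct, which is one simple way to handle the multivariate case. A real system would also need a pattern selection step and real sensor recordings, neither of which is modeled here.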