PRECYSE: Predicting Cybersickness using Transformer for Multimodal Time-Series Sensor Data
Date: May 2024
Teams: Hanyang University
Writers: Dayoung Jeong, Kyungsik Han
PDF: PRECYSE: Predicting Cybersickness using Transformer for Multimodal Time-Series Sensor Data
Abstract
Cybersickness hinders user immersion in VR, and predicting it with AI has been the subject of ongoing research. Previous studies have used CNNs and LSTMs as prediction models, along with attention mechanisms and XAI for data analysis, yet none has explored a transformer, which can better reflect the spatial and temporal characteristics of the data and thereby benefit both prediction and feature importance analysis. In this paper, we propose transformer-based cybersickness prediction models that use multimodal time-series sensor data (i.e., eye movement, head movement, and physiological signals) and take sensor data pre-processing and multimodal data fusion methods into account. We constructed the MSCVR dataset, which consists of normalized sensor data, spectrogram-formatted sensor data, and cybersickness levels collected from 45 participants in a user study. We propose two methods for embedding multimodal time-series sensor data into the transformer: modality-specific spatial and temporal transformer encoders for normalized sensor data (MS-STTN) and a modality-specific spatial-temporal transformer encoder for spectrograms (MS-STTS). MS-STTN yielded the highest performance in both the ablation study and the comparison with existing models. Furthermore, by analyzing feature importance, we determined how relevant each feature is to cybersickness over time, and found eye movement features to be especially salient. Our results and insights, derived from multimodal time-series sensor data and the transformer model, provide a comprehensive understanding of cybersickness and its association with sensor data. Our MSCVR dataset and code are publicly available: https://github.com/dayoung-jeong/PRECYSE.git.
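The abstract names two input representations, normalized sensor data and spectrogram-formatted sensor data, without giving the pre-processing details. As a rough illustration only, the sketch below derives both forms from a single sensor channel. Everything specific in it is an assumption rather than the authors' pipeline: z-score normalization, the Hann window, the window and hop sizes, the synthetic signal, and the function names `zscore` and `spectrogram` are all hypothetical.

```python
import cmath
import math
from statistics import mean, pstdev


def zscore(signal):
    """Normalize one sensor channel to zero mean and unit variance (assumed scheme)."""
    mu = mean(signal)
    sigma = pstdev(signal)
    if sigma == 0:
        return [0.0 for _ in signal]
    return [(x - mu) / sigma for x in signal]


def spectrogram(signal, window=8, hop=4):
    """Magnitude spectrogram from a naive short-time DFT with a Hann window.

    Returns a list of frames; each frame holds window // 2 + 1 magnitudes.
    The window and hop sizes are illustrative, not taken from the paper.
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window - 1)) for n in range(window)]
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        chunk = [signal[start + n] * hann[n] for n in range(window)]
        frame = []
        for k in range(window // 2 + 1):
            coeff = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / window)
                for n in range(window)
            )
            frame.append(abs(coeff))
        frames.append(frame)
    return frames


# Hypothetical single-channel recording: a sinusoid with 2 cycles per 8 samples,
# standing in for something like one axis of head rotation.
channel = [math.sin(2 * math.pi * 2 * t / 8) for t in range(32)]
normalized = zscore(channel)      # time-domain input form
spec = spectrogram(normalized)    # time-frequency input form
```

In a multimodal setting, the same two steps would presumably be applied per channel and per modality before the results are passed to the modality-specific encoders; the repository linked above is the authoritative source for how the authors actually do this.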