Predicting Future Eye Gaze Using Inertial Sensors
PubDate: July 2023
Teams: Seoul National University of Science and Technology
Writers: Ardianto Satriawan; Airlangga Adi Hermawan; Yakub Fahim Luckyarno; Ji-Hoon Yun
PDF: Predicting Future Eye Gaze Using Inertial Sensors
Abstract
Eye tracking is in high demand, especially for next-generation virtual reality (VR), because it enables foveated rendering, which significantly reduces computational cost by rendering the region a user is gazing at in high resolution and the rest at lower resolution. However, conventional eye tracking requires per-eye camera hardware mounted near the eyes inside a VR headset. Moreover, the detected eye gaze lags the actual eye gaze by a finite delay due to camera latency, image-processing time, and the VR system's native latency. This paper proposes an eye-tracking solution that predicts a user's future eye gaze using only the inertial sensors already built into VR headsets for head tracking. To this end, we formulate three time-series regression problems: predicting (1) the current eye gaze from past head orientation data, (2) the future eye gaze from past head orientation and eye gaze data, and (3) the future eye gaze from past head orientation data only. We solve the first and second problems with machine learning models and develop two solutions for the third: a two-stage approach, which connects two machine learning models in series (one solving the first problem, the other the second), and a single-stage approach, which uses a single model to predict the future eye gaze directly from past head orientation data. We evaluate the proposed solutions on real eye-tracking traces captured from a VR headset for multiple test players, considering various combinations of machine learning models. The experimental results show that the proposed solutions to the third problem reduce the error relative to a center-fixed gaze by up to 50% and 20% for anticipation times of 50 and 150 ms, respectively.
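To make the three regression formulations and the two-stage vs. single-stage distinction concrete, here is a minimal sketch of the problem setup. It is not the paper's implementation: the synthetic data, ridge regressors, and the window length `W` and horizon `H` are all illustrative assumptions standing in for the headset traces and model combinations the authors actually evaluate.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic stand-ins for illustration only: 3-D head orientation (e.g.
# yaw/pitch/roll from the headset IMU) and 2-D gaze angles, loosely coupled
# so the regression has something to learn. Real inputs would be VR traces.
T = 3000
head = np.cumsum(rng.normal(size=(T, 3)), axis=0) * 0.01
gaze = 0.5 * head[:, :2] + 0.02 * rng.normal(size=(T, 2))

W, H = 20, 5  # past-window length and anticipation horizon, in samples

def windows(x, w):
    """Row i holds the flattened past w samples x[i : i+w]; it serves as
    the feature vector for time step t = i + w."""
    return np.stack([x[i:i + w].ravel() for i in range(len(x) - w)])

Xh = windows(head, W)   # past head orientation, one row per t = W .. T-1
Xg = windows(gaze, W)   # past true gaze over the same time steps

# Problem 1: past head orientation -> current gaze.
m1 = Ridge().fit(Xh, gaze[W:])

# Problem 2: past head orientation + past gaze -> gaze H steps ahead.
m2 = Ridge().fit(np.hstack([Xh, Xg])[:-H], gaze[W + H:])

# Problem 3, single-stage: past head orientation -> future gaze directly.
m3 = Ridge().fit(Xh[:-H], gaze[W + H:])

# Problem 3, two-stage: estimate current gaze with m1, window those
# estimates, and feed them (with head windows) into m2. Valid from t = 2W.
gaze_hat = m1.predict(Xh)                  # gaze estimates for t = W .. T-1
two_stage = m2.predict(np.hstack([Xh[W:], windows(gaze_hat, W)]))

# Crude in-sample error check against a center-fixed (always-zero) gaze.
err = np.linalg.norm(two_stage[:-H] - gaze[2 * W + H:], axis=1).mean()
base = np.linalg.norm(gaze[2 * W + H:], axis=1).mean()
print(f"two-stage error {err:.4f} vs center-fixed {base:.4f}")
```

The trade-off the sketch exposes: the two-stage chain reuses the problem-1 and problem-2 models but propagates stage-one gaze-estimation error into stage two, whereas the single-stage model avoids that error propagation at the cost of learning the harder head-to-future-gaze mapping in one step.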