GEARS: Generalizable Multi-Purpose Embeddings for Gaze and Hand Data in VR Interactions
Date: June 2024
Teams: TU Munich; Meta; University of Stuttgart
Writers: Philipp Hallgarten, Naveen Sendhilnathan, Ting Zhang, Ekta Sood, and Tanya R. Jonker
PDF: GEARS: Generalizable Multi-Purpose Embeddings for Gaze and Hand Data in VR Interactions
Abstract
Machine learning models that use gaze and hand data to encode user interaction behavior in VR are often tailored to a single task and sensor set, limiting their applicability in settings with constrained compute resources. We propose GEARS, a new paradigm that learns a shared feature extraction mechanism across multiple tasks and sensor sets to encode users' gaze and hand tracking data in VR into multi-purpose embeddings. GEARS leverages a contrastive learning framework to learn these embeddings, which we then use to train linear models to predict task labels. We evaluated our paradigm across four VR datasets with eye tracking that comprise different sensor sets and task goals. The performance of GEARS was comparable to that of models trained for a single task with data from a single sensor set. Our research advocates a shift from sensor-set- and task-specific models towards a single shared feature extraction mechanism for encoding users' interaction behavior in VR.
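To make the described pipeline concrete, below is a minimal, hypothetical sketch of the two-stage pattern the abstract outlines: a shared encoder for gaze and hand features trained with a contrastive objective, followed by a linear model fit on the frozen embeddings to predict task labels. All module names, dimensions, augmentations, and the synthetic data are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a GEARS-style pipeline:
# (1) a shared encoder maps windows of gaze + hand features to embeddings,
# (2) the encoder is trained with a contrastive (NT-Xent / InfoNCE) objective,
# (3) a linear probe is trained on the frozen embeddings to predict task labels.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedEncoder(nn.Module):
    """Shared feature extractor for gaze + hand input windows (illustrative)."""
    def __init__(self, in_dim=12, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, x):
        # x: (batch, in_dim), e.g., summary features of one gaze/hand window
        return F.normalize(self.net(x), dim=-1)


def info_nce(z1, z2, temperature=0.1):
    """NT-Xent loss: two augmented views of the same window are positives."""
    z = torch.cat([z1, z2], dim=0)                 # (2B, D)
    sim = z @ z.t() / temperature                  # pairwise cosine similarities
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)


# Stage 1: contrastive pretraining on unlabeled windows (random stand-in data).
encoder = SharedEncoder()
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for _ in range(100):
    x = torch.randn(32, 12)                        # a batch of input windows
    v1 = x + 0.05 * torch.randn_like(x)            # jittered view 1
    v2 = x + 0.05 * torch.randn_like(x)            # jittered view 2
    loss = info_nce(encoder(v1), encoder(v2))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: linear probe on frozen embeddings, one per downstream task label set.
with torch.no_grad():
    feats = encoder(torch.randn(200, 12))          # embeddings for labeled data
labels = torch.randint(0, 3, (200,))               # placeholder task labels
probe = nn.Linear(64, 3)
probe_opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(200):
    loss = F.cross_entropy(probe(feats), labels)
    probe_opt.zero_grad()
    loss.backward()
    probe_opt.step()
```

In this pattern, only the lightweight linear probe differs per task and sensor set, while the shared encoder is reused, which is the compute-saving property the abstract argues for.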