FER-former: Multi-modal Transformer for Facial Expression Recognition

Editor: Nweon (映维) | Category: CV / XR | March 29, 2023

Note: We do not have the ability to review papers.

PubDate: Mar 2023

Teams: Lanzhou University; Zhejiang Sci-Tech University

Writers: Yande Li, Mingjie Wang, Minglun Gong, Yonggang Lu, Li Liu

PDF: FER-former: Multi-modal Transformer for Facial Expression Recognition


Abstract

The ever-increasing demand for intuitive interaction in Virtual Reality has triggered a boom in Facial Expression Recognition (FER). This paper proposes a novel multifarious supervision-steering Transformer for FER in the wild. It is designed to address limitations of existing approaches, such as narrow receptive fields and homogeneous supervisory signals, and to further strengthen FER tools. The approach is referred to as FER-former and has three main features: multi-granularity embedding integration, a hybrid self-attention scheme, and heterogeneous domain-steering supervision. First, a hybrid stem cascades a CNN and a Transformer so that the model can draw on the complementary features of both learning paradigms. Within this design, a Transformer mechanism tailored to FER represents two kinds of tokens in parallel for final classification: conventional tokens that focus on hard one-hot labels, and CLIP-based text-oriented tokens. Second, a heterogeneous domain-steering supervision module eases annotation ambiguity. It supervises the similarity between image features and text features, so that image features also carry text-space semantic correlations. These multifarious token heads work together to capture diverse global receptive fields with multi-modal semantic cues, which gives the model strong learning capability. Extensive experiments on popular benchmarks show that FER-former outperforms existing state-of-the-art methods.
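The abstract does not give the loss formulation. The following is only a minimal sketch of the general idea of supervising image-text similarity, in the style of CLIP zero-shot classification, and it is not FER-former's actual implementation. The sketch makes these assumptions:

- Each expression class has one text embedding, for example a class prompt encoded by a CLIP text encoder.
- The image feature is compared with each class embedding by cosine similarity.
- The scaled similarities are trained with cross-entropy, added to a conventional one-hot classification term.

All function names, the temperature of 0.07, the `weight` factor, and the additive combination of the two losses are hypothetical choices.

```python
# Hypothetical sketch, NOT the FER-former authors' code: a generic
# CLIP-style image-text similarity loss combined with a one-hot head.
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_logits(image_feat: Sequence[float],
                  text_feats: Sequence[Sequence[float]],
                  temperature: float = 0.07) -> List[float]:
    """One logit per class: cos(image, text_c) / temperature (temperature is an assumption)."""
    img = l2_normalize(image_feat)
    return [sum(a * b for a, b in zip(img, l2_normalize(t))) / temperature
            for t in text_feats]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy, using log-sum-exp for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def combined_loss(onehot_logits: Sequence[float],
                  image_feat: Sequence[float],
                  text_feats: Sequence[Sequence[float]],
                  target: int,
                  weight: float = 1.0,
                  temperature: float = 0.07) -> float:
    """Hypothetical total: hard-label CE + weight * image-text similarity CE."""
    label_term = cross_entropy(onehot_logits, target)
    text_term = cross_entropy(cosine_logits(image_feat, text_feats, temperature), target)
    return label_term + weight * text_term


if __name__ == "__main__":
    # Toy 3-class example with made-up 2-D features (hypothetical values).
    text_feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # stand-ins for class-prompt embeddings
    image_feat = [0.9, 0.1]                               # closest to class 0
    print(cosine_logits(image_feat, text_feats))
    print(combined_loss([2.0, 0.5, -1.0], image_feat, text_feats, target=0))
```

Comparing the image feature against one text embedding per class turns classification into nearest-prompt matching in a shared embedding space. In this sketch, the one-hot term keeps standard label supervision, and the similarity term ties image features to text-space semantics.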

Article link: https://paper.nweon.com/14216


You may also like...

eyemR-Talk: Using Speech to Visualise Shared MR Gaze Cues

Pedagogical Strategies for Classroom-based Mixed Reality (MR) Technology Curriculum

Holographic Optics for Thin and Lightweight Virtual Reality
