Two-Stream Spatial-Temporal Fusion Graph Convolutional Network for Dynamic Gesture Recognition

Note: We don't have the ability to review papers

PubDate: Aug 2022

Teams: Inner Mongolia University of Science & Technology

Writers: Ji-kai Zhang; Qi Li; Xiao-qi Lyu; Yong Liang

PDF: Two-Stream Spatial-Temporal Fusion Graph Convolutional Network for Dynamic Gesture Recognition

Abstract

As a compelling field of computer vision, dynamic gesture recognition lays the foundation for interaction in virtual reality (VR) and augmented reality (AR). Compared with other body joints, hand joints have a smaller range of movement, faster movement speed, and richer movement details, so the local spatial-temporal information and the global dependencies arising during action execution need to be explored further. On that basis, we propose a two-stream spatial-temporal fusion graph convolutional network, 2s-STFGCN, for dynamic gesture recognition. To enrich detailed joint features, second-order bone information is introduced into the model. The local spatial and temporal information is fused in a single graph to capture complex spatial-temporal relationships, and gated dilated convolution is employed so that correlations across long sequences are better captured. Additionally, by simulating actions in VR interactive applications, we collect and build VR-DHG, a dynamic gesture skeleton data set covering different grain sizes. Experimental results show that the proposed model achieves better recognition performance on the public data set DHG-14/28. Compared with the DeepGRU algorithm, the recognition rate of our algorithm improves by 1.21% and 2.64% when recognizing 14 and 28 kinds of gestures, respectively. Our algorithm also performs better in fine-grained gesture recognition. All this provides solid evidence for the effectiveness of our algorithm.
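The abstract mentions two ingredients that are easy to misread: second-order bone information (offset vectors from each joint to its parent, as opposed to raw joint coordinates) and gated dilated convolution for long-range temporal correlation. The sketch below illustrates both ideas in plain NumPy under our own assumptions — the toy 5-joint chain, the kernel shapes, and the WaveNet-style tanh/sigmoid gating are illustrative, not the paper's actual architecture:

```python
import numpy as np

# Toy hand skeleton: 5 joints in a chain; joint 0 (the root) is its own parent.
# This PARENT table is a hypothetical stand-in for the real hand-joint tree.
PARENT = [0, 0, 1, 2, 3]

def bone_features(joints):
    """Second-order bone information: joints (T, J, 3) -> bones (T, J, 3),
    each bone being the child joint's coordinates minus its parent's."""
    return joints - joints[:, PARENT, :]

def gated_dilated_conv1d(x, w_f, w_g, dilation):
    """Gated dilated convolution along time (WaveNet-style gating, assumed here).
    x: (T, C); w_f, w_g: (K, C, C) filter and gate kernels; causal zero padding."""
    K, _, _ = w_f.shape
    pad = dilation * (K - 1)
    xp = np.concatenate([np.zeros((pad, x.shape[1])), x], axis=0)
    T, C = x.shape
    f = np.zeros((T, C))
    g = np.zeros((T, C))
    for t in range(T):
        for k in range(K):
            xt = xp[t + pad - k * dilation]   # sample dilated steps back in time
            f[t] += xt @ w_f[k]
            g[t] += xt @ w_g[k]
    # Gated activation: tanh filter modulated by a sigmoid gate.
    return np.tanh(f) * (1.0 / (1.0 + np.exp(-g)))

# Toy usage: 4 frames, 5 joints, 3D coordinates.
T, J = 4, 5
joints = np.arange(T * J * 3, dtype=float).reshape(T, J, 3)
bones = bone_features(joints)               # (4, 5, 3); root bone is all zeros
x = bones.reshape(T, J * 3)                 # flatten joints into channels
rng = np.random.default_rng(0)
w_f = rng.normal(scale=0.1, size=(2, J * 3, J * 3))
w_g = rng.normal(scale=0.1, size=(2, J * 3, J * 3))
y = gated_dilated_conv1d(x, w_f, w_g, dilation=2)
print(y.shape)                              # (4, 15)
```

Stacking such layers with growing dilation (1, 2, 4, ...) widens the temporal receptive field exponentially, which is the usual motivation for dilated convolution on long sequences.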
