Human Action Recognition Based on a Two-stream Convolutional Network Classifier
PubDate: January 2018
Teams: University of Brasília
Writers: Vinicius de Oliveira Silva; Flavio de Barros Vidal; Alexandre Ricardo Soares Romariz
PDF: Human Action Recognition Based on a Two-stream Convolutional Network Classifier
Abstract
Video generation devices are now simpler to operate, more portable, and less expensive. This has enabled easy storage and transmission of large amounts of media such as video, motivating automated analysis that does not depend on human evaluation and exhaustive manual search. Virtual reality, robotics, tele-medicine, human-machine interfaces, and tele-surveillance are applications for such techniques. This paper describes a method for human action recognition in videos using two convolutional neural networks (CNNs): a Spatial Stream, trained on video frames, and a Temporal Stream, trained on stacks of Dense Optical Flow (DOF). Both streams were trained separately, and for each we generated a classification histogram based on the most frequent class assignment. For final classification, these histograms were combined to produce a single output. The technique was tested on two public action video datasets: Weizmann and UCF Sports. The Spatial stream achieves 84.44% accuracy on the Weizmann dataset and 78.46% on the UCF Sports dataset; combining the networks raises accuracy on the Weizmann dataset to 91.11%.
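The abstract's late-fusion step (per-stream classification histograms combined into a single output) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class count, the per-frame predictions, and the choice of summing normalized histograms before taking the most frequent class are all assumptions for this sketch.

```python
import numpy as np

NUM_CLASSES = 10  # number of action classes (assumption for this sketch)

def class_histogram(frame_predictions, num_classes=NUM_CLASSES):
    """Count how often each class is assigned across a video's frames."""
    hist = np.bincount(frame_predictions, minlength=num_classes)
    return hist / hist.sum()  # normalize so both streams weigh equally

def fuse_streams(spatial_preds, temporal_preds):
    """Late fusion: sum the two normalized histograms, pick the top class."""
    combined = class_histogram(spatial_preds) + class_histogram(temporal_preds)
    return int(np.argmax(combined))

# Hypothetical per-frame class assignments for one video
spatial = np.array([3, 3, 2, 3, 3, 1])    # spatial stream (RGB frames)
temporal = np.array([3, 2, 2, 3, 3, 3])   # temporal stream (DOF stacks)
print(fuse_streams(spatial, temporal))    # -> 3
```

Normalizing each histogram before summing gives the two streams equal weight regardless of how many frames (or flow stacks) each one classified.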