Extract Accurate 3D Human Skeleton from Video
PubDate: October 2020
Teams: Beihang University
Writers: Tao Hu; Wenming Meng; Shuai Li
Extracting 3D skeletons from video poses more problems than extracting them from still images; for example, video contains more motion blur and occlusion. However, video also has a distinctive property: consecutive frames are strongly similar. In this article, we focus on making maximal use of the temporal information in video to extract an accurate 3D human skeleton. Our system is divided into three parts. The first part uses an unsupervised learning method to split the video into a series of sub-videos according to content, so that the human poses within each sub-video are highly similar. The second part detects the 2D skeleton in each sub-video. To establish connections between frames while remaining efficient, we adopt a ConvLSTM model in this module. The last part maps the 2D skeleton sequence detected in the previous step into 3D space: the input is a sequence of 2D joint points, and the output is the corresponding sequence of 3D joint points. For this module, we choose a one-dimensional convolution model, which can model the relationship between each frame and its neighboring frames.
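The 2D-to-3D step can be illustrated with a minimal sketch of a single temporal (one-dimensional) convolution layer in plain Python. This is not the authors' trained network: the function name `temporal_conv_lift`, the 17-joint skeleton, the kernel width of 3, the replicate padding at the sequence borders, and the random weights are all assumptions chosen only to show how each output frame is computed from a window of neighboring input frames.

```python
import random


def temporal_conv_lift(seq_2d, weights, bias):
    """Apply one 1D convolution over time to a 2D joint sequence.

    seq_2d:  list of T frames, each a flat list of 2*J coordinates.
    weights: weights[k][c_in][c_out] with K taps (K odd),
             2*J input channels and 3*J output channels.
    bias:    list of 3*J values.
    Returns a list of T frames, each a flat list of 3*J values.
    """
    num_frames = len(seq_2d)
    num_taps = len(weights)
    half = num_taps // 2
    out_channels = len(bias)
    output = []
    for t in range(num_frames):
        frame = list(bias)
        for k in range(num_taps):
            # Clamp the index so border frames reuse the first or last
            # frame (replicate padding); an assumption of this sketch.
            src_index = min(max(t + k - half, 0), num_frames - 1)
            for c_in, value in enumerate(seq_2d[src_index]):
                row = weights[k][c_in]
                for c_out in range(out_channels):
                    frame[c_out] += value * row[c_out]
        output.append(frame)
    return output


# Hypothetical sizes: 17 joints, 10 frames, kernel width 3.
random.seed(0)
J, T, K = 17, 10, 3
weights = [[[random.uniform(-0.1, 0.1) for _ in range(3 * J)]
            for _ in range(2 * J)]
           for _ in range(K)]
bias = [0.0] * (3 * J)
poses_2d = [[random.random() for _ in range(2 * J)] for _ in range(T)]

poses_3d = temporal_conv_lift(poses_2d, weights, bias)
print(len(poses_3d), len(poses_3d[0]))
```

With a kernel width of 3, each output frame depends on the current frame and its two immediate neighbors. Stacking several such layers, or using wider or dilated kernels, would enlarge that temporal window; the abstract does not state which of these the actual model uses.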