Exploring Weakly Labeled Images for Video Object Segmentation With Submodular Proposal Selection
PubDate: February 2018
Teams: Beihang University
Writers: Yu Zhang; Xiaowu Chen; Jia Li; Wei Teng; Haokun Song
Video object segmentation (VOS) is important for various computer vision problems, and handling it with minimal human supervision is highly desirable for large-scale applications. To reduce supervision, existing approaches largely follow a data mining perspective, assuming the availability of multiple videos that share the same object categories. This assumption, however, is problematic for tasks that consume a single video. To address this problem, this paper proposes a novel approach that explores weakly labeled images to solve video object segmentation. Given a video labeled with a target category, images labeled with the same category are collected, from which noisy object exemplars are automatically discovered. The proposed approach then extracts a set of region proposals on various frames and efficiently matches them against the large set of noisy exemplars in terms of appearance and spatial context. The best proposals across the video are then jointly selected by solving a novel submodular problem that combines region voting and global region matching. Finally, the localization results are leveraged as strong supervision to guide pixel-level segmentation. Extensive experiments are conducted on two challenging public databases: YouTube-Objects and DAVIS. The results suggest that the proposed approach improves significantly over previous weakly supervised and unsupervised approaches, with performance even comparable to several approaches supervised by costly manual segmentations.
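The abstract does not state the submodular objective, so the following is only a minimal sketch of how greedy submodular selection over proposals could look, not the paper's formulation. It assumes a stand-in objective F(S) = sum of per-proposal voting scores in S, plus a facility-location term that rewards S for covering every proposal with a similar selected one. The names `greedy_select`, `votes`, `sim`, `k`, and `lam` are hypothetical, and all scores are assumed non-negative so that the objective is monotone submodular and the greedy rule is a reasonable approximation.

```python
def greedy_select(votes, sim, k, lam=1.0):
    """Greedily pick up to k proposals under an assumed objective:
    F(S) = sum_{i in S} votes[i] + lam * sum_j max_{i in S} sim[i][j].

    votes: per-proposal voting scores (hypothetical, non-negative)
    sim:   n x n proposal similarity matrix (hypothetical, non-negative)
    """
    n = len(votes)
    m = len(sim[0]) if n else 0
    selected = []
    coverage = [0.0] * m  # best similarity to the selected set, per column
    for _ in range(min(k, n)):
        best, best_gain = None, float("-inf")
        for i in range(n):
            if i in selected:
                continue
            # Marginal gain: own vote plus the improvement in coverage.
            gain = votes[i] + lam * sum(
                max(sim[i][j] - coverage[j], 0.0) for j in range(m)
            )
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
        coverage = [max(c, s) for c, s in zip(coverage, sim[best])]
    return selected


# Toy data: proposals 0 and 1 are near-duplicates, proposal 2 is distinct.
votes = [0.9, 0.8, 0.1]
sim = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(greedy_select(votes, sim, k=2))
```

In this toy case the second pick is the distinct proposal rather than the higher-voted near-duplicate, which illustrates the diminishing-returns behavior that makes a submodular objective attractive for joint proposal selection.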