Video Segmentation by Tracking Many Figure-Ground Segments

Fuxin Li, Taeyoung Kim, Ahmad Humayun, David Tsai, and James M. Rehg
Download paper in PDF (927 KB)

Download SPT+CSI Code with SegTrack v2 Dataset (200 MB)
(Latest Update: December 9th, 2013)
Description of the SegTrack v2 Dataset


Segment Pool Tracking + Composite Statistical Inference
Summary of the method

Segment Pool Tracking (SPT) is the framework we present for the video segmentation problem. The figure above illustrates the core parts of our approach. First, we generate a pool of segments for each frame using the CPMC method. Then, image color features (e.g., Color-SIFT) are extracted, and appearance models are trained incrementally to track multiple segments across consecutive frames. A main contribution is an efficient least-squares formulation that makes simultaneously tracking 1,000 targets almost as efficient as tracking a single target. Since the appearance model for each target is learned over multiple frames and many segments, it is robust to appearance changes and partial occlusions. Lastly, we use Composite Statistical Inference (CSI) to refine segment tracks by inferring on high-order appearance terms while imposing temporal consistency.
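To illustrate why a least-squares formulation can track many targets at nearly the cost of one, here is a minimal sketch of multi-output ridge regression with a shared feature matrix. It is not the released SPT code: the class name `SharedRidge`, the regularizer `lam`, the feature dimension, and the use of per-target overlap scores as regression targets are all assumptions made for the example. The point it shows is that the Gram matrix depends only on the segment features, so it is accumulated and solved once for all targets, and new frames are folded in by adding to running sums.

```python
import numpy as np


class SharedRidge:
    """Hypothetical sketch: one ridge regressor per target, sharing features."""

    def __init__(self, dim, n_targets, lam=1.0):
        # Running sums: X^T X + lam * I (shared) and X^T Y (one column per target).
        self.gram = lam * np.eye(dim)
        self.xty = np.zeros((dim, n_targets))

    def update(self, X, Y):
        # X: (n_segments, dim) features of the segments in one frame.
        # Y: (n_segments, n_targets) regression targets (assumed: overlap scores).
        self.gram += X.T @ X
        self.xty += X.T @ Y

    def weights(self):
        # A single linear solve yields the weights for every target at once.
        return np.linalg.solve(self.gram, self.xty)


rng = np.random.default_rng(0)
dim, n_targets = 16, 1000
model = SharedRidge(dim, n_targets)

frames = []
for _ in range(3):  # three synthetic frames with 200 segments each
    X = rng.normal(size=(200, dim))
    Y = rng.uniform(size=(200, n_targets))
    model.update(X, Y)
    frames.append((X, Y))

W = model.weights()                 # (dim, n_targets)
scores = frames[-1][0] @ W          # predicted score of every segment for every target
```

Adding more targets only adds columns to `Y` and `W`; the `dim x dim` system that dominates the cost stays the same size.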

During tracking, a greedy assignment step serves as non-maximum suppression on segment tracks (see figure below). Tracks that are not consistent in appearance are filtered out automatically. As a result, although we initialize with more than 1,000 tracks, on average only 60 tracks remain at the end of each sequence, and these still capture most of the interesting objects.
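The following sketch shows one generic way a greedy assignment can act as non-maximum suppression. It is an illustration under assumptions, not the procedure from the paper: the function name `greedy_assign`, the score matrix layout, and the threshold `min_score` are hypothetical. Candidate (track, segment) pairs are visited from highest to lowest score, each segment is given to at most one track, and tracks left without a segment are dropped.

```python
def greedy_assign(scores, min_score=0.5):
    """Greedy one-to-one matching of tracks to segments.

    scores[t][s] is the predicted score of track t on segment s.
    Returns a dict mapping each surviving track index to its segment index.
    """
    candidates = sorted(
        (
            (score, t, s)
            for t, row in enumerate(scores)
            for s, score in enumerate(row)
            if score >= min_score  # assumed cutoff for inconsistent tracks
        ),
        reverse=True,
    )
    assignment = {}
    used_segments = set()
    for score, t, s in candidates:
        if t in assignment or s in used_segments:
            continue  # track already matched, or segment claimed by a better track
        assignment[t] = s
        used_segments.add(s)
    return assignment


scores = [
    [0.9, 0.2, 0.1],  # track 0 clearly prefers segment 0
    [0.8, 0.3, 0.1],  # track 1 competes for segment 0 and loses
    [0.1, 0.7, 0.2],  # track 2 takes segment 1
]
assignment = greedy_assign(scores)
print(assignment)  # {0: 0, 2: 1}
```

Track 1 duplicates track 0 and has no other segment above the threshold, so it is suppressed, which is the kind of pruning that reduces a large initial pool of tracks to a small set.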



Results of SPT and CSI on the SegTrack v2 dataset
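| Image sequence | SPT | SPT+CSI | Pairwise Appearance (Using Pirsiavash et al.) | Lee et al. | Grundmann et al. | CPMC Best (Averaged Per-frame) |
|---|---|---|---|---|---|---|
| Mean per object | 62.7 | 65.9 | 55.4 | 45.3 | 51.8 | 78.6 |
| Mean per sequence | 68.0 | 71.2 | 58.6 | 57.3 | 50.8 | 80.5 |
| Girl | 89.1 | 89.2 | 83.4 | 87.7 | 31.9 | 93.5 |
| Birdfall | 62.0 | 62.5 | 47.8 | 49.0 | 57.4 | 72.2 |
| Parachute | 93.2 | 93.4 | 91.3 | 96.3 | 69.1 | 95.5 |
| Cheetah-Deer | 40.1 | 37.3 | 18.3 | 44.5 | 18.8 | 67.0 |
| Cheetah-Cheetah | 41.3 | 40.9 | 22.2 | 11.7 | 24.4 | 66.6 |
| Monkeydog-Monkey | 58.8 | 71.3 | 24.1 | 74.3 | 68.3 | 83.0 |
| Monkeydog-Dog | 17.4 | 18.9 | 16.5 | 4.9 | 18.8 | 44.6 |
| Penguin-#1 | 51.4 | 51.5 | 59.3 | 12.6 | 72.0 | 75.8 |
| Penguin-#2 | 73.2 | 76.5 | 79.1 | 11.3 | 80.7 | 90.4 |
| Penguin-#3 | 69.6 | 75.2 | 75.6 | 11.3 | 75.2 | 85.4 |
| Penguin-#4 | 57.6 | 57.8 | 47.1 | 7.7 | 80.6 | 67.6 |
| Penguin-#5 | 63.4 | 66.7 | 45.8 | 4.2 | 62.7 | 68.1 |
| Penguin-#6 | 48.6 | 50.2 | 56.7 | 8.5 | 75.5 | 76.6 |
| Drifting Car-#1 | 73.8 | 74.8 | 65.4 | 63.7 | 55.2 | 82.1 |
| Drifting Car-#2 | 58.4 | 60.6 | 59.8 | 30.1 | 27.2 | 75.3 |
| Hummingbird-#1 | 45.4 | 54.4 | 35.0 | 46.3 | 13.7 | 70.0 |
| Hummingbird-#2 | 65.2 | 72.3 | 65.8 | 74.0 | 25.2 | 82.2 |
| Frog | 65.8 | 72.8 | 69.0 | 0 | 67.1 | 87.1 |
| Worm | 75.6 | 82.8 | 59.5 | 84.4 | 34.7 | 89.8 |
| Soldier | 83.0 | 83.8 | 50.7 | 66.6 | 66.5 | 84.3 |
| Monkey | 84.1 | 84.8 | 70.9 | 79.0 | 61.9 | 88.3 |
| Bird of Paradise | 88.2 | 94.0 | 81.1 | 92.2 | 86.8 | 94.7 |
| BMX-Person | 75.1 | 85.4 | 74.5 | 87.4 | 39.2 | 86.9 |
| BMX-Bike | 24.6 | 24.9 | 30.9 | 38.6 | 32.5 | 58.5 |
| Avg. Number of Tracks | 60.0 | 60.0 | 702.8 | 10.6 | 336.6 | 1219.3 |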
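As a reading aid for the two summary rows, the short script below recomputes them for the SPT column from the per-object scores listed in the table. Grouping objects into sequences by the part of the name before the hyphen is an assumption of this sketch; with that grouping, the plain average over all objects and the average of per-sequence averages match the reported values.

```python
from collections import defaultdict
from statistics import mean

# SPT column of the results table, one entry per object.
spt = {
    "Girl": 89.1, "Birdfall": 62.0, "Parachute": 93.2,
    "Cheetah-Deer": 40.1, "Cheetah-Cheetah": 41.3,
    "Monkeydog-Monkey": 58.8, "Monkeydog-Dog": 17.4,
    "Penguin-#1": 51.4, "Penguin-#2": 73.2, "Penguin-#3": 69.6,
    "Penguin-#4": 57.6, "Penguin-#5": 63.4, "Penguin-#6": 48.6,
    "Drifting Car-#1": 73.8, "Drifting Car-#2": 58.4,
    "Hummingbird-#1": 45.4, "Hummingbird-#2": 65.2,
    "Frog": 65.8, "Worm": 75.6, "Soldier": 83.0, "Monkey": 84.1,
    "Bird of Paradise": 88.2, "BMX-Person": 75.1, "BMX-Bike": 24.6,
}

# Mean per object: every object counts equally.
mean_per_object = mean(spt.values())

# Mean per sequence: average objects within a sequence first, then
# average the sequences (assumed grouping: text before the hyphen).
by_sequence = defaultdict(list)
for name, score in spt.items():
    by_sequence[name.split("-")[0]].append(score)
mean_per_sequence = mean(mean(scores) for scores in by_sequence.values())

print(round(mean_per_object, 1), round(mean_per_sequence, 1))  # 62.7 68.0
```

The per-sequence mean is higher here because multi-object sequences such as Monkeydog and BMX contain low-scoring objects that weigh less once they are averaged within their sequence.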
Citation
@inproceedings{FliICCV2013,
  author = {Fuxin Li and Taeyoung Kim and Ahmad Humayun and David Tsai and James M. Rehg},
  title = {Video Segmentation by Tracking Many Figure-Ground Segments},
  booktitle = {ICCV},
  year = {2013}
}