Closed-Loop Adaptation for Robust Tracking

Jialue Fan    Xiaohui Shen    Ying Wu

Northwestern University

European Conference on Computer Vision (ECCV) 2010 [pdf]


Model updating is a critical problem in tracking. Inaccurate extraction of foreground and background information during model adaptation causes the model to drift and degrades tracking performance. In this paper, we study this challenging problem: how can incorrect information be kept out of model updating so as to alleviate model drift?


The most direct yet most difficult solution to the drift problem is to obtain accurate boundaries of the target. We approach such a solution with a novel closed-loop model adaptation framework that combines matting and tracking (Figure 1). In our framework, the scribbles for matting are generated automatically, which makes matting applicable in a tracking system. Meanwhile, accurate target boundaries can be obtained from the matting results even when the target undergoes large deformation. An effective model is then constructed and successfully updated based on these accurate boundaries.
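The closed loop above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the function names are invented, the morphology is a simple 4-neighborhood erosion/dilation, and `toy_matting` is a color-similarity stand-in for the closed-form matting of [3]. It only shows the loop structure: predicted mask, automatic scribbles, matte, refined boundary for model updating.

```python
import numpy as np

def erode(mask, it=3):
    # Simple 4-neighborhood binary erosion (np.roll wraps at borders,
    # which is harmless as long as the mask stays away from image edges).
    for _ in range(it):
        mask = (mask & np.roll(mask, 1, 0) & np.roll(mask, -1, 0)
                     & np.roll(mask, 1, 1) & np.roll(mask, -1, 1))
    return mask

def dilate(mask, it=3):
    # Dilation as erosion of the complement.
    return ~erode(~mask, it)

def make_scribbles(pred_mask, margin=3):
    """Automatic scribbles from the tracker's predicted mask:
    eroded interior -> foreground scribbles,
    complement of the dilated mask -> background scribbles."""
    return erode(pred_mask, margin), ~dilate(pred_mask, margin)

def toy_matting(frame, fg, bg):
    """Stand-in for closed-form matting [3]: alpha is a normalized
    color similarity to the mean scribble colors (illustration only)."""
    fg_color = frame[fg].mean(axis=0)
    bg_color = frame[bg].mean(axis=0)
    d_fg = np.linalg.norm(frame - fg_color, axis=-1)
    d_bg = np.linalg.norm(frame - bg_color, axis=-1)
    alpha = d_bg / (d_fg + d_bg + 1e-8)
    alpha[fg], alpha[bg] = 1.0, 0.0
    return alpha

def closed_loop_step(frame, pred_mask, tau=0.5):
    """One loop iteration: scribbles -> matte -> refined boundary,
    which would then be used to update the appearance model."""
    fg, bg = make_scribbles(pred_mask)
    alpha = toy_matting(frame, fg, bg)
    return alpha > tau, alpha
```

Even with a rough predicted mask, the refined mask snaps to the target's true extent, which is what makes the subsequent model update reliable.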

There are three main contributions in our work:

Free from constraints on object shape and motion continuity, our tracking framework handles several difficult tracking scenarios well, such as large deformation, fast motion, and severe occlusion.


Figure 1. The framework of closed-loop adaptation for tracking.


Figure 2. Our model for tracking. (a) Short-term salient points, (b) discriminative color lists, (c) long-term bags of patches.
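The discriminative color lists in Figure 2(b) can be illustrated by ranking quantized colors by how strongly they separate foreground from background. The sketch below is an assumption-laden toy (invented function names, a simple log-likelihood-ratio score over color histograms), not the paper's exact formulation:

```python
import numpy as np

def discriminative_colors(fg_pixels, bg_pixels, bins=8, top_k=5, eps=1e-6):
    """Rank quantized RGB colors by the log-likelihood ratio of appearing
    in the foreground vs. the background (illustrative sketch only)."""
    def hist(pixels):
        # Quantize colors in [0, 1] into bins^3 cells and normalize.
        idx = (pixels * (bins - 1)).astype(int)
        flat = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
        h = np.bincount(flat, minlength=bins ** 3).astype(float)
        return h / h.sum()
    p_fg, p_bg = hist(fg_pixels), hist(bg_pixels)
    score = np.log((p_fg + eps) / (p_bg + eps))  # high -> foreground-specific
    return np.argsort(score)[::-1][:top_k]       # most discriminative color cells
```

Colors that appear mostly on the target score high and form the list; colors shared with the background score near zero and are excluded, which is what keeps the updated model from absorbing background information.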



Experimental Results

We compared our method with Collins' method [1], which performs on-line feature selection to discriminate the foreground from the background. In the "Tom and Jerry" sequence, our approach accurately recovers the boundary of Jerry, especially when he is holding a spoon or carrying a gun, while Collins' method drifts at the very beginning due to the fast motion.

We also compared our method with video matting [2]. To make their method work in a tracking scenario (i.e., fully automatic processing), all user input except for the first frame is removed. Both methods use closed-form matting [3] for a fair comparison. As can be seen in the "Book" sequence, video matting's optical-flow estimates are inaccurate at motion discontinuities and in homogeneous regions, so its cutout results are unsatisfactory. Furthermore, it cannot handle occlusion. By contrast, our method adaptively maintains the boundary of the book throughout. In this sequence, blue squares indicate bags of patches that are not occluded and will be updated, while purple squares indicate bags that are currently occluded.

Click the images to play the videos. If a video does not play, please install the DivX video codec.

Tom and Jerry (Collins' method [1]).

Tom and Jerry (our method); matting results produced by our method are also shown.

Book: both results are shown together for comparison (left: our result; right: video matting [2]).


[1] R. Collins, Y. Liu, and M. Leordeanu. On-line selection of discriminative tracking features. IEEE Trans. on PAMI, 2005.

[2] Y. Chuang, A. Agarwala, B. Curless, D. Salesin, and R. Szeliski. Video matting of complex scenes. In SIGGRAPH, 2002.

[3] A. Levin, D. Lischinski, and Y. Weiss. A closed-form solution to natural image matting. IEEE Trans. on PAMI, pages 228-242, 2008.

[4] K. He, J. Sun, and X. Tang. Fast matting using large kernel matting laplacian matrices. In CVPR, 2010.


Updated 7/2010. Copyright © 2010 Jialue Fan, Xiaohui Shen, and Ying Wu