Closed-Loop Adaptation for Robust Tracking
Jialue Fan
Xiaohui Shen
Ying Wu
Northwestern University
European Conference on Computer Vision (ECCV) 2010 [pdf]
Motivation
Model updating is a critical problem in tracking. Inaccurate
extraction of foreground and background information during model
adaptation causes the model to drift and degrades tracking
performance. In this paper, we study this challenging problem:
how can incorrect information be kept out of model updating in order to alleviate model drift?
Solution
The most direct, yet difficult, solution to the drift
problem is to obtain accurate boundaries of the target. We approach
such a solution by proposing a novel closed-loop model adaptation
framework based on the combination of matting and tracking (Figure 1). In our
framework, the scribbles for matting are all generated
automatically, which makes matting applicable in a tracking system.
Meanwhile, accurate boundaries of the target can be obtained from the
matting results even when the target undergoes large deformation. An
effective model is then constructed and updated
based on these accurate boundaries.
Our work makes three main contributions:
- We address the automatic scribble generation problem for
matting in the tracking process. A coarse but correct partition of
foreground and background is estimated during tracking and then used to
automatically generate suitable scribbles for matting. Supplying
scribbles is non-trivial: even a small false scribble
may cause matting to fail, while insufficient scribbles
also impair performance. In our system, scribble generation
is carefully designed to be both correct and sufficient,
yielding matting results comparable to those of methods
based on user input.
- We construct a simple but practical model (Figure 2) for tracking,
which not only captures the short-term dynamics and appearance of the
target, but also retains long-term appearance variations. This allows us to
accurately track the target over long sequences under various conditions such as
large deformation, out-of-plane rotation, and illumination change, even
when the target reappears after complete occlusion.
- Unlike other methods that try to alleviate the aftereffects
caused by inaccurate labeling of the foreground, we extract
the accurate boundary of the target and obtain refined tracking
results based on alpha mattes.
Under the guidance of this boundary,
the short-term features of the model are updated. Moreover,
occlusion is inferred to decide whether the long-term
model should be adapted. Benefiting from the matting results, our model adaptation
largely excludes the ambiguity between foreground and background,
thus significantly alleviating the drift problem in tracking.
In addition, object scaling and rotation can be handled by
obtaining the boundary.
Free of constraints on object shape and motion continuity, our tracking framework handles
several difficult tracking scenarios well, such as large deformation, fast motion, and severe occlusion.
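As an illustration of the first contribution, the sketch below turns a coarse foreground/background partition into scribbles by labeling only pixels that lie safely away from the coarse boundary and leaving the band in between for matting to resolve. This is not the scribble generation procedure of the paper: the erosion-based rule, the `margin` parameter, the label constants, and the function names are all assumptions made for illustration.

```python
from typing import List

# Hypothetical label values for the scribble map (not from the paper).
FG, BG, UNKNOWN = 1, 0, -1


def erode(mask: List[List[bool]], radius: int) -> List[List[bool]]:
    """Keep a pixel only if every in-image pixel within `radius`
    (square window) is also set. Pixels outside the image are ignored."""
    h, w = len(mask), len(mask[0])

    def keep(y: int, x: int) -> bool:
        for ny in range(max(0, y - radius), min(h, y + radius + 1)):
            for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                if not mask[ny][nx]:
                    return False
        return True

    return [[keep(y, x) for x in range(w)] for y in range(h)]


def scribbles_from_partition(coarse_fg: List[List[bool]],
                             margin: int = 1) -> List[List[int]]:
    """Build a scribble map from a coarse foreground mask.

    Pixels at least `margin` away from the coarse boundary keep their
    label; pixels near the boundary are marked UNKNOWN so that a wrong
    guess there cannot become a false scribble.
    """
    sure_fg = erode(coarse_fg, margin)
    coarse_bg = [[not v for v in row] for row in coarse_fg]
    sure_bg = erode(coarse_bg, margin)
    return [
        [FG if f else (BG if b else UNKNOWN) for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(sure_fg, sure_bg)
    ]


# Usage: a 3x3 coarse object inside a 7x7 frame.
coarse = [[2 <= y <= 4 and 2 <= x <= 4 for x in range(7)] for y in range(7)]
scribble_map = scribbles_from_partition(coarse, margin=1)
unknown_pixels = sum(row.count(UNKNOWN) for row in scribble_map)
```

The `margin` setting reflects the trade-off described above: a larger margin lowers the risk of a false scribble but leaves more pixels unlabeled, and too few scribbles can also hurt matting.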
Figure 1. The framework of closed-loop adaptation for tracking.
Figure 2. Our model for tracking. (a) Short-term salient points, (b) discriminative color lists, (c) long-term bags of patches.
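One way to picture how a matting result could feed back into the tracker is to threshold the alpha matte and take the tight box around the foreground, which gives a refined position and a scale estimate. The sketch below assumes exactly that; the threshold of 0.5, the box convention, and the names `box_from_alpha` and `scale_change` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple

# Hypothetical box convention: (top, left, bottom, right), inclusive indices.
Box = Tuple[int, int, int, int]


def box_from_alpha(alpha: List[List[float]],
                   threshold: float = 0.5) -> Optional[Box]:
    """Tight bounding box of pixels whose alpha is at least `threshold`.

    Returns None when no pixel qualifies (for example, when the target
    is not visible in the matte).
    """
    if not alpha or not alpha[0]:
        return None
    rows = [y for y, row in enumerate(alpha)
            if any(a >= threshold for a in row)]
    if not rows:
        return None
    cols = [x for x in range(len(alpha[0]))
            if any(row[x] >= threshold for row in alpha)]
    return rows[0], cols[0], rows[-1], cols[-1]


def scale_change(prev: Box, curr: Box) -> float:
    """Linear scale factor between two boxes (square root of area ratio)."""
    def area(b: Box) -> int:
        return (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return (area(curr) / area(prev)) ** 0.5


# Usage on a small hand-made matte.
alpha = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.9, 1.0, 0.1, 0.0],
    [0.0, 0.6, 1.0, 1.0, 0.4, 0.0],
    [0.0, 0.0, 0.7, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
box = box_from_alpha(alpha)
```

A box is only a coarse summary of the boundary; it is used here because it makes the scale estimate easy to state.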
Note
We list a few notes to highlight some issues in the paper.
- The method is robust to
some fluctuation in salient point tracking.
In our coarse tracking process, the tracking
results of these salient points need not be very
accurate. We only require that points on the object in the previous frame stay on the
object, and that points in the background in the previous frame remain in the background. In our experiments we
found that these requirements are easily satisfied by the tracking
results. (Section 4.1 in the paper)
- Computational cost.
The computational cost of our approach is mostly due
to the matting algorithm. It depends on the number of pixels
with uncertain alpha values before matting, which in turn generally
depends on the object size. Our method provides many more scribbles
than user input typically does, which makes matting faster.
For example, our method processes about one frame per second on average on the
"Tom and Jerry" sequence, without code optimization in our Matlab
implementation, where the object size is around 150x150 pixels. This
suggests good potential for a real-time implementation in C++. Since a
fast matting technique [4] has recently been proposed,
computational complexity is no longer a critical issue for our
algorithm.
- A fundamental guideline.
The proposed framework can be considered
as a fundamental guideline on the combination of matting and
tracking. In such a framework, each component in the closed loop can
be further explored to improve the tracking performance.
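The requirement in the first note can be checked mechanically when a reference foreground mask for the current frame is available (for instance, a thresholded alpha matte). The sketch below is only an illustration under assumptions: `label_consistency`, the `(row, col)` point format, and the convention that points leaving the image count as background are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # hypothetical (row, col) pixel coordinates


def label_consistency(points: Sequence[Point],
                      was_foreground: Sequence[bool],
                      fg_mask: List[List[bool]]) -> float:
    """Fraction of tracked points that stayed on the same side.

    points          tracked positions in the current frame
    was_foreground  label of each point in the previous frame
    fg_mask         reference foreground mask for the current frame
    """
    if len(points) != len(was_foreground):
        raise ValueError("points and was_foreground must have equal length")
    if not points:
        return 1.0  # nothing to violate the requirement
    h, w = len(fg_mask), len(fg_mask[0])
    kept = 0
    for (r, c), was_fg in zip(points, was_foreground):
        inside = 0 <= r < h and 0 <= c < w
        # Assumption: a point that left the image is treated as background.
        is_fg = inside and fg_mask[r][c]
        if is_fg == was_fg:
            kept += 1
    return kept / len(points)


# Usage: a 2x2 object in a 4x4 frame, two object points, two background points.
mask = [[1 <= y <= 2 and 1 <= x <= 2 for x in range(4)] for y in range(4)]
score = label_consistency([(1, 1), (2, 2), (0, 0), (3, 3)],
                          [True, True, False, False], mask)
```

A score of 1.0 means the requirement holds for every point, regardless of how far each point drifted within its own region.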
Experimental Results
We compared our method with Collins'
method [1], which performs feature
selection to discriminate between foreground and background. In the
"Tom and Jerry" sequence, our approach accurately obtains the
boundary of Jerry, especially when he is holding a spoon or carrying
a gun, while Collins' method drifts at the very beginning due to
the fast motion.
We also compared our method with video
matting [2]. To make that method work in the
tracking scenario (i.e., automatic processing), all user
input except for the first frame is removed. Both methods use the
closed-form matting method [3] for a fair comparison.
As the "Book" sequence shows, the optical flow estimated in video matting
is not accurate at motion discontinuities or in
homogeneous regions, so its cutout result is not
satisfactory. Furthermore, it cannot handle occlusion. By contrast,
our method adaptively keeps the boundary of the book throughout. In
this sequence, a blue square means that the corresponding bag is not occluded and
will be updated, while a purple square means that the bag is
currently under occlusion.
Tom and Jerry (Collins' method [1]).
Tom and Jerry (our method). We also show the matting results of our method.
Book. We show both results together for comparison: our result is on the left, and the video matting result [2] is on the right.
References
[1] R. Collins, Y. Liu, and M. Leordeanu. On-line selection of discriminative tracking features. IEEE Trans. on PAMI, 2005.
[2] Y. Chuang, A. Agarwala, B. Curless, D. Salesin, and R. Szeliski. Video matting of complex scenes. In SIGGRAPH, 2002.
[3] A. Levin, D. Lischinski, and Y. Weiss. A closed-form solution to natural image matting. IEEE Trans. on PAMI, pages 228-242, 2008.
[4] K. He, J. Sun, and X. Tang. Fast matting using large kernel matting Laplacian matrices. In CVPR, 2010.
Updated 7/2010. Copyright ©
2010 Jialue Fan, Xiaohui Shen, and Ying Wu