Research Statement
Zhu Li, PhD
Principal Staff Research Engineer
Multimedia Research Lab (MRL), Motorola Labs
The motivation for my research is to develop mobile
multimedia computing and communication technologies that serve people’s needs
to search, retrieve, communicate, and consume multimedia content anytime,
anywhere, in a more intelligent and efficient manner. More specifically, I am
interested in solutions for video content analysis, search, and communication in
a networked, mobile environment. In the proposed framework, a better
understanding of the video content will be derived at the signal, object,
syntactic, and semantic levels, and intelligent communication, networking, and
content management solutions will be built with techniques and tools from
optimization, game theory, statistics, and machine learning.
The goal of this work is to deliver better
end-to-end quality in video communication and to use communication resources
more efficiently, through distributed optimization of video content adaptation
and communication resource allocation. In the vertical direction, content
awareness derived from video sequences can define richer video quality metrics
and utility functions, which in turn inform intelligent video coding and
communication decisions across the application, network, and link layers. In
the horizontal direction, video communication rarely involves a single user over
a single hop, so conflicts in sharing communication resources must be resolved.
We are therefore looking at efficient ways to exploit multi-user diversity in
channel states and in content utility-resource tradeoffs. We have developed
distributed, collaborative solutions for resource allocation and management, as
well as for video adaptation. Pricing and game-theoretic approaches from
economics have been applied, yielding distributed solutions in which the
computational burden is spread among the entities in the network.
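A toy version of the pricing idea makes the distributed structure concrete. The Python sketch below is only an illustration under stated assumptions: a single shared resource of capacity `capacity` (for example, power or bandwidth at one access point) and a logarithmic utility w_i·log(1 + r_i) per user with positive weights, standing in for the content-aware rate-distortion utilities described above. Each user computes its own demand from a broadcast price, and the coordinator only adjusts one scalar price by bisection until total demand matches capacity. The function names are hypothetical and are not taken from the cited papers.

```python
from typing import List, Tuple


def user_demand(weight: float, price: float) -> float:
    """Best response of one user: maximize weight*log(1 + r) - price*r over r >= 0.

    Setting the derivative weight/(1 + r) - price to zero gives r = weight/price - 1,
    clipped at zero when the price exceeds the user's marginal utility at r = 0.
    """
    return max(0.0, weight / price - 1.0)


def clear_price(weights: List[float], capacity: float, iters: int = 100) -> Tuple[float, List[float]]:
    """Find the price at which total demand meets capacity, by bisection on the price.

    Only the scalar price is shared; each user evaluates user_demand locally.
    Assumes all weights are positive.
    """
    lo, hi = 1e-9, max(weights)  # at price >= max(weights) every user demands 0
    for _ in range(iters):
        price = 0.5 * (lo + hi)
        total = sum(user_demand(w, price) for w in weights)
        if total > capacity:
            lo = price  # over-subscribed: raise the price
        else:
            hi = price  # spare capacity: lower the price
    price = 0.5 * (lo + hi)
    return price, [user_demand(w, price) for w in weights]


# Hypothetical example: three users whose utility weights reflect content importance.
price, alloc = clear_price([2.0, 4.0, 6.0], capacity=6.0)
print(f"price={price:.4f}, allocation={[round(r, 3) for r in alloc]}")
```

With weights 2, 4, and 6 and capacity 6, all three users stay active, the market-clearing price is 12/9 (about 1.333), and the allocations are 0.5, 2.0, and 3.5. Bisection is used here instead of a subgradient price update because aggregate demand is non-increasing in the price, which makes convergence easy to guarantee in a small example.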
More specifically, at the PHY/MAC layer, collaborative
streaming should incorporate content awareness (video content structure and
rate-distortion characteristics) into utility modeling, and apply pricing and
game-theoretic approaches to joint resource allocation and management. This
applies to wireless multiple access channels, e.g., 802.11 access points and base
stations. For broadcast applications, limited APP layer feedback schemes can be
combined with digital fountain codes and the various scalability options of the
video source to achieve much higher system capacity (number of channels) than
solutions without feedback, such as MediaFLO. Cognitive radio techniques can also
be exploited to enable collaboration among mobile devices in joint decoding.
At the NET/APP layer, the focus should be on content-aware
session management. For a multicast session, the question is how to design
content dispatching and buffering schemes that guarantee a sliding content
access window, exploiting scalable and multiple description coding, as well as
network coding, to achieve efficient content distribution. For content retrieval
and playback, we look at how to identify peer nodes, pull the appropriate
descriptions/layers of video content, perform joint network decoding and source
combining, and degrade gracefully, while achieving scalability and fairness.
Optimization decomposition techniques and game-theoretic approaches to
peer-to-peer streaming are being investigated for resource allocation and
session management.
This work is an ongoing collaboration with the Image and Video
Processing Lab (IVPL) at Northwestern University, the Network Advanced Tech (NAT)
group of Motorola, and the CommNet group at Princeton University.
Selected publications:
1.1 On Video Summarization and Streaming:
[1] Z. Li, F. Zhai, and A. K. Katsaggelos, “Video Summarization for Energy Efficient Wireless Streaming”, invited special session paper, Proceedings of SPIE Visual Communication and Image Processing (VCIP), Beijing, China, 2005.
[2] P. V. Pahalawatta, Z. Li, F. Zhai, and A. K. Katsaggelos, “Rate-Distortion Optimized Video Summary Generation and Transmission Over Packet Lossy Networks”, Proceedings of SPIE Image / Video Communication and Processing (IVCP), San Jose, 2005.
[3] R. Ansorge, E. Kirchmeier, Z. Li, G. M. Schuster, and A. K. Katsaggelos, “Optimal Intra-Coded Video Summarization for Two Bandwidth Limited Channels”, Proceedings of Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), Montreux, Switzerland, 2005.
[4] Z. Li, G. M. Schuster, and A. K. Katsaggelos, “Video summarization for multipath communication”, Proceedings of IEEE Int’l Conference on Image Processing (ICIP), Genoa, Italy, 2005.
1.2 On Multi-Access Multimedia Communication:
[1] Z. Li, Q. Cheng, A. K. Katsaggelos, and F. Ishtiaq, “Video Summarization and Transmission Adaptation for Very Low Bit Rate Multi-user Wireless Uplink Video Communication”, invited special session paper, Proceedings of IEEE Int’l Workshop on Multimedia Signal Processing (MMSP), Shanghai, 2005.
[2] Z. Li, J. Huang, and A. K. Katsaggelos, “Pricing Based Collaborative Multi-User Video Streaming over Power Constrained Wireless Down Link”, oral paper, IEEE Int’l Conference on Acoustics, Speech and Signal Processing (ICASSP), Toulouse, France, 2006.
[3] J. Huang, Z. Li, M. Chiang, and A. K. Katsaggelos, “Pricing-based Rate Control and Joint Packet Scheduling for Multi-user Wireless Uplink Video Streaming”, invited paper, 16th Packet Video Workshop (PV), 2006.
[4] Z. Li, J. Huang, M. Chiang, and A. K. Katsaggelos, “Intelligent Video Communication: Source Adaptation and Multi-User Collaboration”, invited paper, China Journal on Communication (CJC), special issue on Multimedia Communication, Ed. Changwen Chen, Dec. 2006.
[5] J. Huang, Z. Li, M. Chiang, and A. K. Katsaggelos, “Pricing Based Efficient Multi-User Wireless Video Communication over a CDMA Downlink”, submitted to IEEE Trans. on Circuits and Systems for Video Technology.
[6] W. Shi, Z. Li, and Y. Yu, “Network-Aware Mobile Gaming State Update Traffic Shaping”, submitted to IEEE Int’l Conf. on Computer Communication and Networking (ICCCN), 2007.
1.3 On Video Broadcasting:
[1] Z. Li, Y. Chen, Y. Wang, and A. K. Katsaggelos, “Scalable Video Broadcasting with Digital Fountain Code and Limited Feedback over WiMAX Networks”, in preparation for IEEE GLOBECOM, 2007.
[2] Z. Li, J. Huang, A. K. Katsaggelos, and M. Chiang, “Distributed Pricing for Peer-to-Peer Video Streaming”, in preparation for IEEE GLOBECOM, 2007.
For video indexing and retrieval, we built a hierarchical
subspace structure for efficient video representation and indexing and for
robust retrieval. The luminance field trajectory representation and indexing
scheme achieves very high retrieval speed, while the localized subspace
representation achieves high retrieval accuracy. The resulting solutions offer
state-of-the-art performance in both speed and accuracy, and received a best
paper award at ICME 2006.
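The trace idea behind the luminance field approach can be sketched in Python with NumPy. Under the assumptions here, each frame is an 8x8 downsampled luminance image, the principal components are fit with a plain SVD, and two shots are compared by the mean point-wise distance between their traces after linear resampling to a common length. That distance is a simple stand-in, not the trace geometry matching or LUFT tree indexing of the cited papers, and the data are synthetic.

```python
import numpy as np


def fit_pca(frames: np.ndarray, k: int):
    """frames: (num_frames, num_pixels) downsampled luminance frames. Returns mean and top-k basis."""
    mean = frames.mean(axis=0)
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:k]  # rows of vt are principal directions, by decreasing variance


def trace(frames: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project each frame into k-D principal component space; the point sequence is the trace."""
    return (frames - mean) @ basis.T


def resample(tr: np.ndarray, n: int) -> np.ndarray:
    """Linearly resample a trace to n points so shots of different lengths can be compared."""
    t_old = np.linspace(0.0, 1.0, len(tr))
    t_new = np.linspace(0.0, 1.0, n)
    return np.stack([np.interp(t_new, t_old, tr[:, d]) for d in range(tr.shape[1])], axis=1)


def trace_distance(a: np.ndarray, b: np.ndarray, n: int = 32) -> float:
    """Mean point-wise Euclidean distance between two resampled traces."""
    return float(np.linalg.norm(resample(a, n) - resample(b, n), axis=1).mean())


# Synthetic database: 5 shots of 20 frames, each 8x8 luminance (64 pixels) with a slow drift.
rng = np.random.default_rng(0)
shots = []
for _ in range(5):
    base = rng.normal(size=64)
    direction = rng.normal(size=64)
    direction /= np.linalg.norm(direction)
    drift = np.linspace(0.0, 2.0, 20)[:, None] * direction
    shots.append(base + drift + 0.01 * rng.normal(size=(20, 64)))

mean, basis = fit_pca(np.vstack(shots), k=4)
query = shots[3][::2] + 0.01 * rng.normal(size=(10, 64))  # shorter, noisy copy of shot 3
q = trace(query, mean, basis)
dists = [trace_distance(q, trace(s, mean, basis)) for s in shots]
best = int(np.argmin(dists))
print("best match:", best)
```

Retrieval cost is dominated by comparing short, low-dimensional traces rather than raw frames, which is the source of the speed advantage; an index over the traces would avoid the linear scan used here.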
For a large set of media understanding problems, the
number of classes and samples is very large, e.g., face recognition over a large
candidate set, head pose estimation, and human activity recognition. A single
global appearance-based model lacks the expressive power to characterize the
rich local structure and geometry of the appearance manifold. In this work, we
developed a data and model localization method that addresses this problem
through a piecewise linear approximation of a complex, globally non-linear model.
This approach has been applied to large-set face recognition and head pose
estimation, with very encouraging recognition accuracy. A new theory based on
graph computing and embedding has been developed to characterize the trade-offs
between data localization and model accuracy.
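The localization idea can be illustrated with a query-driven local linear model in Python with NumPy: for each query, an ordinary least-squares model is fit only on its k nearest training samples, so the collection of local fits forms a piecewise linear approximation of a non-linear target. This is a generic stand-in for the localized discriminant and Fisher-face models listed in the publications, not their actual formulation; the target function, sample sizes, and k = 20 are illustrative assumptions.

```python
import numpy as np


def fit_linear(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares fit of y ~ [X, 1] @ w; returns the weight vector w."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict_linear(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    return np.hstack([X, np.ones((len(X), 1))]) @ w


def local_predict(X_train, y_train, X_query, k: int = 20) -> np.ndarray:
    """Query-driven localization: fit a linear model on the k nearest neighbors of each query."""
    preds = []
    for q in X_query:
        idx = np.argsort(np.linalg.norm(X_train - q, axis=1))[:k]
        w = fit_linear(X_train[idx], y_train[idx])
        preds.append(predict_linear(w, q[None, :])[0])
    return np.array(preds)


rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1])  # synthetic non-linear "global" target
Xq = rng.uniform(-2.5, 2.5, size=(100, 2))
yq = np.sin(Xq[:, 0]) * np.cos(Xq[:, 1])

global_err = float(np.mean((predict_linear(fit_linear(X, y), Xq) - yq) ** 2))
local_err = float(np.mean((local_predict(X, y, Xq) - yq) ** 2))
print(f"global linear MSE={global_err:.4f}, local linear MSE={local_err:.4f}")
```

A single global linear model cannot follow sin(x1)·cos(x2) over this range, while the local fits track its curvature. The cost is one small least-squares solve per query, and the neighborhood size k is where the trade-off between localization and model accuracy shows up: smaller neighborhoods follow curvature more closely but leave fewer samples per fit.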
This work is an ongoing collaboration with
Northwestern University’s Image and Video Processing Lab (IVPL) and with summer
interns from the IFP group at UIUC.
Selected publications:
2.1 On Video Indexing and Retrieval
[1] Z. Li, A. K. Katsaggelos, and B. Gandhi, “Fast video shot retrieval by trace geometry matching in principal component space”, Proceedings of Int’l Conf. on Image Processing (ICIP), Singapore, 2004.
[2] Z. Li, A. K. Katsaggelos, and B. Gandhi, “Fast Video Shot Segmentation and Retrieval Based on Trace Geometry in Principal Component Space”, IEE Proceedings on Vision, Image and Signal Processing, vol. 152, no. 3, May 2005.
[3] L. Gao, Z. Li, and A. K. Katsaggelos, “Fast Video Shot Retrieval with Luminance Field Trace Indexing and Geometry Matching”, IEEE Int’l Conference on Image Processing (ICIP), Atlanta, USA, 2006.
[4] Z. Li, L. Gao, and A. K. Katsaggelos, “Embedded Local Linear Spaces for Efficient Video Shot Indexing and Retrieval”, Proc. of IEEE Int’l Conference on Multimedia & Expo (ICME), 2006. This work received a best paper award (5 awarded out of 520 accepted papers).
[5] L. Gao, Z. Li, and A. K. Katsaggelos, “LUFT (LUminance Field Trace) Tree: A Video Shot Segmentation and Indexing Scheme for Fast Retrieval”, Proceedings of Int’l Conference on Visual Info Engineering (VIE), Sept. 2006.
[6] Z. Li, L. Gao, and A. K. Katsaggelos, “Locally Embedded Linear Subspaces for Efficient Video Indexing and Retrieval”, in preparation for submission to IEEE Trans. on Image Processing.
2.2 On Manifold Modeling, Subspace/Metric Learning for Multimedia Understanding
[1] Y. Fu, Z. Li, T. S. Huang, and A. K. Katsaggelos, “Locally Embedded Metrics for Image / Video Clustering and Retrieval”, accepted, Journal of Computer Vision and Image Understanding (CVIU).
[2] Y. Fu, J. Yuan, Z. Li, Y. Wu, and T. S. Huang, “Query-Driven Locally Adaptive Fisher Faces for Face Recognition”, submitted to IEEE Int’l Conf. on Image Processing (ICIP), 2007.
[3] Z. Li, Y. Fu, J. Yuan, Y. Wu, and T. S. Huang, “Query Driven Locally Discriminative Linear Models for Head Pose Estimation”, submitted to IEEE Int’l Conf. on Multimedia & Expo (ICME), 2007.
[4] Y. Fu, Z. Li, X. Zhou, and T. S. Huang, “Laplacian Affinity Propagation For Semi-Supervised Object Classification”, submitted, IEEE Int’l Conference on Image Processing (ICIP), 2007.
[5] Y. Fu, J. Yuan, Z. Li, T. S. Huang, and Y. Wu, “Expert-Model Face Recognition with Query-Driven Locally Adaptive Fisher Faces”, submitted, IEEE Int’l Conference on Image Processing, 2007.
[6] J. Yuan, Z. Li, Y. Fu, Y. Wu, and T. S. Huang, “Discover Spatial Patterns By Efficient Candidate Pruning”, submitted, IEEE Int’l Conference on Image Processing, 2007.
A video summary is a shorter version, or a “sparse”
temporal representation, of the original video sequence. The goal of video
summarization is to select the subset of frames that best represents the
sequence while meeting constraints on storage, communication, and viewing time.
In this work we developed a color- and motion-based metric, as well as a
principal component space metric, for expressing the distortion of a video
summary, and an optimal dynamic programming (DP) framework for generating the
summary. The solution provides summaries of good visual quality at bit rates
ranging from 8 kbps to 48 kbps. Video summarization demos are available for
subjective evaluation.
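A minimal Python/NumPy sketch of rate-constrained summarization by dynamic programming follows. It assumes the rate constraint reduces to a budget of K summary frames (e.g., each summary frame intra-coded at a fixed rate), that every frame is represented by the most recent summary frame at or before it, that the first frame is always kept, and that frame distortion is the Euclidean distance between per-frame feature vectors (a stand-in for the color/motion or principal component space metrics described above). It minimizes total distortion; a MINMAX formulation would instead minimize the maximum frame distortion.

```python
import numpy as np


def summarize(features: np.ndarray, K: int):
    """Select K summary frames (frame 0 always included) minimizing total distortion.

    features: (N, d) per-frame feature vectors; assumes 1 <= K <= N.
    Each frame is represented by the latest summary frame at or before it.
    """
    N = len(features)
    # seg[i, j]: distortion of frames i..j-1 when all are represented by frame i
    seg = np.zeros((N, N + 1))
    for i in range(N):
        d = np.linalg.norm(features[i:] - features[i], axis=1)
        seg[i, i + 1:] = np.cumsum(d)
    # best[k, j]: min distortion of frames 0..j-1 using k summary frames, the k-th at frame j
    best = np.full((K + 1, N), np.inf)
    back = np.zeros((K + 1, N), dtype=int)
    best[1, 0] = 0.0
    for k in range(2, K + 1):
        for j in range(1, N):
            cand = best[k - 1, :j] + seg[:j, j]
            i = int(np.argmin(cand))
            best[k, j], back[k, j] = cand[i], i
    # add the distortion of the final segment, from the last summary frame to the end
    total = best[K] + seg[:, N]
    frames = [int(np.argmin(total))]
    cost = float(total[frames[0]])
    for k in range(K, 1, -1):  # backtrack to recover the earlier summary frames
        frames.append(int(back[k, frames[-1]]))
    return sorted(frames), cost


# Toy sequence: three static "scenes" of four frames each (1-D features for readability).
feats = np.array([[0.0]] * 4 + [[5.0]] * 4 + [[10.0]] * 4)
summary, distortion = summarize(feats, K=3)
print(summary, distortion)
```

With K = 3 the sketch keeps the first frame of each scene (frames 0, 4, and 8) at zero distortion. Building the segment table takes O(N²) distance evaluations and the recursion O(K·N²) operations, which is what makes an exact DP solution practical for short clips.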
Selected publications:
[1] Z. Li, A. K. Katsaggelos, G. M. Schuster, and B. Gandhi, “Rate-Distortion Optimal Video Summary Generation”, IEEE Trans. on Image Processing, vol. 14, no. 10, October 2005.
[2] Z. Li, G. M. Schuster, and A. K. Katsaggelos, “MINMAX Optimal Video Summarization and Coding”, special issue on Analysis & Understanding for Media Adaptation, IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 10, October 2005.
[3] Z. Li, A. K. Katsaggelos, and G. M. Schuster, “Rate-Distortion Optimal Video Summarization and Coding”, book chapter in Intelligent Multimedia Processing with Soft Computing, Ed. Y. P. Tan et al., Springer-Verlag, Heidelberg, 2005.
Collaborating with experts of diverse backgrounds is especially rewarding and
productive. It is my honor and pleasure to collaborate with a number of excellent
researchers in the multimedia computing and communication area, including,