Research Statement

 

Zhu Li, PhD

 

Principal Staff Research Engineer

Multimedia Research Lab (MRL), Motorola Labs

 

 

Motivation

 

The motivation for my research is to develop mobile multimedia computing and communication technologies that serve people’s needs to search, retrieve, communicate, and consume multimedia content anytime, anywhere, in a more intelligent and efficient manner. More specifically, I am interested in solutions for video content analysis, search, and communication in a networked, mobile environment. In the proposed framework, a better understanding of the video content is derived at the signal, object, syntactic, and semantic levels, and intelligent communication, networking, and content management solutions are built with techniques and tools from optimization, game theory, statistics, and machine learning.

 

Current and Past Research

 

1. Content-Aware, Collaborative Multimedia Communication with Cross-Layer Optimization

 

The goal of this work is to deliver better end-to-end quality in video communication and to use communication resources more efficiently, through distributed optimization of video content adaptation and communication resource allocation. In the vertical direction, content awareness can be derived from video sequences to define richer video quality metrics and utility functions, which support intelligent video coding and communication decisions across the application, networking, and link layers. In the horizontal direction, video communication rarely happens in a single-user, single-hop setting, so conflicts in communication resource sharing need to be addressed. We are therefore looking at efficient ways to exploit multi-user diversity in channel states and in content utility-resource tradeoffs. We developed distributed and collaborative solutions for resource allocation and management, as well as for video adaptation. Pricing and game-theoretic approaches from economics have been applied, resulting in a distributed solution with the computational burden spread among entities in the network.

 

More specifically, at the PHY/MAC layer, collaborative streaming should incorporate content awareness, in the form of video content structure and rate-distortion characteristics, into utility modeling, and apply pricing and game-theoretic approaches to joint resource allocation and management. This applies to wireless multiple access channels, e.g., 802.11 access points and base stations. For broadcast applications, limited APP layer feedback schemes can be implemented along with digital fountain codes and the various scalability options of the video source, to achieve much better system capacity (number of channels) than non-feedback solutions like MediaFLO. Cognitive radio techniques can also be exploited to allow collaboration among mobiles in joint decoding.

 

At the NET/APP layer, the focus should be on content-aware session management. For a multicast session, the question is how to design content dispatching and buffering schemes that guarantee a sliding content access window, exploiting scalable and multiple description coding as well as network coding schemes to achieve efficient content distribution. For content retrieval and playback, we look at how to identify peer nodes, how to pull the appropriate descriptions or layers of video content, and how to perform joint network decoding and source combining with graceful degradation, while achieving scalability and fairness. Optimization decomposition techniques and game-theoretic approaches for peer-to-peer streaming are investigated for resource allocation and session management.
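
The pricing idea behind the distributed resource allocation can be illustrated with a minimal sketch. It is not the formulation used in the publications below: the logarithmic utility, the single shared capacity, the step size, and the function names are all assumptions chosen for illustration. The network announces a price per unit of rate, each user independently picks the rate that maximizes its own utility minus its payment, and the price rises or falls with the excess demand.

```python
def best_response(weight, price):
    """Rate maximizing weight*log(1 + r) - price*r over r >= 0 (assumed utility)."""
    return max(0.0, weight / price - 1.0)


def price_based_allocation(weights, capacity, step=0.05, iters=500, price=1.0):
    """Iterate a price until total user demand matches the shared capacity.

    weights  : hypothetical per-user content utility weights
    capacity : total rate available on the shared link
    """
    for _ in range(iters):
        # Each user solves its own small problem; no user sees the others' utilities.
        rates = [best_response(w, price) for w in weights]
        excess = sum(rates) - capacity
        # The network raises the price when demand exceeds capacity, lowers it otherwise.
        price = max(1e-9, price + step * excess)
    rates = [best_response(w, price) for w in weights]
    return rates, price


if __name__ == "__main__":
    rates, price = price_based_allocation([2.0, 3.0, 5.0], capacity=5.0)
    print(rates, price)
```

With weights 2, 3, and 5 and a capacity of 5, the price settles near 1.25 and the rates near 0.6, 1.4, and 3.0, so users whose content gains more from extra rate receive more of the shared resource.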

 

This work is an ongoing collaboration with IVPL at Northwestern University, Network Advanced Tech (NAT) group of Motorola, and CommNet Group at Princeton University.

 

Selected publications:

 

1.1 On Video Summarization and Streaming:

 

[1] Z. Li, F. Zhai and A. K. Katsaggelos, “Video Summarization for Energy Efficient Wireless Streaming”, invited special session paper, Proceedings of SPIE Visual Communication and Image Processing (VCIP), Beijing, China, 2005.

[2] P. V. Pahalawatta, Z. Li, F. Zhai, and A. K. Katsaggelos, “Rate-Distortion Optimized Video Summary Generation and Transmission Over Packet Lossy Networks”, Proceedings of SPIE Image / Video Communication and Processing (IVCP), San Jose, 2005.

[3] R. Ansorge, E. Kirchmeier, Z. Li, G. M. Schuster, and A. K. Katsaggelos, “Optimal Intra-Coded Video Summarization for Two Bandwidth Limited Channels”, Proceedings of Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), Montreux, Switzerland, 2005.

[4] Z. Li, G. M. Schuster, and A. K. Katsaggelos, “Video summarization for multipath communication”, Proceedings of IEEE Int’l Conference on Image Processing (ICIP), Genoa, Italy, 2005.

 

1.2 On Multi-Access Multimedia Communication:

 

[1] Z. Li, Q. Cheng, A. K. Katsaggelos, and F. Ishtiaq, “Video Summarization and Transmission Adaptation for Very Low Bit Rate Multi-user Wireless Uplink Video Communication”, invited special session paper, Proceedings of IEEE Int’l Workshop on Multimedia Signal Processing (MMSP), Shanghai, 2005.

[2] Z. Li, J. Huang, and A. K. Katsaggelos, “Pricing Based Collaborative Multi-User Video Streaming over Power Constrained Wireless Down Link”, oral paper, IEEE Int’l Conference on Acoustics, Speech and Signal Processing (ICASSP), Toulouse, France, 2006.

[3] J. Huang, Z. Li, M. Chiang, and A. K. Katsaggelos, “Pricing-based Rate Control and Joint Packet Scheduling for Multi-user Wireless Uplink Video Streaming”, invited paper, 16th Packet Video Workshop (PV), 2006.

[4] Z. Li, J. Huang, M. Chiang, and A. K. Katsaggelos, “Intelligent Video Communication: Source Adaptation and Multi-User Collaboration”, invited paper, China Journal on Communication (CJC), special issue on Multimedia Communication, Ed. Changwen Chen, Dec. 2006.

[5] J. Huang, Z. Li, M. Chiang, and A. K. Katsaggelos, “Pricing Based Efficient Multi-User Wireless Video Communication over a CDMA Downlink”, submitted to IEEE Trans. on Circuits & System for Video Tech.

[6] W. Shi, Z. Li, and Y. Yu, “Network-Aware Mobile Gaming State Update Traffic Shaping”, submitted to IEEE Int’l Conf on Computer Communication and Networking (ICCCN), 2007.

 

1.3 On Video Broadcasting:

 

[1] Z. Li, Y. Chen, Y. Wang, and A. K. Katsaggelos, “Scalable Video Broadcasting with Digital Fountain Code and Limited Feedback over WiMAX Networks”, in prep for IEEE GLOBECOM, 2007.

[2] Z. Li, J. Huang, A. K. Katsaggelos, and M. Chiang, “Distributed Pricing for Peer-to-Peer Video Streaming”, in prep for IEEE GLOBECOM, 2007.

 

2. Multimedia Content Analysis, Understanding and Mining

 

For video indexing and retrieval, a hierarchical subspace structure is built for efficient video representation and indexing and for robust retrieval. The luminance field trajectory representation and indexing scheme achieves very high retrieval speed, while the localized subspace representation achieves high retrieval accuracy. The resulting solutions offer state-of-the-art performance in both speed and accuracy, and this work received a best paper award at ICME ’06.

 

For a large set of media understanding problems, the numbers of classes and samples are very large, e.g., face recognition over a large candidate set, head pose estimation, and human activity recognition. A global appearance-based model lacks the expressive power to characterize the rich local structure and geometry of the appearance manifold. In this work, we developed a data and model localization method that addresses this problem through a piecewise linear approximation of a complex global nonlinear model. This approach is applied to large-set face recognition and head pose estimation, with very encouraging results in recognition accuracy. A new theory based on graph computing and embedding is developed to characterize the trade-offs between data localization and model accuracy.
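
The intuition behind localizing data and model can be shown with a toy sketch. It is only an illustration under strong assumptions, not the method of the publications below: the data are one-dimensional, the neighborhood is a plain k-nearest-neighbor set around the query, and the function name is hypothetical. A straight line fitted to the neighbors of the query follows a curved relationship far more closely than a single line fitted to all of the data.

```python
def local_linear_predict(xs, ys, query, k):
    """Fit a least-squares line to the k samples nearest to `query` and evaluate it there."""
    # Data localization: keep only the k training samples closest to the query.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))[:k]
    lx = [xs[i] for i in nearest]
    ly = [ys[i] for i in nearest]

    # Model localization: ordinary least-squares line on the local samples only.
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxx = sum((x - mean_x) ** 2 for x in lx)
    if sxx == 0.0:
        return mean_y  # all neighbors coincide; fall back to their mean
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly)) / sxx
    return mean_y + slope * (query - mean_x)


if __name__ == "__main__":
    xs = [0.1 * i for i in range(41)]   # samples on [0, 4]
    ys = [x * x for x in xs]            # a simple nonlinear relationship
    print(local_linear_predict(xs, ys, 2.0, k=5))        # local model
    print(local_linear_predict(xs, ys, 2.0, k=len(xs)))  # single global line
```

Choosing k is the trade-off the paragraph above refers to: a small neighborhood tracks local geometry but leaves few samples to fit, while a large one approaches the global model.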

 

This work is an ongoing collaboration with Northwestern University’s Image and Video Processing Lab (IVPL), and summer interns from IFP at UIUC.

 

Selected publications:

 

2.1 On Video Indexing and Retrieval

 

[1] Z. Li, A. K. Katsaggelos, and B. Gandhi, "Fast video shot retrieval by trace geometry matching in principal component space", Proceedings of Int'l Conf. on Image Processing (ICIP), Singapore, 2004.

[2] Z. Li, A. K. Katsaggelos and B. Gandhi, "Fast Video Shot Segmentation and Retrieval Based on Trace Geometry in Principal Component Space", IEE Proceedings on Vision, Image and Signal Processing, vol. 152, no. 3, May, 2005.

[3] L. Gao, Z. Li, A. K. Katsaggelos, “Fast Video Shot Retrieval with Luminance Field Trace Indexing and Geometry Matching”, IEEE Int’l Conference on Image Processing (ICIP), Atlanta, USA, 2006.

[4] Z. Li, L. Gao, and A. K. Katsaggelos, “Embedded Local Linear Spaces for Efficient Video Shot Indexing and Retrieval”, Proc. of IEEE Int’l Conference on Multimedia & Expo (ICME), 2006. This work received a best paper award, 5 out of 520 accepted papers.

[5] L. Gao, Z. Li, A. K. Katsaggelos, “LUFT (LUminance Field Trace) Tree: A Video Shot Segmentation and Indexing Scheme for Fast Retrieval”, Proceedings of Int’l Conference on Visual Info Engineering (VIE), Sept., 2006.

[6] Z. Li, L. Gao and A. K. Katsaggelos, “Locally Embedded Linear Subspaces for Efficient Video Indexing and Retrieval”, in prep for submission to IEEE Trans. on Image Processing.

 

2.2 On Manifold Modeling, Subspace/Metric Learning for Multimedia Understanding

 

[1] Y. Fu, Z. Li, T. S. Huang, and A. K. Katsaggelos, “Locally Embedded Metrics for Image / Video Clustering and Retrieval”, accepted, Journal of Computer Vision and Image Understanding (CVIU).

[2] Y. Fu, J. Yuan, Z. Li, Y. Wu, and T. S. Huang, “Query-Driven Locally Adaptive Fisher Faces for Face Recognition”, submitted to IEEE Int’l Conf. on Image Processing, 2007.

[3] Z. Li, Y. Fu, J. Yuan, Y. Wu and T. S. Huang, “Query Driven Locally Discriminative Linear Models for Head Pose Estimation”, submitted to IEEE Int’l Conf on Multimedia & Expo. (ICME), 2007.

[4] Y. Fu, Z. Li, X. Zhou, and T. S. Huang, “Laplacian Affinity Propagation For Semi-Supervised Object Classification”, submitted, IEEE Int’l Conference on Image Processing (ICIP), 2007.

[5] Y. Fu, J. Yuan, Z. Li, T. S. Huang, and Y. Wu, “Expert-Model Face Recognition with Query-Driven Locally Adaptive Fisher Faces”, submitted, IEEE Int’l Conference on Image Processing, 2007.

[6] J. Yuan, Z. Li, Y. Fu, Y. Wu, and T. S. Huang, “Discover Spatial Patterns By Efficient Candidate Pruning”, submitted, IEEE Int’l Conference on Image Processing, 2007.

 

3. Video Summarization, Coding and Wireless Streaming

 

A video summary is a shorter version, or a “sparse” temporal representation, of the original video sequence. The goal of video summarization is to select a subset of frames that best represents the sequence while meeting constraints on storage, communication, and viewing time. In this work we developed a color- and motion-based metric, as well as a Principal Component Space metric, for expressing the distortion in a video summary, and an optimal Dynamic Programming (DP) based framework for generating a summary. The solution can provide video summaries of good visual quality at bit rates ranging from 8 kbps to 48 kbps. Video summarization demos are available for subjective evaluation.
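
The flavor of the DP formulation can be conveyed with a simplified sketch. It makes several assumptions that differ from the published work: each frame is reduced to a single scalar feature, the constraint is a fixed number of summary frames rather than a bit budget, the first frame is always selected, and every dropped frame is represented by the most recent selected frame. The function name and the absolute-difference distortion are hypothetical stand-ins for the metrics described above.

```python
def summarize(features, m):
    """Pick m frames (always including frame 0) that minimize total summary distortion.

    features : one scalar feature per frame (assumed stand-in for a frame metric)
    m        : number of frames allowed in the summary, 1 <= m <= len(features)
    """
    n = len(features)
    inf = float("inf")

    def segment_cost(i, j):
        # Distortion when frame i represents frames i .. j-1.
        return sum(abs(features[l] - features[i]) for l in range(i, j))

    # best[k][j]: least distortion for frames 0 .. j-1 using k selected frames.
    best = [[inf] * (n + 1) for _ in range(m + 1)]
    prev = [[-1] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for k in range(1, m + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cost = best[k - 1][i] + segment_cost(i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    prev[k][j] = i

    # Walk the stored decisions backwards to recover the selected frame indices.
    picks = []
    j = n
    for k in range(m, 0, -1):
        j = prev[k][j]
        picks.append(j)
    picks.reverse()
    return picks, best[m][n]


if __name__ == "__main__":
    print(summarize([0, 0, 0, 5, 5, 5, 9, 9], 3))
```

On the example sequence, three frames suffice to represent the three constant runs exactly, so the selected indices are 0, 3, and 6 with zero distortion. The sketch recomputes segment costs naively; a practical version would precompute them.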

 

Selected publications:

 

[1] Z. Li, A. K. Katsaggelos, G. Schuster and B. Gandhi, "Rate-Distortion Optimal Video Summary Generation", IEEE Trans. on Image Processing, vol. 14, no. 10, October, 2005.

[2] Z. Li, G. M. Schuster, A. K. Katsaggelos, "MINMAX Optimal Video Summarization and Coding", special issue on Analysis & Understanding for Media Adaptation, IEEE Trans. on Circuits and System for Video Technology, vol. 15, no. 10, October, 2005.

[3] Z. Li, A. K. Katsaggelos, and G. M. Schuster, "Rate-Distortion Optimal Video Summarization and Coding", book chapter in Intelligent Multimedia Processing with Soft Computing, Ed. Y. P. Tan, et al., Springer-Verlag, Heidelberg, 2005.

 

Research Collaboration

 

Collaborating with experts from diverse backgrounds is especially rewarding and productive. It is my honor and pleasure to collaborate with a number of excellent researchers in the multimedia computing and communication area, including,