The USTC Future Media Computing Lab, led by Prof. Xiaojun Chang and Prof. Xun Yang, is a cutting-edge research center dedicated to advancing the frontiers of multimedia and AI technologies. Focused on areas such as video content analysis, multimodal intelligence, 3D vision, and human-computer interaction, the lab aims to transform how media is processed, understood, and generated. Through interdisciplinary collaboration, the lab develops innovative algorithms and systems that address real-world challenges, from video-based recognition tasks to intelligent media creation, fostering breakthroughs in both academic and industrial applications.
The USTC Future Media Computing Lab is always looking for talented undergraduate students, graduate students, and postdocs. If you’re interested in working in the exciting field of future media computing, feel free to reach out!
This paper presents the DNA Family, a new framework for boosting the effectiveness of weight-sharing Neural Architecture Search (NAS) by dividing large search spaces into smaller blocks and applying block-wise supervision. The approach demonstrates strong performance on benchmarks such as ImageNet, surpassing previous NAS techniques in both accuracy and efficiency.
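As a rough, hypothetical illustration of the block-wise supervision idea (a minimal sketch, not the authors' released implementation; the block modules and training loop here are invented for clarity), each student block can be trained to reproduce the output features of the corresponding teacher block, so blocks can be trained and rated independently:

```python
# Minimal sketch of block-wise supervision: one supernet block is distilled
# against the matching teacher block via a feature-matching loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_block(student_block, teacher_block, feats, optimizer):
    """One training step for a single supernet block under block-wise supervision."""
    with torch.no_grad():
        target = teacher_block(feats)      # teacher block's output features
    pred = student_block(feats)            # student block fed the same input
    loss = F.mse_loss(pred, target)        # per-block feature-matching loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: both "blocks" are simple conv layers here.
teacher = nn.Conv2d(16, 32, 3, padding=1)
student = nn.Conv2d(16, 32, 3, padding=1)
opt = torch.optim.SGD(student.parameters(), lr=0.1)
distill_block(student, teacher, torch.randn(2, 16, 8, 8), opt)
```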
This paper presents DS-Net++, a novel framework for efficient inference in neural networks. Dynamic weight slicing allows for scalable performance across multiple architectures like CNNs and vision transformers. The method delivers up to 61.5% real-world acceleration with minimal accuracy drops on models like MobileNet, ResNet-50, and Vision Transformer, showing its potential in hardware-efficient dynamic networks.
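To give a flavor of what dynamic weight slicing means in practice (an assumed, simplified sketch rather than the DS-Net++ code; the `SlicedConv2d` module and `ratio` argument are illustrative), a convolution can keep one full weight tensor and, per input, use only its first k output filters chosen by some gating decision:

```python
# Illustrative dynamic weight slicing: only a leading slice of the filters
# is used at inference time, trading accuracy for speed on the fly.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlicedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.padding = padding

    def forward(self, x, ratio=1.0):
        k = max(1, int(self.weight.shape[0] * ratio))   # number of filters to keep
        return F.conv2d(x, self.weight[:k], self.bias[:k], padding=self.padding)

x = torch.randn(1, 16, 32, 32)
layer = SlicedConv2d(16, 64)
y_full = layer(x, ratio=1.0)   # full width: 64 output channels
y_slim = layer(x, ratio=0.25)  # sliced width: 16 output channels
```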
This paper presents ContrastZSD, a semantics-guided contrastive network for zero-shot object detection (ZSD). The framework improves visual-semantic alignment and mitigates the bias problem towards seen classes by incorporating region-category and region-region contrastive learning. ContrastZSD demonstrates superior performance in both ZSD and generalized ZSD tasks across PASCAL VOC and MS COCO datasets.
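As a loose sketch of the region-category contrastive component (our own simplification, not the paper's exact loss; the function name and temperature value are assumptions), each region feature is pulled toward the semantic embedding of its ground-truth class and pushed away from the other class embeddings:

```python
# InfoNCE-style contrast between region features and class semantic embeddings.
import torch
import torch.nn.functional as F

def region_category_contrastive(region_feats, class_embeds, labels, tau=0.1):
    """region_feats: (N, d), class_embeds: (C, d), labels: (N,) class indices."""
    region_feats = F.normalize(region_feats, dim=-1)
    class_embeds = F.normalize(class_embeds, dim=-1)
    logits = region_feats @ class_embeds.t() / tau   # (N, C) similarity scores
    return F.cross_entropy(logits, labels)           # contrast over categories

loss = region_category_contrastive(torch.randn(8, 300), torch.randn(20, 300),
                                    torch.randint(0, 20, (8,)))
```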
TN-ZSTAD introduces a novel approach to zero-shot temporal activity detection (ZSTAD) in long untrimmed videos. By integrating an activity graph transformer with zero-shot detection techniques, it addresses the challenge of recognizing and localizing unseen activities. Experiments on THUMOS'14, Charades, and ActivityNet datasets validate its superior performance in detecting unseen activities.
This paper proposes the SAD-SP model, which improves open-world compositional zero-shot learning by capturing contextuality and feasibility dependencies between states and objects. Using semantic attention and knowledge disentanglement, the approach enhances performance on benchmarks like MIT-States and C-GQA by predicting unseen compositions more accurately.
This paper introduces a novel framework for partial person re-identification, addressing the challenge of image spatial misalignment due to occlusions. The framework utilizes an adaptive threshold-guided masked graph convolutional network and incorporates human attributes to enhance the accuracy of pedestrian representations. Experimental results demonstrate its effectiveness across multiple public datasets.
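For intuition, a threshold-masked graph convolution over part-level features might look like the sketch below (a hypothetical simplification, not the published model; the module name, learnable threshold, and hard mask are assumptions made for illustration):

```python
# Edges whose affinity falls below a threshold are masked out before
# message passing over part-level pedestrian features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedGraphConv(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.threshold = nn.Parameter(torch.tensor(0.5))  # adaptive threshold

    def forward(self, x):
        """x: (num_parts, dim) part features of one pedestrian image."""
        affinity = torch.sigmoid(x @ x.t())            # pairwise part affinity
        mask = (affinity > self.threshold).float()     # drop weak/occluded links
        adj = affinity * mask
        adj = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        return F.relu(self.linear(adj @ x))            # masked message passing

layer = MaskedGraphConv(256)
out = layer(torch.randn(6, 256))   # 6 body parts, 256-dim features each
```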
ZeroNAS presents a differentiable generative adversarial network architecture search method specifically designed for zero-shot learning (ZSL). The approach optimizes both generator and discriminator architectures, leading to significant improvements in ZSL and generalized ZSL tasks across various datasets.
This paper provides a comprehensive review of the recent advancements in knowledge distillation (KD)-based object detection (OD) models. It covers different KD strategies for improving object detection tasks, such as incremental OD, small object detection, and weakly supervised OD. The paper also explores advanced distillation techniques and highlights future research directions in the field.
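For readers new to the area, the vanilla logit-distillation objective that many of the surveyed detectors build on can be sketched as follows (a generic formulation, not tied to any specific model in the survey):

```python
# Temperature-scaled KL divergence between teacher and student class logits.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target distillation loss between per-class detection logits."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    # Scale by T^2 so gradients keep a comparable magnitude across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

loss = kd_loss(torch.randn(16, 80), torch.randn(16, 80))  # 80 COCO classes
```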
This paper introduces a video pivoting method for unsupervised multi-modal machine translation (UMMT), which uses spatial-temporal graphs to align sentence pairs in the latent space. By leveraging visual content from videos, the approach enhances translation accuracy and generalization across multiple languages, as demonstrated on the VATEX and HowToWorld datasets.
This survey provides a thorough exploration of the concept of scene graphs, discussing their role in visual understanding tasks. Scene graphs represent objects, their attributes, and relationships, helping improve tasks like visual reasoning and image captioning. The paper outlines various generation methods and applications, and also highlights key challenges like the long-tailed distribution of relationships.
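As a toy illustration of the representation the survey discusses (our own minimal data structure, not a standard library), a scene graph stores objects as attributed nodes and relationships as directed triplets:

```python
# A scene graph: objects with attributes, plus <subject, predicate, object> triplets.
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    objects: dict = field(default_factory=dict)     # id -> {"label", "attributes"}
    relations: list = field(default_factory=list)   # (subject_id, predicate, object_id)

g = SceneGraph()
g.objects[0] = {"label": "person", "attributes": ["standing"]}
g.objects[1] = {"label": "horse", "attributes": ["brown"]}
g.relations.append((0, "riding", 1))                # <person, riding, horse>
```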
This work addresses the catastrophic forgetting problem in one-shot neural architecture search by treating supernet training as a constrained optimization problem. The proposed method uses a novelty search-based architecture selection approach to enhance diversity and boost performance, achieving competitive results on CIFAR-10 and ImageNet datasets.
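A hypothetical sketch of the novelty-search flavor of architecture selection (assumed encoding and distance metric, not the paper's exact procedure) scores each candidate by its average distance to the nearest architectures already sampled and prefers candidates that explore new regions of the search space:

```python
# Novelty score = mean distance to the k nearest previously sampled architectures.
import numpy as np

def novelty_score(candidate, archive, k=5):
    """candidate: encoding vector; archive: list of previously sampled encodings."""
    if not archive:
        return float("inf")
    dists = sorted(np.linalg.norm(np.asarray(a) - np.asarray(candidate)) for a in archive)
    return float(np.mean(dists[:k]))

archive = [np.random.randint(0, 4, size=10) for _ in range(50)]   # past samples
candidates = [np.random.randint(0, 4, size=10) for _ in range(8)]
best = max(candidates, key=lambda c: novelty_score(c, archive))   # most novel one
```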
This paper attacks the challenging problem of video retrieval by text. In such a retrieval paradigm, an end user searches for unlabeled videos by ad-hoc queries described exclusively in the form of a natural-language sentence, with no visual example provided. Given videos as sequences of frames and queries as sequences of words, an effective sequence-to-sequence cross-modal matching is crucial. To that end, the two modalities first need to be encoded into real-valued vectors and then projected into a common space. In this paper we achieve this by proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own. Our novelty is two-fold. First, different from prior art that resorts to a specific single-level encoder, the proposed network performs multi-level encoding that represents the rich content of both modalities in a coarse-to-fine fashion. Second, different from a conventional common space learning algorithm which is either concept-based or latent-space-based, we introduce hybrid space learning, which combines the high performance of the latent space and the good interpretability of the concept space. Dual encoding is conceptually simple, practically effective, and end-to-end trained with hybrid space learning. Extensive experiments on four challenging video datasets show the viability of the new method. Code and data are available at https://github.com/danieljf24/hybrid_space.
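To make the hybrid-space idea concrete, the sketch below is a loose simplification (not the released code at the link above; the module name, dimensions, and mixing weight are assumptions): a video and a query are each projected into a latent space and a concept space, and the final relevance combines both similarities:

```python
# Hybrid-space matching: latent-space similarity plus concept-space similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridSpaceHead(nn.Module):
    def __init__(self, video_dim, text_dim, latent_dim=512, num_concepts=512):
        super().__init__()
        self.v_latent = nn.Linear(video_dim, latent_dim)
        self.t_latent = nn.Linear(text_dim, latent_dim)
        self.v_concept = nn.Linear(video_dim, num_concepts)   # one interpretable axis per concept
        self.t_concept = nn.Linear(text_dim, num_concepts)

    def forward(self, video_feat, text_feat, alpha=0.6):
        lat_sim = F.cosine_similarity(self.v_latent(video_feat), self.t_latent(text_feat), dim=-1)
        con_sim = F.cosine_similarity(torch.sigmoid(self.v_concept(video_feat)),
                                      torch.sigmoid(self.t_concept(text_feat)), dim=-1)
        return alpha * lat_sim + (1 - alpha) * con_sim        # hybrid relevance score

head = HybridSpaceHead(video_dim=2048, text_dim=1024)
score = head(torch.randn(4, 2048), torch.randn(4, 1024))      # one score per video-query pair
```

Here `video_feat` and `text_feat` stand in for the multi-level video and query encodings produced by the dual encoders.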
This paper introduces a novel semantic pooling method for event analysis tasks like detection, recognition, and recounting in long untrimmed Internet videos. Using semantic saliency, the approach ranks video shots to prioritize the most relevant ones, improving the classifier’s accuracy. The paper proposes a nearly-isotonic SVM classifier, validated with experiments on real-world datasets, showcasing significant performance improvements.
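A simplified sketch of semantic-saliency pooling (an assumed formulation for illustration, not the paper's nearly-isotonic SVM itself): shots are scored by how well their concept responses match the event's semantic description, then aggregated with weights that decrease along the saliency ranking:

```python
# Rank shots by semantic saliency and pool them with monotonically decaying weights.
import numpy as np

def semantic_pooling(shot_features, shot_concepts, event_semantics, decay=0.9):
    """shot_features: (S, d); shot_concepts: (S, c); event_semantics: (c,)."""
    saliency = shot_concepts @ event_semantics        # relevance of each shot to the event
    order = np.argsort(-saliency)                     # most salient shots first
    weights = decay ** np.arange(len(order))          # monotonically decreasing weights
    weights /= weights.sum()
    return weights @ shot_features[order]             # pooled video-level feature

video_feat = semantic_pooling(np.random.rand(30, 128), np.random.rand(30, 50),
                              np.random.rand(50))
```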