News
[2025-03] Two papers accepted by CVPR, thanks to all co-authors!
[2025-02] Awarded the Baidu Scholarship (10 Ph.D. students worldwide)!
[2024-12] Awarded the China National Scholarship for Ph.D. students!
[2024-12] Attended the Global PhD Gathering @ 2024 Pujiang AI Conference in Shanghai!
[2024-11] Gave a talk about "Balanced multimodal learning" @ Virginia Tech! Thanks to Prof. Chris Thomas for the invitation!
[2024-09] One paper accepted by T-PAMI, thanks to all co-authors!
[2024-07] Gave a talk about "Balanced multimodal learning" @ TechBeat! [Record]
[2024-07] One paper accepted by ECCV, thanks to all co-authors!
[2024-05] We released a survey on the fusion of low-quality multi-modal data! [arXiv]
[2024-05] One paper accepted by ICML, thanks to all co-authors!
[2024-02] One paper accepted by CVPR, thanks to all co-authors!
[2024-01] One paper accepted by ICLR, thanks to all co-authors!
[2023-12] Started visiting the Human Sensing Lab @ CMU!
[2023-10] One paper accepted by Pattern Recognition, thanks to all co-authors!
[2022-08] We released a survey on recent advances in audio-visual learning! [website]
[2022-05] Gave a talk @ 2022 BAAI Conference. Please find the slides here!
[2022-03] Two papers accepted by CVPR, thanks to all co-authors!
[2021-12] One paper accepted by T-PAMI, thanks to all co-authors!
[2021-06] Graduated from University of Electronic Science and Technology of China (UESTC)!
|
Selected Honors
• Baidu Scholarship (10 Ph.D. students worldwide), 2024.
• China National Scholarship for Ph.D. students (highest student honor in China), 2024.
• Outstanding Graduate of Sichuan Province (highest honor for graduates set by Sichuan Province), 2021.
• Outstanding Graduate of University of Electronic Science and Technology of China, 2021.
|
Research Highlights
Interested in the inherent learning mechanisms for perceiving, formulating, and understanding the environment from heterogeneous information across multiple modalities, e.g., vision, sound, and text.
In a paper presented at CVPR 2022 (Oral), introduced the research topic of "Balanced Multimodal Learning" for the first time, highlighting a pervasive issue in multimodal learning: the information utilization of certain modalities can be undesirably suppressed by others.
Since then, conducted a series of systematic studies to alleviate this issue, covering empirical observations, algorithms, and theoretical analysis.
|
Survey
|
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei, Di Hu, Yapeng Tian, Xuelong Li
arXiv / website / awesome list
A systematic survey of the audio-visual learning field.
|
|
Multimodal Fusion on Low-quality Data: A Comprehensive Survey
Qingyang Zhang, Yake Wei, Zongbo Han, Huazhu Fu, Xi Peng, Cheng Deng, Qinghua Hu, Cai Xu, Jie Wen, Di Hu, Changqing Zhang
arXiv / awesome list
A systematic survey of the fusion of low-quality multi-modal data.
|
Publications (* equal contribution)
|
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
Chengxiang Huang*, Yake Wei*, Zequn Yang, Di Hu
CVPR, 2025
TBD
Analyze and modulate the information acquisition process during multimodal training.
|
|
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Ruotian Peng, Haiying He, Yake Wei, Yandong Wen, Di Hu
CVPR, 2025
TBD
Generate high-quality, fine-grained image captions with a divide-then-aggregate strategy.
|
|
On-the-fly Modulation for Balanced Multimodal Learning
Yake Wei, Di Hu, Henghui Du, Ji-Rong Wen
P.S. Thanks to Zequn Yang for the valuable help.
T-PAMI, 2024
arXiv / code
Analyze and modulate imbalanced uni-modal learning in both the feed-forward and back-propagation stages.
|
|
Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning
Meng Shen, Yake Wei, Jianxiong Yin, Deepu Rajan, Di Hu, Simon See
ACM MM Asia, 2024
paper
Improve the quality of selected multimodal data pairs in active learning.
|
|
Diagnosing and Re-learning for Balanced Multimodal Learning
Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu
ECCV, 2024
arXiv / code
Dynamically re-initialize uni-modal encoders to enhance both worse-learnt and well-learnt modalities.
|
|
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
Yake Wei, Di Hu
ICML, 2024
arXiv / code
Solve conflicts between multi-modal and uni-modal gradients under multi-modal scenarios.
|
|
Enhancing Multimodal Cooperation via Sample-level Modality Valuation
Yake Wei, Ruoxuan Feng, Zihe Wang, Di Hu
CVPR, 2024
arXiv / code
Observe and improve the fine-grained cooperation between modalities at the sample level.
|
|
Quantifying and Enhancing Multi-modal Robustness with Modality Preference
Zequn Yang, Yake Wei, Ce Liang, Di Hu
ICLR, 2024
arXiv / code
Analyze essential components for multi-modal robustness and delve into the limitations imposed by modality preference.
|
|
Geometric-inspired graph-based Incomplete Multi-view Clustering
Zequn Yang, Han Zhang, Yake Wei, Zheng Wang, Feiping Nie, Di Hu
Pattern Recognition, 2023
paper / code
Conduct geometric analyses to mitigate missing views in weight aggregation.
|
|
Balanced Multimodal Learning via On-the-fly Gradient Modulation
Xiaokang Peng*, Yake Wei*, Andong Deng, Dong Wang, Di Hu
CVPR, 2022 (Oral Presentation)
arXiv / code
Alleviate optimization imbalance in multi-modal learning via on-the-fly gradient modulation.
|
|
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li*, Yake Wei*, Yapeng Tian*, Chenliang Xu, Ji-Rong Wen, Di Hu
CVPR, 2022 (Oral Presentation)
arXiv / project page
Study Audio-Visual Question Answering and propose the MUSIC-AVQA dataset.
|
|
Class-aware Sounding Objects Localization via Audiovisual Correspondence
Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen
T-PAMI, 2021
arXiv / project page
Discriminative sounding objects localization.
|
Services
Conference Reviewer: CVPR 2022-2025, ECCV 2022/2024, ICCV 2023, AAAI 2023-2025
Journal Reviewer: T-PAMI, T-MM, T-CSVT