Yake WEI/卫雅珂

I am a Ph.D. student at GeWu-Lab, Gaoling School of Artificial Intelligence, Renmin University of China, advised by Prof. Di Hu. My research interests focus on multimodal learning.

I received my bachelor's degree in Computer Science and Technology from the University of Electronic Science and Technology of China (UESTC). I had a wonderful time with my friends in Chengdu, China, from 2017 to 2021.

Email  /  Google Scholar  /  GitHub  /  CV

News

[2025-03] Two papers accepted by CVPR, thanks to all co-authors!

[2025-02] Awarded the Baidu Scholarship (awarded to 10 Ph.D. students worldwide)!

[2024-12] Awarded the China National Scholarship for Ph.D. students!

[2024-12] Attended the Global PhD Gathering @ 2024 Pujiang AI Conference in Shanghai!

[2024-11] Gave a talk about "Balanced multimodal learning" @ Virginia Tech! Thanks to Prof. Chris Thomas for the invitation!

[2024-09] One paper accepted by T-PAMI, thanks to all co-authors!

[2024-07] Gave a talk about "Balanced multimodal learning" @ TechBeat! [Record]

[2024-07] One paper accepted by ECCV, thanks to all co-authors!

[2024-05] We released a survey on the fusion of low-quality multi-modal data! [arXiv]

[2024-05] One paper accepted by ICML, thanks to all co-authors!

[2024-02] One paper accepted by CVPR, thanks to all co-authors!

[2024-01] One paper accepted by ICLR, thanks to all co-authors!

[2023-12] Started visiting the Human Sensing Lab @ CMU!

[2023-10] One paper accepted by Pattern Recognition, thanks to all co-authors!

[2022-08] We released a survey on recent advances in audio-visual learning! [website]

[2022-05] Gave a talk @ the 2022 BAAI Conference. Please find the slides here!

[2022-03] Two papers accepted by CVPR, thanks to all co-authors!

[2021-12] One paper accepted by T-PAMI, thanks to all co-authors!

[2021-06] Graduated from the University of Electronic Science and Technology of China (UESTC)!

Selected Honors

• Baidu Scholarship (awarded to 10 Ph.D. students worldwide), 2024.

• China National Scholarship for Ph.D. students (the highest student honor in China), 2024.

• Outstanding Graduate of Sichuan Province (the highest honor for graduates awarded by Sichuan Province), 2021.

• Outstanding Graduate of the University of Electronic Science and Technology of China, 2021.

Research Highlights

I am interested in the inherent learning mechanisms of perceiving, formulating, and understanding the environment with heterogeneous information from multiple modalities, e.g., vision, sound, and text.

In our paper presented at CVPR 2022 (Oral), we introduced the research topic of "Balanced Multimodal Learning" for the first time, highlighting a pervasive issue in multimodal learning: the information utilization of certain modalities can be undesirably suppressed by others.

We then conducted a series of systematic studies to alleviate this issue, covering empirical observations, algorithms, and theoretical analysis.

Survey
Learning in Audio-visual Context: A Review, Analysis, and New Perspective

Yake Wei, Di Hu, Yapeng Tian, Xuelong Li


arXiv / website / awesome list

A systematic survey of the audio-visual learning field.

Multimodal Fusion on Low-quality Data: A Comprehensive Survey

Qingyang Zhang, Yake Wei, Zongbo Han, Huazhu Fu, Xi Peng, Cheng Deng, Qinghua Hu, Cai Xu, Jie Wen, Di Hu, Changqing Zhang


arXiv / awesome list

A systematic survey of the fusion of low-quality multi-modal data.


Publications (* equal contribution)
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition

Chengxiang Huang*, Yake Wei*, Zequn Yang, Di Hu

CVPR, 2025
TBD

Analyze and modulate the information acquisition process during multimodal training.

Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception

Ruotian Peng, Haiying He, Yake Wei, Yandong Wen, Di Hu

CVPR, 2025
TBD

Generate high-quality, fine-grained image captions via a divide-then-aggregate strategy.

On-the-fly Modulation for Balanced Multimodal Learning

Yake Wei, Di Hu, Henghui Du, Ji-Rong Wen
P.S. Thanks to Zequn Yang for the valuable help

T-PAMI, 2024
arXiv / code

Analyze and modulate imbalanced uni-modal learning from both the feed-forward and back-propagation stages.

Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning

Meng Shen, Yake Wei, Jianxiong Yin, Deepu Rajan, Di Hu, Simon See

ACM MM Asia, 2024
paper

Improve the quality of selected multimodal data pairs in active learning.

Diagnosing and Re-learning for Balanced Multimodal Learning

Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu

ECCV, 2024
arXiv / code

Dynamically re-initialize uni-modal encoders to enhance both worse-learnt and well-learnt modalities.

MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

Yake Wei, Di Hu

ICML, 2024
arXiv / code

Resolve conflicts between multi-modal and uni-modal gradients in multi-modal scenarios.

Enhancing Multimodal Cooperation via Sample-level Modality Valuation

Yake Wei, Ruoxuan Feng, Zihe Wang, Di Hu

CVPR, 2024
arXiv / code

Observe and improve the fine-grained cooperation between modalities at the sample level.

Quantifying and Enhancing Multi-modal Robustness with Modality Preference

Zequn Yang, Yake Wei, Ce Liang, Di Hu

ICLR, 2024
arXiv / code

Analyze essential components for multi-modal robustness and delve into the limitations imposed by modality preference.

Geometric-inspired graph-based Incomplete Multi-view Clustering

Zequn Yang, Han Zhang, Yake Wei, Zheng Wang, Feiping Nie, Di Hu

Pattern Recognition, 2023
paper / code

Conduct geometric analyses to mitigate missing views in weight aggregation.

Balanced Multimodal Learning via On-the-fly Gradient Modulation

Xiaokang Peng*, Yake Wei*, Andong Deng, Dong Wang, Di Hu

CVPR, 2022   (Oral Presentation)
arXiv / code

Alleviate optimization imbalance in multi-modal learning via on-the-fly gradient modulation.

Learning to Answer Questions in Dynamic Audio-Visual Scenarios

Guangyao Li*, Yake Wei*, Yapeng Tian*, Chenliang Xu, Ji-Rong Wen, Di Hu

CVPR, 2022   (Oral Presentation)
arXiv / project page

Explore Audio-Visual Question Answering and propose the MUSIC-AVQA dataset.

Class-aware Sounding Objects Localization via Audiovisual Correspondence

Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen

T-PAMI, 2021
arXiv / project page

Discriminative sounding object localization.

Services

Conference Reviewer: CVPR 2022-2025, ECCV 2022/2024, ICCV 2023, AAAI 2023-2025

Journal Reviewer: T-PAMI, T-MM, T-CSVT



Updated in Mar. 2025
Thanks to Jon Barron for this amazing template.