News
[2024-09] One paper accepted by T-PAMI, thanks to all co-authors!
[2024-07] Gave a talk about "Balanced multimodal learning" @ TechBeat! [Record]
[2024-07] One paper accepted by ECCV, thanks to all co-authors!
[2024-05] We release a survey about fusion of low-quality multi-modal data! [arXiv]
[2024-05] One paper accepted by ICML, thanks to all co-authors!
[2024-02] One paper accepted by CVPR, thanks to all co-authors!
[2024-01] One paper accepted by ICLR, thanks to all co-authors!
[2023-12] Started visiting the Human Sensing Lab @ CMU!
[2023-10] One paper accepted by Pattern Recognition, thanks to all co-authors!
[2022-08] We release a survey about recent advances in audio-visual learning! [website]
[2022-05] Gave a talk @ 2022 BAAI Conference. Please find slides here!
[2022-03] Two papers accepted by CVPR, thanks to all co-authors!
[2021-12] One paper accepted by T-PAMI, thanks to all co-authors!
[2021-06] Graduated from University of Electronic Science and Technology of China (UESTC)!
|
Services
Conference Reviewer: CVPR 2022-2024, ECCV 2022/2024, ICCV 2023, AAAI 2023-2025
Journal Reviewer: TMM, TPAMI, TCSVT
|
Survey
|
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei, Di Hu, Yapeng Tian, Xuelong Li
arXiv / website / awesome list
A systematic survey of the audio-visual learning field.
|
|
Multimodal Fusion on Low-quality Data: A Comprehensive Survey
Qingyang Zhang, Yake Wei, Zongbo Han, Huazhu Fu, Xi Peng, Cheng Deng, Qinghua Hu, Cai Xu, Jie Wen, Di Hu, Changqing Zhang
arXiv / awesome list
A systematic survey of the fusion of low-quality multi-modal data.
|
Publications (* equal contribution)
|
On-the-fly Modulation for Balanced Multimodal Learning
Yake Wei, Di Hu, Henghui Du, Ji-Rong Wen
P.S. Thanks to Zequn Yang for the valuable help
T-PAMI, 2024
arXiv / code
Analyze and modulate imbalanced uni-modal learning in both the feed-forward and back-propagation stages.
|
|
Diagnosing and Re-learning for Balanced Multimodal Learning
Yake Wei, Siwei Li, Ruoxuan Feng, Di Hu
ECCV, 2024
arXiv / code
Dynamically re-initialize uni-modal encoders to enhance both worse-learnt and well-learnt modalities.
|
|
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
Yake Wei, Di Hu
ICML, 2024
arXiv / code
Solve conflicts between multi-modal and uni-modal gradients under multi-modal scenarios.
|
|
Enhancing Multimodal Cooperation via Sample-level Modality Valuation
Yake Wei, Ruoxuan Feng, Zihe Wang, Di Hu
CVPR, 2024
arXiv / code
Observe and improve the fine-grained cooperation between modalities at the sample level.
|
|
Quantifying and Enhancing Multi-modal Robustness with Modality Preference
Zequn Yang, Yake Wei, Ce Liang, Di Hu
ICLR, 2024
arXiv / code
Analyze essential components for multi-modal robustness and delve into the limitations imposed by modality preference.
|
|
Geometric-inspired graph-based Incomplete Multi-view Clustering
Zequn Yang, Han Zhang, Yake Wei, Zheng Wang, Feiping Nie, Di Hu
Pattern Recognition, 2023
paper / code
Conduct geometric analyses to mitigate missing views in weight aggregation.
|
|
Balanced Multimodal Learning via On-the-fly Gradient Modulation
Xiaokang Peng*, Yake Wei*, Andong Deng, Dong Wang, Di Hu
CVPR, 2022 (Oral Presentation)
arXiv / code
Alleviate optimization imbalance in multi-modal learning via on-the-fly gradient modulation.
|
|
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li*, Yake Wei*, Yapeng Tian*, Chenliang Xu, Ji-Rong Wen, Di Hu
CVPR, 2022 (Oral Presentation)
arXiv / project page
Study Audio-Visual Question Answering and propose the MUSIC-AVQA dataset.
|
|
Class-aware Sounding Objects Localization via Audiovisual Correspondence
Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen
T-PAMI, 2021
arXiv / project page
Discriminative sounding objects localization.
|
|