Audio-Visual Scene Understanding

WACV 2021 Tutorial

Time: January 9


Overview

Sight and hearing are two of the most important senses for human perception. From a cognitive perspective, visual and auditory information are slightly discrepant, but multisensory integration unifies them into a single percept. Moreover, when multiple senses are available, humans usually react more accurately or efficiently than with a single sense. Inspired by this, our community has begun to explore computational models that combine computer vision with audition, aiming to address essential problems of audio-visual learning and to develop them into interesting and worthwhile tasks. Recent years have seen many developments in learning from both visual and auditory data.

This tutorial aims to cover recent advances in audio-visual learning, including audio-visual self-supervised learning, audio-visual sound separation, audio-visual cross-modal generation, and audio-visual video understanding. For each research sub-topic, we will give a concrete introduction to the problems and tasks it contains, along with the current research progress and the open problems. We hope the audience, both graduate students and researchers new to this area, can benefit from this tutorial and learn the principal problems and cutting-edge approaches of audio-visual learning.


Agenda

08:30 - 08:35      Welcome
08:35 - 09:20      Audio-Visual Self-supervised Learning   Slides
09:20 - 10:05      Audio-Visual Sound Separation   Slides
10:05 - 10:15      Coffee Break
10:15 - 11:00      Audio-Visual Cross-modal Generation    Slides
11:00 - 11:45      Audio-Visual Video Understanding    Slides
11:45 - 11:55      Q&A
11:55 - 12:00      Closing Remarks
January 10
00:30 - 00:35      Welcome
00:35 - 01:20      Audio-Visual Self-supervised Learning    Slides
01:20 - 02:05      Audio-Visual Sound Separation    Slides
02:05 - 02:15      Coffee Break
02:15 - 03:00      Audio-Visual Cross-modal Generation    Slides
03:00 - 03:45      Audio-Visual Video Understanding   Slides
03:45 - 03:55      Q&A
03:55 - 04:00      Closing Remarks
17:30 - 17:35      Welcome
17:35 - 18:20      Audio-Visual Self-supervised Learning   Slides
18:20 - 19:05      Audio-Visual Sound Separation   Slides
19:05 - 19:15      Coffee Break
19:15 - 20:00      Audio-Visual Cross-modal Generation   Slides
20:00 - 20:45      Audio-Visual Video Understanding   Slides
20:45 - 20:55      Q&A
20:55 - 21:00      Closing Remarks

Organizers




Website made by Yake Wei