Multimodal Reasoning

NeuMATCH is the first end-to-end neural network for matching multimodal sequences (e.g., text and video). The industrial movie production pipeline produces a script and a video, but no explicit correspondence between the two modalities. Aligning the video sequence with the text sequence establishes that correspondence, which lays the groundwork for computational understanding of movie content.
End-to-end training is powerful, but is it a panacea? In our WACV 2021 paper, we show that naive end-to-end training of a complex network like NeuMATCH is inefficient. We propose aligning the pace of training and the feature distributions across network components to improve training.
Publications
- Jianan Wang, Boyang Li, Xiangyu Fan, Jing Lin, and Yanwei Fu. Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions. The IEEE Winter Conference on Applications of Computer Vision (WACV). 2021. [Supplemental Material]
- Guoyun Tu, Yanwei Fu, Boyang Li, Jiarui Gao, Yu-Gang Jiang, and Xiangyang Xue. A Multi-task Neural Approach for Emotion Attribution, Classification and Summarization. IEEE Transactions on Multimedia. 2019.
- Huijuan Xu, Boyang Li, Vasili Ramanishka, Leonid Sigal, Kate Saenko. Joint Event Detection and Description in Continuous Video Streams. The IEEE Winter Conference on Applications of Computer Vision (WACV). 2019.
- Pelin Dogan, Boyang Li, Leonid Sigal, Markus Gross. A Neural Multi-sequence Alignment TeCHnique (NeuMATCH). The Conference on Computer Vision and Pattern Recognition (CVPR). 2018.
- Baohan Xu, Yanwei Fu, Yu-Gang Jiang, Boyang Li, and Leonid Sigal. Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization. IEEE Transactions on Affective Computing. 2016.
- Baohan Xu, Yanwei Fu, Yu-Gang Jiang, Boyang Li, and Leonid Sigal. Video Emotion Recognition with Transferred Deep Feature Encodings. The 2016 ACM International Conference on Multimedia Retrieval. New York, NY. 2016.