This paper introduces a video pivoting method for unsupervised multi-modal machine translation (UMMT), which uses spatial-temporal graphs to align sentence pairs in the latent space. By leveraging visual content from videos, the approach enhances translation accuracy and generalization across multiple languages, as demonstrated on the VATEX and HowToWorld datasets.
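The core idea of pivoting through video is to represent each clip as a spatial-temporal graph whose nodes are object features, with spatial edges connecting objects within a frame and temporal edges linking the same object slot across consecutive frames. The sketch below illustrates one plausible construction of such a graph from pre-extracted features; it is a hypothetical simplification for intuition, not the authors' actual model (the array shapes, slot-based temporal linking, and function name are all assumptions).

```python
import numpy as np

def build_spatial_temporal_graph(features):
    """Build a toy spatial-temporal graph from video object features.

    features: (T, N, D) array -- N object features per frame over T frames
              (hypothetical layout, assumed for this sketch).
    Returns:
      nodes: (T*N, D) node feature matrix
      adj:   (T*N, T*N) binary adjacency with spatial edges inside each
             frame and temporal edges between matching object slots in
             consecutive frames.
    """
    T, N, D = features.shape
    nodes = features.reshape(T * N, D)
    adj = np.zeros((T * N, T * N), dtype=int)
    for t in range(T):
        base = t * N
        # Spatial edges: fully connect the N objects within frame t.
        for i in range(N):
            for j in range(N):
                if i != j:
                    adj[base + i, base + j] = 1
        # Temporal edges: link object slot i in frame t to the same
        # slot in frame t+1 (a simplification of object tracking).
        if t + 1 < T:
            for i in range(N):
                adj[base + i, base + N + i] = 1
                adj[base + N + i, base + i] = 1
    return nodes, adj

# Example: 3 frames, 2 objects per frame, 4-dim features.
feats = np.arange(3 * 2 * 4, dtype=float).reshape(3, 2, 4)
nodes, adj = build_spatial_temporal_graph(feats)
```

A graph encoder (e.g. a graph convolutional network) can then pool such node features into a clip-level embedding, and the UMMT objective aligns source- and target-language sentence embeddings to that shared visual anchor.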