Unsupervised Learning | Future Media Computing Lab

Video Pivoting Unsupervised Multi-Modal Machine Translation

This paper introduces a video-pivoting method for unsupervised multi-modal machine translation (UMMT) that uses spatial-temporal graphs to align sentence pairs in the latent space. By leveraging visual content from videos, the approach enhances …