He works on embodiment learning and vision-text multimodal learning. In particular, he is interested in improving robots using human videos from the web.