"12-in-1: Multi-Task Vision and Language Representation Learning."

Jiasen Lu et al. (2019)