Zhe Gan
- 2024
- [j4] Chunyuan Li, Zhe Gan, Zhengyuan Yang, Jianwei Yang, Linjie Li, Lijuan Wang, Jianfeng Gao: Multimodal Foundation Models: From Specialists to General-Purpose Assistants. Found. Trends Comput. Graph. Vis. 16(1-2): 1-214 (2024)
- [c111] Jaemin Cho, Linjie Li, Zhengyuan Yang, Zhe Gan, Lijuan Wang, Mohit Bansal: Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation. CVPR Workshops 2024: 5280-5289
- [c110] Zhengfeng Lai, Haotian Zhang, Bowen Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao: VeCLIP: Improving CLIP Training via Visual-Enriched Captions. ECCV (42) 2024: 111-127
- [c109] Jialian Wu, Jianfeng Wang, Zhengyuan Yang, Zhe Gan, Zicheng Liu, Junsong Yuan, Lijuan Wang: GRiT: A Generative Region-to-Text Transformer for Object Understanding. ECCV (80) 2024: 207-224
- [c108] Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan: Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs. ECCV (64) 2024: 240-255
- [c107] Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang: MM1: Methods, Analysis and Insights from Multimodal LLM Pre-training. ECCV (29) 2024: 304-323
- [c106] Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev: Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation. EMNLP 2024: 1281-1287
- [c105] Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan: Guiding Instruction-based Image Editing via Multimodal Large Language Models. ICLR 2024
- [c104] Ajay Kumar Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang: Compressing LLMs: The Truth is Rarely Pure and Never Simple. ICLR 2024
- [c103] Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang: Ferret: Refer and Ground Anything Anywhere at Any Granularity. ICLR 2024
- [i128] Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan: How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts. CoRR abs/2402.13220 (2024)
- [i127] Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang: MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training. CoRR abs/2403.09611 (2024)
- [i126] Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan: Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs. CoRR abs/2404.05719 (2024)
- [i125] Haotian Zhang, Haoxuan You, Philipp Dufter, Bowen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang: Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models. CoRR abs/2404.07973 (2024)
- [i124] Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier, Peter Grasch, Yinfei Yang, Zhe Gan: MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs. CoRR abs/2407.01509 (2024)
- [i123] Elmira Amirloo, Jean-Philippe Fauconnier, Christoph Roesmann, Christian Kerl, Rinu Boney, Yusu Qian, Zirui Wang, Afshin Dehghan, Yinfei Yang, Zhe Gan, Peter Grasch: Understanding Alignment in Multimodal LLMs: A Comprehensive Study. CoRR abs/2407.02477 (2024)
- [i122] Mingze Xu, Mingfei Gao, Zhe Gan, Hong-You Chen, Zhengfeng Lai, Haiming Gang, Kai Kang, Afshin Dehghan: SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models. CoRR abs/2407.15841 (2024)
- [i121] Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, Forrest Huang, Dhruti Shah, Xianzhi Du, Bowen Zhang, Yanghao Li, Sam Dodge, Keen You, Zhen Yang, Aleksei Timofeev, Mingze Xu, Hong-You Chen, Jean-Philippe Fauconnier, Zhengfeng Lai, Haoxuan You, Zirui Wang, Afshin Dehghan, Peter Grasch, Yinfei Yang: MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning. CoRR abs/2409.20566 (2024)
- [i120] Zhengfeng Lai, Vasileios Saveris, Chen Chen, Hong-You Chen, Haotian Zhang, Bowen Zhang, Juan Lao Tebar, Wenze Hu, Zhe Gan, Peter Grasch, Meng Cao, Yinfei Yang: Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models. CoRR abs/2410.02740 (2024)
- [i119] Hong-You Chen, Zhengfeng Lai, Haotian Zhang, Xinze Wang, Marcin Eichner, Keen You, Meng Cao, Bowen Zhang, Yinfei Yang, Zhe Gan: Contrastive Localized Language-Image Pre-Training. CoRR abs/2410.02746 (2024)
- [i118] Hanrong Ye, Haotian Zhang, Erik A. Daxberger, Lin Chen, Zongyu Lin, Yanghao Li, Bowen Zhang, Haoxuan You, Dan Xu, Zhe Gan, Jiasen Lu, Yinfei Yang: MM-Ego: Towards Building Egocentric Multimodal LLMs. CoRR abs/2410.07177 (2024)
- [i117] Ruohong Zhang, Bowen Zhang, Yanghao Li, Haotian Zhang, Zhiqing Sun, Zhe Gan, Yinfei Yang, Ruoming Pang, Yiming Yang: Improve Vision Language Model Chain-of-thought Reasoning. CoRR abs/2410.16198 (2024)
- [i116] Zhangheng Li, Keen You, Haotian Zhang, Di Feng, Harsh Agrawal, Xiujun Li, Mohana Prasad Sathya Moorthy, Jeff Nichols, Yinfei Yang, Zhe Gan: Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms. CoRR abs/2410.18967 (2024)
- 2023
- [c102] Jinghao Zhou, Li Dong, Zhe Gan, Lijuan Wang, Furu Wei: Non-Contrastive Learning Meets Language-Image Pre-Training. CVPR 2023: 11028-11038
- [c101] Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang: ReCo: Region-Controlled Text-to-Image Generation. CVPR 2023: 14246-14255
- [c100] Xueyan Zou, Zi-Yi Dou, Jianwei Yang, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, Jianfeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee, Jianfeng Gao: Generalized Decoding for Pixel, Image, and Language. CVPR 2023: 15116-15127
- [c99] Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu: An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling. CVPR 2023: 22898-22909
- [c98] Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang: LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling. CVPR 2023: 23119-23129
- [c97] Yi-Lin Sung, Linjie Li, Kevin Lin, Zhe Gan, Mohit Bansal, Lijuan Wang: An Empirical Study of Multimodal Model Merging. EMNLP (Findings) 2023: 1563-1575
- [c96] Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev: Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation. ICBINB 2023: 127-133
- [c95] Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan L. Boyd-Graber, Lijuan Wang: Prompting GPT-3 To Be Reliable. ICLR 2023
- [i115] Jaemin Cho, Linjie Li, Zhengyuan Yang, Zhe Gan, Lijuan Wang, Mohit Bansal: Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation. CoRR abs/2304.06671 (2023)
- [i114] Yi-Lin Sung, Linjie Li, Kevin Lin, Zhe Gan, Mohit Bansal, Lijuan Wang: An Empirical Study of Multimodal Model Merging. CoRR abs/2304.14933 (2023)
- [i113] Wentao Wu, Aleksei Timofeev, Chen Chen, Bowen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jonathon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang: MOFI: Learning Image Representations from Noisy Entity Annotated Images. CoRR abs/2306.07952 (2023)
- [i112] Chunyuan Li, Zhe Gan, Zhengyuan Yang, Jianwei Yang, Linjie Li, Lijuan Wang, Jianfeng Gao: Multimodal Foundation Models: From Specialists to General-Purpose Assistants. CoRR abs/2309.10020 (2023)
- [i111] Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan: Guiding Instruction-based Image Editing via Multimodal Large Language Models. CoRR abs/2309.17102 (2023)
- [i110] Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, Yinfei Yang: Compressing LLMs: The Truth is Rarely Pure and Never Simple. CoRR abs/2310.01382 (2023)
- [i109] Zhengfeng Lai, Haotian Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao: From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions. CoRR abs/2310.07699 (2023)
- [i108] Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang: Ferret: Refer and Ground Anything Anywhere at Any Granularity. CoRR abs/2310.07704 (2023)
- [i107] Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev: Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation. CoRR abs/2311.16201 (2023)
- [i106] Bingbing Wen, Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Bill Howe, Lijuan Wang: InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models. CoRR abs/2312.13503 (2023)
- 2022
- [j3] Zhe Gan, Linjie Li, Chunyuan Li, Lijuan Wang, Zicheng Liu, Jianfeng Gao: Vision-Language Pre-Training: Basics, Recent Advances, and Future Trends. Found. Trends Comput. Graph. Vis. 14(3-4): 163-352 (2022)
- [j2] Tianlong Chen, Yu Cheng, Zhe Gan, Jianfeng Wang, Lijuan Wang, Jingjing Liu, Zhangyang Wang: Adversarial Feature Augmentation and Normalization for Visual Recognition. Trans. Mach. Learn. Res. 2022 (2022)
- [j1] Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang: GIT: A Generative Image-to-text Transformer for Vision and Language. Trans. Mach. Learn. Res. 2022 (2022)
- [c94] Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang, Zicheng Liu: Playing Lottery Tickets with Vision and Language. AAAI 2022: 652-660
- [c93] Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang: An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA. AAAI 2022: 3081-3089
- [c92] Jinghui Chen, Yu Cheng, Zhe Gan, Quanquan Gu, Jingjing Liu: Efficient Robust Training via Backward Smoothing. AAAI 2022: 6222-6230
- [c91] Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang: SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning. CVPR 2022: 17928-17937
- [c90] Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang: Scaling Up Vision-Language Pretraining for Image Captioning. CVPR 2022: 17959-17968
- [c89] Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu: Injecting Semantic Concepts into End-to-End Image Captioning. CVPR 2022: 17988-17998
- [c88] Zi-Yi Dou, Yichong Xu, Zhe Gan, Jianfeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng: An Empirical Study of Training End-to-End Vision-and-Language Transformers. CVPR 2022: 18145-18155
- [c87] Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang: UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling. ECCV (36) 2022: 521-539
- [c86] Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, Jianfeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann LeCun, Nanyun Peng, Jianfeng Gao, Lijuan Wang: Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone. NeurIPS 2022
- [c85] Jian Liang, Chenfei Wu, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan: NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis. NeurIPS 2022
- [c84] Sheng Shen, Chunyuan Li, Xiaowei Hu, Yujia Xie, Jianwei Yang, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao: K-LITE: Learning Transferable Visual Models with External Knowledge. NeurIPS 2022
- [i105] Sheng Shen, Chunyuan Li, Xiaowei Hu, Yujia Xie, Jianwei Yang, Pengchuan Zhang, Anna Rohrbach, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Jianfeng Gao: K-LITE: Learning Transferable Visual Models with External Knowledge. CoRR abs/2204.09222 (2022)
- [i104] Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang: GIT: A Generative Image-to-text Transformer for Vision and Language. CoRR abs/2205.14100 (2022)
- [i103] Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang: LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling. CoRR abs/2206.07160 (2022)
- [i102] Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, Jianfeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann LeCun, Nanyun Peng, Jianfeng Gao, Lijuan Wang: Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone. CoRR abs/2206.07643 (2022)
- [i101] Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan: NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis. CoRR abs/2207.09814 (2022)
- [i100] Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu: An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling. CoRR abs/2209.01540 (2022)
- [i99] Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan L. Boyd-Graber, Lijuan Wang: Prompting GPT-3 To Be Reliable. CoRR abs/2210.09150 (2022)
- [i98] Zhe Gan, Linjie Li, Chunyuan Li, Lijuan Wang, Zicheng Liu, Jianfeng Gao: Vision-Language Pre-training: Basics, Recent Advances, and Future Trends. CoRR abs/2210.09263 (2022)
- [i97] Jinghao Zhou, Li Dong, Zhe Gan, Lijuan Wang, Furu Wei: Non-Contrastive Learning Meets Language-Image Pre-Training. CoRR abs/2210.09304 (2022)
- [i96] Zixin Zhu, Yixuan Wei, Jianfeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu, Han Hu: Exploring Discrete Diffusion Models for Image Captioning. CoRR abs/2211.11694 (2022)
- [i95] Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang: ReCo: Region-Controlled Text-to-Image Generation. CoRR abs/2211.15518 (2022)
- [i94] Jialian Wu, Jianfeng Wang, Zhengyuan Yang, Zhe Gan, Zicheng Liu, Junsong Yuan, Lijuan Wang: GRiT: A Generative Region-to-text Transformer for Object Understanding. CoRR abs/2212.00280 (2022)
- [i93] Xueyan Zou, Zi-Yi Dou, Jianwei Yang, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, Jianfeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee, Jianfeng Gao: Generalized Decoding for Pixel, Image, and Language. CoRR abs/2212.11270 (2022)
- 2021
- [c83] Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu: FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding. AAAI 2021: 12776-12784
- [c82] Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu: EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets. ACL/IJCNLP (1) 2021: 2195-2207
- [c81] Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu: Cluster-Former: Clustering-based Sparse Transformer for Question Answering. ACL/IJCNLP (Findings) 2021: 3958-3968
- [c80] Jie Lei, Linjie Li, Luowei Zhou, Zhe Gan, Tamara L. Berg, Mohit Bansal, Jingjing Liu: Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling. CVPR 2021: 7331-7341
- [c79] Liqun Chen, Dong Wang, Zhe Gan, Jingjing Liu, Ricardo Henao, Lawrence Carin: Wasserstein Contrastive Representation Distillation. CVPR 2021: 16296-16305
- [c78] Linjie Li, Jie Lei, Zhe Gan, Jingjing Liu: Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models. ICCV 2021: 2022-2031
- [c77] Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu: InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective. ICLR 2021
- [c76] Siyang Yuan, Pengyu Cheng, Ruiyi Zhang, Weituo Hao, Zhe Gan, Lawrence Carin: Improving Zero-Shot Voice Style Transfer via Disentangled Representation Learning. ICLR 2021
- [c75] Shuyang Dai, Zhe Gan, Yu Cheng, Chenyang Tao, Lawrence Carin, Jingjing Liu: APo-VAE: Text Generation in Hyperbolic Space. NAACL-HLT 2021: 416-431
- [c74] Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang: Chasing Sparsity in Vision Transformers: An End-to-End Exploration. NeurIPS 2021: 19974-19988
- [c73] Tianlong Chen, Yu Cheng, Zhe Gan, Jingjing Liu, Zhangyang Wang: Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective. NeurIPS 2021: 20941-20955
- [c72] Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Jingjing Liu, Zhangyang Wang: The Elastic Lottery Ticket Hypothesis. NeurIPS 2021: 26609-26621
- [c71] Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Wang, William Yang Wang, Tamara L. Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu: VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation. NeurIPS Datasets and Benchmarks 2021
- [c70] Boxin Wang, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li: Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models. NeurIPS Datasets and Benchmarks 2021
- [c69] Chen Zhu, Yu Cheng, Zhe Gan, Furong Huang, Jingjing Liu, Tom Goldstein: MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients. ECML/PKDD (3) 2021: 628-643
- [c68] Wenhu Chen, Zhe Gan, Linjie Li, Yu Cheng, William Yang Wang, Jingjing Liu: Meta Module Network for Compositional Visual Reasoning. WACV 2021: 655-664
- [i92] Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu: EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets. CoRR abs/2101.00063 (2021)
- [i91] Jie Lei, Linjie Li, Luowei Zhou, Zhe Gan, Tamara L. Berg, Mohit Bansal, Jingjing Liu: Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling. CoRR abs/2102.06183 (2021)
- [i90] Tianlong Chen, Yu Cheng, Zhe Gan, Jingjing Liu, Zhangyang Wang: Ultra-Data-Efficient GAN Training: Drawing A Lottery Ticket First, Then Training It Toughly. CoRR abs/2103.00397 (2021)
- [i89] Siyang Yuan, Pengyu Cheng, Ruiyi Zhang, Weituo Hao, Zhe Gan, Lawrence Carin: Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning. CoRR abs/2103.09420 (2021)
- [i88] Tianlong Chen, Yu Cheng, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zhangyang Wang, Jingjing Liu: Adversarial Feature Augmentation and Normalization for Visual Recognition. CoRR abs/2103.12171 (2021)
- [i87] Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Jingjing Liu, Zhangyang Wang: The Elastic Lottery Ticket Hypothesis. CoRR abs/2103.16547 (2021)
- [i86] Luowei Zhou, Jingjing Liu, Yu Cheng, Zhe Gan, Lei Zhang: CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning. CoRR abs/2104.00285 (2021)
- [i85] Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu: Playing Lottery Tickets with Vision and Language. CoRR abs/2104.11832 (2021)
- [i84] Linjie Li, Jie Lei, Zhe Gan, Jingjing Liu: Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models. CoRR abs/2106.00245 (2021)
- [i83] Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang: Chasing Sparsity in Vision Transformers: An End-to-End Exploration. CoRR abs/2106.04533 (2021)
- [i82] Linjie Li, Jie Lei, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara Lee Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu: VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation. CoRR abs/2106.04632 (2021)
- [i81] Junya Chen, Zhe Gan, Xuan Li, Qing Guo, Liqun Chen, Shuyang Gao, Tagyoung Chung, Yi Xu, Belinda Zeng, Wenlian Lu, Fan Li, Lawrence Carin, Chenyang Tao: Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE. CoRR abs/2107.01152 (2021)
- [i80] Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang: An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA. CoRR abs/2109.05014 (2021)
- [i79] Zi-Yi Dou, Yichong Xu, Zhe Gan, Jianfeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng: An Empirical Study of Training End-to-End Vision-and-Language Transformers. CoRR abs/2111.02387 (2021)
- [i78] Boxin Wang, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li: Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models. CoRR abs/2111.02840 (2021)
- [i77] Jianfeng Wang, Xiaowei Hu, Zhe Gan, Zhengyuan Yang, Xiyang Dai, Zicheng Liu, Yumao Lu, Lijuan Wang: UFO: A UniFied TransfOrmer for Vision-Language Representation Learning. CoRR abs/2111.10023 (2021)
- [i76] Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang: Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling. CoRR abs/2111.12085 (2021)
- [i75] Xiaowei Hu, Zhe Gan, Jianfeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang: Scaling Up Vision-Language Pre-training for Image Captioning. CoRR abs/2111.12233 (2021)
- [i74] Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu: VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling. CoRR abs/2111.12681 (2021)
- [i73] Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang: SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning. CoRR abs/2111.13196 (2021)
- [i72] Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, Lijuan Wang: MLP Architectures for Vision-and-Language Modeling: An Empirical Study. CoRR abs/2112.04453 (2021)
- [i71] Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu: Injecting Semantic Concepts into End-to-End Image Captioning. CoRR abs/2112.05230 (2021)
- 2020
- [c67] Wenlin Wang, Hongteng Xu, Zhe Gan, Bai Li, Guoyin Wang, Liqun Chen, Qian Yang, Wenqi Wang, Lawrence Carin: Graph-Driven Generative Models for Heterogeneous Multi-Task Learning. AAAI 2020: 979-988
- [c66] Junjie Hu, Yu Cheng, Zhe Gan, Jingjing Liu, Jianfeng Gao, Graham Neubig: What Makes A Good Story? Designing Composite Rewards for Visual Storytelling. AAAI 2020: 7969-7976
- [c65] Shuyang Dai, Yu Cheng, Yizhe Zhang, Zhe Gan, Jingjing Liu, Lawrence Carin: Contrastively Smoothed Class Alignment for Unsupervised Domain Adaptation. ACCV (4) 2020: 268-283
- [c64] Yi Wei, Zhe Gan, Wenbo Li, Siwei Lyu, Ming-Ching Chang, Lei Zhang, Jianfeng Gao, Pengchuan Zhang: MagGAN: High-Resolution Face Attribute Editing with Mask-Guided Generative Adversarial Network. ACCV (4) 2020: 661-678
- [c63] Ruiyi Zhang, Changyou Chen, Zhe Gan, Wenlin Wang, Dinghan Shen, Guoyin Wang, Zheng Wen, Lawrence Carin: Improving Adversarial Text Generation by Modeling the Distant Future. ACL 2020: 2516-2531
- [c62]