


Zhaokai Wang
2020 – today
- 2025
[j4]Zhaokai Wang, Xizhou Zhu, Xue Yang, Gen Luo, Hao Li, Changyao Tian, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai:
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding. IEEE Trans. Pattern Anal. Mach. Intell. 47(11): 10142-10159 (2025)
[c14]Xueyu Hu, Tao Xiong, Biao Yi, Zishu Wei, Ruixuan Xiao, Yurun Chen, Jiasheng Ye, Meiling Tao, Xiangxin Zhou, Ziyu Zhao, Yuhuai Li, Shengze Xu, Shenzhi Wang, Xinchen Xu, Shuofei Qiao, Zhaokai Wang, Kun Kuang, Tieyong Zeng, Liang Wang, Jiwei Li, Yuchen Eleanor Jiang, Wangchunshu Zhou, Guoyin Wang, Keting Yin, Zhou Zhao, Hongxia Yang, Fan Wu, Shengyu Zhang, Fei Wu:
OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use. ACL (1) 2025: 7436-7465
[c13]Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jiawen Liu, Jifeng Dai, Yu Qiao, Xizhou Zhu:
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training. CVPR 2025: 24960-24971
[c12]Hao Li, Changyao Tian, Jie Shao, Xizhou Zhu, Zhaokai Wang, Jinguo Zhu, Wenhan Dou, Xiao-Gang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai:
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding. CVPR 2025: 29767-29779
[c11]Yihong Tang, Ao Qu, Zhaokai Wang, Dingyi Zhuang, Zhaofeng Wu, Wei Ma, Shenhao Wang, Yunhan Zheng, Zhan Zhao, Jinhua Zhao:
Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Spatial Reasoning. EMNLP (Findings) 2025: 4083-4103
[c10]Zhaokai Wang, Chenxi Bao, Le Zhuo, Jingrui Han, Yang Yue, Yihong Tang, Victor Shea-Jay Huang, Yue Liao:
A Survey on Vision-to-Music Generation: Methods, Datasets, Evaluation, and Challenges. ISMIR 2025: 223-234
[i19]Zhaokai Wang, Xizhou Zhu, Xue Yang, Gen Luo, Hao Li, Changyao Tian, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai:
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding. CoRR abs/2501.07783 (2025)
[i18]Victor Shea-Jay Huang, Le Zhuo, Yi Xin, Zhaokai Wang, Peng Gao, Hongsheng Li:
TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation. CoRR abs/2503.07050 (2025)
[i17]Zhaokai Wang, Chenxi Bao, Le Zhuo, Jingrui Han, Yang Yue, Yihong Tang, Victor Shea-Jay Huang, Yue Liao:
Vision-to-Music Generation: A Survey. CoRR abs/2503.21254 (2025)
[i16]Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang:
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? CoRR abs/2504.13837 (2025)
[i15]Gen Luo, Wenhan Dou, Wenhao Li, Zhaokai Wang, Xue Yang, Changyao Tian, Hao Li, Weiyun Wang, Wenhai Wang, Xizhou Zhu, Yu Qiao, Jifeng Dai:
Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models. CoRR abs/2507.12566 (2025)
[i14]Xueyu Hu, Tao Xiong, Biao Yi, Zishu Wei, Ruixuan Xiao, Yurun Chen, Jiasheng Ye, Meiling Tao, Xiangxin Zhou, Ziyu Zhao, Yuhuai Li, Shengze Xu, Shenzhi Wang, Xinchen Xu, Shuofei Qiao, Zhaokai Wang, Kun Kuang, Tieyong Zeng, Liang Wang, Jiwei Li, Yuchen Eleanor Jiang, Wangchunshu Zhou, Guoyin Wang, Keting Yin, Zhou Zhao, Hongxia Yang, Fan Wu, Shengyu Zhang, Fei Wu:
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use. CoRR abs/2508.04482 (2025)
[i13]Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, Zhaokai Wang, Zhe Chen, Hongjie Zhang, Ganlin Yang, Haomin Wang, Qi Wei, Jinhui Yin, Wenhao Li, Erfei Cui, Guanzhou Chen, Zichen Ding, Changyao Tian, Zhenyu Wu, JingJing Xie, Zehao Li, Bowen Yang, Yuchen Duan, Xuehui Wang, Zhi Hou, Haoran Hao, Tianyi Zhang, Songze Li, Xiangyu Zhao, Haodong Duan, Nianchen Deng, Bin Fu, Yinan He, Yi Wang, Conghui He, Botian Shi, Junjun He, Yingtong Xiong, Han Lv, Lijun Wu, Wenqi Shao, Kaipeng Zhang, Huipeng Deng, Biqing Qi, Jiaye Ge, Qipeng Guo, Wenwei Zhang, Songyang Zhang, Maosong Cao, Junyao Lin, Kexian Tang, Jianfei Gao, Haian Huang, Yuzhe Gu, Chengqi Lyu, Huanze Tang, Rui Wang, Haijun Lv, Wanli Ouyang, Limin Wang, Min Dou, Xizhou Zhu, Tong Lu, Dahua Lin, Jifeng Dai, Weijie Su, Bowen Zhou, Kai Chen, Yu Qiao, Wenhai Wang, Gen Luo:
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency. CoRR abs/2508.18265 (2025)
[i12]Zhaokai Wang, Penghao Yin, Xiangyu Zhao, Changyao Tian, Yu Qiao, Wenhai Wang, Jifeng Dai, Gen Luo:
GenExam: A Multidisciplinary Text-to-Image Exam. CoRR abs/2509.14232 (2025)
[i11]Zhenxin Lei, Zhangwei Gao, Changyao Tian, Erfei Cui, Guanzhou Chen, Danni Yang, Yuchen Duan, Zhaokai Wang, Wenhao Li, Weiyun Wang, Xiangyu Zhao, Jiayi Ji, Yu Qiao, Wenhai Wang, Gen Luo:
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites. CoRR abs/2510.12126 (2025)
- 2024
[j3]Feixiang Tang, Yifei Ji, Yongsheng Zhang, Zhen Dong, Zhaokai Wang, Qingjun Zhang, Bingji Zhao, Heli Gao:
Drifting Ionospheric Scintillation Simulation for L-Band Geosynchronous SAR. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 17: 842-854 (2024)
[c9]Hao Li, Xue Yang, Zhaokai Wang, Xizhou Zhu, Jie Zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai:
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft. CVPR 2024: 16426-16435
[c8]Yihong Tang, Zhaokai Wang, Ao Qu, Yihao Yan, Zhaofeng Wu, Dingyi Zhuang, Jushi Kai, Kebing Hou, Xiaotong Guo, Jinhua Zhao, Zhan Zhao, Wei Ma:
ItiNera: Integrating Spatial Optimization with Large Language Models for Open-domain Urban Itinerary Planning. EMNLP (Industry Track) 2024: 1413-1432
[c7]Xizhou Zhu, Xue Yang, Zhaokai Wang, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai:
Parameter-Inverted Image Pyramid Networks. NeurIPS 2024
[i10]Yihong Tang, Zhaokai Wang, Ao Qu, Yihao Yan, Kebing Hou, Dingyi Zhuang, Xiaotong Guo, Jinhua Zhao, Zhan Zhao, Wei Ma:
Synergizing Spatial Optimization with Large Language Models for Open-Domain Urban Itinerary Planning. CoRR abs/2402.07204 (2024)
[i9]Xizhou Zhu, Xue Yang, Zhaokai Wang, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai:
Parameter-Inverted Image Pyramid Networks. CoRR abs/2406.04330 (2024)
[i8]Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jifeng Dai, Yu Qiao, Xizhou Zhu:
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training. CoRR abs/2410.08202 (2024)
[i7]Yihong Tang, Ao Qu, Zhaokai Wang, Dingyi Zhuang, Zhaofeng Wu, Wei Ma, Shenhao Wang, Yunhan Zheng, Zhan Zhao, Jinhua Zhao:
Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning. CoRR abs/2410.16162 (2024)
[i6]Baisen Wang, Le Zhuo, Zhaokai Wang, Chenxi Bao, Wu Chengjing, Xuecheng Nie, Jiao Dai, Jizhong Han, Yue Liao, Si Liu:
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation. CoRR abs/2412.09428 (2024)
[i5]Hao Li, Changyao Tian, Jie Shao, Xizhou Zhu, Zhaokai Wang, Jinguo Zhu, Wenhan Dou, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai:
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding. CoRR abs/2412.09604 (2024)
- 2023
[c6]Le Zhuo, Zhaokai Wang, Baisen Wang, Yue Liao, Chenxi Bao, Stanley Peng, Songhao Han, Aixi Zhang, Fei Fang, Si Liu:
Video Background Music Generation: Dataset, Method and Evaluation. ICCV 2023: 15591-15601
[i4]Hao Li, Xue Yang, Zhaokai Wang, Xizhou Zhu, Jie Zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai:
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft. CoRR abs/2312.09238 (2023)
- 2022
[i3]Le Zhuo, Zhaokai Wang, Baisen Wang, Yue Liao, Stanley Peng, Chenxi Bao, Miao Lu, Xiaobo Li, Si Liu:
Video Background Music Generation: Dataset, Method and Evaluation. CoRR abs/2211.11248 (2022)
- 2021
[c5]Zhaokai Wang, Renda Bao, Qi Wu, Si Liu:
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps. AAAI 2021: 2835-2843
[c4]Shangzhe Di, Zeren Jiang, Si Liu, Zhaokai Wang, Leyan Zhu, Zexin He, Hongming Liu, Shuicheng Yan:
Video Background Music Generation with Controllable Music Transformer. ACM Multimedia 2021: 2037-2045
[i2]Shangzhe Di, Zeren Jiang, Si Liu, Zhaokai Wang, Leyan Zhu, Zexin He, Hongming Liu, Shuicheng Yan:
Video Background Music Generation with Controllable Music Transformer. CoRR abs/2111.08380 (2021)
- 2020
[i1]Zhaokai Wang, Renda Bao, Qi Wu, Si Liu:
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps. CoRR abs/2012.03662 (2020)
2010 – 2019
- 2019
[j2]Shubin Su, Limin Xiao, Li Ruan, Fei Gu, Shupan Li, Zhaokai Wang, Rongbin Xu:
An Efficient Density-Based Local Outlier Detection Approach for Scattered Data. IEEE Access 7: 1006-1020 (2019)
[j1]Baicheng Yan, Limin Xiao, Hang Zhang, Daliang Xu, Li Ruan, Zhaokai Wang, Yiyang Zhang:
An adaptive template matching-based single object tracking algorithm with parallel acceleration. J. Vis. Commun. Image Represent. 64 (2019)
[c3]Baicheng Yan, Yi Zhou, Limin Xiao, Jiantong Huo, Zhaokai Wang:
LogGOPSC: A Parallel Computation Model Extending Network Contention into LogGOPS. CLUSTER 2019: 1-2
[c2]Zhaokai Wang, Limin Xiao, Rongbin Xu, Shubin Su, Shupan Li, Yao Song:
Deeper Monocular Depth Prediction via Long and Short Skip Connection. IJCNN 2019: 1-7
[c1]Shubin Su, Limin Xiao, Li Ruan, Rongbin Xu, Shupan Li, Zhaokai Wang, Qigong He, Wei Li:
ADCMO: An Anomaly Detection Approach Based on Local Outlier Factor for Continuously Monitored Object. ISPA/BDCloud/SocialCom/SustainCom 2019: 865-874
last updated on 2026-02-20 22:38 CET by the dblp team
all metadata released as open data under CC0 1.0 license







