


Yu Qiao 0001
Person information
- affiliation: Shanghai AI Laboratory, OpenGVLab, China
- affiliation: Chinese Academy of Sciences, Shenzhen Institutes of Advanced Technology, China
- affiliation (former): University of Tokyo, Graduate School of Information Science and Technology, Japan
- affiliation (PhD 2006): University of Electro-Communications, Tokyo, Japan
Other persons with the same name
- Yu Qiao — disambiguation page
- Yu Qiao 0002 — Biomedical Imaging Lab, Singapore
- Yu Qiao 0003 — Shanghai Jiao Tong University, Department of Automation, Institute of Image Processing and Pattern Recognition, China (and 1 more)
- Yu Qiao 0004 — Kyung Hee University, School of Computing, Department of Artificial Intelligence, Yongin, South Korea (and 1 more)
- Yu Qiao 0005 — RWTH Aachen University, Germany
- Yu Qiao 0006 — Nanjing University, National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, China
2020 – today
- 2025
- [j116]Yu Qiao, Xiaohui Yang, Jing Wang, Tongzhen Si, Qingbei Guo:
Driver Cognitive Distraction Detection based on eye movement behavior and integration of multi-view space-channel feature. Expert Syst. Appl. 266: 125975 (2025) - [j115]Yaohui Wang, Xin Ma, Xinyuan Chen, Cunjian Chen, Antitza Dantcheva, Bo Dai, Yu Qiao:
LEO: Generative Latent Image Animator for Human Video Synthesis. Int. J. Comput. Vis. 133(3): 1277-1289 (2025) - [j114]Ziyan Huang, Zhongying Deng, Jin Ye, Haoyu Wang, Yanzhou Su, Tianbin Li, Hui Sun, Junlong Cheng, Jianpin Chen, Junjun He, Yun Gu, Shaoting Zhang, Lixu Gu, Yu Qiao:
A-Eval: A benchmark for cross-dataset and cross-modality evaluation of abdominal multi-organ segmentation. Medical Image Anal. 101: 103499 (2025) - [j113]Peng Xu, Wenqi Shao, Kaipeng Zhang, Peng Gao, Shuo Liu, Meng Lei, Fanqing Meng, Siyuan Huang, Yu Qiao, Ping Luo:
LVLM-EHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models. IEEE Trans. Pattern Anal. Mach. Intell. 47(3): 1877-1893 (2025) - [j112]Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai:
BEVFormer: Learning Bird's-Eye-View Representation From LiDAR-Camera via Spatiotemporal Transformers. IEEE Trans. Pattern Anal. Mach. Intell. 47(3): 2020-2036 (2025) - [j111]Boyu Chen, Siran Chen, Kunchang Li, Qinglin Xu, Yu Qiao, Yali Wang:
Percept, Chat, Adapt: Knowledge transfer of foundation models for open-world video recognition. Pattern Recognit. 160: 111189 (2025) - [j110]Qingsong Zhao, Yi Wang, Yinan He, Yu Qiao, Cairong Zhao:
Learning Discriminative Representations in Videos via Active Embedding Distance Correlation. IEEE Signal Process. Lett. 32: 56-60 (2025) - [j109]Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Nanning Zheng, Kaipeng Zhang:
B-AVIBench: Toward Evaluating the Robustness of Large Vision-Language Model on Black-Box Adversarial Visual-Instructions. IEEE Trans. Inf. Forensics Secur. 20: 1434-1446 (2025) - [i444]Xinhao Li, Yi Wang, Jiashuo Yu, Xiangyu Zeng, Yuhan Zhu, Haian Huang, Jianfei Gao, Kunchang Li, Yinan He, Chenting Wang, Yu Qiao, Yali Wang, Limin Wang:
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling. CoRR abs/2501.00574 (2025) - [i443]Jiakang Yuan, Xiangchao Yan, Botian Shi, Tao Chen, Wanli Ouyang, Bo Zhang, Lei Bai, Yu Qiao, Bowen Zhou:
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback. CoRR abs/2501.03916 (2025) - [i442]Siran Chen, Yuxiao Luo, Yue Ma, Yu Qiao, Yali Wang:
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving. CoRR abs/2501.04302 (2025) - [i441]Zhaokai Wang, Xizhou Zhu, Xue Yang, Gen Luo, Hao Li, Changyao Tian, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai:
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding. CoRR abs/2501.07783 (2025) - [i440]Weichen Fan, Chenyang Si, Junhao Song, Zhenyu Yang, Yinan He, Long Zhuo, Ziqi Huang, Ziyue Dong, Jingwen He, Dongwei Pan, Yi Wang, Yuming Jiang, Yaohui Wang, Peng Gao, Xinyuan Chen, Hengjie Li, Dahua Lin, Yu Qiao, Ziwei Liu:
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models. CoRR abs/2501.08453 (2025) - [i439]Chenyang Si, Weichen Fan, Zhengyao Lv, Ziqi Huang, Yu Qiao, Ziwei Liu:
RepVideo: Rethinking Cross-Layer Representation for Video Generation. CoRR abs/2501.08994 (2025) - [i438]Xiaohui Li, Yihao Liu, Shuo Cao, Ziyan Chen, Shaobin Zhuang, Xiangyu Chen, Yinan He, Yi Wang, Yu Qiao:
DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency. CoRR abs/2501.10110 (2025) - [i437]Yi Wang, Xinhao Li, Ziang Yan, Yinan He, Jiashuo Yu, Xiangyu Zeng, Chenting Wang, Changlian Ma, Haian Huang, Jianfei Gao, Min Dou, Kai Chen, Wenhai Wang, Yu Qiao, Yali Wang, Limin Wang:
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling. CoRR abs/2501.12386 (2025) - [i436]Jia Yu, Fei Yuan, Rui Min, Jing Yu, Pei Chu, Jiayang Li, Wei Li, Ruijie Zhang, Zhenxiang Li, Zhifei Ren, Dong Zheng, Wenjian Zhang, Yan Teng, Lingyu Meng, Zhenjiang Jin, Jiantao Qiu, ShaSha Wang, Zhongying Tu, Dahua Lin, Yu Wang, Yu Qiao, Yanfeng Wang, Conghui He:
WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages. CoRR abs/2501.14506 (2025) - [i435]Dongyang Liu, Shicheng Li, Yutong Liu, Zhen Li, Kai Wang, Xinyue Li, Qi Qin, Yufei Liu, Yi Xin, Zhongyu Li, Bin Fu, Chenyang Si, Yuewen Cao, Conghui He, Ziwei Liu, Yu Qiao, Qibin Hou, Hongsheng Li, Peng Gao:
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT. CoRR abs/2502.06782 (2025) - [i434]Daocheng Fu, Naiting Zhong, Xu Han, Pinlong Cai, Licheng Wen, Song Mao, Botian Shi, Yu Qiao:
LimSim Series: An Autonomous Driving Simulation Platform for Validation and Enhancement. CoRR abs/2502.09170 (2025) - [i433]Haochen Xue, Feilong Tang, Ming Hu, Yexin Liu, Qidong Huang, Yulong Li, Chengzhi Liu, Zhongxing Xu, Chong Zhang, Chun-Mei Feng, Yutong Xie, Imran Razzak, Zongyuan Ge, Jionglong Su, Junjun He, Yu Qiao:
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation. CoRR abs/2502.11903 (2025)
- 2024
- [j108]Yihao Liu, Hengyuan Zhao, Kelvin C. K. Chan, Xintao Wang, Chen Change Loy, Yu Qiao, Chao Dong:
Temporally consistent video colorization with deep feature propagation and self-regularization learning. Comput. Vis. Media 10(2): 375-395 (2024) - [j107]Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao:
CLIP-Adapter: Better Vision-Language Models with Feature Adapters. Int. J. Comput. Vis. 132(2): 581-595 (2024) - [j106]Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang:
MixStyle Neural Networks for Domain Generalization and Adaptation. Int. J. Comput. Vis. 132(3): 822-836 (2024) - [j105]Peng Gao, Ziyi Lin, Renrui Zhang, Rongyao Fang, Hongyang Li, Hongsheng Li, Yu Qiao:
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking. Int. J. Comput. Vis. 132(5): 1546-1556 (2024) - [j104]Haibin He, Xinyuan Chen, Chaoyue Wang, Juhua Liu, Bo Du, Dacheng Tao, Yu Qiao:
Diff-Font: Diffusion Model for Robust One-Shot Font Generation. Int. J. Comput. Vis. 132(11): 5372-5386 (2024) - [j103]Hao Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng, Ping Luo, Yu Qiao, Kaipeng Zhang:
Open-Vocabulary Animal Keypoint Detection with Semantic-Feature Matching. Int. J. Comput. Vis. 132(12): 5741-5758 (2024) - [j102]Yuhui Wang, Yahan Xie, Yu Qiao, Zhaohui Xia, Yanying Chen:
Chinese CSUQ: Cross-Cultural Adaptation and Evaluation of Measurement Properties. Int. J. Hum. Comput. Interact. 40(22): 7623-7641 (2024) - [j101]Yi Liu, Yu Qiao, Yali Wang:
F2S-Net: learning frame-to-segment prediction for online action detection. J. Real Time Image Process. 21(3): 73 (2024) - [j100]Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao:
Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe. IEEE Trans. Pattern Anal. Mach. Intell. 46(4): 2151-2170 (2024) - [j99]Yuexin Ma, Tai Wang, Xuyang Bai, Huitong Yang, Yuenan Hou, Yaming Wang, Yu Qiao, Ruigang Yang, Xinge Zhu:
Vision-Centric BEV Perception: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 46(12): 10978-10997 (2024) - [j98]Yingqi Liu, Jingwen He, Yihao Liu, Xinqi Lin, Fanghua Yu, Jinfan Hu, Yu Qiao, Chao Dong:
AdaptBIR: Adaptive Blind Image Restoration with latent diffusion prior for higher fidelity. Pattern Recognit. 155: 110659 (2024) - [j97]Mingfei Han, Yali Wang, Mingjie Li, Xiaojun Chang, Yi Yang, Yu Qiao:
Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection. IEEE Trans. Image Process. 33: 1560-1573 (2024) - [j96]Siran Chen, Qinglin Xu, Yue Ma, Yu Qiao, Yali Wang:
Attentive Snippet Prompting for Video Retrieval. IEEE Trans. Multim. 26: 4348-4359 (2024) - [j95]Yuer Ma, Yi Liu, Limin Wang, Wenxiong Kang, Yu Qiao, Yali Wang:
Dual Masked Modeling for Weakly-Supervised Temporal Boundary Discovery. IEEE Trans. Multim. 26: 5694-5704 (2024) - [j94]Mingye Xu, Zhipeng Zhou, Hongbin Xu, Yu Qiao, Yali Wang:
CP-Net: Contour-Perturbed Reconstruction Network for Self-Supervised Point Cloud Learning. IEEE Trans. Multim. 26: 8799-8810 (2024) - [j93]Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang:
Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance. Vis. Intell. 2(1): 32 (2024) - [c384]Siran Chen, Yue Ma, Yu Qiao, Yali Wang:
M-BEV: Masked BEV Perception for Robust Autonomous Driving. AAAI 2024: 1183-1191 - [c383]Ziteng Cui, Lin Gu, Xiao Sun, Xianzheng Ma, Yu Qiao, Tatsuya Harada:
Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption. AAAI 2024: 1435-1444 - [c382]Bo Peng, Xinyuan Chen, Yaohui Wang, Chaochao Lu, Yu Qiao:
ConditionVideo: Training-Free Condition-Guided Video Generation. AAAI 2024: 4459-4467 - [c381]Wenshuo Peng, Kaipeng Zhang, Yue Yang, Hao Zhang, Yu Qiao:
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification. AAAI 2024: 4506-4514 - [c380]Shilin Yan, Renrui Zhang, Ziyu Guo, Wenchao Chen, Wei Zhang, Hongyang Li, Yu Qiao, Hao Dong, Zhongjiang He, Peng Gao:
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation. AAAI 2024: 6449-6457 - [c379]Lingjun Zhang, Xinyuan Chen, Yaohui Wang, Yue Lu, Yu Qiao:
Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model. AAAI 2024: 7215-7223 - [c378]Yuanfu Wang, Chao Yang, Ying Wen, Yu Liu, Yu Qiao:
Critic-Guided Decision Transformer for Offline Reinforcement Learning. AAAI 2024: 15706-15714 - [c377]Yan Ma, Yu Qiao, Pengfei Liu:
MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation. ACL (1) 2024: 2135-2169 - [c376]Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, Wangmeng Zuo, Dahua Lin, Yu Qiao, Jing Shao:
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models. ACL (Findings) 2024: 3923-3954 - [c375]Chen Qian, Jie Zhang, Wei Yao, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, Jing Shao:
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models. ACL (Findings) 2024: 4864-4888 - [c374]Guoxin Chen, Kexin Tang, Chao Yang, Fuying Ye, Yu Qiao, Yiming Qian:
SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning. ACL (1) 2024: 5901-5921 - [c373]Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo:
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning. ACL (Findings) 2024: 7775-7803 - [c372]Zhanhui Zhou, Jie Liu, Jing Shao, Xiangyu Yue, Chao Yang, Wanli Ouyang, Yu Qiao:
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization. ACL (Findings) 2024: 10586-10613 - [c371]Fangzhi Xu, Zhiyong Wu, Qiushi Sun, Siyu Ren, Fei Yuan, Shuai Yuan, Qika Lin, Yu Qiao, Jun Liu:
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models. ACL (1) 2024: 13091-13116 - [c370]Zaibin Zhang, Yongting Zhang, Lijun Li, Jing Shao, Hongzhi Gao, Yu Qiao, Lijun Wang, Huchuan Lu, Feng Zhao:
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety. ACL (1) 2024: 15202-15231 - [c369]Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao:
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ACL (1) 2024: 15810-15830 - [c368]Yuan Xu, Xiaoxuan Ma, Jiajun Su, Wentao Zhu, Yu Qiao, Yizhou Wang:
ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring. CVPR 2024: 979-989 - [c367]Xiaoliang Ju, Zhaoyang Huang, Yijiin Li, Guofeng Zhang, Yu Qiao, Hongsheng Li:
DiffInDScene: Diffusion-Based High-Quality 3D Indoor Scene Generation. CVPR 2024: 4526-4535 - [c366]Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao:
Point Transformer V3: Simpler, Faster, Stronger. CVPR 2024: 4840-4851 - [c365]Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai:
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications. CVPR 2024: 5652-5661 - [c364]Ziyan Chen, Jingwen He, Xinqi Lin, Yu Qiao, Chao Dong:
Towards Real-world Video Face Restoration: A New Benchmark. CVPR Workshops 2024: 5929-5939 - [c363]Lirui Zhao, Yue Yang, Kaipeng Zhang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji:
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model. CVPR 2024: 6390-6399 - [c362]Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu:
VideoBooth: Diffusion-based Video Generation with Image Prompts. CVPR 2024: 6689-6700 - [c361]Bin Fu, Fanghua Yu, Anran Liu, Zixuan Wang, Jie Wen, Junjun He, Yu Qiao:
Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models. CVPR 2024: 6892-6901 - [c360]Shaobin Zhuang, Kunchang Li, Xinyuan Chen, Yaohui Wang, Ziwei Liu, Yu Qiao, Yali Wang:
Vlogger: Make Your Dream A Vlog. CVPR 2024: 8806-8817 - [c359]Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, Lu Sheng:
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion. CVPR 2024: 9784-9794 - [c358]Bo Zou, Chao Yang, Yu Qiao, Chengbin Quan, Youjian Zhao:
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction. CVPR 2024: 14089-14099 - [c357]Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li:
Generalized Predictive Model for Autonomous Driving. CVPR 2024: 14662-14672 - [c356]Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao:
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception. CVPR 2024: 16307-16316 - [c355]Hao Li, Xue Yang, Zhaokai Wang, Xizhou Zhu, Jie Zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai:
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft. CVPR 2024: 16426-16435 - [c354]Yi Yu, Xue Yang, Qingyun Li, Feipeng Da, Jifeng Dai, Yu Qiao, Junchi Yan:
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-End Oriented Object Detection with Single Point Supervision. CVPR 2024: 16783-16793 - [c353]Zhiyu Zhao, Bingkun Huang, Sen Xing, Gangshan Wu, Yu Qiao, Limin Wang:
Asymmetric Masked Distillation for Pre-Training Small Foundation Models. CVPR 2024: 18516-18526 - [c352]Hao Wu, Huabin Liu, Yu Qiao, Xiao Sun:
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement. CVPR 2024: 18699-18708 - [c351]Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu:
VBench: Comprehensive Benchmark Suite for Video Generative Models. CVPR 2024: 21807-21818 - [c350]Yifei Huang, Guo Chen, Jilan Xu, Mingfang Zhang, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, Yu Qiao:
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World. CVPR 2024: 22072-22086 - [c349]Yutao Hu, Tianbin Li, Quanfeng Lu, Wenqi Shao, Junjun He, Yu Qiao, Ping Luo:
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM. CVPR 2024: 22170-22183 - [c348]Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Lou, Limin Wang, Yu Qiao:
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark. CVPR 2024: 22195-22206 - [c347]Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai:
Intern VL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks. CVPR 2024: 24185-24198 - [c346]Fanghua Yu, Jinjin Gu, Zheyuan Li, Jinfan Hu, Xiangtao Kong, Xintao Wang, Jingwen He, Yu Qiao, Chao Dong:
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild. CVPR 2024: 25669-25680 - [c345]Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C. Kot, Bihan Wen:
SinSR: Diffusion-Based Image Super-Resolution in a Single Step. CVPR 2024: 25796-25805 - [c344]Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue:
OneLLM: One Framework to Align All Modalities with Language. CVPR 2024: 26574-26585 - [c343]Bo Zou, Chao Yang, Yu Qiao, Chengbin Quan, Youjian Zhao:
Language-aware Visual Semantic Distillation for Video Question Answering. CVPR 2024: 27103-27113 - [c342]Ziyi Lin, Dongyang Liu, Renrui Zhang, Peng Gao, Longtian Qiu, Han Xiao, Han Qiu, Wenqi Shao, Keqin Chen, Jiaming Han, Siyuan Huang, Yichi Zhang, Xuming He, Yu Qiao, Hongsheng Li:
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models. ECCV (62) 2024: 36-55 - [c341]Yuchen Yang, Yu Qiao, Xiao Sun:
Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation. ECCV (44) 2024: 38-55 - [c340]Shuo Cao, Yihao Liu, Wenlong Zhang, Yu Qiao, Chao Dong:
GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity. ECCV (70) 2024: 70-87 - [c339]Xiangyu Chen, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou, Yu Qiao, Chao Dong:
A Comparative Study of Image Restoration Networks for General Backbone Network Design. ECCV (71) 2024: 74-91 - [c338]Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang:
ControlLLM: Augment Language Models with Tools by Searching on Graphs. ECCV (12) 2024: 89-105 - [c337]Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li:
Embodied Understanding of Driving Scenarios. ECCV (62) 2024: 129-148 - [c336]Gang Li, Wenhai Wang, Xiang Li, Ziheng Li, Jian Yang, Jifeng Dai, Yu Qiao, Shanshan Zhang:
Distilling Knowledge from Large-Scale Image Models for Object Detection. ECCV (84) 2024: 142-160 - [c335]Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Yu Qiao, Peng Gao, Hongsheng Li:
MATHVERSE: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? ECCV (8) 2024: 169-186 - [c334]Jiakang Yuan, Bo Zhang, Kaixiong Gong, Xiangyu Yue, Botian Shi, Yu Qiao, Tao Chen:
Reg-TTA3D: Better Regression Makes Better Test-Time Adaptive 3D Object Detection. ECCV (43) 2024: 197-213 - [c333]Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, Yu Qiao:
VideoMamba: State Space Model for Efficient Video Understanding. ECCV (26) 2024: 237-255 - [c332]Zhihang Zhong, Gurunandan Krishnan, Xiao Sun, Yu Qiao, Sizhuo Ma, Jian Wang:
Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation. ECCV (33) 2024: 346-363 - [c331]Xin Liu, Yichen Zhu, Jindong Gu, Yunshi Lan, Chao Yang, Yu Qiao:
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models. ECCV (56) 2024: 386-403 - [c330]Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Jilan Xu, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang:
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding. ECCV (85) 2024: 396-416 - [c329]Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Bo Dai, Fanghua Yu, Yu Qiao, Wanli Ouyang, Chao Dong:
DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior. ECCV (59) 2024: 430-448 - [c328]Weiyun Wang, Yiming Ren, Haowen Luo, Tiantong Li, Chenxiang Yan, Zhe Chen, Wenhai Wang, Qingyun Li, Lewei Lu, Xizhou Zhu, Yu Qiao, Jifeng Dai:
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World. ECCV (33) 2024: 471-490 - [c327]Yutong Chen, Yifan Zhan, Zhihang Zhong, Wei Wang, Xiao Sun, Yu Qiao, Yinqiang Zheng:
Within the Dynamic Context: Inertia-Aware 3D Human Modeling with Pose Sequence. ECCV (49) 2024: 491-508 - [c326]Zhaoxun Ju, Chao Yang, Fuchun Sun, Hongbo Wang, Yu Qiao:
Rethinking Mutual Information for Language Conditioned Skill Discovery on Imitation Learning. ICAPS 2024: 301-309 - [c325]Yue Yang, Kaipeng Zhang, Yuying Ge, Wenqi Shao, Zeyue Xue, Yu Qiao, Ping Luo:
Align, Adapt and Inject: Audio-Guided Image Generation, Editing and Stylization. ICASSP 2024: 3475-3479 - [c324]Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai:
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning. ICLR 2024 - [c323]Bo Zhang, Xinyu Cai, Jiakang Yuan, Donglin Yang, Jianfei Guo, Xiangchao Yan, Renqiu Xia, Botian Shi, Min Dou, Tao Chen, Si Liu, Junchi Yan, Yu Qiao:
ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation. ICLR 2024 - [c322]Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, Ziwei Liu:
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction. ICLR 2024 - [c321]Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, Ping Luo:
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models. ICLR 2024 - [c320]Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo:
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models. ICLR 2024 - [c319]Weigao Sun, Zhen Qin, Weixuan Sun, Shidi Li, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong:
CO2: Efficient Distributed Training with Full Communication-Computation Overlap. ICLR 2024 - [c318]Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, Linjie Xing, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao:
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World. ICLR 2024 - [c317]Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, Xin Ma, Xinhao Li, Guo Chen, Xinyuan Chen, Yaohui Wang, Ping Luo, Ziwei Liu, Yali Wang, Limin Wang, Yu Qiao:
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation. ICLR 2024 - [c316]Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yu Qiao:
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models. ICLR 2024 - [c315]Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo:
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation. ICLR 2024 - [c314]Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li:
Personalize Segment Anything Model with One Shot. ICLR 2024 - [c313]Renrui Zhang, Jiaming Han, Chris Liu, Aojun Zhou, Pan Lu, Yu Qiao, Hongsheng Li, Peng Gao:
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention. ICLR 2024 - [c312]Wenlong Zhang, Xiaohui Li, Xiangyu Chen, Xiaoyun Zhang, Yu Qiao, Xiao-Ming Wu, Chao Dong:
SEAL: A Framework for Systematic Evaluation of Real-World Super-Resolution. ICLR 2024 - [c311]Mingzhou Liu, Xinwei Sun, Yu Qiao, Yizhou Wang:
Causal Discovery via Conditional Independence Testing with Proxy Variables. ICML 2024 - [c310]Yihao Liu, Xiangyu Chen, Xianzheng Ma, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong:
Unifying Image Processing as Visual Prompting Question Answering. ICML 2024 - [c309]Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo:
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis. ICML 2024 - [c308]Dongyang Liu, Renrui Zhang, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Yu Qiao, Hongsheng Li, Peng Gao:
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models. ICML 2024 - [c307]Yue Yang, Yuqi Lin, Hong Liu, Wenqi Shao, Runjian Chen, Hailong Shang, Yu Wang, Yu Qiao, Kaipeng Zhang, Ping Luo:
Position: Towards Implicit Prompt For Text-To-Image Models. ICML 2024 - [c306]Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao:
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI. ICML 2024 - [c305]Xin Liu, Yichen Zhu, Yunshi Lan, Chao Yang, Yu Qiao:
Safety of Multimodal Large Language Models on Images and Text. IJCAI 2024: 8151-8159 - [c304]Daocheng Fu, Wenjie Lei, Licheng Wen, Pinlong Cai, Song Mao, Min Dou, Botian Shi, Yu Qiao:
LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving. IV 2024: 1084-1090 - [c303]Xiangyu Chen, Yihao Liu, Yuandong Pu, Wenlong Zhang, Jiantao Zhou, Yu Qiao, Chao Dong:
Learning A Low-Level Vision Generalist via Visual Task Prompt. ACM Multimedia 2024: 2671-2680 - [c302]Yixu Wang, Yan Teng, Kexin Huang, Chengqi Lyu, Songyang Zhang, Wenwei Zhang, Xingjun Ma, Yu-Gang Jiang, Yu Qiao, Yingchun Wang:
Fake Alignment: Are LLMs Really Aligned Well? NAACL-HLT 2024: 4696-4712 - [c301]Tao Ma, Hongbin Zhou, Qiusheng Huang, Xuemeng Yang, Jianfei Guo, Bo Zhang, Min Dou, Yu Qiao, Botian Shi, Hongsheng Li:
ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving. NeurIPS 2024 - [c300]Lin Chen, Xilin Wei, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Lin Bin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang:
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions. NeurIPS 2024 - [c299]Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, Feng Zhao:
Are We on the Right Way for Evaluating Large Vision-Language Models? NeurIPS 2024 - [c298]Pengcheng Chen, Jin Ye, Guoan Wang, Yanjun Li, Zhongying Deng, Wei Li, Tianbin Li, Haodong Duan, Ziyan Huang, Yanzhou Su, Benyou Wang, Shaoting Zhang, Bin Fu, Jianfei Cai, Bohan Zhuang, Eric J. Seibel, Junjun He, Yu Qiao:
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI. NeurIPS 2024 - [c297]Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang:
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD. NeurIPS 2024 - [c296]Yiwei Guo, Shaobin Zhuang, Kunchang Li, Yu Qiao, Yali Wang:
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration. NeurIPS 2024 - [c295]Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, Pengfei Liu:
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI. NeurIPS 2024 - [c294]Chuanhao Li, Zhen Li, Chenchen Jing, Shuo Liu, Wenqi Shao, Yuwei Wu, Ping Luo, Yu Qiao, Kaipeng Zhang:
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge. NeurIPS 2024 - [c293]Ziyu Liu, Tao Chu, Yuhang Zang, Xilin Wei, Xiaoyi Dong, Pan Zhang, Zijian Liang, Yuanjun Xiong, Yu Qiao, Dahua Lin, Jiaqi Wang:
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs. NeurIPS 2024 - [c292]Shuo Liu, Kaining Ying, Hao Zhang, Yue Yang, Yuqi Lin, Tianle Zhang, Chuanhao Li, Yu Qiao, Ping Luo, Wenqi Shao, Kaipeng Zhang:
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models. NeurIPS 2024 - [c291]Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao:
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving. NeurIPS 2024 - [c290]Chenxin Tao, Xizhou Zhu, Shiqian Su, Lewei Lu, Changyao Tian, Xuan Luo, Gao Huang, Hongsheng Li, Yu Qiao, Jie Zhou, Jifeng Dai:
Learning 1D Causal Visual Representation with De-focus Attention Networks. NeurIPS 2024 - [c289]Weiyun Wang, Shuibo Zhang, Yiming Ren, Yuchen Duan, Tiantong Li, Shuo Liu, Mengkang Hu, Zhe Chen, Kaipeng Zhang, Lewei Lu, Xizhou Zhu, Ping Luo, Yu Qiao, Jifeng Dai, Wenqi Shao, Wenhai Wang:
Needle In A Multimodal Haystack. NeurIPS 2024 - [c288]Jiannan Wu, Muyan Zhong, Sen Xing, Zeqiang Lai, Zhaoyang Liu, Zhe Chen, Wenhai Wang, Xizhou Zhu, Lewei Lu, Tong Lu, Ping Luo, Yu Qiao, Jifeng Dai:
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks. NeurIPS 2024 - [c287]Chenyu Yang, Xizhou Zhu, Jinguo Zhu, Weijie Su, Junjie Wang, Xuan Dong, Wenhai Wang, Bin Li, Jie Zhou, Yu Qiao, Jifeng Dai:
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning. NeurIPS 2024 - [c286]Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu, Yunhong Wang, Yu Qiao:
4Diffusion: Multi-view Video Diffusion Model for 4D Generation. NeurIPS 2024 - [c285]Tianle Zhang, Langtian Ma, Yuchen Yan, Yuchen Zhang, Yue Yang, Ziyao Guo, Wenqi Shao, Kai Wang, Yang You, Yu Qiao, Ping Luo, Kaipeng Zhang:
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality. NeurIPS 2024 - [c284]Qingsong Zhao, Yi Wang, Jilan Xu, Yinan He, Zifan Song, Limin Wang, Yu Qiao, Cairong Zhao:
Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection? NeurIPS 2024 - [c283]Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao:
SyncVIS: Synchronized Video Instance Segmentation. NeurIPS 2024 - [c282]Xizhou Zhu, Xue Yang, Zhaokai Wang, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai:
Parameter-Inverted Image Pyramid Networks. NeurIPS 2024 - [c281]Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Xiangyang Zhu, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Lirui Zhao, Si Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao, Hongsheng Li, Peng Gao:
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT. NeurIPS 2024 - [c280]Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li:
Learning Manipulation by Predicting Interaction. Robotics: Science and Systems 2024 - [c279]Daocheng Fu, Xin Li, Licheng Wen, Min Dou, Pinlong Cai, Botian Shi, Yu Qiao:
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models. WACV (Workshops) 2024: 910-919 - [c278]Zeyu Lu, Chengyue Wu, Xinyuan Chen, Yaohui Wang, Lei Bai, Yu Qiao, Xihui Liu:
Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation. WACV 2024: 5362-5371 - [i432]Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo:
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning. CoRR abs/2401.02384 (2024) - [i431]Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Ziwei Liu, Yuan-Fang Li, Cunjian Chen, Yu Qiao:
Latte: Latent Diffusion Transformer for Video Generation. CoRR abs/2401.03048 (2024) - [i430]Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai:
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications. CoRR abs/2401.06197 (2024) - [i429]Shaobin Zhuang, Kunchang Li, Xinyuan Chen, Yaohui Wang, Ziwei Liu, Yu Qiao, Yali Wang:
Vlogger: Make Your Dream A Vlog. CoRR abs/2401.09414 (2024) - [i428]Changyao Tian, Xizhou Zhu, Yuwen Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Yuntao Chen, Lewei Lu, Tong Lu, Jie Zhou, Hongsheng Li, Yu Qiao, Jifeng Dai:
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer. CoRR abs/2401.10208 (2024) - [i427]Zaibin Zhang, Yongting Zhang, Lijun Li, Hongzhi Gao, Lijun Wang, Huchuan Lu, Feng Zhao, Yu Qiao, Jing Shao:
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety. CoRR abs/2401.11880 (2024) - [i426]Guoxin Chen, Kexin Tang, Chao Yang, Fuying Ye, Yu Qiao, Yiming Qian:
SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning. CoRR abs/2401.13246 (2024) - [i425]Fanghua Yu, Jinjin Gu, Zheyuan Li, Jinfan Hu, Xiangtao Kong, Xintao Wang, Jingwen He, Yu Qiao, Chao Dong:
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild. CoRR abs/2401.13627 (2024) - [i424]Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, Yingchun Wang, Yixu Wang, Yongting Zhang, Yu Qiao, Yujiong Shen, Yurong Mou, Yuxi Chen, Zaibin Zhang, Zhelun Shi, Zhenfei Yin, Zhipin Wang:
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities. CoRR abs/2401.15071 (2024) - [i423]Weigao Sun, Zhen Qin, Weixuan Sun, Shidi Li, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong:
CO2: Efficient Distributed Training with Full Communication-Computation Overlap. CoRR abs/2401.16265 (2024) - [i422]Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang:
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model. CoRR abs/2401.16420 (2024) - [i421]Xin Liu, Yichen Zhu, Yunshi Lan, Chao Yang, Yu Qiao:
Safety of Multimodal Large Language Models on Images and Text. CoRR abs/2402.00357 (2024) - [i420]Daocheng Fu, Wenjie Lei, Licheng Wen, Pinlong Cai, Song Mao, Min Dou, Botian Shi, Yu Qiao:
LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving. CoRR abs/2402.01246 (2024) - [i419]Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, Wangmeng Zuo, Dahua Lin, Yu Qiao, Jing Shao:
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models. CoRR abs/2402.05044 (2024) - [i418]Shikun Ban, Juling Fan, Wentao Zhu, Xiaoxuan Ma, Yu Qiao, Yizhou Wang:
Real-time Holistic Robot Pose Estimation with Unknown States. CoRR abs/2402.05655 (2024) - [i417]Peng Gao, Renrui Zhang, Chris Liu, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao:
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models. CoRR abs/2402.05935 (2024) - [i416]Yutao Hu, Tianbin Li, Quanfeng Lu, Wenqi Shao, Junjun He, Yu Qiao, Ping Luo:
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM. CoRR abs/2402.09181 (2024) - [i415]Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao:
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey. CoRR abs/2402.09283 (2024) - [i414]Renqiu Xia, Bo Zhang, Hancheng Ye, Xiangchao Yan, Qi Liu, Hongbin Zhou, Zijun Chen, Min Dou, Botian Shi, Junchi Yan, Yu Qiao:
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning. CoRR abs/2402.12185 (2024) - [i413]Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao:
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! CoRR abs/2402.12343 (2024) - [i412]Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe Xu, Mingyu Ding, Ping Luo:
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation. CoRR abs/2402.14623 (2024) - [i411]Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo:
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis. CoRR abs/2402.16117 (2024) - [i410]Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo:
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation. CoRR abs/2402.16880 (2024) - [i409]Zhaoxun Ju, Chao Yang, Hongbo Wang, Yu Qiao, Fuchun Sun:
Rethinking Mutual Information for Language Conditioned Skill Discovery on Imitation Learning. CoRR abs/2402.17511 (2024) - [i408]Boyu Chen, Siran Chen, Kunchang Li, Qinglin Xu, Yu Qiao, Yali Wang:
Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition. CoRR abs/2402.18951 (2024) - [i407]Jiantao Qiu, Haijun Lv, Zhenjiang Jin, Rui Wang, Wenchang Ning, Jia Yu, ChaoBin Zhang, Zhenxiang Li, Pei Chu, Yuan Qu, Jin Shi, Lindong Lu, Runyu Peng, Zhiyuan Zeng, Huanze Tang, Zhikai Lei, Jiawei Hong, Keyu Chen, Zhaoye Fei, Ruiliang Xu, Wei Li, Zhongying Tu, Dahua Lin, Yu Qiao, Hang Yan, Conghui He:
WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset. CoRR abs/2402.19282 (2024) - [i406]Chen Qian, Jie Zhang, Wei Yao, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, Jing Shao:
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models. CoRR abs/2402.19465 (2024) - [i405]Weiyun Wang, Yiming Ren, Haowen Luo, Tiantong Li, Chenxiang Yan, Zhe Chen, Wenhai Wang, Qingyun Li, Lewei Lu, Xizhou Zhu, Yu Qiao, Jifeng Dai:
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World. CoRR abs/2402.19474 (2024) - [i404]Zishi Li, Xiaoxuan Ma, Qiuyan Shang, Wentao Zhu, Hai Ci, Yu Qiao, Yizhou Wang:
Efficient Action Counting with Dynamic Queries. CoRR abs/2403.01543 (2024) - [i403]Yue Yang, Yuqi Lin, Hong Liu, Wenqi Shao, Runjian Chen, Hailong Shang, Yu Wang, Yu Qiao, Kaipeng Zhang, Ping Luo:
Towards Implicit Prompt For Text-To-Image Models. CoRR abs/2403.02118 (2024) - [i402]Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang:
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures. CoRR abs/2403.02308 (2024) - [i401]Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li:
Embodied Understanding of Driving Scenarios. CoRR abs/2403.04593 (2024) - [i400]Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, Yu Qiao:
VideoMamba: State Space Model for Efficient Video Understanding. CoRR abs/2403.06977 (2024) - [i399]Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Yu Qiao, Wai Lam, Lizhuang Ma:
Exploring Safety Generalization Challenges of Large Language Models via Code. CoRR abs/2403.07865 (2024) - [i398]Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Kaipeng Zhang:
AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions. CoRR abs/2403.09346 (2024) - [i397]Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li:
Generalized Predictive Model for Autonomous Driving. CoRR abs/2403.09630 (2024) - [i396]Enshen Zhou, Yiran Qin, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao:
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control. CoRR abs/2403.12037 (2024) - [i395]Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang:
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding. CoRR abs/2403.15377 (2024) - [i394]Yifei Huang, Guo Chen, Jilan Xu, Mingfang Zhang, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, Yu Qiao:
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World. CoRR abs/2403.16182 (2024) - [i393]Zhelun Shi, Zhipin Wang, Hongxing Fan, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao:
Assessment of Multimodal Large Language Models in Alignment with Human Values. CoRR abs/2403.17830 (2024) - [i392]Yutong Chen, Yifan Zhan, Zhihang Zhong, Wei Wang, Xiao Sun, Yu Qiao, Yinqiang Zheng:
Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence. CoRR abs/2403.19160 (2024) - [i391]Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Haoshu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng:
RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents. CoRR abs/2403.19622 (2024) - [i390]Shuo Liu, Kaining Ying, Hao Zhang, Yue Yang, Yuqi Lin, Tianle Zhang, Chuanhao Li, Yu Qiao, Ping Luo, Wenqi Shao, Kaipeng Zhang:
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models. CoRR abs/2403.20194 (2024) - [i389]Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, Feng Zhao:
Are We on the Right Way for Evaluating Large Vision-Language Models? CoRR abs/2403.20330 (2024) - [i388]Bo Zou, Chao Yang, Yu Qiao, Chengbin Quan, Youjian Zhao:
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction. CoRR abs/2404.00913 (2024) - [i387]Bo Zou, Chao Yang, Yu Qiao, Chengbin Quan, Youjian Zhao:
VideoDistill: Language-aware Vision Distillation for Video Question Answering. CoRR abs/2404.00973 (2024) - [i386]Lirui Zhao, Yue Yang, Kaipeng Zhang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji:
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model. CoRR abs/2404.01342 (2024) - [i385]Hao Wu, Huabin Liu, Yu Qiao, Xiao Sun:
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement. CoRR abs/2404.02755 (2024) - [i384]Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong:
Linear Attention Sequence Parallelism. CoRR abs/2404.02882 (2024) - [i383]Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang:
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD. CoRR abs/2404.06512 (2024) - [i382]Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao:
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI. CoRR abs/2404.16006 (2024) - [i381]Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang:
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites. CoRR abs/2404.16821 (2024) - [i380]Ziyan Chen, Jingwen He, Xinqi Lin, Yu Qiao, Chao Dong:
Towards Real-world Video Face Restoration: A New Benchmark. CoRR abs/2404.19500 (2024) - [i379]Sirui Chen, Bo Peng, Meiqi Chen, Ruiqi Wang, Mengying Xu, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Yu Qiao, Chaochao Lu:
Causal Evaluation of Language Models. CoRR abs/2405.00622 (2024) - [i378]Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li:
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers. CoRR abs/2405.05945 (2024) - [i377]Chuanhao Li, Zhen Li, Chenchen Jing, Shuo Liu, Wenqi Shao, Yuwei Wu, Ping Luo, Yu Qiao, Kaipeng Zhang:
UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge. CoRR abs/2405.14554 (2024) - [i376]Chongjie Si, Xuehui Wang, Xue Yang, Zhengqin Xu, Qingyun Li, Jifeng Dai, Yu Qiao, Xiaokang Yang, Wei Shen:
FLoRA: Low-Rank Core Space for N-dimension. CoRR abs/2405.14739 (2024) - [i375]Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao:
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving. CoRR abs/2405.15324 (2024) - [i374]Zhanhui Zhou, Zhixuan Liu, Jie Liu, Zhichen Dong, Chao Yang, Yu Qiao:
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models. CoRR abs/2405.19262 (2024) - [i373]Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu, Yunhong Wang, Yu Qiao:
4Diffusion: Multi-view Video Diffusion Model for 4D Generation. CoRR abs/2405.20674 (2024) - [i372]Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li:
Learning Manipulation by Predicting Interaction. CoRR abs/2406.00439 (2024) - [i371]Hao Wen, Zehuan Huang, Yaohui Wang, Xinyuan Chen, Yu Qiao, Lu Sheng:
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion. CoRR abs/2406.03184 (2024) - [i370]Lin Chen, Xilin Wei, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang:
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions. CoRR abs/2406.04325 (2024) - [i369]Xizhou Zhu, Xue Yang, Zhaokai Wang, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai:
Parameter-Inverted Image Pyramid Networks. CoRR abs/2406.04330 (2024) - [i368]Chenxin Tao, Xizhou Zhu, Shiqian Su, Lewei Lu, Changyao Tian, Xuan Luo, Gao Huang, Hongsheng Li, Yu Qiao, Jie Zhou, Jifeng Dai:
Learning 1D Causal Visual Representation with De-focus Attention Networks. CoRR abs/2406.04342 (2024) - [i367]Weiyun Wang, Shuibo Zhang, Yiming Ren, Yuchen Duan, Tiantong Li, Shuo Liu, Mengkang Hu, Zhe Chen, Kaipeng Zhang, Lewei Lu, Xizhou Zhu, Ping Luo, Yu Qiao, Jifeng Dai, Wenqi Shao, Wenhai Wang:
Needle In A Multimodal Haystack. CoRR abs/2406.07230 (2024) - [i366]Chenyu Yang, Xizhou Zhu, Jinguo Zhu, Weijie Su, Junjie Wang, Xuan Dong, Wenhai Wang, Lewei Lu, Bin Li, Jie Zhou, Yu Qiao, Jifeng Dai:
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning. CoRR abs/2406.07543 (2024) - [i365]Jiannan Wu, Muyan Zhong, Sen Xing, Zeqiang Lai, Zhaoyang Liu, Wenhai Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Ping Luo, Yu Qiao, Jifeng Dai:
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks. CoRR abs/2406.08394 (2024) - [i364]Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, Min Dou, Changyao Tian, Xizhou Zhu, Lewei Lu, Yushi Chen, Junjun He, Zhongying Tu, Tong Lu, Yali Wang, Limin Wang, Dahua Lin, Yu Qiao, Botian Shi, Conghui He, Jifeng Dai:
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text. CoRR abs/2406.08418 (2024) - [i363]Quanfeng Lu, Wenqi Shao, Zitao Liu, Fanqing Meng, Boxuan Li, Botong Chen, Siyuan Huang, Kaipeng Zhang, Yu Qiao, Ping Luo:
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices. CoRR abs/2406.08451 (2024) - [i362]Tianle Zhang, Langtian Ma, Yuchen Yan, Yuchen Zhang, Kai Wang, Yue Yang, Ziyao Guo, Wenqi Shao, Yang You, Yu Qiao, Ping Luo, Kaipeng Zhang:
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality. CoRR abs/2406.08845 (2024) - [i361]Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang, Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye, Shiyang Feng, Bin Wang, Chao Xu, Conghui He, Pinlong Cai, Min Dou, Botian Shi, Sheng Zhou, Yongwei Wang, Bin Wang, Junchi Yan, Fei Wu, Yu Qiao:
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models. CoRR abs/2406.11633 (2024) - [i360]Fangzhi Xu, Qiushi Sun, Kanzhi Cheng, Jun Liu, Yu Qiao, Zhiyong Wu:
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models. CoRR abs/2406.11736 (2024) - [i359]Fanqing Meng, Wenqi Shao, Lixin Luo, Yahong Wang, Yiran Chen, Quanfeng Lu, Yue Yang, Tianshuo Yang, Kaipeng Zhang, Yu Qiao, Ping Luo:
PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models. CoRR abs/2406.11802 (2024) - [i358]Ziyu Liu, Tao Chu, Yuhang Zang, Xilin Wei, Xiaoyi Dong, Pan Zhang, Zijian Liang, Yuanjun Xiong, Yu Qiao, Dahua Lin, Jiaqi Wang:
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs. CoRR abs/2406.11833 (2024) - [i357]Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao:
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model. CoRR abs/2406.12030 (2024) - [i356]Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, Pengfei Liu:
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI. CoRR abs/2406.12753 (2024) - [i355]Baoqi Pei, Guo Chen, Jilan Xu, Yuping He, Yicheng Liu, Kanghua Pan, Yifei Huang, Yali Wang, Tong Lu, Limin Wang, Yu Qiao:
EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation. CoRR abs/2406.18070 (2024) - [i354]Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao:
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT. CoRR abs/2406.18583 (2024) - [i353]Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang:
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output. CoRR abs/2407.03320 (2024) - [i352]Jingwen He, Tianfan Xue, Dongyang Liu, Xinqi Lin, Peng Gao, Dahua Lin, Yu Qiao, Wanli Ouyang, Ziwei Liu:
VEnhancer: Generative Space-Time Enhancement for Video Generation. CoRR abs/2407.07667 (2024) - [i351]Wenshuo Peng, Kaipeng Zhang, Yue Yang, Hao Zhang, Yu Qiao:
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification. CoRR abs/2407.08787 (2024) - [i350]Hanqing Wang, Jiahe Chen, Wensi Huang, Qingwei Ben, Tai Wang, Boyu Mi, Tao Huang, Siheng Zhao, Yilun Chen, Sizhe Yang, Peizhou Cao, Wenye Yu, Zichao Ye, Jialun Li, Junfeng Long, Zirui Wang, Huiling Wang, Ying Zhao, Zhongying Tu, Yu Qiao, Dahua Lin, Jiangmiao Pang:
GRUtopia: Dream General Robots in a City at Scale. CoRR abs/2407.10943 (2024) - [i349]Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo:
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models. CoRR abs/2407.11062 (2024) - [i348]Xuhong Wang, Haoyu Jiang, Yi Yu, Jingru Yu, Yilun Lin, Ping Yi, Yingchun Wang, Yu Qiao, Li Li, Fei-Yue Wang:
Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond. CoRR abs/2407.11100 (2024) - [i347]Yi Yu, Jingru Yu, Xuhong Wang, Juanjuan Li, Yilun Lin, Conghui He, Yanqing Yang, Yu Qiao, Li Li, Fei-Yue Wang:
Navigating the Data Trading Crossroads: An Interdisciplinary Survey. CoRR abs/2407.11466 (2024) - [i346]Shuo Cao, Yihao Liu, Wenlong Zhang, Yu Qiao, Chao Dong:
GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity. CoRR abs/2407.12273 (2024) - [i345]Jie Zhang, Dongrui Liu, Chen Qian, Ziyue Gan, Yong Liu, Yu Qiao, Jing Shao:
The Better Angels of Machine Personality: How Personality Relates to LLM Safety. CoRR abs/2407.12344 (2024) - [i344]Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao:
ViLLa: Video Reasoning Segmentation with Large Language Model. CoRR abs/2407.14500 (2024) - [i343]Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Yuan-Fang Li, Cunjian Chen, Yu Qiao:
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models. CoRR abs/2407.15642 (2024) - [i342]Yangzhou Liu, Yue Cao, Zhangwei Gao, Weiyun Wang, Zhe Chen, Wenhai Wang, Hao Tian, Lewei Lu, Xizhou Zhu, Tong Lu, Yu Qiao, Jifeng Dai:
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity. CoRR abs/2407.15838 (2024) - [i341]Jingru Yu, Yi Yu, Xuhong Wang, Yilun Lin, Manzhi Yang, Yu Qiao, Fei-Yue Wang:
The Shadow of Fraud: The Emerging Danger of AI-powered Social Engineering and its Possible Cure. CoRR abs/2407.15912 (2024) - [i340]Lirui Zhao, Tianshuo Yang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Kaipeng Zhang, Rongrong Ji:
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model. CoRR abs/2407.16982 (2024) - [i339]Xuemeng Yang, Licheng Wen, Yukai Ma, Jianbiao Mei, Xin Li, Tiantian Wei, Wenjie Lei, Daocheng Fu, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao:
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving. CoRR abs/2408.00415 (2024) - [i338]Dongyang Liu, Shitian Zhao, Le Zhuo, Weifeng Lin, Yu Qiao, Hongsheng Li, Peng Gao:
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining. CoRR abs/2408.02657 (2024) - [i337]Fanqing Meng, Jin Wang, Chuanhao Li, Quanfeng Lu, Hao Tian, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao:
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models. CoRR abs/2408.02718 (2024) - [i336]Zihan Li, Diping Song, Zefeng Yang, Deming Wang, Fei Li, Xiulan Zhang, Paul E. Kinahan, Yu Qiao:
VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge. CoRR abs/2408.02865 (2024) - [i335]Pengcheng Chen, Jin Ye, Guoan Wang, Yanjun Li, Zhongying Deng, Wei Li, Tianbin Li, Haodong Duan, Ziyan Huang, Yanzhou Su, Benyou Wang, Shaoting Zhang, Bin Fu, Jianfei Cai, Bohan Zhuang, Eric J. Seibel, Junjun He, Yu Qiao:
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI. CoRR abs/2408.03361 (2024) - [i334]Xiangyu Chen, Yihao Liu, Yuandong Pu, Wenlong Zhang, Jiantao Zhou, Yu Qiao, Chao Dong:
Learning A Low-Level Vision Generalist via Visual Task Prompt. CoRR abs/2408.08601 (2024) - [i333]Yanbo Ding, Shaobin Zhuang, Kunchang Li, Zhengrong Yue, Yu Qiao, Yali Wang:
MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration. CoRR abs/2408.10605 (2024) - [i332]Xiangtao Kong, Jinjin Gu, Yihao Liu, Wenlong Zhang, Xiangyu Chen, Yu Qiao, Chao Dong:
A Preliminary Exploration Towards General Image Restoration. CoRR abs/2408.15143 (2024) - [i331]Junyi Chen, Weicai Ye, Yifan Wang, Danpeng Chen, Di Huang, Wanli Ouyang, Guofeng Zhang, Yu Qiao, Tong He:
GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction. CoRR abs/2409.06685 (2024) - [i330]Weifeng Lin, Xinyu Wei, Renrui Zhang, Le Zhuo, Shitian Zhao, Siyuan Huang, Junlin Xi, Yu Qiao, Peng Gao, Hongsheng Li:
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions. CoRR abs/2409.15278 (2024) - [i329]Fuxian Huang, Qi Zhang, Shaopeng Zhai, Jie Wang, Tianyi Zhang, Haoran Zhang, Ming Zhou, Yu Liu, Yu Qiao:
CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation. CoRR abs/2409.15806 (2024) - [i328]Zhixuan Liu, Zhanhui Zhou, Yuanfu Wang, Chao Yang, Yu Qiao:
Inference-Time Language Model Alignment via Integrated Value Guidance. CoRR abs/2409.17819 (2024) - [i327]Bin Wang, Chao Xu, Xiaomeng Zhao, Linke Ouyang, Fan Wu, Zhiyuan Zhao, Rui Xu, Kaiwen Liu, Yuan Qu, Fukai Shang, Bo Zhang, Liqun Wei, Zhihao Sui, Wei Li, Botian Shi, Yu Qiao, Dahua Lin, Conghui He:
MinerU: An Open-Source Solution for Precise Document Content Extraction. CoRR abs/2409.18839 (2024) - [i326]Fanqing Meng, Jiaqi Liao, Xinyu Tan, Wenqi Shao, Quanfeng Lu, Kaipeng Zhang, Yu Cheng, Dianqi Li, Yu Qiao, Ping Luo:
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation. CoRR abs/2410.05363 (2024) - [i325]Qingwen Bu, Hongyang Li, Li Chen, Jisong Cai, Jia Zeng, Heming Cui, Maoqing Yao, Yu Qiao:
Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation. CoRR abs/2410.08001 (2024) - [i324]Yifan Zhan, Qingtian Zhu, Muyao Niu, Mingze Ma, Jiancheng Zhao, Zhihang Zhong, Xiao Sun, Yu Qiao, Yinqiang Zheng:
ToMiE: Towards Modular Growth in Enhanced SMPL Skeleton for 3D Human with Animatable Garments. CoRR abs/2410.08082 (2024) - [i323]Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jifeng Dai, Yu Qiao, Xizhou Zhu:
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training. CoRR abs/2410.08202 (2024) - [i322]Qibing Ren, Hao Li, Dongrui Liu, Zhanxu Xie, Xiaoya Lu, Yu Qiao, Lei Sha, Junchi Yan, Lizhuang Ma, Jing Shao:
Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues. CoRR abs/2410.10700 (2024) - [i321]Ying Chen, Guoan Wang, Yuanfeng Ji, Yanjun Li, Jin Ye, Tianbin Li, Bin Zhang, Nana Pei, Rongshan Yu, Yu Qiao, Junjun He:
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding. CoRR abs/2410.11761 (2024) - [i320]Yiwei Guo, Shaobin Zhuang, Kunchang Li, Yu Qiao, Yali Wang:
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration. CoRR abs/2410.12183 (2024) - [i319]Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, Jing Shao:
REEF: Representation Encoding Fingerprints for Large Language Models. CoRR abs/2410.14273 (2024) - [i318]Zhi Hou, Tianyi Zhang, Yuwen Xiong, Hengjun Pu, Chengyang Zhao, Ronglei Tong, Yu Qiao, Jifeng Dai, Yuntao Chen:
Diffusion Transformer Policy. CoRR abs/2410.15959 (2024) - [i317]Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang:
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance. CoRR abs/2410.16261 (2024) - [i316]Kaiwen Zhu, Jinjin Gu, Zhiyuan You, Yu Qiao, Chao Dong:
An Intelligent Agentic System for Complex Image Restoration Problems. CoRR abs/2410.17809 (2024) - [i315]Hengwei Bian, Lingdong Kong, Haozhe Xie, Liang Pan, Yu Qiao, Ziwei Liu:
DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes. CoRR abs/2410.18084 (2024) - [i314]Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong:
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality. CoRR abs/2410.19355 (2024) - [i313]Yu Qiao, Lina Gong, Yu Zhao, Yongwei Wang, Mingqiang Wei:
DeMuVGN: Effective Software Defect Prediction Model by Learning Multi-view Software Dependency via Graph Neural Networks. CoRR abs/2410.19550 (2024) - [i312]Xiangyu Zeng, Kunchang Li, Chenting Wang, Xinhao Li, Tianxiang Jiang, Ziang Yan, Songze Li, Yansong Shi, Zhengrong Yue, Yi Wang, Yali Wang, Yu Qiao, Limin Wang:
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning. CoRR abs/2410.19702 (2024) - [i311]Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao:
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents. CoRR abs/2410.23218 (2024) - [i310]Tao Ma, Hongbin Zhou, Qiusheng Huang, Xuemeng Yang, Jianfei Guo, Bo Zhang, Min Dou, Yu Qiao, Botian Shi, Hongsheng Li:
ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving. CoRR abs/2411.05311 (2024) - [i309]Weiyun Wang, Zhe Chen, Wenhai Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Jinguo Zhu, Xizhou Zhu, Lewei Lu, Yu Qiao, Jifeng Dai:
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization. CoRR abs/2411.10442 (2024) - [i308]Ziyi Yang, Zaibin Zhang, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Jinsong Chen, Martz Ma, Bowen Dong, Prateek Gupta, Shuyue Hu, Zhenfei Yin, Guohao Li, Xu Jia, Lijun Wang, Bernard Ghanem, Huchuan Lu, Chaochao Lu, Wanli Ouyang, Yu Qiao, Philip Torr, Jing Shao:
OASIS: Open Agent Social Interaction Simulations with One Million Agents. CoRR abs/2411.11581 (2024) - [i307]Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, Yaohui Wang, Xinyuan Chen, Ying-Cong Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu:
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models. CoRR abs/2411.13503 (2024) - [i306]Tianbin Li, Yanzhou Su, Wei Li, Bin Fu, Zhe Chen, Ziyan Huang, Guoan Wang, Chenglong Ma, Ying Chen, Ming Hu, Yanjun Li, Pengcheng Chen, Xiaowei Hu, Zhongying Deng, Yuanfeng Ji, Jin Ye, Yu Qiao, Junjun He:
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI. CoRR abs/2411.14522 (2024) - [i305]Pengfei Zhou, Xiaopeng Peng, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Kaipeng Zhang:
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation. CoRR abs/2411.18499 (2024) - [i304]Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao:
SyncVIS: Synchronized Video Instance Segmentation. CoRR abs/2412.00882 (2024) - [i303]Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang:
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling. CoRR abs/2412.05271 (2024) - [i302]Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, Limin Wang:
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel. CoRR abs/2412.08467 (2024) - [i301]Pan Zhang, Xiaoyi Dong, Yuhang Cao, Yuhang Zang, Rui Qian, Xilin Wei, Lin Chen, Yifei Li, Junbo Niu, Shuangrui Ding, Qipeng Guo, Haodong Duan, Xin Chen, Han Lv, Zheng Nie, Min Zhang, Bin Wang, Wenwei Zhang, Xinyue Zhang, Jiaye Ge, Wei Li, Jingwen Li, Zhongying Tu, Conghui He, Xingcheng Zhang, Kai Chen, Yu Qiao, Dahua Lin, Jiaqi Wang:
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions. CoRR abs/2412.09596 (2024) - [i300]Fan Zhang, Shulin Tian, Ziqi Huang, Yu Qiao, Ziwei Liu:
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models. CoRR abs/2412.09645 (2024) - [i299]Chenxin Tao, Shiqian Su, Xizhou Zhu, Chenyu Zhang, Zhe Chen, Jiawen Liu, Wenhai Wang, Lewei Lu, Gao Huang, Yu Qiao, Jifeng Dai:
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding. CoRR abs/2412.16158 (2024) - [i298]Ziang Yan, Zhilin Li, Yinan He, Chenting Wang, Kunchang Li, Xinhao Li, Xiangyu Zeng, Zilei Wang, Yali Wang, Yu Qiao, Limin Wang, Yi Wang:
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment. CoRR abs/2412.19326 (2024) - [i297]Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu:
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis. CoRR abs/2412.19723 (2024) - [i296]Yifei Huang, Jilan Xu, Baoqi Pei, Yuping He, Guo Chen, Lijin Yang, Xinyuan Chen, Yaohui Wang, Zheng Nie, Jinyao Liu, Guoshun Fan, Dechen Lin, Fang Fang, Kunpeng Li, Chang Yuan, Yali Wang, Yu Qiao, Limin Wang:
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model. CoRR abs/2412.21080 (2024) - 2023
- [j92]Ruyun Hu, Lihao Fu, Yongcan Chen, Junyu Chen, Yu Qiao, Tong Si:
Protein engineering via Bayesian optimization-guided evolutionary algorithm and robotic experiments. Briefings Bioinform. 24(1) (2023) - [j91]Mingye Xu, Zhipeng Zhou, Yali Wang, Yu Qiao:
Towards robustness and generalization of point cloud representation: A geometry coding method and a large-scale object-level dataset. Comput. Vis. Media 10(1): 27-43 (2023) - [j90]Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, Chen Change Loy:
Domain Generalization: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 45(4): 4396-4415 (2023) - [j89]Anran Liu, Yihao Liu, Jinjin Gu, Yu Qiao, Chao Dong:
Blind Image Super-Resolution: A Survey and Beyond. IEEE Trans. Pattern Anal. Mach. Intell. 45(5): 5461-5480 (2023) - [j88]Mingye Xu, Yali Wang, Yihao Liu, Tong He, Yu Qiao:
CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm. IEEE Trans. Pattern Anal. Mach. Intell. 45(8): 9583-9594 (2023) - [j87]Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao:
UniFormer: Unifying Convolution and Self-Attention for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 45(10): 12581-12600 (2023) - [j86]Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, Chao Dong:
Evaluating the Generalization Ability of Super-Resolution Networks. IEEE Trans. Pattern Anal. Mach. Intell. 45(12): 14497-14513 (2023) - [j85]Weicong Su, Yali Wang, Kunchang Li, Peng Gao, Yu Qiao:
Hybrid token transformer for deep face recognition. Pattern Recognit. 139: 109443 (2023) - [j84]Shihua Li, Haobin Chen, Shijie Yu, Zhiqun He, Feng Zhu, Rui Zhao, Jie Chen, Yu Qiao:
COCAS+: Large-Scale Clothes-Changing Person Re-Identification With Clothes Templates. IEEE Trans. Circuits Syst. Video Technol. 33(4): 1839-1853 (2023) - [j83]Ming Li, Bin Fu, Zhengfu Zhang, Yu Qiao:
Character-Aware Sampling and Rectification for Scene Text Recognition. IEEE Trans. Multim. 25: 649-661 (2023) - [j82]Shixiang Wu, Chao Dong, Yu Qiao:
Blind Image Restoration Based on Cycle-Consistent Network. IEEE Trans. Multim. 25: 1111-1124 (2023) - [j81]Ming Li, Bin Fu, Han Chen, Junjun He, Yu Qiao:
Dual Relation Network for Scene Text Recognition. IEEE Trans. Multim. 25: 4094-4107 (2023) - [j80]Yihao Liu, Jingwen He, Xiangyu Chen, Zhengwen Zhang, Hengyuan Zhao, Chao Dong, Yu Qiao:
Very Lightweight Photo Retouching Network With Conditional Sequential Modulation. IEEE Trans. Multim. 25: 4638-4652 (2023) - [j79]Qitong Wang, Bin Fu, Ming Li, Junjun He, Xi Peng, Yu Qiao:
Region-Aware Arbitrary-Shaped Text Detection With Progressive Fusion. IEEE Trans. Multim. 25: 4718-4729 (2023) - [j78]Yu Qiao, Yuhao Liu, Ziqi Wei, Yuxin Wang, Qiang Cai, Guofeng Zhang, Xin Yang:
Hierarchical and Progressive Image Matting. ACM Trans. Multim. Comput. Commun. Appl. 19(2): 52:1-52:23 (2023) - [j77]Shidong Wang, Wei Zeng, Xi Chen, Yu Ye, Yu Qiao, Chi-Wing Fu:
ActFloor-GAN: Activity-Guided Adversarial Networks for Human-Centric Floorplan Design. IEEE Trans. Vis. Comput. Graph. 29(3): 1610-1624 (2023) - [c277]Zhenyu Wu, Yaoxiang Wang, Jiacheng Ye, Zhiyong Wu, Jiangtao Feng, Jingjing Xu, Yu Qiao:
OpenICL: An Open-Source Framework for In-context Learning. ACL (demo) 2023: 489-498 - [c276]Jia Zeng, Li Chen, Hanming Deng, Lewei Lu, Junchi Yan, Yu Qiao, Hongyang Li:
Distilling Focal Knowledge from Imperfect Expert for 3D Object Detection. CVPR 2023: 992-1001 - [c275]Chenxin Tao, Xizhou Zhu, Weijie Su, Gao Huang, Bin Li, Jie Zhou, Yu Qiao, Xiaogang Wang, Jifeng Dai:
Siamese Image Modeling for Self-Supervised Vision Representation Learning. CVPR 2023: 2132-2141 - [c274]Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai:
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks. CVPR 2023: 2691-2700 - [c273]Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Yi Wang, Yu Qiao, Weidi Xie:
Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision. CVPR 2023: 2935-2944 - [c272]Mingye Xu, Mutian Xu, Tong He, Wanli Ouyang, Yali Wang, Xiaoguang Han, Yu Qiao:
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency. CVPR 2023: 4380-4390 - [c271]Runnan Chen, Youquan Liu, Lingdong Kong, Xinge Zhu, Yuexin Ma, Yikang Li, Yuenan Hou, Yu Qiao, Wenping Wang:
CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP. CVPR 2023: 7020-7030 - [c270]Bo Zhang, Jiakang Yuan, Botian Shi, Tao Chen, Yikang Li, Yu Qiao:
Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection. CVPR 2023: 9253-9262 - [c269]Xuyang Shen, Dong Li, Jinxing Zhou, Zhen Qin, Bowen He, Xiaodong Han, Aixuan Li, Yuchao Dai, Lingpeng Kong, Meng Wang, Yu Qiao, Yiran Zhong:
Fine-grained Audible Video Description. CVPR 2023: 10585-10596 - [c268]Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao:
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions. CVPR 2023: 14408-14419 - [c267]Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, Yu Qiao:
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking. CVPR 2023: 14549-14560 - [c266]Renrui Zhang, Xiangfei Hu, Bohao Li, Siyuan Huang, Hanqiu Deng, Yu Qiao, Peng Gao, Hongsheng Li:
Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners. CVPR 2023: 15211-15222 - [c265]Jiakang Yuan, Bo Zhang, Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, Yu Qiao:
Bi3D: Bi-Domain Active Learning for Cross-Domain 3D Object Detection. CVPR 2023: 15599-15608 - [c264]Weijie Su, Xizhou Zhu, Chenxin Tao, Lewei Lu, Bin Li, Gao Huang, Yu Qiao, Xiaogang Wang, Jie Zhou, Jifeng Dai:
Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information. CVPR 2023: 15888-15899 - [c263]Xin Li, Tao Ma, Yuenan Hou, Botian Shi, Yuchen Yang, Youquan Liu, Xingjiao Wu, Qin Chen, Yikang Li, Yu Qiao, Liang He:
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion. CVPR 2023: 17524-17534 - [c262]Zhaoyang Xia, Youquan Liu, Xin Li, Xinge Zhu, Yuexin Ma, Yikang Li, Yuenan Hou, Yu Qiao:
SCPNet: Semantic Scene Completion on Point Cloud. CVPR 2023: 17642-17651 - [c261]Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, Zhaoxiang Zhang, Gao Huang, Hongyang Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai:
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision. CVPR 2023: 17830-17839 - [c260]Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li:
Planning-oriented Autonomous Driving. CVPR 2023: 17853-17862 - [c259]Jiaqi Xu, Xiaowei Hu, Lei Zhu, Qi Dou, Jifeng Dai, Yu Qiao, Pheng-Ann Heng:
Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior. CVPR 2023: 18053-18062 - [c258]Yurui Zhu, Tianyu Wang, Xueyang Fu, Xuanyu Yang, Xin Guo, Jifeng Dai, Yu Qiao, Xiaowei Hu:
Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions. CVPR 2023: 21747-21758 - [c257]Renrui Zhang, Liuhui Wang, Yu Qiao, Peng Gao, Hongsheng Li:
Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders. CVPR 2023: 21769-21780 - [c256]Xiangyu Chen, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong:
Activating More Pixels in Image Super-Resolution Transformer. CVPR 2023: 22367-22377 - [c255]Bin Fu, Junjun He, Jianjun Wang, Yu Qiao:
Neural Transformation Fields for Arbitrary-Styled Font Generation. CVPR 2023: 22438-22447 - [c254]Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu Qiao, Yu-Gang Jiang:
ResFormer: Scaling ViTs with Multi-Resolution Training. CVPR 2023: 22721-22731 - [c253]Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo:
Stare at What You See: Masked Image Modeling without Reconstruction. CVPR 2023: 22732-22741 - [c252]Yihao Liu, Jingwen He, Jinjin Gu, Xiangtao Kong, Yu Qiao, Chao Dong:
DegAE: A New Pretraining Paradigm for Low-Level Vision. CVPR 2023: 23292-23303 - [c251]Lingdong Kong, Youquan Liu, Runnan Chen, Yuexin Ma, Xinge Zhu, Yikang Li, Yuenan Hou, Yu Qiao, Ziwei Liu:
Rethinking Range View Representation for LiDAR Segmentation. ICCV 2023: 228-240 - [c250]Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Limin Wang, Yu Qiao:
UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding. ICCV 2023: 1632-1643 - [c249]Tao Ma, Xuemeng Yang, Hongbin Zhou, Xin Li, Botian Shi, Junjie Liu, Yuchen Yang, Zhizheng Liu, Liang He, Yu Qiao, Yikang Li, Hongsheng Li:
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds. ICCV 2023: 6713-6724 - [c248]Renrui Zhang, Han Qiu, Tai Wang, Ziyu Guo, Ziteng Cui, Yu Qiao, Hongsheng Li, Peng Gao:
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection. ICCV 2023: 9121-9132 - [c247]Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao:
Scaling Data Generation in Vision-and-Language Navigation. ICCV 2023: 11975-11986 - [c246]Mingfei Han, Yali Wang, Zhihui Li, Lina Yao, Xiaojun Chang, Yu Qiao:
HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation. ICCV 2023: 13368-13377 - [c245]Bingkun Huang, Zhiyu Zhao, Guozhen Zhang, Yu Qiao, Limin Wang:
MGMAE: Motion Guided Masking for Video Masked Autoencoding. ICCV 2023: 13447-13458 - [c244]Lihe Yang, Zhen Zhao, Lei Qi, Yu Qiao, Yinghuan Shi, Hengshuang Zhao:
Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning. ICCV 2023: 16141-16150 - [c243]Mengzhao Chen, Wenqi Shao, Peng Xu, Mingbao Lin, Kaipeng Zhang, Fei Chao, Rongrong Ji, Yu Qiao, Ping Luo:
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers. ICCV 2023: 17118-17128 - [c242]Kunchang Li, Yali Wang, Yizhuo Li, Yi Wang, Yinan He, Limin Wang, Yu Qiao:
Unmasked Teacher: Towards Training-Efficient Video Foundation Models. ICCV 2023: 19891-19903 - [c241]Youquan Liu, Runnan Chen, Xin Li, Lingdong Kong, Yuchen Yang, Zhaoyang Xia, Yeqi Bai, Xinge Zhu, Yuexin Ma, Yikang Li, Yu Qiao, Yuenan Hou:
UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase. ICCV 2023: 21605-21616 - [c240]Yu Qiao, Bo Dong, Ao Jin, Yu Fu, Seung-Hwan Baek, Felix Heide, Pieter Peers, Xiaopeng Wei, Xin Yang:
Multi-view Spectral Polarization Propagation for Video Glass Segmentation. ICCV 2023: 23161-23171 - [c239]Junting Pan, Ziyi Lin, Yuying Ge, Xiatian Zhu, Renrui Zhang, Yi Wang, Yu Qiao, Hongsheng Li:
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models. ICCV (Workshops) 2023: 272-283 - [c238]Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao:
Vision Transformer Adapter for Dense Predictions. ICLR 2023 - [c237]Runjian Chen, Yao Mu, Runsen Xu, Wenqi Shao, Chenhan Jiang, Hang Xu, Yu Qiao, Zhenguo Li, Ping Luo:
CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving. ICLR 2023 - [c236]Penghao Wu, Li Chen, Hongyang Li, Xiaosong Jia, Junchi Yan, Yu Qiao:
Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling. ICLR 2023 - [c235]Jiashuo Yu, Yaohui Wang, Xinyuan Chen, Xiao Sun, Yu Qiao:
Long-Term Rhythmic Video Soundtracker. ICML 2023: 40339-40353 - [c234]Yu Qiao, Hengyi Zhang, Pengfei Sun, Yuan Tian, Yong Guan, Zhenzhou Shao, Zhiping Shi:
Parallelizable Simple Recurrent Units with Hierarchical Memory. ICONIP (15) 2023: 380-392 - [c233]Licheng Wen, Daocheng Fu, Song Mao, Pinlong Cai, Min Dou, Yikang Li, Yu Qiao:
LimSim: A Long-Term Interactive Multi-Scenario Traffic Simulator. ITSC 2023: 1255-1262 - [c232]Yunkun Zhang, Jin Gao, Mu Zhou, Xiaosong Wang, Yu Qiao, Shaoting Zhang, Dequan Wang:
Text-Guided Foundation Model Adaptation for Pathological Image Classification. MICCAI (5) 2023: 272-282 - [c231]Hongjie Zhang, Yi Liu, Yali Wang, Limin Wang, Yu Qiao:
Learning Discriminative Feature Representation for Open Set Action Recognition. ACM Multimedia 2023: 7696-7705 - [c230]Jinjin Gu, Xianzheng Ma, Xiangtao Kong, Yu Qiao, Chao Dong:
Networks are Slacking Off: Understanding Generalization Problem in Image Deraining. NeurIPS 2023 - [c229]Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li:
Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection. NeurIPS 2023 - [c228]Fanqing Meng, Wenqi Shao, Zhanglin Peng, Chonghe Jiang, Kaipeng Zhang, Yu Qiao, Ping Luo:
Foundation Model is Efficient Multimodal Multitask Model Selector. NeurIPS 2023 - [c227]Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo:
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought. NeurIPS 2023 - [c226]Keqiang Sun, Junting Pan, Yuying Ge, Hao Li, Haodong Duan, Xiaoshi Wu, Renrui Zhang, Aojun Zhou, Zipeng Qin, Yi Wang, Jifeng Dai, Yu Qiao, Limin Wang, Hongsheng Li:
JourneyDB: A Benchmark for Generative Image Understanding. NeurIPS 2023 - [c225]Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai:
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks. NeurIPS 2023 - [c224]Jiakang Yuan, Bo Zhang, Xiangchao Yan, Botian Shi, Tao Chen, Yikang Li, Yu Qiao:
AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset. NeurIPS 2023 - [c223]Wenlong Zhang, Xiaohui Li, Guangyuan Shi, Xiangyu Chen, Yu Qiao, Xiaoyun Zhang, Xiao-Ming Wu, Chao Dong:
Real-World Image Super-Resolution as Multi-Task Learning. NeurIPS 2023 - [c222]Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao:
TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation. NeurIPS 2023 - [i295]Penghao Wu, Li Chen, Hongyang Li, Xiaosong Jia, Junchi Yan, Yu Qiao:
Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling. CoRR abs/2301.01006 (2023) - [i294]Runnan Chen, Youquan Liu, Lingdong Kong, Xinge Zhu, Yuexin Ma, Yikang Li, Yuenan Hou, Yu Qiao, Wenping Wang:
CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP. CoRR abs/2301.04926 (2023) - [i293]Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Yi Wang, Yu Qiao, Weidi Xie:
Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision. CoRR abs/2301.09121 (2023) - [i292]Renrui Zhang, Xiangfei Hu, Bohao Li, Siyuan Huang, Hanqiu Deng, Hongsheng Li, Yu Qiao, Peng Gao:
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners. CoRR abs/2303.02151 (2023) - [i291]Zhenyu Wu, Yaoxiang Wang, Jiacheng Ye, Jiangtao Feng, Jingjing Xu, Yu Qiao, Zhiyong Wu:
OpenICL: An Open-Source Framework for In-context Learning. CoRR abs/2303.02913 (2023) - [i290]Xin Li, Tao Ma, Yuenan Hou, Botian Shi, Yuchen Yang, Youquan Liu, Xingjiao Wu, Qin Chen, Yikang Li, Yu Qiao, Liang He:
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion. CoRR abs/2303.03595 (2023) - [i289]Zhongying Deng, Xiaoyu Ren, Jin Ye, Junjun He, Yu Qiao:
FCN+: Global Receptive Convolution Makes FCN Great Again. CoRR abs/2303.04589 (2023) - [i288]Lingdong Kong, Youquan Liu, Runnan Chen, Yuexin Ma, Xinge Zhu, Yikang Li, Yuenan Hou, Yu Qiao, Ziwei Liu:
Rethinking Range View Representation for LiDAR Segmentation. CoRR abs/2303.05367 (2023) - [i287]Ziteng Cui, Lin Gu, Xiao Sun, Yu Qiao, Tatsuya Harada:
Aleth-NeRF: Low-light Condition View Synthesis with Concealing Fields. CoRR abs/2303.05807 (2023) - [i286]Jiakang Yuan, Bo Zhang, Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, Yu Qiao:
Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection. CoRR abs/2303.05886 (2023) - [i285]Bo Zhang, Jiakang Yuan, Botian Shi, Tao Chen, Yikang Li, Yu Qiao:
Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection. CoRR abs/2303.06880 (2023) - [i284]Zhaoyang Xia, Youquan Liu, Xin Li, Xinge Zhu, Yuexin Ma, Yikang Li, Yuenan Hou, Yu Qiao:
SCPNet: Semantic Scene Completion on Point Cloud. CoRR abs/2303.06884 (2023) - [i283]Jiaqi Xu, Xiaowei Hu, Lei Zhu, Qi Dou, Jifeng Dai, Yu Qiao, Pheng-Ann Heng:
Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior. CoRR abs/2303.09757 (2023) - [i282]Xuyang Shen, Dong Li, Jinxing Zhou, Zhen Qin, Bowen He, Xiaodong Han, Aixuan Li, Yuchao Dai, Lingpeng Kong, Meng Wang, Yu Qiao, Yiran Zhong:
Fine-grained Audible Video Description. CoRR abs/2303.15616 (2023) - [i281]Kunchang Li, Yali Wang, Yizhuo Li, Yi Wang, Yinan He, Limin Wang, Yu Qiao:
Unmasked Teacher: Towards Training-Efficient Video Foundation Models. CoRR abs/2303.16058 (2023) - [i280]Renrui Zhang, Jiaming Han, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu, Hongsheng Li, Peng Gao, Yu Qiao:
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. CoRR abs/2303.16199 (2023) - [i279]Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, Yu Qiao:
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking. CoRR abs/2303.16727 (2023) - [i278]Tianyu Li, Li Chen, Xiangwei Geng, Huijie Wang, Yang Li, Zhenbo Liu, Shengyin Jiang, Yuting Wang, Hang Xu, Chunjing Xu, Feng Wen, Ping Luo, Junchi Yan, Wei Zhang, Xiaogang Wang, Yu Qiao, Hongyang Li:
Topology Reasoning for Driving Scenes. CoRR abs/2304.05277 (2023) - [i277]Ziyan Huang, Haoyu Wang, Zhongying Deng, Jin Ye, Yanzhou Su, Hui Sun, Junjun He, Yun Gu, Lixu Gu, Shaoting Zhang, Yu Qiao:
STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training. CoRR abs/2304.06716 (2023) - [i276]Xiaoliang Ju, Yiyang Sun, Yiming Hao, Yikang Li, Yu Qiao, Hongsheng Li:
Perception Imitation: Towards Synthesis-free Simulator for Autonomous Vehicles. CoRR abs/2304.09365 (2023) - [i275]Huijie Wang, Zhenbo Liu, Yang Li, Tianyu Li, Li Chen, Chonghao Sima, Yuting Wang, Shengyin Jiang, Feng Wen, Hang Xu, Ping Luo, Junchi Yan, Wei Zhang, Jun Yao, Yu Qiao, Hongyang Li:
Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving. CoRR abs/2304.10440 (2023) - [i274]Zeyu Lu, Chengyue Wu, Xinyuan Chen, Yaohui Wang, Lei Bai, Yu Qiao, Xihui Liu:
Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation. CoRR abs/2304.11829 (2023) - [i273]Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao:
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model. CoRR abs/2304.15010 (2023) - [i272]Jiashuo Yu, Yaohui Wang, Xinyuan Chen, Xiao Sun, Yu Qiao:
Long-Term Rhythmic Video Soundtracker. CoRR abs/2305.01319 (2023) - [i271]Yaohui Wang, Xin Ma, Xinyuan Chen, Antitza Dantcheva, Bo Dai, Yu Qiao:
LEO: Generative Latent Image Animator for Human Video Synthesis. CoRR abs/2305.03989 (2023) - [i270]Mingzhou Liu, Xinwei Sun, Yu Qiao, Yizhou Wang:
Causal Discovery with Unobserved Variables: A Proxy Variable Approach. CoRR abs/2305.05281 (2023) - [i269]Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Zeqiang Lai, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, Limin Wang, Ping Luo, Jifeng Dai, Yu Qiao:
InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language. CoRR abs/2305.05662 (2023) - [i268]Kunchang Li, Yinan He, Yi Wang, Yizhuo Li, Wenhai Wang, Ping Luo, Yali Wang, Limin Wang, Yu Qiao:
VideoChat: Chat-Centric Video Understanding. CoRR abs/2305.06355 (2023) - [i267]Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai:
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks. CoRR abs/2305.11175 (2023) - [i266]Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li:
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model. CoRR abs/2305.11176 (2023) - [i265]Guo Chen, Yin-Dong Zheng, Jiahao Wang, Jilan Xu, Yifei Huang, Junting Pan, Yi Wang, Yali Wang, Yu Qiao, Tong Lu, Limin Wang:
VideoLLM: Modeling Video Sequence with Large Language Models. CoRR abs/2305.13292 (2023) - [i264]Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo:
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought. CoRR abs/2305.15021 (2023) - [i263]Jinjin Gu, Xianzheng Ma, Xiangtao Kong, Yu Qiao, Chao Dong:
Networks are Slacking Off: Understanding Generalization Problem in Image Deraining. CoRR abs/2305.15134 (2023) - [i262]Shilin Yan, Renrui Zhang, Ziyu Guo, Wenchao Chen, Wei Zhang, Hongyang Li, Yu Qiao, Zhongjiang He, Peng Gao:
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation. CoRR abs/2305.16318 (2023) - [i261]Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, Jifeng Dai:
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory. CoRR abs/2305.17144 (2023) - [i260]Mengzhao Chen, Wenqi Shao, Peng Xu, Mingbao Lin, Kaipeng Zhang, Fei Chao, Rongrong Ji, Yu Qiao, Ping Luo:
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers. CoRR abs/2305.17997 (2023) - [i259]Xiaoliang Ju, Zhaoyang Huang, Yijin Li, Guofeng Zhang, Yu Qiao, Hongsheng Li:
DiffRoom: Diffusion-based High-Quality 3D Room Reconstruction and Generation with Occupancy Prior. CoRR abs/2306.00519 (2023) - [i258]Jiakang Yuan, Bo Zhang, Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, Yu Qiao:
AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset. CoRR abs/2306.00612 (2023) - [i257]Zeqiang Lai, Yuchen Duan, Jifeng Dai, Ziheng Li, Ying Fu, Hongsheng Li, Yu Qiao, Wenhai Wang:
Denoising Diffusion Semantic Segmentation with Mask Prior Modeling. CoRR abs/2306.01721 (2023) - [i256]Tao Ma, Xuemeng Yang, Hongbin Zhou, Xin Li, Botian Shi, Junjie Liu, Yuchen Yang, Zhizheng Liu, Liang He, Yu Qiao, Yikang Li, Hongsheng Li:
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds. CoRR abs/2306.06023 (2023) - [i255]Peng Xu, Wenqi Shao, Kaipeng Zhang, Peng Gao, Shuo Liu, Meng Lei, Fanqing Meng, Siyuan Huang, Yu Qiao, Ping Luo:
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models. CoRR abs/2306.09265 (2023) - [i254]Dequan Wang, Xiaosong Wang, Lilong Wang, Mengzhang Li, Qian Da, Xiaoqiang Liu, Xiangyu Gao, Jun Shen, Junjun He, Tian Shen, Qi Duan, Jie Zhao, Kang Li, Yu Qiao, Shaoting Zhang:
MedFMC: A Real-world Dataset and Benchmark For Foundation Model Adaptation in Medical Image Classification. CoRR abs/2306.09579 (2023) - [i253]Yue Yang, Kaipeng Zhang, Yuying Ge, Wenqi Shao, Zeyue Xue, Yu Qiao, Ping Luo:
Align, Adapt and Inject: Sound-guided Unified Image Generation. CoRR abs/2306.11504 (2023) - [i252]Junting Pan, Ziyi Lin, Yuying Ge, Xiatian Zhu, Renrui Zhang, Yi Wang, Yu Qiao, Hongsheng Li:
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models. CoRR abs/2306.11732 (2023) - [i251]Junting Pan, Keqiang Sun, Yuying Ge, Hao Li, Haodong Duan, Xiaoshi Wu, Renrui Zhang, Aojun Zhou, Zipeng Qin, Yi Wang, Jifeng Dai, Yu Qiao, Hongsheng Li:
JourneyDB: A Benchmark for Generative Image Understanding. CoRR abs/2307.00716 (2023) - [i250]Yuwei Guo, Ceyuan Yang, Anyi Rao, Yaohui Wang, Yu Qiao, Dahua Lin, Bo Dai:
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning. CoRR abs/2307.04725 (2023) - [i249]Licheng Wen, Daocheng Fu, Song Mao, Pinlong Cai, Min Dou, Yikang Li, Yu Qiao:
LimSim: A Long-term Interactive Multi-scenario Traffic Simulator. CoRR abs/2307.06648 (2023) - [i248]Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, Xin Ma, Xinyuan Chen, Yaohui Wang, Ping Luo, Ziwei Liu, Yali Wang, Limin Wang, Yu Qiao:
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation. CoRR abs/2307.06942 (2023) - [i247]Daocheng Fu, Xin Li, Licheng Wen, Min Dou, Pinlong Cai, Botian Shi, Yu Qiao:
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models. CoRR abs/2307.07162 (2023) - [i246]Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue:
Meta-Transformer: A Unified Framework for Multimodal Learning. CoRR abs/2307.10802 (2023) - [i245]Yunkun Zhang, Jin Gao, Mu Zhou, Xiaosong Wang, Yu Qiao, Shaoting Zhang, Dequan Wang:
Text-guided Foundation Model Adaptation for Pathological Image Classification. CoRR abs/2307.14901 (2023) - [i244]Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, Xiaodong Han, Yunshen Wei, Baohong Lv, Fei Yuan, Xiao Luo, Yu Qiao, Yiran Zhong:
Scaling TransNormer to 175 Billion Parameters. CoRR abs/2307.14995 (2023) - [i243]Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao:
Scaling Data Generation in Vision-and-Language Navigation. CoRR abs/2307.15644 (2023) - [i242]Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, Linjie Xing, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao:
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World. CoRR abs/2308.01907 (2023) - [i241]Wenqi Shao, Yutao Hu, Peng Gao, Meng Lei, Kaipeng Zhang, Fanqing Meng, Peng Xu, Siyuan Huang, Hongsheng Li, Yu Qiao, Ping Luo:
Tiny LVLM-eHub: Early Multimodal Experiments with Bard. CoRR abs/2308.03729 (2023) - [i240]Fanqing Meng, Wenqi Shao, Zhanglin Peng, Chonghe Jiang, Kaipeng Zhang, Yu Qiao, Ping Luo:
Foundation Model is Efficient Multimodal Multitask Model Selector. CoRR abs/2308.06262 (2023) - [i239]Lihe Yang, Zhen Zhao, Lei Qi, Yu Qiao, Yinghuan Shi, Hengshuang Zhao:
Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning. CoRR abs/2308.06777 (2023) - [i238]Bingkun Huang, Zhiyu Zhao, Guozhen Zhang, Yu Qiao, Limin Wang:
MGMAE: Motion Guided Masking for Video Masked Autoencoding. CoRR abs/2308.10794 (2023) - [i237]Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo:
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models. CoRR abs/2308.13137 (2023) - [i236]Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Ben Fei, Bo Dai, Wanli Ouyang, Yu Qiao, Chao Dong:
DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior. CoRR abs/2308.15070 (2023) - [i235]Junlong Cheng, Jin Ye, Zhongying Deng, Jianpin Chen, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Junjun He, Shaoting Zhang, Min Zhu, Yu Qiao:
SAM-Med2D. CoRR abs/2308.16184 (2023) - [i234]Wenlong Zhang, Xiaohui Li, Xiangyu Chen, Yu Qiao, Xiao-Ming Wu, Chao Dong:
SEAL: A Framework for Systematic Evaluation of Real-World Super-Resolution. CoRR abs/2309.03020 (2023) - [i233]Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao:
ImageBind-LLM: Multi-modality Instruction Tuning. CoRR abs/2309.03905 (2023) - [i232]Ziyan Huang, Zhongying Deng, Jin Ye, Haoyu Wang, Yanzhou Su, Tianbin Li, Hui Sun, Junlong Cheng, Jianpin Chen, Junjun He, Yun Gu, Shaoting Zhang, Lixu Gu, Yu Qiao:
A-Eval: A Benchmark for Cross-Dataset Evaluation of Abdominal Multi-Organ Segmentation. CoRR abs/2309.03906 (2023) - [i231]Xiangyu Chen, Zheyuan Li, Zhengwen Zhang, Jimmy S. Ren, Yihao Liu, Jingwen He, Yu Qiao, Jiantao Zhou, Chao Dong:
Towards Efficient SDRTV-to-HDRTV by Learning from Image Formation. CoRR abs/2309.04084 (2023) - [i230]Xiangyu Chen, Xintao Wang, Wenlong Zhang, Xiangtao Kong, Yu Qiao, Jiantao Zhou, Chao Dong:
HAT: Hybrid Attention Transformer for Image Restoration. CoRR abs/2309.05239 (2023) - [i229]Bo Zhang, Xinyu Cai, Jiakang Yuan, Donglin Yang, Jianfei Guo, Xiangchao Yan, Renqiu Xia, Botian Shi, Min Dou, Tao Chen, Si Liu, Junchi Yan, Yu Qiao:
ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation. CoRR abs/2309.05527 (2023) - [i228]Youquan Liu, Runnan Chen, Xin Li, Lingdong Kong, Yuchen Yang, Zhaoyang Xia, Yeqi Bai, Xinge Zhu, Yuexin Ma, Yikang Li, Yu Qiao, Yuenan Hou:
UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase. CoRR abs/2309.05573 (2023) - [i227]Xiangchao Yan, Runjian Chen, Bo Zhang, Jiakang Yuan, Xinyu Cai, Botian Shi, Wenqi Shao, Junchi Yan, Ping Luo, Yu Qiao:
SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving. CoRR abs/2309.10527 (2023) - [i226]Renqiu Xia, Bo Zhang, Haoyang Peng, Ning Liao, Peng Ye, Botian Shi, Junchi Yan, Yu Qiao:
StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding. CoRR abs/2309.11268 (2023) - [i225]Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu:
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models. CoRR abs/2309.15103 (2023) - [i224]Pan Zhang, Xiaoyi Dong, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Shuangrui Ding, Songyang Zhang, Haodong Duan, Wenwei Zhang, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang:
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition. CoRR abs/2309.15112 (2023) - [i223]Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yu Qiao:
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models. CoRR abs/2309.16292 (2023) - [i222]Mingzhou Liu, Xinwei Sun, Ching-Wen Lee, Yu Qiao, Yizhou Wang:
Exploring Counterfactual Alignment Loss towards Human-centered AI. CoRR abs/2310.01766 (2023) - [i221]Zhanhui Zhou, Jie Liu, Chao Yang, Jing Shao, Yu Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao:
Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization for Language Models. CoRR abs/2310.03708 (2023) - [i220]Hao Zhang, Kaipeng Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng, Ping Luo, Yu Qiao:
Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face. CoRR abs/2310.05056 (2023) - [i219]Ning Liao, Shaofeng Zhang, Renqiu Xia, Bo Zhang, Min Cao, Yu Qiao, Junchi Yan:
REVO-LION: Evaluating and Refining Vision-Language Instruction Tuning Datasets. CoRR abs/2310.06594 (2023) - [i218]Zeqiang Lai, Xizhou Zhu, Jifeng Dai, Yu Qiao, Wenhai Wang:
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models. CoRR abs/2310.07653 (2023) - [i217]Bo Peng, Xinyuan Chen, Yaohui Wang, Chaochao Lu, Yu Qiao:
ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation. CoRR abs/2310.07697 (2023) - [i216]Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, Ping Luo:
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models. CoRR abs/2310.08582 (2023) - [i215]Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Tong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Wanli Ouyang:
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm. CoRR abs/2310.08586 (2023) - [i214]Yihao Liu, Xiangyu Chen, Xianzheng Ma, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong:
Unifying Image Processing as Visual Prompting Question Answering. CoRR abs/2310.10513 (2023) - [i213]Xiangyu Chen, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou, Yu Qiao, Chao Dong:
A Comparative Study of Image Restoration Networks for General Backbone Network Design. CoRR abs/2310.11881 (2023) - [i212]Haoyu Wang, Sizheng Guo, Jin Ye, Zhongying Deng, Junlong Cheng, Tianbin Li, Jianpin Chen, Yanzhou Su, Ziyan Huang, Yiqing Shen, Bin Fu, Shaoting Zhang, Junjun He, Yu Qiao:
SAM-Med3D. CoRR abs/2310.15161 (2023) - [i211]Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li:
Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection. CoRR abs/2310.15670 (2023) - [i210]Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Zhiheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang:
ControlLLM: Augment Language Models with Tools by Searching on Graphs. CoRR abs/2310.17796 (2023) - [i209]Yizhuo Li, Kunchang Li, Yinan He, Yi Wang, Yali Wang, Limin Wang, Yu Qiao, Ping Luo:
Harvest Video Foundation Models via Efficient Post-Pretraining. CoRR abs/2310.19554 (2023) - [i208]Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, Ziwei Liu:
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction. CoRR abs/2310.20700 (2023) - [i207]Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Yu Qiao, Jing Shao:
Octavius: Mitigating Task Interference in MLLMs via MoE. CoRR abs/2311.02684 (2023) - [i206]Zhelun Shi, Zhipin Wang, Hongxing Fan, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao:
ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models. CoRR abs/2311.02692 (2023) - [i205]Zhiyu Zhao, Bingkun Huang, Sen Xing, Gangshan Wu, Yu Qiao, Limin Wang:
Asymmetric Masked Distillation for Pre-Training Small Foundation Models. CoRR abs/2311.03149 (2023) - [i204]Licheng Wen, Xuemeng Yang, Daocheng Fu, Xiaofeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao:
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving. CoRR abs/2311.05332 (2023) - [i203]Yixu Wang, Yan Teng, Kexin Huang, Chengqi Lyu, Songyang Zhang, Wenwei Zhang, Xingjun Ma, Yu-Gang Jiang, Yu Qiao, Yingchun Wang:
Fake Alignment: Are LLMs Really Aligned Well? CoRR abs/2311.05915 (2023) - [i202]Ziyi Lin, Chris Liu, Renrui Zhang, Peng Gao, Longtian Qiu, Han Xiao, Han Qiu, Chen Lin, Wenqi Shao, Keqin Chen, Jiaming Han, Siyuan Huang, Yichi Zhang, Xuming He, Hongsheng Li, Yu Qiao:
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models. CoRR abs/2311.07575 (2023) - [i201]Zhihang Zhong, Gurunandan Krishnan, Xiao Sun, Yu Qiao, Sizhuo Ma, Jian Wang:
Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation. CoRR abs/2311.08007 (2023) - [i200]Fangzhi Xu, Zhiyong Wu, Qiushi Sun, Siyu Ren, Fei Yuan, Shuai Yuan, Qika Lin, Yu Qiao, Jun Liu:
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models. CoRR abs/2311.09278 (2023) - [i199]Jin Ye, Junlong Cheng, Jianpin Chen, Zhongying Deng, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Min Zhu, Shaoting Zhang, Junjun He, Yu Qiao:
SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks. CoRR abs/2311.11969 (2023) - [i198]Yangyang Xu, Shengfeng He, Wenqi Shao, Kwan-Yee K. Wong, Yu Qiao, Ping Luo:
DiffusionMat: Alpha Matting as Sequential Refinement Learning. CoRR abs/2311.13535 (2023) - [i197]Yu Yi, Xue Yang, Qingyun Li, Feipeng Da, Junchi Yan, Jifeng Dai, Yu Qiao:
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision. CoRR abs/2311.14758 (2023) - [i196]Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C. Kot, Bihan Wen:
SinSR: Diffusion-Based Image Super-Resolution in a Single Step. CoRR abs/2311.14760 (2023) - [i195]Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Luo, Limin Wang, Yu Qiao:
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark. CoRR abs/2311.17005 (2023) - [i194]Xin Liu, Yichen Zhu, Yunshi Lan, Chao Yang, Yu Qiao:
Query-Relevant Images Jailbreak Large Multi-Modal Models. CoRR abs/2311.17600 (2023) - [i193]Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu:
VBench: Comprehensive Benchmark Suite for Video Generative Models. CoRR abs/2311.17982 (2023) - [i192]Yanqing Liu, Kai Wang, Wenqi Shao, Ping Luo, Yu Qiao, Mike Zheng Shou, Kaipeng Zhang, Yang You:
MLLMs-Augmented Visual-Language Representation Learning. CoRR abs/2311.18765 (2023) - [i191]Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu:
VideoBooth: Diffusion-based Video Generation with Image Prompts. CoRR abs/2312.00777 (2023) - [i190]Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Pinlong Cai, Huilin Xu, Dahua Lin, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Kai Yan, Chunjing Xu, Tiancai Wang, Beipeng Mu, Shaoqing Ren, Zhihui Peng, Yu Qiao:
Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future. CoRR abs/2312.03408 (2023) - [i189]Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue:
OneLLM: One Framework to Align All Modalities with Language. CoRR abs/2312.03700 (2023) - [i188]Xin Li, Yeqi Bai, Pinlong Cai, Licheng Wen, Daocheng Fu, Bo Zhang, Xuemeng Yang, Xinyu Cai, Tao Ma, Jianfei Guo, Xing Gao, Min Dou, Yikang Li, Botian Shi, Yong Liu, Liang He, Yu Qiao:
Towards Knowledge-driven Autonomous Driving. CoRR abs/2312.04316 (2023) - [i187]Pengcheng Chen, Ziyan Huang, Zhongying Deng, Tianbin Li, Yanzhou Su, Haoyu Wang, Jin Ye, Yu Qiao, Junjun He:
Enhancing Medical Task Performance in GPT-4V: A Comprehensive Study on Prompt Engineering Strategies. CoRR abs/2312.04344 (2023) - [i186]Hongjie Zhang, Yi Liu, Lu Dong, Yifei Huang, Zhen-Hua Ling, Yali Wang, Limin Wang, Yu Qiao:
MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding. CoRR abs/2312.04817 (2023) - [i185]Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao:
TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation. CoRR abs/2312.06630 (2023) - [i184]Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, Lu Sheng:
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion. CoRR abs/2312.06725 (2023) - [i183]Yuchen Yang, Yu Qiao, Xiao Sun:
Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation. CoRR abs/2312.07051 (2023) - [i182]Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao:
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception. CoRR abs/2312.07472 (2023) - [i181]Ziteng Cui, Lin Gu, Xiao Sun, Xianzheng Ma, Yu Qiao, Tatsuya Harada:
Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption. CoRR abs/2312.09093 (2023) - [i180]Hao Li, Xue Yang, Zhaokai Wang, Xizhou Zhu, Jie Zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai:
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft. CoRR abs/2312.09238 (2023) - [i179]Wenhai Wang, Jiangwei Xie, Chuanyang Hu, Haoming Zou, Jianan Fan, Wenwen Tong, Yang Wen, Silei Wu, Hanming Deng, Zhiqi Li, Hao Tian, Lewei Lu, Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai:
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving. CoRR abs/2312.09245 (2023) - [i178]Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao:
Point Transformer V3: Simpler, Faster, Stronger. CoRR abs/2312.10035 (2023) - [i177]Xu Liu, Tong Zhou, Yuanxin Wang, Yuping Wang, Qinjingwen Cao, Weizhi Du, Yonghuan Yang, Junjun He, Yu Qiao, Yiqing Shen:
Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey. CoRR abs/2312.10163 (2023) - [i176]Siran Chen, Yue Ma, Yu Qiao, Yali Wang:
M-BEV: Masked BEV Perception for Robust Autonomous Driving. CoRR abs/2312.12144 (2023) - [i175]Lingjun Zhang, Xinyuan Chen, Yaohui Wang, Yue Lu, Yu Qiao:
Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model. CoRR abs/2312.12232 (2023) - [i174]Yuanfu Wang, Chao Yang, Ying Wen, Yu Liu, Yu Qiao:
Critic-Guided Decision Transformer for Offline Reinforcement Learning. CoRR abs/2312.13716 (2023) - [i173]Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai:
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks. CoRR abs/2312.14238 (2023) - 2022
- [j76]Diping Song, Fei Li, Cheng Li, Jian Xiong, Junjun He, Xiulan Zhang, Yu Qiao:
Asynchronous feature regularization and cross-modal distillation for OCT based glaucoma diagnosis. Comput. Biol. Medicine 151(Part): 106283 (2022) - [j75]Xiaoxing Zeng, Zhelun Wu, Xiaojiang Peng, Yu Qiao:
Joint 3D facial shape reconstruction and texture completion from a single image. Comput. Vis. Media 8(2): 239-256 (2022) - [j74]Fei Li, Diping Song, Han Chen, Jian Xiong, Xingyi Li, Hua Zhong, Guangxian Tang, Sujie Fan, Dennis S. C. Lam, Weihua Pan, Yajuan Zheng, Ying Li, Guoxiang Qu, Junjun He, Zhe Wang, Ling Jin, Rouxi Zhou, Yunhe Song, Yi Sun, Weijing Cheng, Chunman Yang, Yazhi Fan, Yingjie Li, Hengli Zhang, Ye Yuan, Yang Xu, Yunfan Xiong, Lingfei Jin, Aiguo Lv, Lingzhi Niu, Yuhong Liu, Shaoli Li, Jiani Zhang, Linda M. Zangwill, Alejandro F. Frangi, Tin Aung, Ching-Yu Cheng, Yu Qiao, Xiulan Zhang, Daniel S. W. Ting:
Author Correction: Development and clinical deployment of a smartphone-based visual field deep learning system for glaucoma detection. npj Digit. Medicine 5 (2022) - [j73]Wenlong Zhang, Yihao Liu, Chao Dong, Yu Qiao:
RankSRGAN: Super Resolution Generative Adversarial Networks With Learning to Rank. IEEE Trans. Pattern Anal. Mach. Intell. 44(10): 7149-7166 (2022) - [j72]Jingwen He, Chao Dong, Yihao Liu, Yu Qiao:
Interactive Multi-Dimension Modulation for Image Restoration. IEEE Trans. Pattern Anal. Mach. Intell. 44(12): 9363-9379 (2022) - [j71]Qing Li, Xiaojiang Peng, Yu Qiao, Qi Hao:
Unsupervised person re-identification with multi-label learning guided self-paced clustering. Pattern Recognit. 125: 108521 (2022) - [j70]Weijian Ruan, Yiran Tao, Linjun Ruan, Xiujun Shu, Yu Qiao:
Temporal Weighting Appearance-Aligned Network for Nighttime Video Retrieval. IEEE Signal Process. Lett. 29: 2008-2012 (2022) - [j69]Haiwei Wu, Jiantao Zhou, Jinyu Tian, Jun Liu, Yu Qiao:
Robust Image Forgery Detection Against Transmission Over Online Social Networks. IEEE Trans. Inf. Forensics Secur. 17: 443-456 (2022) - [j68]Yi Liu, Limin Wang, Yali Wang, Xiao Ma, Yu Qiao:
FineAction: A Fine-Grained Video Dataset for Temporal Action Localization. IEEE Trans. Image Process. 31: 6937-6950 (2022) - [j67]Yuhao Liu, Jiake Xie, Yu Qiao, Yong Tang, Xin Yang:
Prior-Induced Information Alignment for Image Matting. IEEE Trans. Multim. 24: 2727-2738 (2022) - [c221]Yu Qiao, Jincheng Zhu, Chengjiang Long, Zeyao Zhang, Yuxin Wang, Zhenjun Du, Xin Yang:
CPRAL: Collaborative Panoptic-Regional Active Learning for Semantic Segmentation. AAAI 2022: 2108-2116 - [c220]Ziteng Cui, Kunchang Li, Lin Gu, Shenghan Su, Peng Gao, Zhengkai Jiang, Yu Qiao, Tatsuya Harada:
You Only Need 90K Parameters to Adapt Light: a Light Weight Transformer for Image Enhancement and Exposure Correction. BMVC 2022: 238 - [c219]Teli Ma, Shijie Geng, Mengmeng Wang, Sheng Xu, Hongsheng Li, Baochang Zhang, Peng Gao, Yu Qiao:
Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition. BMVC 2022: 481 - [c218]Yu Qiao, Ziqi Wei, Yuhao Liu, Yuxin Wang, Dongsheng Zhou, Qiang Zhang, Xin Yang:
Wider and Higher: Intensive Integration and Global Foreground Perception for Image Matting. CGI 2022: 541-553 - [c217]Xiaosong Jia, Li Chen, Penghao Wu, Jia Zeng, Junchi Yan, Hongyang Li, Yu Qiao:
Towards Capturing the Temporal Dynamics for Trajectory Prediction: a Coarse-to-Fine Approach. CoRL 2022: 910-920 - [c216]Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Jinjin Gu, Yu Qiao, Chao Dong:
Blueprint Separable Residual Network for Efficient Image Super-Resolution. CVPR Workshops 2022: 832-842 - [c215]Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoglu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman H. Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, Yufei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang:
NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results. CVPR Workshops 2022: 1061-1101 - [c214]Mingfei Han, David Junhao Zhang, Yali Wang, Rui Yan, Lina Yao, Xiaojun Chang, Yu Qiao:
Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition. CVPR 2022: 2980-2989 - [c213]Xiangtao Kong, Xina Liu, Jinjin Gu, Yu Qiao, Chao Dong:
Reflash Dropout in Image Super-Resolution. CVPR 2022: 5992-6002 - [c212]Renrui Zhang, Ziyu Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, Hongsheng Li:
PointCLIP: Point Cloud Understanding by CLIP. CVPR 2022: 8542-8552 - [c211]Mengzhe He, Yali Wang, Jiaxi Wu, Yiru Wang, Hanqing Li, Bo Li, Weihao Gan, Wei Wu, Yu Qiao:
Cross Domain Object Detection by Target-Perceived Dual Branch Distillation. CVPR 2022: 9560-9570 - [c210]Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai:
BEVFormer: Learning Bird's-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers. ECCV (9) 2022: 1-18 - [c209]Sheng Xu, Yanjing Li, Tiancheng Wang, Teli Ma, Baochang Zhang, Peng Gao, Yu Qiao, Jinhu Lü, Guodong Guo:
Recurrent Bilinear Optimization for Binary Neural Networks. ECCV (24) 2022: 19-35 - [c208]Changyao Tian, Wenhai Wang, Xizhou Zhu, Jifeng Dai, Yu Qiao:
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition. ECCV (25) 2022: 73-91 - [c207]David Junhao Zhang, Kunchang Li, Yali Wang, Yunpeng Chen, Shashwat Chandra, Yu Qiao, Luoqi Liu, Mike Zheng Shou:
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning. ECCV (35) 2022: 230-248 - [c206]Lin Zhou, Haoming Cai, Jinjin Gu, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Yu Qiao, Chao Dong:
Efficient Image Super-Resolution Using Vast-Receptive-Field Attention. ECCV Workshops (2) 2022: 256-272 - [c205]Yi Wang, Menghan Xia, Lu Qi, Jing Shao, Yu Qiao:
PalGAN: Image Colorization with Palette Generative Adversarial Networks. ECCV (15) 2022: 271-288 - [c204]Ziyi Lin, Shijie Geng, Renrui Zhang, Peng Gao, Gerard de Melo, Xiaogang Wang, Jifeng Dai, Yu Qiao, Hongsheng Li:
Frozen CLIP Models are Efficient Video Learners. ECCV (35) 2022: 388-404 - [c203]Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu:
Self-slimmed Vision Transformer. ECCV (11) 2022: 432-448 - [c202]Renrui Zhang, Wei Zhang, Rongyao Fang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li:
Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification. ECCV (35) 2022: 493-510 - [c201]Yinan He, Gengshi Huang, Siyu Chen, Jianing Teng, Kun Wang, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao:
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation. ECCV (26) 2022: 509-528 - [c200]Li Chen, Chonghao Sima, Yang Li, Zehan Zheng, Jiajie Xu, Xiangwei Geng, Hongyang Li, Conghui He, Jianping Shi, Yu Qiao, Junchi Yan:
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark. ECCV (38) 2022: 550-567 - [c199]Kunchang Li, Yali Wang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao:
UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning. ICLR 2022 - [c198]Yi Liu, Xuan Zhang, Ying Li, Guixin Liang, Yabing Jiang, Lixia Qiu, Haiping Tang, Fei Xie, Wei Yao, Yi Dai, Yu Qiao, Yali Wang:
VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection. ICPR 2022: 4967-4973 - [c197]Bin Wang, Yu Qiao, Dahua Lin, Stephen D. H. Yang, Weijia Li:
Cycle-Consistent Learning for Weakly Supervised Semantic Segmentation. HCMA@MM 2022: 7-13 - [c196]Yue Ma, Yali Wang, Yue Wu, Ziyu Lyu, Siran Chen, Xiu Li, Yu Qiao:
Visual Knowledge Graph for Human Action Reasoning in Videos. ACM Multimedia 2022: 4132-4141 - [c195]Peng Gao, Teli Ma, Hongsheng Li, Ziyi Lin, Jifeng Dai, Yu Qiao:
MCMAE: Masked Convolution Meets Masked Autoencoders. NeurIPS 2022 - [c194]Penghao Wu, Xiaosong Jia, Li Chen, Junchi Yan, Hongyang Li, Yu Qiao:
Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline. NeurIPS 2022 - [c193]Renrui Zhang, Ziyu Guo, Peng Gao, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li:
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training. NeurIPS 2022 - [i172]Kunchang Li, Yali Wang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao:
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning. CoRR abs/2201.04676 (2022) - [i171]Mingye Xu, Zhipeng Zhou, Hongbin Xu, Yali Wang, Yu Qiao:
CP-Net: Contour-Perturbed Reconstruction Network for Self-Supervised Point Cloud Learning. CoRR abs/2201.08215 (2022) - [i170]Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao:
UniFormer: Unifying Convolution and Self-attention for Visual Recognition. CoRR abs/2201.09450 (2022) - [i169]Kexue Fu, Peng Gao, Renrui Zhang, Hongsheng Li, Yu Qiao, Manning Wang:
Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning. CoRR abs/2202.04241 (2022) - [i168]Yuanhan Zhang, Qinghong Sun, Yichun Zhou, Zexin He, Zhenfei Yin, Kun Wang, Lu Sheng, Yu Qiao, Jing Shao, Ziwei Liu:
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy. CoRR abs/2203.07845 (2022) - [i167]Yinan He, Gengshi Huang, Siyu Chen, Jianing Teng, Wang Kun, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao:
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation. CoRR abs/2203.08764 (2022) - [i166]Li Chen, Chonghao Sima, Yang Li, Zehan Zheng, Jiajie Xu, Xiangwei Geng, Hongyang Li, Conghui He, Jianping Shi, Yu Qiao, Junchi Yan:
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark. CoRR abs/2203.11089 (2022) - [i165]Renrui Zhang, Han Qiu, Tai Wang, Xuanzhuo Xu, Ziyu Guo, Yu Qiao, Peng Gao, Hongsheng Li:
MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection. CoRR abs/2203.13310 (2022) - [i164]Kexue Fu, Peng Gao, Shaolei Liu, Renrui Zhang, Yu Qiao, Manning Wang:
POS-BERT: Point Cloud One-Stage BERT Pre-Training. CoRR abs/2204.00989 (2022) - [i163]Mingfei Han, David Junhao Zhang, Yali Wang, Rui Yan, Lina Yao, Xiaojun Chang, Yu Qiao:
Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition. CoRR abs/2204.02148 (2022) - [i162]Mengzhe He, Yali Wang, Jiaxi Wu, Yiru Wang, Hanqing Li, Bo Li, Weihao Gan, Wei Wu, Yu Qiao:
Cross Domain Object Detection by Target-Perceived Dual Branch Distillation. CoRR abs/2205.01291 (2022) - [i161]Peng Gao, Teli Ma, Hongsheng Li, Ziyi Lin, Jifeng Dai, Yu Qiao:
ConvMAE: Masked Convolution Meets Masked Autoencoders. CoRR abs/2205.03892 (2022) - [i160]Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, et al.:
NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results. CoRR abs/2205.05675 (2022) - [i159]Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Jinjin Gu, Yu Qiao, Chao Dong:
Blueprint Separable Residual Network for Efficient Image Super-Resolution. CoRR abs/2205.05996 (2022) - [i158]Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, Chao Dong:
Evaluating the Generalization Ability of Super-Resolution Networks. CoRR abs/2205.07019 (2022) - [i157]Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao:
Vision Transformer Adapter for Dense Predictions. CoRR abs/2205.08534 (2022) - [i156]Renrui Zhang, Ziyu Guo, Peng Gao, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li:
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training. CoRR abs/2205.14401 (2022) - [i155]Ziteng Cui, Kunchang Li, Lin Gu, Shenghan Su, Peng Gao, Zhengkai Jiang, Yu Qiao, Tatsuya Harada:
Illumination Adaptive Transformer. CoRR abs/2205.14871 (2022) - [i154]Chenxin Tao, Xizhou Zhu, Gao Huang, Yu Qiao, Xiaogang Wang, Jifeng Dai:
Siamese Image Modeling for Self-Supervised Vision Representation Learning. CoRR abs/2206.01204 (2022) - [i153]Penghao Wu, Xiaosong Jia, Li Chen, Junchi Yan, Hongyang Li, Yu Qiao:
Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline. CoRR abs/2206.08129 (2022) - [i152]Li Chen, Tutian Tang, Zhitian Cai, Yang Li, Penghao Wu, Hongyang Li, Jianping Shi, Junchi Yan, Yu Qiao:
Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot. CoRR abs/2206.08176 (2022) - [i151]Mingye Xu, Yali Wang, Yihao Liu, Yu Qiao:
CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm. CoRR abs/2207.05359 (2022) - [i150]Renrui Zhang, Zhang Wei, Rongyao Fang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li:
Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification. CoRR abs/2207.09519 (2022) - [i149]Yuexin Ma, Tai Wang, Xuyang Bai, Huitong Yang, Yuenan Hou, Yaming Wang, Yu Qiao, Ruigang Yang, Dinesh Manocha, Xinge Zhu:
Vision-Centric BEV Perception: A Survey. CoRR abs/2208.02797 (2022) - [i148]Ziyi Lin, Shijie Geng, Renrui Zhang, Peng Gao, Gerard de Melo, Xiaogang Wang, Jifeng Dai, Yu Qiao, Hongsheng Li:
Frozen CLIP Models are Efficient Video Learners. CoRR abs/2208.03550 (2022) - [i147]Sheng Xu, Yanjing Li, Tiancheng Wang, Teli Ma, Baochang Zhang, Peng Gao, Yu Qiao, Jinhu Lv, Guodong Guo:
Recurrent Bilinear Optimization for Binary Neural Networks. CoRR abs/2209.01542 (2022) - [i146]Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Enze Xie, Zhiqi Li, Hanming Deng, Hao Tian, Xizhou Zhu, Li Chen, Yulu Gao, Xiangwei Geng, Jia Zeng, Yang Li, Jiazhi Yang, Xiaosong Jia, Bohan Yu, Yu Qiao, Dahua Lin, Si Liu, Junchi Yan, Jianping Shi, Ping Luo:
Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe. CoRR abs/2209.05324 (2022) - [i145]Renrui Zhang, Hanqiu Deng, Bohao Li, Wei Zhang, Hao Dong, Hongsheng Li, Peng Gao, Yu Qiao:
Collaboration of Pre-trained Models Makes Better Few-shot Learner. CoRR abs/2209.12255 (2022) - [i144]Boyu Chen, Yu Qiao, Yali Wang:
Low-Resolution Action Recognition for Tiny Actions Challenge. CoRR abs/2209.14711 (2022) - [i143]Lin Zhou, Haoming Cai, Jinjin Gu, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Yu Qiao, Chao Dong:
Efficient Image Super-Resolution using Vast-Receptive-Field Attention. CoRR abs/2210.05960 (2022) - [i142]Yu Qiao, Yuhao Liu, Ziqi Wei, Yuxin Wang, Qiang Cai, Guofeng Zhang, Xin Yang:
Hierarchical and Progressive Image Matting. CoRR abs/2210.06906 (2022) - [i141]Yu Qiao, Ziqi Wei, Yuhao Liu, Yuxin Wang, Dongsheng Zhou, Qiang Zhang, Xin Yang:
Wider and Higher: Intensive Integration and Global Foreground Perception for Image Matting. CoRR abs/2210.06919 (2022) - [i140]Yi Liu, Xuan Zhang, Ying Li, Guixin Liang, Yabing Jiang, Lixia Qiu, Haiping Tang, Fei Xie, Wei Yao, Yi Dai, Yu Qiao, Yali Wang:
VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection. CoRR abs/2210.11158 (2022) - [i139]Yi Wang, Menghan Xia, Lu Qi, Jing Shao, Yu Qiao:
PalGAN: Image Colorization with Palette Generative Adversarial Networks. CoRR abs/2210.11204 (2022) - [i138]Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao:
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions. CoRR abs/2211.05778 (2022) - [i137]Jifeng Dai, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie Zhou, Xiaogang Wang, Yu Qiao, Xiaowei Hu:
Demystify Transformers & Convolutions in Modern Image Deep Networks. CoRR abs/2211.05781 (2022) - [i136]Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo:
Stare at What You See: Masked Image Modeling without Reconstruction. CoRR abs/2211.08887 (2022) - [i135]Guo Chen, Sen Xing, Zhe Chen, Yi Wang, Kunchang Li, Yizhuo Li, Yi Liu, Jiahao Wang, Yin-Dong Zheng, Bingkun Huang, Zhiyu Zhao, Junting Pan, Yifei Huang, Zun Wang, Jiashuo Yu, Yinan He, Hongjie Zhang, Tong Lu, Yali Wang, Limin Wang, Yu Qiao:
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges. CoRR abs/2211.09529 (2022) - [i134]Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Limin Wang, Yu Qiao:
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer. CoRR abs/2211.09552 (2022) - [i133]Weijie Su, Xizhou Zhu, Chenxin Tao, Lewei Lu, Bin Li, Gao Huang, Yu Qiao, Xiaogang Wang, Jie Zhou, Jifeng Dai:
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information. CoRR abs/2211.09807 (2022) - [i132]