Hongsheng Li 0001
Person information
- affiliation: Chinese University of Hong Kong, Department of Electrical Engineering, CUHK-SenseTime Joint Laboratory, Hong Kong
- affiliation (former): Lehigh University, Department of Computer Science and Engineering, PA, USA
Other persons with the same name
- Hongsheng Li — disambiguation page
- Hongsheng Li 0002 — Southeast University, School of Instrument Science and Engineering, Nanjing, China
- Hongsheng Li 0003 — Xidian University, School of Computer Science and Technology, Xi'an, China
2020 – today
- 2024
- [j57]Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao:
CLIP-Adapter: Better Vision-Language Models with Feature Adapters. Int. J. Comput. Vis. 132(2): 581-595 (2024)
- [j56]Peng Gao, Ziyi Lin, Renrui Zhang, Rongyao Fang, Hongyang Li, Hongsheng Li, Yu Qiao:
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking. Int. J. Comput. Vis. 132(5): 1546-1556 (2024)
- [j55]Keqiang Sun, Shangzhe Wu, Ning Zhang, Zhaoyang Huang, Quan Wang, Hongsheng Li:
CGOF++: Controllable 3D Face Synthesis With Conditional Generative Occupancy Fields. IEEE Trans. Pattern Anal. Mach. Intell. 46(2): 913-926 (2024)
- [j54]Fangzhou Hong, Lingdong Kong, Hui Zhou, Xinge Zhu, Hongsheng Li, Ziwei Liu:
Unified 3D and 4D Panoptic Segmentation via Dynamic Shifting Networks. IEEE Trans. Pattern Anal. Mach. Intell. 46(5): 3480-3495 (2024)
- [j53]Yan Xu, Kwan-Yee Lin, Guofeng Zhang, Xiaogang Wang, Hongsheng Li:
RNNPose: 6-DoF Object Pose Estimation via Recurrent Correspondence Field Estimation and Pose Optimization. IEEE Trans. Pattern Anal. Mach. Intell. 46(7): 4669-4683 (2024)
- [j52]Rongyao Fang, Peng Gao, Aojun Zhou, Yingjie Cai, Si Liu, Jifeng Dai, Hongsheng Li:
FeatAug-DETR: Enriching One-to-Many Matching for DETRs With Feature Augmentation. IEEE Trans. Pattern Anal. Mach. Intell. 46(9): 6402-6415 (2024)
- [j51]Jihao Liu, Jinliang Zheng, Boxiao Liu, Yu Liu, Hongsheng Li:
Enhancing Vision-Language Model with Unmasked Token Alignment. Trans. Mach. Learn. Res. 2024 (2024)
- [j50]Lin Zhao, Hui Zhou, Xinge Zhu, Xiao Song, Hongsheng Li, Wenbing Tao:
LIF-Seg: LiDAR and Camera Image Fusion for 3D LiDAR Semantic Segmentation. IEEE Trans. Multim. 26: 1158-1168 (2024)
- [j49]Zipeng Qin, Jianbo Liu, Xiaolin Zhang, Maoqing Tian, Aojun Zhou, Shuai Yi, Hongsheng Li:
Pyramid Fusion Transformer for Semantic Segmentation. IEEE Trans. Multim. 26: 9630-9643 (2024)
- [j48]Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Xiaogang Wang, Hongsheng Li:
Structured Domain Adaptation With Online Relation Regularization for Unsupervised Person Re-ID. IEEE Trans. Neural Networks Learn. Syst. 35(1): 258-271 (2024)
- [c204]Houxing Ren, Mingjie Zhan, Zhongyuan Wu, Hongsheng Li:
Empowering Character-level Text Infilling by Eliminating Sub-Tokens. ACL (1) 2024: 3253-3267
- [c203]Xudong Lu, Qi Liu, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo Zhang, Junchi Yan, Hongsheng Li:
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models. ACL (1) 2024: 6159-6172
- [c202]Xiaoliang Ju, Zhaoyang Huang, Yijin Li, Guofeng Zhang, Yu Qiao, Hongsheng Li:
DiffInDScene: Diffusion-Based High-Quality 3D Indoor Scene Generation. CVPR 2024: 4526-4535
- [c201]Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai:
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications. CVPR 2024: 5652-5661
- [c200]Hao Shao, Yuxuan Hu, Letian Wang, Guanglu Song, Steven L. Waslander, Yu Liu, Hongsheng Li:
LMDrive: Closed-Loop End-to-End Driving with Large Language Models. CVPR 2024: 15120-15130
- [c199]Yang Zhou, Hao Shao, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu:
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction. CVPR 2024: 15281-15290
- [c198]Hao Li, Xue Yang, Zhaokai Wang, Xizhou Zhu, Jie Zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai:
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft. CVPR 2024: 16426-16435
- [c197]Jihao Liu, Jinliang Zheng, Yu Liu, Hongsheng Li:
GLID: Pre-training a Generalist Encoder-Decoder Vision Model. CVPR 2024: 22851-22860
- [c196]Yijin Li, Yichen Shen, Zhaoyang Huang, Shuo Chen, Weikang Bian, Xiaoyu Shi, Fu-Yun Wang, Keqiang Sun, Hujun Bao, Zhaopeng Cui, Guofeng Zhang, Hongsheng Li:
BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation Using RGB Frames and Events. ECCV (67) 2024: 19-36
- [c195]Ziyi Lin, Dongyang Liu, Renrui Zhang, Peng Gao, Longtian Qiu, Han Xiao, Han Qiu, Wenqi Shao, Keqin Chen, Jiaming Han, Siyuan Huang, Yichi Zhang, Xuming He, Yu Qiao, Hongsheng Li:
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models. ECCV (62) 2024: 36-55
- [c194]Haiyang Wang, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, Liwei Wang:
GiT: Towards Generalist Vision Transformer Through Universal Language Interface. ECCV (29) 2024: 55-73
- [c193]Keqiang Sun, Dor Litvak, Yunzhi Zhang, Hongsheng Li, Jiajun Wu, Shangzhe Wu:
Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos. ECCV (1) 2024: 100-119
- [c192]Xiaoshi Wu, Yiming Hao, Manyuan Zhang, Keqiang Sun, Zhaoyang Huang, Guanglu Song, Yu Liu, Hongsheng Li:
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models. ECCV (83) 2024: 108-124
- [c191]Benjin Zhu, Zhe Wang, Hongsheng Li:
nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding. ECCV (5) 2024: 125-141
- [c190]Manyuan Zhang, Guanglu Song, Xiaoyu Shi, Yu Liu, Hongsheng Li:
Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks. ECCV (42) 2024: 128-145
- [c189]Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li:
Be-Your-Outpainter: Mastering Video Outpainting Through Input-Specific Adaptation. ECCV (44) 2024: 153-168
- [c188]Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Yu Qiao, Peng Gao, Hongsheng Li:
MATHVERSE: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? ECCV (8) 2024: 169-186
- [c187]Fu-Yun Wang, Zhaoyang Huang, Qiang Ma, Guanglu Song, Xudong Lu, Weikang Bian, Yijin Li, Yu Liu, Hongsheng Li:
ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model. ECCV (45) 2024: 329-345
- [c186]Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Bin Zhao, Zhigang Wang, Peng Gao, Hongsheng Li, Dong Wang, Xuelong Li:
Any2Point: Empowering Any-Modality Large Models for Efficient 3D Understanding. ECCV (36) 2024: 456-473
- [c185]Changyao Tian, Chenxin Tao, Jifeng Dai, Hao Li, Ziheng Li, Lewei Lu, Xiaogang Wang, Hongsheng Li, Gao Huang, Xizhou Zhu:
ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process. ICLR 2024
- [c184]Ke Wang, Houxing Ren, Aojun Zhou, Zimu Lu, Sichun Luo, Weikang Shi, Renrui Zhang, Linqi Song, Mingjie Zhan, Hongsheng Li:
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning. ICLR 2024
- [c183]Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li:
Personalize Segment Anything Model with One Shot. ICLR 2024
- [c182]Renrui Zhang, Jiaming Han, Chris Liu, Aojun Zhou, Pan Lu, Yu Qiao, Hongsheng Li, Peng Gao:
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention. ICLR 2024
- [c181]Aojun Zhou, Ke Wang, Zimu Lu, Weikang Shi, Sichun Luo, Zipeng Qin, Shaoqing Lu, Anya Jia, Linqi Song, Mingjie Zhan, Hongsheng Li:
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification. ICLR 2024
- [c180]Dongyang Liu, Renrui Zhang, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Yu Qiao, Hongsheng Li, Peng Gao:
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models. ICML 2024
- [c179]Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li:
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models. ICML 2024
- [c178]Tao Ma, Zhiwei Zheng, Hongbin Zhou, Xinyu Cai, Xuemeng Yang, Yikang Li, Botian Shi, Hongsheng Li:
VeloVox: A Low-Cost and Accurate 4D Object Detector with Single-Frame Point Cloud of Livox LiDAR. ICRA 2024: 1992-1998
- [c177]Xiaoyu Shi, Zhaoyang Huang, Fu-Yun Wang, Weikang Bian, Dasong Li, Yi Zhang, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li:
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling. SIGGRAPH (Conference Paper Track) 2024: 111
- [c176]Fu-Yun Wang, Zhaoyang Huang, Weikang Bian, Xiaoyu Shi, Keqiang Sun, Guanglu Song, Yu Liu, Hongsheng Li:
AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data. SIGGRAPH Asia Technical Communications 2024: 23:1-23:5
- [i266]Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai:
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications. CoRR abs/2401.06197 (2024)
- [i265]Changyao Tian, Xizhou Zhu, Yuwen Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Yuntao Chen, Lewei Lu, Tong Lu, Jie Zhou, Hongsheng Li, Yu Qiao, Jifeng Dai:
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer. CoRR abs/2401.10208 (2024)
- [i264]Xiaoyu Shi, Zhaoyang Huang, Fu-Yun Wang, Weikang Bian, Dasong Li, Yi Zhang, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li:
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling. CoRR abs/2401.15977 (2024)
- [i263]Fu-Yun Wang, Zhaoyang Huang, Xiaoyu Shi, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li:
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning. CoRR abs/2402.00769 (2024)
- [i262]Peng Gao, Renrui Zhang, Chris Liu, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao:
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models. CoRR abs/2402.05935 (2024)
- [i261]Xudong Lu, Qi Liu, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo Zhang, Junchi Yan, Hongsheng Li:
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models. CoRR abs/2402.14800 (2024)
- [i260]Ke Wang, Junting Pan, Weikang Shi, Zimu Lu, Mingjie Zhan, Hongsheng Li:
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset. CoRR abs/2402.14804 (2024)
- [i259]Zimu Lu, Aojun Zhou, Houxing Ren, Ke Wang, Weikang Shi, Junting Pan, Mingjie Zhan, Hongsheng Li:
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs. CoRR abs/2402.16352 (2024)
- [i258]Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang:
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures. CoRR abs/2403.02308 (2024)
- [i257]Haiyang Wang, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, Liwei Wang:
GiT: Towards Generalist Vision Transformer through Universal Language Interface. CoRR abs/2403.09394 (2024)
- [i256]Siyuan Huang, Iaroslav Ponomarenko, Zhengkai Jiang, Xiaoqi Li, Xiaobin Hu, Peng Gao, Hongsheng Li, Hao Dong:
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models. CoRR abs/2403.11289 (2024)
- [i255]Yang Zhou, Hao Shao, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu:
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction. CoRR abs/2403.11492 (2024)
- [i254]Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li:
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis. CoRR abs/2403.12963 (2024)
- [i253]Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li:
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation. CoRR abs/2403.13745 (2024)
- [i252]Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li:
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? CoRR abs/2403.14624 (2024)
- [i251]Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li:
Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models. CoRR abs/2403.16999 (2024)
- [i250]Sicheng Li, Keqiang Sun, Zhixin Lai, Xiaoshi Wu, Feng Qiu, Haoran Xie, Kazunori Miyata, Hongsheng Li:
ECNet: Effective Controllable Text-to-Image Diffusion Models. CoRR abs/2403.18417 (2024)
- [i249]Weifeng Lin, Xinyu Wei, Ruichuan An, Peng Gao, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Hongsheng Li:
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want. CoRR abs/2403.20271 (2024)
- [i248]Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li:
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching. CoRR abs/2404.03653 (2024)
- [i247]Fan Lu, Kwan-Yee Lin, Yan Xu, Hongsheng Li, Guang Chen, Changjun Jiang:
Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior. CoRR abs/2404.06780 (2024)
- [i246]Jihao Liu, Jinliang Zheng, Yu Liu, Hongsheng Li:
GLID: Pre-training a Generalist Encoder-Decoder Vision Model. CoRR abs/2404.07603 (2024)
- [i245]Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu:
MoVA: Adapting Mixture of Vision Experts to Multimodal Context. CoRR abs/2404.13046 (2024)
- [i244]Xiaoshi Wu, Yiming Hao, Manyuan Zhang, Keqiang Sun, Zhaoyang Huang, Guanglu Song, Yu Liu, Hongsheng Li:
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models. CoRR abs/2405.00760 (2024)
- [i243]Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li:
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers. CoRR abs/2405.05945 (2024)
- [i242]Xudong Lu, Aojun Zhou, Ziyi Lin, Qi Liu, Yuhui Xu, Renrui Zhang, Yafei Wen, Shuai Ren, Peng Gao, Junchi Yan, Hongsheng Li:
TerDiT: Ternary Diffusion Models with Transformers. CoRR abs/2405.14854 (2024)
- [i241]Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li:
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models. CoRR abs/2405.16057 (2024)
- [i240]Houxing Ren, Mingjie Zhan, Zhongyuan Wu, Aojun Zhou, Junting Pan, Hongsheng Li:
ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation. CoRR abs/2405.17057 (2024)
- [i239]Houxing Ren, Mingjie Zhan, Zhongyuan Wu, Hongsheng Li:
Empowering Character-level Text Infilling by Eliminating Sub-Tokens. CoRR abs/2405.17103 (2024)
- [i238]Fu-Yun Wang, Zhaoyang Huang, Alexander William Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li, Xiaogang Wang:
Phased Consistency Model. CoRR abs/2405.18407 (2024)
- [i237]Jihao Liu, Jinliang Zheng, Boxiao Liu, Yu Liu, Hongsheng Li:
Enhancing Vision-Language Model with Unmasked Token Alignment. CoRR abs/2405.19009 (2024)
- [i236]Chenxin Tao, Xizhou Zhu, Shiqian Su, Lewei Lu, Changyao Tian, Xuan Luo, Gao Huang, Hongsheng Li, Yu Qiao, Jie Zhou, Jifeng Dai:
Learning 1D Causal Visual Representation with De-focus Attention Networks. CoRR abs/2406.04342 (2024)
- [i235]Siyuan Huang, Haonan Chang, Yuhan Liu, Yimeng Zhu, Hao Dong, Peng Gao, Abdeslam Boularias, Hongsheng Li:
A3VLM: Actionable Articulation-Aware Vision Language Model. CoRR abs/2406.07549 (2024)
- [i234]Yuan Pu, Yazhe Niu, Jiyuan Ren, Zhenjie Yang, Hongsheng Li, Yu Liu:
UniZero: Generalized and Efficient Planning with Scalable Latent World Models. CoRR abs/2406.10667 (2024)
- [i233]Bingqi Ma, Zhuofan Zong, Guanglu Song, Hongsheng Li, Yu Liu:
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models. CoRR abs/2406.11831 (2024)
- [i232]Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao:
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT. CoRR abs/2406.18583 (2024)
- [i231]Zimu Lu, Aojun Zhou, Ke Wang, Houxing Ren, Weikang Shi, Junting Pan, Mingjie Zhan, Hongsheng Li:
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning. CoRR abs/2407.00782 (2024)
- [i230]Renrui Zhang, Xinyu Wei, Dongzhi Jiang, Yichi Zhang, Ziyu Guo, Chengzhuo Tong, Jiaming Liu, Aojun Zhou, Bin Wei, Shanghang Zhang, Peng Gao, Hongsheng Li:
MAVIS: Mathematical Visual Instruction Tuning. CoRR abs/2407.08739 (2024)
- [i229]Yuxiang Chai, Siyuan Huang, Yazhe Niu, Han Xiao, Liang Liu, Dingyu Zhang, Peng Gao, Shuai Ren, Hongsheng Li:
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents. CoRR abs/2407.17490 (2024)
- [i228]Dongyang Liu, Shitian Zhao, Le Zhuo, Weifeng Lin, Yu Qiao, Hongsheng Li, Peng Gao:
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining. CoRR abs/2408.02657 (2024)
- [i227]Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Bolin Li, Zhelun Yu, Si Liu, Hongsheng Li, Hao Jiang:
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation. CoRR abs/2408.15881 (2024)
- [i226]Dongzhi Jiang, Renrui Zhang, Ziyu Guo, Yanmin Wu, Jiayi Lei, Pengshuo Qiu, Pan Lu, Zehui Chen, Guanglu Song, Peng Gao, Yu Liu, Chunyuan Li, Hongsheng Li:
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines. CoRR abs/2409.12959 (2024)
- [i225]Weifeng Lin, Xinyu Wei, Renrui Zhang, Le Zhuo, Shitian Zhao, Siyuan Huang, Junlin Xi, Yu Qiao, Peng Gao, Hongsheng Li:
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions. CoRR abs/2409.15278 (2024)
- [i224]Xin Li, Siyuan Huang, Qiaojun Yu, Zhengkai Jiang, Ce Hao, Yimeng Zhu, Hongsheng Li, Peng Gao, Cewu Lu:
SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation. CoRR abs/2409.18082 (2024)
- [i223]Lijian Xu, Hao Sun, Ziyu Ni, Hongsheng Li, Shaoting Zhang:
MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation. CoRR abs/2409.19684 (2024)
- [i222]Qiaojun Yu, Siyuan Huang, Xibin Yuan, Zhengkai Jiang, Ce Hao, Xin Li, Haonan Chang, Junbo Wang, Liu Liu, Hongsheng Li, Peng Gao, Cewu Lu:
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models. CoRR abs/2409.20551 (2024)
- [i221]Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, Xiaojuan Qi:
MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More. CoRR abs/2410.06270 (2024)
- [i220]Xiangyu Wang, Donglin Yang, Ziqin Wang, Hohin Kwan, Jinyu Chen, Wenjun Wu, Hongsheng Li, Yue Liao, Si Liu:
Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology. CoRR abs/2410.07087 (2024)
- [i219]Ruoyi Du, Dongyang Liu, Le Zhuo, Qin Qi, Hongsheng Li, Zhanyu Ma, Peng Gao:
I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow. CoRR abs/2410.07536 (2024)
- [i218]Guankun Wang, Han Xiao, Huxin Gao, Renrui Zhang, Long Bai, Xiaoxiao Yang, Zhen Li, Hongsheng Li, Hongliang Ren:
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection. CoRR abs/2410.07540 (2024)
- [i217]Yang Zhou, Hao Shao, Letian Wang, Steven L. Waslander, Hongsheng Li, Yu Liu:
SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction. CoRR abs/2410.08669 (2024)
- [i216]Lijian Xu, Ziyu Ni, Hao Sun, Hongsheng Li, Shaoting Zhang:
A foundation model for generalizable disease diagnosis in chest X-ray images. CoRR abs/2410.08861 (2024)
- [i215]Rongyao Fang, Chengqi Duan, Kun Wang, Hao Li, Hao Tian, Xingyu Zeng, Rui Zhao, Jifeng Dai, Hongsheng Li, Xihui Liu:
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation. CoRR abs/2410.13861 (2024)
- [i214]Yijin Li, Yichen Shen, Zhaoyang Huang, Shuo Chen, Weikang Bian, Xiaoyu Shi, Fu-Yun Wang, Keqiang Sun, Hujun Bao, Zhaopeng Cui, Guofeng Zhang, Hongsheng Li:
BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events. CoRR abs/2410.20451 (2024)
- 2023
- [j47]Changjuan Tao, Difei Gu, Rui Huang, Ling Zhou, Zhiqiang Hu, Yuanyuan Chen, Xiaofan Zhang, Hongsheng Li:
Hippocampus segmentation after brain tumor resection via postoperative region synthesis. BMC Medical Imaging 23(1): 142 (2023)
- [j46]Shaoshuai Shi, Li Jiang, Jiajun Deng, Zhe Wang, Chaoxu Guo, Jianping Shi, Xiaogang Wang, Hongsheng Li:
PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection. Int. J. Comput. Vis. 131(2): 531-551 (2023)
- [j45]Jiageng Mao, Shaoshuai Shi, Xiaogang Wang, Hongsheng Li:
3D Object Detection for Autonomous Driving: A Comprehensive Survey. Int. J. Comput. Vis. 131(8): 1909-1963 (2023)
- [j44]Peipei Zhao, Qiguang Miao, Hongsheng Li, Ruyi Liu, Yi-Ning Quan, Jianfeng Song:
Refined probability distribution module for fine-grained visual categorization. Neurocomputing 518: 533-544 (2023)
- [j43]Xianying He, Jiahui Li, Fang Yan, Linlin Wang, Wen Chen, Xiaodi Huang, Zhiqiang Hu, Qi Duan, Hongsheng Li, Shaoting Zhang, Jie Zhao:
Predicting cancer outcomes from whole slide images via hybrid supervision learning. Neurocomputing 557: 126736 (2023)
- [j42]Jihan Yang, Shaoshuai Shi, Zhe Wang, Hongsheng Li, Xiaojuan Qi:
ST3D++: Denoised Self-Training for Unsupervised Domain Adaptation on 3D Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 45(5): 6354-6371 (2023)
- [j41]Jianbo Liu, Junjun He, Yuanjie Zheng, Shuai Yi, Xiaogang Wang, Hongsheng Li:
A Holistically-Guided Decoder for Deep Representation Learning With Applications to Semantic Segmentation and Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 45(10): 11390-11406 (2023)
- [j40]Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao:
UniFormer: Unifying Convolution and Self-Attention for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 45(10): 12581-12600 (2023)
- [j39]Linjiang Huang, Kaixin Lu, Guanglu Song, Liang Wang, Si Liu, Yu Liu, Hongsheng Li:
Teach-DETR: Better Training DETR With Teachers. IEEE Trans. Pattern Anal. Mach. Intell. 45(12): 15759-15771 (2023)
- [c175]Xiaoyu Shi, Zhaoyang Huang, Dasong Li, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li:
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation. CVPR 2023: 1599-1610
- [c174]Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai:
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks. CVPR 2023: 2691-2700
- [c173]Renrui Zhang, Liuhui Wang, Yali Wang, Peng Gao, Hongsheng Li, Jianbo Shi:
Starting from Non-Parametric Networks for 3D Point Cloud Analysis. CVPR 2023: 5344-5353
- [c172]Jihao Liu, Xin Huang, Jinliang Zheng, Yu Liu, Hongsheng Li:
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers. CVPR 2023: 6252-6261
- [c171]Xiaoshi Wu, Feng Zhu, Rui Zhao, Hongsheng Li:
CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching. CVPR 2023: 7031-7040
- [c170]