


33rd MM 2025: Dublin, Ireland
- Cathal Gurrin, Klaus Schoeffmann, Min Zhang, Luca Rossetto, Stevan Rudinac, Duc-Tien Dang-Nguyen, Wen-Huang Cheng, Phoebe Chen, Jenny Benois-Pineau:

Proceedings of the 33rd ACM International Conference on Multimedia, MM 2025, Dublin, Ireland, October 27-31, 2025. ACM 2025, ISBN 979-8-4007-2035-2
Keynote Talks
- Shalini De Mello:

AI-Mediated Human Interaction. 1 - Tat-Seng Chua:

Next Phase of Research on Multimodal Foundation Models: From Alignments to Content Generation and Quality Assessment. 2 - Steve Hodges:

SenseCam and Isotyping: The Challenges and Benefits of Working with New Hardware. 3-4
Content: Media Interpretation
- Haolun Li, Weihuang Liu, Jiateng Liu, Zhenhua Tang, Chi-Man Pun, Qiguang Miao, Feng Xu, Hao Gao:

MotionRefineNet: Fine-Grained Pose Sequence Smoothing and Refinement. 5-14 - Mo Yang, Luo Chen, Jiali Zhou:

Change-UP: Advancing Visualization and Inference Capability for Multi-level Remote Sensing Change Interpretation. 15-24 - Yuxiang Zhao, Wei Huang, Haipeng Zeng, Huan Zhao, Yujie Song:

Cross Time Domain Intention Interaction for Conditional Trajectory Prediction. 25-33 - Ye-Chan Kim, SeungJu Cha, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim:

SIDA: Synthetic Image Driven Zero-shot Domain Adaptation. 34-42 - Han Hu, Wenli Du, Bing Wang:

Efficient Video Anomaly Detection via Scene-Dependent Memory Assisted Inter-Frame RGB Difference Reconstruction. 43-51 - Hyungjun Doh, Dong In Lee, Seunggeun Chi, Pin-Hao Huang, Kwonjoon Lee, Sangpil Kim, Karthik Ramani:

Occlusion-Aware Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction. 52-61 - Guoyi Li, Die Hu, Haozhe Li, Qirui Tang, Xiaomeng Fu, Yulei Wu, Xiaodan Zhang, Honglei Lyu:

Zero-Shot Multimodal Fact-Checking with Conceptual Reasoning. 62-71 - Junyu Zhou, Yuyang Huang, Wenrui Dai, Junni Zou, Ziyang Zheng, Nuowen Kan, Chenglin Li, Hongkai Xiong:

3DGabSplat: 3D Gabor Splatting for Frequency-adaptive Radiance Field Rendering. 72-81 - Songze Li, Yunfei Guo, Shen Chen, Bin Li, Kaiqing Lin, Changsheng Chen, Haodong Li, Taiping Yao, Shouhong Ding:

DITL2: Dual-Stage Invariance Transfer Learning for Generalizable Document Image Tampering Localization. 82-91 - Rouqi Zhang, Chengdi Lu, Hancheng Lu, Yang Cao, Tiesong Zhao:

RobustVisH: Robust Visual-Haptic Cross-Modal Recognition under Transmission Interference. 92-100 - Zhangchi Hu, Peixi Wu, Jie Chen, Huyue Zhu, Yijun Wang, Yansong Peng, Hebei Li, Xiaoyan Sun:

Dome-DETR: DETR with Density-Oriented Feature-Query Manipulation for Efficient Tiny Object Detection. 101-110 - Xiaojian Lin, Wenxin Zhang, Yuchu Jiang, Wangyu Wu, Yiran Guo, Kangxu Wang, Zongzheng Zhang, Guijin Wang, Lei Jin, Hao Zhao:

Butter: Frequency Consistency and Hierarchical Fusion for Autonomous Driving Object Detection. 111-120 - Xinkui Lin, Yongxiu Xu, Minghao Tang, Shilong Zhang, Hongbo Xu, Hao Xu, Yubin Wang:

REMOTE: A Unified Multimodal Relation Extraction Framework with Multilevel Optimal Transport and Mixture-of-Experts. 121-130 - Xiaoran Xu, Jiangang Yang, Wenyue Chong, Wenhui Shi, Shichu Sun, Jing Xing, Jian Liu:

Boosting Single-Domain Generalized Object Detection via Vision-Language Knowledge Interaction. 131-140 - Shaohua Liu, Ning Gao, Zuoya Gu, Hongkun Dou, Yue Deng, Hongjue Li:

Spatiotemporal Degradation-Aware 3D Gaussian Splatting for Realistic Underwater Scene Reconstruction. 141-150 - Tianyi Ma, Maoying Qiao:

EBaR: Efficient Buffer and Resetting for Single-Sample Continual Test-Time Adaptation. 151-160 - Wenzhe He, Xiaojun Chen, Wentang Chen, Hongyu Wang, Ying Liu, Ruihui Li:

RWKV-PCSSC: Exploring RWKV Model for Point Cloud Semantic Scene Completion. 161-170 - Ruian He, Zixian Zhang, Ri Cheng, Weimin Tan, Bo Yan:

Efficient Trajectory Space-Time Super-Resolution for Fast Live-cell Imaging. 171-179 - Hongzhao Li, Hualei Wan, Liangzhi Zhang, Mingyuan Jiu, Shupan Li, Mingliang Xu, Muhammad Haris Khan:

Towards Robust Multimodal Domain Generalization via Modality-Domain Joint Adversarial Training. 180-188 - Hongda Qin, Xiao Lu, Zhiyong Wei, Ningjiang Chen:

Object-Preserving Counterfactual Diffusion Augmentation for Single-Domain Generalized Object Detection. 189-198 - Yidong Chen, Qi Li, Yuyang Yang, Wen Li, Sheng Ao, Cheng Wang:

Unleashing the Power of Data Generation in One-Pass Outdoor LiDAR Localization. 199-208 - Wenli Zheng, Huiyuan Fu, Xicong Wang, Hao Kang, Chuanming Wang, Jin Liu, Zekai Xu, Heng Zhang, Huadong Ma:

EvRAW: Event-guided Structural and Color Modeling for RAW-to-sRGB Image Reconstruction. 209-218 - Zhaoxi Mu, Rilin Chen, Andong Li, Meng Yu, Xinyu Yang, Dong Yu:

From Continuous to Discrete: Cross-Domain Collaborative General Speech Enhancement via Hierarchical Language Models. 219-228 - Jin Han, Yixin Yang, Zhan Zhan, Boxin Shi, Imari Sato:

EDeF-Net: Spatio-temporal Association Network for Flicker Removal in Event Streams. 229-237 - Jinxiang Lai, Wenlong Wu, Jiawei Zhan, Jian Li, Bin-Bin Gao, Jun Liu, Jie Zhang, Song Guo:

BoxSeg: Quality-Aware and Peer-Assisted Learning for Box-supervised Instance Segmentation. 238-246 - Jiaxu Li, Rui Li, Jianyu Qi, Songning Lai, Linpu Lv, Kejia Fan, Jianheng Tang, Yutao Yue, Dongzhan Zhou, Yunhuai Liu, Huiping Zhuang:

CFSSeg: Closed-Form Solution for Class-Incremental Semantic Segmentation of 2D Images and 3D Point Clouds. 247-256 - Trong-Thang Pham, Anh Nguyen, Zhigang Deng, Carol C. Wu, Hien Nguyen, Ngan Le:

Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis. 257-266 - Mingliang Zhai, Yiheng Wang, Haidong Hu, Chi-Man Pun, Hao Gao:

FGRFlow: Learning Fine-Grained Rigidity Scene Flow from 4D Radar Point Cloud. 267-276 - Xiaoyu Zhang, Zhifeng Bao, Hai Dong, Ziwei Wang, Jiajun Liu:

Querying Autonomous Vehicle Point Clouds: Enhanced by 3D Object Counting with CounterNet. 277-285 - Guiping Cao, Xiangyuan Lan, Wenjian Huang, Jianguo Zhang, Dongmei Jiang, Yaowei Wang:

DS-Det: Single-Query Paradigm and Attention Disentangled Learning for Flexible Object Detection. 286-295 - Zhen Wang, Dongyuan Li, Yaozu Wu, Peide Zhu, Shiyin Tan, Renhe Jiang:

Video-based Transparent Object Segmentation via Temporal Feature Aggregation. 296-304 - Haosheng Cai, Yang Xue:

G2LFormer: Global-to-Local Query Enhancement for Robust Table Structure Recognition. 305-314 - Xinyi Hu, Yuran Wang, Ruixu Zhang, Yue Li, Wenxuan Liu, Zheng Wang:

SPAN: Continuous Modeling of Suspicion Progression for Temporal Intention Localization. 315-323 - Tianyi Zhang, Qinglong Lin, Yang Hu, Pengming Feng, Rubo Zhang:

Edge-aware Affinity Enhancement for Image Manipulation Localization. 324-332 - Kanglin Qu, Pan Gao, Qun Dai, Yuanhao Sun:

HydraMamba: Multi-Head State Space Model for Global Point Cloud Learning. 333-342 - Runmin Cong, Zongji Yu, Hao Fang, Haoyan Sun, Sam Kwong:

UIS-Mamba: Exploring Mamba for Underwater Instance Segmentation via Dynamic Tree Scan and Hidden State Weaken. 343-352 - Kuo Shi, Jie Lu, Shanshan Ye, Guangquan Zhang, Zhen Fang:

MiraGe: Multimodal Discriminative Representation Learning for Generalizable AI-Generated Image Detection. 353-361 - Runtian Yuan, Mohan Chen, Jilan Xu, Ling Zhou, Qingqiu Li, Yuejie Zhang, Rui Feng, Tao Zhang, Shang Gao:

Text-Promptable Propagation for Referring Medical Image Sequence Segmentation. 362-371 - Dunwei Tu, Huiyu Yi, Yuchi Wang, Baile Xu, Jian Zhao, Furao Shen:

Multiple Queries with Multiple Keys: A Precise Prompt Matching Paradigm for Prompt-based Continual Learning. 372-381 - Zihou Zhang, Hao Li, Zhengwei Yang, Zechao Hu, Liang Li, Zheng Wang:

From Language to Instance: Generative Visual Prompting for Zero-shot Camouflaged Object Detection. 382-391 - Chen Cai, Tianyi Liu, Jianjun Gao, Wenyang Liu, Kejun Wu, Ruoyu Wang, Yi Wang, Soo Chin Liew:

From Semantics, Scene to Instance-awareness: Distilling Foundation Model for Open-vocabulary Grounded Situation Recognition. 392-401 - Hanyu Guo, Suzhou Que, Junlong Gao, Hanzi Wang:

TFPA: Text Features Guided Dynamic Parameter Adjustment for Few Shot Action Recognition. 402-411 - Jitong Liao, Yulu Gao, Shaofei Huang, Jialin Gao, Jie Lei, Ronghua Liang, Si Liu:

DOMR: Establishing Cross-View Segmentation via Dense Object Matching. 412-421 - Yue Guo, Haoxiang Liao, Haibin Ling, Bingyao Huang:

NeuroPump: Simultaneous Geometric and Color Rectification for Underwater Images. 422-431 - Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Lei Liang, Wen Zhang, Huajun Chen:

Client-Server Co-design with Multi-modal Codebooks Makes Better and Faster Federate Knowledge Sharing. 432-440 - Bo Wang, Jin Liu, Huiyuan Fu, Xin Wang, Heng Zhang, Huadong Ma:

Severe Light, Textureless Sight: A Benchmark for Extreme Exposure Correction. 441-449 - Zhicheng Lian, Lizhi Wang, Hua Huang:

APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech. 450-459 - Zhaoyu Chen, Qian Huang, Xing Li, Yunfei Zhang, Shihao Han, Ge Gao, Yirui Wu, Xin Li, Ziyang Yin:

Geo-CF2Net: Geometry-Prior Cross-Frequency Interactive Fusion Network for 3D Human Action Recognition. 460-469 - Naisong Luo, Yuan Wang, Yuwen Pan, Rui Sun:

Focus on the Object: Gradient-based Feature Modulation for Camouflaged Object Segmentation. 470-478 - Liuyi Li, Feng Shi, Jian Wang, Jinjing Zhu, Wenze Shao:

An Event-tailored State-Space Based Model for Pedestrian Detection. 479-488 - Zhihong Zheng, Yang Cao, Junlong Gao, Hanzi Wang:

OV-VOD: Open-Vocabulary Video Object Detection. 489-498 - Yin Wang, Zixuan Wang, Hao Lu, Zhen Qin, Hailiang Zhao, Guanjie Cheng, Xin Du, Ge Su, Li Kuang, Mengchu Zhou, Shuiguang Deng:

SeMi: When Imbalanced Semi-Supervised Learning Meets Mining Hard Examples. 499-507 - Kuiye Ding, Fanda Fan, Yao Wang, Ruijie Jian, Xiaorui Wang, Luqi Gong, Yishan Jiang, Chunjie Luo, Jianfeng Zhan:

DualSG: A Dual-Stream Explicit Semantic-Guided Multivariate Time Series Forecasting Framework. 508-517 - Quanmin Liang, Jinyi Lu, Qiang Li, Shuai Liu, Zhihao Zhao, Yinzheng Zhao, Wei Zhang, Kai Huang, Yonghong Tian:

ESOD: Event-Based Small Object Detection. 518-527 - Michael Kohl, Tobias Wursthorn, Christof Weiß:

Cross-Modal Metrics for Capturing Correspondences Between Music Audio and Stage Lighting Signals. 528-534 - Yingbing Liu, Fei Ma, Yanan Wu, Xinxin Zuo, Fan Zhang, Yang Wang:

Collaborative Cloud-edge Generalized Category Discovery. 535-543 - Ping Li, Chenhao Ping, Wenxiao Wang, Mingli Song:

Sample-level Adaptive Knowledge Distillation for Action Recognition. 544-552 - Jiale Yu, Baopeng Zhang, Zhu Teng, Jianping Fan:

OV-DAVEL: Towards Open-Vocabulary Dense Audio-Visual Event Localization in Untrimmed Videos. 553-562 - Jie Fu, Bingkun Bao:

Retaining Temporal Semantics and Relation Topologies for Continual Weakly-Supervised Audio-Visual Video Parsing. 563-572 - Xiaofeng Liu, Guanchen Meng, Chongyang Feng, Risheng Liu, Zhongxuan Luo, Xin Fan:

TNT-GS: Truncated and Tailored Gaussian Splatting. 573-581 - Pengfei Cai, Yan Song, Qing Gu, Nan Jiang, Haoyu Song, Ian McLoughlin:

Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries. 582-591 - Zhaolin Cai, Fan Li, Ziwei Zheng, Yanjun Qin:

HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs. 592-601 - Guanchun Wang, Xiangrong Zhang, Yifei Zhang, Zelin Peng, Tianyang Zhang, Xu Tang, Licheng Jiao:

ACMamba: Fast Unsupervised Anomaly Detection via An Asymmetrical Consensus State Space Model. 602-611 - Jian Zhou, Yingjie Xie, Cunhang Fan, Huabin Wang, Zhao Lv, Liang Tao:

DHGCN: Dual HyperGraph Convolutional Network for EEG-Based Auditory Attention Detection. 612-620 - Peiqi Jiang, Bohan Lei, Yuhao Sun, Lingyun Yu, Zhineng Chen, Hongtao Xie, Yongdong Zhang:

Proactive Deepfake Detection via Self-Verifiable Semantic Watermarking. 621-630 - Yuzhen Li, Yuehui Han, Jianjun Qian, Jian Yang:

Self-Supervised Vision Graph Neural Networks Based on Contrastive Learning. 631-640 - Luosheng Xu, Dalin Zhang, Zhaohui Song:

Pushing Trade-Off Boundaries: Compact yet Effective Remote Sensing Change Detection. 641-649 - Chenglong Sun, Shijie Pang, Yuzheng Wang, Lizhe Qi:

RWKV3D: An RWKV-Based Model with Multiple Training Strategies for Point Cloud Analysis. 650-659 - Jinghan Liu, Xingmei Wang, Jiaxiang Meng:

Adaspeaker: Learning Discriminative Speaker Representations with Gradient-Aware Adaptive Scaling. 660-668 - Wenpeng Lang, Saihui Hou, Yongzhen Huang:

Beyond Sparse Keypoints: Dense Pose Modeling for Robust Gait Recognition. 669-678 - Jinwen Wang, Youfang Lin, Xiaobo Hu, Siyu Yang, Sheng Han, Shuo Wang, Kai Lv:

From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training. 679-688 - Yaoxun Xu, Hangting Chen, Jianwei Yu, Wei Tan, Shun Lei, Zhiwei Lin, Rongzhi Gu, Zhiyong Wu:

MuCodec: Ultra Low-Bitrate Music Codec for Music Generation. 689-698 - Chi Huang, Qi Zhang, Qian Zhang, Nan Li, Yipu Gong, Xiaowei Wang, Wei Feng:

TriGS: Tri-consistency 3D Gaussian Splatting from Sparse and Unposed Views. 699-708 - Xuedong He, Huiying Xu, Xinzhong Zhu, Hongbo Li:

High-Performance Discriminative Tracking with Spatio-Temporal Template Fusion. 709-718 - Jingdong Zhang, Hanrong Ye, Xin Li, Wenping Wang, Dan Xu:

Multi-Task Label Discovery via Hierarchical Task Tokens for Partially Annotated Dense Predictions. 719-728 - Jiaxi Wang, Yaosen Min, Xun Zhu, Miao Li, Ji Wu:

MIPS: A Multimodal Infinite Polymer Sequence Pre-training Framework for Polymer Property Prediction. 729-738 - Yuxuan Zhang, Bo Wang, Yu Du, Yangfu Zhu, Haorui Wang, Guangyao Su, Tao Zhou, Bin Wu:

Cause and Effect: Video Social Relationship Recognition from Causal Perspective. 739-747 - Mashiro Toyooka, Kiyoharu Aizawa, Yoko Yamakata:

A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task. 748-756 - Guitao Xu, Ziqi Yi, Peirong Zhang, Jiahuan Cao, Shihang Wu, Lianwen Jin:

From Pixels to Semantics: A Novel MLLM-Driven Approach for Explainable Tampered Text Detection. 757-766 - Yifan Wang, Yuntai Ding, Yiyang Gu, Ziyue Qiao, Chong Chen, Xian-Sheng Hua, Ming Zhang, Wei Ju:

Deep Graph Clustering with Disentangled Representation Learning. 767-776 - Han Li, Shaofei Huang, Longfei Xu, Yulu Gao, Beipeng Mu, Si Liu:

RATopo: Improving Lane Topology Reasoning via Redundancy Assignment. 777-786 - Sensen Wang, Yuehu Liu, Chi Zhang:

BiOMamba: Mamba-based Forward-Then-Backward Temporal Modeling for Online Action Detection and Anticipation. 787-795 - Xiangyu Zheng, Songcheng He, Wanyun Li, Xiaoqiang Li, Wei Zhang:

Shallow Features Matter: Hierarchical Memory with Heterogeneous Interaction for Unsupervised Video Object Segmentation. 796-805 - Xiaobo Liu, Henglu Wei, Chuxi Yang, Wei Yu, Xudong Zhao, Xiangyang Ji:

Camera-Specific Imaging Simulation for Raw Domain Image Super Resolution. 806-815 - Zongsheng Cao, Yangfan He, Anran Liu, Jun Xie, Zhepeng Wang, Feng Chen:

PurifyGen: A Risk-Discrimination and Semantic-Purification Model for Safe Text-to-Image Generation. 816-825 - Haonan Cheng, Junwei Zhang, Hengyan Huang, Long Ye:

FG-Midiformer: A Symbolic Music Understanding Model towards Fine-Grained Learning of Multi-Attributes. 826-835 - Yiran Meng, Junhong Ye, Wei Zhou, Guanghui Yue, Xudong Mao, Ruomei Wang, Baoquan Zhao:

VideoForest: Person-Anchored Hierarchical Reasoning for Cross-Video Question Answering. 836-845 - Guorui Song, Guocun Wang, Zhe Huang, Jing Lin, Xuefei Zhe, Jian Li, Haoqian Wang:

Towards Fine-Grained Human Motion Video Captioning. 846-855
Content: Multimodal Fusion
- Junpu Zhang, Shengju Yu, Suyuan Liu, Siwei Wang, Miaomiao Li, Xinwang Liu, En Zhu, Kunlun He:

Learning the Anchors with Similar Distributions to Original Data for Multi-view Clustering. 857-866 - Fengshun Wang, Qiurui Wang, Peilin Zhao:

Learning Long-Range Action Representation by Two-Stream Mamba Pyramid Network for Figure Skating Assessment. 867-875 - Yan Zhang, Gangyan Zeng, Daiqing Wu, Huawen Shen, Binbin Li, Yu Zhou, Can Ma, Xiaojun Bi:

Gather and Trace: Rethinking Video TextVQA from an Instance-oriented Perspective. 876-885 - Hui Zhang, Yiteng Xu, Yonglin Tian, Yidong Li, Tiago H. Falk, Fei-Yue Wang:

Selective Shift: Towards Personalized Domain Adaptation in Multi-Agent Collaborative Perception. 886-895 - Mingqian Ji, Jian Yang, Shanshan Zhang:

Enhancing Pseudo-Boxes via Data-Level LiDAR-Camera Fusion for Unsupervised 3D Object Detection. 896-904 - Gaoxiang Cong, Liang Li, Jiadong Pan, Zhedong Zhang, Amin Beheshti, Anton van den Hengel, Yuankai Qi, Qingming Huang:

FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing. 905-914 - Wenhui Wu, Guanqi Wen, Le Ou-Yang, Ran Wang, Sam Kwong:

DUIMC: Deep Unbalanced Incomplete Multi-View Clustering via Graph Constrained Imputation and Contrastive Learning. 915-924 - Hao Wang, Xiaobao Wei, Xiaoan Zhang, Jianing Li, Chengyu Bai, Ying Li, Ming Lu, Wenzhao Zheng, Shanghang Zhang:

EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler. 925-934 - Zhongfan Sun, Kan Guo, Yongli Hu, Daxin Tian, Qingqing Gao, Jiapu Wang, Junbin Gao, Yanfeng Sun, Baocai Yin:

Large-Small Model Synergy with Multimodal Fine-Grained Heuristics for Knowledge-Based Visual Question Answering. 935-944 - Peng Chen, Xiaobao Wei, Qingpo Wuwu, Xinyi Wang, Xingyu Xiao, Ming Lu:

MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussians. 945-954 - Peiyuan Jiang, Yao Liu, Qiao Liu, Zongshun Zhang, Jiaye Yang, Lu Liu, Daibing Yao:

DRKF: Decoupled Representations with Knowledge Fusion for Multimodal Emotion Recognition. 955-964 - Tao Ling, Siping Shi, Dan Wang:

Accelerating Long Video Understanding via Compressed Scene Graph-Enabled Chain-of-Thought. 965-974 - Tong Chen, Bowen Du, Jiejie Zhao, Hanyang Xia, Haiquan Wang, Jiakai Wang:

BadMDA: Towards Backdoor Injection during Domain Adaptation to Collapse Multi-Agent Perception. 975-983 - Chen Gao, Youfang Lin, Wenbin Wang, Shuo Zhang:

Epipolar Consistency-based Network for Structure-Aware LF Semantic Segmentation. 984-992 - Jia-Xuan Jiang, Jiashuai Liu, Hongtao Wu, Yifeng Wu, Zhong Wang, Qi Bi, Yefeng Zheng:

Single Domain Generalization for Multimodal Cross-Cancer Prognosis via Dirac Rebalancer and Distribution Entanglement. 993-1002 - Yi Liu, Xinyi Liu, Yi Wan, Panwang Xia, Qiong Wu, Yongjun Zhang:

StereoINR: Cross-View Geometry Consistent Stereo Super Resolution with Implicit Neural Representation. 1003-1012 - Lanhu Wu, Zilin Gao, Hao Fei, Mong-Li Lee, Wynne Hsu:

LEAF-Mamba: Local Emphatic and Adaptive Fusion State Space Model for RGB-D Salient Object Detection. 1013-1022 - Min Li, Jinghui He, Jiachen Li, Delong Han, Jin Wan, Gang Li:

HGCF: Hierarchical Geometry-Color Fusion for Multimodal Industrial Anomaly Detection. 1023-1031 - Qiyuan Zhu, Lujun Li, Dezhi Li, Jiacheng Liu, Pengyu Cheng, Yucheng Xu, Sirui Han, Yike Guo:

Outlier-Aware Model Merging for Efficient Multitask Inference. 1032-1041 - Zhenyang Liu, Sixiao Zheng, Siyu Chen, Cairong Zhao, Longfei Liang, Xiangyang Xue, Yanwei Fu:

A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding. 1042-1051 - Jinbao Wei, Yuhang Chen, Zhijie Wang, Gang Yang, Shimin Tao, Jian Gao, Aiping Liu, Xun Chen:

Rethinking Diffusion Bridge Model with Dual Alignments for Medical Image Synthesis. 1052-1061 - Haichuan Fang, Haoran Zhang, Yulin Du, Qiang Guo, Zhen Tian, Youwei Wang, Yangdong Ye:

CDIB: Consistency Discovery-guided Information Bottleneck for Multi-modal Knowledge Graph Reasoning. 1062-1071 - Yalan Qin, Nan Pu, Hanzhou Wu, Zhaoxin Fan:

Flexible Multi-view Clustering with Dynamic Views Generation. 1072-1081 - Zheng Guan, Xue Wang, Wenhua Qian, Peng Liu, Runzhuo Ma:

Residual Prior-driven Frequency-aware Network for Image Fusion. 1082-1091 - Mulin Chen, Bocheng Wang, Jiaxin Zhong, Zongcheng Miao, Xuelong Li:

Clustering-Oriented Generative Attribute Graph Imputation. 1092-1101 - Taichun Zhou, Zhibin Dong, Siwei Wang, Ke Liang, Miaomiao Li, Xinwang Liu, En Zhu, Xiangjun Dong:

DPFMVC: Dynamic Progressive Fusion for Multi-view Clustering. 1102-1111 - Runlin Yu, Yipu Gong, Wenrui Li, Aiwen Sun, Mengren Zheng:

Discrepancy-Aware Attention Network for Enhanced Audio-Visual Generalized Zero-Shot Learning. 1112-1121 - Ziming Quan, Penglei Wang, Danyang Wu, Jin Xu:

Unsupervised Cross-view Message Passing Method for Multi-view Graph Clustering. 1122-1131 - Mingrui Li, Dong Li, Sijia Hu, Kangxu Wang, Zhenjun Zhao, Hongyu Wang:

SLAM-X: Generalizable Dynamic Removal for NeRF and Gaussian Splatting SLAM. 1132-1140 - Jinjia Peng, Tianhang Cheng, Guangqi Jiang, Huibing Wang:

Prior-oriented Anchor Learning with Coalesced Semantics for Multi-View Clustering. 1141-1150 - Hao Wang, Hanxiao Li, Li Xu:

CrosST: Cross Swin 4D Transformer for Multi-Modal Alzheimer's Detection. 1151-1160 - Binbin Zheng, Aiqiu Wu, Kai Fan, Ao Li, Minghui Wang:

Domain-Specific Interactive Prompting for Generalized Nuclei Classification. 1161-1170 - Shaochen Zhang, Zekun Qi, Runpei Dong, Xiuxiu Bai, Xing Wei:

Positional Prompt Tuning for Efficient 3D Representation Learning. 1171-1180 - Zhicheng Dong, Xiaodong Yue, Yufei Chen, Yuxian Zhou:

Trusted Open-World Multi-View Classification with Dynamic Opinion Aggregation. 1181-1189 - Zihan Wang, Yunhang Shen, Yuan Fang, Zuwei Long, Ke Li, Xing Sun, Jiao Xie, Shaohui Lin:

Towards Universal Perception through Language-Guided Open-World Object Detection. 1190-1199 - Junyu Chen, Jiawei Peng, Yuan Sun, Jian Dai, Xingfeng Li, Zhenwen Ren:

Scalable Unpaired Multi-View Clustering via Anchor-Driven High-Throughput Encoding. 1200-1209 - Zeyan Li, Cankun Guo, Yin Tang:

Modal Symbiosis: Variational Alignment Unveils New Horizons in Multimodal Representation Learning. 1210-1219 - Zihan Fang, Zhiyong Xu, Lan Du, Shide Du, Zhiling Cai, Shiping Wang:

Enhancing Multi-view Open-set Learning via Ambiguity Uncertainty Calibration and View-wise Debiasing. 1220-1228 - Zhangyong Tang, Tianyang Xu, Xuefeng Zhu, Chunyang Cheng, Tao Zhou, Xiaojun Wu, Josef Kittler:

Serial Over Parallel: Learning Continual Unification for Multi-Modal Visual Object Tracking and Benchmarking. 1229-1238 - Weiqi Liu, Yongshan Zhang, Xinxin Wang, Lefei Zhang:

Deep Multi-Level Contrastive Clustering for Multi-Modal Remote Sensing Images. 1239-1247 - Jiaqi Cui, Yilun Li, Xi Wu, Jiliu Zhou, Yan Wang:

PREMISE: Individual Preference-aware Multi-modal Cooperation for Survival Prediction. 1248-1257 - Jiaxing Qi, Yifan Xu, Zhifei Yang, Ruifei Ma, Chao Zhang, Kuifei Yu:

BridgeGLM: Bridging Graph and Language Spaces for Domain Generalization. 1258-1267 - Yating Liu, Yang Zou, Xingyuan Li, Xingyue Zhu, Kaiqi Han, Zhiying Jiang, Long Mau, Jinyuan Liu:

Toward a Training-Free Plug-and-Play Refinement Framework for Infrared and Visible Image Registration and Fusion. 1268-1277 - Cai Xu, Ziqi Wen, Jie Zhao, Wanqing Zhao, Jinlong Yu, Haishun Chen, Ziyu Guan, Wei Zhao:

Beyond Equal Views: Strength-Adaptive Evidential Multi-View Learning. 1278-1287 - Yoorhim Cho, Hongyeob Kim, Semin Kim, Youjia Zhang, Yunseok Choi, Sungeun Hong:

RA-Touch: Retrieval-Augmented Touch Understanding with Enriched Visual Data. 1288-1297 - Xinlei Yu, Changmiao Wang, Hui Jin, Ahmed Elazab, Gangyong Jia, Xiang Wan, Changqing Zou, Ruiquan Ge:

CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation. 1298-1307 - Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li:

StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation. 1308-1317 - Liang Zhao, Shubin Ma, Bo Xu, Qingchen Zhang:

Dual-Learning based Penalized Multi-Align Clustering for Multi-View Incomplete and Disorderly Data. 1318-1326 - Jialei Cui, Jianwei Du, Yanzhe Li, Lei Gao, Hui Jiang, Chenfu Bao:

HAMLET-FFD: Hierarchical Adaptive Multi-modal Learning Embeddings Transformation for Face Forgery Detection. 1327-1336 - Disen Hu, Xun Jiang, Zhe Sun, Hao Yang, Chong Peng, Peng Yan, Heng Tao Shen, Xing Xu:

Geometric Gradient Divergence Modulation for Imbalanced Multimodal Learning. 1337-1345 - Xuanming Jiang, Baoyi An, Zhengwei Zou, Dingyu Nie, Jialie Shen, Xueming Qian, Guoshuai Zhao:

Ear with Eye: Lightweight Multimodal Audio-Visual Network Inspired by Bionic Structures. 1346-1355 - Chengzhou Li, Xiaokang Liu, Qi Jia, Jinyuan Liu, Zhiying Jiang, Longhan Feng, Yu Liu, Zhongxuan Luo, Xin Fan:

Physics-Guided Sonar Image Fine-grained Recognition under Scarce Annotations. 1356-1365 - Mianzimei Yang, Zhipeng Zhou, Jin Zhang, Yuanhao Pu, Hong Xie, Defu Lian:

Conflict-Buffering Optimization by Symmetry Teleportation for Deep Long-Tailed Recognition. 1366-1375 - Jiahao Wang, Fang Liu, Licheng Jiao, Hao Wang, Shuo Li, Lingling Li, Puhua Chen, Xu Liu, Xinyi Wang:

FA3T: Feature-Aware Adversarial Attacks for Multi-modal Tracking. 1376-1385 - Zhiwei Zhang, Ruikai Xu, Weijian Zhang, Zhizhong Zhang, Xin Tan, Jingyu Gong, Yuan Xie, Lizhuang Ma:

PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion. 1386-1394 - Siyuan Zhang, Xiaoping Wang, Jiang Li, Weibin Feng, Xin Zhan, Hongzhi Huang:

HAFUNet: A Hierarchical Attention Fusion Network for Monocular Depth Estimation Integrating Event and Frame Data. 1395-1403 - Ronghui Li, Lingxiao Han, Shi Shu, Yueyao Liu, Yukang Lin, Yue Ma, Jie Guo, Ziwei Liu, Xiu Li:

A Motion is Worth a Hybrid Sentence: Taming Language Model for Unified Motion Generation by Fine-grained Planning. 1404-1413 - Hongyu Jiang, Yuxin Huo, Sirou Sheng, Hong Tao, Chenping Hou:

Scalable One-step Unaligned Multi-view Clustering via Joint High-Order Correlation Learning. 1414-1422 - Xiangping Zheng, Xuan Feng, Bo Wu, Bin Ren, Wei Li, Xiuxin Hao, Xun Liang, Bin Tang, Zhiwen Yu:

Breaking Semantic Barriers: A Zero-Shot Generalized Framework for Graph Anomaly Detection. 1423-1432 - Mi Zheng, Guanglei Yang, Zitong Huang, Zhenhua Guo, Kevin Han, Wangmeng Zuo:

Segmenting Objectiveness and Task-awareness Unknown Region for Autonomous Driving. 1433-1442 - Yuhao Wang, Lingjuan Miao, Zhiqiang Zhou, Lei Zhang, Yajun Qiao:

Infrared and Visible Image Fusion with Language-Driven Loss in CLIP Embedding Space. 1443-1451 - Min Dang, Gang Liu, Jingqi Zhao, Adams Wai-Kin Kong, Nan Luo, Di Wang:

DDFD: Diffusion-Based Denoising Fusion for Object Detection in Infrared-Visible Images. 1452-1461 - Jiahuan Long, Wen Yao, Tingsong Jiang, Jiacheng Hou, Shuai Jia, Junqi Wu, Xiaoya Zhang, Xiaohu Zheng, Chao Ma:

CDUPatch: Color-Driven Universal Adversarial Patch Attack for Dual-Modal Visible-Infrared Detectors. 1462-1470 - Peirong Zhang, Kai Ding, Lianwen Jin:

Capturing More: Learning Multi-Domain Representations for Robust Online Handwriting Verification. 1471-1479 - Zhenxi Wang, Zongyao Yin, Yujie Hou, Xianchuan Yu:

Robust Multi-view Clustering via Pseudo Label Guided Universum Learning. 1480-1489 - Yao Zhang, Ping Huang, Rui Zhang:

Multimodal Dual Population Evolutionary Reinforcement Learning. 1490-1499 - Bo Xu, Jie Wei, Hongya Wang, Ming Du, Hui Song, Yanghua Xiao:

Bridging the Unseen Gap: Label-Enhanced Information Bottleneck Distillation for Multimodal Named Entity Recognition. 1500-1509 - Mingle Zhou, Jiahui Liu, Jin Wan, Gang Li, Min Li:

Exploring Multimodal Prompts For Unsupervised Continuous Anomaly Detection. 1510-1519 - Hongming Wang, Yifeng Wu, Huimin Huang, Hongtao Wu, Jiaxuan Jiang, Xiaodong Zhang, Hao Zheng, Yawen Huang, Xian Wu, Yefeng Zheng, Jinping Xu, Jing Cheng:

BrainSegDMIF: A Dynamic Fusion-enhanced SAM for Brain Lesion Segmentation. 1520-1529 - Tairan Huang, Yili Wang, Qiutong Li, Changlong He, Jianliang Gao:

Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection. 1530-1538 - Naichuan Zheng, Yuchen Du, Hailun Xia, Zeyu Liang:

Signal-SGN: A Spiking Graph Convolutional Network for Skeleton Action Recognition via Learning Temporal-Frequency Dynamics. 1539-1548 - Yang Zhou, Jin Wang, Yuxiao Zhang, Kaixiang Huang, Guodong Lu, Jingru Yang, Shengfeng He:

Art4Math: Handwritten Mathematical Expression Recognition via Multimodal Sketch Grounding. 1549-1558 - Feiyu Peng, Chaobo He, Junwei Cheng, Huijuan Hu, Wenkai Zhang, Youda Mo:

Frequency-refined Graph Convolution Network with Cross-modal Wavelet Denoising for Recommendation. 1559-1568 - Chuan Zeng, Zhao Zhang, Wei Huang, Lei Zhang, Le Yi, Kefu Zhao:

DC2-SR: A Dual-Consistency Guided Curriculum Learning method for Thick-Slice Fetal MRI Super-Resolution. 1569-1578 - An Xiang, Zixuan Huang, Xitong Gao, Kejiang Ye, Cheng-zhong Xu:

BridgeNet: A Unified Multimodal Framework for Bridging 2D and 3D Industrial Anomaly Detection. 1579-1587 - Hui Li, Pengfei Yang, Juanyang Chen, Le Dong, Yanxin Chen, Quan Wang:

MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation. 1588-1597 - Shifeng Bao, Zhe Xue, Qi Chen, Shilong Ou, Amin Beheshti, Quan Z. Sheng, Anton van den Hengel, Yuankai Qi:

CausalMVC: Causal Content-Style Representation Learning for Deep Multi-View Clustering. 1598-1606 - Wei Li, Junwei Zhu, Honghui Xu, Jiawei Jiang, Jianwei Zheng:

SpecSolver: Solving Spatial-Spectral Fusion via Semantic Transformer. 1607-1616 - Junwei Zhu, Wei Li, Honghui Xu, Jiawei Jiang, Zhi Liu, Jianwei Zheng:

Arbitrary-scale Fusion Neural Operator. 1617-1626 - Zhongyun Bao, Gang Fu, Jianchi Sun, Jing Zhou, Ziqi Yu, Chunxia Xiao:

I²HDiffuser: Image Illumination Harmonization Meets the Diffusion Model. 1627-1636 - Weitai Kang, Luowei Zhou, Junyi Wu, Changchang Sun, Yan Yan:

Visual Grounding with Attention-Driven Constraint Balancing. 1637-1645 - Pengfei Ren, Jingyu Wang, Haifeng Sun, Qi Qi, Jing Wang, Jianxin Liao:

Rule Meets Learning: Confidence-Aware Multi-View Fusion for Self-Supervised 3D Hand Pose Estimation. 1646-1655 - Bingfeng Liu, Songwei Pei, Shuhuai Wang, Wenzheng Yang, Qian Li, Shangguang Wang:

Prior-Constrained Relevant Feature driven Image Fusion with Hybrid Feature via Mode Decomposition. 1656-1665 - Yue Zhu, Haiwen Diao, Shang Gao, Jiazuo Yu, Jiawen Zhu, Yunzhi Zhuge, Shuai Hao, Xu Jia, Lu Zhang, Ying Zhang, Huchuan Lu:

Regularizing Subspace Redundancy of Low-Rank Adaptation. 1666-1675 - Jintian Ji, Songhe Feng:

Anchors Bring Stability and Efficiency: Fast Tensorial Multi-view Clustering on Shuffled Datasets. 1676-1685 - Ziyu Wang, Yiming Du, Rui Ning, Lusi Li:

Energy-based Deep Incomplete Multi-View Clustering. 1686-1694 - Kai Zhu, Jun Yin:

Neighbor Contrastive Learning with Weakened Consensus Graph for Deep Multi-View Clustering. 1695-1703 - Hankun Liu, Yujian Zhao, Guanglin Niu:

Try Harder: Hard Sample Generation and Learning for Cloth-Changing Person Re-ID. 1704-1713 - Shide Du, Chunming Wu, Zihan Fang, Wendi Zhao, Yilin Wu, Changwei Wang, Shiping Wang:

LargeMvC-Net: Anchor-based Deep Unfolding Network for Large-scale Multi-view Clustering. 1714-1723 - Quangui He, Jiahui Qu, Wenqian Dong, Song Xiao, Qinghao Gao:

Cycle-Consistent Mamba-Based Registration-Fusion Joint Network for Unregistered Hyperspectral Image Super-Resolution. 1724-1733 - Liyuan Cao, Zihang Guo, Huaiwen Zhang:

Event Consistency-aware Robust Fake News Detection. 1734-1743 - Qi Peng, Jialin Cui, Jiayuan Xie, Yi Cai, Qing Li:

Tree-of-Reasoning: Towards Complex Medical Diagnosis via Multi-Agent Reasoning with Evidence Tree. 1744-1753 - Mengzhen Wang, Xunbin Huang, Jiayuan Xie, Shukai Ma, Jiale Men, Dayong Liang, Yi Cai:

From Model Diagram to Code: A Benchmark Dataset and Multi-Agent Framework. 1754-1763 - Ziqiang Shi, Rujie Liu, Jun Takahashi, Shan Jiang:

TrueCount: Improving Open-World Object Counting with Visual-Language Models and Dynamic Multi-Modal Inputs. 1764-1773 - Hong Gao, Xiangkai Xu, Tianqi Zhu, Xiugang Dong, Yiming Bao, Min-Ling Zhang:

Radar-Mamba: 4D Millimeter-Wave Point Cloud Enhancement via State Space Models. 1774-1782 - Jiangyong Yu, Sifan Zhou, Dawei Yang, Shuoyu Li, Shuo Wang, Xing Hu, Chen Xu, Zukang Xu, Changyong Shu, Zhihang Yuan:

MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Static Quantization. 1783-1792 - Peican Zhu, Yubo Jing, Le Cheng, Keke Tang, Yangming Guo:

KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection. 1793-1801 - Runqi Wang, Caoyuan Ma, Jian Zhao, Hanrui Xu, Dongfang Sun, Haoyang Chen, Lin Xiong, Zheng Wang, Xuelong Li:

Leader is Guided: Interactive Motion Generation via Lead-Follow Paradigm and Trajectory Guidance. 1802-1811 - Xuesong Li, Jinguang Tong, Jie Hong, Vivien Rolland, Lars Petersson:

DGNS: Deformable Gaussian Splatting and Dynamic Neural Surface for Monocular Dynamic 3D Reconstruction. 1812-1821 - Pingting Hao, Huijie Zhang, Yongshan Zhang:

Tensor-based Opposing yet Complementary Learning for Multi-view Multi-label Feature Selection. 1822-1831 - Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Mengfei Shi, Xia Xie, Shengyong Chen:

LIDAR: Lightweight Adaptive Cue-Aware Fusion Vision Mamba for Multimodal Segmentation of Structural Cracks. 1832-1841 - Mufan Liu, Wu Ran, Zhiquan He, Zuojie Xie, Hong Lu, Peirong Ma:

Implicit Retinex Decomposition with Chromaticity Disentanglement for Low-Light Image Enhancement. 1842-1851 - Chenbo Zhang, Bing Huangfu, Hongxu Ma, Jihong Guan, Shuigeng Zhou:

Multi-modal Prototype Guided Few-shot Object Detection. 1852-1861 - Qiyin Zhong, Xianglin Qiu, Xiaolei Wang, Zhen Zhang, Gang Liu, Jimin Xiao:

FAMRD: Frequency-Aware Multimodal Reverse Distillation for Industrial Anomaly Detection. 1862-1871 - Lei Xie, Junxiong Huang, Yuanjing Feng, Qingrun Zeng:

Tractography-Guided Dual-Label Collaborative Learning for Multi-Modal Cranial Nerves Parcellation. 1872-1879 - Guoqiang Liang, Chuan Qin, De Cheng, Shizhou Zhang, Yanning Zhang:

Boosting Multi-Modal Alignment: Geometric Feature Separation for Class Incremental Learning. 1880-1889 - Xueheng Li, Xuanhua He, Tao Hu, Jie Zhang, Man Zhou, Chengjun Xie, Yingying Wang, Bo Huang:

Freq-RWKV: Granularity-Aware Spatial-Frequency Synergy via Dual-Domain Recurrent Scanning for Pan-sharpening. 1890-1899 - Lingren Wang, Wenxuan Tu, Jieren Cheng, Jianan Wang, Xiangyan Tang, Chenchen Wang:

Discovering Maximum Frequency Consensus: Lightweight Federated Learning for Medical Image Segmentation. 1900-1909 - Nan Gao, Junchao Zhu, Yilong Zhang, Ronghua Liang, Guodao Sun, Peng Chen:

Dual Teacher with Dempster-Shafer Guidance for Decision Making in Semi-Supervised Small Object Detection. 1910-1919 - Nan Ma, Beining Sun, Yiheng Han, Genbao Xu:

Kinematic Enhanced Hypergraph Convolutional Network for Skeleton-based Human Action Recognition with LLM Training Guides. 1920-1928 - Yufei Zhang, Yicheng Xu, Hongxin Wei, Zhiping Lin, Xiaofeng Zou, Cen Chen, Huiping Zhuang:

Analytic Continual Test-Time Adaptation for Multi-Modality Corruption. 1929-1937 - Pengfei Gu, Hongxiao Wang, Yejia Zhang, Huimin Li, Chaoli Wang, Danny Chen:

TopoImages: Incorporating Local Topology Encoding into Deep Learning Models for Medical Image Classification. 1938-1947 - Dawei Lin, Meng Yuan, Ziming Wang, Tieru Wu, Yuanning Liu:

FreeCAD: A Multimodal Framework for 3D CAD Model Generation from Free-Form Prompts. 1948-1956 - Renjie Lin, Jiacheng Li, Shide Du, Shiping Wang, Le Zhang:

OIMGC-Net: Optimization-inspired Interpretable Multi-view Graph Clustering Network. 1957-1966 - Qi Shen, Junchang Xin, Bing Tian Dai, Shudi Zhang, Xinyao Liu, Zhiqiong Wang:

ElaSleepNet: Exploring an Elastic Multimodal Neural Network for Sleep Staging via Temporal and Contextual Consistency Learning. 1967-1976 - Zeyu Zhu, Ke Liang, Lingyuan Meng, Xingchen Hu, Xinwang Liu, Wanwei Liu, Kunlun He:

SALVG: Latent Variable Gene Augmented Graph Learning for Multi-View Clustering in Spatial Transcriptomics. 1977-1986 - Lamei Di, Bin Zhang, Yiming Wang, Wenxia Zhang:

Frequency Meets Semantics: Text-Visual Fusion with Directional Spectral Enhancement for Salient Object Detection in Optical Remote Sensing Images. 1987-1996 - Miaosen Luo, Yuncheng Jiang, Sijie Mai:

Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis. 1997-2006 - Zeyu Xia, Canqun Yang, Haoang Chi, Tao Tang, Weiming Xiang, Yingbo Cui:

MMF-SV: A Multi-Modal Feature Fusion-Based Structural Variant Caller. 2007-2015 - Ziang Li, Chengxiang Si, Zhenyu Cheng:

Zero in on the Target: A Composite Robust Model for Retrieving Information in Traffic Data to Discover Network Attacks. 2016-2025 - Long Chen, De Cheng, Shizhou Zhang, Yinghui Xing, Di Xu, Yanning Zhang:

Amplitude-aware Domain Style Replay for Lifelong Person Re-identification. 2026-2035 - Jie Qin, Wei Yang, Yan Su, Yiran Zhu, Weizhen Li, Yunyue Pan, Chengchang Pan, Honggang Qi:

HER2 Expression Prediction with Flexible Multi-Modal Inputs via Dynamic Bidirectional Reconstruction. 2036-2043 - Zhaochen Guo, Zhixiang Shen, Xuanting Xie, Liangjian Wen, Zhao Kang:

Disentangling Homophily and Heterophily in Multimodal Graph Clustering. 2044-2053 - Zhishuo Zhao, Yi Lin, Dongyue Guo, Junyu Fan:

AV-RISE: Hierarchical Cross-Modal Denoising for Learning Robust Audio-Visual Speech Representation. 2054-2063 - Jiahao Zhang, Wenzhe Yin, Shujian Yu:

Cross-Modal Retrieval with Cauchy-Schwarz Divergence. 2064-2073 - Xinbo Geng, Fan Shi, Xu Cheng, Chen Jia, Meng Zhao, Shengyong Chen:

LFMamba: Focal Stack-aware State Space Modeling for Light Field Salient Object Detection. 2074-2083 - Xiaodi Xu, Lijie Li, Ye Wang, Tao Ren, Tian Qiao:

WFF: Wavelet-based Information Fusion for Multimodal Knowledge Graph Link Prediction. 2084-2093 - Xuyao Liu, Jiahui Qu, Wenqian Dong:

Breaking the Spatial-Temporal Consistency Constraint: Towards Reference-Based Hyperspectral Image Super-Resolution. 2094-2103 - Yifan Liu, Yu Fang, Zhouhan Lin:

Visual-informed Silent Video Identity Conversion. 2104-2112 - Zebing Yao, Hao Fu, Yuanhang Yang, Guanghua Gu:

Dynamic Optimization Noisy Cross-Modal Hashing. 2113-2121 - Yuhang Lan, Shilin Xu, Chao Su, Run Ye, Dezhong Peng, Yuan Sun:

Multi-view Hashing Classification. 2122-2130 - Jielong Lu, Zhihao Wu, Jiajun Yu, Qianqian Shen, Jiajun Bu, Haishuai Wang:

Where Views Meet Curves: Virtual Anchors for Hyperbolic Multi-View Graph Diffusion. 2131-2140 - Jun Yang, Maoyu Mao:

DiffuSeg: Diffusion-Enhanced Cross-Modal Semantic Segmentation for RGB-D. 2141-2149 - Haochen Yang, Lei Li, Jiacheng Guo, Baolu Li, Minghai Qin, Hongkai Yu, Tianyun Zhang:

DA3D: Domain-Aware Dynamic Adaptation for All-Weather Multimodal 3D Detection. 2150-2158 - Wentao Wu, Xiao Wang, Chenglong Li, Bo Jiang, Jin Tang, Bin Luo, Qi Liu:

CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework. 2159-2168 - Yichen Bao, Yuxuan Liu, Yu Duan, Jing Li, Quanxue Gao:

Multi-view Clustering Based on Probabilistic Tensor Regression. 2169-2177 - Xingchen Li, Wuyang Zhang, Guoliang You, Xiaomeng Chu, Wenhao Yu, Yifan Duan, Yuxuan Xiao, Yanyong Zhang:

CalibWorkflow: A General MLLM-Guided Workflow for Centimeter-Level Cross-Sensor Calibration. 2178-2187 - Yongzheng Liu, Siru Zhong, Gefeng Luo, Weilin Ruan, Yuxuan Liang:

Towards Multi-Scenario Forecasting of Building Electricity Loads with Multimodal Data. 2188-2196 - Yan Chen, Bingbing Jiang, Peng Zhou, Lei Duan, Yuhua Qian, Liang Du:

Balanced Multiple Kernel Clustering with Discrete Partition Entropy Auto Regularization. 2197-2206 - Jiale Zou, Yan Chen, Bingbing Jiang, Peng Zhou, Liang Du, Lei Duan, Yuhua Qian:

Robust Tensor Learning with Graph Diffusion for Scalable Multi-view Graph Clustering. 2207-2215 - Linxin Xiao, Xin Wang, Zeyang Zhang, Yang Yao, Wenwu Zhu:

DyNAS-DDI: Dynamic Pairwise Architecture Search for Generalizable Drug-Drug Interaction LLM. 2216-2225 - Jianxiang Xie, Yao Wu, Yachao Zhang, Xiaopei Zhang, Yuan Xie, Yanyun Qu:

PLATO-TTA: Prototype-Guided Pseudo-Labeling and Adaptive Tuning for Multi-Modal Test-Time Adaptation of 3D Segmentation. 2226-2234 - Shilin Liu, Kyohei Kamikawa, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama:

Context-aware Image-to-Music Generation via Bridging Modalities through Musical Captions. 2235-2243 - Yan Li, Xingchen Hu, Jiyuan Liu, Zhong Liu:

Federated Incomplete Multi-view Clustering with Individual Structure Preservation and Central Representation Tensorization. 2244-2253 - Hanghui Guo, Weijie Shi, Mengze Li, Juncheng Li, Hao Chen, Yue Cui, Jiajie Xu, Jia Zhu, Jiawei Shen, Zhangze Chen, Sirui Han:

Consistent and Invariant Generalization Learning for Short-video Misinformation Detection. 2254-2263 - Ruilin Yao, Yi Rong, Tianyu Zou, Bo Zhang, Jian Li, Shengwu Xiong, Shili Xiong:

MAP: Parameter-Efficient Tuning for Referring Expression Comprehension via Multi-Modal Adaptive Positional Encoding. 2264-2273 - Hongyang Lin, Kuixiang Shao, Peijun Xu, Zhuoyang Bu, Yuyang Jiao, Ziyuan Tang, Chenxi Xiao, Jingyi Yu:

HandCraft: Tactile-Informed Hand-Object Dynamics Capture and Realistic Rendering. 2274-2283 - Linxuan Luo, Pan Mu, Cong Bai:

Physics-Coupled Frequency Dynamic Adaptation Network for Domain Generalized Underwater Object Detection. 2284-2293 - Yanfeng Liu, Lefei Zhang:

Multimodal Decomposed Distillation with Instance Alignment and Uncertainty Compensation for Thermal Object Detection. 2294-2303 - Rui Wang, Yuxuan Liu, Guangyu Yang, Quanxue Gao, Cheng Deng:

Bi-Orthogonal Non-negative Tensor tri-Factorization for Tensorized Label Learning. 2304-2312 - Xin Peng, Bowen Liu, Renxiang Guan, Wenxuan Tu:

Multi-view Graph Clustering with Dual Structure Awareness for Remote Sensing Data. 2313-2322 - Mingliang Yan, Yanhua Yu, Ruochi Zhang, Zhiyuan Liu, Ruicheng Zhang, Yimeng Ren, Kangkang Lu, Zhiyong Huang, Feng Luo, Zhen Cai:

DeepMolTex: Deep Alignment of Molecular Graphs with Large Language Models via Mixture of Modality Experts. 2323-2332 - Xinzhu Li, Juepeng Zheng, Yikun Chen, Xudong Mao, Guanghui Yue, Wei Zhou, Chenlei Lv, Ruomei Wang, Fan Zhou, Baoquan Zhao:

DepthGait: Multi-Scale Cross-Level Feature Fusion of RGB-Derived Depth and Silhouette Sequences for Robust Gait Recognition. 2333-2341 - Tianming Xu, Tiantian Guo, Youdan Feng, Zihan Chen, Qiaoyi Xue, Lingzhi Hu, Yuhang Shi:

Anatomical Region-Guided 3D PET/MR Tumor Segmentation via Medical Record. 2342-2351 - Rongqiang Fang, Yongqi Sun, Jidong Yuan, Hongbo Cao, Jinkun Dong:

A Language-Assisted Semantic-Aware Disentangled Method for Link Prediction on Heterogeneous Graphs. 2352-2361 - Guimin Hu, Yi Xin, Lijie Hu, Zhihong Zhu, Hasti Seifi:

PgM: Partitioner Guided Modal Learning Framework. 2362-2371 - Kaixiang Wang, Xiaojian Ding, Wanqi Yang, Ming Yang:

Label-Semantics-Guided Multi-View Multi-Label Learning via High-Order Semantic Fusion. 2372-2380 - Chenyang Zhou, Monghjaya Ha, Chao Tang, Licheng Wu:

UniMTR: Unified Recognition of Dual-style Traditional Mongolian Scripts via Contrastive Representation Alignment. 2381-2389 - Mingyang Yu, Xiahui Guo, Peng Chen, Zhenkai Li, Yang Shu:

Towards Measuring and Modeling Geometric Structures in Time Series Forecasting via Image Modality. 2390-2398 - Shu-Xun Yang, Xian-Ling Mao, Heyan Huang:

ESTJ: Enhancing Structured Tendency Judgment in Hybrid-Modal Table Understanding. 2399-2408 - Maoxun Yuan, Bo Cui, Tianyi Zhao, Jiayi Wang, Shan Fu, Xue Yang, Xingxing Wei:

UniRGB-IR: A Unified Framework for Visible-Infrared Semantic Tasks via Adapter Tuning. 2409-2418 - Nokap Tony Park:

M2PE-Diff: Music-to-Pose Encoder for Dance Video Generation Leveraging Latent Diffusion Framework. 2419-2428 - Xiaorui Ding, Huan Ma, Changqing Zhang:

A Theoretical Proof of Dynamic Multimodal Fusion Exacerbates Modality Greedy. 2429-2436 - Yiming Xu, Jiarun Chen, Zhen Peng, Zihan Chen, Qika Lin, Lan Ma, Bin Shi, Bo Dong:

Court of LLMs: Evidence-Augmented Generation via Multi-LLM Collaboration for Text-Attributed Graph Anomaly Detection. 2437-2446 - Shanghui Deng, Xiao Zheng, Chang Tang, Kun Sun, Yuanyuan Liu, Xinwang Liu:

Find True Collaborators: Banzhaf Index-based Cross View Alignment for Partially View-aligned Clustering. 2447-2456 - Wenlan Chen, Lu Gao, Cheng Liang, Fei Guo:

Deep Variational Incomplete Multi-View Clustering with Information-Theoretic Guidance. 2457-2466 - Jieyi Ge, Zhaodong Sun, Wei Peng, Chenhang Ying, Yuwei Chen, Kui Ren, Xiaobai Li:

Evidential Remote Physiological Measurement via Uncertainty-aware Fusion of Video and RF. 2467-2475 - Fujian Ren, Wenlan Chen, Lu Gao, Fei Guo, Cheng Liang:

Dual-Level Distribution Alignment for Deep Incomplete Multi-View Clustering. 2476-2485 - Guoyi Li, Die Hu, Xiaomeng Fu, Qirui Tang, Yulei Wu, Xiaodan Zhang, Honglei Lyu:

Entity Graph Alignment and Visual Reasoning for Multimodal Fake News Detection. 2486-2495 - Peng Zhao, Zhiguang Cao, Di Wang, Wen Song, Wei Pang, You Zhou, Yuan Jiang:

Visual-Enhanced Multimodal Framework for Flexible Job Shop Scheduling Problem. 2496-2505 - Yu Zhao, Ying Zhang, Xuhui Sui, Baohang Zhou, Haoze Zhu, Jeff Z. Pan, Xiaojie Yuan:

Dark Side of Modalities: Reinforced Multimodal Distillation for Multimodal Knowledge Graph Reasoning. 2506-2515 - Jianting Tang, Yubo Wang, Haoyu Cao, Linli Xu:

CROP: Integrating Topological and Spatial Structures via Cross-View Prefixes for Molecular LLMs. 2516-2525 - Guyue Jin, Tianming Zhao, Jiacan Yan, Tian Tian:

Contextually-Guided State Space Fusion for Misaligned Multi-Spectral Object Detection. 2526-2535 - Libin Liu, Shen Chen, Sen Jia, Jingzhe Shi, Can Jin, Zongkai Wu, Jenq-Neng Hwang, Lei Li:

Graph Canvas for Controllable 3D Scene Generation. 2536-2545 - Berta Céspedes-Sarrias, Carlos Collado-Capell, Pablo Rodenas-Ruiz, Olena Hrynenko, Andrea Cavallaro:

MM-HSD: Multi-Modal Hate Speech Detection in Videos. 2546-2555
Content: Vision and Language
- Yijie Yang, Lianyong Qi, Weiming Liu, Fan Wang, Jing Du, Yuwen Liu, Xiaolong Xu, Qiang Ni, Wanchun Dou, Xiaokang Zhou:

Joint Test-time Adaptation with Refined Pseudo-labels and Latent Score Matching. 2556-2565 - Hua Wang, Hong Liu, Jiale Ren, Mingxin Tan, Zhongzien Jiang:

CLIP-6D: Empowering CLIP as a Zero-Shot 6D Pose Estimator Through Generalizable Object-Specific Representations. 2566-2575 - Ruipu Wu, Yige Zhang, Jinyu Chen, Linjiang Huang, Shifeng Zhang, Xu Zhou, Liang Wang, Si Liu:

AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation. 2576-2585 - Yihua Shao, Haojin He, Sijie Li, Siyu Chen, Xinwei Long, Fanhu Zeng, Yuxuan Fan, Muyang Zhang, Ziyang Yan, Ao Ma, Xiaochen Wang, Hao Tang, Yan Wang, Shuyan Li:

EventVAD: Training-Free Event-Aware Video Anomaly Detection. 2586-2595 - Qiuyu Liang, Yongqiang Zhang:

SAM based Region-Word Clustering and Inference Score Adjusting for Open-Vocabulary Object Detection. 2596-2605 - Xiao Liang, Jiawei Hu, Di Wang, Zhi Ma, Lin Zhao, Ronghan Li, Bo Wan, Quan Wang:

CheXPO: Preference Optimization for Chest X-ray VLMs with Counterfactual Rationale. 2606-2615 - Qian Sun, Chengzhuo Lu, Wenyu Chen, Wenjie Wei, Jingya Wang, Jieyuan Zhang, Xiaoli Liu, Yalan Ye, Yang Yang, Malu Zhang:

Temporal-coded Spiking Transformer. 2616-2624 - Yuwu Lu, Haoyu Huang, Xue Hu:

Domain-aware Visual Context Prompt for Multi-Source Domain Adaptation. 2625-2633 - Xingke Song, Jianxu Shangguan, Yiran Li, Jialu Zhang, Jianfeng Ren, Ruibin Bai, Xin Chen, Xudong Jiang:

CEARI: Co-Evolutionary Agents for Reassembling and Inpainting Puzzles with Gaps and Missing Pieces. 2634-2642 - Xiaoyu Chen, Yigang Cen, Wanru Xu, Yue Zhang, Yi Jin, Yidong Li, Linna Zhang:

Hierarchical Meta-prototypes Network for Few-shot Action Recognition. 2643-2652 - Kyungjune Lee, Seongjean Kim, Hoseok Tong, Hyucksang Lee, Seongmin Lee, Weisi Lin, Ping An, Sanghoon Lee:

Domain Crossover Non-Rigid Registration for 3D Human Meshes. 2653-2662 - Jingyao Wang, Yiming Chen, Lingyu Si, Changwen Zheng:

Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection. 2663-2672 - Yuxing Liu, Ji Zhang, Xuchuan Zhou, Jingzhong Xiao, Huimin Yang, Jiaxin Zhong:

OoDDINO: A Multi-level Framework for Anomaly Segmentation on Complex Road Scenes. 2673-2682 - Si-Woo Kim, MinJu Jeon, Ye-Chan Kim, Soeun Lee, Taewhan Kim, Dong-Jin Kim:

SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning. 2683-2692 - Xun Zhu, Fanbin Mo, Zheng Zhang, Jiaxi Wang, Yiming Shi, Ming Wu, Chuang Zhang, Miao Li, Ji Wu:

Enhancing Multi-task Learning Capability of Medical Generalist Foundation Model via Image-centric Multi-annotation Data. 2693-2702 - Linpu He, Yanan Li, Bingze Li, Elvis Han Cui, Donghui Wang:

DSS-Prompt: Dynamic-Static Synergistic Prompting for Few-Shot Class-Incremental Learning. 2703-2712 - Yian Li, Wentao Tian, Yang Jiao, Tianwen Qian, Na Zhao, Bin Zhu, Jingjing Chen, Yu-Gang Jiang:

Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning. 2713-2722 - Yifei Deng, Chenglong Li, Futian Wang, Jin Tang:

Learning Hierarchical Cross-modal Association with Intra-modal Context for Text-Image Person Retrieval. 2723-2731 - Xiubo Liang, Hongzhi Wang, Zigen Li, Jinxing Han, Yu Zhao, Weidong Geng:

SGM-Transformer: Rethinking Gradient Information Loss and Compensation in Spiking Neural Networks. 2732-2741 - Qinyue Tong, Ziqian Lu, Jun Liu, Yangming Zheng, Zhe-Ming Lu:

MediSee: Reasoning-Based Pixel-Level Perception in Medical Images. 2742-2751 - Shuyong Gao, Qianyu Guo, Yu'ang Feng, Chunyuan Chen, Xujun Wei, Yan Wang, Wenqiang Zhang:

Progressive Representation Learning for Weakly-Supervised Camouflaged Object Detection. 2752-2761 - Huaihai Lyu, Chaofan Chen, Yuheng Ji, Changsheng Xu:

EgoPrompt: Prompt Learning for Egocentric Action Recognition. 2762-2770 - Yuwu Lu, Chunzhi Liu, Yihan Yang:

CWCP: Generalizing Virtual Reality to Real World with Contextual-Weather Correlation Pairing for Deraining and Desnowing. 2771-2780 - Pei Liu, Xin Liu, Ruoyu Yao, Junming Liu, Siyuan Meng, Ding Wang, Jun Ma:

HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation. 2781-2790 - Yan Zhang, Shiwen He, Lin Yuan, Jiaxu Leng, Xinbo Gao:

DichotomyIR: Universal Image Reconstruction via Dichotomy Classification and Uncertainty Elimination. 2791-2800 - Francesco Tonini, Lorenzo Vaquero, Alessandro Conti, Cigdem Beyan, Elisa Ricci:

Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection. 2801-2810 - Zhilin Huang, Chujun Qin, Yifei Xing, Wenming Yang:

Enhanced Motion-aware Latent Diffusion Models for Video Frame Interpolation. 2811-2820 - Zeming Wei, Junyi Lin, Yang Liu, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin:

3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians. 2821-2830 - Huy Le, Nhat Chung, Tung Kieu, Anh Nguyen, Ngan Le:

BiMa: Towards Biases Mitigation for Text-Video Retrieval via Scene Element Guidance. 2831-2840 - Jianghang Lin, Yue Hu, Jiangtao Shen, Yunhang Shen, Liujuan Cao, Shengchuan Zhang, Rongrong Ji:

What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation. 2841-2850 - Zhengyang Liang, Meiyu Liang, Wei Huang, Yawen Li, Wu Liu, Yingxia Shao, Kangkang Lu:

Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Retrieval. 2851-2859 - Tiancheng Gu, Kaicheng Yang, Ziyong Feng, Xingjun Wang, Yanzhao Zhang, Dingkun Long, Yingda Chen, Weidong Cai, Jiankang Deng:

Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs. 2860-2869 - Lin Peng, Cong Wan, Shaokun Wang, Xiang Song, Yuhang He, Yihong Gong:

CIA: Class- and Instance-aware Adaptation for Vision-Language Models. 2870-2879 - Xi Xiao, Yunbei Zhang, Xingjian Li, Tianyang Wang, Xiao Wang, Yuxiang Wei, Jihun Hamm, Min Xu:

Visual Instance-aware Prompt Tuning. 2880-2889 - Yuliang Chen, Xi Lin, Chao Sang, Xiu Su:

DualFPT: Handling Data Heterogeneity in Federated Prompt Tuning from both Generalized and Personalized Perspective. 2890-2899 - Lingbo Zhang, Bingqian Sun, Linghan Cai, Yifeng Wang, Ye Zhang, Songhan Jiang, Kai Zhang, Yongbing Zhang:

Counting by Points: Density-Guided Weakly-Supervised Nuclei Segmentation in Histopathological Images. 2900-2908 - Haodong Chen, Haojian Huang, Xinxiang Yin, Dian Shao:

FineQuest: Adaptive Knowledge-Assisted Sports Video Understanding via Agent-of-Thoughts Reasoning. 2909-2918 - Shaowu Xu, Xibin Jia, Junyu Gao, Qianmei Sun, Jing Chang, Chao Fan:

Cross-Modal Dual-Causal Learning for Long-Term Action Recognition. 2919-2928 - Jiahao Li, Yang Lu, Yachao Zhang, Fangyong Wang, Yuan Xie, Yanyun Qu:

Novel Category Discovery with X-Agent Attention for Open-Vocabulary Semantic Segmentation. 2929-2938 - Jingyuan Fang, Yang Ning, Xiushan Nie, Xinfeng Liu, Zhiyong Cheng:

VLHP: Learning Discriminative Vision-Language Hybrid Prototypes for Weakly Supervised Semantic Segmentation. 2939-2948 - Xin Li, Mingming Gong, Yunfei Wu, Jianxin Dai, Antai Guo, Xinghua Jiang, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun:

DREAM: Document Reconstruction via End-to-end Autoregressive Model. 2949-2957 - Longzhen Yang, Zhangkai Ni, Ying Wen, Yihang Liu, Lianghua He, Heng Tao Shen:

Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation. 2958-2967 - Wenxuan Yang, Qingqv Wei, Chenxi Ma, Weimin Tan, Bo Yan:

Scaling Laws for Data-Efficient Visual Transfer Learning. 2968-2976 - Pengcheng Zheng, Kecheng Chen, Jiaxin Huang, Bohao Chen, Ju Liu, Yazhou Ren, Xiaorong Pu:

Lightweight Medical Image Restoration via Integrating Reliable Lesion-Semantic Driven Prior. 2977-2986 - Kun-Hsiang Lin, Yu-Wen Tseng, Kang-Yang Huang, Jhih-Ciang Wu, Wen-Huang Cheng:

InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing. 2987-2996 - Kai Niu, Liucun Shi, Ke Han, Qinzi Zhao, Yue Wu, Yanning Zhang:

Test-Time Adaptation for Text-Based Person Search. 2997-3006 - Si Chen, Yujia Chen, Xiaotian Yin, Xin Liu, Huakai Lai, Tianzhu Zhang:

PAF: Prototype Adaptive Fusion for Test-Time Adaptation of Vision-Language Models. 3007-3016 - Chunyan She, Fujun Han, Chengyu Fang, Shukai Duan, Lidan Wang:

Exploring Fourier Prior and Event Collaboration for Low-Light Image Enhancement. 3017-3026 - Liang Yao, Fan Liu, Delong Chen, Chuanyi Zhang, Yijun Wang, Ziyun Chen, Wei Xu, Shimin Di, Yuhui Zheng:

RemoteSAM: Towards Segment Anything for Earth Observation. 3027-3036 - Jiawei Ge, Xinyu Zhang, Jiuxin Cao, Xuelin Zhu, Weijia Liu, Qingqing Gao, Biwei Cao, Kun Wang, Chang Liu, Bo Liu, Chen Feng, Ioannis Patras:

Gen4Track: A Tuning-free Data Augmentation Framework via Self-correcting Diffusion Model for Vision-Language Tracking. 3037-3046 - Kangjie Chen, BingQuan Dai, Minghan Qin, Dongbin Zhang, Peihao Li, Yingshuang Zou, Haoqian Wang:

SLGaussian: Fast Language Gaussian Splatting in Sparse Views. 3047-3056 - Jo-Ku Cheng, Zeren Zhang, Ran Chen, Jingyang Deng, Ziran Qin, Jinwen Ma:

GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions. 3057-3066 - Hang Xiong, Runmin Cong, Jinpeng Chen, Chen Zhang, Feng Li, Huihui Bai, Sam Kwong:

MM-Prompt: Multi-modality and Multi-granularity Prompts for Few-Shot Segmentation. 3067-3075 - Jiawei Gu, Ziyue Qiao, Zechao Li:

Activation Shape Matters: OOD Detection with Norm-Entropy Fusion. 3076-3084 - Xinchen Ye, Aokai Zhang, Rui Xu:

Semantics-Driven Contrastive Learning for Real-World Depth Super Resolution. 3085-3093 - Jiawen Lin, Shiran Bian, Yihang Zhu, Wenbin Tan, Yachao Zhang, Yuan Xie, Yanyun Qu:

SeqVLM: Proposal-Guided Multi-View Sequences Reasoning via VLM for Zero-Shot 3D Visual Grounding. 3094-3103 - Yucheng Shu, Yaohui Wang, Lihong Qiao, Feiyan Li, Bin Xiao, Weisheng Li, Xinbo Gao:

The Overlooked Matters: Revisiting Background, Prototype, and Activation in Few-Shot Medical Image Segmentation. 3104-3113 - Jiaxin Peng, Siwang Zhou, Chengqing Li, Yucheng Li, Dunyun Chen:

Mitigating Delivery Artifacts in Real-World Video Super-Resolution. 3114-3123 - Wei Chen, Jianwei Niu, Xuefeng Liu, Xinghao Wu:

Decoupling Dense Video Captioning via Task-specific Prompts. 3124-3132 - Yongxin Li, Ying Cheng, Yaning Pan, Wen He, Qing Wang, Rui Feng, Xiaobo Zhang:

Semantic-Aware Hard Negative Mining for Medical Vision-Language Contrastive Pretraining. 3133-3142 - Jiale Li, Mingrui Wu, Zixiang Jin, Hao Chen, Jiayi Ji, Xiaoshuai Sun, Liujuan Cao, Rongrong Ji:

MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models. 3143-3152 - Hezhao Liu, Yang Lu, Mengke Li, Yiqun Zhang, Shreyank N. Gowda, Chen Gong, Hanzi Wang:

FATE: A Prompt-Tuning-Based Semi-Supervised Learning Framework for Extremely Limited Labeled Data. 3153-3162 - Wangsheng He, Wanru Xu, Ping Guo, Zhenjiang Miao, Yi Tian:

InstructStep: Fine-Grained Localization of Step Content and Relation in Instructional Video. 3163-3172 - Jiaqi Xu, Cuiling Lan, Yan Lu:

Deciphering Functions of Neurons in Vision-Language Models. 3173-3181 - Kamakshya Prasad Nayak, Kamalakar Vijay Thakare, Ashesh Xalxo, Lalit Lohani, Debi Prosad Dogra:

Can Person-Level Attributes Improve Group Re-Identification? 3182-3191 - Changshuo Wang, Shuting He, Xiang Fang, Fangzhe Nan, Prayag Tiwari:

Seeing the Overlooked: Bio-Visual Inspired Weak Saliency Feedback Transformer for Person Re-identification. 3192-3201 - Weihuang Lin, Yiwei Ma, Xiaoshuai Sun, Shuting He, Jiayi Ji, Liujuan Cao, Rongrong Ji:

HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation. 3202-3211 - Da Zhang, Feiyu Wang, Bingyu Li, Zhiyuan Zhao, Junyu Gao, Xuelong Li:

KAID: Knowledge-Aware Interactive Distillation for Vision-Language Models. 3212-3221 - Xiao Hu, Heiko Neumann, Jochen Lang:

A Filtering Framework for Semi-online Referring Video Object Segmentation. 3222-3231 - Ruiqi Dong, Wenjing Pang, Chenjie Pan, Hengyang Lu, Chenyou Fan:

StoryCrafter: Instance-Aligned Multi-Character Storytelling with Diffusion Policy Learning. 3232-3241 - Xiaohan Yu, Zicheng Pan, Yang Zhao, Qin Zhang, Yongsheng Gao:

Contrastive Lie Algebra Learning for Ultra-Fine-Grained Visual Categorization. 3242-3250 - Xiaoxing Hu, Kaicheng Yang, Jun Wang, Haoran Xu, Ziyong Feng, Yupei Wang:

Decoupled Global-Local Alignment for Improving Compositional Understanding. 3251-3260 - Jingxing Guo, Guilian Chen, Yimu Sun, Huisi Wu, Jing Qin:

EchoVim: Making Vision Mamba Docile for Echocardiography Video Segmentation via Dynamic Interaction and Semantic Token-attentive Refinement. 3261-3269 - Haifeng Zhao, Shuo Xu, Leilei Ma, Yufei Zhang, Lei Wang, Dengdi Sun:

Towards Space and Semantics: Object-Purified Representation Learning for Multi-Label Image Classification. 3270-3279 - Junyu Gao, Xuan Yao, Yong Rui, Changsheng Xu:

Building Embodied EvoAgent: A Brain-inspired Paradigm for Bridging Multimodal Large Models and World Models. 3280-3289 - Chen Feng, Nicu Sebe, Georgios Tzimiropoulos, Miguel R. D. Rodrigues, Ioannis Patras:

Unveiling Open-set Noise: Theoretical Insights into Label Noise. 3290-3299 - Zhongrui Gui, Junyu Xie, Tengda Han, Weidi Xie, Andrew Zisserman:

Character-Centric Understanding of Animated Movies. 3300-3309 - Ziyun Dai, Xiaoqiang Li, Shaohua Zhang, Yuanchen Wu, Jide Li:

See Different, Think Better: Visual Variations Mitigating Hallucinations in LVLMs. 3310-3319 - Cheng Ye, Weidong Chen, Peipei Song, Xinyan Liu, Lei Zhang, Zhendong Mao:

Multi-round Mutual Emotion-Cause Pair Extraction for Emotion-Attributed Video Captioning. 3320-3329 - Wenhao Zheng, Chenwei Sun, Wenbo Zhang, Jiancheng Lv, Xianggen Liu:

Target-Guided Bayesian Flow Networks for Quantitatively Constrained CAD Generation. 3330-3339 - Zhiyu Ye, Guowen Li, Haoyuan Liang, Zixi Wang, Shilei Cao, Yushan Lai, Juepeng Zheng:

Quantifying Samples with Invariance for Source-Free Class Incremental Domain Adaptation. 3340-3349 - Shuai Huang, Yongxiong Wang, Huan Luo, Haodong Jing, Chendong Qin, Jingqun Tang:

MINDEV: Multi-modal Integrated Diffusion Framework for Video Reconstruction from EEG Signals. 3350-3359 - Zhijie Rao, Jingcai Guo:

Balancing Cross-Modal Attention for Generalized Zero-Shot Learning. 3360-3369 - Zhenxuan Fang, Shuaibo Wang, Weisheng Dong, Junwei Xu, Fangfang Wu, Xin Li, Guangming Shi:

Beyond Visual Quality: Fidelity-Oriented Diffusion Model for Real-world Image Super-Resolution. 3370-3379 - Peng Ying, Zhongnian Li, Meng Wei, Xinzheng Xu:

Reversible Privacy Preserving on Vision-Language Models via Adversarial Multimodal Key. 3380-3389 - Taras Kucherenko, Derek Peristy, Judith Bütepage:

Evaluating the Evaluators: Towards Human-aligned Metrics for Missing Markers Reconstruction. 3390-3398 - Changho Choi, Youngwoo Shin, Gyojin Han, Dong-Jae Lee, Junmo Kim:

B4DL: A Benchmark for 4D LiDAR LLM in Spatio-Temporal Understanding. 3399-3407 - Fenghe Tang, Bingkun Nian, Jianrui Ding, Wenxin Ma, Quan Quan, Chengqi Dong, Jie Yang, Wei Liu, S. Kevin Zhou:

Mobile U-ViT: Revisiting large kernel and U-shaped ViT for efficient medical image segmentation. 3408-3417 - Ling You, Wenxuan Huang, Xinni Xie, Xiangyi Wei, Bangyan Li, Shaohui Lin, Yang Li, Changbo Wang:

TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation. 3418-3427 - Bowen Guo, Shiwei Gan, Yafeng Yin, Xiao Liu, Zhiwei Jiang, Shunmei Meng:

Sentence-level Segmentation for Long Sign Language Videos with Captions. 3428-3437 - Jiayi Zou, Chaofan Chen, Bing-Kun Bao, Changsheng Xu:

DMC3: Dual-Modal Counterfactual Contrastive Construction for Egocentric Video Question Answering. 3438-3447 - Penglei Sun, Yaoxian Song, Xiangru Zhu, Xiang Liu, Qiang Wang, Yue Liu, Changqun Xia, Tiefeng Li, Yang Yang, Xiaowen Chu:

City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning. 3448-3457 - Yuzhen Niu, Siling Chen, Yuzhong Chen, Fusheng Li, Rui Xu, Hui Da:

CoFiVLA: Synergistic Coarse-Fine Vision-Language Alignment for Image Aesthetic Assessment. 3458-3467 - Duolin Wang, Guanyu Xing, Yanli Liu:

FlowTrack: Integrating Adjacent-Frame Motion Tracking and Adaptive Prediction for Robust Semi-Supervised VOS. 3468-3476 - Lin Zhang, Yi Tian, Xiyun Wang, Wanru Xu, Yi Jin, Yaping Huang:

Differential Contrastive Training for Gaze Estimation. 3477-3486 - Tiancheng Gu, Kaicheng Yang, Chaoyi Zhang, Yin Xie, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai, Jiankang Deng:

RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm. 3487-3496 - Yanting Pei, Fan Yang:

Adaptive Neighbors and Uncertainty Estimation for Source-Free Unsupervised Domain Adaptation with Noisy Labels. 3497-3506 - Bingshuai Liu, Ante Wang, Zijun Min, Chenyang Lyu, Longyue Wang, Zhihao Wang, Xu Han, Peng Li, Jinsong Su:

EditEval: Towards Comprehensive and Automatic Evaluation for Text-guided Video Editing. 3507-3516 - Rui Chen, Lei Sun, Jing Tang, Geng Li, Xiangxiang Chu:

FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos. 3517-3526 - Zizhi Chen, Xinyu Zhang, Minghao Han, Yizhou Liu, Ziyun Qian, Weifeng Zhang, Xukun Zhang, Jingwei Wei, Lihua Zhang:

VLM-based Prompts as the Optimal Assistant for Unpaired Histopathology Virtual Staining. 3527-3536 - Zihao Mo, Junye Chen, Chaowei Fang, Guanbin Li:

PatchWiper: Leveraging Dynamic Patch-Wise Parameters for Real-World Visible Watermark Removal. 3537-3545 - Xueyu Yuan, Jiarui Zhang, Jiangqi Song, Liu Liu, Li Zhang, Dan Guo, Richang Hong, Meng Wang:

DFGAP: Towards Depth-Free Cross-Category GAParts Perception via Uncertainty-Quantified Modeling. 3546-3554 - Yudong Zhang, Ruobing Xie, Xingwu Sun, Yiqing Huang, Jiansheng Chen, Zhanhui Kang, Di Wang, Yu Wang:

DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language Models. 3555-3564 - Wenjie Zhu, Yabin Zhang, Xin Jin, Wenjun Zeng, Lei Zhang:

Knowledge Regularized Negative Feature Tuning of Vision-Language Models for Out-of-Distribution Detection. 3565-3574 - Ji Ma, Wei Suo, Peng Wang, Yanning Zhang:

Short-LVLM: Compressing and Accelerating Large Vision-Language Models by Pruning Redundant Layers. 3575-3584 - Xudong Wang, Lei Tan, Pingyang Dai, Liujuan Cao, Rongrong Ji:

GPT-ReID: Learning Fine-grained Representation with GPT for Text-based Person Retrieval. 3585-3594 - Runze Zhao, Fuqing Zhu, Jizhong Han, Songlin Hu:

Visual Perception Uncertainty Learning for Hallucination Detection in Large Vision-Language Models. 3595-3604 - Lei Liu, Xiangdong Su, Guanglai Gao:

Fourier Self-Adaptation for Transferring General Pretrained Models to Specific Domains. 3605-3614 - Yiying Yang, Fukun Yin, Jiayuan Fan, Wanzhang Li, Xin Chen, Gang Yu:

Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE. 3615-3624 - Gefan Ye, Lin Li, Kexin Li, Jun Xiao, Long Chen:

Zero-shot Compositional Action Recognition with Neural Logic Constraints. 3625-3634 - Yijun Wang, Siying Wu, Lubin Gan, Zheyu Zhang, Jing Zhang, Zhangchi Hu, Huyue Zhu, Peixi Wu, Xiaoyan Sun:

MeDKCoOp: Dual Knowledge-guided Graph Prompt Learning for Biomedical Vision-Language Models. 3635-3644 - Jianhui Wang, Yangfan He, Yan Zhong, Xinyuan Song, Jiayi Su, Yuheng Feng, Ruoyu Wang, Hongyang He, Wenyu Zhu, Xinhang Yuan, Miao Zhang, Keqin Li, Jiaqi Chen, Tianyu Shi, Xueqian Wang:

Twin Co-Adaptive Dialogue for Progressive Image Generation. 3645-3653 - Jiayuan Rao, Zifeng Li, Haoning Wu, Ya Zhang, Yanfeng Wang, Weidi Xie:

Multi-Agent System for Comprehensive Soccer Understanding. 3654-3663 - Yuguang Zhang, Qihang Fan, Huaibo Huang:

Vision Transformer with Sparse Scan Prior. 3664-3672 - Shaohui Dai, Yansong Qu, Zheyan Li, Xinyang Li, Shengchuan Zhang, Liujuan Cao:

Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs. 3673-3682 - Qinchen Wu, Difei Gao, Qinghong Lin, Zhuoyu Wu, Mike Zheng Shou:

GUI-Narrator: Detecting and Captioning Computer GUI Actions. 3683-3692 - Liangyu Fu, Junbo Wang, Yuke Li, Qiangguo Jin, Hongsong Wang, Jing Ya, Linjiang Huang, Liang Yao, Jiangbin Zheng, Xuecheng Wu, Zhiyong Wang:

DSACap: Enhancing Visual-Semantic Alignment with Diffusion-based Framework for Image Captioning. 3693-3701 - Meng Wei, Zhongnian Li, Peng Ying, Xinzheng Xu:

Seeing the Undefined: Chain-of-Action for Generative Semantic Labels. 3702-3711 - Zikang Liu, Kun Zhou, Wayne Xin Zhao, Dawei Gao, Yaliang Li, Ji-Rong Wen:

Less is More: High-value Data Selection for Visual Instruction Tuning. 3712-3721 - Mengzu Liu, Junwei Xu, Tao Huang, Fangfang Wu, Le Dong, Xin Li, Weisheng Dong:

Exploring Global Correlations via Polarity Memory for Multispectral Demosaicing. 3722-3730 - Zhaofeng Shi, Heqian Qiu, Lanxiao Wang, Qingbo Wu, Fanman Meng, Hongliang Li:

Unsupervised Ego- and Exo-centric Dense Procedural Activity Captioning via Gaze Consensus Adaptation. 3731-3740 - Chao Yin, Hao Li, Kequan Yang, Jide Li, Pinpin Zhu, Xiaoqiang Li:

Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation. 3741-3750 - Shanding Diao, Yang Zhao, Yuan Chen, Zhao Zhang, Wei Jia, Ronggang Wang:

Multi-Layer Gaussian Splatting for Single-Image Feed-Forward Spatial Scene Reconstruction. 3751-3759 - Yang Ren, Hai Jiang, Wei Li, Menglong Yang, Heng Zhang, Zehua Sheng, Qingsheng Ye, Shuaicheng Liu:

Learning Arbitrary-Scale RAW Image Downscaling with Wavelet-based Recurrent Reconstruction. 3760-3768 - Wenqi Zeng, Yuqi Sun, Chenxi Ma, Weimin Tan, Bo Yan:

MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks. 3769-3778 - Zelei Wu, Xulun Ye, Jieyu Zhao:

Clustering-Based Tail-class Mitigation for New-class Discovery. 3779-3787 - Siqi Song, Limin Yu, Jimin Xiao:

SDP: Spectral-Decomposed Prompting for Continual Learning. 3788-3797 - Shubo Liu, Hongsheng Zhang, Qian Qiao, Qi Wu, Peng Wang:

VLN-ChEnv: Vision-language Navigation in Changeable Environments. 3798-3807 - Kedong Xiu, Sai Qian Zhang:

CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models. 3808-3816 - Fan Yang, Ling Deng, Zhiyong Gan, Qisheng He, Yuanbo Fang, Xiangmin Xu, Shuangping Huang, Tianshui Chen:

Optimal Feature Embedding for Document Large Visual Language Model. 3817-3826 - Lin Li, Guikun Chen, Zhen Wang, Jun Xiao, Long Chen:

Compositional Zero-shot Learning via Progressive Language-based Observations. 3827-3836 - Weimin Cheng, Zhenyu Wang, Tao Huang, Fangfang Wu, Weisheng Dong:

Pushing the Limit of Binarized Neural Network for Image Super Resolution with Smooth Information Transmission. 3837-3846 - Xiang Ma, Litian Xu, Lexin Fang, Caiming Zhang, Lizhen Cui:

Reliable Cross-modal Alignment via Prototype Iterative Construction. 3847-3855 - Ran Chen, Taiyi Su, Hanli Wang:

WaveCL: Wavelet Calibration Learning for Referring Video Object Segmentation. 3856-3864 - Jingxing Guo, Guilian Chen, Yimu Sun, Huisi Wu, Jing Qin:

Hierarchical Spatiotemporal Context Aggregation and Speckle-aware Deformable Convolution for Echocardiography Video Segmentation. 3865-3874 - Junkang Liu, Fanhua Shang, Yuxuan Tian, Hongying Liu, Yuanyuan Liu:

Consistency of Local and Global Flatness for Federated Learning. 3875-3883 - Yangxu Yin, Honglong Chen, Yudong Gao, Peng Sun, Liantao Wu, Zhe Li, Weifeng Liu:

FFCBA: Feature-based Full-target Clean-label Backdoor Attacks. 3884-3892 - Sijing Li, Tianwei Lin, Lingshuai Lin, Wenqiao Zhang, Jiang Liu, Xiaoda Yang, Juncheng Li, Yucheng He, Xiaohui Song, Jun Xiao, Yueting Zhuang, Beng Chin Ooi:

EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model. 3893-3902 - Changtao Miao, Qi Chu, Tao Gong, Zhentao Tan, Zhenchao Jin, Wanyi Zhuang, Man Luo, Honggang Hu, Nenghai Yu:

Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization. 3903-3912 - Shanshan Li, Jiawei Hou, Da Huang, Yanwei Fu, Xiangyang Xue:

Ali-UI: Enhancing Complex Vision-Language Navigation with Alignment of Unified Map and Instruction Parsing. 3913-3922 - Ziming Zhao, Zhaoxuan Li, Tingting Li, Fan Zhang:

Stealthy-AE: Generating Stealthy Adversarial Examples through Online Social Networks. 3923-3931 - Hanning Chen, Yang Ni, Wenjun Huang, Hyunwoo Oh, Yezi Liu, Tamoghno Das, Mohsen Imani:

LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation. 3932-3941 - Yonghyeon Jo, Janghyun Kim, Jinsun Park:

BAC-GCN: Background-Aware CLIP-GCN Framework for Unsupervised Multi-Label Classification. 3942-3951 - Dingwei Zhang, Dong Zhang, Jinhui Tang:

Mitigating Query Selection Bias in Referring Video Object Segmentation. 3952-3961 - Xiangyu Shan, Heng Song, Junwu Zhu:

DFCNet: Dual-Factor Compensatory Clustering Network for Modality-Imbalanced Generalized Zero-Shot Learning. 3962-3971 - Zhiyuan Fan, Keyi Liang:

Video-to-Image Affordance Grounding via Visual Conceptual Learning. 3972-3980 - Qiyan Zhao, Xiaofeng Zhang, Yiheng Li, Yun Xing, Xiaosong Yuan, Feilong Tang, Sinan Fan, Xuhang Chen, Da-Han Wang, Xu-Yao Zhang:

MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models. 3981-3990 - Dexuan Xu, Yanyuan Chen, Yu Huang, Shihao E, Yiwei Lou, Yongzhi Cao, Hanpin Wang, Meikang Qiu:

Medical Vision-Language Pre-training with Multimodal Variational Masked Autoencoder for Robust Medical VQA. 3991-4000 - Yili Li, Gang Xiong, Gaopeng Gou, Xiangyan Qu, Jiamin Zhuang, Zhen Li, Junzheng Shi:

T2VParser: Adaptive Decomposition Tokens for Partial Alignment in Text to Video Retrieval. 4001-4009 - Yizhi Hu, Zezhao Tian, Xingqun Qi, Chen Su, Bingkun Yang, Junhui Yin, Muyi Sun, Man Zhang, Zhenan Sun:

ReMeREC: Relation-aware and Multi-entity Referring Expression Comprehension. 4010-4019 - Xiaoqin Wang, Xianxu Hou, Meidan Ding, Junliang Chen, Kaijun Deng, Jinheng Xie, Linlin Shen:

DisFaceRep: Representation Disentanglement for Co-occurring Facial Components in Weakly Supervised Face Parsing. 4020-4029 - Zhenni Yu, Li Zhao, Guobao Xiao, Xiaoqin Zhang:

SAM-TTT: Segment Anything Model via Reverse Parameter Configuration and Test-Time Training for Camouflaged Object Detection. 4030-4038 - Jing Ma, Haochen Sun, Zeyuan Zang, Fangxiang Feng, Caixia Yuan, Lei Ren, Huixing Jiang, Wei Chen, Xiaojie Wang:

VL-DynaRefine: A Vision-Language Dynamic Refinement Approach for Visual Reasoning. 4039-4047 - Jiao Chen, Jiayi He, Fangfang Chen, Zuohong Lv, Jianhua Tang:

Forward-Only Continual Learning. 4048-4057 - Jiahua Bao, Siyao Cheng, Jiaxing Du, Changjiang He, Zeming Lang, Hao Zhang, Jie Liu:

BOLT: Fewer Tokens but More Performance Retention for Efficient Vision-Language Models Inference. 4058-4067 - Ziqi Yuan, Jun Li, Yanghao Li, Yuxiang Huang, Chi Chen, Shuo Wang, Zhinan Gou:

CITR: Efficient Long Video Understanding Needs Causal Importance. 4068-4076 - Qi Li, Yucan Zhou, Jiang Zhou, XingYou Yang, Xiaoyan Gu:

Diverse and Public Features Cooperation via Gradient Rectification for Federated Prompt Learning. 4077-4086 - Shilei Wang, Gong Cheng, Pujian Lai, Dong Gao, Junwei Han:

Multi-State Tracker: Enhancing Efficient Object Tracking via Multi-State Specialization and Interaction. 4087-4096 - Xinyu Zhang, Lingling Zhang, Yanrui Wu, Muye Huang, Jun Liu:

Cognitive Predictive Coding Network: Rethinking the Generalization in Raven's Progressive Matrices. 4097-4106 - Xiaoxuan Mu, Haoyu Tang, Han Jiang, Tianyuan Liang, Qinghai Zheng, Jihua Zhu:

FACE: A Dual-Template and Adaptive Curriculum Framework for Unsupervised Text-Based Person Search. 4107-4116 - Xinyu Huang, Yi-Jie Huang, Youcai Zhang, Weiwei Tian, Rui Feng, Yuejie Zhang, Yanchun Xie, Yaqian Li, Lei Zhang:

Open-Set Image Tagging with Multi-Grained Text Supervision. 4117-4126 - Zhihao Wang, Shiyu Liu, Zhiwei He, Kangjie Zheng, Liangying Shao, Junfeng Yao, Jinsong Su:

Gloss Matters: Unlocking the Potential of Non-Autoregressive Sign Language Translation. 4127-4136 - Jiye Xie, Yifei Gao, Liangliang You, Xiang Xu, Haoran Xu, Zhiqiang Kou, Kexue Fu, Youyang Qu, Wenjie Yang, Jianwei Guo, Weiliang Meng, Longxiang Gao, Haoran Yang, Changwei Wang, Yu Zhang:

Collaboration Wins More: Dual-Modal Collaborative Attention Reinforcement for Mitigating Large Vision Language Models Hallucination. 4137-4146 - Xinzhe Xia, Weiguang Zhao, Yuyao Yan, Guanyu Yang, Rui Zhang, Kaizhu Huang, Xi Yang:

Towards Training-Free Open-World Classification with 3D Generative Models. 4147-4155 - Mingyu Fu, Wei Suo, Ji Ma, Lin Yuanbo Wu, Peng Wang, Yanning Zhang:

Mitigating Information Loss under High Pruning Rates for Efficient Large Vision Language Models. 4156-4165 - Zijun Xu, Jiahao Guo, Chunjie Zhang, Zhongyuan Wang, Chunxia Xiao, Chao Liang:

Quantum Interference-Inspired Who-What-Where Composite-Semantics Instance Search for Story Videos. 4166-4174 - Lihong Qiao, Shiyi Gao, Yucheng Shu, Bin Xiao, Weisheng Li, Xinbo Gao:

Pathology-Aware Reconstruction with Discriminative Knowledge Boosting Alignment for Che-Xray Vision-Language Pre-training. 4175-4184 - Rongzhen Zhao, Yi Zhao, Juho Kannala, Joni Pajarinen:

Slot Attention with Re-Initialization and Self-Distillation. 4185-4192 - Qucheng Peng, Chen Bai, Guoxiang Zhang, Bo Xu, Xiaotong Liu, Xiaoyin Zheng, Chen Chen, Cheng Lu:

NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving. 4193-4202 - Zizhuo Li, Chunbao Su, Fan Fan, Jun Huang, Jiayi Ma:

CorrNeXt: Making the ConvNet-Style Correspondence Pruner Stronger for Two-View Geometry. 4203-4212 - Jinxu Zhang, Qiyuan Fan, Yongqi Yu, Yu Zhang:

DREAM: Integrating Hierarchical Multimodal Retrieval with Multi-page Multimodal Language Model for Documents VQA. 4213-4221 - Junyi Wang, Yue Qi:

Visual Localization using Hybrid Feature Grid and Learned Weighted Global Point Cloud. 4222-4231 - Yifan Zhang, Yang Shi, Weichen Yu, Qingsong Wen, Xue Wang, Wenjing Yang, Zhang Zhang, Liang Wang, Rong Jin:

Debiasing Multimodal Large Language Models via Penalization of Language Priors. 4232-4241 - Xiaolei Bo, Feiyang Yang, Feilong Xu, Xiaoli Zhang:

Cross-Counter-Repeat Attention for Enhanced Understanding of Visual Semantics in Radiology Report Generation. 4242-4250 - Jiacheng Ruan, Zongyun Zhang, Jingsheng Gao, Wenzhen Yuan, Ting Liu, Yuzhuo Fu:

MPI-CD: Multi-Path Information Contrastive Decoding for Mitigating Hallucinations in Large Vision-Language Models. 4251-4260 - Hao Sun, Fenggen Yu, Huiyao Xu, Tao Zhang, Changqing Zou:

LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View Synthesis. 4261-4270 - Hongchen Wei, Zhenzhong Chen:

RealVG: Unleashing MLLMs for Training-Free Spatio-Temporal Video Grounding in the Wild. 4271-4280 - Hongchen Wei, Zhenzhong Chen:

Visual Context Window Extension: A New Perspective for Long Video Understanding. 4281-4289 - Yu Liu, Kun Sun, Chang Tang, Yuhua Qian, Xin Li:

TPDepth: Leveraging Text Prompts with ControlNet to Boost Diffusion-based Depth Estimation. 4290-4299 - Yingxin Lai, Hongyang Wang, Jing Yang, Xiangui Kang, Bin Li, Linlin Shen, Zitong Yu:

GM-DF: Generalized Multi-Scenario Deepfake Detection. 4300-4309 - Kun Zhai, Siheng Chen, Xingjun Ma, Yu-Gang Jiang:

FedAPT: Federated Adversarial Prompt Tuning for Vision-Language Models. 4310-4318 - Jie Wan, Jianhao Fu, Ziqi Yang, Kui Ren:

BTUAP: Boosting the Transferability of Universal Adversarial Perturbations in the Black-box Setting under various data dependencies. 4319-4328 - Hui Wu, Haoquan Zhai, Yuchen Li, Hengyi Cai, Peirong Zhang, Yidan Zhang, Lei Wang, Chunle Wang, Yingyan Hou, Shuaiqiang Wang, Dawei Yin:

MARA: A Multimodal Adaptive Retrieval-Augmented Framework for Document Question Answering. 4329-4338 - Bocheng Pan, Hailong Shi, Xingyu Gao:

DR-VQA: Decompose-then-Reconstruct for Visual Question Answering in BLV Assistance. 4339-4348 - Wei Jia, Li Jin, Kaiwen Wei, Yuying Shang, Nayu Liu, Zhicong Lu, Qing Liu, Linhao Zhang, Jiang Zhong, Yanfeng Hu:

U-MERE: Unconstrained Multimodal Entity and Relation Extraction with Collaborative Modeling and Order-Sensitive Optimization. 4349-4358 - Luyao Ren, Wenxin Yu, Zhiqiang Zhang, Chang Liu:

EMIFS: Efficient Multi-scale Information Fusion Self-supervision for Medical Image Segmentation. 4359-4368 - Chenxi Zhang, Qing Zhang, Jiayun Wu, Youwei Pang:

CGCOD: Class-Guided Camouflaged Object Detection. 4369-4377 - Wenzheng Yang, Songwei Pei, Bingfeng Liu, Qian Li, Shangguang Wang:

OGDepth: Leveraging Object Guidance in Diffusion Models for Enhanced Monocular Depth Estimation. 4378-4387 - Xueyi Zhang, Peiyin Zhu, Yuan Liao, Xiyu Wang, Mingrui Lao, Siqi Cai, Yanming Guo, Haizhou Li:

TrustCLIP: Learning from Noisy Labels via Semantic Label Verification and Trust-aligned Gradient Projection. 4388-4397 - Yikun Ji, Yan Hong, Jiahui Zhan, Haoxing Chen, Jun Lan, Huijia Zhu, Weiqiang Wang, Liqing Zhang, Jianfu Zhang:

Towards Explainable Fake Image Detection with Multi-Modal Large Language Models. 4398-4407 - Xiaodong Wang, Hongmin Hu, Fei Yan, Junwen Lu, Zhiqiang Zeng, Weidong Hong, Zhedong Zheng:

UniAD: Integrating Geometric and Semantic Cues for Unified Anomaly Detection. 4408-4417 - Runwei Situ, Yi Cai, Yong Xu, Jiexin Wang:

Ground and Reconstruct: Entity-Region Bidirectional Alignment Pre-Training for Low-Resource GMNER. 4418-4426 - Yongquan Xue, Zhaoru Guo, Zhaozhao Su, Chong Peng, Jun Feng, Pan Zhou, Marcin Pietron, Xiyuan Wang, Liejun Wang, Panpan Zheng:

Rodecon-net: Medical Image Segmentation via Robust Decoupling and Contrast-enhanced Fusion. 4427-4435 - Wenxi Huang, Xiaojun Chen, Qin Zhang, Ting Wan, Ziqi Liu, Liangjie Zhang:

MRBench: A Multi-Image Reasoning Benchmark with Adaptive Knowledge Retrieval. 4436-4445 - Xuanliu Zhu, Yiqiao Chai, Runnan Li, Mingying Lan, Li Gao:

CrossMind-VL: Multi-Subject Mind-to-Video Decoding with Multimodal LLM Semantic Grounding. 4446-4454 - Jiaqing Fan, Hanwen Qian, Mengjuan Jiang, Fanzhang Li:

PeriodVOS: Learning Periodic Patterns for Unsupervised Video Object Segmentation via Adaptive Contextual Coupling. 4455-4463 - Xiangzhao Hao, Kuan Zhu, Hongyu Guo, Haiyun Guo, Ning Jiang, Quan Lu, Ming Tang, Jinqiao Wang:

Referring Expression Instance Retrieval and A Strong End-to-End Baseline. 4464-4473 - Lifeng Lin, Rongfeng Lu, Quan Chen, Haofan Ren, Ming Lu, Yaoqi Sun, Chenggang Yan, Anke Xue:

VGNC: Reducing the Overfitting of Sparse-view 3DGS via Validation-guided Gaussian Number Control. 4474-4483 - Sidun Liu, Wenyu Li, Peng Qiao, Yong Dou:

Regist3R: Incremental Registration with Stereo Foundation Model. 4484-4493 - Zichi Liu, Yinggui Wang, Tao Wei, Chao Ma:

AnchorSync: Global Consistency Optimization for Long Video Editing. 4494-4503 - Hongxu Ma, Chenbo Zhang, Lu Zhang, Jiaogen Zhou, Jihong Guan, Shuigeng Zhou:

Fine-grained Zero-Shot Object Detection. 4504-4513 - Hongxu Ma, Guanshuo Wang, Fufu Yu, Qiong Jia, Shouhong Ding:

MS-DETR: Towards Effective Video Moment Retrieval and Highlight Detection by Joint Motion-Semantic Learning. 4514-4523 - Hao Ruan, Jinliang Lin, Yingxin Lai, Zhiming Luo, Shaozi Li:

HCCM: Hierarchical Cross-Granularity Contrastive and Matching Learning for Natural Language-Guided Drones. 4524-4533 - Yun Li, Lina Yao, Zhe Liu:

Compositional Zero-Shot Learning with Contextualized Cues and Adaptive Contrastive Training. 4534-4541 - Zhuming Wang, Yihao Zheng, Jiarui Li, Yaofei Wu, Yan Huang, Zun Li, Lifang Wu, Liang Wang:

VicKAM: Visual Conceptual Knowledge Guided Action Map for Weakly Supervised Group Activity Recognition. 4542-4551 - Yuzhen Li, Min Liu, Yuan Bian, Xueping Wang, Zhaoyang Li, Gen Li, Yaonan Wang:

Dual Enhancement on 3D Vision-Language Perception for Monocular 3D Visual Grounding. 4552-4561 - Yiliang Zhu, Dayan Wu, Qinghang Su, Zexian Yang, Zheng Lin, Weiping Wang:

Mitigating the Evolving Semantic Entanglement in Continual Learning of Vision-Language Models. 4562-4570 - Xiongwei Dang, Wenxuan Liu, Xian Zhong, Zheng Wang:

SegTraj: A Segmented-Trajectory-Aware Spatio-Temporal Graph Convolutional Network for Social Group Detection. 4571-4579 - Sifan Zuo, Youfa Liu, Bo Du:

CSDN: CLIP-Driven Similarity-Aligned Distillation Network for Weakly-Supervised Object Localization. 4580-4589 - Dirui Xie, Xiaofang Hu, Zihan Wei, Zhengqiqi Yang, Yanlian Jiang, Yue Zhou:

Learning Structural Priors via Laplacian RWKV Diffusion with Light-Effect Dataset for Nighttime Visibility Enhancement. 4590-4599 - Biao Chen, Kunbin He, Zhikun Zheng, Mengmeng Jing, Lin Zuo:

Chain-of-Thought Guided Semantic Debiasing for Low-Shot Vision-Language Tasks. 4600-4609 - Shengli Zhou, Yang Liu, Feng Zheng:

Learn 3D VQA Better with Active Selection and Reannotation. 4610-4618 - Kun Ding, Ying Wang, Shiming Xiang:

EvoVLMA: Evolutionary Vision-Language Model Adaptation. 4619-4628 - Yang Liu, Zhiyong Zhang:

DSP: Dense-Sparse Parallel Networks for Self-supervised 3D Multi-person Pose Estimation from Multiple Views. 4629-4638 - Meng Chu, Yicong Li, Tat-Seng Chua:

GraphVideoAgent: Enhancing Long-form Video Understanding with Entity Relation Graphs. 4639-4648 - Hancong Wang, Yue Yu, Hairong Zheng, Tong Zhang:

Test-Time Adaptation of Medical Vision-Language Models with Mixture of Modality Experts. 4649-4658 - Zixuan Wan, Jiqing Zhang, Yushan Wang, Hu Lin, Yafei Wang, Zetian Mi, Xin Yang, Xianping Fu, Huibing Wang:

Eye-based Emotion Recognition via Event-Driven Sparse Transformers. 4659-4668 - Guoxin Zhang, Zhonghong Ou, Kaiwen Xue, Jiangfeng Sun, Yifan Zhu, Siyuan Yao, Yiran Shen, Meina Song:

DGFSD: Bridging the Gap between Dense and Sparse for Fully Sparse 3D Object Detection. 4669-4678 - Benlong Wu, Yuang Qi, Xiuwei Shang, Weiming Zhang, Nenghai Yu, Kejiang Chen:

MMPro: A Decoupled Perception-Thinking-Execution Framework for Secure GUI Agent. 4679-4687 - Shengqian Zhu, Chengrong Yu, Wenbo Qi, Jiafei Wu, Ying Song, Guangjun Li, Zhang Yi, Xiaogang Xu, Junjie Hu:

PRIME: Prototype-Driven Class Incremental Learning for Medical Image Segmentation. 4688-4697 - Qile Su, Shoutai Zhu, Shuai Zhang, Baoyu Liang, Chao Tong:

EventFormer: A Node-graph Hierarchical Attention Transformer for Action-centric Video Event Prediction. 4698-4707 - Haijing Liu, Tao Pu, Hefeng Wu, Keze Wang, Liang Lin:

DART: Dual Adaptive Refinement Transfer for Open-Vocabulary Multi-Label Recognition. 4708-4717 - Mahiro Ukai, Shuhei Kurita, Nakamasa Inoue:

STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models. 4718-4727 - Shiying Lin, Rong Hu, Zuoyong Li, Qinghua Lin, Jiawei Wu, Changqing Zhang:

Gradient-Aware Revitalization of Non-Effective Samples in Medical Image Segmentation. 4728-4737 - Chang Su, Beihong Jin, Fusang Zhang, Siheng Li, Zhi Wang:

Self-Supervised Human Mesh Recovery from Partial Point Cloud via a Self-Improving Loop. 4738-4747 - Ruoxuan Li, Xiangyu Wu, Yang Yang:

Noise Self-Correction via Relation Propagation for Robust Cross-Modal Retrieval. 4748-4757 - Yangyang Xu, Xi Ye, Duo Su:

Multi-Task Dense Prediction Fine-Tuning with Mixture of Fine-Grained Experts. 4758-4767 - Siran Peng, Tianshuo Zhang, Li Gao, Xiangyu Zhu, Haoyuan Zhang, Kai Pang, Zhen Lei:

WMamba: Wavelet-based Mamba for Face Forgery Detection. 4768-4777 - Nanxing Hu, Xiaoyue Duan, Jinchao Zhang, Guoliang Kang:

Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models. 4778-4787 - Yiwen Liang, Hui Chen, Yizhe Xiong, Zihan Zhou, Mengyao Lyu, Zijia Lin, Shuaicheng Niu, Sicheng Zhao, Jungong Han, Guiguang Ding:

Advancing Reliable Test-Time Adaptation of Vision-Language Models under Visual Variations. 4788-4797 - Chunpeng Wang, Wenlong Ma, Li Zou, Zhiqiu Xia, Qi Li, Bin Ma, Yunan Liu:

Toward Robust Deepfake Detection: A Proactive Method Based on Watermarking and Knowledge Distillation. 4798-4807 - Futa Waseda, Saku Sugawara, Isao Echizen:

Quality Text, Robust Vision: The Role of Language in Enhancing Visual Robustness of Vision-Language Models. 4808-4816 - Zhenghao Liu, Xingsheng Zhu, Tianshuo Zhou, Xinyi Zhang, Xiaoyuan Yi, Yukun Yan, Ge Yu, Maosong Sun:

Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts. 4817-4826 - Garry Yang, Zizhe Chen, Man Hon Wong, Haoyu Lei, Yongqiang Chen, Zhenguo Li, Kaiwen Zhou, James Cheng:

MESH - Understanding Videos Like Human: Measuring Hallucinations in Large Video Models. 4827-4836
