


31st ACM Multimedia 2023: Ottawa, ON, Canada
- Abdulmotaleb El-Saddik, Tao Mei, Rita Cucchiara, Marco Bertini, Diana Patricia Tobon Vallejo, Pradeep K. Atrey, M. Shamim Hossain:
Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, Ottawa, ON, Canada, 29 October 2023 - 3 November 2023. ACM 2023
Keynote Talks
- Chang Wen Chen: Internet of Video Things: Technical Challenges and Emerging Applications. 1-2
- Alejandro Jaimes: Multimodal AI & LLMs for Peacekeeping and Emergency Response. 3-4
- Ralf Steinmetz: Transition and Adaptability: The Cornerstone of Resilience in Future Networked Multimedia Systems and Beyond. 5-6
Oral Session I: Understanding Multimedia Content -- Media Interpretation
- Hao Shen, Zhong-Qiu Zhao, Yulun Zhang, Zhao Zhang: Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing. 7-16
- Yang Jiao, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang: Suspected Objects Matter: Rethinking Model's Prediction for One-stage Visual Grounding. 17-26
- Sophyani Banaamwini Yussif, Ning Xie, Yang Yang, Heng Tao Shen: Self-Relational Graph Convolution Network for Skeleton-Based Action Recognition. 27-36
- Qian Ning, Fangfang Wu, Weisheng Dong, Xin Li, Guangming Shi: Exploring Correlations in Degraded Spatial Identity Features for Blind Face Restoration. 37-45
- Chuhao Zhou, Jinxing Li, Huafeng Li, Guangming Lu, Yong Xu, Min Zhang: Video-based Visible-Infrared Person Re-Identification via Style Disturbance Defense and Dual Interaction. 46-55
- Wenmiao Hu, Yichen Zhang, Yuxuan Liang, Xianjing Han, Yifang Yin, Hannes Kruppa, See-Kiong Ng, Roger Zimmermann: PetalView: Fine-grained Location and Orientation Extraction of Street-view Images via Cross-view Local Search. 56-66
- Haorui Wang, Yibo Hu, Yangfu Zhu, Jinsheng Qi, Bin Wu: Shifted GCN-GAT and Cumulative-Transformer based Social Relation Recognition for Long Videos. 67-76
- Jilong Wang, Saihui Hou, Yan Huang, Chunshui Cao, Xu Liu, Yongzhen Huang, Liang Wang: Causal Intervention for Sparse-View Gait Recognition. 77-85
- Digbalay Bose, Rajat Hebbar, Tiantian Feng, Krishna Somandepalli, Anfeng Xu, Shrikanth Narayanan: MM-AU: Towards Multimodal Understanding of Advertisement Videos. 86-95
- Huiwei Lin, Shanshan Feng, Baoquan Zhang, Hongliang Qiao, Xutao Li, Yunming Ye: UER: A Heuristic Bias Addressing Approach for Online Continual Learning. 96-104
- Peng Wu, Xiankai Lu, Jianbing Shen, Yilong Yin: Clip Fusion with Bi-level Optimization for Human Mesh Reconstruction from Monocular Videos. 105-115
- Jinkai Zheng, Xinchen Liu, Shuai Wang, Lihao Wang, Chenggang Yan, Wu Liu: Parsing is All You Need for Accurate Gait Recognition in the Wild. 116-124
- Dingyi Zhang, Yingming Li, Zhongfei Zhang: Multi-Scale Similarity Aggregation for Dynamic Metric Learning. 125-134
- Yue Feng, Zhengye Zhang, Rong Quan, Limin Wang, Jie Qin: RefineTAD: Learning Proposal-free Refinement for Temporal Action Detection. 135-143
- Zhenguang Liu, Xinyang Yu, Ruili Wang, Shuai Ye, Zhe Ma, Jianfeng Dong, Sifeng He, Feng Qian, Xiaobo Zhang, Roger Zimmermann, Lei Yang: Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization. 144-152
- Dongbao Yang, Yu Zhou, Xiaopeng Hong, Aoting Zhang, Xin Wei, Linchengxi Zeng, Zhi Qiao, Weiping Wang: Pseudo Object Replay and Mining for Incremental Object Detection. 153-162
- Shiqin Wang, Xin Xu, Xianzheng Ma, Kui Jiang, Zheng Wang: Informative Classes Matter: Towards Unsupervised Domain Adaptive Nighttime Semantic Segmentation. 163-172
- Ye Tian, Mengyu Yang, Lanshan Zhang, Zhizhen Zhang, Yang Liu, Xiaohui Xie, Xirong Que, Wendong Wang: View while Moving: Efficient Video Recognition in Long-untrimmed Videos. 173-183
- Yimin Deng, Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao: PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion. 184-192
- Gege Shi, Xueyang Fu, Chengzhi Cao, Zheng-Jun Zha: Alleviating Spatial Misalignment and Motion Interference for UAV-based Video Recognition. 193-202
- Yang Liu, Zhaoyang Xia, Mengyang Zhao, Donglai Wei, Yuzheng Wang, Siao Liu, Bobo Ju, Gaoyun Fang, Jing Liu, Liang Song: Learning Causality-inspired Representation Consistency for Video Anomaly Detection. 203-212
- Dongyue Guo, Yi Lin, Xuehang You, Zhongping Yang, Jizhe Zhou, Bo Yang, Jianwei Zhang, Han Shi, Shasha Hu, Zheng Zhang: M2ATS: A Real-world Multimodal Air Traffic Situation Benchmark Dataset and Beyond. 213-221
- Jianghu Lu, Shikun Li, Kexin Bao, Pengju Wang, Zhenxing Qian, Shiming Ge: Federated Learning with Label-Masking Distillation. 222-232
- Lingxiao Lu, Jiangtong Li, Junyan Cao, Li Niu, Liqing Zhang: Painterly Image Harmonization using Diffusion Model. 233-241
- Xingran Xie, Ting Jin, Boxiang Yun, Qingli Li, Yan Wang: Exploring Hyperspectral Histopathology Image Segmentation from a Deformable Perspective. 242-251
- Runhua Jiang, Yahong Han: Uncertainty-Aware Variate Decomposition for Self-supervised Blind Image Deblurring. 252-260
Oral Session II: Understanding Multimedia Content -- Multimodal Fusion and Embedding
- Chao Sun, Min Chen, Jialiang Cheng, Han Liang, Chuanbo Zhu, Jincai Chen: SCLAV: Supervised Cross-modal Contrastive Learning for Audio-Visual Coding. 261-270
- Feng Lin, Kaiqiang Fu, Hao Luo, Ziyue Zhan, Zhibo Wang, Zhenguang Liu, Lorenzo Cavallaro, Kui Ren: Cross-Modal and Multi-Attribute Face Recognition: A Benchmark. 271-279
- Ye Wang, Junyang Chen, Mengzhu Wang, Hao Li, Wei Wang, Houcheng Su, Zhihui Lai, Wei Wang, Zhenghan Chen: A Closer Look at Classifier in Adversarial Domain Generalization. 280-289
- Mengzhu Wang, Jianlong Yuan, Zhibin Wang: Mixture-of-Experts Learner for Single Long-Tailed Domain Generalization. 290-299
- Chao Zhang, Jingwen Wei, Bo Wang, Zechao Li, Chunlin Chen, Huaxiong Li: Robust Spectral Embedding Completion Based Incomplete Multi-view Clustering. 300-308
- Jinhui Pang, Zixuan Wang, Jiliang Tang, Mingyan Xiao, Nan Yin: SA-GDA: Spectral Augmentation for Graph Domain Adaptation. 309-318
- Xihong Yang, Cheng Tan, Yue Liu, Ke Liang, Siwei Wang, Sihang Zhou, Jun Xia, Stan Z. Li, Xinwang Liu, En Zhu: CONVERT: Contrastive Graph Clustering with Reliable Augmentation. 319-327
- Jintian Ji, Songhe Feng: High-order Complementarity Induced Fast Multi-View Clustering with Enhanced Tensor Rank Minimization. 328-336
- Xihong Yang, Jiaqi Jin, Siwei Wang, Ke Liang, Yue Liu, Yi Wen, Suyuan Liu, Sihang Zhou, Xinwang Liu, En Zhu: DealMVC: Dual Contrastive Calibration for Multi-view Clustering. 337-346
- Junming Hou, Qi Cao, Ran Ran, Che Liu, Junling Li, Liang-Jian Deng: Bidomain Modeling Paradigm for Pansharpening. 347-357
- Yingying Wang, Yunlong Lin, Ge Meng, Zhenqi Fu, Yuhang Dong, Linyu Fan, Hedeng Yu, Xinghao Ding, Yue Huang: Learning High-frequency Feature Enhancement and Alignment for Pan-sharpening. 358-367
- Xingfeng Li, Yinghui Sun, Quansen Sun, Jia Dai, Zhenwen Ren: Distribution Consistency based Fast Anchor Imputation for Incomplete Multi-view Clustering. 368-376
- Yushen Wei, Yang Liu, Hong Yan, Guanbin Li, Liang Lin: Visual Causal Scene Refinement for Video Question Answering. 377-386
- Hongye Liu, Xianhai Xie, Yang Gao, Zhou Yu: Parameter-Efficient Transfer Learning for Audio-Visual-Language Tasks. 387-396
- Xi Chen, Yun Xiong, Siqi Wang, Haofen Wang, Tao Sheng, Yao Zhang, Yu Ye: ReCo: A Dataset for Residential Community Layout Planning. 397-405
- Runmin Cong, Hongyu Liu, Chen Zhang, Wei Zhang, Feng Zheng, Ran Song, Sam Kwong: Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection. 406-416
- Jinrong Cui, Yuting Li, Yulu Fu, Jie Wen: Multi-view Self-Expressive Subspace Clustering Network. 417-425
- Jian Huang, Yanli Ji, Yang Yang, Heng Tao Shen: Cross-modality Representation Interactive Learning for Multimodal Sentiment Analysis. 426-434
- Yixuan Ma, Xiaolin Zhang, Peng Zhang, Kun Zhan: Entropy Neural Estimation for Graph Contrastive Learning. 435-443
- Liguo Zhang, Zilin Tian, Yunfei Long, Sizhao Li, Guisheng Yin: Cross-modal and Cross-medium Adversarial Attack for Audio. 444-453
- Liang Peng, Xin Wang, Xiaofeng Zhu: Unsupervised Multiplex Graph learning with Complementary and Consistent Information. 454-462
- Yixuan Wu, Jintai Chen, Jiahuan Yan, Yiheng Zhu, Danny Z. Chen, Jian Wu: GCL: Gradient-Guided Contrastive Learning for Medical Image Segmentation with Multi-Perspective Meta Labels. 463-471
- Zhiying Jiang, Zengxi Zhang, Jinyuan Liu, Xin Fan, Risheng Liu: Multi-Spectral Image Stitching via Spatial Graph Reasoning. 472-480
- Jiaming Zhuo, Can Cui, Kun Fu, Bingxin Niu, Dongxiao He, Yuanfang Guo, Zhen Wang, Chuan Wang, Xiaochun Cao, Liang Yang: Propagation is All You Need: A New Framework for Representation Learning and Classifier Training on Graphs. 481-489
- Yao Wu, Mingwei Xing, Yachao Zhang, Yuan Xie, Jianping Fan, Zhongchao Shi, Yanyun Qu: Cross-modal Unsupervised Domain Adaptation for 3D Semantic Segmentation via Bidirectional Fusion-then-Distillation. 490-498
Oral Session III: Understanding Multimedia Content -- Vision and Language
- Yinjie Zhao, Lichen Zhao, Qian Yu, Lu Sheng, Jing Zhang, Dong Xu: Distortion-aware Transformer in 360° Salient Object Detection. 499-508
- Zixiao Wang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Boqiang Zhang, Yongdong Zhang: Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition. 509-518
- Bo Zou, Chao Yang, Chengbin Quan, Youjian Zhao: SpaceCLIP: A Vision-Language Pretraining Framework With Spatial Reconstruction On Text. 519-528
- Xu Huang, Jin Liu, Zhizhong Zhang, Yuan Xie: Improving Cross-Modal Recipe Retrieval with Component-Aware Prompted CLIP Embedding. 529-537
- Shuhan Kong, Liang Li, Beichen Zhang, Wenyu Wang, Bin Jiang, Chenggang Yan, Changhao Xu: Dynamic Contrastive Learning with Pseudo-samples Intervention for Weakly Supervised Joint Video MR and HD. 538-546
- Zheng Yuan, Qiao Jin, Chuanqi Tan, Zhengyun Zhao, Hongyi Yuan, Fei Huang, Songfang Huang: RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training. 547-556
- Xiao Wang, Yaoyu Li, Tian Gan, Zheng Zhang, Jingjing Lv, Liqiang Nie: RTQ: Rethinking Video-language Understanding Based on Image-text Model. 557-566
- Shanshan Zhong, Zhongzhan Huang, Wushao Wen, Jinghui Qin, Liang Lin: SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models. 567-578
- Xin Dong, Rui Wang, Siyuan Liang, Aishan Liu, Lihua Jing: Face Encryption via Frequency-Restricted Identity-Agnostic Attacks. 579-588
- Peipei Song, Dan Guo, Xun Yang, Shengeng Tang, Erkun Yang, Meng Wang: Emotion-Prior Awareness Network for Emotional Video Captioning. 589-600
- Dong Liu, Qirong Mao, Lijian Gao, Qinghua Ren, Zhenghan Chen, Ming Dong: TE-KWS: Text-Informed Speech Enhancement for Noise-Robust Keyword Spotting. 601-610
- Jiancheng Pan, Qing Ma, Cong Bai: A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval. 611-620
- Nirmalendu Prakash, Han Wang, Nguyen-Khoi Hoang, Ming Shan Hee, Roy Ka-Wei Lee: PromptMTopic: Unsupervised Multimodal Topic Modeling of Memes using Large Language Models. 621-631
- Yue Lv, Jinxi Xiang, Jun Zhang, Wenming Yang, Xiao Han, Wei Yang: Dynamic Low-Rank Instance Adaptation for Universal Neural Image Compression. 632-642
- Leigang Qu, Shengqiong Wu, Hao Fei, Liqiang Nie, Tat-Seng Chua: LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation. 643-654
- Yue Zhang, Suchen Wang, Shichao Kan, Zhenyu Weng, Yigang Cen, Yap-Peng Tan: POAR: Towards Open Vocabulary Pedestrian Attribute Recognition. 655-665
- Shengshan Hu, Wei Liu, Minghui Li, Yechao Zhang, Xiaogeng Liu, Xianlong Wang, Leo Yu Zhang, Junhui Hou: PointCRT: Detecting Backdoor in 3D Point Cloud via Corruption Robustness. 666-675
- Rui Qin, Ming Sun, Fangyuan Zhang, Xing Wen, Bin Wang: Blind Image Super-resolution with Rich Texture-Aware Codebook. 676-687
- Zizhang Wu, Zhuozheng Li, Zhi-Gang Fan, Yunzhe Wu, Jian Pu, Xianzhi Li: V2Depth: Monocular Depth Estimation via Feature-Level Virtual-View Simulation and Refinement. 688-697
- Kai Chen, Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang: GCMA: Generative Cross-Modal Transferable Adversarial Attacks from Images to Videos. 698-708
- Lianyu Hu, Liqing Gao, Zekang Liu, Chi-Man Pun, Wei Feng: AdaBrowse: Adaptive Video Browser for Efficient Continuous Sign Language Recognition. 709-718
- Lingfeng Li, Gangming Zhao, Yizhou Yu, Jinpeng Li: Dynamic Triple Reweighting Network for Automatic Femoral Head Necrosis Diagnosis from Computed Tomography. 719-727
- Liu Liu, Jianming Du, Hao Wu, Xun Yang, Zhenguang Liu, Richang Hong, Meng Wang: Category-Level Articulated Object 9D Pose Estimation via Reinforcement Learning. 728-736
- Qichao Ying, Jiaxin Liu, Sheng Li, Haisheng Xu, Zhenxing Qian, Xinpeng Zhang: RetouchingFFHQ: A Large-scale Dataset for Fine-grained Face Retouching Detection. 737-746
- Xueyi Zhang, Chengwei Zhang, Tao Wang, Jun Tang, Songyang Lao, Haizhou Li: Slow-Fast Time Parameter Aggregation Network for Class-Incremental Lip Reading. 747-756
- Yang Bai, Jingyao Wang, Min Cao, Chen Chen, Ziqiang Cao, Liqiang Nie, Min Zhang: Text-based Person Search without Parallel Image-Text Data. 757-767
- Jiawei Liang, Siyuan Liang, Aishan Liu, Ke Ma, Jingzhi Li, Xiaochun Cao: Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation. 768-778
- Sun'ao Liu, Yiheng Zhang, Zhaofan Qiu, Hongtao Xie, Yongdong Zhang, Ting Yao: CARIS: Context-Aware Referring Image Segmentation. 779-788
- Shizhou Zhang, Qingchun Yang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, Yanning Zhang: Ground-to-Aerial Person Search: Benchmark Dataset and Approach. 789-799
- Fan Jiang, Zilei Wang: Sparse Sharing Relation Network for Panoptic Driving Perception. 800-808
Oral Session IV: Engaging Users with Multimedia -- Emotional and Social Signals
- Daoming Zong, Chaoyue Ding, Baoxiang Li, Jiakui Li, Ken Zheng, Qunyan Zhou: AcFormer: An Aligned and Compact Transformer for Multimodal Sentiment Analysis. 833-842
- Zeng Tao, Yan Wang, Zhaoyu Chen, Boyang Wang, Shaoqi Yan, Kaixun Jiang, Shuyong Gao, Wenqiang Zhang: Freq-HD: An Interpretable Frequency-based High-Dynamics Affective Clip Selection Method for in-the-Wild Facial Expression Recognition in Videos. 843-852
- Peiguang Jing, Xianyi Liu, Ji Wang, Yinwei Wei, Liqiang Nie, Yuting Su: StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning. 853-861
- Junjie Zhu, Bingjun Luo, Ao Sun, Jinghang Tan, Xibin Zhao, Yue Gao: Variance-Aware Bi-Attention Expression Transformer for Open-Set Facial Expression Recognition in the Wild. 862-870
- Zixin Zhang, Fan Qi, Shuai Li, Changsheng Xu: AffectFAL: Federated Active Affective Computing with Non-IID Data. 871-882
- Peiliang Gong, Ziyu Jia, Pengpai Wang, Yueying Zhou, Daoqiang Zhang: ASTDF-Net: Attention-Based Spatial-Temporal Dual-Stream Fusion Network for EEG-Based Emotion Recognition. 883-892
Oral Session V: Engaging Users with Multimedia -- Multimedia Search and Recommendation
- Yishu Liu, Qingpeng Wu, Zheng Zhang, Jingyi Zhang, Guangming Lu: Multi-Granularity Interactive Transformer Hashing for Cross-modal Retrieval. 893-902
- Wenjie Wang, Xinyu Lin, Liuhui Wang, Fuli Feng, Yinwei Wei, Tat-Seng Chua: Equivariant Learning for Out-of-Distribution Cold-start Recommendation. 903-914
- Haokun Wen, Xian Zhang, Xuemeng Song, Yinwei Wei, Liqiang Nie: Target-Guided Composed Image Retrieval. 915-923
- Haoxuan Li, Yi Bin, Junrong Liao, Yang Yang, Heng Tao Shen: Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination. 924-934
- Xin Zhou, Zhiqi Shen: A Tale of Two Graphs: Freezing and Denoising Graph Structures for Multimodal Recommendation. 935-943
- Guiwei Zhang, Yongfei Zhang, Zichang Tan: ProtoHPE: Prototype-guided High-frequency Patch Enhancement for Visible-Infrared Person Re-identification. 944-954
- Wei Ji, Xiangyan Liu, An Zhang, Yinwei Wei, Yongxin Ni, Xiang Wang: Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation. 955-965
- Junyang Chen, Jialong Wang, Zhijiang Dai, Huisi Wu, Mengzhu Wang, Qin Zhang, Huan Wang: Zero-shot Micro-video Classification with Neural Variational Inference in Graph Prototype Network. 966-974
- Zhiguo Chen, Xun Jiang, Xing Xu, Zuo Cao, Yijun Mo, Heng Tao Shen: Joint Searching and Grounding: Multi-Granularity Video Content Retrieval. 975-983
- Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Zhongxuan Han, Dan Meng, Jun Wang: Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems. 984-994
- Dugang Liu, Yang Qiao, Xing Tang, Liang Chen, Xiuqiang He, Zhong Ming: Prior-Guided Accuracy-Bias Tradeoff Learning for CTR Prediction in Multimedia Recommendation. 995-1003
- Haoyue Bai, Min Hou, Le Wu, Yonghui Yang, Kun Zhang, Richang Hong, Meng Wang: GoRec: A Generative Cold-start Recommendation Framework. 1004-1012
- Jingzhi Li, Fengling Li, Lei Zhu, Hui Cui, Jingjing Li: Prototype-guided Knowledge Transfer for Federated Unsupervised Cross-modal Hashing. 1013-1022
Oral Session VI: Engaging Users with Multimedia -- Interactions and Quality of Experience
- Shuai He, Anlong Ming, Shuntian Zheng, Haobin Zhong, Huadong Ma: EAT: An Enhancer for Aesthetics-Oriented Transformers. 1023-1032
- Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai: UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons. 1033-1044
- Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin: Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach. 1045-1054
- Guangming Zhu, Siyuan Wang, Qing Cheng, Kelong Wu, Hao Li, Liang Zhang: Sketch Input Method Editor: A Comprehensive Dataset and Methodology for Systematic Input Recognition. 1055-1065
- Tengchuan Kou, Xiaohong Liu, Wei Sun, Jun Jia, Xiongkuo Min, Guangtao Zhai, Ning Liu: StableVQA: A Deep No-Reference Quality Assessment Model for Video Stability. 1066-1076
- Jianjun Xiang, Yuanjie Dang, Peng Chen, Ronghua Liang, Ruohong Huan, Zhengyu Zhang: Spatial-angular Quality-aware Representation Learning for Blind Light Field Image Quality Assessment. 1077-1087
- Yunlong Dong, Xiaohong Liu, Yixuan Gao, Xunchu Zhou, Tao Tan, Guangtao Zhai: Light-VQA: A Multi-Dimensional Quality Assessment Model for Low-Light Video Enhancement. 1088-1097
- Kun Yuan, Zishang Kong, Chuanchuan Zheng, Ming Sun, Xing Wen: Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment. 1098-1107
- Kaiyuan Hu, Haowen Yang, Yili Jin, Junhua Liu, Yongting Chen, Miao Zhang, Fangxin Wang: Understanding User Behavior in Volumetric Video Watching: Dataset, Analysis and Prediction. 1108-1116
- Xiangfei Sheng, Leida Li, Pengfei Chen, Jinjian Wu, Weisheng Dong, Yuzhe Yang, Liwu Xu, Yaqian Li, Guangming Shi: AesCLIP: Multi-Attribute Contrastive Learning for Image Aesthetics Assessment. 1117-1126
Oral Session VII: Engaging Users with Multimedia -- Metaverse, Art and Culture
- Zheng Wei, Xian Xu, Lik-Hang Lee, Wai Tong, Huamin Qu, Pan Hui: Feeling Present! From Physical to Virtual Cinematography Lighting Education with Metashadow. 1127-1136
- Shao-Kui Zhang, Jia-Hong Liu, Yike Li, Tianyi Xiong, Ke-Xin Ren, Hongbo Fu, Song-Hai Zhang: Automatic Generation of Commercial Scenes. 1137-1147
- Yang Chen, Yingwei Pan, Yehao Li, Ting Yao, Tao Mei: Control3D: Towards Controllable Text-to-3D Generation. 1148-1156
- Yuqing Zhang, Zhou Fang, Xinyu Yang, Shengyu Zhang, Baoyi He, Huaiyong Dou, Junchi Yan, Yongquan Zhang, Fei Wu: Reconnecting the Broken Civilization: Patchwork Integration of Fragments from Ancient Manuscripts. 1157-1166
Oral Session VIII: Engaging Users with Multimedia -- Multimedia Applications
- Zixin Wang, Yadan Luo, Zhi Chen, Sen Wang, Zi Huang: Cal-SFDA: Source-Free Domain-adaptive Semantic Segmentation with Differentiable Expected Calibration Error. 1167-1178
- Runmin Cong, Mengyao Sun, Sanyi Zhang, Xiaofei Zhou, Wei Zhang, Yao Zhao: Frequency Perception Network for Camouflaged Object Detection. 1179-1189
- Xiaoshuai Wu, Xin Liao, Bo Ou: SepMark: Deep Separable Watermarking for Unified Source Tracing and Deepfake Detection. 1190-1201
- Runmin Cong, Yuchen Guan, Jinpeng Chen, Wei Zhang, Yao Zhao, Sam Kwong: SDDNet: Style-guided Dual-layer Disentanglement Network for Shadow Detection. 1202-1211
- Hao Tan, Weichao Kong, Feng Zhang, Wenjin Qin, Jianjun Wang: High-Order Tensor Recovery Coupling Multilayer Subspace Priori with Application in Video Restoration. 1212-1220
- Chen Wang, Jiadai Sun, Lina Liu, Chenming Wu, Zhelun Shen, Dayan Wu, Yuchao Dai, Liangjun Zhang: Digging into Depth Priors for Outdoor Neural Radiance Fields. 1221-1230
- Fanrui Zhang, Jiawei Liu, Qiang Zhang, Esther Sun, Jingyi Xie, Zheng-Jun Zha: ECENet: Explainable and Context-Enhanced Network for Muti-modal Fact verification. 1231-1240
- Baochen Xiong, Xiaoshan Yang, Yaguang Song, Yaowei Wang, Changsheng Xu: Client-Adaptive Cross-Model Reconstruction Network for Modality-Incomplete Multimodal Federated Learning. 1241-1249
- Jinpeng Lin, Min Zhou, Ye Ma, Yifan Gao, Chenxi Fei, Yangjian Chen, Zhang Yu, Tiezheng Ge: AutoPoster: A Highly Automatic and Content-aware Design System for Advertising Poster Generation. 1250-1260
- Gangyan Zeng, Yuan Zhang, Yu Zhou, Bo Fang, Guoqing Zhao, Xin Wei, Weiping Wang: Filling in the Blank: Rationale-Augmented Prompt Tuning for TextVQA. 1261-1272
- Liuhan Chen, Yirou Wang, Yongyong Chen: End-to-end XY Separation for Single Image Blind Deblurring. 1273-1282
- Junxian Chen, Ying Liu, Yiqi Liang, Dandan Long, Xiaolin He, Ruihui Li: SD-Net: Spatially-Disentangled Point Cloud Completion Network. 1283-1293
- Jiawei Jiang, Yuchao Feng, Jiacheng Chen, Dongyan Guo, Jianwei Zheng: Latent-space Unfolding for MRI Reconstruction. 1294-1302
- Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu: TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World. 1303-1313
- Pengteng Li, Ying He, F. Richard Yu, Pinhao Song, Dongfu Yin, Guang Zhou: IGG: Improved Graph Generation for Domain Adaptive Object Detection. 1314-1324
- De Cheng, Lingfeng He, Nannan Wang, Shizhou Zhang, Zhen Wang, Xinbo Gao: Efficient Bilateral Cross-Modality Cluster Matching for Unsupervised Visible-Infrared Person ReID. 1325-1333
- Xun Jiang, Zailei Zhou, Xing Xu, Yang Yang, Guoqing Wang, Heng Tao Shen: Faster Video Moment Retrieval with Point-Level Supervision. 1334-1342
- Xianliang Huang, Jiajie Gou, Shuhang Chen, Zhizhou Zhong, Jihong Guan, Shuigeng Zhou: IDDR-NGP: Incorporating Detectors for Distractors Removal with Instant Neural Radiance Field. 1343-1351
- Junzhe Zhang, Tong Chen, Dandan Ding, Zhan Ma: G-PCC++: Enhanced Geometry-based Point Cloud Compression. 1352-1363
- Zhengcong Fei, Mingyuan Fan, Junshi Huang: Gradient-Free Textual Inversion. 1364-1373
- Qiaosong Qi, Le Zhuo, Aixi Zhang, Yue Liao, Fei Fang, Si Liu, Shuicheng Yan: DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation. 1374-1382
- Peihuan Huang, Gaofeng Cao, Fei Zhou, Guoping Qiu: Video Inverse Tone Mapping Network with Luma and Chroma Mapping. 1383-1391
- Qi Jia, Xiaomei Feng, Yu Liu, Xin Fan, Longin Jan Latecki: Learning Pixel-wise Alignment for Unsupervised Image Stitching. 1392-1400
- Han Yan, Haijun Zhang, Xiangyu Mu, Jicong Fan, Zhao Zhang: FashionDiff: A Controllable Diffusion Model Using Pairwise Fashion Elements for Intelligent Design. 1401-1411
- Wei Yu, Qi Zhu, Naishan Zheng, Jie Huang, Man Zhou, Feng Zhao: Learning Non-Uniform-Sampling for Ultra-High-Definition Image Enhancement. 1412-1421
- Haoxing Chen, Zhangxuan Gu, Yaohui Li, Jun Lan, Changhua Meng, Weiqiang Wang, Huaxiong Li: Hierarchical Dynamic Image Harmonization. 1422-1430
- Sha Guo, Zhuo Chen, Yang Zhao, Ning Zhang, Xiaotong Li, Lingyu Duan: Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach. 1431-1442
- Kaixun Jiang, Zhaoyu Chen, Xinyu Zhou, Jingyu Zhang, Lingyi Hong, Jiafeng Wang, Bo Li, Yan Wang, Wenqiang Zhang: Towards Decision-based Sparse Attacks on Video Recognition. 1443-1454
- Mingqi Fang, Lingyun Yu, Hongtao Xie, Junqiang Wu, Zezheng Wang, Jiahong Li, Yongdong Zhang: RAIRNet: Region-Aware Identity Rectification for Face Forgery Detection. 1455-1464
- Xiao He, Chang Tang, Xin Zou, Wei Zhang: Multispectral Object Detection via Cross-Modal Conflict-Aware Learning. 1465-1474
- Huan Zheng, Zhao Zhang, Jicong Fan, Richang Hong, Yi Yang, Shuicheng Yan: Decoupled Cross-Scale Cross-View Interaction for Stereo Image Enhancement in the Dark. 1475-1484
- Kexin Li, Zongxin Yang, Lei Chen, Yi Yang, Jun Xiao: CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation. 1485-1494
- Zisong Chen, Chunyu Lin, Lang Nie, Zhijie Shen, Kang Liao, Yuanzhouhan Cao, Yao Zhao: S-OmniMVS: Incorporating Sphere Geometry into Omnidirectional Stereo Matching. 1495-1503
- Yichen Zhang, Yifang Yin, Ying Zhang, Zhenguang Liu, Zheng Wang, Roger Zimmermann: Prototypical Cross-domain Knowledge Transfer for Cervical Dysplasia Visual Inspection. 1504-1514
- Yuchen Sun, Qianqian Xu, Zitai Wang, Qingming Huang: When Measures are Unreliable: Imperceptible Adversarial Perturbations toward Top-k Multi-Label Learning. 1515-1526
- Bowei Xu, Hao Chen, Zhan Ma: Karma: Adaptive Video Streaming via Causal Sequence Modeling. 1527-1535
- Xinting Liao, Chaochao Chen, Weiming Liu, Pengyang Zhou, Huabin Zhu, Shuheng Shen, Weiqiang Wang, Mengling Hu, Yanchao Tan, Xiaolin Zheng: Joint Local Relational Augmentation and Global Nash Equilibrium for Federated Learning with Non-IID Data. 1536-1545
- Jin Wang, Jiade Chen, Yunhui Shi, Nam Ling, Baocai Yin: SSPU-Net: A Structure Sensitive Point Cloud Upsampling Network with Multi-Scale Spatial Refinement. 1546-1555
- Haoyue Wang, Sheng Li, Silu Cao, Rui Yang, Jishen Zeng, Zhenxing Qian, Xinpeng Zhang: On Physically Occluded Fake Identity Document Detection. 1556-1564
- Deqi Li, Shi-Sheng Huang, Tianyu Shen, Hua Huang: Dynamic View Synthesis with Spatio-Temporal Feature Warping from Sparse Views. 1565-1576
Oral Session IX: Engaging Users with Multimedia -- Social-good, Fairness and Transparency
- Shengfang Zhai, Yinpeng Dong, Qingni Shen, Shi Pu, Yuejian Fang, Hang Su: Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning. 1577-1587
- Jingxuan Tan, Nan Zhong, Zhenxing Qian, Xinpeng Zhang, Sheng Li: Deep Neural Network Watermarking against Model Extraction Attack. 1588-1597
- Yu Bai, Bo Zhang, Zheng Zhang, Wu Liu, Jinwen Li, Xiangyang Gong, Wendong Wang: CoCa: A Connectivity-Aware Cascade Framework for Histology Gland Segmentation. 1598-1606
- Bo Zhang, Yunpeng Tan, Zheng Zhang, Wu Liu, Hui Gao, Zhijun Xi, Wendong Wang: Factorized Omnidirectional Representation based Vision GNN for Anisotropic 3D Multimodal MR Image Segmentation. 1607-1615
- Rui Hu, Yahan Tu, Jitao Sang: Echoes: Unsupervised Debiasing via Pseudo-bias Labeling in an Echo Chamber. 1616-1624
- Luxin Cai, Naiyue Chen, Yuanzhouhan Cao, Jiahuan He, Yidong Li: FedCE: Personalized Federated Learning Method based on Clustering Ensembles. 1625-1633
Oral Session X: Multimedia systems -- Data Systems Management and Indexing
- Naoki Ono, Yusuke Matsui: Relative NN-Descent: A Fast Index Construction for Graph-Based Approximate Nearest Neighbor Search. 1659-1667
- Cheng Xiong, Chuan Qin, Guorui Feng, Xinpeng Zhang: Flexible and Secure Watermarking for Latent Diffusion Model. 1668-1676
- Rukai Wei, Yu Liu, Jingkuan Song, Heng Cui, Yanzhao Xie, Ke Zhou: CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing. 1677-1688
Oral Session XI: Multimedia systems -- Systems and Middleware, Transport and Delivery
- Rui Lu, Lai Wei, Shuntao Zhu, Chuang Hu, Dan Wang: Pagoda: Privacy Protection for Volumetric Video Streaming through Poisson Diffusion Model. 1689-1697
- Yuyang Leng, Renyuan Liu, Hongpeng Guo, Songqing Chen, Shuochao Yao: ScaleFlow: Efficient Deep Vision Pipeline with Closed-Loop Scale-Adaptive Inference. 1698-1706
- Tianchi Huang, Rui-Xiao Zhang, Chenglei Wu, Lifeng Sun: Optimizing Adaptive Video Streaming with Human Feedback. 1707-1718
Poster Session I: Understanding Multimedia Content -- Media Interpretation
- Hao Tang, Jun Liu, Shuanglin Yan, Rui Yan, Zechao Li, Jinhui Tang: M3Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition. 1719-1728
- Chen Cheng, Jingkuan Song, Xiaosu Zhu, Junchen Zhu, Lianli Gao, Hengtao Shen: CUCL: Codebook for Unsupervised Continual Learning. 1729-1737
- Yang Liu, Chen Chen, Can Wang, Xulin King, Mengyuan Liu: Regress Before Construct: Regress Autoencoder for Point Cloud Self-supervised Learning. 1738-1749
- Bo Wang, Zhao Zhang, Suiyi Zhao, Haijun Zhang, Richang Hong, Meng Wang: CropCap: Embedding Visual Cross-Partition Dependency for Image Captioning. 1750-1758
- Yanqi Wu, Xue Song, Jingjing Chen, Yu-Gang Jiang: Generalizing Face Forgery Detection via Uncertainty Learning. 1759-1767
- Bingqing Zhang, Sen Wang, Yifan Liu, Brano Kusy, Xue Li, Jiajun Liu: Object Detection Difficulty: Suppressing Over-aggregation for Faster and Better Video Object Detection. 1768-1778
- Yuanshen Guan, Ruikang Xu, Mingde Yao, Lizhi Wang, Zhiwei Xiong: Mutual-Guided Dynamic Network for Image Fusion. 1779-1788
- Chenxi Xie, Changqun Xia, Tianshu Yu, Jia Li: Frequency Representation Integration for Camouflaged Object Detection. 1789-1797
- Tao Wang, Lei Jin, Zhang Wang, Xiaojin Fan, Yu Cheng, Yinglei Teng, Junliang Xing, Jian Zhao: DecenterNet: Bottom-Up Human Pose Estimation Via Decentralized Pose Representation. 1798-1808
- Jingyi Wang, Can Zhang, Jinfa Huang, Botao Ren, Zhidong Deng: Improving Scene Graph Generation with Superpixel-Based Interaction Learning. 1809-1820
- Shifeng Xia, Lin Geng, Ningzhong Liu, Han Sun, Jie Qin: Lifelong Scene Text Recognizer via Expert Modules. 1821-1830
- Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo: CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model. 1831-1839
- Runhao Zeng, Qi Deng, Huixuan Xu, Shuaicheng Niu, Jian Chen: Exploring Motion Cues for Video Test-Time Adaptation. 1840-1850
- Yan Shu, Wei Wang, Yu Zhou, Shaohui Liu, Aoting Zhang, Dongbao Yang, Weiping Wang: Perceiving Ambiguity and Semantics without Recognition: An Efficient and Effective Ambiguous Scene Text Detector. 1851-1862
- Jiaming Chu, Lei Jin, Xiaojin Fan, Yinglei Teng, Yunchao Wei, Yuqiang Fang, Junliang Xing, Jian Zhao: Single-Stage Multi-human Parsing via Point Sets and Center-based Offsets. 1863-1873
- Chengxiao Sun, Yan Xu, Jialun Pei, Haopeng Fang, He Tang: Partitioned Saliency Ranking with Dense Pyramid Transformers. 1874-1883
- Jianbiao Mei, Yu Yang, Mengmeng Wang, Zizhang Li, Xiaojun Hou, Jongwon Ra, Laijian Li, Yong Liu: CenterLPS: Segment Instances by Centers for LiDAR Panoptic Segmentation. 1884-1894
- Zhenhua Ning, Zhuotao Tian, Guangming Lu, Wenjie Pei: Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided Enhancement. 1895-1904
- Mu Chen, Zhedong Zheng, Yi Yang, Tat-Seng Chua: PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation. 1905-1914
- Xinyan Zu, Haiyang Yu, Bin Li, Xiangyang Xue: Weakly-Supervised Text Instance Segmentation. 1915-1923
- Wenjie Xuan, Shanshan Zhao, Yu Yao, Juhua Liu, Tongliang Liu, Yixin Chen, Bo Du, Dacheng Tao: PNT-Edge: Towards Robust Edge Detection with Noisy Labels by Learning Pixel-level Noise Transitions. 1924-1932
- Pan Gao, Haoyue Tian, Jie Qin: Video Frame Interpolation with Flow Transformer. 1933-1942
- Xianghao Kong, Wentao Jiang, Jinrang Jia, Yifeng Shi, Runsheng Xu, Si Liu: DUSA: Decoupled Unsupervised Sim2Real Adaptation for Vehicle-to-Everything Collaborative Perception. 1943-1954
- Ruiqi Zhang, Jie Chen, Qiang Wang: Explicifying Neural Implicit Fields for Efficient Dynamic Human Avatar Modeling via a Neural Explicit Surface. 1955-1963
- Shili Zhou, Xuhao Jiang, Weimin Tan, Ruian He, Bo Yan: MVFlow: Deep Optical Flow Estimation of Compressed Videos with Motion Vector Prior. 1964-1974
- Ri Cheng, Xuhao Jiang, Ruian He, Shili Zhou, Weimin Tan, Bo Yan: Uncertainty-Guided Spatial Pruning Architecture for Efficient Frame Interpolation. 1975-1986
- Junshan Hu, Liansheng Zhuang, Weisong Dong, Shiming Ge, Shafei Wang: Learning Generalized Representations for Open-Set Temporal Action Localization. 1987-1996
- Jie Gao, Bineng Zhong, Yan Chen: Unambiguous Object Tracking by Exploiting Target Cues. 1997-2005
- Keran Wang, Hongtao Xie, Yuxin Wang, Dongming Zhang, Yadong Qu, Zuan Gao, Yongdong Zhang: Masked Text Modeling: A Self-Supervised Pre-training Method for Scene Text Detection. 2006-2015
- Jiamin Chen
, Jianlou Si
, Naihao Liu
, Yao Wu
, Li Niu
, Chen Qian
:
Object Part Parsing with Hierarchical Dual Transformer. 2016-2024 - Xugong Qin
, Pengyuan Lyu
, Chengquan Zhang
, Yu Zhou
, Kun Yao
, Peng Zhang
, Hailun Lin
, Weiping Wang
:
Towards Robust Real-Time Scene Text Detection: From Semantic to Instance Representation Learning. 2025-2034 - Xiyao Ma
, Shiqi Liu
, Xiaoliang Xie
, Xiao-Hu Zhou
, Zengguang Hou
, Xinkai Qu
, Wenzheng Han
, Ming Wang
, Meng Song
, Lin-Sen Zhang
:
Towards Flexible and Universal: A Novel Endpoint-based Framework for Vessel Structural Information Extraction. 2035-2044 - Sejin Park
, Taehyung Lee
, Yeejin Lee
, Byeongkeun Kang
:
FDCNet: Feature Drift Compensation Network for Class-Incremental Weakly Supervised Object Localization. 2045-2053 - Meng Shen
, Yanzuo Lu
, Yanxu Hu
, Andy J. Ma
:
Collaborative Learning of Diverse Experts for Source-free Universal Domain Adaptation. 2054-2065 - Wentao Yang
, Zhe Li
, Dezhi Peng
, Lianwen Jin
, Mengchao He
, Cong Yao
:
Read Ten Lines at One Glance: Line-Aware Semi-Autoregressive Transformer for Multi-Line Handwritten Mathematical Expression Recognition. 2066-2077 - Kejun Lin
, Zhixiang Wang
, Zheng Wang
, Yinqiang Zheng
, Shin'ichi Satoh
:
Beyond Domain Gap: Exploiting Subjectivity in Sketch-Based Person Retrieval. 2078-2089 - Ben Sha, Baopu Li, Tao Chen, Jiayuan Fan, Tao Sheng:
Rethinking Pseudo-Label-Based Unsupervised Person Re-ID with Hierarchical Prototype-based Graph. 2090-2100 - Kehua Guo
, Rui Ding
, Tian Qiu
, Xiangyuan Zhu
, Zheng Wu
, Liwei Wang
, Hui Fang
:
Single Domain Generalization via Unsupervised Diversity Probe. 2101-2111 - Ruijin Liu
, Ning Lu
, Dapeng Chen
, Cheng Li
, Zejian Yuan
, Wei Peng
:
PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer. 2112-2120 - Houzhang Fang
, Zikai Liao
, Lu Wang
, Qingshan Li
, Yi Chang
, Luxin Yan
, Xuhua Wang
:
DANet: Multi-scale UAV Target Detection with Dynamic Feature Perception and Scale-aware Knowledge Distillation. 2121-2130 - Bo Dong
, Jialun Pei
, Rongrong Gao
, Tian-Zhu Xiang
, Shuo Wang
, Huan Xiong
:
A Unified Query-based Paradigm for Camouflaged Instance Segmentation. 2131-2138 - Jialun Pei
, Zhangjun Zhou
, Yueming Jin
, He Tang
, Pheng-Ann Heng
:
Unite-Divide-Unite: Joint Boosting Trunk and Structure for High-accuracy Dichotomous Image Segmentation. 2139-2147 - Yuxiang Cai
, Meng Xi
, Yongheng Shang
, Jianwei Yin
:
Exploring High-Correlation Source Domain Information for Multi-Source Domain Adaptation in Semantic Segmentation. 2148-2158 - Linfeng Tan
, Jiangtong Li
, Li Niu
, Liqing Zhang
:
Deep Image Harmonization in Dual Color Spaces. 2159-2167 - Wenyu Zhang
, Xin Deng
, Baojun Jia
, Xingtong Yu
, Yifan Chen
, Jin Ma
, Qing Ding
, Xinming Zhang
:
Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text Image Super-Resolution. 2168-2179 - Yanqi Bao
, Yuxin Li
, Jing Huo
, Tianyu Ding
, Xinyue Liang
, Wenbin Li
, Yang Gao
:
Where and How: Mitigating Confusion in Neural Radiance Fields from Sparse Inputs. 2180-2188 - Hang Guo
, Tao Dai
, Mingyan Zhu
, Guanghao Meng
, Bin Chen
, Zhi Wang
, Shu-Tao Xia
:
One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer. 2189-2198 - Muxin Liao
, Shishun Tian
, Yuhang Zhang
, Guoguang Hua
, Wenbin Zou
, Xia Li
:
Calibration-based Dual Prototypical Contrastive Learning Approach for Domain Generalization Semantic Segmentation. 2199-2210 - Wentian Xin
, Qiguang Miao, Yi Liu, Ruyi Liu, Chi-Man Pun
, Cheng Shi
:
Skeleton MixFormer: Multivariate Topology Representation for Skeleton-based Action Recognition. 2211-2220 - Xiaojie Li
, Shaowei He
, Jianlong Wu
, Yue Yu
, Liqiang Nie
, Min Zhang
:
Mask Again: Masked Knowledge Distillation for Masked Video Modeling. 2221-2232 - Mingxuan Zhang
, Xiao Wu
, Zhaoquan Yuan
, Qi He
, Xiang Huang
:
Human-Object-Object Interaction: Towards Human-Centric Complex Interaction Detection. 2233-2242 - Yilun Zhang
, Yuqian Fu
, Xingjun Ma
, Lizhe Qi
, Jingjing Chen
, Zuxuan Wu
, Yu-Gang Jiang
:
On the Importance of Spatial Relations for Few-shot Action Recognition. 2243-2251 - Jiarui Yu
, Haoran Li
, Yanbin Hao
, Bin Zhu
, Tong Xu
, Xiangnan He
:
CgT-GAN: CLIP-guided Text GAN for Image Captioning. 2252-2263 - Xiaojie Li
, Jianlong Wu
, Shaowei He
, Shuo Kang
, Yue Yu
, Liqiang Nie
, Min Zhang
:
Fine-grained Key-Value Memory Enhanced Predictor for Video Representation Learning. 2264-2274 - Ziyang Gong
, Fuhao Li
, Yupeng Deng
, Wenjun Shen
, Xianzheng Ma
, Zhenming Ji
, Nan Xia
:
Train One, Generalize to All: Generalizable Semantic Segmentation from Single-Scene to All Adverse Scenes. 2275-2284 - Cheng Zhang
, Yu Zhu
, Qingsen Yan
, Jinqiu Sun
, Yanning Zhang
:
All-in-one Multi-degradation Image Restoration Network via Hierarchical Degradation Representation. 2285-2293 - Ziyu Yang
, Sucheng Ren
, Zongwei Wu
, Nanxuan Zhao
, Junle Wang
, Jing Qin
, Shengfeng He
:
NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos. 2294-2304 - Zengbin Wang
, Saihui Hou
, Man Zhang
, Xu Liu
, Chunshui Cao
, Yongzhen Huang
, Shibiao Xu
:
LandmarkGait: Intrinsic Human Parsing for Gait Recognition. 2305-2314 - Wenjia Ren
, Qingmin Liao
, Zhijing Shao
, Xiangru Lin
, Xin Yue
, Yu Zhang
, Zongqing Lu
:
Patchmatch Stereo++: Patchmatch Binocular Stereo with Continuous Disparity Optimization. 2315-2325 - Rui Wang
, Cong Zou
, Weizhong Zhang
, Zixuan Zhu
, Lihua Jing
:
Consistency-aware Feature Learning for Hierarchical Fine-grained Visual Classification. 2326-2334 - Jun Yu
, Peng He
, Ziqi Peng
:
FSR-Net: Deep Fourier Network for Shadow Removal. 2335-2343 - Tianwei Yu
, Peng Chen
, Yuanjie Dang
, Ruohong Huan
, Ronghua Liang
:
Multi-Speed Global Contextual Subspace Matching for Few-Shot Action Recognition. 2344-2352 - Haonan Wang
, Jie Liu
, Jie Tang
, Gangshan Wu
:
Lightweight Super-Resolution Head for Human Pose Estimation. 2353-2361 - Yunkee Chae
, Junghyun Koo
, Sungho Lee
, Kyogu Lee
:
Exploiting Time-Frequency Conformers for Music Audio Enhancement. 2362-2370 - Jiaming Liu
, Yue Wu
, Maoguo Gong
, Qiguang Miao
, Wenping Ma
, Cai Xu
:
Exploring Dual Representations in Large-Scale Point Clouds: A Simple Weakly Supervised Semantic Segmentation Framework. 2371-2380 - Keke Chen
, Xiangbo Shu
, Guo-Sen Xie
, Rui Yan
, Jinhui Tang
:
Foreground/Background-Masked Interaction Learning for Spatio-temporal Action Detection. 2381-2390 - Xin Wang, Benyuan Meng, Hong Chen
, Yuan Meng, Ke Lv, Wenwu Zhu:
TIVA-KG: A Multimodal Knowledge Graph with Text, Image, Video and Audio. 2391-2399 - Wanqing Zhao
, Yuta Nakashima
, Haiyuan Chen
, Noboru Babaguchi
:
Enhancing Fake News Detection in Social Media via Label Propagation on Cross-modal Tweet Graph. 2400-2408 - Xingxing Yang
, Jie Chen
, Zaifeng Yang
:
Cooperative Colorization: Exploring Latent Cross-Domain Priors for NIR Image Spectrum Translation. 2409-2417 - Yihao Huang
, Liangru Sun
, Qing Guo
, Felix Juefei-Xu
, Jiayi Zhu
, Jincao Feng
, Yang Liu
, Geguang Pu
:
ALA: Naturalness-aware Adversarial Lightness Attack. 2418-2426 - Liya Ji
, Chan Ho Park
, Zhefan Rao
, Qifeng Chen
:
Neural Image Popularity Assessment with Retrieval-augmented Transformer. 2427-2436 - Yanchao Liu
, Xina Cheng
, Takeshi Ikenaga
:
A Figure Skating Jumping Dataset for Replay-Guided Action Quality Assessment. 2437-2445 - Yeying Jin
, Beibei Lin
, Wending Yan
, Yuan Yuan
, Wei Ye
, Robby T. Tan
:
Enhancing Visibility in Nighttime Haze Images Using Guided APSF and Gradient Adaptive Convolution. 2446-2457 - Xiang Li
, Yandong Wen
, Muqiao Yang
, Jinglu Wang
, Rita Singh
, Bhiksha Raj
:
Rethinking Voice-Face Correlation: A Geometry View. 2458-2467 - Baiang Li
, Huan Zheng
, Zhao Zhang
, Yang Zhao
, Zhongqiu Zhao
, Haijun Zhang
:
Dynamic Grouped Interaction Network for Low-Light Stereo Image Enhancement. 2468-2476 - Jiafu Wu
, Jian Li
, Jiangning Zhang
, Boshen Zhang
, Mingmin Chi
, Yabiao Wang
, Chengjie Wang
:
PVG: Progressive Vision Graph for Vision Recognition. 2477-2486 - Chenyi Zhuang
, Pan Gao
, Aljosa Smolic
:
StylePrompter: All Styles Need Is Attention. 2487-2497 - Pengling Zhang
, Huibin Yan
, Wenhui Wu
, Shuoyao Wang
:
Improving Federated Person Re-Identification through Feature-Aware Proximity and Aggregation. 2498-2506 - Xizhe Xue
, Dongdong Yu
, Lingqiao Liu
, Yu Liu, Satoshi Tsutsui
, Ying Li
, Zehuan Yuan
, Ping Song
, Mike Zheng Shou
:
Transformer-based Open-world Instance Segmentation with Cross-task Consistency Regularization. 2507-2515 - Dongliang Zhu
, Ruimin Hu
, Shengli Song
, Xiang Guo
, Xixi Li
, Zheng Wang
:
Cross-Illumination Video Anomaly Detection Benchmark. 2516-2525 - Yuanbin Fu
, Xiaojie Guo
:
Practical Edge Detection via Robust Collaborative Learning. 2526-2534 - Haoyi Xiu
, Xin Liu
, Weimin Wang
, Kyoung-Sook Kim
, Masashi Matsuoka
:
MSECNet: Accurate and Robust Normal Estimation for 3D Point Clouds by Multi-Scale Edge Conditioning. 2535-2543 - Xiao Liu
, Xiuya Shi
, Lufei Chen
, Linbo Qing
, Chao Ren
:
Efficient Parallel Multi-Scale Detail and Semantic Encoding Network for Lightweight Semantic Segmentation. 2544-2552 - Jiquan Zhong
, Xiaolin Huang
, Xiao Yu
:
Multi-Frame Self-Supervised Depth Estimation with Multi-Scale Feature Fusion in Dynamic Scenes. 2553-2563 - Yudong Mao
, Peilin Chen
, Shurun Wang
, Shiqi Wang
, Dapeng Wu
:
Peering into The Sketch: Ultra-Low Bitrate Face Compression for Joint Human and Machine Perception. 2564-2572 - Xiaodong Jin
, Taiping Zhang
:
MTSN: Multiscale Temporal Similarity Network for Temporal Action Localization. 2573-2581 - Guanzhou Ke
, Yang Yu
, Guoqing Chao
, Xiaoli Wang
, Chenyang Xu
, Shengfeng He
:
Disentangling Multi-view Representations Beyond Inductive Bias. 2582-2590 - Lei Zhao
, Le Han
, Min Yao
, Nenggan Zheng
:
Implicit Decouple Network for Efficient Pose Estimation. 2591-2599 - Zhenjie Chen
, Hongsong Wang
, Jie Gui
:
Occluded Skeleton-Based Human Action Recognition with Dual Inhibition Training. 2625-2634 - Xujie Kang
, Kanglin Liu
, Jiang Duan
, Yuanhao Gong
, Guoping Qiu
:
P2I-NET: Mapping Camera Pose to Image via Adversarial Learning for New View Synthesis in Real Indoor Environments. 2635-2643 - Wenpeng Xing
, Jie Chen
, Ka Chun Cheung
, Simon See
:
IRCasTRF: Inverse Rendering by Optimizing Cascaded Tensorial Radiance Fields, Lighting, and Materials From Multi-view Images. 2644-2653 - Zhiqi Yu
, Jingjing Li
, Zhekai Du
, Fengling Li
, Lei Zhu
, Yang Yang
:
Noise-Robust Continual Test-Time Domain Adaptation. 2654-2662 - Zeyu Wang
, Fabien Colonnier
, Jinghong Zheng
, Jyotibdha Acharya
, Wenyu Jiang
, Kejie Huang
:
TIRDet: Mono-Modality Thermal InfraRed Object Detection Based on Prior Thermal-To-Visible Translation. 2663-2672 - Junzhe Cai
, Shuiyan Chen
, Heng Li
, Beihao Xia
, Zimin Mao
, Wei Yuan
:
HARP: Let Object Detector Undergo Hyperplasia to Counter Adversarial Patches. 2673-2683 - Lei Xu
, Rei Kawakami
, Nakamasa Inoue
:
Scale-space Tokenization for Improving the Robustness of Vision Transformers. 2684-2693 - Kosuke Mizufune
, Shunsuke Tanaka
, Toshihide Yukitake
, Tatsushi Matsubayashi
:
Margin MCC: Chance-Robust Metric for Video Boundary Detection with Allowed Margin. 2694-2703 - Liangchen Song
, Xuan Gong
, Helong Zhou
, Jiajie Chen
, Qian Zhang
, David S. Doermann
, Junsong Yuan
:
Exploring the Knowledge Transferred by Response-Based Teacher-Student Distillation. 2704-2713 - Feng Gao
, Jiaxu Leng
, Ji Gan
, Xinbo Gao
:
Selecting Learnable Training Samples is All DETRs Need in Crowded Pedestrian Detection. 2714-2722 - Qiankun Li
, Xiaolong Huang
, Zhifan Wan
, Lanqing Hu
, Shuzhe Wu
, Jie Zhang
, Shiguang Shan
, Zengfu Wang
:
Data-Efficient Masked Video Modeling for Self-supervised Action Recognition. 2723-2733 - Teng Fu
, Xiaocong Wang
, Haiyang Yu
, Ke Niu
, Bin Li
, Xiangyang Xue
:
DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions. 2734-2743 - Peiran Xu
, Yadong Mu
:
Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion. 2744-2755 - Xuenan Xu
, Zhiling Zhang
, Zelin Zhou
, Pingyue Zhang
, Zeyu Xie
, Mengyue Wu
, Kenny Q. Zhu
:
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data. 2756-2764 - Bingyang Wang
, Tanlin Li
, Jiannan Wu
, Yi Jiang
, Huchuan Lu
, You He
:
A Simple Baseline for Open-World Tracking via Self-training. 2765-2774 - Yuxuan Zhao
, Jin Ma
, Zhongang Qi
, Zehua Xie
, Yu Luo
, Qiusheng Kang
, Ying Shan
:
VTLayout: A Multi-Modal Approach for Video Text Layout. 2775-2784 - Rajat Hebbar
, Digbalay Bose
, Shrikanth Narayanan
:
SEAR: Semantically-grounded Audio Representations. 2785-2794 - Zongyuan Yang
, Baolin Liu
, Yongping Xiong
, Lan Yi
, Guibin Wu
, Xiaojun Tang
, Ziqi Liu
, Junjie Zhou
, Xing Zhang
:
DocDiff: Document Enhancement via Residual Diffusion Models. 2795-2806 - Boshen Xu
, Sipeng Zheng
, Qin Jin
:
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World. 2807-2816 - Sihan Ma
, Qiong Cao
, Hongwei Yi
, Jing Zhang
, Dacheng Tao
:
GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction. 2817-2828 - Hui Lu
, Xixin Wu
, Zhiyong Wu
, Helen Meng
:
SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody. 2829-2837 - Xiaohan Wang
, Yuehu Liu
, Xinhang Song
, Beibei Wang
, Shuqiang Jiang
:
Generating Explanations for Embodied Action Decision from Visual Observation. 2838-2846 - Jieteng Yao
, Junjie Chen
, Li Niu
, Bin Sheng
:
Scene-aware Human Pose Generation using Transformer. 2847-2855 - Wanying Zhang
, Shen Zhao
, Fanyang Meng
, Songtao Wu
, Mengyuan Liu
:
Dynamic Compositional Graph Convolutional Network for Efficient Composite Human Motion Prediction. 2856-2864 - Jiaqi Li
, Yiran Wang
, Zihao Huang
, Jinghong Zheng
, Ke Xian
, Zhiguo Cao
, Jianming Zhang
:
Diffusion-Augmented Depth Prediction with Sparse Annotations. 2865-2876 - Chunwei Wu
, Guitao Cao
, Yan Li
, Xidong Xi
, Wenming Cao
, Hong Wang
:
Chaos to Order: A Label Propagation Perspective on Source-Free Domain Adaptation. 2877-2887 - Lianggangxu Chen
, Jiale Lu
, Youqi Song
, Changbo Wang
, Gaoqi He
:
Beware of Overcorrection: Scene-induced Commonsense Graph for Scene Graph Generation. 2888-2897 - Haiyang Yu
, Xiaocong Wang
, Ke Niu
, Bin Li
, Xiangyang Xue
:
Scene Text Segmentation with Text-Focused Transformers. 2898-2907 - Liangwei Jiang
, Jiaxin Chen
, Di Huang
, Yunhong Wang
:
MIEP: Channel Pruning with Multi-granular Importance Estimation for Object Detection. 2908-2917
Poster Session II: Understanding Multimedia Content -- Multimodal Fusion and Embedding
- Shanshan Wang
, Yiyang Chen
, Zhenwei He
, Xun Yang
, Mengzhu Wang
, Quanzeng You
, Xingyi Zhang
:
Disentangled Representation Learning with Causality for Unsupervised Domain Adaptation. 2918-2926 - Jie Wen
, Gehui Xu
, Chengliang Liu
, Lunke Fei
, Chao Huang
, Wei Wang
, Yong Xu
:
Localized and Balanced Efficient Incomplete Multi-view Clustering. 2927-2935 - Mengzhu Wang
, Junyang Chen
, Huan Wang
, Huisi Wu
, Zhidan Liu
, Qin Zhang
:
Interpolation Normalization for Contrast Domain Generalization. 2936-2945 - Yujing Liu
, Zongqian Wu
, Zhengyu Lu
, Guoqiu Wen
, Junbo Ma
, Guangquan Lu
, Xiaofeng Zhu
:
Multi-teacher Self-training for Semi-supervised Node Classification with Noisy Labels. 2946-2954 - Liang Yang
, Jiayi Wang
, Tingting Zhang
, Dongxiao He
, Chuan Wang
, Yuanfang Guo
, Xiaochun Cao
, Bingxin Niu
, Zhen Wang
:
Long Short-Term Graph Memory Against Class-imbalanced Over-smoothing. 2955-2963 - Zitan Chen
, Zhuang Qi
, Xiao Cao
, Xiangxian Li
, Xiangxu Meng
, Lei Meng
:
Class-level Structural Relation Modeling and Smoothing for Visual Representation Learning. 2964-2972 - Shengkai Sun
, Daizong Liu
, Jianfeng Dong
, Xiaoye Qu
, Junyu Gao
, Xun Yang
, Xun Wang
, Meng Wang
:
Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding. 2973-2984 - Pan Mu
, Zhiying Du
, Jinyuan Liu
, Cong Bai
:
Little Strokes Fell Great Oaks: Boosting the Hierarchical Features for Multi-exposure Image Fusion. 2985-2993 - Jing Wang
, Songhe Feng
, Gengyu Lyu
, Zhibin Gu
:
Triple-Granularity Contrastive Learning for Deep Multi-View Subspace Clustering. 2994-3002 - Zhao Su
, Yong Yang
, Shuying Huang
, Weiguo Wan
, Wei Tu
, Hangyuan Lu
, Changjie Chen
:
CTCP: Cross Transformer and CNN for Pansharpening. 3003-3011 - Yonghua Zhu
, Zhenyun Deng
, Yang Chen
, Robert Amor
, Michael Witbrock
:
Chain of Propagation Prompting for Node Classification. 3012-3020 - Yi Wen
, Suyuan Liu
, Xinhang Wan
, Siwei Wang
, Ke Liang
, Xinwang Liu
, Xihong Yang
, Pei Zhang
:
Efficient Multi-View Graph Clustering with Local and Global Structure Preservation. 3021-3030 - Yi Wen
, Siwei Wang
, Ke Liang
, Weixuan Liang
, Xinhang Wan
, Xinwang Liu
, Suyuan Liu
, Jiyuan Liu
, En Zhu
:
Scalable Incomplete Multi-View Clustering with Structure Alignment. 3031-3040 - Yi Bin
, Haoxuan Li
, Yahui Xu
, Xing Xu
, Yang Yang
, Heng Tao Shen
:
Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval. 3041-3050 - Cai Xu
, Zehui Li
, Ziyu Guan
, Wei Zhao
, Xiangyu Song
, Yue Wu
, Jianxin Li
:
Unbalanced Multi-view Deep Learning. 3051-3059 - Shuping Zhao
, Lunke Fei
, Jie Wen
, Bob Zhang
, Pengyang Zhao
:
Incomplete Multi-View Clustering with Regularized Hierarchical Graph. 3060-3068 - Man-Sheng Chen
, Jia-Qi Lin
, Chang-Dong Wang
, Wudong Xi
, Dong Huang
:
On Regularizing Multiple Clusterings for Ensemble Clustering by Graph Tensor Learning. 3069-3077 - Guixu Lin
, Jin Han
, Mingdeng Cao
, Zhihang Zhong
, Yinqiang Zheng
:
Event-guided Frame Interpolation and Dynamic Range Expansion of Single Rolling Shutter Image. 3078-3088 - Peng Zhou
, Liang Du
:
Learnable Graph Filter for Multi-view Clustering. 3089-3098 - Zhuang Qi
, Lei Meng
, Zitan Chen
, Han Hu
, Hui Lin
, Xiangxu Meng
:
Cross-Silo Prototypical Calibration for Federated Learning with Non-IID Data. 3099-3107 - Hai Zhou
, Zhe Xue
, Ying Liu
, Boang Li
, Junping Du
, Meiyu Liang
, Yuankai Qi
:
CALM: An Enhanced Encoding and Confidence Evaluating Framework for Trustworthy Multi-view Learning. 3108-3116 - Houlun Chen
, Xin Wang
, Xiaohan Lan
, Hong Chen
, Xuguang Duan
, Jia Jia
, Wenwu Zhu
:
Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Grounding. 3117-3128 - Lei Liu
, Chenglong Li
, Yun Xiao
, Jin Tang
:
Quality-Aware RGBT Tracking via Supervised Reliability Learning and Weighted Residual Guidance. 3129-3137 - Yang Wang
, Bo Dong
, Yuji Zhang
, Yunduo Zhou
, Haiyang Mei
, Ziqi Wei
, Xin Yang:
Event-Enhanced Multi-Modal Spiking Neural Network for Dynamic Obstacle Avoidance. 3138-3148 - Yujun Ma
, Benjia Zhou
, Ruili Wang
, Pichao Wang
:
Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition. 3149-3160 - Peng Zhao
, Qiangchang Wang
, Yilong Yin
:
M3R: Masked Token Mixup and Cross-Modal Reconstruction for Zero-Shot Learning. 3161-3171 - Yicong Li
, Xun Yang
, An Zhang
, Chun Feng
, Xiang Wang
, Tat-Seng Chua
:
Redundancy-aware Transformer for Video Question Answering. 3172-3180 - Wanting Yin
, Hongtao Xie
, Lei Zhang
, Jiannan Ge
, Pandeng Li
, Chuanbin Liu
, Yongdong Zhang
:
Frequency-based Zero-Shot Learning with Phase Augmentation. 3181-3189 - Shiyuan Yang
, Xiaodong Chen
, Jing Liao
:
Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model. 3190-3199 - Fangjian Lin
, Jianlong Yuan
, Sitong Wu
, Fan Wang
, Zhibin Wang
:
UniNeXt: Exploring A Unified Architecture for Vision Recognition. 3200-3208 - Junjie Wu
, Chen Gong
, Ziqiang Cao
, Guohong Fu
:
MCG-MNER: A Multi-Granularity Cross-Modality Generative Framework for Multimodal NER with Instruction. 3209-3218 - Siran Peng
, Chenhao Guo
, Xiao Wu
, Liang-Jian Deng
:
U2Net: A General Framework with Spatial-Spectral-Integrated Double U-Net for Image Fusion. 3219-3227 - Yansheng Qiu
, Ziyuan Zhao
, Hongdou Yao
, Delin Chen
, Zheng Wang
:
Modal-aware Visual Prompting for Incomplete Multi-modal Brain Tumor Segmentation. 3228-3239 - Hui Tang
, Xun Liang
:
Where to Find Fascinating Inter-Graph Supervision: Imbalanced Graph Classification with Kernel Information Bottleneck. 3240-3249 - Wuyuan Xie
, Kaimin Wang
, Yakun Ju
, Miaohui Wang
:
pmBQA: Projection-based Blind Point Cloud Quality Assessment via Multimodal Learning. 3250-3258 - Zihao Zhang
, Qianqian Wang
, Zhiqiang Tao
, Quanxue Gao
, Wei Feng
:
Dropping Pathways Towards Deep Multi-View Graph Subspace Clustering Networks. 3259-3267 - Penglei Wang
, Danyang Wu
, Rong Wang
, Feiping Nie
:
Multi-view Graph Clustering via Efficient Global-Local Spectral Embedding Fusion. 3268-3276 - Hao Wang
, Zhi-Qi Cheng
, Jingdong Sun
, Xin Yang
, Xiao Wu
, Hongyang Chen
, Yan Yang
:
Debunking Free Fusion Myth: Online Multi-view Anomaly Detection with Disentangled Product-of-Experts Modeling. 3277-3286 - Yunlong Lin
, Zhenqi Fu
, Ge Meng
, Yingying Wang
, Yuhang Dong
, Linyu Fan
, Hedeng Yu
, Xinghao Ding
:
Domain-irrelevant Feature Learning for Generalizable Pan-sharpening. 3287-3296 - Qingwei Wang
, Jinyu Yang
, Xiaosheng Yu
, Fangyi Wang
, Peng Chen
, Feng Zheng
:
Depth-aided Camouflaged Object Detection. 3297-3306 - Wei Ji
, Jingjing Li
, Cheng Bian
, Zhicheng Zhang
, Li Cheng
:
SemanticRT: A Large-Scale Dataset and Method for Robust Semantic Segmentation in Multispectral Images. 3307-3316 - Zhuo Chen
, Jiaoyan Chen
, Wen Zhang
, Lingbing Guo
, Yin Fang
, Yufeng Huang
, Yichi Zhang
, Yuxia Geng
, Jeff Z. Pan
, Wenting Song
, Huajun Chen
:
MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid. 3317-3327 - Jiayi Zhang
, Weixin Li
:
Multi-Modal and Multi-Scale Temporal Fusion Architecture Search for Audio-Visual Video Parsing. 3328-3336 - Jiaqi Li
, Guilin Qi
, Chuanyi Zhang
, Yongrui Chen
, Yiming Tan
, Chenlong Xia
, Ye Tian
:
Incorporating Domain Knowledge Graph into Multimodal Movie Genre Classification with Self-Supervised Attention and Contrastive Learning. 3337-3345 - Yong Yang
, Mengzhen Li
, Shuying Huang
, Hangyuan Lu
, Wei Tu
, Weiguo Wan
:
Multi-scale Spatial-Spectral Attention Guided Fusion Network for Pansharpening. 3346-3354 - Xuehao Wang
, Shuai Li
, Chenglizhao Chen
, Aimin Hao
, Hong Qin
:
Modality Profile - A New Critical Aspect to be Considered When Generating RGB-D Salient Object Detection Training Set. 3355-3364 - Meng Liu
, Ke Liang
, Dayu Hu
, Hao Yu
, Yue Liu
, Lingyuan Meng
, Wenxuan Tu
, Sihang Zhou
, Xinwang Liu
:
TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification. 3365-3374 - Mufeng Yao
, Jiaqi Wang
, Jinlong Peng
, Mingmin Chi
, Chao Liu
:
FOLT: Fast Multiple Object Tracking from UAV-captured Videos Based on Optical Flow. 3375-3383 - Zihan Li
, Yuan Zheng
, Xiangde Luo
, Dandan Shan
, Qingqi Hong
:
ScribbleVC: Scribble-supervised Medical Image Segmentation with Vision-Class Embedding. 3384-3393 - Jiaqing Fan
, Tiankang Su
, Kaihua Zhang
, Bo Liu, Qingshan Liu
:
Temporally Efficient Gabor Transformer for Unsupervised Video Object Segmentation. 3394-3402 - Haowei Wang
, Jiji Tang
, Jiayi Ji
, Xiaoshuai Sun
, Rongsheng Zhang
, Yiwei Ma
, Minda Zhao
, Lincheng Li
, Zeng Zhao
, Tangjie Lv
, Rongrong Ji
:
Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation. 3403-3414 - Kongming Liang
, Xinran Wang
, Haiwen Zhang
, Zhanyu Ma
, Jun Guo
:
Hierarchical Visual Attribute Learning in the Wild. 3415-3423 - Qiang Zhang
, Jiawei Liu
, Fanrui Zhang
, Jingyi Xie
, Zheng-Jun Zha
:
Hierarchical Semantic Enhancement Network for Multimodal Fake News Detection. 3424-3433 - Meng Shen
, Yizheng Huang
, Jianxiong Yin
, Heqing Zou
, Deepu Rajan
, Simon See
:
Towards Balanced Active Learning for Multimodal Classification. 3434-3445 - Shiping Ge
, Zhiwei Jiang
, Yafeng Yin
, Cong Wang
, Zifeng Cheng
, Qing Gu
:
Learning Event-Specific Localization Preferences for Audio-Visual Event Localization. 3446-3454 - Zongwei Wu
, Jingjing Wang
, Zhuyun Zhou
, Zhaochong An
, Qiuping Jiang
, Cédric Demonceaux
, Guolei Sun
, Radu Timofte
:
Object Segmentation by Mining Cross-Modal Semantics. 3455-3464 - Wenxin Ni
, Qianqian Xu
, Yangbangyan Jiang
, Zongsheng Cao
, Xiaochun Cao
, Qingming Huang
:
PSNEA: Pseudo-Siamese Network for Entity Alignment between Multi-modal Knowledge Graphs. 3489-3497 - Xinyue Chen
, Jie Xu
, Yazhou Ren
, Xiaorong Pu
, Ce Zhu
, Xiaofeng Zhu
, Zhifeng Hao
, Lifang He
:
Federated Deep Multi-View Clustering with Global Self-Supervision. 3498-3506 - Sung Jin Um
, Dongjin Kim
, Jung Uk Kim
:
Audio-Visual Spatial Integration and Recursive Attention for Robust Sound Source Localization. 3507-3516 - Fangming Zhong
, Chenglong Chu
, Zijie Zhu
, Zhikui Chen
:
Hypergraph-Enhanced Hashing for Unsupervised Cross-Modal Retrieval via Robust Similarity Guidance. 3517-3527 - Yue Liu
, Ke Liang
, Jun Xia
, Xihong Yang
, Sihang Zhou
, Meng Liu
, Xinwang Liu
, Stan Z. Li
:
Reinforcement Graph Clustering with Unknown Cluster Number. 3528-3537 - Jingyu Wu
, Shi Chen
, Shuyu Gan
, Weijun Li
, Changyuan Yang
, Lingyun Sun
:
Cultural Self-Adaptive Multimodal Gesture Generation Based on Multiple Culture Gesture Dataset. 3538-3549 - Xin Zou
, Chang Tang
, Xiao Zheng
, Zhenglai Li
, Xiao He
, Shan An
, Xinwang Liu
:
DPNET: Dynamic Poly-attention Network for Trustworthy Multi-modal Classification. 3550-3559 - Zhihao Zhang
, Yiwei Chen
, Weizhan Zhang
, Caixia Yan
, Qinghua Zheng
, Qi Wang
, Wangdu Chen
:
Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer. 3560-3568 - Jinda Lu
, Shuo Wang
, Xinyu Zhang
, Yanbin Hao
, Xiangnan He
:
Semantic-based Selection, Synthesis, and Supervision for Few-shot Learning. 3569-3578 - Jinyong Wen
, Shiming Xiang
, Chunhong Pan
:
Exploring Universal Principles for Graph Contrastive Learning: A Statistical Perspective. 3579-3589 - Deepanway Ghosal
, Navonil Majumder
, Ambuj Mehrish
, Soujanya Poria
:
Text-to-Audio Generation using Instruction Guided Latent Diffusion Model. 3590-3598 - Shangyu Xing
, Fei Zhao
, Zhen Wu
, Chunhui Li
, Jianbing Zhang
, Xinyu Dai
:
DRIN: Dynamic Relation Interactive Network for Multimodal Entity Linking. 3599-3608 - Shaokui Gu
, Xu Yuan
, Liang Zhao
, Zhenjiao Liu
, Yan Hu
, Zhikui Chen
:
MVCIR-net: Multi-view Clustering Information Reinforcement Network. 3609-3618 - Yixi Liu
, Yuze Tan
, Hongjie Wu
, Shudong Huang
, Yazhou Ren
, Jiancheng Lv
:
Preserving Local and Global Information: An Effective Metric-based Subspace Clustering. 3619-3627 - Jiaming Gu
, Jingyu Zhang
, Muyang Zhang
, Weiliang Meng
, Shibiao Xu
, Jiguang Zhang
, Xiaopeng Zhang
:
FeaCo: Reaching Robust Feature-Level Consensus in Noisy Pose Conditions. 3628-3636 - Masayasu Muraoka
, Bishwaranjan Bhattacharjee
, Michele Merler
, Graeme Blackwood
, Yulong Li
, Yang Zhao
:
Cross-Lingual Transfer of Large Language Model by Visually-Derived Supervision Toward Low-Resource Languages. 3637-3646 - Jingyang Yuan
, Xiao Luo
, Yifang Qin
, Zhengyang Mao
, Wei Ju
, Ming Zhang
:
ALEX: Towards Effective Graph Transfer Learning with Noisy Labels. 3647-3656 - Chenwei Zhang
, Yuxuan Hu
, Min Yang
, Chengming Li
, Xiping Hu
:
Skeletal Spatial-Temporal Semantics Guided Homogeneous-Heterogeneous Multimodal Network for Action Recognition. 3657-3666 - Zhong Chen
, Zhizhong Zhang
, Xin Tan
, Yanyun Qu
, Yuan Xie
:
Unveiling the Power of CLIP in Unsupervised Visible-Infrared Person Re-Identification. 3667-3675 - Haowen Wang
, Zhipeng Fan
, Zhen Zhao
, Zhengping Che
, Zhiyuan Xu
, Dong Liu
, Feifei Feng
, Yakun Huang
, Xiuquan Qiao
, Jian Tang
:
DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field. 3676-3685 - Yuechen Wang
, Wengang Zhou
, Zhenbo Lu
, Houqiang Li
:
Text-Only Training for Visual Storytelling. 3686-3695 - Zihao Zhang
, Jie Wang
, Yahong Han
:
Saliency Prototype for RGB-D and RGB-T Salient Object Detection. 3696-3705 - Zhu Liu
, Jinyuan Liu
, Benzhuang Zhang
, Long Ma
, Xin Fan
, Risheng Liu
:
PAIF: Perception-Aware Infrared-Visible Image Fusion for Attack-Tolerant Semantic Segmentation. 3706-3714 - Baogui Xu
, Chengjin Xu
, Bing Su
:
Cross-Modal Graph Attention Network for Entity Alignment. 3715-3723 - Yuwei Zhou
, Xin Wang
, Hong Chen
, Xuguang Duan
, Wenwu Zhu
:
Intra- and Inter-Modal Curriculum for Multimodal Learning. 3724-3735 - Yaobin Zhang
, Jianming Lv
, Chen Liu
, Hongmin Cai
:
Graph based Spatial-temporal Fusion for Multi-modal Person Re-identification. 3736-3744 - Yuanbin Wang
, Shaofei Huang
, Yulu Gao
, Zhen Wang
, Rui Wang
, Kehua Sheng
, Bo Zhang
, Si Liu
:
Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation. 3745-3754 - Zhaojian Li
, Bin Zhao
, Yuan Yuan
:
Bio-Inspired Audiovisual Multi-Representation Integration via Self-Supervised Learning. 3755-3764 - Junyin Wang
, Chenghu Du
, Hui Li
, Shengwu Xiong
:
DLFusion: Painting-Depth Augmenting-LiDAR for Multimodal Fusion 3D Object Detection. 3765-3776 - Wenna Wang
, Tao Zhuo
, Xiuwei Zhang
, Mingjun Sun
, Hanlin Yin
, Yinghui Xing
, Yanning Zhang
:
Automatic Network Architecture Search for RGB-D Semantic Segmentation. 3777-3786 - Nuo Chen
, Jin Xie
, Jing Nie
, Jiale Cao, Zhuang Shao
, Yanwei Pang:
Attentive Alignment Network for Multispectral Pedestrian Detection. 3787-3795 - Dong Chen
, Siliang Tang
, Zijin Shen
, Guoming Wang
, Jun Xiao
, Yueting Zhuang
, Carl Yang
:
FedAA: Using Non-sensitive Modalities to Improve Federated Learning while Preserving Image Privacy. 3796-3806 - Mengze Li
, Haoyu Zhang
, Juncheng Li
, Zhou Zhao
, Wenqiao Zhang
, Shengyu Zhang
, Shiliang Pu
, Yueting Zhuang
, Fei Wu
:
Unsupervised Domain Adaptation for Video Object Grounding with Cascaded Debiasing Learning. 3807-3816 - Zhengyang Mao
, Wei Ju
, Yifang Qin
, Xiao Luo
, Ming Zhang
:
RAHNet: Retrieval Augmented Hybrid Network for Long-tailed Graph Classification. 3817-3826 - Youngjoon Jang
, Kyeongha Rho
, Jong-Bin Woo
, Hyeongkeun Lee
, Jihwan Park
, Youshin Lim
, Byeong-Yeol Kim
, Joon Son Chung
:
That's What I Said: Fully-Controllable Talking Face Generation. 3827-3836 - Quanmin Liang
, Xiawu Zheng
, Kai Huang
, Yan Zhang
, Jie Chen
, Yonghong Tian
:
Event-Diffusion: Event-Based Image Reconstruction and Restoration with Diffusion Models. 3837-3846 - Han Fang
, Zhifei Yang
, Xianghao Zang
, Chao Ban
, Zhongjiang He
, Hao Sun
, Lanxiang Zhou
:
Mask to Reconstruct: Cooperative Semantics Completion for Video-text Retrieval. 3847-3856 - Yixuan Ma
, Kun Zhan
:
Self-Contrastive Graph Diffusion Network. 3857-3865 - Yiyang Chen
, Shanshan Zhao
, Changxing Ding
, Liyao Tang
, Chaoyue Wang
, Dacheng Tao
:
Cross-modal & Cross-domain Learning for Unsupervised LiDAR Semantic Segmentation. 3866-3875 - Ren Wang
, Haoliang Sun
, Xiushan Nie
, Yuxiu Lin
, Xiaoming Xi
, Yilong Yin
:
Multi-View Representation Learning via View-Aware Modulation. 3876-3886 - Boxiang Yun
, Xingran Xie
, Qingli Li
, Yan Wang
:
Uni-Dual: A Generic Unified Dual-Task Medical Self-Supervised Learning Framework. 3887-3896 - Yifan Dong
, Suhang Wu
, Fandong Meng
, Jie Zhou
, Xiaoli Wang
, Jianxin Lin
, Jinsong Su
:
Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering. 3897-3907 - Shilong Li
, Boyu Qiao
, Kun Li
, Qianqian Lu
, Meng Lin
, Wei Zhou
:
Multi-modal Social Bot Detection: Learning Homophilic and Heterophilic Connections Adaptively. 3908-3916 - Weibing Zhao
, Haiming Zhang
, Chaoda Zheng
, Xu Yan
, Shuguang Cui
, Zhen Li
:
CPU: Codebook Lookup Transformer with Knowledge Distillation for Point Cloud Upsampling. 3917-3925 - Mohit Tomar
, Abhisek Tiwari
, Tulika Saha
, Sriparna Saha
:
Your tone speaks louder than your face! Modality Order Infused Multi-modal Sarcasm Detection. 3926-3933 - Jieming Wang
, Ziyan Li
, Jianfei Yu
, Li Yang
, Rui Xia
:
Fine-Grained Multimodal Named Entity Recognition and Grounding with a Generative Framework. 3934-3943 - Wei Liu
, Xinlei Yang
, Zhenhua Li
, Feng Qian
:
SkipStreaming: Pinpointing User-Perceived Redundancy in Correlated Web Video Streaming through the Lens of Scenes. 3944-3953 - Zhao Yang
, Bing Su
, Ji-Rong Wen
:
Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling. 3954-3964 - Haichao Zhang
, Yi Xu
, Hongsheng Lu
, Takayuki Shimizu
, Yun Fu
:
Layout Sequence Prediction From Noisy Mobile Modality. 3965-3974 - Chenyang Lyu, Wenxi Li
, Tianbo Ji
, Longyue Wang
, Liting Zhou, Cathal Gurrin
, Linyi Yang
, Yi Yu, Yvette Graham
, Jennifer Foster
:
Graph-Based Video-Language Learning with Multi-Grained Audio-Visual Alignment. 3975-3984 - Meng Liu
, Fenglei Zhang
, Xin Luo
, Fan Liu
, Yinwei Wei
, Liqiang Nie
:
Advancing Video Question Answering with a Multi-modal and Multi-layer Question Enhancement Network. 3985-3993 - Wenrui Li
, Xi-Le Zhao
, Zhengyu Ma
, Xingtao Wang
, Xiaopeng Fan
, Yonghong Tian
:
Motion-Decoupled Spiking Transformer for Audio-Visual Zero-Shot Learning. 3994-4002 - Qianru Qiu
, Xueting Wang
, Mayu Otani
:
Multimodal Color Recommendation in Vector Graphic Documents. 4003-4011 - Hengcan Shi
, Munawar Hayat
, Jianfei Cai
:
Open-Vocabulary Object Detection via Scene Graph Discovery. 4012-4021 - Jushuo Chen
, Feifei Dai
, Xiaoyan Gu
, Jiang Zhou
, Bo Li
, Weiping Wang
:
Universal Domain Adaptive Network Embedding for Node Classification. 4022-4030 - Chenyu Yang
, Mengxi Chen
, Yanfeng Wang
, Yu Wang
:
Uncertainty-Guided End-to-End Audio-Visual Speaker Diarization for Far-Field Recordings. 4031-4041 - Tianyu Liu
, Peng Zhang
, Wei Huang
, Yufei Zha
, Tao You
, Yanning Zhang
:
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization. 4042-4052 - Yuhuan Lu
, Bangchao Deng
, Weijian Yu
, Dingqi Yang
:
HELIOS: Hyper-Relational Schema Modeling from Knowledge Graphs. 4053-4064 - Zhongfan Sun
, Yongli Hu
, Qingqing Gao
, Huajie Jiang
, Junbin Gao
, Yanfeng Sun
, Baocai Yin
:
Breaking the Barrier Between Pre-training and Fine-tuning: A Hybrid Prompting Model for Knowledge-Based VQA. 4065-4073 - Ziteng Wen
, Hai Xu
, Chenyu Liu
, Tao Guo
, Jinshui Hu
, Xuming He
, Fengren Wang
, Shun Lou
, Haibo Fan
:
OccluBEV: Occlusion Aware Spatiotemporal Modeling for Multi-view 3D Object Detection. 4074-4083
Poster Session III: Understanding Multimedia Content -- Vision and Language
- Xingyu Shen
, Xiang Zhang
, Xun Yang
, Yibing Zhan
, Long Lan
, Jianfeng Dong
, Hongzhou Wu
:
Semantics-Enriched Cross-Modal Alignment for Complex-Query Video Moment Retrieval. 4109-4118 - Yun Liu
, Zhongsheng Yan
, Sixiang Chen
, Tian Ye
, Wenqi Ren
, Erkang Chen
:
NightHazeFormer: Single Nighttime Haze Removal Using Prior Query Transformer. 4119-4128 - Hua Li
, Junyan Liang
, Wenjie Li
, Wenhui Wu
:
FSNet: Frequency Domain Guided Superpixel Segmentation Network for Complex Scenes. 4129-4137 - Zhi Chen
, Peng-Fei Zhang
, Jingjing Li
, Sen Wang
, Zi Huang
:
Zero-Shot Learning by Harnessing Adversarial Samples. 4138-4146 - Tian Ye
, Sixiang Chen
, Yun Liu
, Wenhao Chai
, Jinbin Bai
, Wenbin Zou
, Yunchen Zhang
, Mingchao Jiang
, Erkang Chen
, Chenghao Xue
:
Sequential Affinity Learning for Video Restoration. 4147-4156 - Yiwei Ma
, Xiaoshuai Sun
, Jiayi Ji
, Guannan Jiang
, Weilin Zhuang
, Rongrong Ji
:
Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval. 4157-4168 - Rui Xu
, Le Hui
, Yuehui Han
, Jianjun Qian
, Jin Xie:
Transformer-based Point Cloud Generation Network. 4169-4177 - Jun Guo
, Xingyu Zheng
, Aishan Liu
, Siyuan Liang
, Yisong Xiao
, Yichao Wu
, Xianglong Liu
:
Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks. 4178-4189 - Daizong Liu
, Xiaoye Qu
, Jianfeng Dong
, Guoshun Nan
, Pan Zhou
, Zichuan Xu
, Lixing Chen
, He Yan
, Yu Cheng:
Filling the Information Gap between Video and Query for Language-Driven Moment Retrieval. 4190-4199 - Zhibo Tian
, Xiaolin Zhang
, Peng Zhang
, Kun Zhan
:
Improving Semi-Supervised Semantic Segmentation with Dual-Level Siamese Structure Network. 4200-4208 - Jiarui Yang
, Chuan Wang
, Zeming Liu
, Jiahong Wu
, Dongsheng Wang
, Liang Yang
, Xiaochun Cao
:
Focusing on Flexible Masks: A Novel Framework for Panoptic Scene Graph Generation with Relation Constraints. 4209-4218 - Chunyu Xie
, Heng Cai
, Jincheng Li
, Fanjing Kong
, Xiaoyu Wu
, Jianfei Song
, Henrique Morimitsu
, Lin Yao
, Dexin Wang
, Xiangzheng Zhang
, Dawei Leng
, Baochang Zhang
, Xiangyang Ji
, Yafeng Deng
:
CCMB: A Large-scale Chinese Cross-modal Benchmark. 4219-4227 - Sixiang Chen
, Tian Ye
, Yun Liu
, Jinbin Bai
, Haoyu Chen
, Yunlong Lin
, Jun Shi
, Erkang Chen
:
CPLFormer: Cross-scale Prototype Learning Transformer for Image Snow Removal. 4228-4239 - Xuan Yao
, Junyu Gao
, Mengyuan Chen
, Changsheng Xu
:
Video Entailment via Reaching a Structure-Aware Cross-modal Consensus. 4240-4249 - Cheng Chen
, Yunqing Chen
, Shuang Song
, Jianan Wang
, Huansheng Ning
, Ruoxiu Xiao
:
Cerebrovascular Segmentation in TOF-MRA with Topology Regularization Adversarial Model. 4250-4259 - Jiale Yu
, Baopeng Zhang
, Qirui Li
, Haoyang Chen
, Zhu Teng
:
Hierarchical Reasoning Network with Contrastive Learning for Few-Shot Human-Object Interaction Recognition. 4260-4268 - Sixiang Chen
, Tian Ye
, Chenghao Xue
, Haoyu Chen
, Yun Liu
, Erkang Chen
, Lei Zhu
:
Uncertainty-Driven Dynamic Degradation Perceiving and Background Modeling for Efficient Single Image Desnowing. 4269-4280 - Chenpeng Du
, Qi Chen
, Tianyu He
, Xu Tan
, Xie Chen
, Kai Yu
, Sheng Zhao
, Jiang Bian
:
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder. 4281-4289 - Jiexin Wang
, Yujie Zhou, Wenwen Qiang
, Ying Ba
, Bing Su, Ji-Rong Wen
:
Spatio-Temporal Branching for Motion Prediction using Motion Increments. 4290-4299 - Zhenqian Wu
, Yazhou Ren
, Xiaorong Pu
, Zhifeng Hao
, Lifang He
:
Generative Neutral Features-Disentangled Learning for Facial Expression Recognition. 4300-4308 - Tingting Wang
, Yongxu Ye
, Faming Fang
, Guixu Zhang
, Ming Xu
:
Deep Algorithm Unrolling with Registration Embedding for Pansharpening. 4309-4318 - Huilin Zhu
, Jingling Yuan, Xian Zhong, Zhengwei Yang
, Zheng Wang, Shengfeng He
:
DAOT: Domain-Agnostically Aligned Optimal Transport for Domain-Adaptive Crowd Counting. 4319-4329 - Wei Ji
, Renjie Liang
, Lizi Liao
, Hao Fei
, Fuli Feng
:
Partial Annotation-based Video Moment Retrieval via Iterative Learning. 4330-4339 - Yirui Shen
, Jingxuan Kang
, Shuang Li
, Zhenjie Yu
, Shuigen Wang
:
Style Transfer Meets Super-Resolution: Advancing Unpaired Infrared-to-Visible Image Translation with Detail Enhancement. 4340-4348 - Chongyang Zhao
, Yuankai Qi
, Qi Wu
:
Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes. 4349-4358 - Xinda Liu
, Yaohui Zhu
, Linhu Liu
, Jiang Tian
, Lili Wang
:
Feature-Suppressed Contrast for Self-Supervised Food Pre-training. 4359-4367 - Yuchen Zhou
, Guang Tan
, Mengtang Li
, Chao Gou
:
Learning from Easy to Hard Pairs: Multi-step Reasoning Network for Human-Object Interaction Detection. 4368-4377 - Chengyang Fang
, Jiangnan Li
, Liang Li
, Can Ma
, Dayong Hu
:
Separate and Locate: Rethink the Text in Text-based Visual Question Answering. 4378-4388 - Yunshi Lan
, Xiang Li
, Xin Liu
, Yang Li
, Wei Qin
, Weining Qian
:
Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts. 4389-4400 - Jie Xu
, Shanshan Zhang
, Jian Yang
:
Adaptive Decoupled Pose Knowledge Distillation. 4401-4409 - Li Li
, Chenwei Wang
, You Qin
, Wei Ji
, Renjie Liang
:
Biased-Predicate Annotation Identification via Unbiased Visual Predicate Representation. 4410-4420 - Huan Liu
, Lu Zhang
, Jihong Guan
, Shuigeng Zhou
:
Zero-Shot Object Detection by Semantics-Aware DETR with Adaptive Contrastive Loss. 4421-4430 - Tao Jin
, Xize Cheng
, Linjun Li
, Wang Lin
, Ye Wang, Zhou Zhao
:
Rethinking Missing Modality Learning from a Decoding Perspective. 4431-4439 - Zhijin Ge
, Fanhua Shang
, Hongying Liu
, Yuanyuan Liu
, Liang Wan
, Wei Feng
, Xiaosen Wang
:
Improving the Transferability of Adversarial Examples with Arbitrary Style Transfer. 4440-4449 - Xin Wang
, Zihao Wu
, Hong Chen
, Xiaohan Lan
, Wenwu Zhu
:
Mixup-Augmented Temporally Debiased Video Grounding with Content-Location Disentanglement. 4450-4459 - Yaya Shi
, Haowei Liu
, Haiyang Xu
, Zongyang Ma
, Qinghao Ye
, Anwen Hu
, Ming Yan
, Ji Zhang
, Fei Huang
, Chunfeng Yuan
, Bing Li
, Weiming Hu
, Zheng-Jun Zha
:
Learning Semantics-Grounded Vocabulary Representation for Video-Text Retrieval. 4460-4470 - Jiawei Li
, Jiansheng Chen
, Jinyuan Liu
, Huimin Ma
:
Learning a Graph Neural Network with Cross Modality Interaction for Image Fusion. 4471-4479 - Chaoya Jiang
, Haiyang Xu
, Wei Ye
, Qinghao Ye
, Chenliang Li
, Ming Yan
, Bin Bi
, Shikun Zhang
, Fei Huang
, Ji Zhang
:
COPA: Efficient Vision-Language Pre-training through Collaborative Object- and Patch-Text Alignment. 4480-4491 - Shuyu Yang
, Yinan Zhou
, Zhedong Zheng
, Yaxiong Wang
, Li Zhu
, Yujiao Wu
:
Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark. 4492-4501 - Shiwei Gan
, Yafeng Yin
, Zhiwei Jiang
, Lei Xie
, Sanglu Lu
:
Towards Real-Time Sign Language Recognition and Translation on Edge Devices. 4502-4512 - Qiwei Li
, Zuchao Li
, Xiantao Cai
, Bo Du
, Hai Zhao
:
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling. 4513-4523 - Shaokun Wang
, Weiwei Shi
, Yuhang He
, Yifan Yu
, Yihong Gong
:
Non-Exemplar Class-Incremental Learning via Adaptive Old Class Reconstruction. 4524-4534 - Ruixiang Jiang
, Lingbo Liu
, Changwen Chen
:
CLIP-Count: Towards Text-Guided Zero-Shot Object Counting. 4535-4545 - Fuxiang Yang
, Tonghua Su
, Xiang Zhou
, Donglin Di
, Zhongjie Wang
, Songze Li
:
Self-Supervised Cross-Language Scene Text Editing. 4546-4554 - Feng Chen
, Jiajia Liu
, Kaixiang Ji
, Wang Ren
, Jian Wang
, Jingdong Chen
:
Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER. 4555-4563 - Liang He
, Hongke Wang
, Yongchang Cao
, Zhen Wu
, Jianbing Zhang
, Xinyu Dai
:
MORE: A Multimodal Object-Entity Relation Extraction Dataset with a Benchmark Evaluation. 4564-4573 - Ziyue Wu
, Junyu Gao
, Changsheng Xu
:
Weakly-supervised Video Scene Graph Generation via Unbiased Cross-modal Learning. 4574-4583 - Jiong Yin
, Liang Li
, Jiehua Zhang
, Chenggang Yan
, Lei Zhang
, Zunjie Zhu
:
Reducing Intrinsic and Extrinsic Data Biases for Moment Localization with Natural Language. 4584-4594 - Yaoming Wang
, Yuchen Liu
, Xiaopeng Zhang
, Jin Li
, Bowen Shi
, Chenglin Li
, Wenrui Dai
, Hongkai Xiong
, Qi Tian
:
VioLET: Vision-Language Efficient Tuning with Collaborative Multi-modal Gradients. 4595-4605 - Junyi Zeng
, Chong Bao
, Rui Chen
, Zilong Dong
, Guofeng Zhang
, Hujun Bao
, Zhaopeng Cui
:
Mirror-NeRF: Learning Neural Radiance Fields for Mirrors with Whitted-Style Ray Tracing. 4606-4615 - Hongbin Xu
, Weitao Chen
, Yang Liu
, Zhipeng Zhou
, Haihong Xiao
, Baigui Sun
, Xuansong Xie
, Wenxiong Kang
:
Semi-supervised Deep Multi-view Stereo. 4616-4625 - Chen Jiang
, Hong Liu
, Xuzheng Yu
, Qing Wang
, Yuan Cheng
, Jia Xu
, Zhongyi Liu
, Qingpei Guo
, Wei Chu
, Ming Yang
, Yuan Qi
:
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning. 4626-4636 - Tian Gan
, Xiao Wang
, Yan Sun
, Jianlong Wu
, Qingpei Guo
, Liqiang Nie
:
Temporal Sentence Grounding in Streaming Videos. 4637-4646 - Decheng Liu
, Weizhao Yang
, Chunlei Peng
, Nannan Wang
, Ruimin Hu
, Xinbo Gao
:
Modality-agnostic Augmented Multi-Collaboration Representation for Semi-supervised Heterogenous Face Recognition. 4647-4656 - Yifan Li
, Yaochen Li
, Wenneng Tang
, Zhifeng Zhu
, Jinhuo Yang
, Yuehu Liu
:
Swin-UNIT: Transformer-based GAN for High-resolution Unpaired Image Translation. 4657-4665 - Xiaoxiong Du
, Jun Peng
, Yiyi Zhou
, Jinlu Zhang
, Siting Chen
, Guannan Jiang
, Xiaoshuai Sun
, Rongrong Ji
:
PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks. 4666-4677 - Jingzheng Li
, Hailong Sun
:
LiFT: Transfer Learning in Vision-Language Models for Downstream Adaptation and Generalization. 4678-4687 - Manman Zhang
, Ge Luo
, Yuchen Ma
, Sheng Li
, Zhenxing Qian
, Xinpeng Zhang
:
VCMaster: Generating Diverse and Fluent Live Video Comments Based on Multimodal Contexts. 4688-4696 - Fulong Ye
, Yuxing Long
, Fangxiang Feng
, Xiaojie Wang
:
Whether you can locate or not? Interactive Referring Expression Generation. 4697-4706 - Yiming Li
, Xiaoshan Yang
, Changsheng Xu
:
Iterative Learning with Extra and Inner Knowledge for Long-tail Dynamic Scene Graph Generation. 4707-4715 - Jing Zhang
, Yingshuai Xie
, Xiaoqiang Liu
:
Improving Image Captioning through Visual and Semantic Mutual Promotion. 4716-4724 - Minghao Zhu
, Xiao Lin
, Ronghao Dang
, Chengju Liu
, Qijun Chen
:
Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning. 4725-4736 - Zhuoling Li
, Yong Wang
:
Better Integrating Vision and Semantics for Improving Few-shot Classification. 4737-4746 - Mingrui Lao
, Nan Pu
, Yu Liu
, Zhun Zhong
, Erwin M. Bakker
, Nicu Sebe
, Michael S. Lew
:
Multi-Domain Lifelong Visual Question Answering via Self-Critical Distillation. 4747-4758 - Xue Song
, Jingjing Chen
, Yu-Gang Jiang
:
Relation Triplet Construction for Cross-modal Text-to-Video Retrieval. 4759-4767 - Shuyi Ouyang
, Hongyi Wang
, Ziwei Niu
, Zhenjia Bai
, Shiao Xie
, Yingying Xu
, Ruofeng Tong
, Yen-Wei Chen
, Lanfen Lin
:
HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification. 4768-4777 - Haonan Zhang
, Lianli Gao
, Pengpeng Zeng
, Alan Hanjalic
, Heng Tao Shen
:
Depth-Aware Sparse Transformer for Video-Language Learning. 4778-4787 - Chuanpeng Yang
, Fuqing Zhu
, Jizhong Han
, Songlin Hu
:
Invariant Meets Specific: A Scalable Harmful Memes Detection Framework. 4788-4797 - Wuyuan Xie
, Miaohui Wang
:
A Method of Micro-Geometric Details Preserving in Surface Reconstruction from Gradient. 4798-4806 - Wenhui Li
, Yan Wang
, Yuting Su
, Lanjun Wang
, Weizhi Nie
, An-An Liu
:
Progressive Positive Association Framework for Image and Text Retrieval. 4807-4815 - Fangzheng Tian
, Sungchan Kim
:
Globally-Robust Instance Identification and Locally-Accurate Keypoint Alignment for Multi-Person Pose Estimation. 4816-4827 - Kun Zhang
, Lei Zhang
, Bo Hu
, Mengxiao Zhu
, Zhendong Mao
:
Unlocking the Power of Cross-Dimensional Semantic Dependency for Image-Text Matching. 4828-4837 - Zhiqing Chen
, Yawei Luo
, Jian Shao
, Yi Yang, Chunping Wang, Lei Chen
, Jun Xiao
:
Dark Knowledge Balance Learning for Unbiased Scene Graph Generation. 4838-4847 - Yanbiao Ma
, Licheng Jiao
, Fang Liu
, Shuyuan Yang
, Xu Liu
, Lingling Li
:
Orthogonal Uncertainty Representation of Data Manifold for Robust Long-Tailed Learning. 4848-4857 - Rundong He
, Rongxue Li
, Zhongyi Han
, Xihong Yang
, Yilong Yin
:
Topological Structure Learning for Weakly-Supervised Out-of-Distribution Detection. 4858-4866 - Weikang Wang
, Jing Liu
, Yuting Su
, Weizhi Nie
:
Efficient Spatio-Temporal Video Grounding with Semantic-Guided Feature Decomposition. 4867-4876 - Jiale Lu
, Lianggangxu Chen
, Youqi Song
, Shaohui Lin
, Changbo Wang
, Gaoqi He
:
Prior Knowledge-driven Dynamic Scene Graph Generation with Causal Inference. 4877-4885 - Junwen Chen
, Jie Zhu
, Yu Kong
:
ATM: Action Temporality Modeling for Video Question Answering. 4886-4895 - Shaoxiang Guo
, Qing Cai
, Lin Qi
, Junyu Dong
:
CLIP-Hand3D: Exploiting 3D Hand Pose Estimation via Context-Aware Prompting. 4896-4907 - Ying Yang
, Mulin Chen
, Xuelong Li
:
A Multitask Framework for Graffiti-to-Image Translation. 4908-4916 - Zihao Wang
, Weichen Zhang
, Weihong Bao
, Fei Long
, Chun Yuan
:
Adaptive Contrastive Learning for Learning Robust Representations under Label Noise. 4917-4927 - Yunyi Xuan
, Weijie Chen
, Shicai Yang
, Di Xie
, Luojun Lin
, Yueting Zhuang
:
Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification. 4928-4938 - Yanzhe Chen
, Huasong Zhong
, Xiangteng He
, Yuxin Peng
, Lele Cheng
:
Real20M: A Large-scale E-commerce Dataset for Cross-domain Retrieval. 4939-4948 - Dongsheng Xu
, Wenye Zhao
, Yi Cai
, Qingbao Huang
:
Zero-TextCap: Zero-shot Framework for Text-based Image Captioning. 4949-4957 - Zhaoxin Wang
, Handing Wang
, Cong Tian
, Yaochu Jin
:
Adversarial Training of Deep Neural Networks Guided by Texture and Structural Information. 4958-4967 - Xu Gu
, Yuchong Sun
, Feiyue Ni
, Shizhe Chen
, Xihua Wang
, Ruihua Song
, Boyuan Li
, Xiang Cao
:
TeViS: Translating Text Synopses to Video Storyboards. 4968-4979 - Nan Xi
, Jingjing Meng
, Junsong Yuan
:
Chain-of-Look Prompting for Verb-centric Surgical Triplet Recognition in Endoscopic Videos. 5007-5016 - Wencan Huang
, Daizong Liu
, Wei Hu
:
Dense Object Grounding in 3D Scenes. 5017-5026 - Xiaoxuan He
, Siming Fu
, Xinpeng Ding
, Yuchen Cao
, Hualiang Wang
:
Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition. 5027-5037 - Kanzhi Cheng
, Wenpo Song
, Zheng Ma
, Wenhao Zhu
, Zixuan Zhu
, Jianbing Zhang
:
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model. 5038-5047 - Yue Wang
, Jinlong Peng
, Jiangning Zhang
, Ran Yi
, Liang Liu
, Yabiao Wang
, Chengjie Wang
:
Toward High Quality Facial Representation Learning. 5048-5058 - Zikai Gao
, Peng Qiao
, Yong Dou
:
HAAN: Human Action Aware Network for Multi-label Temporal Action Detection. 5059-5069 - Baoli Sun
, Xinchen Ye
, Zhihui Wang
, Haojie Li
, Zhiyong Wang
:
Exploring Coarse-to-Fine Action Token Localization and Interaction for Fine-grained Video Action Recognition. 5070-5078 - Zhe Wang
, Jiaoyan Guan
, Mengping Yang
, Ting Xiao
, Ziqiu Chi
:
Semantic-Aware Generator and Low-level Feature Augmentation for Few-shot Image Generation. 5079-5088 - Bowen Yuan
, Sisi You
, Bing-Kun Bao
:
Self-PT: Adaptive Self-Prompt Tuning for Low-Resource Visual Question Answering. 5089-5098 - Ping Wang
, Xin Yuan
:
SAUNet: Spatial-Attention Unfolding Network for Image Compressive Sensing. 5099-5108 - Lin Deng
, Yuzhong Zhong
, Maoning Wang
, Jianwei Zhang
:
CONICA: A Contrastive Image Captioning Framework with Robust Similarity Learning. 5109-5119 - Zikang Liu
, Sihan Chen
, Longteng Guo
, Handong Li
, Xingjian He
, Jing Liu
:
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner. 5120-5131 - Jiali Chen
, Zhenjun Guo
, Jiayuan Xie
, Yi Cai
, Qing Li
:
Deconfounded Visual Question Generation with Causal Inference. 5132-5142 - Jing Zhao
, Heliang Zheng
, Chaoyue Wang
, Long Lan
, Wanrong Huang
, Wenjing Yang
:
Null-text Guidance in Diffusion Models is Secretly a Cartoon-style Creator. 5143-5152 - Wenqing Wang
, Kaifeng Gao
, Yawei Luo
, Tao Jiang
, Fei Gao
, Jian Shao
, Jianwen Sun
, Jun Xiao
:
Triple Correlations-Guided Label Supplementation for Unbiased Video Scene Graph Generation. 5153-5163 - Shuo Yang
, Zirui Shang
, Xinxiao Wu
:
Probability Distribution Based Frame-supervised Language-driven Action Localization. 5164-5173 - Yaoyuan Liang
, Zhao Yang
, Yansong Tang
, Jiashuo Fan
, Ziran Li
, Jingang Wang
, Philip H. S. Torr
, Shao-Lun Huang
:
LUNA: Language as Continuing Anchors for Referring Expression Comprehension. 5174-5184 - Xuming Hu
, Junzhe Chen
, Aiwei Liu
, Shiao Meng
, Lijie Wen
, Philip S. Yu
:
Prompt Me Up: Unleashing the Power of Alignments for Multimodal Entity and Relation Extraction. 5185-5194 - Xiao Liang
, Di Wang
, Quan Wang
, Bo Wan
, Lingling An
, Lihuo He
:
Language-Guided Visual Aggregation Network for Video Question Answering. 5195-5203 - Jue Chen
, Huan Yuan
, Jianchao Tan
, Bin Chen
, Chengru Song
, Di Zhang
:
Resource Constrained Model Compression via Minimax Optimization for Spiking Neural Networks. 5204-5213 - Huimin Huang
, Yawen Huang
, Shiao Xie
, Lanfen Lin
, Ruofeng Tong, Yen-Wei Chen
, Yuexiang Li
, Yefeng Zheng
:
Semi-Supervised Convolutional Vision Transformer with Bi-Level Uncertainty Estimation for Medical Image Segmentation. 5214-5222 - Qian Yang
, Qian Chen
, Wen Wang
, Baotian Hu
, Min Zhang
:
Enhancing Multi-modal Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation. 5223-5234 - Linbo Wang
, Jing Wu
, Xianyong Fang
, Zhengyi Liu
, Chenjie Cao
, Yanwei Fu
:
Local Consensus Enhanced Siamese Network with Reciprocal Loss for Two-view Correspondence Learning. 5235-5243 - Rui Cao
, Ming Shan Hee
, Adriel Kuek
, Wen-Haw Chong
, Roy Ka-Wei Lee
, Jing Jiang
:
Pro-Cap: Leveraging a Frozen Vision-Language Model for Hateful Meme Detection. 5244-5252 - Tiantian Gong
, Guodong Du
, Junsheng Wang
, Yongkang Ding
, Liyan Zhang
:
Prototype-guided Cross-modal Completion and Alignment for Incomplete Text-based Person Re-identification. 5253-5261 - Yuanhao Zhai
, Mingzhen Huang
, Tianyu Luan
, Lu Dong
, Ifeoma Nwogu
, Siwei Lyu
, David S. Doermann
, Junsong Yuan
:
Language-guided Human Motion Synthesis with Atomic Actions. 5262-5271 - Yuan Zhang
, Weihua Chen
, Yichen Lu
, Tao Huang
, Xiuyu Sun
, Jian Cao
:
Avatar Knowledge Distillation: Self-ensemble Teacher Paradigm with Uncertainty. 5272-5280 - Yu Zhao
, Hao Fei
, Yixin Cao
, Bobo Li
, Meishan Zhang
, Jianguo Wei
, Min Zhang
, Tat-Seng Chua
:
Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic Role Labeling. 5281-5291 - Ziqiao Peng
, Yihao Luo
, Yue Shi
, Hao Xu
, Xiangyu Zhu
, Hongyan Liu
, Jun He
, Zhaoxin Fan
:
SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces. 5292-5301 - Yujie Zhou
, Wenwen Qiang
, Anyi Rao
, Ning Lin
, Bing Su
, Jiaqi Wang
:
Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization. 5302-5310 - Guojin Zhong
, Jin Yuan
, Pan Wang
, Kailun Yang
, Weili Guan
, Zhiyong Li
:
Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation. 5311-5320 - Jiafeng Mao
, Xueting Wang
, Kiyoharu Aizawa
:
Guided Image Synthesis via Initial Image Editing in Diffusion Model. 5321-5329 - Song Yang
, Qiang Li
, Wenhui Li
, Min Liu
, Xuanya Li
, Anan Liu
:
External Knowledge Dynamic Modeling for Image-text Retrieval. 5330-5338 - Qiang Wang
, Junlong Du
, Ke Yan
, Shouhong Ding
:
Seeing in Flowing: Adapting CLIP for Action Recognition with Motion Prompts Learning. 5339-5347 - Zhou Zhou
, Jiahao Chao
, Jiali Gong
, Hongfan Gao
, Zhenbing Zeng
, Zhengfeng Yang
:
Enhancing Real-Time Super Resolution with Partial Convolution and Efficient Variance Attention. 5348-5357 - Binyi Su
, Hua Zhang
, Zhong Zhou
:
HSIC-based Moving Weight Averaging for Few-Shot Open-Set Object Detection. 5358-5369 - Zhihong Chen
, Zilei Wang
, Yixin Zhang
:
Exploiting Low-confidence Pseudo-labels for Source-free Object Detection. 5370-5379 - Runnan Chen
, Xinge Zhu
, Nenglun Chen
, Wei Li
, Yuexin Ma
, Ruigang Yang
, Wenping Wang
:
Bridging Language and Geometric Primitives for Zero-shot Point Cloud Segmentation. 5380-5388 - Yuehui Han
, Jiaxin Chen
, Jianjun Qian
, Jin Xie:
Graph Spectral Perturbation for 3D Point Cloud Contrastive Learning. 5389-5398 - Jiahua Rao
, Zifei Shan
, Longpo Liu
, Yao Zhou
, Yuedong Yang
:
Retrieval-based Knowledge Augmented Vision Language Pre-training. 5399-5409 - Yulin Jin
, Xiaoyu Zhang
, Jian Lou
, Xiaofeng Chen
:
ACQ: Few-shot Backdoor Defense via Activation Clipping and Quantizing. 5410-5418 - Yi Tang
, Hiroshi Kawasaki
, Takafumi Iwaguchi
:
Underwater Image Enhancement by Transformer-based Diffusion Model with Non-uniform Sampling for Skip Strategy. 5419-5427 - Zhenghan Chen
, Changzeng Fu
, Ruoxue Wu
, Ye Wang
, Xunzhu Tang, Xiaoxuan Liang
:
LGFat-RGCN: Faster Attention with Heterogeneous RGCN for Medical ICD Coding Generation. 5428-5435 - Jianlong Yuan
, Jinchao Ge
, Zhibin Wang
, Yifan Liu
:
Semi-supervised Semantic Segmentation with Mutual Knowledge Distillation. 5436-5444 - Tao Niu
, Yihang Lou
, Yinglei Teng
, Jianzhong He
, Yiding Liu
:
Shift Pruning: Equivalent Weight Pruning for CNN via Differentiable Shift Operator. 5445-5454 - Shuman Fang
, Shuai Liu
, Jie Li
, Guannan Jiang
, Xianming Lin
, Rongrong Ji
:
Improving Human-Object Interaction Detection via Virtual Image Learning. 5455-5463 - Bo Zhang
, Jian Wang
, Hui Ma
, Bo Xu
, Hongfei Lin
:
ZRIGF: An Innovative Multimodal Framework for Zero-Resource Image-Grounded Dialogue Generation. 5464-5473 - Borui Jiang
, Yadong Mu
:
Diffused Fourier Network for Video Action Segmentation. 5474-5483 - Rui Xu
, Yong Luo
, Han Hu
, Bo Du
, Jialie Shen
, Yonggang Wen
:
Rethinking the Localization in Weakly Supervised Object Localization. 5484-5494 - Jiahua Xiao
, Yantao Ji
, Xing Wei
:
Hyperspectral Image Denoising with Spectrum Alignment. 5495-5503 - Zilin Du
, Yunxin Li
, Xu Guo
, Yidan Sun
, Boyang Li
:
Training Multimedia Event Extraction With Generated Images and Captions. 5504-5513 - Xixi Nie
, Bo Hu
, Xinbo Gao
, Leida Li
, Xiaodan Zhang
, Bin Xiao
:
BMI-Net: A Brain-inspired Multimodal Interaction Network for Image Aesthetic Assessment. 5514-5522 - Sindhu B. Hegde
, Rudrabha Mukhopadhyay
, C. V. Jawahar
, Vinay P. Namboodiri
:
Towards Accurate Lip-to-Speech Synthesis in-the-Wild. 5523-5531 - Yicheng Song
, Shuyong Gao
, Haozhe Xing
, Yiting Cheng
, Yan Wang
, Wenqiang Zhang
:
Towards End-to-End Unsupervised Saliency Detection with Self-Supervised Top-Down Context. 5532-5541 - Hanbing Liu
, Jun-Yan He
, Zhi-Qi Cheng
, Wangmeng Xiang
, Qize Yang
, Wenhao Chai
, Gaoang Wang
, Xu Bao
, Bin Luo
, Yifeng Geng
, Xuansong Xie:
PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation. 5542-5551 - Chunhui Zhang
, Xin Sun
, Yiqian Yang
, Li Liu
, Qiong Liu
, Xi Zhou
, Yanfeng Wang
:
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment. 5552-5561 - Nan Li
, Pijian Li
, Dongsheng Xu
, Wenye Zhao
, Yi Cai
, Qingbao Huang
:
Scene-text Oriented Visual Entailment: Task, Dataset and Solution. 5562-5571 - Songhe Deng
, Wei Zhuo
, Jinheng Xie
, Linlin Shen
:
QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation. 5572-5583 - Lele Lv
, Qing Liu
, Shichao Kan
, Yixiong Liang
:
Confidence-Aware Contrastive Learning for Semantic Segmentation. 5584-5593 - Ao Wang
, Hui Chen
, Zijia Lin
, Zixuan Ding
, Pengzhang Liu
, Yongjun Bao
, Weipeng Yan
, Guiguang Ding
:
Hierarchical Prompt Learning Using CLIP for Multi-label Classification with Single Positive Labels. 5594-5604 - Wenrui Li
, Zhengyu Ma
, Liang-Jian Deng
, Penghong Wang
, Jinqiao Shi
, Xiaopeng Fan
:
Reservoir Computing Transformer for Image-Text Retrieval. 5605-5613 - Gege Qi
, Yuefeng Chen
, Xiaofeng Mao
, Binyuan Hui
, Xiaodan Li
, Rong Zhang
, Hui Xue
:
Model Inversion Attack via Dynamic Memory Learning. 5614-5622 - Zhiming Hu
, Angela Ning Ye
, Salar Hosseini Khorasgani
, Iqbal Mohomed
:
AdaCLIP: Towards Pragmatic Multimodal Video Retrieval. 5623-5633 - Zhenyang Li
, Yangyang Guo
, Kejie Wang
, Xiaolin Chen
, Liqiang Nie
, Mohan S. Kankanhalli
:
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR. 5634-5644 - Keyu Tu
, Zilei Wang
, Junjie Li
, Yixin Zhang
:
Semi-supervised Domain Adaptation via Joint Contrastive Learning with Sensitivity. 5645-5654 - Xinzi Cao
, Xiawu Zheng
, Yunhang Shen
, Ke Li
, Jie Chen
, Yutong Lu
, Yonghong Tian
:
LocLoc: Low-level Cues and Local-area Guides for Weakly Supervised Object Localization. 5655-5664 - Cong-Duy Nguyen
, The-Anh Vu-Le
, Thong Nguyen
, Tho Quan, Anh Tuan Luu
:
Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment. 5665-5673 - Zheng Ma
, Mianzhi Pan
, Wenhan Wu
, Kanzhi Cheng
, Jianbing Zhang
, Shujian Huang
, Jiajun Chen
:
Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models. 5674-5685 - Zhe Li
, Laurence T. Yang
, Xin Nie
, Bocheng Ren
, Xianjun Deng
:
Enhancing Sentence Representation with Visually-supervised Multimodal Pre-training. 5686-5695 - Longzheng Wang
, Chuang Zhang
, Hongbo Xu
, Yongxiu Xu
, Xiaohan Xu
, Siqi Wang
:
Cross-modal Contrastive Learning for Multimodal Fake News Detection. 5696-5704 - Dingyi Yang
, Hongyu Chen
, Xinglin Hou
, Tiezheng Ge
, Yuning Jiang
, Qin Jin
:
Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences. 5705-5715 - Yinuo Jing
, Chunyu Wang
, Ruxu Zhang
, Kongming Liang
, Zhanyu Ma
:
Category-Specific Prompts for Animal Action Recognition with Pretrained Vision-Language Models. 5716-5724 - Rui Xu
, Le Hui
, Yuehui Han
, Jianjun Qian
, Jin Xie:
Scene Graph Masked Variational Autoencoders for 3D Scene Generation. 5725-5733 - Shuo Huang
, Zongxin Yang
, Liangting Li
, Yi Yang, Jia Jia
:
AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion. 5734-5745 - Xu Bao
, Zhi-Qi Cheng
, Jun-Yan He
, Wangmeng Xiang
, Chenyang Li
, Jingdong Sun
, Hanbing Liu
, Wei Liu
, Bin Luo
, Yifeng Geng
, Xuansong Xie:
KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration. 5746-5755 - Zhiyu Jin
, Hanyang Yu
, Chen Haul
, Linxiang Wang
, Zuobin Zhu
, Qiu Shen
, Xun Cao
:
WormTrack: Dataset and Benchmark for Multi-Object Tracking in Worm Crowds. 5756-5763 - Jinglei Zhang
, Tiancheng Lin
, Yi Xu, Kai Chen, Rui Zhang:
Relational Contrastive Learning for Scene Text Recognition. 5764-5775 - Yiting Liu
, Liang Li
, Beichen Zhang
, Shan Huang
, Zheng-Jun Zha
, Qingming Huang
:
MaTCR: Modality-Aligned Thought Chain Reasoning for Multimodal Task-Oriented Dialogue Generation. 5776-5785 - Xiaoyu Li
, Xiaoxue Chen
, Zuming Huang
, Lele Xie
, Jingdong Chen
, Ming Yang
:
Fine-grained Pseudo Labels for Scene Text Recognition. 5786-5795 - Jiachen Sun
, Mark Ibrahim
, Melissa Hall
, Ivan Evtimov
, Z. Morley Mao
, Cristian Canton-Ferrer
, Caner Hazirbas
:
VPA: Fully Test-Time Visual Prompt Adaptation. 5796-5806 - Haonan Shi
, Wenwen Pan
, Zhou Zhao
, Mingmin Zhang
, Fei Wu
:
Unsupervised Domain Adaptation for Referring Semantic Segmentation. 5807-5818 - Guangming Shi
, Xuyang Li
, Xuemei Xie
, Mingxuan Yu
, Chengwei Rao
, Jiakai Luo
:
OCSKB: An Object Component Sketch Knowledge Base for Fast 6D Pose Estimation. 5819-5827 - Hongbo Sun
, Xiangteng He
, Jiahuan Zhou
, Yuxin Peng
:
Fine-Grained Visual Prompt Learning of Vision-Language Models for Image Recognition. 5828-5836
Poster Session IV: Engaging Users with Multimedia -- Emotional and Social Signals
- Teng Sun, Juntong Ni, Wenjie Wang, Liqiang Jing, Yinwei Wei, Liqiang Nie:
General Debiasing for Multimodal Sentiment Analysis. 5861-5869 - Tuukka Ruotsalo
, Kalle Mäkelä
, Michiel M. A. Spapé
, Luis A. Leiva
:
Feeling Positive? Predicting Emotional Image Similarity from Brain Signals. 5870-5878 - Tongjie Pan
, Yalan Ye
, Hecheng Cai
, Shudong Huang
, Yang Yang
, Guoqing Wang
:
Multimodal Physiological Signals Fusion for Online Emotion Recognition. 5879-5888 - Hanwei Liu
, Huiling Cai
, Qingcheng Lin
, Xuefeng Li
, Hui Xiao
:
Learning from More: Combating Uncertainty Cross-multidomain for Facial Expression Recognition. 5889-5898 - Yizhuo Lu
, Changde Du
, Qiongyi Zhou
, Dianpeng Wang
, Huiguang He
:
MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion. 5899-5908 - Jaeho Yoon
, Jaewoo Park
, Kensuke Wagata
, Hojin Park
, Andrew Beng Jin Teoh
:
Pretrained Implicit-Ensemble Transformer for Open-Set Authentication on Multimodal Mobile Biometrics. 5909-5922 - Bobo Li
, Hao Fei
, Lizi Liao
, Yu Zhao
, Chong Teng
, Tat-Seng Chua
, Donghong Ji
, Fei Li
:
Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition. 5923-5934 - Yiwei Ru
, Peipei Li
, Muyi Sun
, Yunlong Wang, Kunbo Zhang
, Qi Li
, Zhaofeng He
, Zhenan Sun
:
Sensing Micro-Motion Human Patterns using Multimodal mmRadar and Video Signal for Affective and Psychological Intelligence. 5935-5946 - Yunxiao Wang
, Meng Liu
, Zhe Li
, Yupeng Hu
, Xin Luo
, Liqiang Nie
:
Unlocking the Power of Multimodal Learning for Emotion Recognition in Conversation. 5947-5955 - Jiaxin Ye
, Yujie Wei
, Xin-Cheng Wen
, Chenglong Ma
, Zhizhong Huang
, Kunhong Liu
, Hongming Shan
:
Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition. 5956-5965