


ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 22
Volume 22, Number 1, January 2026
Regular Papers
- Mingqiang Wei, Qian Sun, Haoran Xie, Dong Liang, Dingkun Zhu, Fu Lee Wang:
  Search by Image: Deeply Exploring Beneficial Features for Beauty Product Retrieval. 1:1-1:19
- Hu Xiong, Hang Yan, Mohammad S. Obaidat, Jingxue Chen, Mingsheng Cao, Sachin Kumar, Kadambri Agarwal, Saru Kumari:
  Efficient and Privacy-Enhanced Asynchronous Federated Learning for Multimedia Data in Edge-Based IoT. 2:1-2:23
- Sadia Jabeen Siddiqi, Abdulraheem H. Alobaidi, Mian Ahmad Jan, Muhammad Tariq:
  Securing Vehicle-to-Digital Twin Communications in the Internet of Vehicles. 3:1-3:19
- Baoping Liu, Bo Liu, Ming Ding, Tianqing Zhu:
  ForgeFinder: Perceptive Multimodal Deepfake Detection via Multi-grained Forgery Localization. 4:1-4:24
- Haolong Xiang, Xuyun Zhang, Xiaolong Xu, Amin Beheshti, Lianyong Qi, Yujie Hong, Wanchun Dou:
  Federated Learning-Based Anomaly Detection with Isolation Forest in the IoT-Edge Continuum. 5:1-5:19
- Trung Thanh Nguyen, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide:
  Hierarchical Global-Local Fusion for One-stage Open-vocabulary Temporal Action Detection. 6:1-6:23
- Yuan Wang, Bin Zhu, Yanbin Hao, Chong-Wah Ngo, Yi Tan, Xiang Wang:
  CookingDiffusion: Cooking Procedural Image Generation with Stable Diffusion. 7:1-7:24
- Yaning Li, Hao Zhu, Bing-Kun Bao:
  Light Field Reconstruction Using Multi-orientation Epipolar Plane Images. 8:1-8:22
- Qing Zhang, Jing Zhang, Xiangdong Su, Feilong Bao, Guanglai Gao:
  Hyperbolic-Based Cross-Modal Semantic Remodeling Network for Zero-Shot Sketch-Based Image Retrieval. 9:1-9:23
- Anni Tang, Zhiyu Zhang, Chen Zhu, Jun Ling, Rong Xie, Li Song:
  A Hybrid Scheme for Face Video Compression. 10:1-10:24
- Xingyu Liu, Yan Jiang, Xu Cheng, Hao Yu, Haoyu Chen, Guoying Zhao:
  CROMBO: Cross-Modality Bootstrapping for Unified Sketch-Photo Representation Learning. 11:1-11:18
- Yan Zhang, Rui Song, Riting Xia, Zhenwei Shi:
  QoE Evaluation for VR with Vibrotactile Feedback Based on Inter-user Brain Spatial Information. 12:1-12:20
- Jianjun Lei, Duohui Tu, Bo Peng, Jie Zhu, Zhe Zhang, Chong Wu, Qingming Huang:
  Depth-Aware Transformer for Aerial Localization. 13:1-13:16
- Seung-Lee Lee, Minjae Kang, Bo Seok Shim, Jong-Uk Hou:
  Robust 3D Watermarking for NeRF-Induced Modality Shifts. 14:1-14:23
- Zan Gao, Xiaoyi Xu, Yibo Zhao, Chunjie Ma, Yanbing Xue, Riwei Wang:
  A Collaborative Hierarchical Aggregation Network for Weakly Supervised Temporal Action Localization. 15:1-15:18
- Xianxuan Lin, Bailin Yang, Zhigeng Pan, Chuangxin Cai, Shuang Wang, Aditi Bhattarai, Fan Meng:
  MambaWDC: Efficient Weather Data Compression via Selective State Space Model. 16:1-16:24
- Zeyang Zhang, Hui Li, Tianyang Xu, Xiaojun Wu, Congcong Bian, Josef Kittler:
  BusReF: Infrared-Visible Images Registration and Fusion Focus on Reconstructible Area Using One Set of Features. 17:1-17:19
- Jooyoung Lee, Se Yoon Jeong, Munchurl Kim:
  DeepHQ: Learned Hierarchical Quantizer for Progressive Deep Image Coding. 18:1-18:24
- Karanvir Singh, Abdulmotaleb El Saddik, Mukesh Saini:
  A Step Closer Towards the Digital Twin of the Plant. 19:1-19:23
- Haiyu Deng, Xu Wang, Guangsheng Yu, Wei Ni, Ying He, Tanzeela Altaf, Ren Ping Liu:
  NNFMAC: A Neural Network Fingerprinting-Based Model Authentication Code Scheme. 20:1-20:25
- Xin Dong, Lihan Zhang, Aoyang Liu, Xiaojun Liang, Yutao Guo, Yansong Tang:
  Enhancing Pose-Guided Human Image Generation with Comprehensive and Adjustable 3D Control. 21:1-21:24
- Yibo Xia, Qihui Zhan, Xiaoyan Luo, Xiaofeng Shi, Yunhong Wang:
  SignMask: Structure-aware Masked Modeling for Holistic 3D Sign Language Production. 22:1-22:28
- Liangcheng Zhao, Yueying Wang, Yuhao Qing, Dan Zeng, Li Xu:
  MCFINet: A Cost-Efficient Multi-Channel Feature Integration Network for Surface Scenarios Image Super-Resolution. 23:1-23:17
- Zhihao Wang, Feifei Zhang, Lingkai Ran, Caixia Song, Ling Zhou:
  Enhancing Image Captioning through Bridging Image-Text Gap and Reducing Hallucinations. 24:1-24:23
- Ruiji Xu, Junhao Chen, Runzhe Zhang, Guanglin Dai, Keji Mao:
  FaceDepth: A Robust Unimodal Depression Detection Framework Using Invariant Facial Landmark Features. 25:1-25:27
- Yixuan Li, Lipeng Ma, Weidong Yang, Ben Fei:
  3DMambaComplete: Structured State Space Model for High-Efficiency Point Cloud Completion. 26:1-26:24
- Shangheng Chen, Quan Fang, Shengsheng Qian, Changsheng Xu:
  Metapath-Enhanced Language Model Pretraining on Text-Attributed Heterogeneous Graphs. 27:1-27:23
- Yuanyu Zheng, Lin Zhang, Yunda Sun, Ying Shen, Shengjie Zhao:
  CaneSpeaker: An LLM-Assisted Speaker for Generating Human-Like Navigation Instructions. 28:1-28:26
- Wenjun Xie, Kejun Chen, Dong Wang, Xiaoping Liu:
  MatPose: A 2D Human Pose Estimation Model with Hybrid Mamba-Transformer. 29:1-29:21
- Xinyi Chen, Weimin Lei, Wei Zhang, Wenhui Ye, Yanwen Wang:
  Portrait Video Compression with Semantic-guided Animation Model and Background Incremental Coding. 30:1-30:23
Volume 22, Number 2, February 2026
Regular Papers
- Shiyi Zheng, Peizhi Zhao, Qingbao Huang, Yi Cai, Haonan Cheng, Qi Wu:
  Implement Referring Expression Comprehension by Extending Auto-focus Lens to Locked Vision Model. 31:1-31:24
- Dinghao Yang, Bin Wang, Weijia Li, Yiqi Lin, Conghui He:
  Exploring the Interactive Guidance for Unified and Effective Image Matting. 32:1-32:24
- Masahiro Yasuda, Noboru Harada, Yasunori Ohishi, Shoichiro Saito, Akira Nakayama, Nobutaka Ono:
  Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis. 33:1-33:24
- Huijie Yao, Wengang Zhou, Hao Zhou, Hezhen Hu, Houqiang Li:
  Retrieval-Augmented Sign Language Translation. 34:1-34:19
- Yuanyou Xu, Zongxin Yang, Yi Yang:
  Photorealistic Text-to-3D Avatar Generation with Constraints for Decoupled Geometry and Appearance. 35:1-35:22
- Jiazhong Chen, Lu Guo, Dakai Ren, Zian Fu, Furui Liu, Hao Zhu, Yuxuan Pan:
  Geometry-Insensitive RPN Prototypes for Domain Adaptive 3D Object Detection. 36:1-36:22
- Yulei Yang, Zongju Peng, Huabo Zhang, Fen Chen, Qianliang Zhang:
  LF-F3Net: Frequency-Guided Feature Fusion Network for Light Field Image Super-Resolution. 37:1-37:19
- Junteng Liu, Zizhe Wang, Yunji Liang, Sagar Samtani, Yangyang Li, Lei Tang, Zhiwen Yu:
  A Hierarchical Hard Negative Sampling Strategy for Robust Out-of-Distribution Object Detection. 38:1-38:20
- Pindan Cao, Weiqing Min, Guorui Sheng, Yongqiang Song, Tao Yao, Lili Wang, Shuqiang Jiang:
  FoodHash: Context-Aware Proxy Interaction and Fusion for Food Image Retrieval. 39:1-39:24
- Yifan Xu, Sirui Zhao, Shifeng Liu, Tong Xu, Enhong Chen:
  Emotionally Controllable Audio-driven Talking Face Generation. 40:1-40:22
- Shaofan Wang, Fuhao Wei, Hong Ma, Yanfeng Sun, Baocai Yin:
  Text-Prompted Prompt Generator with Uncertainty Regularization for Rehearsal-Free Class-Incremental Learning. 41:1-41:23
- Harry Cheng, Yangyang Guo, Tianyi Wang, Liqiang Nie, Mohan Kankanhalli:
  Towards Generalizable Deepfake Detection by Primary Region Regularization. 42:1-42:25
- Junjie Chen, Hang Yu, Subin Huang, Sanmin Liu, Linfeng Zhang:
  InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-Modal Sarcasm Detection. 43:1-43:23
- Yefei Sheng, Jie Wang, Ming Tao, Bing-Kun Bao:
  AdaEdit: Adaptive Diffusion Model for Invisible Target Oriented Text-Conditioned Image Editing. 44:1-44:19
- Yuankang Pan, Zhaoquan Yuan, Xiao Wu, Zechao Li, Changsheng Xu:
  THMM-CLIP: Task-Guided Hierarchical Multi-Modal Alignment for Rehearsal-Free Class Incremental Learning. 45:1-45:18
- Yu Jiang, Yongji Zhang, Siqi Li, Yuehang Wang, Yue Gao:
  SkiTrack: An Aerial Skiing Benchmark for Human-Centric Object Tracking. 46:1-46:20
- Xiangling Ding, Jia Tang, Yunyi Li, Gaobo Yang, Yubo Lang:
  Bi-Level Routing Attention and Enhanced Spatial-Temporal Inconsistency Learning for Deep VFI Video Detection. 47:1-47:26
- Zhe Chang, Haodong Jin, Yan Song, Ying Sun, Hui Yu:
  GAT-NeRF: Geometry-Aware-Transformer-Enhanced Neural Radiance Fields for High-Fidelity 4D Facial Avatars. 48:1-48:20
- Chong-Yu Zhang, Xin Luo, Yu-Wei Zhan, Zhen-Duo Chen, Xin-Shun Xu:
  Gleaning Wisdom from the Past: Towards Label Incremental Learning for Online Hashing with a Plug-and-Play Framework. 49:1-49:23
- Yilin Hou, Jin Wang, Jiade Chen, Yunhui Shi, Nam Ling, Baocai Yin:
  S2PU-Net: Sparse Semantic-Guided Progressive Point Cloud Upsampling for Indoor Scenes. 50:1-50:24
- Jianhui Zou, Weijia Cao, Nankun Mu, Shuang Yi, Yifeng Zheng, Zhaoquan Gu, Zhongyun Hua:
  Reversible Data Hiding over Encrypted Images via Intrinsic Correlation in Block-Based Secret Sharing. 51:1-51:25
- Shuo Han, Qibing Qin, Wenfeng Zhang, Lei Huang:
  Deep Uncertainty-aware Probabilistic Hashing for Cross-modal Retrieval. 52:1-52:23
- Xiaobo Yang, Xiaojin Gong:
  Re-purposing SAM into Efficient Visual Projectors for MLLM-based Referring Image Segmentation. 53:1-53:26
- Mingjie Qiu, Zhiyi Tan, Bing-Kun Bao:
  MyGO: Modality-incomplete Fake News Video Detection via Prompt-assisted Modality Disentangling Model. 54:1-54:23
- Zhiwen Shao, Hang Yang, Hancheng Zhu, Rui Yao, Lixin Zou, Mengtian Li, Bin Sheng:
  Spatio-Temporal Disentanglement and Constrained Self-Attention for Multi-Modal Deception Detection. 55:1-55:20
- Tao Yan, Weilong Huang, Weijiang He, Chenglong Wang, Cihang Wei, Yiwei Lu, Xiangjie Zhu, Yinghui Wang, Rynson W. H. Lau:
  MDeRainNet: An Efficient Macro-pixel Image Rain Removal Network. 56:1-56:24
- Jikang Cheng, Jiaxin Ai, Zhen Han, Chao Liang, Qin Zou, Zhongyuan Wang:
  IDRetracor: Towards Visual Forensics against Malicious Face Swapping. 57:1-57:22
- Yangtao Wang, Weibin Huang, Yanzhao Xie, Siyuan Chen, Weilong Peng, Maobin Tang, Meie Fang, Wensheng Zhang:
  High Feature Distinguishability for Adaptive Image-text Matching with Dual-stream Transformers. 58:1-58:23
Survey Papers
- Ghulam Muhammad, Sumayah A. Almuntasheri, Fadia Alenezi, Nwraan Alhadi, Victor C. M. Leung:
  EEG-based Multimodal Emotion Recognition: Recent Progress, Challenges, and Future Directions. 59:1-59:28
- Syed Umar Amin, Mohsen Guizani, M. Shamim Hossain:
  Advances, Evaluation, and Explainability of Large Language Models in Healthcare: A Systematic Review. 60:1-60:32
Volume 22, Number 3, March 2026
Section: Special Issue on ACM Multimedia Systems 2024 and Co-Located Workshops, Edited by Christian Timmerer, Maria G. Martini, Ali C. Begen, and Luca De Cicco
- Christian Timmerer, Maria G. Martini, Ali C. Begen, Luca De Cicco:
  Introduction to the Special Issue on ACM Multimedia Systems 2024 and Co-Located Workshops. 61:1-61:4
- Ali Zeynali, Mahsa Sahebdel, Mohammad H. Hajiesmaili, Ramesh K. Sitaraman:
  BOLA360: Near-optimal View and Bitrate Adaptation for 360-degree Video Streaming. 62:1-62:30
- Jianxin Shi, Miao Zhang, Linfeng Shen, Jiangchuan Liu, Yuan Zhang, Lingjun Pu, Jingdong Xu:
  Implicit Representation-based Volumetric Video Streaming for Photorealistic Full-scene Experience. 63:1-63:21
- Darijo Raca, Gregory M. Provan, Ahmed H. Zahran:
  M2ATURE: Mobile Multistage Throughput Prediction for Adaptive Video Streaming in Cellular Networks. 64:1-64:17
- Matthias De Fré, Jeroen van der Hooft, Tim Wauters, Filip De Turck:
  Scalable MDC-Based WebRTC Streaming for One-to-Many Volumetric Video Conferencing. 65:1-65:25
- Casper Haems, Jeroen van der Hooft, Hannes Mareen, Peter Steenkiste, Glenn Van Wallendael, Tim Wauters, Filip De Turck:
  Hybrid Unicast-Broadcast Video Delivery for Scalable Low-Latency Live Streaming. 66:1-66:24
- Dongbiao He, Xian Yu, Canshu Lin, Cédric Westphal, Zhongxing Ming, Laizhong Cui, Xu Zhou, J. J. Garcia-Luna-Aceves, Yanbiao Li:
  Enhancing Video Conference Applications with VCApather: A Network as a Service Perspective. 67:1-67:24
- Yuankang Zhao, Qinghua Wu, Gerui Lv, Furong Yang, Jiuhai Zhang, Feng Peng, Yanmei Liu, Zhenyu Li, Hongyu Guo, Ying Chen, Gaogang Xie:
  Understanding and Taming the Inflated Latency in Mobile Cloud Rendering. 68:1-68:23
- Valeri George, Jens Brandenburg, Gabriel Hege, Tobias Hinz, Adam Wieckowski, Benjamin Bross, Thomas Schierl, Detlev Marpe:
  Multi-level Inter-frame Parallelization in an Open Optimized VVC Encoder. 69:1-69:16
- Hamed Alimohammadzadeh, Shuqin Zhu, Shahram Ghandeharizadeh:
  Techniques to Conceal Dark Standby Flying Light Specks. 70:1-70:26
Section: Regular Papers
- Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hung Chen, Marcel Worring:
  Conditional Modeling-Based Automatic Video Summarization. 71:1-71:21
- Bo Peng, Lin Chen, Jiahui Song, Menglei Zhao, Qingming Huang, Jianjun Lei:
  Meta-Learned Zero-Shot Sketch-Based Point Cloud Retrieval via Perspective-Predicted Feature Learning. 72:1-72:16
- Weizhi Xian, Yichi Chen, Bin Chen, Leong Hou U, Shiyou Liu, Yong Feng, Mingliang Zhou, Sam Kwong:
  Neighborhood Attention-based Feature Reconstruction for Image Anomaly Detection and Localization. 73:1-73:20
- Keke Xu, Zhenghua Peng, Shuangping Huang, Gege Zhang, Yunqing Hu, Wenjie Peng:
  Improving Pseudo-Labeling by Dynamic Confidence Calibration for Semi-Supervised Sequence Recognition. 74:1-74:22
- Yun Zhou, Hongfu Yin, Chunyu Tan, Qiaoyun Wu, Changyin Sun, Richang Hong:
  An Efficient Hybrid Cascade Tracker with Spiking Neural Networks for Event Domain Tracking. 75:1-75:20
- Bifa Liang, Yichao Wang, Ziyang Hu, Zhicong Huang, Haifeng Hu, Jianming Xu, Dihu Chen:
  RCAENet: Residual Convolutional and Attention-Enhanced Stereo Matching for Real-Time Depth Estimation on Edge Devices. 76:1-76:25
- Guanhua Zheng, Jitao Sang, Changsheng Xu:
  GAROD: Delve into Gradient-Based Attribution Reliability for Out-of-Distribution Detection. 77:1-77:17
- Jun Ling, Yiwen Wang, Han Xue, Rong Xie, Li Song:
  PoseTalk: Exploring Text- and Audio-Based Pose Control for One-Shot Talking Face Generation. 78:1-78:24
- Xinbo Geng, Fan Shi, Xu Cheng, Chen Jia, Shengyong Chen:
  Hierarchical Spatial-Angular Representation Learning for Point-Supervised Salient Object Detection in Light Fields. 79:1-79:18
- Yifan Zhao, Ziyang Zheng, Duoduo Xue, Yong Li, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong:
  Unfolding Convolutional Sparse Coding With Low-rank-Guided Hybrid Priors for Image Denoising. 80:1-80:29
- Guoyi Tang, Chunlin Li, Zihao Zhang, Kun Jiang, Bingxin Wang, Wenhao Wu, Xu Yang, Shaohua Wan:
  Low-Latency Multimedia Delivery via Collaborative Cloud-Edge Caching in Edge Computing Networks. 81:1-81:24
- Yidan Xu, Suo Gao, Yinghong Cao, Jun Mou:
  Multi-Image Encryption Scheme Based on Chaotic Pseudo-Random Signal Generator and DWT Compression. 82:1-82:20
- Jing-Xuan Chen, Ling Lo, Si-Yu Lu, Ling Zou, Wen-Huang Cheng, Jungwoo Huh, Sanghoon Lee:
  SeCo: Semantic-Guided Multimodal Color Splash Effects. 83:1-83:21
- Zuyi Zhou, Dizhan Xue, Baoyuan Qi, Shengsheng Qian, Changsheng Xu:
  Code-Driven LLM Agent for One-Shot Explanatory Visual Question Answering. 84:1-84:16
- Dongxu Mao, Shangzhi Teng, Xueqiang Lyu:
  CVAF: A CLIP-Based View-Consistent Alignment Framework for Aerial-Ground Person Re-Identification. 85:1-85:19
- Yuan-Yu Tsai, Wen-Ting Jao, Yi-Hui Chen:
  Authentication-enabled Reversible Data Hiding in Encrypted 3D Meshes via Effective Vertex Traversal and Secret Sharing. 86:1-86:25
- Jian Li, Quanxing Xu, Ling Zhou, Feifei Zhang, Rubing Huang:
  PLMAS: Adaptive Sample Selection for Prompting LLMs in Knowledge-Based Visual Question Answering. 87:1-87:21
- Ji Dai, Quan Fang, Jun Hu, Desheng Cai, Yang Yang, Can Zhao:
  Cross-Modal Attention Network with Dual Graph Learning in Multimodal Recommendation. 88:1-88:23
- Yun-Cong Liu, Zhen-Duo Chen, Qingze Bai, Xiao-Dong Xie, Hao Liu, Xin Luo, Xin-Shun Xu:
  Fine-Grained Augmentation and Progressive Feature Integration for Unsupervised Fine-Grained Hashing. 89:1-89:19
- Jiaqi Liu, Xian-Ying Xu, Suo Gao, Junxin Chen, Jun Mou:
  Lightweight Video Secondary-Encryption Scheme Based on YOLOv11 and a Discrete Model of Bi-Neuron HNN. 90:1-90:20















