


MMM 2026, Prague, Czech Republic - Part II
- Jakub Lokoc, Ladislav Peska, Jan Zahálka, Stevan Rudinac, Marc A. Kastner, Jingjing Chen, Min-Chun Hu, Jiaxin Wu, Ujjwal Sharma:
MultiMedia Modeling - 32nd International Conference on Multimedia Modeling, MMM 2026, Prague, Czech Republic, January 29-31, 2026, Proceedings, Part II. Lecture Notes in Computer Science 16413, Springer 2026, ISBN 978-981-95-6956-4

- Oriana Presacan, Alireza Nik, Vajira Thambawita, Bogdan Ionescu, Michael Riegler:
A Comparative Study of Decoding Strategies in Medical Text Generation. 1-15

- Peirou Liang, Meng Yang, Zhiqian Wu, Peng Yuan Zhou, Yong Liao:
DiSCo: Disrupting Semantic Consistency for Transferable Cross-Modal Adversarial Attacks. 16-30

- Junhao Li, Jiahao Chen, Zhou Feng, Chunyi Zhou:
Auditing M-LLMs for Privacy Risks: A Synthetic Benchmark and Evaluation Framework. 31-44

- Qihao Ye, Zhuowei Wang:
FGR: Frequency Aware and Geometric Structure-Guided Multi-modality Image Registration Framework. 45-58

- Zhangyi Wang, Zongze Li:
MedFuse-GRM: Multi-scale Feature Extraction and Medically-Guided Graph Relation Modeling for Multimodal Skin Lesion Classification. 59-73

- Meiyi Lyu, Jiawei Mo, Xuewen Chen, Chaoqun Wang:
AxialUNet: A Lightweight Network for Medical Image Segmentation with Axial Operators. 74-88

- Teng Tu, Xiaohao Liu, Yunshan Ma, Ji Qi, Tat-Seng Chua:
Integrating Symbolic and Waveform Music Into Large Language Models. 89-103

- Likai Yang, Nianqiao Li, Xiaoping Liang, Lv Chen, Zhenjun Tang:
Video Hashing via a Mamba-Transformer Network for Retrieval. 104-118

- Feng Wu, Zhaojing Wang, Li Li:
DFRF-MIAD: Multimodal Industrial Anomaly Detection via Feature Reconstruction and Fusion. 119-133

- Guobin Zhang, Li Li, Zhaojing Wang, Qihang Wang, Tao Peng, Xinrong Hu:
PSR-Diff: Polarization-Guided Diffusion Model for Single Image Specular Highlight Removal. 134-148

- Shuai Li, Xin Yuan, Minshi Chen, Yi Yin, Xin Xu:
NPFML: Non-isotropic Potential Fields with Hierarchical Decay for Deep Metric Learning. 149-163

- Ruichao Ren, Yiqi Wang, Jiaxin Zhang, Wen Yin, Yong Guo, Xiaoling Li:
Robust Ensemble of GNNs with Adaptive Graph Structure Learning. 164-176

- Yuxuan Li, Yuning Ren:
Enhancing Vision Transformer with Multiple Fractional-Order Differential Operators for Image Desnowing. 177-189

- Thanh-Nhan Vo, Trong-Thuan Nguyen, Tam V. Nguyen, Minh-Triet Tran:
VENUS: Visual Editing with Noise Inversion Using Scene Graphs. 190-203

- Yuchen Deng, Hongyou Chen, Lingfeng Qu, Yong Jiang, Yong Fan:
Noise Scale Controllable Anomaly Synthesis Strategy for Industrial Anomaly Detection and Localization. 204-218

- Wei Wang, JiaYi Hu:
Enhancing Image Generation of Diffusion Models with Structural Image Guidance. 219-231

- Lin Wang, Tiansong Li, Guofen Wang, Shaoguo Cui, Hongkui Wang, Li Yu:
HCFFPN: Hierarchical Cross-Scale Feature Fusion Pyramid Network for Small Target Detection in Unmanned Aerial Vehicle Images. 232-246

- Zhaofu Zeng, Jian Xing:
MP-CLIP: Unlocking Long-Text Understanding in CLIP via Multi-paragraph Encoding. 247-261

- Bo Wu:
Token-Based Multi-condition Autoregressive Diffusion for Lung CT Image Generation. 262-275

- Qingguan Li, Jiawei Cong, Kai Zhao:
DAHM: A Dual-Stream Attention Fusion Model for Hate Content Detection. 276-289

- Feng Zhang, Junliang Tan, Zhenming Chen, Hao Feng, Biao Guo, Junyan Chen, Yao Lu, Ming Jiang:
TTEdit: Cross-Modal Fusion with Diffusion Models for Detail-Aware Fashion Editing. 290-303

- JingShuo Guan, Na Qi, Qing Zhu, Liang Chen:
UCAMNet: HVI Color Space Based Unsupervised Low-Light Enhancement via Uncertainty Constraint and Attention Mechanism. 304-318

- Guang Huo, Yue Wang:
DPC-FCNet: A Dual-Channel Cross-Modality Person re-Identification Network with Enhanced Multi-Level Feature Correlation. 319-331

- Xiaoqiang Wang, Liurui Zhao, Yanjie Wang:
Surface Defect Detection of Photovoltaic Panels Based on Deep Learning and Electroluminescent Images. 332-345

- Thanh Tu Do, Van Hua, Uyen Dang, Thu Nguyen, Steven Hicks, Pål Halvorsen, Michael A. Riegler, Binh T. Nguyen:
Low-Dimension Representation Estimation in Principal Component Analysis Under Missing Data. 346-360

- Allie Tran, Luca Rossetto:
On the Brittleness of CLIP Text Encoders. 361-374

- Zhiting Chen, Jieyun Bai, Shunning Li, Xiaoshen Zhang, Hua Lu:
DPNet: A Dual-Perception Fusion Network for Automated Coronary Artery Segmentation. 375-388

- Hongzhi Yan, Jianmei Su:
SDB: Safety Constraint Mechanism for Dual-Branch End-to-End Autonomous Driving. 389-402

- Yiqian Li, Andy J. Ma:
CSQDA: A Parameter-Efficient and Memory-Efficient Tuning Method for Medical Image Classification. 403-416

- Zhiyang Mai, Yukun Qian, Haitao Wang, Hejun Wu, Liangliang Zhou:
LCKPose: Laplacian Candidate Keypoints Modeling for 6D Object Pose Estimation. 417-431

- Wenlong Niu, Zebao Zhang:
SCP: Sinkhorn-reconciled Collaborative Prompt Learning for Vision-Language Models. 432-446

- Xinying Zhou, Leixiao Li, Hao Lin:
DAGMP: A Multimodal Learning Approach Jointly Driven by Feature Fusion and Gradient Modulation. 447-460

- Son T. Huynh, Tran Minh Huan, Tran Nguyen Minh Quang, Pham Phi Nhung, Binh T. Nguyen:
HFS: Hierarchical Fine-Tuning for Span Detection and Aspect-Based Sentiment Analysis in the Vietnamese Language. 461-475

- Guohua Miao, Zhihua Xie, Haolin Chang, Chenyu Tu:
Spatial-Spectral Prior Guided Mamba Network for Hyperspectral Image Super-Resolution. 476-489

- Congrui Yu, Bo Fan, Na Lyu:
MotionSlim: A Lightweight T2M Generation Framework Based on LLM. 490-503

- Jiale Yang, Kai Zhao, Linlin Zhang, Qingguan Li:
Boosting the Transferability of Adversarial Examples via Frequency Domain Masking and Adaptive Step Size. 504-518

- Honghui Chen, Fan Zhou, Ruomei Wang, Baoquan Zhao:
V-HOI: Velocity-Aware Human-Object Interaction Generation. 519-532

- Zhicong Sun, Jacqueline Ty Lo, Jinxing Hu:
WildfireX-SLAM: A Large-Scale Low-Altitude RGB-D Dataset for Wildfire SLAM and Beyond. 533-546

- Yu Ao, Hongze Han, Yuqin Li, Yu Miao, Weili Shi:
LGF-Net: Integrating Local and Global Features in a Dual-Branch Architecture for Tooth Segmentation in CBCT Images. 547-560

- Feng Li, Ke Wu, Yongwei Li:
MCN-CL: Multimodal Cross-Attention Network and Contrastive Learning for Multimodal Emotion Recognition. 561-574

- Guodong Wei, Jiayu Yu, Yu Ao, Yuqin Li, Guan Yuan Feng, Weili Shi, Yu Miao, Zhengang Jiang:
CTDiff: A Lightweight Hybrid Diffusion Network for Low-Light Endoscopic Image Enhancement. 575-587

- Tianshi Xu, Zhengzheng Sun, Yizheng Hu, Junyuan Shang, Si Wu:
Hierarchical Cross-Modality Interaction for Unified Video-Text Retrieval Modeling. 588-601

- Penghao Ma, Guangcun Wei, Chuike Kong, Shuo Li, Jianfeng Fang:
SE-EEND: A Structurally Enhanced End-to-End Neural Diarization System. 602-615

- Wenzheng Liu, Ming Yuan, Yizhou Wang, Lianghao Shen, Xiaofeng Wang, Qianqian Xing, Ronghui Cao, Xiaoyong Tang, Tan Deng, Cheng Fu:
SPADE: Attention-Guided Split Diffusion for Precise Spatial Control in Interior Layout Image Generation. 616-630

- Feifei Xu, Wenjing Zhu, Dongyang Li, Puzhe Li:
Question-Aware Spatial-Temporal Reasoning in Patch for Audio-Visual Question Answering. 631-645

- Haoyang Wang, Liming Liu, Xinggong Zhang:
R2-Mesh: Reinforcement Learning Powered Mesh Reconstruction via Geometry and Appearance Refinement. 646-660

- Tuan L. Vo, Uyen Dang, Thu Nguyen, Pål Halvorsen, Michael A. Riegler, Binh T. Nguyen:
DPERC: Direct Parameter Estimation for Mixed Data with Random Missingness. 661-675
