17th ECCV 2022: Tel Aviv, Israel - Volume 36

Shai Avidan, Gabriel J. Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner:
Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXVI. Lecture Notes in Computer Science 13696, Springer 2022, ISBN 978-3-031-20058-8

- Benedikt Boecking, Naoto Usuyama, Shruthi Bannur, Daniel C. Castro, Anton Schwaighofer, Stephanie L. Hyland, Maria Wetscherek, Tristan Naumann, Aditya V. Nori, Javier Alvarez-Valle, Hoifung Poon, Ozan Oktay: Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing. 1-21
- Shipeng Yan, Lanqing Hong, Hang Xu, Jianhua Han, Tinne Tuytelaars, Zhenguo Li, Xuming He: Generative Negative Text Replay for Continual Vision-Language Pretraining. 22-38
- Junbin Xiao, Pan Zhou, Tat-Seng Chua, Shuicheng Yan: Video Graph Transformer for Video Question Answering. 39-58
- Kun Yan, Lei Ji, Chenfei Wu, Jianmin Bao, Ming Zhou, Nan Duan, Shuai Ma: Trace Controlled Text to Image Generation. 59-75
- A. J. Piergiovanni, Kairo Morton, Weicheng Kuo, Michael S. Ryoo, Anelia Angelova: Video Question Answering with Iterative Video-Text Co-tokenization. 76-94
- Long Chen, Yuhang Zheng, Jun Xiao: Rethinking Data Augmentation for Robust Visual Question Answering. 95-112
- Zhen Wang, Long Chen, Wenbo Ma, Guangxing Han, Yulei Niu, Jian Shao, Jun Xiao: Explicit Image Caption Editing. 113-129
- Jiachang Hao, Haifeng Sun, Pengfei Ren, Jingyu Wang, Qi Qi, Jianxin Liao: Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding. 130-147
- Spencer Whitehead, Suzanne Petryk, Vedaad Shakib, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach: Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly. 148-166
- Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani: GRIT: Faster and Better Image Captioning Transformer Using Dual Visual Features. 167-184
- Sunjae Yoon, Ji Woo Hong, Eunseop Yoon, Dahyun Kim, Junyeong Kim, Hee Suk Yoon, Chang D. Yoo: Selective Query-Guided Debiasing for Video Corpus Moment Retrieval. 185-200
- Cheng Shi, Sibei Yang: Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding. 201-218
- Zihang Meng, David Yang, Xuefei Cao, Ashish Shah, Ser-Nam Lim: Object-Centric Unsupervised Image Captioning. 219-235
- Quan Cui, Boyan Zhou, Yu Guo, Weidong Yin, Hao Wu, Osamu Yoshie, Yubo Chen: Contrastive Vision-Language Pre-training with Limited Resources. 236-253
- Sheng Fang, Shuhui Wang, Junbao Zhuo, Xinzhe Han, Qingming Huang: Learning Linguistic Association Towards Efficient Text-Video Retrieval. 254-270
- Zanming Huang, Zhongkai Shangguan, Jimuyang Zhang, Gilad Bar, Matthew Boyd, Eshed Ohn-Bar: ASSISTER: Assistive Navigation via Conditional Instruction Generation. 271-289
- Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto: X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks. 290-308
- Wenhao Cheng, Xingping Dong, Salman H. Khan, Jianbing Shen: Learning Disentanglement with Decoupled Labels for Vision-Language Navigation. 309-329
- Qingpei Guo, Kaisheng Yao, Wei Chu: Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input. 330-346
- Bowen Li: Word-Level Fine-Grained Story Visualization. 347-362
- Qi Zhang, Yuqing Song, Qin Jin: Unifying Event Detection and Captioning as Sequence Generation via Pre-training. 363-379
- Chuang Lin, Yi Jiang, Jianfei Cai, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan: Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation. 380-397
- Christopher Thomas, Yipeng Zhang, Shih-Fu Chang: Fine-Grained Visual Entailment. 398-416
- Ayush Jain, Nikolaos Gkanatsios, Ishita Mediratta, Katerina Fragkiadaki: Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds. 417-433
- Yifeng Zhang, Ming Jiang, Qi Zhao: New Datasets and Models for Contextual Reasoning in Visual Dialog. 434-451
- Joanna Hong, Minsu Kim, Yong Man Ro: VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection. 452-468
- Matan Levy, Rami Ben-Ari, Dani Lischinski: Classification-Regression for Chart Comprehension. 469-484
- Benita Wong, Joya Chen, You Wu, Stan Weixian Lei, Dongxing Mao, Difei Gao, Mike Zheng Shou: AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant. 485-501
- Weicheng Kuo, Fred Bertsch, Wei Li, A. J. Piergiovanni, Mohammad Saffar, Anelia Angelova: FindIt: Generalized Localization with Natural Language Queries. 502-520
- Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang: UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling. 521-539
- Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin: Scaling Open-Vocabulary Image Segmentation with Image-Level Labels. 540-557
- Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, Yejin Choi: The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning. 558-575
- Minsu Kim, Hyunjun Kim, Yong Man Ro: Speaker-Adaptive Lip Reading with User-Dependent Padding. 576-593
- Tan M. Dinh, Rang Nguyen, Binh-Son Hua: TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation. 594-609
- Morgan Heisler, Amin Banitalebi-Dehkordi, Yong Zhang: SemAug: Semantically Meaningful Image Augmentations for Object Detection Through Language Grounding. 610-626
- Myungsub Choi: Referring Object Manipulation of Natural Images with Conditional Classifier-Free Guidance. 627-643
- Reuben Tan, Bryan A. Plummer, Kate Saenko, J. P. Lewis, Avneesh Sud, Thomas Leung: NewsStories: Illustrating Articles with Visual Summaries. 644-661
- Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi: Webly Supervised Concept Expansion for General Purpose Vision Models. 662-681
- Kaiwen Zhou, Xin Eric Wang: FedVLN: Privacy-Preserving Federated Vision-and-Language Navigation. 682-699
- Haoran Wang, Dongliang He, Wenhao Wu, Boyang Xia, Min Yang, Fu Li, Yunlong Yu, Zhong Ji, Errui Ding, Jingdong Wang: CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval. 700-716
- Tsu-Jui Fu, Xin Eric Wang, William Yang Wang: Language-Driven Artistic Style Transfer. 717-734
- Zaid Khan, B. G. Vijay Kumar, Xiang Yu, Samuel Schulter, Manmohan Chandraker, Yun Fu: Single-Stream Multi-level Alignment for Vision-Language Pretraining. 735-751