


Zhenheng Yang
2020 – today
2026
[i39]Matthew Gwilliam, Xiao Wang, Xuefeng Hu, Zhenheng Yang:
Implicit Neural Representation Facilitates Unified Universal Vision Encoding. CoRR abs/2601.14256 (2026)
2025
[c21]Yuqing Wang, Shuhuai Ren, Zhijie Lin, Yujin Han, Haoyuan Guo, Zhenheng Yang, Difan Zou, Jiashi Feng, Xihui Liu:
Parallelized Autoregressive Visual Generation. CVPR 2025: 12955-12965
[c20]Tiehan Fan, Kepan Nan, Rui Xie, Penghao Zhou, Zhenheng Yang, Chaoyou Fu, Xiang Li, Jian Yang, Ying Tai:
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption. CVPR 2025: 28974-28983
[c19]Xiaotian Han, Yiren Jian, Xuefeng Hu, Haogeng Liu, Yiqi Wang, Qihang Fan, Yuang Ai, Huaibo Huang, Ran He, Zhenheng Yang, Quanzeng You:
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning. EMNLP (Findings) 2025: 14221-14231
[c18]Kepan Nan, Rui Xie, Penghao Zhou, Tiehan Fan, Zhenheng Yang, Zhijie Chen, Xiang Li, Jian Yang, Ying Tai:
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation. ICLR 2025
[c17]Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou:
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation. ICLR 2025
[c16]Xin Dong, Sen Jia, Ming Rui Wang, Yan Li, Zhenheng Yang, Bingfeng Deng, Hongyu Xiong:
COEF-VQ: Cost-Efficient Video Quality Understanding through a Cascaded Multimodal LLM Framework. KDD (2) 2025: 4387-4395
[i38]Rui Xie, Yinhong Liu, Penghao Zhou, Chen Zhao, Jun Zhou, Kai Zhang, Zhenyu Zhang, Jian Yang, Zhenheng Yang, Ying Tai:
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution. CoRR abs/2501.02976 (2025)
[i37]Weijia Mao, Zhenheng Yang, Mike Zheng Shou:
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths. CoRR abs/2502.06474 (2025)
[i36]Yuwei Guo, Ceyuan Yang, Ziyan Yang, Zhibei Ma, Zhijie Lin, Zhenheng Yang, Dahua Lin, Lu Jiang:
Long Context Tuning for Video Generation. CoRR abs/2503.10589 (2025)
[i35]Mengyao Lyu, Yan Li, Huasong Zhong, Wenhao Yang, Hui Chen, Jungong Han, Guiguang Ding, Zhenheng Yang:
Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning. CoRR abs/2503.13383 (2025)
[i34]Yuang Ai, Qihang Fan, Xuefeng Hu, Zhenheng Yang, Ran He, Huaibo Huang:
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling. CoRR abs/2505.11196 (2025)
[i33]Weijia Mao, Zhenheng Yang, Mike Zheng Shou:
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning. CoRR abs/2505.23380 (2025)
[i32]Yipeng Du, Tiehan Fan, Kepan Nan, Rui Xie, Penghao Zhou, Xiang Li, Jian Yang, Zhenheng Yang, Ying Tai:
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs. CoRR abs/2506.01674 (2025)
[i31]Jinheng Xie, Zhenheng Yang, Mike Zheng Shou:
Show-o2: Improved Native Unified Multimodal Models. CoRR abs/2506.15564 (2025)
[i30]Yanzhe Chen, Huasong Zhong, Yan Li, Zhenheng Yang:
UniCode2: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation. CoRR abs/2506.20214 (2025)
[i29]Qipeng Zhu, Yanzhe Chen, Huasong Zhong, Yan Li, Jie Chen, Zhixin Zhang, Junping Zhang, Zhenheng Yang:
UniAPO: Unified Multimodal Automated Prompt Optimization. CoRR abs/2508.17890 (2025)
[i28]Shengqu Cai, Ceyuan Yang, Lvmin Zhang, Yuwei Guo, Junfei Xiao, Ziyan Yang, Yinghao Xu, Zhenheng Yang, Alan L. Yuille, Leonidas J. Guibas, Maneesh Agrawala, Lu Jiang, Gordon Wetzstein:
Mixture of Contexts for Long Video Generation. CoRR abs/2508.21058 (2025)
[i27]Hanyu Wang, Jiaming Han, Ziyan Yang, Qi Zhao, Shanchuan Lin, Xiangyu Yue, Abhinav Shrivastava, Zhenheng Yang, Hao Chen:
Growing Visual Generative Capacity for Pre-Trained MLLMs. CoRR abs/2510.01546 (2025)
[i26]Ruibo Chen, Jiacheng Pan, Heng Huang, Zhenheng Yang:
Improving Text-to-Image Generation with Input-Side Inference-Time Scaling. CoRR abs/2510.12041 (2025)
[i25]Zirui Zhu, Hailun Xu, Yang Luo, Yong Liu, Kanchan Sarkar, Zhenheng Yang, Yang You:
FOCUS: Efficient Keyframe Selection for Long Video Understanding. CoRR abs/2510.27280 (2025)
[i24]Weijia Mao, Hao Chen, Zhenheng Yang, Mike Zheng Shou:
The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation. CoRR abs/2511.20256 (2025)
[i23]Wei Chee Yew, Hailun Xu, Sanjay Saha, Xiaotian Fan, Hiok Hian Ong, David Yuchen Wang, Kanchan Sarkar, Zhenheng Yang, Danhui Guan:
Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching. CoRR abs/2512.03553 (2025)
[i22]Arman Zarei, Jiacheng Pan, Matthew Gwilliam, Soheil Feizi, Zhenheng Yang:
AgentComp: From Agentic Reasoning to Compositional Mastery in Text-to-Image Models. CoRR abs/2512.09081 (2025)
[i21]Yuwei Guo, Ceyuan Yang, Hao He, Yang Zhao, Meng Wei, Zhenheng Yang, Weilin Huang, Dahua Lin:
End-to-End Training for Autoregressive Video Diffusion via Self-Resampling. CoRR abs/2512.15702 (2025)
2024
[i20]Kepan Nan, Rui Xie, Penghao Zhou, Tiehan Fan, Zhenheng Yang, Zhijie Chen, Xiang Li, Jian Yang, Ying Tai:
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation. CoRR abs/2407.02371 (2024)
[i19]Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou:
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation. CoRR abs/2408.12528 (2024)
[i18]Xiaotian Han, Yiren Jian, Xuefeng Hu, Haogeng Liu, Yiqi Wang, Qihang Fan, Yuang Ai, Huaibo Huang, Ran He, Zhenheng Yang, Quanzeng You:
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning. CoRR abs/2409.12568 (2024)
[i17]Tiehan Fan, Kepan Nan, Rui Xie, Penghao Zhou, Zhenheng Yang, Chaoyou Fu, Xiang Li, Jian Yang, Ying Tai:
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption. CoRR abs/2412.09283 (2024)
[i16]Yuqing Wang, Shuhuai Ren, Zhijie Lin, Yujin Han, Haoyuan Guo, Zhenheng Yang, Difan Zou, Jiashi Feng, Xihui Liu:
Parallelized Autoregressive Visual Generation. CoRR abs/2412.15119 (2024)
2021
[c15]Qing Liu, Vignesh Ramanathan, Dhruv Mahajan, Alan L. Yuille, Zhenheng Yang:
Weakly Supervised Instance Segmentation for Videos With Temporal Mask Consistency. CVPR 2021: 13968-13978
[i15]Qing Liu, Vignesh Ramanathan, Dhruv Mahajan, Alan L. Yuille, Zhenheng Yang:
Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency. CoRR abs/2103.12886 (2021)
2020
[j2]Cheng Fu, Huili Chen, Zhenheng Yang, Farinaz Koushanfar, Yuandong Tian, Jishen Zhao:
Enhancing Model Parallelism in Neural Architecture Search for Multidevice System. IEEE Micro 40(5): 46-55 (2020)
[j1]Chenxu Luo, Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia, Alan L. Yuille:
Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding. IEEE Trans. Pattern Anal. Mach. Intell. 42(10): 2624-2641 (2020)
[c14]Xuefeng Hu, Zhihan Zhang, Zhenye Jiang, Syomantak Chaudhuri, Zhenheng Yang, Ram Nevatia:
SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization. ECCV (21) 2020: 312-328
[i14]Xuefeng Hu, Zhihan Zhang, Zhenye Jiang, Syomantak Chaudhuri, Zhenheng Yang, Ram Nevatia:
SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization. CoRR abs/2009.00726 (2020)
2010 – 2019
2019
[c13]Zhenheng Yang, Dhruv Mahajan, Deepti Ghadiyaram, Ram Nevatia, Vignesh Ramanathan:
Activity Driven Weakly Supervised Object Detection. CVPR 2019: 2917-2926
[c12]Yang Wang, Peng Wang, Zhenheng Yang, Chenxu Luo, Yi Yang, Wei Xu:
UnOS: Unified Unsupervised Optical-Flow and Stereo-Depth Estimation by Watching Videos. CVPR 2019: 8071-8081
[i13]Zhenheng Yang, Dhruv Mahajan, Deepti Ghadiyaram, Ram Nevatia, Vignesh Ramanathan:
Activity Driven Weakly Supervised Object Detection. CoRR abs/1904.01665 (2019)
2018
[c11]Zhenheng Yang, Peng Wang, Wei Xu, Liang Zhao, Ramakant Nevatia:
Unsupervised Learning of Geometry From Videos With Edge-Aware Depth-Normal Consistency. AAAI 2018: 7493-7500
[c10]Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia:
LEGO: Learning Edge With Geometry All at Once by Watching Videos. CVPR 2018: 225-234
[c9]Yang Wang, Yi Yang, Zhenheng Yang, Liang Zhao, Peng Wang, Wei Xu:
Occlusion Aware Unsupervised Learning of Optical Flow. CVPR 2018: 4884-4893
[c8]Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia:
Every Pixel Counts: Unsupervised Geometry Learning with Holistic 3D Motion Understanding. ECCV Workshops (5) 2018: 691-709
[c7]KangGeon Kim, Zhenheng Yang, Iacopo Masi, Ramakant Nevatia, Gérard G. Medioni:
Face and Body Association for Video-Based Face Recognition. WACV 2018: 39-48
[i12]Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia:
LEGO: Learning Edge with Geometry all at Once by Watching Videos. CoRR abs/1803.05648 (2018)
[i11]Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia:
Every Pixel Counts: Unsupervised Geometry Learning with Holistic 3D Motion Understanding. CoRR abs/1806.10556 (2018)
[i10]Yang Wang, Zhenheng Yang, Peng Wang, Yi Yang, Chenxu Luo, Wei Xu:
Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos. CoRR abs/1810.03654 (2018)
[i9]Chenxu Luo, Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia, Alan L. Yuille:
Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding. CoRR abs/1810.06125 (2018)
2017
[c6]Jiyang Gao, Zhenheng Yang, Ram Nevatia:
Cascaded Boundary Regression for Temporal Action Detection. BMVC 2017
[c5]Jiyang Gao, Zhenheng Yang, Ram Nevatia:
RED: Reinforced Encoder-Decoder Networks for Action Anticipation. BMVC 2017
[c4]Zhenheng Yang, Jiyang Gao, Ram Nevatia:
Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation. BMVC 2017
[c3]Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia:
TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals. ICCV 2017: 3648-3656
[c2]Jiyang Gao, Chen Sun, Zhenheng Yang, Ram Nevatia:
TALL: Temporal Activity Localization via Language Query. ICCV 2017: 5277-5285
[i8]Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia:
TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals. CoRR abs/1703.06189 (2017)
[i7]Jiyang Gao, Zhenheng Yang, Ram Nevatia:
Cascaded Boundary Regression for Temporal Action Detection. CoRR abs/1705.01180 (2017)
[i6]Jiyang Gao, Chen Sun, Zhenheng Yang, Ram Nevatia:
TALL: Temporal Activity Localization via Language Query. CoRR abs/1705.02101 (2017)
[i5]Jiyang Gao, Zhenheng Yang, Ram Nevatia:
RED: Reinforced Encoder-Decoder Networks for Action Anticipation. CoRR abs/1707.04818 (2017)
[i4]Zhenheng Yang, Jiyang Gao, Ram Nevatia:
Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation. CoRR abs/1708.00042 (2017)
[i3]Zhenheng Yang, Peng Wang, Wei Xu, Liang Zhao, Ramakant Nevatia:
Unsupervised Learning of Geometry with Edge-aware Depth-Normal Consistency. CoRR abs/1711.03665 (2017)
[i2]Yang Wang, Yi Yang, Zhenheng Yang, Liang Zhao, Wei Xu:
Occlusion Aware Unsupervised Learning of Optical Flow. CoRR abs/1711.05890 (2017)
2016
[c1]Zhenheng Yang, Ramakant Nevatia:
A multi-scale cascade fully convolutional network face detector. ICPR 2016: 633-638
[i1]Zhenheng Yang, Ram Nevatia:
A Multi-Scale Cascade Fully Convolutional Network Face Detector. CoRR abs/1609.03536 (2016)

last updated on 2026-02-26 23:24 CET by the dblp team
all metadata released as open data under CC0 1.0 license