


Yihan Wu 0008
Person information
- affiliation: Renmin University of China, Beijing, China
Other persons with the same name
- Yihan Wu (disambiguation page)
- Yihan Wu 0001: ShanghaiTech University, Shanghai, China
- Yihan Wu 0002: Shandong University, Weihai, Shandong, China
- Yihan Wu 0003: State Grid Hunan Electric Power Company Limited Research Institute, China
- Yihan Wu 0004: Beijing Institute of Technology, Beijing, China
- Yihan Wu 0005: Southwest University of Science and Technology, Mianyang, Sichuan, China
- Yihan Wu 0006: Johns Hopkins University, Baltimore, MD, USA
- Yihan Wu 0007: University of Maryland Foundation, College Park, MD, USA
- Yihan Wu 0009: Peking University, Beijing, China
2020 – today
- 2025
[c14]Yihan Wu, Yichen Lu, Yifan Peng, Xihua Wang, Ruihua Song, Shinji Watanabe:
Enhancing Audiovisual Speech Recognition Through Bifocal Preference Optimization. AAAI 2025: 25516-25524
[c13]Xihua Wang, Ruihua Song, Chongxuan Li, Xin Cheng, Boyuan Li, Yihan Wu, Yuyue Wang, Hongteng Xu, Yunfeng Wang:
Animate and Sound an Image. CVPR 2025: 23369-23378
[c12]Xin Cheng, Xihua Wang, Yihan Wu, Yuyue Wang, Ruihua Song:
LoVA: Long-form Video-to-Audio Generation. ICASSP 2025: 1-5
[c11]Yihan Wu, Yichen Lu, Yijing Chen, Jiaqi Song, William Chen, Ruihua Song, Shinji Watanabe:
GALAXY: A Large-Scale Open-Domain Dataset for Multimodal Learning. INTERSPEECH 2025
[c10]Yuyue Wang, Xin Cheng, Yihan Wu, Xihua Wang, Jinchuan Tian, Ruihua Song:
A Visual Speech Language Model for Visual Text-to-Speech Task. MMAsia 2025: 66:1-66:8
[i15]Xin Cheng, Yuyue Wang, Xihua Wang, Yihan Wu, Kaisi Guan, Yijing Chen, Peng Zhang, Xiaojiang Liu, Meng Cao, Ruihua Song:
VSSFlow: Unifying Video-conditioned Sound and Speech Generation via Joint Learning. CoRR abs/2509.24773 (2025)
[i14]Yuyue Wang, Xin Cheng, Yihan Wu, Xihua Wang, Jinchuan Tian, Ruihua Song:
VSpeechLM: A Visual Speech Language Model for Visual Text-to-Speech Task. CoRR abs/2511.22229 (2025)
[i13]Yijing Chen, Yihan Wu, Kaisi Guan, Yuchen Ren, Yuyue Wang, Ruihua Song, Liyun Ru:
ChronusOmni: Improving Time Awareness of Omni Large Language Models. CoRR abs/2512.09841 (2025)
- 2024
[c9]Xihua Wang, Yuyue Wang, Yihan Wu, Ruihua Song, Xu Tan, Zehua Chen, Hongteng Xu, Guodong Sui:
TiVA: Time-Aligned Video-to-Audio Generation. ACM Multimedia 2024: 573-582
[c8]Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe:
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. SLT 2024: 43-48
[c7]Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-Weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H. Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe:
ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs For Audio, Music, and Speech. SLT 2024: 562-569
[c6]Yihan Wu, Ruihua Song, Xu Chen, Hao Jiang, Zhao Cao, Jin Yu:
Understanding Human Preferences: Towards More Personalized Video to Text Generation. WWW 2024: 3952-3963
[i12]Yihan Wu, Soumi Maiti, Yifan Peng, Wangyou Zhang, Chenda Li, Yuyue Wang, Xihua Wang, Shinji Watanabe, Ruihua Song:
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition. CoRR abs/2401.18045 (2024)
[i11]Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, Jiaxin Mao, Yankai Lin, Ruihua Song, Jun Xu, Xu Chen, Rui Yan, Zhewei Wei, Di Hu, Wenbing Huang, Ze-Feng Gao, Yueguo Chen, Weizheng Lu, Ji-Rong Wen:
YuLan: An Open-source Large Language Model. CoRR abs/2406.19853 (2024)
[i10]Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe:
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. CoRR abs/2409.12370 (2024)
[i9]Xin Cheng, Xihua Wang, Yihan Wu, Yuyue Wang, Ruihua Song:
LoVA: Long-form Video-to-Audio Generation. CoRR abs/2409.15157 (2024)
[i8]Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H. Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe:
ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech. CoRR abs/2409.15897 (2024)
[i7]Yihan Wu, Yichen Lu, Yifan Peng, Xihua Wang, Ruihua Song, Shinji Watanabe:
Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization. CoRR abs/2412.19005 (2024)
- 2023
[c5]Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian:
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing. AAAI 2023: 13772-13779
[c4]Zhifang Guo, Yichong Leng, Yihan Wu, Sheng Zhao, Xu Tan:
Prompttts: Controllable Text-To-Speech With Text Descriptions. ICASSP 2023: 1-5
[c3]Yuyue Wang, Huan Xiao, Yihan Wu, Ruihua Song:
ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios. INTERSPEECH 2023: 4828-4832
[i6]Yuyue Wang, Huan Xiao, Yihan Wu, Ruihua Song:
ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios. CoRR abs/2305.12200 (2023)
- 2022
[c2]Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu:
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios. INTERSPEECH 2022: 2568-2572
[c1]Yihan Wu, Xi Wang, Shaofei Zhang, Lei He, Ruihua Song, Jian-Yun Nie:
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis. INTERSPEECH 2022: 5503-5507
[i5]Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu:
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios. CoRR abs/2204.00436 (2022)
[i4]Yihan Wu, Xi Wang, Shaofei Zhang, Lei He, Ruihua Song, Jian-Yun Nie:
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis. CoRR abs/2206.12559 (2022)
[i3]Zhifang Guo, Yichong Leng, Yihan Wu, Sheng Zhao, Xu Tan:
PromptTTS: Controllable Text-to-Speech with Text Descriptions. CoRR abs/2211.12171 (2022)
[i2]Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian:
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing. CoRR abs/2211.16934 (2022)
[i1]Zehua Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, Xu Tan, Yang Cui, Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo P. Mandic:
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech. CoRR abs/2212.14518 (2022)
last updated on 2026-02-07 00:05 CET by the dblp team
all metadata released as open data under CC0 1.0 license







