


Shan Yang 0001
Person information
- affiliation: Tencent AI Lab, Beijing, China
- affiliation (PhD): Northwestern Polytechnical University, School of Computer Science, Xi'an, China
Other persons with the same name
- Shan Yang — disambiguation page
- Shan Yang 0002 — Zhejiang University of Finance and Economics, School of Information Technology and Artificial Intelligence, Hangzhou, China
2020 – today
- 2025
[c33]Yuanyuan Wang, Hangting Chen, Dongchao Yang, Weiqin Li, Dan Luo, Guangzhi Li, Shan Yang, Zhiyong Wu, Helen Meng, Xixin Wu:
UniSep: Universal Target Audio Separation with Language Models at Scale. ICME 2025: 1-6
[c32]Yong Ren, Chenxing Li, Le Xu, Hao Gu, Duzhen Zhang, Yujie Chen, Manjie Xu, Ruibo Fu, Shan Yang, Dong Yu:
Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model. INTERSPEECH 2025
[c31]Le Xu, Chenxing Li, Yong Ren, Yujie Chen, Yu Gu, Ruibo Fu, Shan Yang, Dong Yu:
Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning. INTERSPEECH 2025
[c30]Guanjie Huang, Danny H. K. Tsang, Shan Yang, Guangzhi Lei, Li Liu:
Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech Recognition. ACM Multimedia 2025: 8313-8321
[c29]Yan Rong, Jinting Wang, Guangzhi Lei, Shan Yang, Li Liu:
AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation. ACM Multimedia 2025: 8872-8881
[i29]Hanzhao Li, Yuke Li, Xinsheng Wang, Jingbin Hu, Qicong Xie, Shan Yang, Lei Xie:
FleSpeech: Flexibly Controllable Speech Generation with Various Prompts. CoRR abs/2501.04644 (2025)
[i28]Yuanyuan Wang, Hangting Chen, Dongchao Yang, Weiqin Li, Dan Luo, Guangzhi Li, Shan Yang, Zhiyong Wu, Helen Meng, Xixin Wu:
UniSep: Universal Target Audio Separation with Language Models at Scale. CoRR abs/2503.23762 (2025)
[i27]Yong Ren, Chenxing Li, Le Xu, Hao Gu, Duzhen Zhang, Yujie Chen, Manjie Xu, Ruibo Fu, Shan Yang, Dong Yu:
Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model. CoRR abs/2505.13062 (2025)
[i26]Le Xu, Chenxing Li, Yong Ren, Yujie Chen, Yu Gu, Ruibo Fu, Shan Yang, Dong Yu:
Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning. CoRR abs/2505.22045 (2025)
[i25]Yan Rong, Jinting Wang, Shan Yang, Guangzhi Lei, Li Liu:
AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation. CoRR abs/2505.22053 (2025)
[i24]Jinting Wang, Shan Yang, Li Liu:
UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation. CoRR abs/2506.04134 (2025)
[i23]Guanjie Huang, Danny H. K. Tsang, Shan Yang, Guangzhi Lei, Li Liu:
Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech Recognition. CoRR abs/2508.00391 (2025)
[i22]Tianxin Xie, Shan Yang, Chenxing Li, Dong Yu, Li Liu:
EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering. CoRR abs/2508.03543 (2025)
[i21]Tianxin Xie, Wentao Lei, Guanjie Huang, Pengfei Zhang, Kai Jiang, Chunhui Zhang, Fengji Ma, Haoyu He, Han Zhang, Jiangshan He, Jinting Wang, Linghan Fang, Lufei Gao, Orkesh Ablet, Peihua Zhang, Ruolin Hu, Shengyu Li, Weilin Lin, Xiaoyang Feng, Xinyue Yang, Yan Rong, Yanyun Wang, Zihang Shao, Zelin Zhao, Chenxing Li, Shan Yang, Wenfu Wang, Meng Yu, Dong Yu, Li Liu:
PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation. CoRR abs/2512.23994 (2025)
- 2023
[c28]Yi Lei, Shan Yang, Xinsheng Wang, Qicong Xie, Jixun Yao, Lei Xie, Dan Su:
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis. AAAI 2023: 13025-13033
[c27]Wei Xiao, Wenzhe Liu, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su, Shidong Shang, Dong Yu:
Multi-mode Neural Speech Coding Based on Deep Generative Networks. INTERSPEECH 2023: 819-823
[i20]Wenzhe Liu, Wei Xiao, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su, Shidong Shang, Dong Yu:
A High Fidelity and Low Complexity Neural Audio Coding. CoRR abs/2310.10992 (2023)
- 2022
[j7]Yi Lei, Shan Yang, Xinfa Zhu, Lei Xie, Dan Su:
Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis. IEEE Signal Process. Lett. 29: 1948-1952 (2022)
[j6]Yi Lei, Shan Yang, Xinsheng Wang, Lei Xie:
MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 30: 853-864 (2022)
[c26]Songxiang Liu, Shan Yang, Dan Su, Dong Yu:
Referee: Towards Reference-Free Cross-Speaker Style Transfer with Low-Quality Data for Expressive Speech Synthesis. ICASSP 2022: 6307-6311
[c25]Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng:
VCVTS: Multi-Speaker Video-to-Speech Synthesis Via Cross-Modal Knowledge Transfer from Voice Conversion. ICASSP 2022: 7252-7256
[c24]Liumeng Xue, Shan Yang, Na Hu, Dan Su, Lei Xie:
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers. INTERSPEECH 2022: 2548-2552
[c23]Yi Lei, Shan Yang, Jian Cong, Lei Xie, Dan Su:
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion. INTERSPEECH 2022: 2563-2567
[c22]Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su:
End-to-End Voice Conversion with Information Perturbation. ISCSLP 2022: 91-95
[i19]Yi Lei, Shan Yang, Xinsheng Wang, Lei Xie:
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis. CoRR abs/2201.06460 (2022)
[i18]Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng:
VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion. CoRR abs/2202.09081 (2022)
[i17]Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su:
End-to-End Voice Conversion with Information Perturbation. CoRR abs/2206.07569 (2022)
[i16]Liumeng Xue, Shan Yang, Na Hu, Dan Su, Lei Xie:
Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers. CoRR abs/2207.00756 (2022)
[i15]Yi Lei, Shan Yang, Jian Cong, Lei Xie, Dan Su:
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion. CoRR abs/2207.01832 (2022)
[i14]Yi Lei, Shan Yang, Xinsheng Wang, Qicong Xie, Jixun Yao, Lei Xie, Dan Su:
UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis. CoRR abs/2212.01546 (2022)
- 2021
[j5]Xiaochun An, Frank K. Soong, Shan Yang, Lei Xie:
Effective and direct control of neural TTS prosody by removing interactions between different attributes. Neural Networks 143: 250-260 (2021)
[c21]Yi Chen, Shan Yang, Na Hu, Lei Xie, Dan Su:
TeNC: Low Bit-Rate Speech Coding with VQ-VAE and GAN. ICMI Companion 2021: 126-130
[c20]Jian Cong, Shan Yang, Lei Xie, Dan Su:
Glow-WaveGAN: Learning Speech Representations from GAN-Based Variational Auto-Encoder for High Fidelity Flow-Based Speech Synthesis. Interspeech 2021: 2182-2186
[c19]Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su:
Controllable Context-Aware Conversational Speech Synthesis. Interspeech 2021: 4658-4662
[c18]Tao Li, Shan Yang, Liumeng Xue, Lei Xie:
Controllable Emotion Transfer For End-to-End Speech Synthesis. ISCSLP 2021: 1-5
[c17]Zhichao Wang, Wenshuo Ge, Xiong Wang, Shan Yang, Wendong Gan, Haitao Chen, Hai Li, Lei Xie, Xiulin Li:
Accent and Speaker Disentanglement in Many-to-many Voice Conversion. ISCSLP 2021: 1-5
[c16]Yi Lei, Shan Yang, Lei Xie:
Fine-Grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis. SLT 2021: 423-430
[c15]Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie:
Multi-Band Melgan: Faster Waveform Generation For High-Quality Text-To-Speech. SLT 2021: 492-498
[c14]Heyang Xue, Shan Yang, Yi Lei, Lei Xie, Xiulin Li:
Learn2Sing: Target Speaker Singing Voice Synthesis by Learning from a Singing Teacher. SLT 2021: 522-529
[i13]Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su:
Controllable Context-aware Conversational Speech Synthesis. CoRR abs/2106.10828 (2021)
[i12]Jian Cong, Shan Yang, Lei Xie, Dan Su:
Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis. CoRR abs/2106.10831 (2021)
[i11]Songxiang Liu, Shan Yang, Dan Su, Dong Yu:
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis. CoRR abs/2109.03439 (2021)
- 2020
[j4]Shan Yang, Heng Lu, Shiyin Kang, Liumeng Xue, Jinba Xiao, Dan Su, Lei Xie, Dong Yu:
On the localness modeling for the self-attention based end-to-end speech synthesis. Neural Networks 125: 121-130 (2020)
[j3]Shan Yang, Yuxuan Wang, Lei Xie:
Adversarial Feature Learning and Unsupervised Clustering Based Speech Synthesis for Found Data With Acoustic and Textual Noise. IEEE Signal Process. Lett. 27: 1730-1734 (2020)
[c13]Xiaohai Tian, Zhichao Wang, Shan Yang, Xinyong Zhou, Hongqiang Du, Yi Zhou, Mingyang Zhang, Kun Zhou, Berrak Sisman, Lei Xie, Haizhou Li:
The NUS & NWPU system for Voice Conversion Challenge 2020. Blizzard Challenge / Voice Conversion Challenge 2020
[c12]Jian Cong, Shan Yang, Lei Xie, Guoqiao Yu, Guanglu Wan:
Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training. INTERSPEECH 2020: 811-815
[c11]Fengyu Yang, Shan Yang, Qinghua Wu, Yujun Wang, Lei Xie:
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis. INTERSPEECH 2020: 3436-3440
[i10]Shan Yang, Yuxuan Wang, Lei Xie:
Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise. CoRR abs/2004.13595 (2020)
[i9]Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie:
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech. CoRR abs/2005.05106 (2020)
[i8]Fengyu Yang, Shan Yang, Qinghua Wu, Yujun Wang, Lei Xie:
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis. CoRR abs/2008.00613 (2020)
[i7]Jian Cong, Shan Yang, Lei Xie, Guoqiao Yu, Guanglu Wan:
Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training. CoRR abs/2008.04265 (2020)
[i6]Heyang Xue, Shan Yang, Yi Lei, Lei Xie, Xiulin Li:
Learn2Sing: Target Speaker Singing Voice Synthesis by learning from a Singing Teacher. CoRR abs/2011.08467 (2020)
[i5]Yi Lei, Shan Yang, Lei Xie:
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis. CoRR abs/2011.08477 (2020)
[i4]Zhichao Wang, Wenshuo Ge, Xiong Wang, Shan Yang, Wendong Gan, Haitao Chen, Hai Li, Lei Xie, Xiulin Li:
Accent and Speaker Disentanglement in Many-to-many Voice Conversion. CoRR abs/2011.08609 (2020)
[i3]Tao Li, Shan Yang, Liumeng Xue, Lei Xie:
Controllable Emotion Transfer For End-to-End Speech Synthesis. CoRR abs/2011.08679 (2020)
[i2]Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu:
Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training. CoRR abs/2012.01837 (2020)
2010 – 2019
- 2019
[j2]Xiaolian Zhu, Yuchao Zhang, Shan Yang, Liumeng Xue, Lei Xie:
Pre-Alignment Guided Attention for Improving Training Efficiency and Model Stability in End-to-End Speech Synthesis. IEEE Access 7: 65955-65964 (2019)
[c10]Xiaochun An, Yuxuan Wang, Shan Yang, Zejun Ma, Lei Xie:
Learning Hierarchical Representations for Expressive Speaking Style in End-to-End Speech Synthesis. ASRU 2019: 184-191
[c9]Xiaolian Zhu, Shan Yang, Geng Yang, Lei Xie:
Controlling Emotion Strength with Relative Attribute for End-to-End Speech Synthesis. ASRU 2019: 192-199
[c8]Fengyu Yang, Shan Yang, Pengcheng Zhu, Pengju Yan, Lei Xie:
Improving Mandarin End-to-End Speech Synthesis by Self-Attention and Learnable Gaussian Bias. ASRU 2019: 208-213
[c7]Shan Yang, Wenshuo Ge, Fengyu Yang, Xinyong Zhou, Fanbo Meng, Kai Liu, Lei Xie:
SZ-NPU Team's Entry to Blizzard Challenge 2019. Blizzard Challenge 2019
[c6]Shan Yang, Heng Lu, Shiyin Kang, Lei Xie, Dong Yu:
Enhancing Hybrid Self-attention Structure with Relative-position-aware Bias for Speech Synthesis. ICASSP 2019: 6910-6914
- 2018
[c5]Jinba Xiao, Shan Yang, Mingyang Zhang, Berrak Sisman, Dongyan Huang, Lei Xie, Minghui Dong, Haizhou Li:
The I2R-NWPU-NUS Text-to-Speech System for Blizzard Challenge 2018. Blizzard Challenge 2018
- 2017
[c4]Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dongyan Huang, Haizhou Li:
Statistical parametric speech synthesis using generative adversarial networks under a multi-task learning framework. ASRU 2017: 685-691
[c3]Yanfeng Lu, Zhengchen Zhang, Chenyu Yang, Huaiping Ming, Xiaolian Zhu, Yuchao Zhang, Shan Yang, Dongyan Huang, Lei Xie, Minghui Dong:
The I2R-NWPU Text-to-Speech System for Blizzard Challenge 2017. Blizzard Challenge 2017
[i1]Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dongyan Huang, Haizhou Li:
Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework. CoRR abs/1707.01670 (2017)
- 2016
[j1]Bo Fan, Lei Xie, Shan Yang, Lijuan Wang, Frank K. Soong:
A deep bidirectional LSTM approach for video-realistic talking head. Multim. Tools Appl. 75(9): 5287-5309 (2016)
[c2]Shan Yang, Zhizheng Wu, Lei Xie:
On the training of DNN-based average voice model for speech synthesis. APSIPA 2016: 1-6
[c1]Zhengchen Zhang, Mei Li, Yuchao Zhang, Weini Zhang, Yang Liu, Shan Yang, Yanfeng Lu, Van Tung Pham, Lei Xie, Minghui Dong:
The I2R-NWPU-NTU Text-to-Speech System at Blizzard Challenge 2016. Blizzard Challenge 2016