


default search action
Ziyang Ma 0001
Person information
- affiliation: Shanghai Jiao Tong University, Department of Computer Science and Engineering, AI Institute, MoE Key Lab of Artificial Intelligence, Shanghai, China
- affiliation (until 2022): Shandong University, School of Computer Science and Technology, Shandong, China
Other persons with the same name
- Ziyang Ma — disambiguation page
- Ziyang Ma 0002 — Tencent, China (and 2 more)
- Ziyang Ma 0003
— Southeast University, School of Computer Science and Engineering, Nanjing, China (and 1 more)
Refine list

refinements active!
zoomed in on ?? of ?? records
view refined list in
export refined list as
2020 – today
- 2025
[c39]Tao Liu, Ziyang Ma, Qi Chen, Feilong Chen, Shuai Fan, Xie Chen, Kai Yu:
VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization. AAAI 2025: 5586-5594
[c38]Ziyang Ma, Yakun Song, Chenpeng Du, Jian Cong, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen:
Language Model Can Listen While Speaking. AAAI 2025: 24831-24839
[c37]Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen:
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration. AAAI 2025: 24840-24848
[c36]Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Xie Chen:
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering. AAAI 2025: 25174-25182
[c35]Ziyang Ma, Xiquan Li, Yakun Song, Wenxi Chen, Chenpeng Du, Jian Wu, Yuanzhe Chen, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen:
Towards Reliable Large Audio Language Model. ACL (Findings) 2025: 1000-1014
[c34]Wenxi Chen, Ziyang Ma, Ruiqi Yan, Yuzhe Liang, Xiquan Li, Ruiyang Xu, Zhikang Niu, Yanqiao Zhu, Yifan Yang, Zhanxun Liu, Kai Yu, Yuxuan Hu, Jinyu Li, Yan Lu, Shujie Liu, Xie Chen:
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training. ACL (Findings) 2025: 2262-2282
[c33]Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen:
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement. ACL (1) 2025: 2673-2686
[c32]Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu, Xie Chen:
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching. ACL (1) 2025: 6255-6271
[c31]Yexing Du, Youcheng Pan, Ziyang Ma, Bo Yang, Yifan Yang, Keqi Deng, Xie Chen, Yang Xiang, Ming Liu, Bing Qin:
Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning. ACL (1) 2025: 12466-12478
[c30]Wenxi Chen, Ziyang Ma, Xiquan Li, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Kai Yu, Xie Chen:
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs. ICASSP 2025: 1-5
[c29]Xiquan Li, Wenxi Chen, Ziyang Ma, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Qiuqiang Kong, Xie Chen:
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning. ICASSP 2025: 1-5
[c28]Guanrou Yang, Fan Yu, Ziyang Ma, Zhihao Du, Zhifu Gao, Shiliang Zhang, Xie Chen:
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap. ICASSP 2025: 1-5
[i64]Haina Zhu, Yizhi Zhou, Hangting Chen, Jianwei Yu, Ziyang Ma, Rongzhi Gu, Yi Luo, Wei Tan, Xie Chen:
MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization. CoRR abs/2501.01108 (2025)
[i63]Ziyang Ma, Zhuo Chen, Yuping Wang, Eng Siong Chng, Xie Chen:
Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model. CoRR abs/2501.07246 (2025)
[i62]Ruiqi Yan, Xiquan Li, Wenxi Chen, Zhikang Niu, Chen Yang, Ziyang Ma, Kai Yu, Xie Chen:
URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models. CoRR abs/2502.17810 (2025)
[i61]Xinsheng Wang, Mingqi Jiang, Ziyang Ma, Ziyu Zhang, Songxiang Liu, Linqin Li, Zheng Liang, Qixi Zheng, Rui Wang, Xiaoqin Feng, Weizhen Bian, Zhen Ye, Sitong Cheng, Ruibin Yuan, Zhixian Zhao, Xinfa Zhu, Jiahao Pan, Liumeng Xue, Pengcheng Zhu, Yunlin Chen, Zhifei Li, Xie Chen, Lei Xie, Yike Guo
, Wei Xue:
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens. CoRR abs/2503.01710 (2025)
[i60]Ruibin Yuan, Hanfeng Lin, Shuyue Guo, Ge Zhang, Jiahao Pan, Yongyi Zang, Haohe Liu, Yiming Liang, Wenye Ma, Xingjian Du, Xinrun Du, Zhen Ye, Tianyu Zheng, Yinghao Ma, Minghao Liu, Zeyue Tian, Ziya Zhou, Liumeng Xue, Xingwei Qu, Yizhi Li, Shangda Wu, Tianhao Shen, Ziyang Ma, Jun Zhan, Chunhui Wang, Yatian Wang, Xiaowei Chi, Xinyue Zhang, Zhenzhu Yang, Xiangzhou Wang, Shansong Liu, Lingrui Mei, Peng Li, Junjie Wang, Jianwei Yu, Guojian Pang, Xu Li, Zihao Wang, Xiaohuan Zhou, Lijun Yu, Emmanouil Benetos, Yong Chen, Chenghua Lin, Xie Chen, Gus Xia, Zhaoxiang Zhang, Chao Zhang, Wenhu Chen, Xinyu Zhou, Xipeng Qiu, Roger B. Dannenberg, Zheng-Jia Liu, Jian Yang, Wenhao Huang, Wei Xue, Xu Tan, Yike Guo:
YuE: Scaling Open Foundation Models for Long-Form Music Generation. CoRR abs/2503.08638 (2025)
[i59]Guanrou Yang, Chen Yang, Qian Chen, Ziyang Ma, Wenxi Chen, Wen Wang, Tianrui Wang, Yifan Yang, Zhikang Niu, Wenrui Liu, Fan Yu, Zhihao Du, Zhifu Gao, Shiliang Zhang, Xie Chen:
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting. CoRR abs/2504.12867 (2025)
[i58]Zheng Lian, Rui Liu
, Kele Xu, Bin Liu, Xuefei Liu, Yazhou Zhang, Xin Liu, Yong Li, Zebang Cheng, Haolin Zuo, Ziyang Ma, Xiaojiang Peng, Xie Chen, Ya Li, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao:
MER 2025: When Affective Computing Meets Large Language Models. CoRR abs/2504.19423 (2025)
[i57]Pengchao Feng, Ziyang Ma, Wenxi Chen, Yao Li, Sheng Wang, Kai Yu, Xie Chen:
Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation. CoRR abs/2505.00028 (2025)
[i56]Ziyang Ma, Yinghao Ma, Yanqiao Zhu, Chen Yang, Yi-Wen Chao, Ruiyang Xu, Wenxi Chen, Yuanzhe Chen, Zhuo Chen, Jian Cong, Kai Li, Keliang Li, Siyou Li, Xinfeng Li, Xiquan Li, Zheng Lian, Yuzhe Liang, Minghao Liu, Zhikang Niu, Tianrui Wang, Yuping Wang, Yuxuan Wang, Yihao Wu, Guanrou Yang, Jianwei Yu, Ruibin Yuan, Zhisheng Zheng, Ziya Zhou, Haina Zhu, Wei Xue, Emmanouil Benetos, Kai Yu, Chng Eng Siong, Xie Chen:
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix. CoRR abs/2505.13032 (2025)
[i55]Kai Li, Can Shen, Yile Liu, Jirui Han, Kelong Zheng, Xuechao Zou, Zhe Wang, Xingjian Du, Shun Zhang, Hanjun Luo, Yingbin Jin, Xinxin Xing, Ziyang Ma, Yue Liu, Xiaojun Jia, Yifan Zhang, Junfeng Fang, Kun Wang, Yibo Yan, Haoyang Li, Yiming Li, Xiaobin Zhuang, Yang Liu, Haibo Hu, Zhuo Chen, Zhizheng Wu, Xiaolin Hu, Eng-Siong Chng, XiaoFeng Wang, Wenyuan Xu, Wei Dong, Xinfeng Li:
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models. CoRR abs/2505.16211 (2025)
[i54]Ziyang Ma, Xiquan Li, Yakun Song, Wenxi Chen, Chenpeng Du, Jian Wu, Yuanzhe Chen, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen:
Towards Reliable Large Audio Language Model. CoRR abs/2505.19294 (2025)
[i53]Qixi Zheng, Yushen Chen, Zhikang Niu, Ziyang Ma, Xiaofei Wang, Kai Yu, Xie Chen:
Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling. CoRR abs/2505.19931 (2025)
[i52]Yakun Song, Jiawei Chen, Xiaobin Zhuang, Chenpeng Du, Ziyang Ma, Jian Wu, Jian Cong, Dongya Jia, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen:
MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation. CoRR abs/2506.00385 (2025)
[i51]Yizhou Peng, Bin Wang, Yi-Wen Chao, Ziyang Ma, Haoyang Zhang, Hexin Liu, Xie Chen, Eng Siong Chng:
NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025. CoRR abs/2506.13339 (2025)
[i50]Zheng Lian, Licai Sun, Haoyu Chen, Zebang Cheng, Fan Zhang, Ziyu Jia, Ziyang Ma, Fei Ma, Xiaojiang Peng, Jianhua Tao:
EMER-Ranker: Learning to Rank Emotion Descriptions in the Absence of Ground Truth. CoRR abs/2507.04278 (2025)
[i49]Jin Xu, Zhifang Guo, Hangrui Hu, Yunfei Chu, Xiong Wang, Jinzheng He, Yuxuan Wang, Xian Shi, Ting He, Xinfa Zhu, Yuanjun Lv, Yongqi Wang, Dake Guo, He Wang, Linhan Ma, Pei Zhang, Xinyu Zhang, Hongkun Hao, Zishan Guo, Baosong Yang, Bin Zhang, Ziyang Ma, Xipin Wei, Shuai Bai, Keqin Chen, Xuejing Liu, Peng Wang, Mingkun Yang, Dayiheng Liu, Xingzhang Ren, Bo Zheng, Rui Men, Fan Zhou, Bowen Yu, Jianxin Yang, Le Yu, Jingren Zhou, Junyang Lin:
Qwen3-Omni Technical Report. CoRR abs/2509.17765 (2025)
[i48]Junyu Wang, Ziyang Ma, Zhengding Luo, Tianrui Wang, Meng Ge, Xiaobao Wang, Longbiao Wang:
Pay More Attention To Audio: Mitigating Imbalance of Cross-Modal Attention in Large Audio Language Models. CoRR abs/2509.18816 (2025)
[i47]Tianrui Wang, Haoyu Wang, Meng Ge, Cheng Gong, Chunyu Qiang, Ziyang Ma, Zikang Huang, Guanrou Yang, Xiaobao Wang, Eng Siong Chng, Xie Chen, Longbiao Wang, Jianwu Dang:
Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis. CoRR abs/2509.24629 (2025)- 2024
[j2]Zheng Liang
, Ziyang Ma, Chenpeng Du
, Kai Yu
, Xie Chen
:
E$^{3}$TTS: End-to-End Text-Based Speech Editing TTS System and Its Applications. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4810-4821 (2024)
[j1]Xuenan Xu
, Ziyang Ma, Mengyue Wu
, Kai Yu
:
Towards Weakly Supervised Text-to-Audio Grounding. IEEE Trans. Multim. 26: 11126-11138 (2024)
[c27]Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Liumeng Xue, Ziyang Ma, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Jie Fu, Emmanouil Benetos, Gus Xia, Roger B. Dannenberg, Wei Xue, Shiyin Kang, Yike Guo
:
ChatMusician: Understanding and Generating Music Intrinsically with LLM. ACL (Findings) 2024: 6252-6271
[c26]Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen:
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation. ACL (Findings) 2024: 15747-15760
[c25]Fan Yu, Haoxu Wang, Ziyang Ma, Shiliang Zhang:
Hourglass-AVSR: Down-Up Sampling-Based Computational Efficiency Model for Audio-Visual Speech Recognition. ICASSP 2024: 7940-7944
[c24]Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen:
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS. ICASSP 2024: 10401-10405
[c23]Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu:
VoiceFlow: Efficient Text-To-Speech with Rectified Flow Matching. ICASSP 2024: 11121-11125
[c22]Ziyang Ma, Wen Wu
, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen:
Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition. ICASSP 2024: 11146-11150
[c21]Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath:
BAT: Learning to Reason about Spatial Sounds with Large Language Models. ICML 2024
[c20]Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen:
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer. IJCAI 2024: 3807-3815
[c19]Ziyang Ma, Mingjie Chen, Hezhao Zhang, Zhisheng Zheng, Wenxi Chen, Xiquan Li, Jiaxin Ye, Xie Chen, Thomas Hain
:
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark. INTERSPEECH 2024
[c18]Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Guanrou Yang, Xie Chen:
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers. INTERSPEECH 2024
[c17]Zheshu Song, Jianheng Zhuo, Yifan Yang, Ziyang Ma, Shixiong Zhang, Xie Chen:
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR. INTERSPEECH 2024
[c16]Guanrou Yang, Ziyang Ma, Fan Yu, Zhifu Gao, Shiliang Zhang, Xie Chen:
MaLa-ASR: Multimedia-Assisted LLM-Based ASR. INTERSPEECH 2024
[c15]Zhengshun Xia, Ziyang Ma, Zhisheng Zheng, Xie Chen:
Improving Emotion Recognition with Pre-Trained Models, Multimodality, and Contextual Information. ISCSLP 2024: 636-640
[c14]Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Xu Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu:
The X-Lance Technical Report for Interspeech 2024 Speech Processing using Discrete Speech Unit Challenge. ISCSLP 2024: 641-645
[c13]Zheng Lian
, Haiyang Sun
, Licai Sun
, Zhuofan Wen, Siyuan Zhang
, Shun Chen
, Hao Gu
, Jinming Zhao
, Ziyang Ma, Xie Chen
, Jiangyan Yi
, Rui Liu
, Kele Xu
, Bin Liu
, Erik Cambria
, Guoying Zhao
, Björn W. Schuller
, Jianhua Tao
:
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition. MRAC@MM 2024: 41-48
[c12]Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu
, Ziyang Ma, Peter Bell, Catherine Lai, Joshua D. Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain
:
1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem. Odyssey 2024: 260-265
[c11]Guanrou Yang, Ziyang Ma, Zhifu Gao, Shiliang Zhang, Xie Chen:
CTC-Assisted LLM-Based Contextual ASR. SLT 2024: 126-131
[c10]Zhikang Niu, Sanyuan Chen, Long Zhou, Ziyang Ma, Xie Chen, Shujie Liu:
NDVQ: Robust Neural Audio Codec With Normal Distribution-Based Vector Quantization. SLT 2024: 705-710
[i46]Xuenan Xu, Ziyang Ma, Mengyue Wu, Kai Yu:
Towards Weakly Supervised Text-to-Audio Grounding. CoRR abs/2401.02584 (2024)
[i45]Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen:
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer. CoRR abs/2401.03497 (2024)
[i44]Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Xie Chen:
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering. CoRR abs/2401.07333 (2024)
[i43]Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath:
BAT: Learning to Reason about Spatial Sounds with Large Language Models. CoRR abs/2402.01591 (2024)
[i42]Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen:
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity. CoRR abs/2402.08846 (2024)
[i41]Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen
, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia
, Roger B. Dannenberg, Wei Xue, Shiyin Kang, Yike Guo
:
ChatMusician: Understanding and Generating Music Intrinsically with LLM. CoRR abs/2402.16153 (2024)
[i40]Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Binhang Yuan, Wenhu Chen, Jie Fu, Ge Zhang:
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model. CoRR abs/2404.04167 (2024)
[i39]Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu:
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge. CoRR abs/2404.06079 (2024)
[i38]Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia
, Emmanouil Benetos, Xiang Yue
, Chenghua Lin, Xu Tan, Stephen W. Huang, Wenhu Chen, Jie Fu, Ge Zhang:
MuPT: A Generative Symbolic Music Pretrained Transformer. CoRR abs/2404.06393 (2024)
[i37]Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu
, Kele Xu
, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao:
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition. CoRR abs/2404.17113 (2024)
[i36]Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Y. Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kaijing Ma, Minghao Liu, Morry Niu, Noah Wang, Quehry Que, Ruibo Liu, Sine Liu, Shawn Guo, Soren Gao, Wangchunshu Zhou, Xinyue Zhang, Yizhi Zhou, Yubo Wang, Yuelin Bai, Yuhan Zhang, Yuxiang Zhang, Zenith Wang, Zhenzhu Yang, Zijian Zhao, Jiajun Zhang, Wanli Ouyang
, Wenhao Huang, Wenhu Chen:
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series. CoRR abs/2405.19327 (2024)
[i35]Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua D. Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain
:
1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem. CoRR abs/2405.20064 (2024)
[i34]Guanrou Yang, Ziyang Ma, Fan Yu, Zhifu Gao, Shiliang Zhang, Xie Chen:
MaLa-ASR: Multimedia-Assisted LLM-Based ASR. CoRR abs/2406.05839 (2024)
[i33]Zheshu Song, Jianheng Zhuo, Yifan Yang, Ziyang Ma, Shixiong Zhang, Xie Chen:
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR. CoRR abs/2406.06619 (2024)
[i32]Ziyang Ma, Mingjie Chen, Hezhao Zhang, Zhisheng Zheng, Wenxi Chen, Xiquan Li, Jiaxin Ye, Xie Chen, Thomas Hain:
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark. CoRR abs/2406.07162 (2024)
[i31]Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen:
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement. CoRR abs/2406.11546 (2024)
[i30]Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Guanrou Yang, Xie Chen:
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers. CoRR abs/2406.15752 (2024)
[i29]Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, Zhangyu Xiao, Zhijie Yan, Yexin Yang, Bin Zhang, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Siqi Zheng:
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs. CoRR abs/2407.04051 (2024)
[i28]Zhihao Du, Qian Chen, Shiliang Zhang, Kai Hu, Heng Lu, Yexin Yang, Hangrui Hu, Siqi Zheng, Yue Gu, Ziyang Ma, Zhifu Gao, Zhijie Yan:
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens. CoRR abs/2407.05407 (2024)
[i27]Ziyang Ma, Yakun Song, Chenpeng Du, Jian Cong, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen:
Language Model Can Listen While Speaking. CoRR abs/2408.02622 (2024)
[i26]Yinghao Ma, Anders Øland, Anton Ragni, Bleiz Macsen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia
, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot
, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma
, Qiuqiang Kong, Roger B. Dannenberg, Ruibin Yuan, Shangda Wu, Shih-Lun Wu, Shuqi Dai, Shun Lei, Shiyin Kang, Simon Dixon, Wenhu Chen, Wenhao Huang, Xingjian Du
, Xingwei Qu, Xu Tan, Yizhi Li, Zeyue Tian, Zhiyong Wu, Zhizheng Wu, Ziyang Ma, Ziyu Wang:
Foundation Models for Music: A Survey. CoRR abs/2408.14340 (2024)
[i25]Tianrui Wang, Jin Li, Ziyang Ma, Rui Cao, Xie Chen, Longbiao Wang, Meng Ge, Xiaobao Wang, Yuguang Wang, Jianwu Dang, Nyima Tashi:
Progressive Residual Extraction based Pre-training for Speech Representation Learning. CoRR abs/2409.00387 (2024)
[i24]Zhikang Niu, Sanyuan Chen, Long Zhou, Ziyang Ma, Xie Chen, Shujie Liu:
NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization. CoRR abs/2409.12717 (2024)
[i23]Yexing Du, Ziyang Ma, Yifan Yang, Keqi Deng, Xie Chen, Bo Yang, Yang Xiang, Ming Liu, Bing Qin
:
CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought. CoRR abs/2409.19510 (2024)
[i22]Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu, Xie Chen:
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching. CoRR abs/2410.06885 (2024)
[i21]Xiquan Li, Wenxi Chen, Ziyang Ma, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Qiuqiang Kong, Xie Chen:
DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning. CoRR abs/2410.09472 (2024)
[i20]Wenxi Chen, Ziyang Ma, Xiquan Li, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Kai Yu, Xie Chen:
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs. CoRR abs/2410.09503 (2024)
[i19]Guanrou Yang, Fan Yu, Ziyang Ma, Zhihao Du, Zhifu Gao, Shiliang Zhang, Xie Chen:
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap. CoRR abs/2410.16726 (2024)
[i18]Guanrou Yang, Ziyang Ma, Zhifu Gao, Shiliang Zhang, Xie Chen:
CTC-Assisted LLM-Based Contextual ASR. CoRR abs/2411.06437 (2024)
[i17]Tao Liu, Ziyang Ma, Qi Chen, Feilong Chen, Shuai Fan, Xie Chen, Kai Yu:
VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization. CoRR abs/2412.09892 (2024)- 2023
[c9]Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang:
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition. ASRU 2023: 1-6
[c8]Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen:
Fast-Hubert: an Efficient Training Framework for Self-Supervised Speech Representation Learning. ASRU 2023: 1-7
[c7]Qi Chen, Ziyang Ma, Tao Liu, Xu Tan
, Qu Lu, Kai Yu, Xie Chen:
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation. ICASSP 2023: 1-5
[c6]Xie Chen, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng:
Front-End Adapter: Adapting Front-End Input of Speech Based Self-Supervised Learning for Speech Recognition. ICASSP 2023: 1-5
[c5]Ziyang Ma, Zhisheng Zheng, Changli Tang, Yujin Wang, Xie Chen:
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets. INTERSPEECH 2023: 82-86
[c4]Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen:
Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation. INTERSPEECH 2023: 919-923
[c3]Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang, Chao Zhang, Xie Chen:
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation. INTERSPEECH 2023: 1269-1273
[c2]Zhisheng Zheng, Ziyang Ma, Yu Wang, Xie Chen:
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition. INTERSPEECH 2023: 3307-3311
[i16]Xie Chen, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng:
Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition. CoRR abs/2302.09331 (2023)
[i15]Qi Chen, Ziyang Ma, Tao Liu, Xu Tan
, Qu Lu, Xie Chen, Kai Yu:
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation. CoRR abs/2303.05322 (2023)
[i14]Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen:
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation. CoRR abs/2306.08588 (2023)
[i13]Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang, Chao Zhang, Xie Chen:
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation. CoRR abs/2306.08920 (2023)
[i12]Zhisheng Zheng, Ziyang Ma, Yu Wang, Xie Chen:
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition. CoRR abs/2308.14814 (2023)
[i11]Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu:
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching. CoRR abs/2309.05027 (2023)
[i10]Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen:
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS. CoRR abs/2309.07377 (2023)
[i9]Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen:
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition. CoRR abs/2309.10294 (2023)
[i8]Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen:
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning. CoRR abs/2309.13860 (2023)
[i7]Jiaming Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang:
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT. CoRR abs/2310.04673 (2023)
[i6]Fan Yu, Haoxu Wang, Ziyang Ma, Shiliang Zhang:
Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition. CoRR abs/2312.08850 (2023)
[i5]Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen:
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation. CoRR abs/2312.15185 (2023)- 2022
[i4]Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang:
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition. CoRR abs/2210.15631 (2022)
[i3]Ziyang Ma, Zhisheng Zheng, Changli Tang, Yujin Wang, Xie Chen:
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets. CoRR abs/2211.07321 (2022)
[i2]Zhuoyuan Yao, Shuo Ren, Sanyuan Chen, Ziyang Ma, Pengcheng Guo, Lei Xie:
TESSP: Text-Enhanced Self-Supervised Speech Pre-training. CoRR abs/2211.13443 (2022)- 2021
[c1]Ziyang Ma
, Xianjing Han, Xuemeng Song, Yiran Cui, Liqiang Nie:
Hierarchical Deep Residual Reasoning for Temporal Moment Localization. MMAsia 2021: 15:1-15:7
[i1]Ziyang Ma, Xianjing Han, Xuemeng Song, Yiran Cui, Liqiang Nie:
Hierarchical Deep Residual Reasoning for Temporal Moment Localization. CoRR abs/2111.00417 (2021)
Coauthor Index

manage site settings
To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.
Unpaywalled article links
Add open access links from
to the list of external document links (if available).
Privacy notice: By enabling the option above, your browser will contact the API of unpaywall.org to load hyperlinks to open access articles. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Unpaywall privacy policy.
Archived links via Wayback Machine
For web page which are no longer available, try to retrieve content from the
of the Internet Archive (if available).
Privacy notice: By enabling the option above, your browser will contact the API of archive.org to check for archived content of web pages that are no longer available. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Internet Archive privacy policy.
Reference lists
Add a list of references from
,
, and
to record detail pages.
load references from crossref.org and opencitations.net
Privacy notice: By enabling the option above, your browser will contact the APIs of crossref.org, opencitations.net, and semanticscholar.org to load article reference information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Crossref privacy policy and the OpenCitations privacy policy, as well as the AI2 Privacy Policy covering Semantic Scholar.
Citation data
Add a list of citing articles from
and
to record detail pages.
load citations from opencitations.net
Privacy notice: By enabling the option above, your browser will contact the API of opencitations.net and semanticscholar.org to load citation information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the OpenCitations privacy policy as well as the AI2 Privacy Policy covering Semantic Scholar.
OpenAlex data
Load additional information about publications from
.
Privacy notice: By enabling the option above, your browser will contact the API of openalex.org to load additional information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the information given by OpenAlex.
last updated on 2025-10-24 03:05 CEST by the dblp team
all metadata released as open data under CC0 1.0 license
see also: Terms of Use | Privacy Policy | Imprint


Google
Google Scholar
Semantic Scholar
Internet Archive Scholar
CiteSeerX
ORCID







