Ziyang Ma 0001

Person information

affiliation: Shanghai Jiao Tong University, Department of Computer Science and Engineering, AI Institute, MoE Key Lab of Artificial Intelligence, Shanghai, China
affiliation (until 2022): Shandong University, School of Computer Science and Technology, Shandong, China

2020 – today

2025
[c39]
Tao Liu, Ziyang Ma, Qi Chen, Feilong Chen, Shuai Fan, Xie Chen, Kai Yu:
VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization. AAAI 2025: 5586-5594
[c38]
Ziyang Ma, Yakun Song, Chenpeng Du, Jian Cong, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen:
Language Model Can Listen While Speaking. AAAI 2025: 24831-24839
[c37]
Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen:
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration. AAAI 2025: 24840-24848
[c36]
Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Xie Chen:
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering. AAAI 2025: 25174-25182
[c35]
Ziyang Ma, Xiquan Li, Yakun Song, Wenxi Chen, Chenpeng Du, Jian Wu, Yuanzhe Chen, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen:
Towards Reliable Large Audio Language Model. ACL (Findings) 2025: 1000-1014
[c34]
Wenxi Chen, Ziyang Ma, Ruiqi Yan, Yuzhe Liang, Xiquan Li, Ruiyang Xu, Zhikang Niu, Yanqiao Zhu, Yifan Yang, Zhanxun Liu, Kai Yu, Yuxuan Hu, Jinyu Li, Yan Lu, Shujie Liu, Xie Chen:
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training. ACL (Findings) 2025: 2262-2282
[c33]
Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen:
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement. ACL (1) 2025: 2673-2686
[c32]
Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu, Xie Chen:
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching. ACL (1) 2025: 6255-6271
[c31]
Yexing Du, Youcheng Pan, Ziyang Ma, Bo Yang, Yifan Yang, Keqi Deng, Xie Chen, Yang Xiang, Ming Liu, Bing Qin:
Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning. ACL (1) 2025: 12466-12478
[c30]
Wenxi Chen, Ziyang Ma, Xiquan Li, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Kai Yu, Xie Chen:
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs. ICASSP 2025: 1-5
[c29]
Xiquan Li, Wenxi Chen, Ziyang Ma, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Qiuqiang Kong, Xie Chen:
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning. ICASSP 2025: 1-5
[c28]
Guanrou Yang, Fan Yu, Ziyang Ma, Zhihao Du, Zhifu Gao, Shiliang Zhang, Xie Chen:
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap. ICASSP 2025: 1-5
[i64]
Haina Zhu, Yizhi Zhou, Hangting Chen, Jianwei Yu, Ziyang Ma, Rongzhi Gu, Yi Luo, Wei Tan, Xie Chen:
MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization. CoRR abs/2501.01108 (2025)
[i63]
Ziyang Ma, Zhuo Chen, Yuping Wang, Eng Siong Chng, Xie Chen:
Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model. CoRR abs/2501.07246 (2025)
[i62]
Ruiqi Yan, Xiquan Li, Wenxi Chen, Zhikang Niu, Chen Yang, Ziyang Ma, Kai Yu, Xie Chen:
URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models. CoRR abs/2502.17810 (2025)
[i61]
Xinsheng Wang, Mingqi Jiang, Ziyang Ma, Ziyu Zhang, Songxiang Liu, Linqin Li, Zheng Liang, Qixi Zheng, Rui Wang, Xiaoqin Feng, Weizhen Bian, Zhen Ye, Sitong Cheng, Ruibin Yuan, Zhixian Zhao, Xinfa Zhu, Jiahao Pan, Liumeng Xue, Pengcheng Zhu, Yunlin Chen, Zhifei Li, Xie Chen, Lei Xie, Yike Guo, Wei Xue:
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens. CoRR abs/2503.01710 (2025)
[i60]
Ruibin Yuan, Hanfeng Lin, Shuyue Guo, Ge Zhang, Jiahao Pan, Yongyi Zang, Haohe Liu, Yiming Liang, Wenye Ma, Xingjian Du, Xinrun Du, Zhen Ye, Tianyu Zheng, Yinghao Ma, Minghao Liu, Zeyue Tian, Ziya Zhou, Liumeng Xue, Xingwei Qu, Yizhi Li, Shangda Wu, Tianhao Shen, Ziyang Ma, Jun Zhan, Chunhui Wang, Yatian Wang, Xiaowei Chi, Xinyue Zhang, Zhenzhu Yang, Xiangzhou Wang, Shansong Liu, Lingrui Mei, Peng Li, Junjie Wang, Jianwei Yu, Guojian Pang, Xu Li, Zihao Wang, Xiaohuan Zhou, Lijun Yu, Emmanouil Benetos, Yong Chen, Chenghua Lin, Xie Chen, Gus Xia, Zhaoxiang Zhang, Chao Zhang, Wenhu Chen, Xinyu Zhou, Xipeng Qiu, Roger B. Dannenberg, Zheng-Jia Liu, Jian Yang, Wenhao Huang, Wei Xue, Xu Tan, Yike Guo:
YuE: Scaling Open Foundation Models for Long-Form Music Generation. CoRR abs/2503.08638 (2025)
[i59]
Guanrou Yang, Chen Yang, Qian Chen, Ziyang Ma, Wenxi Chen, Wen Wang, Tianrui Wang, Yifan Yang, Zhikang Niu, Wenrui Liu, Fan Yu, Zhihao Du, Zhifu Gao, Shiliang Zhang, Xie Chen:
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting. CoRR abs/2504.12867 (2025)
[i58]
Zheng Lian, Rui Liu, Kele Xu, Bin Liu, Xuefei Liu, Yazhou Zhang, Xin Liu, Yong Li, Zebang Cheng, Haolin Zuo, Ziyang Ma, Xiaojiang Peng, Xie Chen, Ya Li, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao:
MER 2025: When Affective Computing Meets Large Language Models. CoRR abs/2504.19423 (2025)
[i57]
Pengchao Feng, Ziyang Ma, Wenxi Chen, Yao Li, Sheng Wang, Kai Yu, Xie Chen:
Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation. CoRR abs/2505.00028 (2025)
[i56]
Ziyang Ma, Yinghao Ma, Yanqiao Zhu, Chen Yang, Yi-Wen Chao, Ruiyang Xu, Wenxi Chen, Yuanzhe Chen, Zhuo Chen, Jian Cong, Kai Li, Keliang Li, Siyou Li, Xinfeng Li, Xiquan Li, Zheng Lian, Yuzhe Liang, Minghao Liu, Zhikang Niu, Tianrui Wang, Yuping Wang, Yuxuan Wang, Yihao Wu, Guanrou Yang, Jianwei Yu, Ruibin Yuan, Zhisheng Zheng, Ziya Zhou, Haina Zhu, Wei Xue, Emmanouil Benetos, Kai Yu, Chng Eng Siong, Xie Chen:
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix. CoRR abs/2505.13032 (2025)
[i55]
Kai Li, Can Shen, Yile Liu, Jirui Han, Kelong Zheng, Xuechao Zou, Zhe Wang, Xingjian Du, Shun Zhang, Hanjun Luo, Yingbin Jin, Xinxin Xing, Ziyang Ma, Yue Liu, Xiaojun Jia, Yifan Zhang, Junfeng Fang, Kun Wang, Yibo Yan, Haoyang Li, Yiming Li, Xiaobin Zhuang, Yang Liu, Haibo Hu, Zhuo Chen, Zhizheng Wu, Xiaolin Hu, Eng-Siong Chng, XiaoFeng Wang, Wenyuan Xu, Wei Dong, Xinfeng Li:
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models. CoRR abs/2505.16211 (2025)
[i54]
Ziyang Ma, Xiquan Li, Yakun Song, Wenxi Chen, Chenpeng Du, Jian Wu, Yuanzhe Chen, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen:
Towards Reliable Large Audio Language Model. CoRR abs/2505.19294 (2025)
[i53]
Qixi Zheng, Yushen Chen, Zhikang Niu, Ziyang Ma, Xiaofei Wang, Kai Yu, Xie Chen:
Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling. CoRR abs/2505.19931 (2025)
[i52]
Yakun Song, Jiawei Chen, Xiaobin Zhuang, Chenpeng Du, Ziyang Ma, Jian Wu, Jian Cong, Dongya Jia, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen:
MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation. CoRR abs/2506.00385 (2025)
[i51]
Yizhou Peng, Bin Wang, Yi-Wen Chao, Ziyang Ma, Haoyang Zhang, Hexin Liu, Xie Chen, Eng Siong Chng:
NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025. CoRR abs/2506.13339 (2025)
[i50]
Zheng Lian, Licai Sun, Haoyu Chen, Zebang Cheng, Fan Zhang, Ziyu Jia, Ziyang Ma, Fei Ma, Xiaojiang Peng, Jianhua Tao:
EMER-Ranker: Learning to Rank Emotion Descriptions in the Absence of Ground Truth. CoRR abs/2507.04278 (2025)
[i49]
Jin Xu, Zhifang Guo, Hangrui Hu, Yunfei Chu, Xiong Wang, Jinzheng He, Yuxuan Wang, Xian Shi, Ting He, Xinfa Zhu, Yuanjun Lv, Yongqi Wang, Dake Guo, He Wang, Linhan Ma, Pei Zhang, Xinyu Zhang, Hongkun Hao, Zishan Guo, Baosong Yang, Bin Zhang, Ziyang Ma, Xipin Wei, Shuai Bai, Keqin Chen, Xuejing Liu, Peng Wang, Mingkun Yang, Dayiheng Liu, Xingzhang Ren, Bo Zheng, Rui Men, Fan Zhou, Bowen Yu, Jianxin Yang, Le Yu, Jingren Zhou, Junyang Lin:
Qwen3-Omni Technical Report. CoRR abs/2509.17765 (2025)
[i48]
Junyu Wang, Ziyang Ma, Zhengding Luo, Tianrui Wang, Meng Ge, Xiaobao Wang, Longbiao Wang:
Pay More Attention To Audio: Mitigating Imbalance of Cross-Modal Attention in Large Audio Language Models. CoRR abs/2509.18816 (2025)
[i47]
Tianrui Wang, Haoyu Wang, Meng Ge, Cheng Gong, Chunyu Qiang, Ziyang Ma, Zikang Huang, Guanrou Yang, Xiaobao Wang, Eng Siong Chng, Xie Chen, Longbiao Wang, Jianwu Dang:
Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis. CoRR abs/2509.24629 (2025)
2024
[j2]
Zheng Liang, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen:
E$^{3}$TTS: End-to-End Text-Based Speech Editing TTS System and Its Applications. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4810-4821 (2024)
[j1]
Xuenan Xu, Ziyang Ma, Mengyue Wu, Kai Yu:
Towards Weakly Supervised Text-to-Audio Grounding. IEEE Trans. Multim. 26: 11126-11138 (2024)
[c27]
Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Liumeng Xue, Ziyang Ma, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Jie Fu, Emmanouil Benetos, Gus Xia, Roger B. Dannenberg, Wei Xue, Shiyin Kang, Yike Guo:
ChatMusician: Understanding and Generating Music Intrinsically with LLM. ACL (Findings) 2024: 6252-6271
[c26]
Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen:
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation. ACL (Findings) 2024: 15747-15760
[c25]
Fan Yu, Haoxu Wang, Ziyang Ma, Shiliang Zhang:
Hourglass-AVSR: Down-Up Sampling-Based Computational Efficiency Model for Audio-Visual Speech Recognition. ICASSP 2024: 7940-7944
[c24]
Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen:
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS. ICASSP 2024: 10401-10405
[c23]
Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu:
VoiceFlow: Efficient Text-To-Speech with Rectified Flow Matching. ICASSP 2024: 11121-11125
[c22]
Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen:
Leveraging Speech PTM, Text LLM, And Emotional TTS For Speech Emotion Recognition. ICASSP 2024: 11146-11150
[c21]
Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath:
BAT: Learning to Reason about Spatial Sounds with Large Language Models. ICML 2024
[c20]
Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen:
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer. IJCAI 2024: 3807-3815
[c19]
Ziyang Ma, Mingjie Chen, Hezhao Zhang, Zhisheng Zheng, Wenxi Chen, Xiquan Li, Jiaxin Ye, Xie Chen, Thomas Hain:
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark. INTERSPEECH 2024
[c18]
Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Guanrou Yang, Xie Chen:
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers. INTERSPEECH 2024
[c17]
Zheshu Song, Jianheng Zhuo, Yifan Yang, Ziyang Ma, Shixiong Zhang, Xie Chen:
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR. INTERSPEECH 2024
[c16]
Guanrou Yang, Ziyang Ma, Fan Yu, Zhifu Gao, Shiliang Zhang, Xie Chen:
MaLa-ASR: Multimedia-Assisted LLM-Based ASR. INTERSPEECH 2024
[c15]
Zhengshun Xia, Ziyang Ma, Zhisheng Zheng, Xie Chen:
Improving Emotion Recognition with Pre-Trained Models, Multimodality, and Contextual Information. ISCSLP 2024: 636-640
[c14]
Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Xu Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu:
The X-Lance Technical Report for Interspeech 2024 Speech Processing using Discrete Speech Unit Challenge. ISCSLP 2024: 641-645
[c13]
Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao:
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition. MRAC@MM 2024: 41-48
[c12]
Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua D. Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain:
1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem. Odyssey 2024: 260-265
[c11]
Guanrou Yang, Ziyang Ma, Zhifu Gao, Shiliang Zhang, Xie Chen:
CTC-Assisted LLM-Based Contextual ASR. SLT 2024: 126-131
[c10]
Zhikang Niu, Sanyuan Chen, Long Zhou, Ziyang Ma, Xie Chen, Shujie Liu:
NDVQ: Robust Neural Audio Codec With Normal Distribution-Based Vector Quantization. SLT 2024: 705-710
[i46]
Xuenan Xu, Ziyang Ma, Mengyue Wu, Kai Yu:
Towards Weakly Supervised Text-to-Audio Grounding. CoRR abs/2401.02584 (2024)
[i45]
Wenxi Chen, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, Xie Chen:
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer. CoRR abs/2401.03497 (2024)
[i44]
Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Xie Chen:
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering. CoRR abs/2401.07333 (2024)
[i43]
Zhisheng Zheng, Puyuan Peng, Ziyang Ma, Xie Chen, Eunsol Choi, David Harwath:
BAT: Learning to Reason about Spatial Sounds with Large Language Models. CoRR abs/2402.01591 (2024)
[i42]
Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen:
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity. CoRR abs/2402.08846 (2024)
[i41]
Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia, Roger B. Dannenberg, Wei Xue, Shiyin Kang, Yike Guo:
ChatMusician: Understanding and Generating Music Intrinsically with LLM. CoRR abs/2402.16153 (2024)
[i40]
Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Binhang Yuan, Wenhu Chen, Jie Fu, Ge Zhang:
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model. CoRR abs/2404.04167 (2024)
[i39]
Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu:
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge. CoRR abs/2404.06079 (2024)
[i38]
Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, Stephen W. Huang, Wenhu Chen, Jie Fu, Ge Zhang:
MuPT: A Generative Symbolic Music Pretrained Transformer. CoRR abs/2404.06393 (2024)
[i37]
Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao:
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition. CoRR abs/2404.17113 (2024)
[i36]
Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Y. Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kaijing Ma, Minghao Liu, Morry Niu, Noah Wang, Quehry Que, Ruibo Liu, Sine Liu, Shawn Guo, Soren Gao, Wangchunshu Zhou, Xinyue Zhang, Yizhi Zhou, Yubo Wang, Yuelin Bai, Yuhan Zhang, Yuxiang Zhang, Zenith Wang, Zhenzhu Yang, Zijian Zhao, Jiajun Zhang, Wanli Ouyang, Wenhao Huang, Wenhu Chen:
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series. CoRR abs/2405.19327 (2024)
[i35]
Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua D. Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain:
1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem. CoRR abs/2405.20064 (2024)
[i34]
Guanrou Yang, Ziyang Ma, Fan Yu, Zhifu Gao, Shiliang Zhang, Xie Chen:
MaLa-ASR: Multimedia-Assisted LLM-Based ASR. CoRR abs/2406.05839 (2024)
[i33]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2406-06619
Zheshu Song, Jianheng Zhuo, Yifan Yang, Ziyang Ma, Shixiong Zhang, Xie Chen:
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR. CoRR abs/2406.06619 (2024)
[i32]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2406-07162
Ziyang Ma, Mingjie Chen, Hezhao Zhang, Zhisheng Zheng, Wenxi Chen, Xiquan Li, Jiaxin Ye, Xie Chen, Thomas Hain:
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark. CoRR abs/2406.07162 (2024)
[i31]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2406-11546
Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen:
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement. CoRR abs/2406.11546 (2024)
[i30]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2406-15752
Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Guanrou Yang, Xie Chen:
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers. CoRR abs/2406.15752 (2024)
[i29]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2407-04051
Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, Zhangyu Xiao, Zhijie Yan, Yexin Yang, Bin Zhang, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Siqi Zheng:
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs. CoRR abs/2407.04051 (2024)
[i28]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2407-05407
Zhihao Du, Qian Chen, Shiliang Zhang, Kai Hu, Heng Lu, Yexin Yang, Hangrui Hu, Siqi Zheng, Yue Gu, Ziyang Ma, Zhifu Gao, Zhijie Yan:
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens. CoRR abs/2407.05407 (2024)
[i27]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2408-02622
Ziyang Ma, Yakun Song, Chenpeng Du, Jian Cong, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen:
Language Model Can Listen While Speaking. CoRR abs/2408.02622 (2024)
[i26]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2408-14340
Yinghao Ma, Anders Øland, Anton Ragni, Bleiz Macsen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, Ruibin Yuan, Shangda Wu, Shih-Lun Wu, Shuqi Dai, Shun Lei, Shiyin Kang, Simon Dixon, Wenhu Chen, Wenhao Huang, Xingjian Du, Xingwei Qu, Xu Tan, Yizhi Li, Zeyue Tian, Zhiyong Wu, Zhizheng Wu, Ziyang Ma, Ziyu Wang:
Foundation Models for Music: A Survey. CoRR abs/2408.14340 (2024)
[i25]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2409-00387
Tianrui Wang, Jin Li, Ziyang Ma, Rui Cao, Xie Chen, Longbiao Wang, Meng Ge, Xiaobao Wang, Yuguang Wang, Jianwu Dang, Nyima Tashi:
Progressive Residual Extraction based Pre-training for Speech Representation Learning. CoRR abs/2409.00387 (2024)
[i24]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2409-12717
Zhikang Niu, Sanyuan Chen, Long Zhou, Ziyang Ma, Xie Chen, Shujie Liu:
NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization. CoRR abs/2409.12717 (2024)
[i23]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2409-19510
Yexing Du, Ziyang Ma, Yifan Yang, Keqi Deng, Xie Chen, Bo Yang, Yang Xiang, Ming Liu, Bing Qin:
CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought. CoRR abs/2409.19510 (2024)
[i22]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2410-06885
Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu, Xie Chen:
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching. CoRR abs/2410.06885 (2024)
[i21]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2410-09472
Xiquan Li, Wenxi Chen, Ziyang Ma, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Qiuqiang Kong, Xie Chen:
DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning. CoRR abs/2410.09472 (2024)
[i20]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2410-09503
Wenxi Chen, Ziyang Ma, Xiquan Li, Xuenan Xu, Yuzhe Liang, Zhisheng Zheng, Kai Yu, Xie Chen:
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs. CoRR abs/2410.09503 (2024)
[i19]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2410-16726
Guanrou Yang, Fan Yu, Ziyang Ma, Zhihao Du, Zhifu Gao, Shiliang Zhang, Xie Chen:
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap. CoRR abs/2410.16726 (2024)
[i18]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2411-06437
Guanrou Yang, Ziyang Ma, Zhifu Gao, Shiliang Zhang, Xie Chen:
CTC-Assisted LLM-Based Contextual ASR. CoRR abs/2411.06437 (2024)
[i17]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2412-09892
Tao Liu, Ziyang Ma, Qi Chen, Feilong Chen, Shuai Fan, Xie Chen, Kai Yu:
VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization. CoRR abs/2412.09892 (2024)
2023
[c9]
  persistent URL:
  - https://dblp.org/rec/conf/asru/WangTMZCZ23
Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang:
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition. ASRU 2023: 1-6
[c8]
  persistent URL:
  - https://dblp.org/rec/conf/asru/YangMZSNC23
Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen:
Fast-Hubert: an Efficient Training Framework for Self-Supervised Speech Representation Learning. ASRU 2023: 1-7
[c7]
  persistent URL:
  - https://dblp.org/rec/conf/icassp/ChenMLTLYC23
Qi Chen, Ziyang Ma, Tao Liu, Xu Tan, Qu Lu, Kai Yu, Xie Chen:
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation. ICASSP 2023: 1-5
[c6]
  persistent URL:
  - https://dblp.org/rec/conf/icassp/ChenMTWZ23
Xie Chen, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng:
Front-End Adapter: Adapting Front-End Input of Speech Based Self-Supervised Learning for Speech Recognition. ICASSP 2023: 1-5
[c5]
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/MaZTW023
Ziyang Ma, Zhisheng Zheng, Changli Tang, Yujin Wang, Xie Chen:
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets. INTERSPEECH 2023: 82-86
[c4]
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/LiangSMD0023
Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen:
Improving Code-Switching and Name Entity Recognition in ASR with Speech Editing based Data Augmentation. INTERSPEECH 2023: 919-923
[c3]
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/MaZY00023
Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang, Chao Zhang, Xie Chen:
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation. INTERSPEECH 2023: 1269-1273
[c2]
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/ZhengM0023
Zhisheng Zheng, Ziyang Ma, Yu Wang, Xie Chen:
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition. INTERSPEECH 2023: 3307-3311
[i16]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2302-09331
Xie Chen, Ziyang Ma, Changli Tang, Yujin Wang, Zhisheng Zheng:
Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition. CoRR abs/2302.09331 (2023)
[i15]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2303-05322
Qi Chen, Ziyang Ma, Tao Liu, Xu Tan, Qu Lu, Xie Chen, Kai Yu:
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation. CoRR abs/2303.05322 (2023)
[i14]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2306-08588
Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen:
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation. CoRR abs/2306.08588 (2023)
[i13]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2306-08920
Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang, Chao Zhang, Xie Chen:
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation. CoRR abs/2306.08920 (2023)
[i12]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2308-14814
Zhisheng Zheng, Ziyang Ma, Yu Wang, Xie Chen:
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition. CoRR abs/2308.14814 (2023)
[i11]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2309-05027
Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu:
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching. CoRR abs/2309.05027 (2023)
[i10]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2309-07377
Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen:
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS. CoRR abs/2309.07377 (2023)
[i9]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2309-10294
Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen:
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition. CoRR abs/2309.10294 (2023)
[i8]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2309-13860
Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen:
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning. CoRR abs/2309.13860 (2023)
[i7]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2310-04673
Jiaming Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang:
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT. CoRR abs/2310.04673 (2023)
[i6]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2312-08850
Fan Yu, Haoxu Wang, Ziyang Ma, Shiliang Zhang:
Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition. CoRR abs/2312.08850 (2023)
[i5]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2312-15185
Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen:
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation. CoRR abs/2312.15185 (2023)
2022
[i4]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2210-15631
Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang:
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition. CoRR abs/2210.15631 (2022)
[i3]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2211-07321
Ziyang Ma, Zhisheng Zheng, Changli Tang, Yujin Wang, Xie Chen:
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets. CoRR abs/2211.07321 (2022)
[i2]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2211-13443
Zhuoyuan Yao, Shuo Ren, Sanyuan Chen, Ziyang Ma, Pengcheng Guo, Lei Xie:
TESSP: Text-Enhanced Self-Supervised Speech Pre-training. CoRR abs/2211.13443 (2022)
2021
[c1]
  persistent URL:
  - https://dblp.org/rec/conf/mmasia/MaHSCN21
Ziyang Ma, Xianjing Han, Xuemeng Song, Yiran Cui, Liqiang Nie:
Hierarchical Deep Residual Reasoning for Temporal Moment Localization. MMAsia 2021: 15:1-15:7
[i1]
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-2111-00417
Ziyang Ma, Xianjing Han, Xuemeng Song, Yiran Cui, Liqiang Nie:
Hierarchical Deep Residual Reasoning for Temporal Moment Localization. CoRR abs/2111.00417 (2021)
