


Wei-Ning Hsu
2020 – today
- 2024
- [j3]Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli:
Scaling Speech Technology to 1,000+ Languages. J. Mach. Learn. Res. 25: 97:1-97:52 (2024) - [c70]HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang:
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception. ACL (1) 2024: 12896-12911 - [c69]Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman:
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos. ECCV (70) 2024: 277-295 - [c68]Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel:
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency. ICASSP Workshops 2024: 555-559 - [c67]Peng-Jen Chen, Bowen Shi, Kelvin Niu, Ann Lee, Wei-Ning Hsu:
M2BART: Multilingual and Multimodal Encoder-Decoder Pre-Training for Any-to-Any Machine Translation. ICASSP 2024: 11896-11900 - [c66]Alexander H. Liu, Matthew Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu:
Generative Pre-training for Speech with Flow Matching. ICLR 2024 - [c65]K. R. Prajwal, Bowen Shi, Matthew Le, Apoorv Vyas, Andros Tjandra, Mahi Luthra, Baishan Guo, Huiyu Wang, Triantafyllos Afouras, David Kant, Wei-Ning Hsu:
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation. ICML 2024 - [c64]Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria:
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization. ACM Multimedia 2024: 564-572 - [i74]HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang:
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception. CoRR abs/2403.14402 (2024) - [i73]Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria:
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization. CoRR abs/2404.09956 (2024) - [i72]Chung-Ming Chien, Andros Tjandra, Apoorv Vyas, Matt Le, Bowen Shi, Wei-Ning Hsu:
Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning. CoRR abs/2406.06251 (2024) - [i71]Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman:
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos. CoRR abs/2406.09272 (2024) - [i70]Gaël Le Lan, Bowen Shi, Zhaoheng Ni, Sidd Srinivasan, Anurag Kumar, Brian Ellis, David Kant, Varun Nagaraja, Ernie Chang, Wei-Ning Hsu, Yangyang Shi, Vikas Chandra:
High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching. CoRR abs/2407.03648 (2024) - [i69]Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, David Yan, Dhruv Choudhary, Dingkang Wang, Geet Sethi, Guan Pang, Haoyu Ma, Ishan Misra, Ji Hou, Jialiang Wang, Kiran Jagadeesh, Kunpeng Li, Luxin Zhang, Mannat Singh, Mary Williamson, Matt Le, Matthew Yu, Mitesh Kumar Singh, Peizhao Zhang, Peter Vajda, Quentin Duval, Rohit Girdhar, Roshan Sumbaly, Sai Saketh Rambhatla, Sam S. Tsai, Samaneh Azadi, Samyak Datta, Sanyuan Chen, Sean Bell, Sharadh Ramaswamy, Shelly Sheynin, Siddharth Bhattacharya, Simran Motwani, Tao Xu, Tianhe Li, Tingbo Hou, Wei-Ning Hsu, Xi Yin, Xiaoliang Dai, Yaniv Taigman, Yaqiao Luo, Yen-Cheng Liu, Yi-Chiao Wu, Yue Zhao, Yuval Kirstain, Zecheng He, Zijian He, Albert Pumarola, Ali K. Thabet, Artsiom Sanakoyeu, Arun Mallya, Baishan Guo, Boris Araya, Breena Kerr, Carleigh Wood, Ce Liu, Cen Peng, Dmitry Vengertsev, Edgar Schönfeld, Elliot Blanchard, Felix Juefei-Xu, Fraylie Nord, Jeff Liang, John Hoffman, Jonas Kohler, Kaolin Fire, Karthik Sivakumar, Lawrence Chen, Licheng Yu, Luya Gao, Markos Georgopoulos, Rashel Moritz, Sara K. Sampson, Shikai Li, Simone Parmeggiani, Steve Fine, Tara Fowler, Vladan Petrovic, Yuming Du:
Movie Gen: A Cast of Media Foundation Models. CoRR abs/2410.13720 (2024) - [i68]K. R. Prajwal, Bowen Shi, Matthew Le, Apoorv Vyas, Andros Tjandra, Mahi Luthra, Baishan Guo, Huiyu Wang, Triantafyllos Afouras, David Kant, Wei-Ning Hsu:
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation. CoRR abs/2410.20478 (2024) - [i67]Mu Yang, Bowen Shi, Matthew Le, Wei-Ning Hsu, Andros Tjandra:
Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation. CoRR abs/2411.05141 (2024) - 2023
- [j2]Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoît Sagot, Abdelrahman Mohamed, Emmanuel Dupoux:
Generative Spoken Dialogue Language Modeling. Trans. Assoc. Comput. Linguistics 11: 250-266 (2023) - [c63]Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino, Wei-Ning Hsu, Ann Lee:
Speech-to-Speech Translation for a Real-world Unwritten Language. ACL (Findings) 2023: 4969-4983 - [c62]Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang, Wei-Ning Hsu, Michael Auli, Juan Pino:
Simple and Effective Unsupervised Speech Translation. ACL (1) 2023: 10771-10784 - [c61]Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli:
Av-Data2Vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations. ASRU 2023: 1-8 - [c60]Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi:
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Regeneration. CVPR 2023: 18796-18806 - [c59]Ju-Chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli:
Toward Joint Language Modeling for Speech Units and Text. EMNLP (Findings) 2023: 6582-6593 - [c58]Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed:
Continual Learning for On-Device Speech Recognition Using Disentangled Conformers. ICASSP 2023: 1-5 - [c57]Ali Elkahky, Wei-Ning Hsu, Paden Tomasello, Tu Anh Nguyen, Robin Algayres, Yossi Adi, Jade Copet, Emmanuel Dupoux, Abdelrahman Mohamed:
Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training? ICASSP 2023: 1-5 - [c56]Maryam Fazel-Zarandi, Wei-Ning Hsu:
Cocktail Hubert: Generalized Self-Supervised Pre-Training for Mixture and Single-Source Speech. ICASSP 2023: 1-5 - [c55]Ramon Sanabria, Wei-Ning Hsu, Alexei Baevski, Michael Auli:
Measuring the Impact of Domain Factors in Self-Supervised Pre-Training. ICASSP Workshops 2023: 1-5 - [c54]Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer:
Scaling Laws for Generative Mixed-Modal Language Models. ICML 2023: 265-279 - [c53]Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli:
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language. ICML 2023: 1416-1429 - [c52]Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang:
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation. INTERSPEECH 2023: 4064-4068 - [c51]Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux:
Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis. INTERSPEECH 2023: 4823-4827 - [c50]Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu:
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale. NeurIPS 2023 - [c49]Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass:
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning. NeurIPS 2023 - [i66]Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Abdelrahman Mohamed:
Efficient Speech Representation Learning with Low-Bit Quantization. CoRR abs/2301.00652 (2023) - [i65]Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer:
Scaling Laws for Generative Mixed-Modal Language Models. CoRR abs/2301.03728 (2023) - [i64]Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli:
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations. CoRR abs/2302.06419 (2023) - [i63]Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang:
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation. CoRR abs/2303.00628 (2023) - [i62]Maryam Fazel-Zarandi, Wei-Ning Hsu:
Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech. CoRR abs/2303.11131 (2023) - [i61]Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass:
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning. CoRR abs/2305.10005 (2023) - [i60]Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli:
Scaling Speech Technology to 1,000+ Languages. CoRR abs/2305.13516 (2023) - [i59]Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu:
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale. CoRR abs/2306.15687 (2023) - [i58]Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux:
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis. CoRR abs/2308.05725 (2023) - [i57]Po-Chun Hsu, Ali Elkahky, Wei-Ning Hsu, Yossi Adi, Tu Anh Nguyen, Jade Copet, Emmanuel Dupoux, Hung-yi Lee, Abdelrahman Mohamed:
Low-Resource Self-Supervised Learning with SSL-Enhanced TTS. CoRR abs/2309.17020 (2023) - [i56]Ju-Chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli:
Toward Joint Language Modeling for Speech Units and Text. CoRR abs/2310.08715 (2023) - [i55]Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu:
Generative Pre-training for Speech with Flow Matching. CoRR abs/2310.16338 (2023) - [i54]Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel:
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency. CoRR abs/2311.02772 (2023) - [i53]Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu:
Audiobox: Unified Audio Generation with Natural Language Prompts. CoRR abs/2312.15821 (2023) - 2022
- [c48]Yun Tang, Hongyu Gong, Ning Dong, Changhan Wang, Wei-Ning Hsu, Jiatao Gu, Alexei Baevski, Xian Li, Abdelrahman Mohamed, Michael Auli, Juan Miguel Pino:
Unified Speech-Text Pre-training for Speech Translation and Recognition. ACL (1) 2022: 1488-1499 - [c47]Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu:
Direct Speech-to-Speech Translation With Discrete Units. ACL (1) 2022: 3327-3339 - [c46]Eugene Kharitonov, Ann Lee, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu:
Text-Free Prosody-Aware Generative Spoken Language Modeling. ACL (1) 2022: 8666-8681 - [c45]Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi:
Textless Speech Emotion Conversion using Discrete & Decomposed Representations. EMNLP 2022: 11200-11214 - [c44]Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed:
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction. ICLR 2022 - [c43]Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli:
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language. ICML 2022: 1298-1312 - [c42]Alexander H. Liu, Cheng-I Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James R. Glass:
Simple and Effective Unsupervised Speech Synthesis. INTERSPEECH 2022: 843-847 - [c41]Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed:
Robust Self-Supervised Audio-Visual Speech Recognition. INTERSPEECH 2022: 2118-2122 - [c40]Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski:
On-demand compute reduction with stochastic wav2vec 2.0. INTERSPEECH 2022: 3048-3052 - [c39]Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu:
Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT. INTERSPEECH 2022: 4785-4789 - [c38]Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee:
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation. INTERSPEECH 2022: 5195-5199 - [c37]Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Yossi Adi, Juan Miguel Pino, Jiatao Gu, Wei-Ning Hsu:
Textless Speech-to-Speech Translation on Real Data. NAACL-HLT 2022: 860-872 - [c36]Wei-Ning Hsu, Bowen Shi:
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality. NeurIPS 2022 - [c35]Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski:
Towards End-to-End Unsupervised Speech Recognition. SLT 2022: 221-228 - [c34]Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed:
Stop: A Dataset for Spoken Task Oriented Semantic Parsing. SLT 2022: 991-998 - [i52]Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed:
Robust Self-Supervised Audio-Visual Speech Recognition. CoRR abs/2201.01763 (2022) - [i51]Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed:
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction. CoRR abs/2201.02184 (2022) - [i50]Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli:
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language. CoRR abs/2202.03555 (2022) - [i49]Eugene Kharitonov, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Paden Tomasello, Ann Lee, Ali Elkahky, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi:
textless-lib: a Library for Textless Spoken Language Processing. CoRR abs/2202.07359 (2022) - [i48]Ramon Sanabria, Wei-Ning Hsu, Alexei Baevski, Michael Auli:
Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training. CoRR abs/2203.00648 (2022) - [i47]Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoît Sagot, Abdelrahman Mohamed, Emmanuel Dupoux:
Generative Spoken Dialogue Language Modeling. CoRR abs/2203.16502 (2022) - [i46]Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski:
Towards End-to-end Unsupervised Speech Recognition. CoRR abs/2204.02492 (2022) - [i45]Alexander H. Liu, Cheng-I Jeff Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James R. Glass:
Simple and Effective Unsupervised Speech Synthesis. CoRR abs/2204.02524 (2022) - [i44]Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee:
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation. CoRR abs/2204.02967 (2022) - [i43]Yun Tang, Hongyu Gong, Ning Dong, Changhan Wang, Wei-Ning Hsu, Jiatao Gu, Alexei Baevski, Xian Li, Abdelrahman Mohamed, Michael Auli, Juan Miguel Pino:
Unified Speech-Text Pre-training for Speech Translation and Recognition. CoRR abs/2204.05409 (2022) - [i42]Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski:
On-demand compute reduction with stochastic wav2vec 2.0. CoRR abs/2204.11934 (2022) - [i41]Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu:
Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT. CoRR abs/2205.07180 (2022) - [i40]Wei-Ning Hsu, Bowen Shi:
A Single Self-Supervised Model for Many Speech Modalities Enables Zero-Shot Modality Transfer. CoRR abs/2207.07036 (2022) - [i39]Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossef Mordechay, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed:
STOP: A dataset for Spoken Task Oriented Semantic Parsing. CoRR abs/2207.10643 (2022) - [i38]Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang, Wei-Ning Hsu, Michael Auli, Juan Pino:
Simple and Effective Unsupervised Speech Translation. CoRR abs/2210.10191 (2022) - [i37]Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Miguel Pino, Wei-Ning Hsu, Ann Lee:
Speech-to-Speech Translation For A Real-world Unwritten Language. CoRR abs/2211.06474 (2022) - [i36]Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed:
Continual Learning for On-Device Speech Recognition using Disentangled Conformers. CoRR abs/2212.01393 (2022) - [i35]Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli:
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language. CoRR abs/2212.07525 (2022) - [i34]Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi:
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement. CoRR abs/2212.11377 (2022) - 2021
- [j1]Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed:
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. IEEE ACM Trans. Audio Speech Lang. Process. 29: 3451-3460 (2021) - [c33]Wei-Ning Hsu, David Harwath, Tyler Miller, Christopher Song, James R. Glass:
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units. ACL/IJCNLP (1) 2021: 5284-5300 - [c32]Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed:
Kaizen: Continuously Improving Teacher Using Exponential Moving Average for Semi-Supervised Speech Recognition. ASRU 2021: 518-525 - [c31]Changhan Wang, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Pino:
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit. EMNLP (Demos) 2021: 143-152 - [c30]Wei-Ning Hsu, Yao-Hung Hubert Tsai, Benjamin Bolte, Ruslan Salakhutdinov, Abdelrahman Mohamed:
Hubert: How Much Can a Bad Teacher Benefit ASR Pre-Training? ICASSP 2021: 6533-6537 - [c29]Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli:
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training. Interspeech 2021: 721-725 - [c28]Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux:
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations. Interspeech 2021: 3615-3619 - [c27]Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli:
Unsupervised Speech Recognition. NeurIPS 2021: 27826-27839 - [c26]Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Y. Hannun:
Semi-Supervised end-to-end Speech Recognition via Local Prior Matching. SLT 2021: 125-132 - [i33]Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu Anh Nguyen, Jade Copet, Alexei Baevski, Adelrahman Mohamed, Emmanuel Dupoux:
Generative Spoken Language Modeling from Raw Audio. CoRR abs/2102.01192 (2021) - [i32]Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux:
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations. CoRR abs/2104.00355 (2021) - [i31]Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli:
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training. CoRR abs/2104.01027 (2021) - [i30]Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli:
Unsupervised Speech Recognition. CoRR abs/2105.11084 (2021) - [i29]Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed:
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. CoRR abs/2106.07447 (2021) - [i28]Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed:
Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition. CoRR abs/2106.07759 (2021) - [i27]Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Miguel Pino, Wei-Ning Hsu:
Direct speech-to-speech translation with discrete units. CoRR abs/2107.05604 (2021) - [i26]Eugene Kharitonov, Ann Lee, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu:
Text-Free Prosody-Aware Generative Spoken Language Modeling. CoRR abs/2109.03264 (2021) - [i25]Changhan Wang, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Miguel Pino:
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit. CoRR abs/2109.06912 (2021) - [i24]Xutai Ma, Hongyu Gong, Danni Liu, Ann Lee, Yun Tang, Peng-Jen Chen, Wei-Ning Hsu, Kenneth Heafield, Phillip Koehn, Juan Miguel Pino:
Direct simultaneous speech to speech translation. CoRR abs/2110.08250 (2021) - [i23]Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi:
Textless Speech Emotion Conversion using Decomposed and Discrete Representations. CoRR abs/2111.07402 (2021) - [i22]Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Juan Miguel Pino, Jiatao Gu, Wei-Ning Hsu:
Textless Speech-to-Speech Translation on Real Data. CoRR abs/2112.08352 (2021) - 2020
- [c25]David Harwath, Wei-Ning Hsu, James R. Glass:
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech. ICLR 2020 - [c24]Michael Gump, Wei-Ning Hsu, James R. Glass:
Unsupervised Methods for Evaluating Speech Representations. INTERSPEECH 2020: 170-174 - [c23]Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James R. Glass:
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning. INTERSPEECH 2020: 3790-3794 - [i21]Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Y. Hannun:
Semi-Supervised Speech Recognition via Local Prior Matching. CoRR abs/2002.10336 (2020) - [i20]Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James R. Glass:
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning. CoRR abs/2006.02547 (2020) - [i19]Awni Y. Hannun, Vineel Pratap, Jacob Kahn, Wei-Ning Hsu:
Differentiable Weighted Finite-State Transducers. CoRR abs/2010.01003 (2020) - [i18]Wei-Ning Hsu, David Harwath, Christopher Song, James R. Glass:
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units. CoRR abs/2012.15454 (2020)
2010 – 2019
- 2019
- [c22]Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Yu-An Chung, Yuxuan Wang, Yonghui Wu, James R. Glass:
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization. ICASSP 2019: 5901-5905 - [c21]Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, R. J. Skerry-Ryan:
Semi-supervised Training for Improving Data Efficiency in End-to-end Speech Synthesis. ICASSP 2019: 6940-6944 - [c20]Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang:
Hierarchical Generative Modeling for Controllable Speech Synthesis. ICLR (Poster) 2019 - [c19]