Wei-Ning Hsu
2020 – today
- 2024
- [j3] Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli: Scaling Speech Technology to 1,000+ Languages. J. Mach. Learn. Res. 25: 97:1-97:52 (2024)
- [c69] HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang: XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception. ACL (1) 2024: 12896-12911
- [c68] Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel: Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency. ICASSP Workshops 2024: 555-559
- [c67] Peng-Jen Chen, Bowen Shi, Kelvin Niu, Ann Lee, Wei-Ning Hsu: M2BART: Multilingual and Multimodal Encoder-Decoder Pre-Training for Any-to-Any Machine Translation. ICASSP 2024: 11896-11900
- [c66] Alexander H. Liu, Matthew Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu: Generative Pre-training for Speech with Flow Matching. ICLR 2024
- [c65] K. R. Prajwal, Bowen Shi, Matthew Le, Apoorv Vyas, Andros Tjandra, Mahi Luthra, Baishan Guo, Huiyu Wang, Triantafyllos Afouras, David Kant, Wei-Ning Hsu: MusicFlow: Cascaded Flow Matching for Text Guided Music Generation. ICML 2024
- [c64] Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria: Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization. ACM Multimedia 2024: 564-572
- [i71] HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang: XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception. CoRR abs/2403.14402 (2024)
- [i70] Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria: Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization. CoRR abs/2404.09956 (2024)
- [i69] Chung-Ming Chien, Andros Tjandra, Apoorv Vyas, Matt Le, Bowen Shi, Wei-Ning Hsu: Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning. CoRR abs/2406.06251 (2024)
- [i68] Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman: Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos. CoRR abs/2406.09272 (2024)
- [i67] Gaël Le Lan, Bowen Shi, Zhaoheng Ni, Sidd Srinivasan, Anurag Kumar, Brian Ellis, David Kant, Varun Nagaraja, Ernie Chang, Wei-Ning Hsu, Yangyang Shi, Vikas Chandra: High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching. CoRR abs/2407.03648 (2024)
- 2023
- [j2] Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoît Sagot, Abdelrahman Mohamed, Emmanuel Dupoux: Generative Spoken Dialogue Language Modeling. Trans. Assoc. Comput. Linguistics 11: 250-266 (2023)
- [c63] Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino, Wei-Ning Hsu, Ann Lee: Speech-to-Speech Translation for a Real-world Unwritten Language. ACL (Findings) 2023: 4969-4983
- [c62] Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang, Wei-Ning Hsu, Michael Auli, Juan Pino: Simple and Effective Unsupervised Speech Translation. ACL (1) 2023: 10771-10784
- [c61] Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli: Av-Data2Vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations. ASRU 2023: 1-8
- [c60] Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi: ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Regeneration. CVPR 2023: 18796-18806
- [c59] Ju-Chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli: Toward Joint Language Modeling for Speech Units and Text. EMNLP (Findings) 2023: 6582-6593
- [c58] Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed: Continual Learning for On-Device Speech Recognition Using Disentangled Conformers. ICASSP 2023: 1-5
- [c57] Ali Elkahky, Wei-Ning Hsu, Paden Tomasello, Tu Anh Nguyen, Robin Algayres, Yossi Adi, Jade Copet, Emmanuel Dupoux, Abdelrahman Mohamed: Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training? ICASSP 2023: 1-5
- [c56] Maryam Fazel-Zarandi, Wei-Ning Hsu: Cocktail Hubert: Generalized Self-Supervised Pre-Training for Mixture and Single-Source Speech. ICASSP 2023: 1-5
- [c55] Ramon Sanabria, Wei-Ning Hsu, Alexei Baevski, Michael Auli: Measuring the Impact of Domain Factors in Self-Supervised Pre-Training. ICASSP Workshops 2023: 1-5
- [c54] Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer: Scaling Laws for Generative Mixed-Modal Language Models. ICML 2023: 265-279
- [c53] Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli: Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language. ICML 2023: 1416-1429
- [c52] Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang: MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation. INTERSPEECH 2023: 4064-4068
- [c51] Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux: Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis. INTERSPEECH 2023: 4823-4827
- [c50] Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu: Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale. NeurIPS 2023
- [c49] Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass: DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning. NeurIPS 2023
- [i66] Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Abdelrahman Mohamed: Efficient Speech Representation Learning with Low-Bit Quantization. CoRR abs/2301.00652 (2023)
- [i65] Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer: Scaling Laws for Generative Mixed-Modal Language Models. CoRR abs/2301.03728 (2023)
- [i64] Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli: AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations. CoRR abs/2302.06419 (2023)
- [i63] Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang: MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation. CoRR abs/2303.00628 (2023)
- [i62] Maryam Fazel-Zarandi, Wei-Ning Hsu: Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech. CoRR abs/2303.11131 (2023)
- [i61] Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass: DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning. CoRR abs/2305.10005 (2023)
- [i60] Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli: Scaling Speech Technology to 1,000+ Languages. CoRR abs/2305.13516 (2023)
- [i59] Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu: Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale. CoRR abs/2306.15687 (2023)
- [i58] Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux: EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis. CoRR abs/2308.05725 (2023)
- [i57] Po-Chun Hsu, Ali Elkahky, Wei-Ning Hsu, Yossi Adi, Tu Anh Nguyen, Jade Copet, Emmanuel Dupoux, Hung-yi Lee, Abdelrahman Mohamed: Low-Resource Self-Supervised Learning with SSL-Enhanced TTS. CoRR abs/2309.17020 (2023)
- [i56] Ju-Chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli: Toward Joint Language Modeling for Speech Units and Text. CoRR abs/2310.08715 (2023)
- [i55] Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu: Generative Pre-training for Speech with Flow Matching. CoRR abs/2310.16338 (2023)
- [i54] Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel: Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency. CoRR abs/2311.02772 (2023)
- [i53] Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu: Audiobox: Unified Audio Generation with Natural Language Prompts. CoRR abs/2312.15821 (2023)
- 2022
- [c48] Yun Tang, Hongyu Gong, Ning Dong, Changhan Wang, Wei-Ning Hsu, Jiatao Gu, Alexei Baevski, Xian Li, Abdelrahman Mohamed, Michael Auli, Juan Miguel Pino: Unified Speech-Text Pre-training for Speech Translation and Recognition. ACL (1) 2022: 1488-1499
- [c47] Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu: Direct Speech-to-Speech Translation With Discrete Units. ACL (1) 2022: 3327-3339
- [c46] Eugene Kharitonov, Ann Lee, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu: Text-Free Prosody-Aware Generative Spoken Language Modeling. ACL (1) 2022: 8666-8681
- [c45] Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi: Textless Speech Emotion Conversion using Discrete & Decomposed Representations. EMNLP 2022: 11200-11214
- [c44] Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed: Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction. ICLR 2022
- [c43] Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli: data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language. ICML 2022: 1298-1312
- [c42] Alexander H. Liu, Cheng-I Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James R. Glass: Simple and Effective Unsupervised Speech Synthesis. INTERSPEECH 2022: 843-847
- [c41] Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed: Robust Self-Supervised Audio-Visual Speech Recognition. INTERSPEECH 2022: 2118-2122
- [c40] Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski: On-demand compute reduction with stochastic wav2vec 2.0. INTERSPEECH 2022: 3048-3052
- [c39] Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu: Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT. INTERSPEECH 2022: 4785-4789
- [c38] Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee: Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation. INTERSPEECH 2022: 5195-5199
- [c37] Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Yossi Adi, Juan Miguel Pino, Jiatao Gu, Wei-Ning Hsu: Textless Speech-to-Speech Translation on Real Data. NAACL-HLT 2022: 860-872
- [c36] Wei-Ning Hsu, Bowen Shi: u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality. NeurIPS 2022
- [c35] Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski: Towards End-to-End Unsupervised Speech Recognition. SLT 2022: 221-228
- [c34] Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed: Stop: A Dataset for Spoken Task Oriented Semantic Parsing. SLT 2022: 991-998
- [i52] Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed: Robust Self-Supervised Audio-Visual Speech Recognition. CoRR abs/2201.01763 (2022)
- [i51] Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed: Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction. CoRR abs/2201.02184 (2022)
- [i50] Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli: data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language. CoRR abs/2202.03555 (2022)
- [i49] Eugene Kharitonov, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Paden Tomasello, Ann Lee, Ali Elkahky, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi: textless-lib: a Library for Textless Spoken Language Processing. CoRR abs/2202.07359 (2022)
- [i48] Ramon Sanabria, Wei-Ning Hsu, Alexei Baevski, Michael Auli: Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training. CoRR abs/2203.00648 (2022)
- [i47] Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoît Sagot, Abdelrahman Mohamed, Emmanuel Dupoux: Generative Spoken Dialogue Language Modeling. CoRR abs/2203.16502 (2022)
- [i46] Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski: Towards End-to-end Unsupervised Speech Recognition. CoRR abs/2204.02492 (2022)
- [i45] Alexander H. Liu, Cheng-I Jeff Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James R. Glass: Simple and Effective Unsupervised Speech Synthesis. CoRR abs/2204.02524 (2022)
- [i44] Sravya Popuri, Peng-Jen Chen, Changhan Wang, Juan Pino, Yossi Adi, Jiatao Gu, Wei-Ning Hsu, Ann Lee: Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation. CoRR abs/2204.02967 (2022)
- [i43] Yun Tang, Hongyu Gong, Ning Dong, Changhan Wang, Wei-Ning Hsu, Jiatao Gu, Alexei Baevski, Xian Li, Abdelrahman Mohamed, Michael Auli, Juan Miguel Pino: Unified Speech-Text Pre-training for Speech Translation and Recognition. CoRR abs/2204.05409 (2022)
- [i42] Apoorv Vyas, Wei-Ning Hsu, Michael Auli, Alexei Baevski: On-demand compute reduction with stochastic wav2vec 2.0. CoRR abs/2204.11934 (2022)
- [i41] Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu: Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT. CoRR abs/2205.07180 (2022)
- [i40] Wei-Ning Hsu, Bowen Shi: A Single Self-Supervised Model for Many Speech Modalities Enables Zero-Shot Modality Transfer. CoRR abs/2207.07036 (2022)
- [i39] Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossef Mordechay, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed: STOP: A dataset for Spoken Task Oriented Semantic Parsing. CoRR abs/2207.10643 (2022)
- [i38] Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang, Wei-Ning Hsu, Michael Auli, Juan Pino: Simple and Effective Unsupervised Speech Translation. CoRR abs/2210.10191 (2022)
- [i37] Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Miguel Pino, Wei-Ning Hsu, Ann Lee: Speech-to-Speech Translation For A Real-world Unwritten Language. CoRR abs/2211.06474 (2022)
- [i36] Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed: Continual Learning for On-Device Speech Recognition using Disentangled Conformers. CoRR abs/2212.01393 (2022)
- [i35] Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli: Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language. CoRR abs/2212.07525 (2022)
- [i34] Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi: ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement. CoRR abs/2212.11377 (2022)
- 2021
- [j1] Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed: HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. IEEE ACM Trans. Audio Speech Lang. Process. 29: 3451-3460 (2021)
- [c33] Wei-Ning Hsu, David Harwath, Tyler Miller, Christopher Song, James R. Glass: Text-Free Image-to-Speech Synthesis Using Learned Segmental Units. ACL/IJCNLP (1) 2021: 5284-5300
- [c32] Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed: Kaizen: Continuously Improving Teacher Using Exponential Moving Average for Semi-Supervised Speech Recognition. ASRU 2021: 518-525
- [c31] Changhan Wang, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Pino: fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit. EMNLP (Demos) 2021: 143-152
- [c30] Wei-Ning Hsu, Yao-Hung Hubert Tsai, Benjamin Bolte, Ruslan Salakhutdinov, Abdelrahman Mohamed: Hubert: How Much Can a Bad Teacher Benefit ASR Pre-Training? ICASSP 2021: 6533-6537
- [c29] Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli: Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training. Interspeech 2021: 721-725
- [c28] Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux: Speech Resynthesis from Discrete Disentangled Self-Supervised Representations. Interspeech 2021: 3615-3619
- [c27] Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli: Unsupervised Speech Recognition. NeurIPS 2021: 27826-27839
- [c26] Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Y. Hannun: Semi-Supervised end-to-end Speech Recognition via Local Prior Matching. SLT 2021: 125-132
- [i33] Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu Anh Nguyen, Jade Copet, Alexei Baevski, Abdelrahman Mohamed, Emmanuel Dupoux: Generative Spoken Language Modeling from Raw Audio. CoRR abs/2102.01192 (2021)
- [i32] Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux: Speech Resynthesis from Discrete Disentangled Self-Supervised Representations. CoRR abs/2104.00355 (2021)
- [i31] Wei-Ning Hsu, Anuroop Sriram, Alexei Baevski, Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Jacob Kahn, Ann Lee, Ronan Collobert, Gabriel Synnaeve, Michael Auli: Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training. CoRR abs/2104.01027 (2021)
- [i30] Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli: Unsupervised Speech Recognition. CoRR abs/2105.11084 (2021)
- [i29] Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed: HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. CoRR abs/2106.07447 (2021)
- [i28] Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed: Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition. CoRR abs/2106.07759 (2021)
- [i27] Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Miguel Pino, Wei-Ning Hsu: Direct speech-to-speech translation with discrete units. CoRR abs/2107.05604 (2021)
- [i26] Eugene Kharitonov, Ann Lee, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu: Text-Free Prosody-Aware Generative Spoken Language Modeling. CoRR abs/2109.03264 (2021)
- [i25] Changhan Wang, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Miguel Pino: fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit. CoRR abs/2109.06912 (2021)
- [i24] Xutai Ma, Hongyu Gong, Danni Liu, Ann Lee, Yun Tang, Peng-Jen Chen, Wei-Ning Hsu, Kenneth Heafield, Phillip Koehn, Juan Miguel Pino: Direct simultaneous speech to speech translation. CoRR abs/2110.08250 (2021)
- [i23] Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi: Textless Speech Emotion Conversion using Decomposed and Discrete Representations. CoRR abs/2111.07402 (2021)
- [i22] Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Juan Miguel Pino, Jiatao Gu, Wei-Ning Hsu: Textless Speech-to-Speech Translation on Real Data. CoRR abs/2112.08352 (2021)
- 2020
- [c25] David Harwath, Wei-Ning Hsu, James R. Glass: Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech. ICLR 2020
- [c24] Michael Gump, Wei-Ning Hsu, James R. Glass: Unsupervised Methods for Evaluating Speech Representations. INTERSPEECH 2020: 170-174
- [c23] Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James R. Glass: A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning. INTERSPEECH 2020: 3790-3794
- [i21] Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Y. Hannun: Semi-Supervised Speech Recognition via Local Prior Matching. CoRR abs/2002.10336 (2020)
- [i20] Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James R. Glass: A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning. CoRR abs/2006.02547 (2020)
- [i19] Awni Y. Hannun, Vineel Pratap, Jacob Kahn, Wei-Ning Hsu: Differentiable Weighted Finite-State Transducers. CoRR abs/2010.01003 (2020)
- [i18] Wei-Ning Hsu, David Harwath, Christopher Song, James R. Glass: Text-Free Image-to-Speech Synthesis Using Learned Segmental Units. CoRR abs/2012.15454 (2020)
2010 – 2019
- 2019
- [c22] Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Yu-An Chung, Yuxuan Wang, Yonghui Wu, James R. Glass: Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization. ICASSP 2019: 5901-5905
- [c21] Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, R. J. Skerry-Ryan: Semi-supervised Training for Improving Data Efficiency in End-to-end Speech Synthesis. ICASSP 2019: 6940-6944
- [c20] Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang: Hierarchical Generative Modeling for Controllable Speech Synthesis. ICLR (Poster) 2019
- [c19] Yu-An Chung, Wei-Ning Hsu, Hao Tang, James R. Glass: An Unsupervised Autoregressive Model for Speech Representation Learning. INTERSPEECH 2019: 146-150
- [c18] Wei-Ning Hsu, David Harwath, James R. Glass: Transfer Learning from Audio-Visual Grounding to Speech Recognition. INTERSPEECH 2019: 3242-3246
- [i17] Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia Xu Chen, Ye Jia, Anjuli Kannan, Tara N. Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George F. Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel Bacchiani, Thomas B. Jablin, Robert Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon: Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling. CoRR abs/1902.08295 (2019)
- [i16] Yu-An Chung, Wei-Ning Hsu, Hao Tang, James R. Glass: An Unsupervised Autoregressive Model for Speech Representation Learning. CoRR abs/1904.03240 (2019)
- [i15] Wei-Ning Hsu, David F. Harwath, James R. Glass: Transfer Learning from Audio-Visual Grounding to Speech Recognition. CoRR abs/1907.04355 (2019)
- [i14] David Harwath, Wei-Ning Hsu, James R. Glass: Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech. CoRR abs/1911.09602 (2019)
- 2018
- [c17] Wei-Ning Hsu, James R. Glass: Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition. ICASSP 2018: 5614-5618
- [c16] Siqi Zheng, Jianzong Wang, Jing Xiao, Wei-Ning Hsu, James R. Glass: A Noise-Robust Self-Adaptive Multitarget Speaker Detection System. ICPR 2018: 1068-1072
- [c15] Wei-Ning Hsu, James R. Glass: Scalable Factorized Hierarchical Variational Autoencoder Training. INTERSPEECH 2018: 1462-1466
- [c14] Wei-Ning Hsu, Hao Tang, James R. Glass: Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition. INTERSPEECH 2018: 1576-1580
- [c13] Hao Tang, Wei-Ning Hsu, François Grondin, James R. Glass: A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition. INTERSPEECH 2018: 2928-2932
- [c12] Suwon Shon, Wei-Ning Hsu, James R. Glass: Unsupervised Representation Learning of Speech for Dialect Identification. SLT 2018: 105-111
- [i13] Wei-Ning Hsu, James R. Glass: Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition. CoRR abs/1803.02551 (2018)
- [i12] Wei-Ning Hsu, James R. Glass: Scalable Factorized Hierarchical Variational Autoencoder Training. CoRR abs/1804.03201 (2018)
- [i11] Wei-Ning Hsu, James R. Glass: Disentangling by Partitioning: A Representation Learning Framework for Multimodal Sensory Data. CoRR abs/1805.11264 (2018)
- [i10] Hao Tang, Wei-Ning Hsu, François Grondin, James R. Glass: A Study of Enhancement, Augmentation, and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition. CoRR abs/1806.04841 (2018)
- [i9] Wei-Ning Hsu, Hao Tang, James R. Glass: Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition. CoRR abs/1806.04872 (2018)
- [i8] Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, R. J. Skerry-Ryan: Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis. CoRR abs/1808.10128 (2018)
- [i7] Suwon Shon, Wei-Ning Hsu, James R. Glass: Unsupervised Representation Learning of Speech for Dialect Identification. CoRR abs/1809.04458 (2018)
- [i6] Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang: Hierarchical Generative Modeling for Controllable Speech Synthesis. CoRR abs/1810.07217 (2018)
- 2017
- [c11] Wei-Ning Hsu, Yu Zhang, James R. Glass: Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation. ASRU 2017: 16-23
- [c10] Maryam Najafian, Wei-Ning Hsu, Ahmed Ali, James R. Glass: Automatic speech recognition of Arabic multi-genre broadcast media. ASRU 2017: 353-359
- [c9] Wei-Ning Hsu, Yu Zhang, James R. Glass: Learning Latent Representations for Speech Generation and Transformation. INTERSPEECH 2017: 1273-1277
- [c8] Wei-Ning Hsu, Yu Zhang, James R. Glass: Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data. NIPS 2017: 1878-1889
- [i5] Wei-Ning Hsu, Yu Zhang, James R. Glass: Learning Latent Representations for Speech Generation and Transformation. CoRR abs/1704.04222 (2017)
- [i4] Wei-Ning Hsu, Yu Zhang, James R. Glass: Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation. CoRR abs/1707.06265 (2017)
- [i3] Wei-Ning Hsu, Yu Zhang, James R. Glass: Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data. CoRR abs/1709.07902 (2017)
- 2016
- [c7] Salvatore Romeo, Giovanni Da San Martino, Alberto Barrón-Cedeño, Alessandro Moschitti, Yonatan Belinkov, Wei-Ning Hsu, Yu Zhang, Mitra Mohtarami, James R. Glass: Neural Attention for Learning to Rank Questions in Community Question Answering. COLING 2016: 1734-1745
- [c6] Wei-Ning Hsu, Yu Zhang, Ann Lee, James R. Glass: Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition. INTERSPEECH 2016: 395-399
- [c5] Mitra Mohtarami, Yonatan Belinkov, Wei-Ning Hsu, Yu Zhang, Tao Lei, Kfir Bar, Scott Cyphers, James R. Glass: SLS at SemEval-2016 Task 3: Neural-based Approaches for Ranking in Community Question Answering. SemEval@NAACL-HLT 2016: 828-835
- [c4] Tuka Al Hanai, Wei-Ning Hsu, James R. Glass: Development of the MIT ASR system for the 2016 Arabic Multi-genre Broadcast Challenge. SLT 2016: 299-304
- [c3] Wei-Ning Hsu, Yu Zhang, James R. Glass: A prioritized grid long short-term memory RNN for speech recognition. SLT 2016: 467-473
- [i2] Wei-Ning Hsu, Yu Zhang, James R. Glass: Recurrent Neural Network Encoder with Attention for Community Question Answering. CoRR abs/1603.07044 (2016)
- 2015
- [c2] Wei-Ning Hsu, Hsuan-Tien Lin: Active Learning by Learning. AAAI 2015: 2659-2665
- [c1] Cheng-Tao Chung, Wei-Ning Hsu, Cheng-Yi Lee, Lin-Shan Lee: Enhancing automatically discovered multi-level acoustic patterns considering context consistency with applications in spoken term detection. ICASSP 2015: 5231-5235
- [i1] Cheng-Tao Chung, Wei-Ning Hsu, Cheng-Yi Lee, Lin-Shan Lee: Enhancing Automatically Discovered Multi-level Acoustic Patterns Considering Context Consistency With Applications in Spoken Term Detection. CoRR abs/1509.02217 (2015)
last updated on 2024-11-07 20:32 CET by the dblp team
all metadata released as open data under CC0 1.0 license