Rohit Prabhavalkar
2020 – today
- 2024
- [j2]Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. IEEE ACM Trans. Audio Speech Lang. Process. 32: 325-351 (2024) - [c73]Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal:
USM-Lite: Quantization and Sparsity Aware Fine-Tuning for Speech Recognition with Universal Speech Models. ICASSP 2024: 10756-10760 - [c72]Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno:
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models. ICASSP 2024: 11816-11820 - [c71]Zelin Wu, Gan Song, Christopher Li, Pat Rondon, Zhong Meng, Xavier Velez, Weiran Wang, Diamantino Caseiro, Golan Pundak, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar:
Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR. NAACL (Industry Track) 2024: 315-323 - [c70]Weiran Wang, Rohit Prabhavalkar, Haozhe Shan, Zhong Meng, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Chengjian Zheng, Yanzhang He, Tara N. Sainath, Pedro Moreno Mengibar:
Massive End-to-end Speech Recognition Models with Time Reduction. NAACL-HLT 2024: 6206-6217 - [i58]Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno:
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models. CoRR abs/2402.17184 (2024) - [i57]Zelin Wu, Gan Song, Christopher Li, Pat Rondon, Zhong Meng, Xavier Velez, Weiran Wang, Diamantino Caseiro, Golan Pundak, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar:
Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR. CoRR abs/2404.10180 (2024) - [i56]Lun Wang, Om Thakkar, Zhong Meng, Nicole Rafidi, Rohit Prabhavalkar, Arun Narayanan:
Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping. CoRR abs/2406.02004 (2024) - [i55]Zhong Meng, Zelin Wu, Rohit Prabhavalkar, Cal Peyser, Weiran Wang, Nanxin Chen, Tara N. Sainath, Bhuvana Ramabhadran:
Text Injection for Neural Contextual Biasing. CoRR abs/2406.02921 (2024) - 2023
- [c69]Guru Prakash Arumugam, Shuo-Yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, Quan Wang, Shaan Bijwadia:
Improved Long-Form Speech Recognition By Jointly Modeling The Primary And Non-Primary Speakers. ASRU 2023: 1-8 - [c68]Xingyu Cai, David Qiu, Shaojin Ding, Dongseong Hwang, Weiran Wang, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He:
Efficient Cascaded Streaming ASR System Via Frame Rate Reduction. ASRU 2023: 1-8 - [c67]Lillian Zhou, Yuxin Ding, Mingqing Chen, Harry Zhang, Rohit Prabhavalkar, Dhruv Guliani, Giovanni Motta, Rajiv Mathews:
The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections Through Federated Learning. ASRU 2023: 1-7 - [c66]Rami Botros, Rohit Prabhavalkar, Johan Schalkwyk, Ciprian Chelba, Tara N. Sainath, Françoise Beaufays:
Lego-Features: Exporting Modular Encoder Features for Streaming and Deliberation ASR. ICASSP 2023: 1-5 - [c65]Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw:
Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models. ICASSP 2023: 1-5 - [c64]W. Ronny Huang, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, David Rybach, Robert David, Rohit Prabhavalkar, Cyril Allauzen, Cal Peyser, Trevor D. Strohman:
E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model. ICASSP 2023: 1-5 - [c63]Soheil Khorram, Anshuman Tripathi, Jaeyoung Kim, Han Lu, Qian Zhang, Rohit Prabhavalkar, Hasim Sak:
Cross-Training: A Semi-Supervised Training Scheme for Speech Recognition. ICASSP 2023: 1-5 - [c62]Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang, Bo Li, Andrew Rosenberg, Bhuvana Ramabhadran:
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition. ICASSP 2023: 1-5 - [c61]Cal Peyser, Michael Picheny, Kyunghyun Cho, Rohit Prabhavalkar, W. Ronny Huang, Tara N. Sainath:
A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale. ICASSP 2023: 1-5 - [c60]Tara N. Sainath, Rohit Prabhavalkar, Diamantino Caseiro, Pat Rondon, Cyril Allauzen:
Improving Contextual Biasing with Text Injection. ICASSP 2023: 1-5 - [c59]Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman:
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition. ICASSP 2023: 1-5 - [c58]Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath:
How to Estimate Model Transferability of Pre-Trained Speech Models? INTERSPEECH 2023: 456-460 - [c57]Cal Peyser, Zhong Meng, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho, Ke Hu:
Improving Joint Speech-Text Representations Without Alignment. INTERSPEECH 2023: 1354-1358 - [i54]Cal Peyser, W. Ronny Huang, Tara N. Sainath, Rohit Prabhavalkar, Michael Picheny, Kyunghyun Cho:
Dual Learning for Large Vocabulary On-Device ASR. CoRR abs/2301.04327 (2023) - [i53]Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman:
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition. CoRR abs/2301.07851 (2023) - [i52]Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang, Bo Li, Andrew Rosenberg, Bhuvana Ramabhadran:
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition. CoRR abs/2302.08583 (2023) - [i51]Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara N. Sainath, Pedro J. Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu:
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages. CoRR abs/2303.01037 (2023) - [i50]Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. CoRR abs/2303.03329 (2023) - [i49]Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw:
Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models. CoRR abs/2303.08343 (2023) - [i48]Rami Botros, Rohit Prabhavalkar, Johan Schalkwyk, Ciprian Chelba, Tara N. Sainath, Françoise Beaufays:
Lego-Features: Exporting modular encoder features for streaming and deliberation ASR. CoRR abs/2304.00173 (2023) - [i47]Cal Peyser, Michael Picheny, Kyunghyun Cho, Rohit Prabhavalkar, W. Ronny Huang, Tara N. Sainath:
A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale. CoRR abs/2304.11053 (2023) - [i46]Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath:
How to Estimate Model Transferability of Pre-Trained Speech Models? CoRR abs/2306.01015 (2023) - [i45]Cal Peyser, Zhong Meng, Ke Hu, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho:
Improving Joint Speech-Text Representations Without Alignment. CoRR abs/2308.06125 (2023) - [i44]Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara N. Sainath, Pedro Moreno Mengibar:
Massive End-to-end Models for Short Search Queries. CoRR abs/2309.12963 (2023) - [i43]Lillian Zhou, Yuxin Ding, Mingqing Chen, Harry Zhang, Rohit Prabhavalkar, Dhruv Guliani, Giovanni Motta, Rajiv Mathews:
The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning. CoRR abs/2310.00141 (2023) - [i42]Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng, Ding Zhao, Tara N. Sainath, Pedro Moreno Mengibar:
Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm. CoRR abs/2310.00178 (2023) - [i41]Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Shivani Agrawal, Zhonglin Han, Jian Li, Amir Yazdanbakhsh:
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models. CoRR abs/2312.08553 (2023) - [i40]Guru Prakash Arumugam, Shuo-Yiin Chang, Tara N. Sainath, Rohit Prabhavalkar, Quan Wang, Shaan Bijwadia:
Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers. CoRR abs/2312.11123 (2023) - 2022
- [c56]Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu, Bo Wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer:
Neural-FST Class Language Model for End-to-End Speech Recognition. ICASSP 2022: 6107-6111 - [c55]Tara N. Sainath, Yanzhang He, Arun Narayanan, Rami Botros, Weiran Wang, David Qiu, Chung-Cheng Chiu, Rohit Prabhavalkar, Alexander Gruenstein, Anmol Gulati, Bo Li, David Rybach, Emmanuel Guzman, Ian McGraw, James Qin, Krzysztof Choromanski, Qiao Liang, Robert David, Ruoming Pang, Shuo-Yiin Chang, Trevor Strohman, W. Ronny Huang, Wei Han, Yonghui Wu, Yu Zhang:
Improving The Latency And Quality Of Cascaded Encoders. ICASSP 2022: 8112-8116 - [c54]Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach:
Improving Rare Word Recognition with LM-aware MWER Training. INTERSPEECH 2022: 1031-1035 - [c53]Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman:
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes. INTERSPEECH 2022: 1706-1710 - [c52]Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang:
Improving Deliberation by Text-Only and Semi-Supervised Training. INTERSPEECH 2022: 4940-4944 - [c51]W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Tara N. Sainath, Rohit Prabhavalkar, Cal Peyser, Zhiyun Lu, Cyril Allauzen:
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR. INTERSPEECH 2022: 4995-4999 - [c50]Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang, Zhouyuan Huo, Zhehuai Chen, Bo Li, Weiran Wang, Trevor Strohman:
JOIST: A Joint Speech and Text Streaming Model for ASR. SLT 2022: 52-59 - [c49]Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno:
Modular Hybrid Autoregressive Transducer. SLT 2022: 197-204 - [c48]Cal Peyser, W. Ronny Huang, Tara N. Sainath, Rohit Prabhavalkar, Michael Picheny, Kyunghyun Cho:
Dual Learning for Large Vocabulary On-Device ASR. SLT 2022: 245-251 - [i39]Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu, Bo Wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer:
Neural-FST Class Language Model for End-to-End Speech Recognition. CoRR abs/2201.11867 (2022) - [i38]Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman:
A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes. CoRR abs/2204.06164 (2022) - [i37]Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, W. Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach:
Improving Rare Word Recognition with LM-aware MWER Training. CoRR abs/2204.07553 (2022) - [i36]W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Rohit Prabhavalkar, Tara N. Sainath, Cyril Allauzen, Cal Peyser, Zhiyun Lu:
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR. CoRR abs/2204.10749 (2022) - [i35]Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang:
Improving Deliberation by Text-Only and Semi-Supervised Training. CoRR abs/2206.14716 (2022) - [i34]Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang, Zhouyuan Huo, Zhehuai Chen, Bo Li, Weiran Wang, Trevor Strohman:
JOIST: A Joint Speech and Text Streaming Model For ASR. CoRR abs/2210.07353 (2022) - [i33]Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno:
Modular Hybrid Autoregressive Transducer. CoRR abs/2210.17049 (2022) - [i32]W. Ronny Huang, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, David Rybach, Robert David, Rohit Prabhavalkar, Cyril Allauzen, Cal Peyser, Trevor D. Strohman:
E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model. CoRR abs/2211.15432 (2022) - 2021
- [c47]Arun Narayanan, Tara N. Sainath, Ruoming Pang, Jiahui Yu, Chung-Cheng Chiu, Rohit Prabhavalkar, Ehsan Variani, Trevor Strohman:
Cascaded Encoders for Unifying Streaming and Non-Streaming ASR. ICASSP 2021: 5629-5633 - [c46]Rohit Prabhavalkar, Yanzhang He, David Rybach, Sean Campbell, Arun Narayanan, Trevor Strohman, Tara N. Sainath:
Less is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging. ICASSP 2021: 5659-5663 - [c45]David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara N. Sainath, Ian McGraw:
Learning Word-Level Confidence for Subword End-To-End ASR. ICASSP 2021: 6393-6397 - [c44]Nathan Howard, Alex Park, Turaj Zakizadeh Shabestary, Alexander Gruenstein, Rohit Prabhavalkar:
A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer and Large Scale Synthetic Data. ICASSP 2021: 7128-7132 - [c43]Daria Soboleva, Ondrej Skopek, Márius Sajgalík, Victor Carbune, Felix Weissenberger, Julia Proskurnia, Bogdan Prisacari, Daniel Valcarce, Justin Lu, Rohit Prabhavalkar, Balint Miklos:
Replacing Human Audio with Synthetic Audio for on-Device Unspoken Punctuation Prediction. ICASSP 2021: 7653-7657 - [c42]Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer:
Dynamic Encoder Transducer: A Flexible Solution for Trading Off Accuracy for Latency. Interspeech 2021: 2042-2046 - [c41]Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer:
Dissecting User-Perceived Latency of On-Device E2E Speech Recognition. Interspeech 2021: 4553-4557 - [c40]Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu:
RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions. SLT 2021: 873-880 - [i31]David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara N. Sainath, Ian McGraw:
Learning Word-Level Confidence For Subword End-to-End ASR. CoRR abs/2103.06716 (2021) - [i30]Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer:
Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency. CoRR abs/2104.02176 (2021) - [i29]Yuan Shangguan, Rohit Prabhavalkar, Hang Su, Jay Mahadeokar, Yangyang Shi, Jiatong Zhou, Chunyang Wu, Duc Le, Ozlem Kalinli, Christian Fuegen, Michael L. Seltzer:
Dissecting User-Perceived Latency of On-Device E2E Speech Recognition. CoRR abs/2104.02207 (2021) - [i28]Nathan Howard, Alex Park, Turaj Zakizadeh Shabestary, Alexander Gruenstein, Rohit Prabhavalkar:
A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer And Large Scale Synthetic Data. CoRR abs/2106.00856 (2021) - [i27]Zhiyun Lu, Yanwei Pan, Thibault Doutre, Liangliang Cao, Rohit Prabhavalkar, Chao Zhang, Trevor Strohman:
Input Length Matters: An Empirical Study Of RNN-T And MWER Training For Long-form Telephony Speech Recognition. CoRR abs/2110.03841 (2021) - 2020
- [c39]Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang, Ding Zhao:
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency. ICASSP 2020: 6059-6063 - [c38]Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prabhavalkar:
Deliberation Model Based Two-Pass End-To-End Speech Recognition. ICASSP 2020: 7799-7803 - [c37]Antoine Bruguier, Ananya Misra, Arun Narayanan, Rohit Prabhavalkar:
Anti-Aliasing Regularization in Stacking Layers. INTERSPEECH 2020: 314-318 - [i26]Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prabhavalkar:
Deliberation Model Based Two-Pass End-to-End Speech Recognition. CoRR abs/2003.07962 (2020) - [i25]Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alexander Gruenstein, Ke Hu, Minho Jin, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirkó Visontai, Yonghui Wu, Yu Zhang, Ding Zhao:
A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency. CoRR abs/2003.12710 (2020) - [i24]Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu:
RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions. CoRR abs/2005.03271 (2020) - [i23]Daria Soboleva, Ondrej Skopek, Márius Sajgalík, Victor Carbune, Felix Weissenberger, Julia Proskurnia, Bogdan Prisacari, Daniel Valcarce, Justin Lu, Rohit Prabhavalkar, Balint Miklos:
Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction. CoRR abs/2010.10203 (2020) - [i22]Arun Narayanan, Tara N. Sainath, Ruoming Pang, Jiahui Yu, Chung-Cheng Chiu, Rohit Prabhavalkar, Ehsan Variani, Trevor Strohman:
Cascaded encoders for unifying streaming and non-streaming ASR. CoRR abs/2010.14606 (2020) - [i21]Rohit Prabhavalkar, Yanzhang He, David Rybach, Sean Campbell, Arun Narayanan, Trevor Strohman, Tara N. Sainath:
Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging. CoRR abs/2012.06749 (2020)
2010 – 2019
- 2019
- [c36]Chung-Cheng Chiu, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara N. Sainath, Yonghui Wu, Wei Han, Yu Zhang, Ruoming Pang, Sergey Kishchenko, Patrick Nguyen, Arun Narayanan, Hank Liao, Shuyuan Zhang:
A Comparison of End-to-End Models for Long-Form Speech Recognition. ASRU 2019: 889-896 - [c35]Arun Narayanan, Rohit Prabhavalkar, Chung-Cheng Chiu, David Rybach, Tara N. Sainath, Trevor Strohman:
Recognizing Long-Form Speech Using Streaming End-to-End Models. ASRU 2019: 920-927 - [c34]Shuo-Yiin Chang, Rohit Prabhavalkar, Yanzhang He, Tara N. Sainath, Gabor Simko:
Joint Endpointing and Decoding with End-to-end Models. ICASSP 2019: 5626-5630 - [c33]Antoine Bruguier, Rohit Prabhavalkar, Golan Pundak, Tara N. Sainath:
Phoebe: Pronunciation-aware Contextualization for End-to-end Speech Recognition. ICASSP 2019: 6171-6175 - [c32]Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein:
Streaming End-to-end Speech Recognition for Mobile Devices. ICASSP 2019: 6381-6385 - [c31]Ke Hu, Antoine Bruguier, Tara N. Sainath, Rohit Prabhavalkar, Golan Pundak:
Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models. INTERSPEECH 2019: 2155-2159 - [c30]Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian McGraw, Chung-Cheng Chiu:
Two-Pass End-to-End Speech Recognition. INTERSPEECH 2019: 2773-2777 - [c29]Kazuki Irie, Rohit Prabhavalkar, Anjuli Kannan, Antoine Bruguier, David Rybach, Patrick Nguyen:
On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition. INTERSPEECH 2019: 3800-3804 - [i20]Kazuki Irie, Rohit Prabhavalkar, Anjuli Kannan, Antoine Bruguier, David Rybach, Patrick Nguyen:
Model Unit Exploration for Sequence-to-Sequence Speech Recognition. CoRR abs/1902.01955 (2019) - [i19]Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia Xu Chen, Ye Jia, Anjuli Kannan, Tara N. Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George F. Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel Bacchiani, Thomas B. Jablin, Robert Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon:
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling. CoRR abs/1902.08295 (2019) - [i18]Ke Hu, Antoine Bruguier, Tara N. Sainath, Rohit Prabhavalkar, Golan Pundak:
Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models. CoRR abs/1906.09292 (2019) - [i17]Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian McGraw, Chung-Cheng Chiu:
Two-Pass End-to-End Speech Recognition. CoRR abs/1908.10992 (2019) - [i16]Arun Narayanan, Rohit Prabhavalkar, Chung-Cheng Chiu, David Rybach, Tara N. Sainath, Trevor Strohman:
Recognizing long-form speech using streaming end-to-end models. CoRR abs/1910.11455 (2019) - [i15]Chung-Cheng Chiu, Wei Han, Yu Zhang, Ruoming Pang, Sergey Kishchenko, Patrick Nguyen, Arun Narayanan, Hank Liao, Shuyuan Zhang, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara N. Sainath, Yonghui Wu:
A comparison of end-to-end models for long-form speech recognition. CoRR abs/1911.02242 (2019) - 2018
- [c28]Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani:
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models. ICASSP 2018: 4774-4778 - [c27]Rohit Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Kannan:
Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models. ICASSP 2018: 4839-4843 - [c26]Chris Donahue, Bo Li, Rohit Prabhavalkar:
Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition. ICASSP 2018: 5024-5028 - [c25]Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Tara N. Sainath, Zhifeng Chen, Rohit Prabhavalkar:
An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model. ICASSP 2018: 5824-5828 - [c24]Tara N. Sainath, Rohit Prabhavalkar, Shankar Kumar, Seungji Lee, Anjuli Kannan, David Rybach, Vlad Schogol, Patrick Nguyen, Bo Li, Yonghui Wu, Zhifeng Chen, Chung-Cheng Chiu:
No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models. ICASSP 2018: 5859-5863 - [c23]Tara N. Sainath, Chung-Cheng Chiu, Rohit Prabhavalkar, Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Zhifeng Chen:
Improving the Performance of Online Neural Transducer Models. ICASSP 2018: 5864-5868 - [c22]Ruoming Pang, Tara N. Sainath, Rohit Prabhavalkar, Suyog Gupta, Yonghui Wu, Shuyuan Zhang, Chung-Cheng Chiu:
Compression of End-to-End Models. INTERSPEECH 2018: 27-31 - [c21]Golan Pundak, Tara N. Sainath, Rohit Prabhavalkar, Anjuli Kannan, Ding Zhao:
Deep Context: End-to-end Contextual Speech Recognition. SLT 2018: 418-425 - [c20]Parisa Haghani, Arun Narayanan, Michiel Bacchiani, Galen Chuang, Neeraj Gaur, Pedro J. Moreno, Rohit Prabhavalkar, Zhongdi Qu, Austin Waters:
From Audio to Semantics: Approaches to End-to-End Spoken Language Understanding. SLT 2018: 720-726 - [i14]Kanishka Rao, Hasim Sak, Rohit Prabhavalkar:
Exploring Architectures, Data and Units For Streaming End-to-End Speech Recognition with RNN-Transducer. CoRR abs/1801.00841 (2018) - [i13]Golan Pundak, Tara N. Sainath, Rohit Prabhavalkar, Anjuli Kannan, Ding Zhao:
Deep context: end-to-end contextual speech recognition. CoRR abs/1808.02480 (2018) - [i12]Parisa Haghani, Arun Narayanan, Michiel Bacchiani, Galen Chuang, Neeraj Gaur, Pedro J. Moreno, Rohit Prabhavalkar, Zhongdi Qu, Austin Waters:
From Audio to Semantics: Approaches to end-to-end spoken language understanding. CoRR abs/1809.09190 (2018) - [i11]Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-Yiin Chang, Kanishka Rao, Alexander Gruenstein:
Streaming End-to-end Speech Recognition For Mobile Devices. CoRR abs/1811.06621 (2018) - 2017
- [c19]Kanishka Rao, Hasim Sak, Rohit Prabhavalkar:
Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer. ASRU 2017: 193-199 - [c18]Yanzhang He, Rohit Prabhavalkar, Kanishka Rao, Wei Li, Anton Bakhtin, Ian McGraw:
Streaming small-footprint keyword spotting using sequence-to-sequence models. ASRU 2017: 474-481 - [c17]Rohit Prabhavalkar, Kanishka Rao, Tara N. Sainath, Bo Li, Leif Johnson, Navdeep Jaitly:
A Comparison of Sequence-to-Sequence Models for Speech Recognition. INTERSPEECH 2017: 939-943 - [c16]Rohit Prabhavalkar, Tara N. Sainath, Bo Li, Kanishka Rao, Navdeep Jaitly:
An Analysis of "Attention" in Sequence-to-Sequence Models. INTERSPEECH 2017: 3702-3706 - [i10]Yanzhang He, Rohit Prabhavalkar, Kanishka Rao, Wei Li, Anton Bakhtin, Ian McGraw:
Streaming Small-Footprint Keyword Spotting using Sequence-to-Sequence Models. CoRR abs/1710.09617 (2017) - [i9]Chris Donahue, Bo Li, Rohit Prabhavalkar:
Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition. CoRR abs/1711.05747 (2017) - [i8]Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Katya Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani:
State-of-the-art Speech Recognition With Sequence-to-Sequence Models. CoRR abs/1712.01769 (2017) - [i7]Tara N. Sainath, Chung-Cheng Chiu, Rohit Prabhavalkar, Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Zhifeng Chen:
Improving the Performance of Online Neural Transducer Models. CoRR abs/1712.01807 (2017) - [i6]Rohit Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Kannan:
Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models. CoRR abs/1712.01818 (2017) - [i5]Tara N. Sainath, Rohit Prabhavalkar, Shankar Kumar, Seungji Lee, Anjuli Kannan, David Rybach, Vlad Schogol, Patrick Nguyen, Bo Li, Yonghui Wu, Zhifeng Chen, Chung-Cheng Chiu:
No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models. CoRR abs/1712.01864 (2017) - [i4]Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Tara N. Sainath, Zhifeng Chen, Rohit Prabhavalkar:
An analysis of incorporating an external language model into a sequence-to-sequence model. CoRR abs/1712.01996 (2017) - 2016
- [c15]Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander Gruenstein, Françoise Beaufays, Carolina Parada:
Personalized speech recognition on mobile devices. ICASSP 2016: 5955-5959 - [c14]Rohit Prabhavalkar, Ouais Alsharif, Antoine Bruguier, Ian McGraw:
On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition. ICASSP 2016: 5970-5974
- [c13]Raziel Alvarez, Rohit Prabhavalkar, Anton Bakhtin:
On the Efficient Representation and Execution of Deep Acoustic Models. INTERSPEECH 2016: 2746-2750
- [i3]Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander Gruenstein, Françoise Beaufays, Carolina Parada:
Personalized Speech recognition on mobile devices. CoRR abs/1603.03185 (2016)
- [i2]Rohit Prabhavalkar, Ouais Alsharif, Antoine Bruguier, Ian McGraw:
On the Compression of Recurrent Neural Networks with an Application to LVCSR acoustic modeling for Embedded Speech Recognition. CoRR abs/1603.08042 (2016)
- [i1]Raziel Alvarez, Rohit Prabhavalkar, Anton Bakhtin:
On the efficient representation and execution of deep acoustic models. CoRR abs/1607.04683 (2016)
- 2015
- [c12]Rohit Prabhavalkar, Raziel Alvarez, Carolina Parada, Preetum Nakkiran, Tara N. Sainath:
Automatic gain control and multi-style training for robust small-footprint keyword spotting with deep neural networks. ICASSP 2015: 4704-4708
- [c11]Preetum Nakkiran, Raziel Alvarez, Rohit Prabhavalkar, Carolina Parada:
Compressing deep neural networks using a rank-constrained topology. INTERSPEECH 2015: 1473-1477
- 2013
- [j1]Eric Fosler-Lussier, Yanzhang He, Preethi Jyothi, Rohit Prabhavalkar:
Conditional Random Fields in Speech, Audio, and Language Processing. Proc. IEEE 101(5): 1054-1075 (2013)
- [c10]Rohit Prabhavalkar, Tara N. Sainath, David Nahamoo, Bhuvana Ramabhadran, Dimitri Kanevsky:
An evaluation of posterior modeling techniques for phonetic recognition. ICASSP 2013: 7165-7169
- [c9]Rohit Prabhavalkar, Karen Livescu, Eric Fosler-Lussier, Joseph Keshet:
Discriminative articulatory models for spoken term detection in low-resource conversational settings. ICASSP 2013: 8287-8291
- 2012
- [c8]Rohit Prabhavalkar, Jasha Droppo:
A chunk-based phonetic score for mobile voice search. ICASSP 2012: 4729-4732
- [c7]Rohit Prabhavalkar, Joseph Keshet, Karen Livescu, Eric Fosler-Lussier:
Discriminative spoken term detection with limited data. MLSLP 2012: 22-25
- 2011
- [c6]Rohit Prabhavalkar, Eric Fosler-Lussier, Karen Livescu:
A factored conditional random field model for articulatory feature forced transcription. ASRU 2011: 77-82
- [c5]Arild Brandrud Næss, Karen Livescu, Rohit Prabhavalkar:
Articulatory Feature Classification Using Nearest Neighbors. INTERSPEECH 2011: 2301-2304
- 2010
- [c4]Rohit Prabhavalkar, Eric Fosler-Lussier:
Backpropagation training for multilayer conditional random field based phone recognition. ICASSP 2010: 5534-5537
- [c3]John Woodruff, Rohit Prabhavalkar, Eric Fosler-Lussier, DeLiang Wang:
Combining monaural and binaural evidence for reverberant speech segregation. INTERSPEECH 2010: 406-409
- [c2]Rohit Prabhavalkar, Preethi Jyothi, William Hartmann, Jeremy Morris, Eric Fosler-Lussier:
Investigations into the Crandem Approach to Word Recognition. HLT-NAACL 2010: 725-728
2000 – 2009
- 2009
- [c1]Rohit Prabhavalkar, Zhaozhang Jin, Eric Fosler-Lussier:
Monaural segregation of voiced speech using discriminative random fields. INTERSPEECH 2009: 856-859
last updated on 2024-10-07 21:09 CEST by the dblp team
all metadata released as open data under CC0 1.0 license