


21st Interspeech 2020: Shanghai, China
- Helen Meng, Bo Xu, Thomas Fang Zheng: Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020. ISCA 2020
Keynote 1
- Janet B. Pierrehumbert: The cognitive status of simple and complex models.
ASR Neural Network Architectures I
- Jinyu Li, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, Shujie Liu: On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition. 1-5
- Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin: SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition. 6-10
- Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf: Contextual RNN-T for Open Domain ASR. 11-15
- Jing Pan, Joshua Shapiro, Jeremy Wohlwend, Kyu Jeong Han, Tao Lei, Tao Ma: ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition. 16-20
- Deepak Kadetotad, Jian Meng, Visar Berisha, Chaitali Chakrabarti, Jae-sun Seo: Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity. 21-25
- Timo Lohrenz, Tim Fingscheidt: BLSTM-Driven Stream Fusion for Automatic Speech Recognition: Novel Methods and a Multi-Size Window Fusion Example. 26-30
- Ngoc-Quan Pham, Thanh-Le Ha, Tuan-Nam Nguyen, Thai-Son Nguyen, Elizabeth Salesky, Sebastian Stüker, Jan Niehues, Alex Waibel: Relative Positional Encoding for Speech Recognition and Direct Translation. 31-35
- Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka: Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers. 36-40
- Takashi Fukuda, Samuel Thomas: Implicit Transfer of Privileged Acoustic Information in a Generalized Knowledge Distillation Framework. 41-45
- Jinhwan Park, Wonyong Sung: Effect of Adding Positional Information on Convolutional Neural Networks for End-to-End Speech Recognition. 46-50
Multi-Channel Speech Enhancement
- Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Zhanlei Yang, Longshuai Xiao: Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition. 51-55
- Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Chao Weng, Jianming Liu, Dong Yu: Neural Spatio-Temporal Beamformer for Target Speech Separation. 56-60
- Li Li, Kazuhito Koishida, Shoji Makino: Online Directional Speech Enhancement Using Geometrically Constrained Independent Vector Analysis. 61-65
- Meng Yu, Xuan Ji, Bo Wu, Dan Su, Dong Yu: End-to-End Multi-Look Keyword Spotting. 66-70
- Weilong Huang, Jinwei Feng: Differential Beamforming for Uniform Circular Array with Directional Microphones. 71-75
- Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee: Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement. 76-80
- Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie: An End-to-End Architecture of Online Multi-Channel Speech Separation. 81-85
- Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi: Mentoring-Reverse Mentoring for Unsupervised Multi-Channel Speech Source Separation. 86-90
- Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki: Computationally Efficient and Versatile Framework for Joint Optimization of Blind Speech Separation and Dereverberation. 91-95
- Yanhui Tu, Jun Du, Lei Sun, Feng Ma, Jia Pan, Chin-Hui Lee: A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge. 96-100
Speech Processing in the Brain
- Youssef Hmamouche, Laurent Prévot, Magalie Ochs, Thierry Chaminade: Identifying Causal Relationships Between Behavior and Local Brain Activity During Natural Conversation. 101-105
- Di Zhou, Gaoyan Zhang, Jianwu Dang, Shuang Wu, Zhuo Zhang: Neural Entrainment to Natural Speech Envelope Based on Subject Aligned EEG Signals. 106-110
- Chongyuan Lian, Tianqi Wang, Mingxiao Gu, Manwa L. Ng, Feiqi Zhu, Lan Wang, Nan Yan: Does Lexical Retrieval Deteriorate in Patients with Mild Cognitive Impairment? Analysis of Brain Functional Network Will Tell. 111-115
- Zhen Fu, Jing Chen: Congruent Audiovisual Speech Enhances Cortical Envelope Tracking During Auditory Selective Attention. 116-120
- Lei Wang, Ed X. Wu, Fei Chen: Contribution of RMS-Level-Based Speech Segments to Target Speech Decoding Under Noisy Conditions. 121-124
- Bin Zhao, Jianwu Dang, Gaoyan Zhang, Masashi Unoki: Cortical Oscillatory Hierarchy for Natural Sentence Processing. 125-129
- Louis ten Bosch, Kimberley Mulder, Lou Boves: Comparing EEG Analyses with Different Epoch Alignments in an Auditory Lexical Decision Experiment. 130-134
- Tanya Talkar, Sophia Yuditskaya, James R. Williamson, Adam C. Lammert, Hrishikesh Rao, Daniel J. Hannon, Anne T. O'Brien, Gloria Vergara-Diaz, Richard DeLaura, Douglas E. Sturim, Gregory A. Ciccarelli, Ross Zafonte, Jeff Palmer, Paolo Bonato, Thomas F. Quatieri: Detection of Subclinical Mild Traumatic Brain Injury (mTBI) Through Speech and Gait. 135-139
Speech Signal Representation
- Joel Shor, Aren Jansen, Ronnie Maor, Oran Lang, Omry Tuval, Félix de Chaumont Quitry, Marco Tagliasacchi, Ira Shavitt, Dotan Emanuel, Yinnon Haviv: Towards Learning a Universal Non-Semantic Representation of Speech. 140-144
- Rajeev Rajan, Aiswarya Vinod Kumar, Ben P. Babu: Poetic Meter Classification Using i-Vector-MTF Fusion. 145-149
- Wang Dai, Jinsong Zhang, Yingming Gao, Wei Wei, Dengfeng Ke, Binghuai Lin, Yanlu Xie: Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism. 150-154
- Na Hu, Berit Janssen, Judith Hanssen, Carlos Gussenhoven, Aoju Chen: Automatic Analysis of Speech Prosody in Dutch. 155-159
- Adrien Gresse, Mathias Quillot, Richard Dufour, Jean-François Bonastre: Learning Voice Representation Using Knowledge Distillation for Automatic Voice Casting. 160-164
- B. Yegnanarayana, Joseph M. Anand, Vishala Pannala: Enhancing Formant Information in Spectrographic Display of Speech. 165-169
- Michael Gump, Wei-Ning Hsu, James R. Glass: Unsupervised Methods for Evaluating Speech Representations. 170-174
- Dung N. Tran, Uros Batricevic, Kazuhito Koishida: Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments. 175-179
- Amrith Setlur, Barnabás Póczos, Alan W. Black: Nonlinear ISA with Auxiliary Variables for Learning Speech Representations. 180-184
- Hirotoshi Takeuchi, Kunio Kashino, Yasunori Ohishi, Hiroshi Saruwatari: Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals. 185-189
Speech Synthesis: Neural Waveform Generation I
- Yang Ai, Zhen-Hua Ling: Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders. 190-194
- Qiao Tian, Zewang Zhang, Heng Lu, Ling-Hui Chen, Shan Liu: FeatherWave: An Efficient High-Fidelity Neural Vocoder with Multi-Band Linear Prediction. 195-199
- Jinhyeok Yang, Junmo Lee, Young-Ik Kim, Hoon-Young Cho, Injung Kim: VocGAN: A High-Fidelity Real-Time Vocoder with a Hierarchically-Nested Adversarial Network. 200-204
- Hiroki Kanagawa, Yusuke Ijima: Lightweight LPCNet-Based Neural Vocoder with Tensor Decomposition. 205-209
- Po-Chun Hsu, Hung-yi Lee: WG-WaveNet: Real-Time High-Fidelity Speech Synthesis Without GPU. 210-214
- Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber: What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS. 215-219
- Vadim Popov, Stanislav Kamenev, Mikhail A. Kudinov, Sergey Repyevsky, Tasnima Sadekova, Vitalii Bushaev, Vladimir Kryzhanovskiy, Denis Parkhomenko: Fast and Lightweight On-Device TTS with Tacotron2 and LPCNet. 220-224
- Wei Song, Guanghui Xu, Zhengchen Zhang, Chao Zhang, Xiaodong He, Bowen Zhou: Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed. 225-229
- Sébastien Le Maguer, Naomi Harte: Can Auditory Nerve Models Tell us What's Different About WaveNet Vocoded Speech? 230-234
- Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou: Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions. 235-239
- Zhijun Liu, Kuan Chen, Kai Yu: Neural Homomorphic Vocoder. 240-244
Automatic Speech Recognition for Non-Native Children’s Speech
- Roberto Gretter, Marco Matassoni, Daniele Falavigna, Keelan Evanini, Chee Wee Leong: Overview of the Interspeech TLT2020 Shared Task on ASR for Non-Native Children's Speech. 245-249
- Tien-Hong Lo, Fu-An Chao, Shi-Yan Weng, Berlin Chen: The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge. 250-254
- Kate M. Knill, Linlin Wang, Yu Wang, Xixin Wu, Mark J. F. Gales: Non-Native Children's Automatic Speech Recognition: The INTERSPEECH 2020 Shared Task ALTA Systems. 255-259
- Hemant Kumar Kathania, Mittul Singh, Tamás Grósz, Mikko Kurimo: Data Augmentation Using Prosody and False Starts to Recognize Non-Native Children's Speech. 260-264
- Mostafa Ali Shahin, Renée Lu, Julien Epps, Beena Ahmed: UNSW System Description for the Shared Task on Automatic Speech Recognition for Non-Native Children's Speech. 265-268
Speaker Diarization
- Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu: End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors. 269-273
- Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Y. Khokhlov, Mariya Korenevskaya, Ivan Sorokin, Tatiana Timofeeva, Anton Mitrofanov, Andrei Andrusenko, Ivan Podluzhny, Aleksandr Laptev, Aleksei Romanenko: Target-Speaker Voice Activity Detection: A Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario. 274-278
- Hagai Aronowitz, Weizhong Zhu, Masayuki Suzuki, Gakuto Kurata, Ron Hoory: New Advances in Speaker Diarization. 279-283
- Qingjian Lin, Yu Hou, Ming Li: Self-Attentive Similarity Measurement Strategies in Speaker Diarization. 284-288
- Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno: Speaker Attribution with Voice Profiles by Graph-Based Semi-Supervised Learning. 289-293
- Prachi Singh, Sriram Ganapathy: Deep Self-Supervised Hierarchical Clustering for Speaker Diarization. 294-298
- Joon Son Chung, Jaesung Huh, Arsha Nagrani, Triantafyllos Afouras, Andrew Zisserman: Spot the Conversation: Speaker Diarisation in the Wild. 299-303
Noise Robust and Distant Speech Recognition
- Wangyou Zhang, Yanmin Qian: Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition. 304-308
- Zhihao Du, Jiqing Han, Xueliang Zhang: Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition. 309-313
- Antoine Bruguier, Ananya Misra, Arun Narayanan, Rohit Prabhavalkar: Anti-Aliasing Regularization in Stacking Layers. 314-318
- Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov: Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription. 319-323
- Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe, Yanmin Qian: End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming. 324-328
- Xinchi Qiu, Titouan Parcollet, Mirco Ravanelli, Nicholas D. Lane, Mohamed Morchid: Quaternion Neural Networks for Multi-Channel Distant Speech Recognition. 329-333
- Hangting Chen, Pengyuan Zhang, Qian Shi, Zuozhen Liu: Improved Guided Source Separation Integrated with a Strong Back-End for the CHiME-6 Dinner Party Scenario. 334-338
- Dongmei Wang, Zhuo Chen, Takuya Yoshioka: Neural Speech Separation Using Spatially Distributed Microphones. 339-343
- Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu: Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones. 344-348
- Jack Deadman, Jon Barker: Simulating Realistically-Spatialised Simultaneous Speech Using Video-Driven Speaker Detection and the CHiME-5 Dataset. 349-353
Speech in Multimodality
- Catarina Botelho, Lorenz Diener, Dennis Küster, Kevin Scheck, Shahin Amiriparian, Björn W. Schuller, Tanja Schultz, Alberto Abad, Isabel Trancoso: Toward Silent Paralinguistics: Speech-to-EMG - Retrieving Articulatory Muscle Activity from Speech. 354-358
- Jiaxuan Zhang, Sarah Ita Levitan, Julia Hirschberg: Multimodal Deception Detection Using Automatically Extracted Acoustic, Visual, and Lexical Features. 359-363
- Zexu Pan, Zhaojie Luo, Jichen Yang, Haizhou Li: Multi-Modal Attention for Speech Emotion Recognition. 364-368
- Guang Shen, Riwei Lai, Rui Chen, Yu Zhang, Kejia Zhang, Qilong Han, Hongtao Song: WISE: Word-Level Interaction-Based Multimodal Fusion for Speech Emotion Recognition. 369-373
- Ming Chen, Xudong Zhao: A Multi-Scale Fusion Framework for Bimodal Speech Emotion Recognition. 374-378
- Pengfei Liu, Kun Li, Helen Meng: Group Gated Fusion on Attention-Based Bidirectional Alignment for Multimodal Emotion Recognition. 379-383
- Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram: Multi-Modal Embeddings Using Multi-Task Learning for Emotion Recognition. 384-388
- Jeng-Lin Li, Chi-Chun Lee: Using Speaker-Aligned Graph Memory Block in Multimodally Attentive Emotion Recognition Network. 389-393
- Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang, Zhanlei Yang, Rongjun Li: Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition. 394-398
Speech, Language, and Multimodal Resources
- Bo Yang, Xianlong Tan, Zhengmao Chen, Bing Wang, Min Ruan, Dan Li, Zhongping Yang, Xiping Wu, Yi Lin: ATCSpeech: A Multilingual Pilot-Controller Speech Corpus from Real Air Traffic Control Environment. 399-403
- Alexander Gutkin, Isin Demirsahin, Oddur Kjartansson, Clara Rivera, Kólá Túbosún: Developing an Open-Source Corpus of Yoruba Speech. 404-408
- Jung-Woo Ha, Kihyun Nam, Jingu Kang, Sang-Woo Lee, Sohee Yang, Hyunhoon Jung, Hyeji Kim, Eunmi Kim, Soojin Kim, Hyun Ah Kim, Kyoungtae Doh, Chan Kyu Lee, Nako Sung, Sunghun Kim: ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers. 409-413
- Yanhong Wang, Huan Luan, Jiahong Yuan, Bin Wang, Hui Lin: LAIX Corpus of Chinese Learner English: Towards a Benchmark for L2 English ASR. 414-418
- Vikram Ramanarayanan: Design and Development of a Human-Machine Dialog Corpus for the Automated Assessment of Conversational English Proficiency. 419-423
- Si Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee, Kathy Yuet-Sheung Lee, Michael Chi-Fai Tong: CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment. 424-428
- Katri Leino, Juho Leinonen, Mittul Singh, Sami Virpioja, Mikko Kurimo: FinChat: Corpus and Evaluation Setup for Finnish Chat Conversations on Everyday Topics. 429-433
- Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas: DiPCo - Dinner Party Corpus. 434-436
- Bo Wang, Yue Wu, Niall Taylor, Terry J. Lyons, Maria Liakata, Alejo J. Nevado-Holgado, Kate E. A. Saunders: Learning to Detect Bipolar Disorder and Borderline Personality Disorder with Language and Speech in Non-Clinical Interviews. 437-441
- Andreas Kirkedal, Marija Stepanovic, Barbara Plank: FT Speech: Danish Parliament Speech Corpus. 442-446
Language Recognition
- Raphaël Duroselle, Denis Jouvet, Irina Illina: Metric Learning Loss Functions to Reduce Domain Mismatch in the x-Vector Space for Language Recognition. 447-451
- Zheng Li, Miao Zhao, Jing Li, Yiming Zhi, Lin Li, Qingyang Hong: The XMUSPEECH System for the AP19-OLR Challenge. 452-456
- Zheng Li, Miao Zhao, Jing Li, Lin Li, Qingyang Hong: On the Usage of Multi-Feature Integration for Speaker Verification and Language Identification. 457-461
- Shammur A. Chowdhury, Ahmed Ali, Suwon Shon, James R. Glass: What Does an End-to-End Dialect Identification Model Learn About Non-Dialectal Information? 462-466
- Matias Lindgren, Tommi Jauhiainen, Mikko Kurimo: Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets. 467-471
- Aitor Arronte Alvarez, Elsayed Sabry Abdelaal Issa: Learning Intonation Pattern Embeddings for Arabic Dialect Identification. 472-476
- Badr M. Abdullah, Tania Avgustinova, Bernd Möbius, Dietrich Klakow: Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages. 477-481
Speech Processing and Analysis
- Noé Tits, Kevin El Haddad, Thierry Dutoit: ICE-Talk: An Interface for a Controllable Expressive Talking Machine. 482-483
- Mathieu Hu, Laurent Pierron, Emmanuel Vincent, Denis Jouvet: Kaldi-Web: An Installation-Free, On-Device Speech Recognition System. 484-485
- Amelia C. Kelly, Eleni Karamichali, Armin Saeb, Karel Veselý, Nicholas Parslow, Agape Deng, Arnaud Letondor, Robert O'Regan, Qiru Zhou: SoapBox Labs Verification Platform for Child Speech. 486-487
- Amelia C. Kelly, Eleni Karamichali, Armin Saeb, Karel Veselý, Nicholas Parslow, Gloria Montoya Gomez, Agape Deng, Arnaud Letondor, Niall Mullally, Adrian Hempel, Robert O'Regan, Qiru Zhou: SoapBox Labs Fluency Assessment Platform for Child Speech. 488-489
- Baybars Külebi, Alp Öktem, Alex Peiró Lilja, Santiago Pascual, Mireia Farrús: CATOTRON - A Neural Text-to-Speech System in Catalan. 490-491
- Vikram Ramanarayanan, Oliver Roesler, Michael Neumann, David Pautler, Doug Habberstad, Andrew Cornish, Hardik Kothare, Vignesh Murali, Jackson Liscombe, Dirk Schnelle-Walka, Patrick L. Lange, David Suendermann-Oeft: Toward Remote Patient Monitoring of Speech, Video, Cognitive and Respiratory Biomarkers Using Multimodal Dialog Technology. 492-493
- Baihan Lin, Xinxin Zhang: VoiceID on the Fly: A Speaker Recognition System that Learns from Scratch. 494-495
Speech Emotion Recognition I
- Zhao Ren, Jing Han, Nicholas Cummins, Björn W. Schuller: Enhancing Transferability of Black-Box Adversarial Attacks via Lifelong Learning for Speech Emotion Recognition Models. 496-500
- Han Feng, Sei Ueno, Tatsuya Kawahara: End-to-End Speech Emotion Recognition Combined with Acoustic-to-Word ASR Model. 501-505
- Bo-Hao Su, Chun-Min Chang, Yun-Shao Lin, Chi-Chun Lee: Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network. 506-510
- Adria Mallol-Ragolta, Nicholas Cummins, Björn W. Schuller: An Investigation of Cross-Cultural Semi-Supervised Learning for Continuous Affect Recognition. 511-515
- Kusha Sridhar, Carlos Busso: Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition. 516-520
- Siddique Latif, Muhammad Asim, Rajib Rana, Sara Khalifa, Raja Jurdak, Björn W. Schuller: Augmenting Generative Adversarial Networks for Speech Emotion Recognition. 521-525
- Vipula Dissanayake, Haimo Zhang, Mark Billinghurst, Suranga Nanayakkara: Speech Emotion Recognition 'in the Wild' Using an Autoencoder. 526-530
- Shuiyang Mao, Pak-Chung Ching, Tan Lee: Emotion Profile Refinery for Speech Emotion Classification. 531-535
- Sung-Lin Yeh, Yun-Shao Lin, Chi-Chun Lee: Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation. 536-540
ASR Neural Network Architectures and Training I
- Kshitiz Kumar, Emilian Stoimenov, Hosam Khalil, Jian Wu: Fast and Slow Acoustic Model. 541-545
- Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, Marc Delcroix: Self-Distillation for Improving CTC-Transformer-Based ASR Systems. 546-550
- Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury: Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard. 551-555
- Zhehuai Chen, Andrew Rosenberg, Yu Zhang, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno: Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection. 556-560