


SLT 2022: Doha, Qatar
IEEE Spoken Language Technology Workshop, SLT 2022, Doha, Qatar, January 9-12, 2023. IEEE 2023, ISBN 979-8-3503-9690-4

- Vasista Sai Lodagala, Sreyan Ghosh, Srinivasan Umesh: CCC-wav2vec 2.0: Clustering Aided Cross Contrastive Self-Supervised Learning of Speech Representations. 1-8
- Hyung Yong Kim, Byeong-Yeol Kim, Seung Woo Yoo, Youshin Lim, Yunkyu Lim, Hanbin Lee: ASBERT: ASR-Specific Self-Supervised Learning with Self-Training. 9-14
- Kai Zhen, Martin Radfar, Hieu Duy Nguyen, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris: Sub-8-Bit Quantization for On-Device Speech Recognition: A Regularization-Free Approach. 15-22
- Gary Wang, Ekin D. Cubuk, Andrew Rosenberg, Shuyang Cheng, Ron J. Weiss, Bhuvana Ramabhadran, Pedro J. Moreno, Quoc V. Le, Daniel S. Park: G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR. 23-30
- David Qiu, Tsendsuren Munkhdalai, Yanzhang He, Khe Chai Sim: Context-Aware Neural Confidence Estimation for Rare Word Speech Recognition. 31-37
- Antoine Bruguier, David Qiu, Trevor Strohman, Yanzhang He: Flickering Reduction with Partial Hypothesis Reranking for Streaming ASR. 38-45
- Tatsuya Komatsu, Yusuke Fujita: InterDecoder: Using Attention Decoders as Intermediate Regularization for CTC-Based Speech Recognition. 46-51
- Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang, Zhouyuan Huo, Zhehuai Chen, Bo Li, Weiran Wang, Trevor Strohman: JOIST: A Joint Speech and Text Streaming Model for ASR. 52-59
- Ke-Han Lu, Kuan-Yu Chen: A Context-Aware Knowledge Transferring Strategy for CTC-Based ASR. 60-67
- Zhehuai Chen, Ankur Bapna, Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Pedro J. Moreno, Nanxin Chen: Maestro-U: Leveraging Joint Speech-Text Representation Learning for Zero Supervised Speech ASR. 68-75
- Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida: Alternate Intermediate Conditioning with Syllable-Level and Character-Level Targets for Japanese ASR. 76-83
- Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe: E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition. 84-91
- Jinhwan Park, Sichen Jin, Junmo Park, Sungsoo Kim, Dhairya Sandhyana, Changheon Lee, Myoungji Han, Jungin Lee, Seokyeong Jung, Changwoo Han, Chanwoo Kim: Conformer-Based On-Device Streaming Speech Recognition with KD Compression and Two-Pass Architecture. 92-99
- Suhaila M. Shakiah, Rupak Vignesh Swaminathan, Hieu Duy Nguyen, Raviteja Chinta, Tariq Afzal, Nathan Susanj, Athanasios Mouchtaris, Grant P. Strimel, Ariya Rastrow: Accelerator-Aware Training for Transducer-Based Speech Recognition. 100-107
- Lahiru Samarakoon, Ivan Fung: Untied Positional Encodings for Efficient Transformer-Based Speech Recognition. 108-114
- Yan Gao, Javier Fernández-Marqués, Titouan Parcollet, Pedro P. B. de Gusmao, Nicholas D. Lane: Match to Win: Analysing Sequences Lengths for Efficient Self-Supervised Learning in Speech and Audio. 115-122
- Peng Shen, Xugang Lu, Hisashi Kawai: Pronunciation-Aware Unique Character Encoding for RNN Transducer-Based Mandarin Speech Recognition. 123-129
- Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg: Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition. 130-135
- Vasista Sai Lodagala, Sreyan Ghosh, Srinivasan Umesh: PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations. 136-143
- Fan Yu, Shiliang Zhang, Pengcheng Guo, Yuhao Liang, Zhihao Du, Yuxiao Lin, Lei Xie: MFCCA: Multi-Frame Cross-Channel Attention for Multi-Speaker ASR in Multi-Party Meeting Scenario. 144-151
- Aleksandr Laptev, Boris Ginsburg: Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition. 152-159
- Sungjun Han, Deepak Baby, Valentin Mendelev: Residual Adapters for Targeted Updates in RNN-Transducer Based Speech Recognition System. 160-166
- Tian Li, Qingliang Meng, Yujian Sun: Improved Noisy Iterative Pseudo-Labeling for Semi-Supervised Speech Recognition. 167-173
- Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas: Guided Contrastive Self-Supervised Pre-Training for Automatic Speech Recognition. 174-181
- Jakob Poncelet, Hugo Van hamme: Learning to Jointly Transcribe and Subtitle for End-To-End Spontaneous Speech Recognition. 182-189
- Tsendsuren Munkhdalai, Zelin Wu, Golan Pundak, Khe Chai Sim, Jiayang Li, Pat Rondon, Tara N. Sainath: NAM+: Towards Scalable End-to-End Contextual Biasing for Adaptive ASR. 190-196
- Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno: Modular Hybrid Autoregressive Transducer. 197-204
- Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Seyyed Saeed Sarfjoo, Petr Motlícek, Matthias Kleinert, Hartmut Helmke, Oliver Ohneiser, Qingran Zhan: How Does Pre-Trained Wav2Vec 2.0 Perform on Domain-Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications. 205-212
- Adam Stooke, Khe Chai Sim, Mason Chua, Tsendsuren Munkhdalai, Trevor Strohman: Internal Language Model Personalization of E2E Automatic Speech Recognition Using Random Encoder Features. 213-220
- Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski: Towards End-to-End Unsupervised Speech Recognition. 221-228
- Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney: Monotonic Segmental Attention for Automatic Speech Recognition. 229-236
- Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphanso, Jinyu Li, Yifan Gong: Streaming, Fast and Accurate On-Device Inverse Text Normalization for Automatic Speech Recognition. 237-244
- Cal Peyser, W. Ronny Huang, Tara N. Sainath, Rohit Prabhavalkar, Michael Picheny, Kyunghyun Cho: Dual Learning for Large Vocabulary On-Device ASR. 245-251
- Aditya Patil, Vikas Joshi, Purvi Agrawal, Rupesh R. Mehta: Streaming Bilingual End-to-End ASR Model Using Attention Over Multiple Softmax. 252-259
- Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono: End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation. 260-265
- Dongjune Lee, Minchan Kim, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim: Fully Unsupervised Training of Few-Shot Keyword Spotting. 266-272
- Chunxi Liu, Yuan Shangguan, Haichuan Yang, Yangyang Shi, Raghuraman Krishnamoorthi, Ozlem Kalinli: Learning a Dual-Mode Speech Recognition Model via Self-Pruning. 273-279
- Ji Won Yoon, Beom Jun Woo, Sunghwan Ahn, Hyeonseung Lee, Nam Soo Kim: Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition. 280-286
- Tina Raissi, Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney: HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch. 287-294
- Vrunda N. Sukhadia, Srinivasan Umesh: Domain Adaptation of Low-Resource Target-Domain Models Using Well-Trained ASR Conformer Models. 295-301
- Saket Dingliwal, Monica Sunkara, Srikanth Ronanki, Jeff Farris, Katrin Kirchhoff, Sravan Bodapati: Personalization of CTC Speech Recognition Models. 302-309
- Shaan Bijwadia, Shuo-Yiin Chang, Bo Li, Tara N. Sainath, Chao Zhang, Yanzhang He: Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems. 310-316
- Arun Narayanan, James Walker, Sankaran Panchapagesan, Nathan Howard, Yuma Koizumi: Learning Mask Scalars for Improved Robust Automatic Speech Recognition. 317-323
- Niko Moritz, Frank Seide, Duc Le, Jay Mahadeokar, Christian Fuegen: An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition. 324-330
- Chanwoo Kim, Sathish Indurti, Jinhwan Park, Wonyong Sung: Macro-Block Dropout for Improved Regularization in Training End-to-End Speech Recognition Models. 331-338
- Ragheb Al-Ghezi, Yaroslav Getman, Ekaterina Voskoboinik, Mittul Singh, Mikko Kurimo: Automatic Rating of Spontaneous Speech for Low-Resource Languages. 339-345
- Benjamin Kleiner, Jack G. M. Fitzgerald, Haidar Khan, Gokhan Tur: Mixture of Domain Experts for Language Understanding: An Analysis of Modularity, Task Performance, and Memory Tradeoffs. 346-352
- Anupama Chingacham, Vera Demberg, Dietrich Klakow: A Data-Driven Investigation of Noise-Adaptive Utterance Generation with Linguistic Modification. 353-360
- Gaëlle Laperrière, Valentin Pelloin, Mickaël Rouvier, Themos Stafylakis, Yannick Estève: On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding. 361-368
- Jin Sakuma, Shinya Fujie, Tetsunori Kobayashi: Response Timing Estimation for Spoken Dialog Systems Based on Syntactic Completeness Prediction. 369-374
- Jinzi Qi, Hugo Van hamme: Weak-Supervised Dysarthria-Invariant Features for Spoken Language Understanding Using an FHVAE and Adversarial Training. 375-381
- Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng: Building Markovian Generative Architectures Over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems. 382-389
- Mohan Li, Rama Doddipatla: Non-Autoregressive End-to-End Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding. 390-397
- Yasufumi Moriya, Gareth J. F. Jones: Improving Noise Robustness for Spoken Content Retrieval Using Semi-Supervised ASR and N-Best Transcripts for BERT-Based Ranking Models. 398-405
- Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe: A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding. 406-413
- Wei-Tsung Kao, Yuan-Kuei Wu, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-Yi Lee: On the Efficiency of Integrating Self-Supervised Learning and Meta-Learning for User-Defined Few-Shot Keyword Spotting. 414-421
- Liang Wen, Lizhong Wang, Ying Zhang, Kwang Pyo Choi: Multi-Stage Progressive Audio Bandwidth Extension. 422-427
- Sandipana Dowerah, Romain Serizel, Denis Jouvet, Mohammad MohammadAmini, Driss Matrouf: Joint Optimization of Diffusion Probabilistic-Based Multichannel Speech Enhancement with Far-Field Speaker Verification. 428-435
- Shubo Lv, Yihui Fu, Yukai Jv, Lei Xie, Weixin Zhu, Wei Rao, Yannan Wang: Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement. 436-443
- Martin Strauss, Matteo Torcoli, Bernd Edler: Improved Normalizing Flow-Based Speech Enhancement Using an All-Pole Gammatone Filterbank for Conditional Input Representation. 444-450
- Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu: Exploring WavLM on Speech Enhancement. 451-457
- Yu-sheng Tsao, Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen: Adaptive-FSN: Integrating Full-Band Extraction and Adaptive Sub-Band Encoding for Monaural Speech Enhancement. 458-464
- Andrea Lorena Aldana Blanco, Cassia Valentini-Botinhao, Ondrej Klejch, Mandar Gogate, Kia Dashtipour, Amir Hussain, Peter Bell: AVSE Challenge: Audio-Visual Speech Enhancement Challenge. 465-471
- Yukai Ju, Shimin Zhang, Wei Rao, Yannan Wang, Tao Yu, Lei Xie, Shidong Shang: TEA-PSE 2.0: Sub-Band Network for Real-Time Personalized Speech Enhancement. 472-479
- Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu: EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. 480-487
- Qinghua Liu, Yating Huang, Yunzhe Hao, Jiaming Xu, Bo Xu: LiMuSE: Lightweight Multi-Modal Speaker Extraction. 488-495
- Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian: End-to-End Multi-Speaker ASR with Independent Vector Analysis. 496-501
- Wolfgang Mack, Emanuël A. P. Habets: A Hybrid Acoustic Echo Reduction Approach Using Kalman Filtering and Informed Source Extraction with Improved Training. 502-508
- Chendong Zhao, Jianzong Wang, Xiaoyang Qu, Haoqian Wang, Jing Xiao: Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation. 509-516
- Tianyu Cao, Laureano Moro-Velázquez, Piotr Zelasko, Jesús Villalba, Najim Dehak: Vsameter: Evaluation of a New Open-Source Tool to Measure Vowel Space Area and Related Metrics. 517-524
- Tyler Vuong, Nikhil Madaan, Rohan Panda, Richard M. Stern: Investigating the Important Temporal Modulations for Deep-Learning-Based Speech Activity Detection. 525-531
- Anna Favaro, Chelsie Motley, Tianyu Cao, Miguel Iglesias, Ankur A. Butala, Esther S. Oh, Robert D. Stevens, Jesús Villalba, Najim Dehak, Laureano Moro-Velázquez: A Multi-Modal Array of Interpretable Features to Evaluate Language and Speech Patterns in Different Neurological Disorders. 532-539
- Donghyeon Kim, Jeong-gi Kwak, Hanseok Ko: Efficient Dynamic Filter for Robust and Low Computational Feature Extraction. 540-547
- Sung Hwan Mun, Jee-weon Jung, Min Hyun Han, Nam Soo Kim: Frequency and Multi-Scale Selective Kernel Attention for Speaker Verification. 548-554
- Junyi Peng, Oldrich Plchot, Themos Stafylakis, Ladislav Mosner, Lukás Burget, Jan Cernocký: An Attention-Based Backend Allowing Efficient Fine-Tuning of Transformer Models for Speaker Verification. 555-562
- Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan: Flow-ER: A Flow-Based Embedding Regularization Strategy for Robust Speech Representation Learning. 563-570
- Ismail Rasim Ülgen, Levent M. Arslan: Unsupervised Domain Adaptation of Neural PLDA Using Segment Pairs for Speaker Verification. 571-576
- Bhusan Chettri: The Clever Hans Effect in Voice Spoofing Detection. 577-584
- Xin Wang, Junichi Yamagishi: Investigating Active-Learning-Based Training Data Selection for Speech Spoofing Countermeasure. 585-592
- Xinyue Ma, Shanshan Zhang, Shen Huang, Ji Gao, Ying Hu, Liang He: How to Boost Anti-Spoofing with X-Vectors. 593-598
- Zhengyang Chen, Yao Qian, Bing Han, Yanmin Qian, Michael Zeng: A Comprehensive Study on Self-Supervised Distillation for Speaker Representation Learning. 599-604
- Jeremy Heng Meng Wong, Yifan Gong: Joint Speaker Diarisation and Tracking in Switching State-Space Model. 605-612
- Jeremy Heng Meng Wong, Igor Abramovski, Xiong Xiao, Yifan Gong: Diarisation Using Location Tracking with Agglomerative Clustering. 613-619
- Shota Horiguchi, Yuki Takashima, Shinji Watanabe, Paola García: Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization. 620-625
- Juan Manuel Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset: Continual Self-Supervised Domain Adaptation for End-to-End Speaker Diarization. 626-632
- Juan Zuluaga-Gomez, Seyyed Saeed Sarfjoo, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlícek, Karel Ondrej, Oliver Ohneiser, Hartmut Helmke: BERTraffic: BERT-Based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications. 633-640
- Giovanni Morrone, Samuele Cornell, Desh Raj, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini: Low-Latency Speech Separation Guided Diarization for Telephone Conversations. 641-646
- Samantha Kotey, Rozenn Dahyot, Naomi Harte: Fine Grained Spoken Document Summarization Through Text Segmentation. 647-654
- Jwala Dhamala, Varun Kumar, Rahul Gupta, Kai-Wei Chang, Aram Galstyan: An Analysis of the Effects of Decoding Algorithms on Fairness in Open-Ended Language Generation. 655-662
- Lu Zeng, Sree Hari Krishnan Parthasarathi, Dilek Hakkani-Tur: N-Best Hypotheses Reranking for Text-to-SQL Systems. 663-670
- Jia Cui, Heng Lu, Wenjie Wang, Shiyin Kang, Liqiang He, Guangzhi Li, Dong Yu: Efficient Text Analysis with Pre-Trained Neural Network Models. 671-676
- Sharman Tan, Piyush Behre, Nick Kibre, Issac Alphonso, Shuangyu Chang: Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition. 677-684
- Hiroaki Sugiyama, Masahiro Mizukami, Tsunehiro Arimoto, Hiromi Narimatsu, Yuya Chiba, Hideharu Nakajima, Toyomi Meguro: Empirical Analysis of Training Strategies of Transformer-Based Japanese Chit-Chat Systems. 685-691
- Xuanjun Chen, Haibin Wu, Helen Meng, Hung-yi Lee, Jyh-Shing Roger Jang: Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection. 692-699
- Leanne Nortje, Herman Kamper: Towards Visually Prompted Keyword Localisation for Zero-Resource Spoken Languages. 700-707
- Binghuai Lin, Liyuan Wang: Exploiting Information From Native Data for Non-Native Automatic Pronunciation Assessment. 708-714
- Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath: SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model. 715-722
- Zhengyang Li, Timo Lohrenz, Matthias Dunkelberg, Tim Fingscheidt: Transformer-Based Lip-Reading with Regularized Dropout and Relaxed Attention. 723-730
- Kayode Olaleye, Dan Oneata, Herman Kamper: YFACC: A Yorùbá Speech-Image Dataset for Cross-Lingual Keyword Localisation Through Visual Grounding. 731-738
- Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato: On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis. 739-746
- Muhammad Huzaifah, Ivan Kukanov: An Analysis of Semantically-Aligned Speech-Text Embeddings. 747-754
- Brady Houston, Katrin Kirchhoff: Exploration of Language-Specific Self-Attention Parameters for Multilingual End-to-End Speech Recognition. 755-762
- Shelly Jain, Aditya Yadavalli, Ganesh Mirishkar, Anil Kumar Vuppala: How Do Phonological Properties Affect Bilingual Automatic Speech Recognition? 763-770
- Ke Hu, Bo Li, Tara N. Sainath: Scaling Up Deliberation for Multilingual ASR. 771-776
- Amir Hussein, Shammur Absar Chowdhury, Ahmed Abdelali, Najim Dehak, Ahmed Ali, Sanjeev Khudanpur: Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition. 777-784
- Joshua Jansen van Vüren, Thomas Niesler: Code-Switched Language Modelling Using a Code Predictive LSTM in Under-Resourced South African Languages. 785-791
- Le Minh Nguyen, Shekhar Nayak, Matt Coler: Improving Luxembourgish Speech Recognition with Cross-Lingual Speech Representations. 792-797
- Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna: FLEURS: Few-Shot Learning Evaluation of Universal Representations of Speech. 798-805
- Zihan Wang, Qi Meng, HaiFeng Lan, Xinrui Zhang, KeHao Guo, Akshat Gupta: Multilingual Speech Emotion Recognition with Multi-Gating Mechanism and Neural Architecture Search. 806-813
- Hui Lu, Disong Wang, Xixin Wu, Zhiyong Wu, Xunying Liu, Helen Meng: Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE. 814-821
- Chia-Yu Li, Ngoc Thang Vu: Improving Semi-Supervised End-To-End Automatic Speech Recognition Using CycleGAN and Inter-Domain Losses. 822-829
- Chandran Savithri Anoop, A. G. Ramakrishnan: Exploring a Unified ASR for Multiple South Indian Languages Leveraging Multilingual Acoustic and Language Models. 830-837
- Sepand Mavandadi, Bo Li, Chao Zhang, Brian Farris, Tara N. Sainath, Trevor Strohman: A Truly Multilingual First Pass and Monolingual Second Pass Streaming On-Device ASR System. 838-845
- Xiaoming Zhang, Fan Zhang, Xiaodong Cui, Wei Zhang: Speech Emotion Recognition with Complementary Acoustic Representations. 846-852
- Amruta Saraf, Ganesh Sivaraman, Elie Khoury: A Zero-Shot Approach to Identifying Children's Speech in Automatic Gender Classification. 853-859
- Wen Wu, Chao Zhang, Philip C. Woodland: Distribution-Based Emotion Recognition in Conversation. 860-867
- Yuanchao Li, Yumnah Mohamied, Peter Bell, Catherine Lai: Exploration of a Self-Supervised Speech Model: A Study on Emotional Corpora. 868-875
- Florian Lux, Ching-Yi Chen, Ngoc Thang Vu: Combining Contrastive and Non-Contrastive Losses for Fine-Tuning Pretrained Models in Speech Analysis. 876-883
- Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani: WaveFit: An Iterative and Non-Autoregressive Neural Vocoder Based on Fixed-Point Iteration. 884-891
- Mikolaj Babianski, Kamil Pokora, Raahil Shah, Rafal Sienkiewicz, Daniel Korzekwa, Viacheslav Klimkov: On Granularity of Prosodic Representations in Expressive Text-to-Speech. 892-899
- Sewade Ogun, Vincent Colotte, Emmanuel Vincent: Can We Use Common Voice to Train a Multi-Speaker TTS System? 900-905
- Matthew Baas, Herman Kamper: GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models. 906-911
- Sarina Meyer, Pascal Tilli, Pavel Denisov, Florian Lux, Julia Koch, Ngoc Thang Vu: Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy. 912-919
- Yinghao Aaron Li, Cong Han, Nima Mesgarani: StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer From Style-Based TTS Models. 920-927
- Jan Melechovský, Ambuj Mehrish, Dorien Herremans, Berrak Sisman: Learning Accent Representation with Multi-Level VAE Towards Controllable Speech Synthesis. 928-935
- Yoshifumi Nakano, Takaaki Saeki, Shinnosuke Takamichi, Katsuhito Sudoh, Hiroshi Saruwatari: VTTS: Visual-Text To Speech. 936-942
- Dominik Wagner, Sebastian P. Bayerl, Héctor A. Cordourier Maruri, Tobias Bocklet: Generative Models for Improved Naturalness, Intelligibility, and Voicing of Whispered Speech. 943-948
- Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda: Two-Stage Training Method for Japanese Electrolaryngeal Speech Enhancement Based on Sequence-to-Sequence Voice Conversion. 949-954
- Hiroki Kanagawa, Yusuke Ijima: SIMD-Size Aware Weight Regularization for Fast Neural Vocoding on CPU. 955-961
- Florian Lux, Julia Koch, Ngoc Thang Vu: Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech. 962-969
- Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji, Andros Tjandra, Sakriani Sakti: Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-Wise Distillation. 970-976
- Efthymios Georgiou, Kosmas Kritsis, Georgios Paraskevopoulos, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos: Regotron: Regularizing the Tacotron2 Architecture via Monotonic Alignment Loss. 977-983
- Abdelhamid Ezzerg, Thomas Merritt, Kayoko Yanagisawa, Piotr Bilinski, Magdalena Proszewska, Kamil Pokora, Renard Korzeniowski, Roberto Barra-Chicote, Daniel Korzekwa: Remap, Warp and Attend: Non-Parallel Many-to-Many Accent Conversion with Normalizing Flows. 984-990
- Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed: STOP: A Dataset for Spoken Task Oriented Semantic Parsing. 991-998
- Injy Hamed, Amir Hussein, Oumnia Chellah, Shammur Absar Chowdhury, Hamdy Mubarak, Sunayana Sitaram, Nizar Habash, Ahmed Ali: Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition. 999-1005
- Mohammad Al-Fetyani, Muhammad Al-Barham, Gheith A. Abandah, Adham Alsharkawi, Maha Dawas: MASC: Massive Arabic Speech Corpus. 1006-1013
- Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Shogo Seki: Distilling Sequence-to-Sequence Voice Conversion Models for Streaming Conversion Applications. 1022-1028
- Chuanbo Zhu, Takuya Kunihara, Daisuke Saito, Nobuaki Minematsu, Noriko Nakanishi: Automatic Prediction of Intelligibility of Words and Phonemes Produced Orally by Japanese Learners of English. 1029-1036
- Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao: SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning. 1037-1044
- Bi-Cheng Yan, Hsin-Wei Wang, Berlin Chen: Peppanet: Effective Mispronunciation Detection and Diagnosis Leveraging Phonetic, Phonological, and Acoustic Cues. 1045-1051
- Samuele Cornell, Thomas Balestri, Thibaud Sénéchal: Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection. 1052-1058
- Suliang Bu, Tuo Zhao, Yunxin Zhao: TDOA Estimation of Speech Source in Noisy Reverberant Environments. 1059-1066
- Luke Strgar, David Harwath: Phoneme Segmentation Using Self-Supervised Speech Models. 1067-1073
- Chao-Han Huck Yang, I-Fan Chen, Andreas Stolcke, Sabato Marco Siniscalchi, Chin-Hui Lee: An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition. 1074-1080
- Aghilas Sini, Antoine Perquin, Damien Lolive, Arnaud Delhay: Phone-Level Pronunciation Scoring for L1 Using Weighted-Dynamic Time Warping. 1081-1087
- Stefano Bannò, Marco Matassoni: Proficiency Assessment of L2 Spoken English Using Wav2Vec 2.0. 1088-1095
- Tzu-hsun Feng, Shuyan Annie Dong, Ching-Feng Yeh, Shu-Wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-yi Lee: SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning. 1096-1103
- Chaoyue Ding, Jiakui Li, Martin Zong, Baoxiang Li: Speed-Robust Keyword Spotting via Soft Self-Attention on Multi-Scale Features. 1104-1111
- Guan-Ting Lin, Chi-Luen Feng, Wei-Ping Huang, Yuan Tseng, Tzu-Han Lin, Chen-An Li, Hung-yi Lee, Nigel G. Ward: On the Utility of Self-Supervised Models for Prosody-Related Tasks. 1104-1111
- Kuan-Po Huang, Yu-Kuan Fu, Tsu-Yuan Hsu, Fabian Ritter Gutierrez, Fan-Lin Wang, Liang-Hsuan Tseng, Yu Zhang, Hung-yi Lee: Improving Generalizability of Distilled Self-Supervised Speech Processing Models Under Distorted Settings. 1112-1119
- Zih-Ching Chen, Chin-Lun Fu, Chih-Ying Liu, Shang-Wen (Daniel) Li, Hung-yi Lee: Exploring Efficient-Tuning Methods in Self-Supervised Speech Models. 1120-1127
- Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola García, Hung-yi Lee, Hao Tang: On Compressing Sequences for Self-Supervised Speech Models. 1128-1135
- Themos Stafylakis, Ladislav Mosner, Sofoklis Kakouros, Oldrich Plchot, Lukás Burget, Jan Cernocký: Extracting Speaker and Emotion Information from Self-Supervised Speech Models via Channel-Wise Correlations. 1136-1143
