23rd Interspeech 2022: Incheon, Korea
- Hanseok Ko, John H. L. Hansen:
23rd Annual Conference of the International Speech Communication Association, Interspeech 2022, Incheon, Korea, September 18-22, 2022. ISCA 2022
Speech Synthesis: Toward end-to-end synthesis
- Hyunjae Cho, Wonbin Jung, Junhyeok Lee, Sang Hoon Woo:
SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech. 1-5
- Hanbin Bae, Young-Sun Joo:
Enhancement of Pitch Controllability using Timbre-Preserving Pitch Augmentation in FastPitch. 6-10
- Martin Lenglet, Olivier Perrotin, Gérard Bailly:
Speaking Rate Control of end-to-end TTS Models by Direct Manipulation of the Encoder's Output Embeddings. 11-15
- Yooncheol Ju, Ilhwan Kim, Hongsun Yang, Ji-Hoon Kim, Byeongyeol Kim, Soumi Maiti, Shinji Watanabe:
TriniTTS: Pitch-controllable End-to-end TTS without External Aligner. 16-20
- Dan Lim, Sunghee Jung, Eesung Kim:
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech. 21-25
Technology for Disordered Speech
- Rosanna Turrisi, Leonardo Badino:
Interpretable dysarthric speaker adaptation based on optimal-transport. 26-30
- Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic:
Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs. 31-35
- Luke Prananta, Bence Mark Halpern, Siyuan Feng, Odette Scharenborg:
The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition. 36-40
- Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda:
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition. 41-45
- Chitralekha Bhat, Ashish Panda, Helmer Strik:
Improved ASR Performance for Dysarthric Speech Using Two-stage Data Augmentation. 46-50
- Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas K. Maier, Seung Hee Yang:
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition. 51-55
Neural Network Training Methods for ASR I
- Mun-Hak Lee, Joon-Hyuk Chang, Sang-Eon Lee, Ju-Seok Seong, Chanhee Park, Haeyoung Kwon:
Regularizing Transformer-based Acoustic Models by Penalizing Attention Weights. 56-60
- David M. Chan, Shalini Ghosh:
Content-Context Factorized Representations for Automated Speech Recognition. 61-65
- Georgios Karakasidis, Tamás Grósz, Mikko Kurimo:
Comparison and Analysis of New Curriculum Criteria for End-to-End ASR. 66-70
- Deepak Baby, Pasquale D'Alterio, Valentin Mendelev:
Incremental learning for RNN-Transducer based speech recognition models. 71-75
- Andrew Hard, Kurt Partridge, Neng Chen, Sean Augenstein, Aishanee Shah, Hyun Jin Park, Alex Park, Sara Ng, Jessica Nguyen, Ignacio López-Moreno, Rajiv Mathews, Françoise Beaufays:
Production federated keyword spotting via distillation, filtering, and joint federated-centralized training. 76-80
Acoustic Phonetics and Prosody
- Jieun Song, Hae-Sung Jeon, Jieun Kiaer:
Use of prosodic and lexical cues for disambiguating wh-words in Korean. 81-85
- Vinicius Ribeiro, Yves Laprie:
Autoencoder-Based Tongue Shape Estimation During Continuous Speech. 86-90
- Giuseppe Magistro, Claudia Crocco:
Phonetic erosion and information structure in function words: the case of mia. 91-95
- Miran Oh, Yoon-Jeong Lee:
Dynamic Vertical Larynx Actions Under Prosodic Focus. 96-100
- Leah Bradshaw, Eleanor Chodroff, Lena A. Jäger, Volker Dellwo:
Fundamental Frequency Variability over Time in Telephone Interactions. 101-105
Spoken Machine Translation
- Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà:
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation. 106-110
- Jinming Zhao, Hao Yang, Gholamreza Haffari, Ehsan Shareghi:
M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation. 111-115
- Mohd Abbas Zaidi, Beomseok Lee, Sangha Kim, Chanwoo Kim:
Cross-Modal Decision Regularization for Simultaneous Speech Translation. 116-120
- Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura:
Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation. 121-125
- Kirandevraj R, Vinod Kumar Kurmi, Vinay P. Namboodiri, C. V. Jawahar:
Generalized Keyword Spotting using ASR embeddings. 126-130
(Multimodal) Speech Emotion Recognition I
- Youngdo Ahn, Sung Joo Lee, Jong Won Shin:
Multi-Corpus Speech Emotion Recognition for Unseen Corpus Using Corpus-Wise Weights in Classification Loss. 131-135
- Junghun Kim, Yoojin An, Jihie Kim:
Improving Speech Emotion Recognition Through Focus and Calibration Attention Mechanisms. 136-140
- Joosung Lee:
The Emotion is Not One-hot Encoding: Learning with Grayscale Label for Emotion Recognition in Conversation. 141-145
- Andreas Triantafyllopoulos, Johannes Wagner, Hagen Wierstorf, Maximilian Schmitt, Uwe D. Reichel, Florian Eyben, Felix Burkhardt, Björn W. Schuller:
Probing speech emotion recognition transformers for linguistic knowledge. 146-150
- Navin Raj Prabhu, Guillaume Carbajal, Nale Lehmann-Willenbrock, Timo Gerkmann:
End-To-End Label Uncertainty Modeling for Speech-based Arousal Recognition Using Bayesian Neural Networks. 151-155
- Matthew Perez, Mimansa Jaiswal, Minxue Niu, Cristina Gorrostieta, Matthew Roddy, Kye Taylor, Reza Lotfian, John Kane, Emily Mower Provost:
Mind the gap: On the value of silence representations to lexical-based speech emotion recognition. 156-160
- Huang-Cheng Chou, Chi-Chun Lee, Carlos Busso:
Exploiting Co-occurrence Frequency of Emotions in Perceptual Evaluations To Train A Speech Emotion Classifier. 161-165
- Hira Dhamyal, Bhiksha Raj, Rita Singh:
Positional Encoding for Capturing Modality Specific Cadence for Emotion Detection. 166-170
Dereverberation, Noise Reduction, and Speaker Extraction
- Tuan Vu Ho, Maori Kobayashi, Masato Akagi:
Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion. 171-175
- Tuan Vu Ho, Quoc Huy Nguyen, Masato Akagi, Masashi Unoki:
Vector-quantized Variational Autoencoder for Phase-aware Speech Enhancement. 176-180
- Minseung Kim, Hyungchan Song, Sein Cheong, Jong Won Shin:
iDeepMMSE: An improved deep learning approach to MMSE speech and noise power spectrum estimation for speech enhancement. 181-185
- Kuo-Hsuan Hung, Szu-Wei Fu, Huan-Hsin Tseng, Hsin-Tien Chiang, Yu Tsao, Chii-Wann Lin:
Boosting Self-Supervised Embeddings for Speech Enhancement. 186-190
- Seorim Hwang, Youngcheol Park, Sungwook Park:
Monoaural Speech Enhancement Using a Nested U-Net with Two-Level Skip Connections. 191-195
- Hannah Muckenhirn, Aleksandr Safin, Hakan Erdogan, Felix de Chaumont Quitry, Marco Tagliasacchi, Scott Wisdom, John R. Hershey:
CycleGAN-based Unpaired Speech Dereverberation. 196-200
- Ashutosh Pandey, DeLiang Wang:
Attentive Training: A New Training Framework for Talker-independent Speaker Extraction. 201-205
- Tyler Vuong, Richard M. Stern:
Improved Modulation-Domain Loss for Neural-Network-based Speech Enhancement. 206-210
- Chiang-Jen Peng, Yun-Ju Chan, Yih-Liang Shen, Cheng Yu, Yu Tsao, Tai-Shih Chi:
Perceptual Characteristics Based Multi-objective Model for Speech Enhancement. 211-215
- Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Katerina Zmolíková, Hiroshi Sato, Tomohiro Nakatani:
Listen only to me! How well can target speech extraction handle false alarms? 216-220
- Hao Shi, Longbiao Wang, Sheng Li, Jianwu Dang, Tatsuya Kawahara:
Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction. 221-225
- Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann:
Neural Network-augmented Kalman Filtering for Robust Online Speech Dereverberation in Noisy Reverberant Environments. 226-230
Source Separation II
- Nicolás Schmidt, Jordi Pons, Marius Miron:
PodcastMix: A dataset for separating music and speech in podcasts. 231-235
- Kohei Saijo, Robin Scheibler:
Independence-based Joint Dereverberation and Separation with Neural Source Model. 236-240
- Kohei Saijo, Robin Scheibler:
Spatial Loss for Unsupervised Multi-channel Source Separation. 241-245
- Samuel Bellows, Timothy W. Leishman:
Effect of Head Orientation on Speech Directivity. 246-250
- Kohei Saijo, Tetsuji Ogawa:
Unsupervised Training of Sequential Neural Beamformer Using Coarsely-separated and Non-separated Signals. 251-255
- Marvin Borsdorf, Kevin Scheck, Haizhou Li, Tanja Schultz:
Blind Language Separation: Disentangling Multilingual Cocktail Party Voices by Language. 256-260
- Mateusz Guzik, Konrad Kowalczyk:
NTF of Spectral and Spatial Features for Tracking and Separation of Moving Sound Sources in Spherical Harmonic Domain. 261-265
- Jack Deadman, Jon Barker:
Modelling Turn-taking in Multispeaker Parties for Realistic Data Simulation. 266-270
- Christoph Böddeker, Tobias Cord-Landwehr, Thilo von Neumann, Reinhold Haeb-Umbach:
An Initialization Scheme for Meeting Separation with Spatial Mixture Models. 271-275
- Seongkyu Mun, Dhananjaya Gowda, Jihwan Lee, Changwoo Han, Dokyun Lee, Chanwoo Kim:
Prototypical speaker-interference loss for target voice separation using non-parallel audio samples. 276-280
Embedding and Network Architecture for Speaker Recognition
- Pierre-Michel Bousquet, Mickael Rouvier, Jean-François Bonastre:
Reliability criterion based on learning-phase entropy for speaker recognition with neural network. 281-285
- Bei Liu, Zhengyang Chen, Yanmin Qian:
Attentive Feature Fusion for Robust Speaker Verification. 286-290
- Bei Liu, Zhengyang Chen, Yanmin Qian:
Dual Path Embedding Learning for Speaker Verification with Triplet Attention. 291-295
- Bei Liu, Zhengyang Chen, Shuai Wang, Haoyu Wang, Bing Han, Yanmin Qian:
DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design. 296-300
- Ruida Li, Shuo Fang, Chenguang Ma, Liang Li:
Adaptive Rectangle Loss for Speaker Verification. 301-305
- Yang Zhang, Zhiqiang Lv, Haibin Wu, Shanshan Zhang, Pengfei Hu, Zhiyong Wu, Hung-yi Lee, Helen Meng:
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification. 306-310
- Leying Zhang, Zhengyang Chen, Yanmin Qian:
Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification. 311-315
- Yusheng Tian, Jingyu Li, Tan Lee:
Transport-Oriented Feature Aggregation for Speaker Embedding Learning. 316-320
- Mufan Sang, John H. L. Hansen:
Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning. 321-325
- Linjun Cai, Yuhong Yang, Xufeng Chen, Weiping Tu, Hongyang Chen:
CS-CTCSCONV1D: Small footprint speaker verification with channel split time-channel-time separable 1-dimensional convolution. 326-330
- Pengqi Li, Lantian Li, Askar Hamdulla, Dong Wang:
Reliable Visualization for Deep Speaker Recognition. 331-335
- Zhiyuan Peng, Xuanji He, Ke Ding, Tan Lee, Guanglu Wan:
Unifying Cosine and PLDA Back-ends for Speaker Verification. 336-340
- Yuheng Wei, Junzhao Du, Hui Liu, Qian Wang:
CTFALite: Lightweight Channel-specific Temporal and Frequency Attention Mechanism for Enhancing the Speaker Embedding Extractor. 341-345
Speech Representation II
- Weidong Chen, Xiaofen Xing, Xiangmin Xu, Jianxin Pang, Lan Du:
SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech. 346-350
- David Feinberg:
VoiceLab: Software for Fully Reproducible Automated Voice Analysis. 351-355
- Joel Shor, Subhashini Venugopalan:
TRILLsson: Distilled Universal Paralinguistic Speech Representations. 356-360
- Nan Li, Meng Ge, Longbiao Wang, Masashi Unoki, Sheng Li, Jianwu Dang:
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network. 361-365
- Mostafa Sadeghi, Paul Magron:
A Sparsity-promoting Dictionary Model for Variational Autoencoders. 366-370
- Yan Zhao, Jincen Wang, Ru Ye, Yuan Zong, Wenming Zheng, Li Zhao:
Deep Transductive Transfer Regression Network for Cross-Corpus Speech Emotion Recognition. 371-375
- John H. L. Hansen, Zhenyu Wang:
Audio Anti-spoofing Using Simple Attention Module and Joint Optimization Based on Additive Angular Margin Loss and Meta-learning. 376-380
- Boris Bergsma, Minhao Yang, Milos Cernak:
PEAF: Learnable Power Efficient Analog Acoustic Features for Audio Recognition. 381-385
- Gasser Elbanna, Alice Biryukov, Neil Scheidwasser-Clow, Lara Orlandic, Pablo Mainar, Mikolaj Kegler, Pierre Beckmann, Milos Cernak:
Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load. 386-390
- Shijun Wang, Hamed Hemati, Jón Guðnason, Damian Borth:
Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition. 391-395
- Sarthak Yadav, Neil Zeghidour:
Learning neural audio features without supervision. 396-400
- Yixuan Zhang, Heming Wang, DeLiang Wang:
Densely-connected Convolutional Recurrent Network for Fundamental Frequency Estimation in Noisy Speech. 401-405
- Abu Zaher Md Faridee, Hannes Gamper:
Predicting label distribution improves non-intrusive speech quality estimation. 406-410
- Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka:
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models. 411-415
- Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza:
Dataset Pruning for Resource-constrained Spoofed Audio Detection. 416-420
Speech Synthesis: Linguistic Processing, Paradigms and Other Topics II
- Jaesung Tae, Hyeongju Kim, Taesu Kim:
EdiTTS: Score-based Editing for Controllable Text-to-Speech. 421-425
- Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng:
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information. 426-430
- Zalan Borsos, Matthew Sharifi, Marco Tagliasacchi:
SpeechPainter: Text-conditioned Speech Inpainting. 431-435
- Song Zhang, Ken Zheng, Xiaoxu Zhu, Baoxiang Li:
A polyphone BERT for Polyphone Disambiguation in Mandarin Chinese. 436-440
- Mutian He, Jingzhou Yang, Lei He, Frank K. Soong:
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge. 441-445
- Jian Zhu, Cong Zhang, David Jurgens:
ByT5 model for massively multilingual grapheme-to-phoneme conversion. 446-450
- Puneet Mathur, Franck Dernoncourt, Quan Hung Tran, Jiuxiang Gu, Ani Nenkova, Vlad I. Morariu, Rajiv Jain, Dinesh Manocha:
DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis. 451-455
- Guangyan Zhang, Kaitao Song, Xu Tan, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao:
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech. 456-460
- Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson:
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition. 461-465
- Tho Nguyen Duc Tran, The Chuong Chu, Vu Hoang, Trung Huu Bui, Steven Hung Quoc Truong:
An Efficient and High Fidelity Vietnamese Streaming End-to-End Speech Synthesis. 466-470
- Cassia Valentini-Botinhao, Manuel Sam Ribeiro, Oliver Watts, Korin Richmond, Gustav Eje Henter:
Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks. 471-475
- Zikai Chen, Lin Wu, Junjie Pan, Xiang Yin:
An Automatic Soundtracking System for Text-to-Speech Audiobooks. 476-480
- Daxin Tan, Guangyan Zhang, Tan Lee:
Environment Aware Text-to-Speech Synthesis. 481-485
- Artem Ploujnikov, Mirco Ravanelli:
SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation. 486-490
- Evelina Bakhturina, Yang Zhang, Boris Ginsburg:
Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization. 491-495
- Yogesh Virkar, Marcello Federico, Robert Enyedi, Roberto Barra-Chicote:
Prosodic alignment for off-screen automatic dubbing. 496-500
- Qibing Bai, Tom Ko, Yu Zhang:
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis. 501-505
- Hirokazu Kameoka, Takuhiro Kaneko, Shogo Seki, Kou Tanaka:
CAUSE: Crossmodal Action Unit Sequence Estimation from Speech. 506-510
- Binu Nisal Abeysinghe, Jesin James, Catherine I. Watson, Felix Marattukalam:
Visualising Model Training via Vowel Space for Text-To-Speech Systems. 511-515
Other Topics in Speech Recognition
- Aaqib Saeed:
Binary Early-Exit Network for Adaptive Inference on Low-Resource Devices. 516-520
- Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings. 521-525
- Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura:
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data. 526-530
- Yi-Kai Zhang, Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan:
Audio-Visual Generalized Few-Shot Learning with Prototype-Based Co-Adaptation. 531-535
- Junteng Jia, Jay Mahadeokar, Weiyi Zheng, Yuan Shangguan, Ozlem Kalinli, Frank Seide:
Federated Domain Adaptation for ASR with Full Self-Supervision. 536-540
- Longfei Yang, Wenqing Wei, Sheng Li, Jiyi Li, Takahiro Shinozaki:
Augmented Adversarial Self-Supervised Learning for Early-Stage Alzheimer's Speech Detection. 541-545
- Zvi Kons, Hagai Aronowitz, Edmilson da Silva Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas, George Saon:
Extending RNN-T-based speech recognition systems with emotion and language classification. 546-549
- Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg:
Thutmose Tagger: Single-pass neural model for Inverse Text Normalization. 550-554
- Yeonjin Cho, Sara Ng, Trang Tran, Mari Ostendorf:
Leveraging Prosody for Punctuation Prediction of Spontaneous Speech. 555-559
- Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie:
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings. 560-564
Audio Deep PLC (Packet Loss Concealment) Challenge
- Yuansheng Guan, Guochen Yu, Andong Li, Chengshi Zheng, Jie Wang:
TMGAN-PLC: Audio Packet Loss Concealment using Temporal Memory Generative Adversarial Network. 565-569
- Jean-Marc Valin, Ahmed Mustafa, Christopher Montgomery, Timothy B. Terriberry, Michael Klingbeil, Paris Smaragdis, Arvindh Krishnaswamy:
Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model. 570-574
- Baiyun Liu, Qi Song, Mingxue Yang, Wuwen Yuan, Tianbao Wang:
PLCNet: Real-time Packet Loss Concealment with Semi-supervised Generative Adversarial Network. 575-579
- Lorenz Diener, Sten Sootla, Solomiya Branets, Ando Saabas, Robert Aichner, Ross Cutler:
INTERSPEECH 2022 Audio Deep Packet Loss Concealment Challenge. 580-584
- Nan Li, Xiguang Zheng, Chen Zhang, Liang Guo, Bing Yu:
End-to-End Multi-Loss Training for Low Delay Packet Loss Concealment. 585-589
Robust Speaker Recognition
- Ju-ho Kim, Jungwoo Heo, Hye-jin Shim, Ha-Jin Yu:
Extended U-Net for Speaker Verification in Noisy Environments. 590-594
- Seunghan Yang, Debasmit Das, Janghoon Cho, Hyoungwoo Park, Sungrack Yun:
Domain Agnostic Few-shot Learning for Speaker Verification. 595-599
- Qiongqiong Wang, Kong Aik Lee, Tianchi Liu:
Scoring of Large-Margin Embeddings for Speaker Verification: Cosine or PLDA? 600-604