


IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 28, 2020
- Jamal Amini, Richard Christian Hendriks, Richard Heusdens, Meng Guo, Jesper Jensen:
Rate-Constrained Noise Reduction in Wireless Acoustic Sensor Networks. 1-12
- Chitralekha Gupta, Haizhou Li, Ye Wang:
Automatic Leaderboard: Evaluation of Singing Quality Without a Standard Reference. 13-26
- Sefik Emre Eskimez, Ross K. Maddox, Chenliang Xu, Zhiyao Duan:
Noise-Resilient Training Method for Face Landmark Generation From Speech. 27-38
- Peidong Wang, Ke Tan, DeLiang Wang:
Bridging the Gap Between Monaural Speech Enhancement and Recognition With Distortion-Independent Acoustic Modeling. 39-48
- Yuki Mitsufuji, Stefan Uhlich, Norihiro Takamune, Daichi Kitamura, Shoichi Koyama, Hiroshi Saruwatari:
Multichannel Non-Negative Matrix Factorization Using Banded Spatial Covariance Matrices in Wavenumber Domain. 49-60
- Yaron Laufer, Sharon Gannot:
Scoring-Based ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in a Spatially Homogeneous Noise Field. 61-76
- Naveen Kumar Desiraju, Simon Doclo, Markus Buck, Tobias Wolff:
Online Estimation of Reverberation Parameters For Late Residual Echo Suppression. 77-91
- Mehdi Zohourian, Rainer Martin:
Binaural Direct-to-Reverberant Energy Ratio and Speaker Distance Estimation. 92-104
- Youhyun Shin, Sang-goo Lee:
Learning Context Using Segment-Level LSTM for Neural Sequence Labeling. 105-115
- Gongping Huang, Jingdong Chen, Jacob Benesty:
Design of Planar Differential Microphone Arrays With Fractional Orders. 116-130
- Ming-Hsiang Su, Chung-Hsien Wu, Liang-Yu Chen:
Attention-Based Response Generation Using Parallel Double Q-Learning for Dialog Policy Decision in a Conversational System. 131-143
- Satoru Emura:
Wave-Domain Residual Echo Reduction Using Subspace Tracking. 144-156
- Xin Wang, Shinji Takaki, Junichi Yamagishi, Simon King, Keiichi Tokuda:
A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis. 157-170
- Falk-Martin Hoffmann, Philip Arthur Nelson, Filippo Maria Fazi:
DOA Estimation Performance With Circular Arrays in Sound Fields With Finite Rate of Innovation. 171-184
- Rongfeng Su, Xunying Liu, Lan Wang, Jingzhou Yang:
Cross-Domain Deep Visual Feature Generation for Mandarin Audio-Visual Speech Recognition. 185-197
- Titouan Parcollet, Mohamed Morchid, Xavier Bost, Georges Linarès, Renato De Mori:
Real to H-Space Autoencoders for Theme Identification in Telephone Conversations. 198-210
- Antonio Canclini, Fabio Antonacci, Stefano Tubaro, Augusto Sarti:
A Methodology for the Robust Estimation of the Radiation Pattern of Acoustic Sources. 211-224
- Yi Yu, Hongsen He, Badong Chen, Jianghui Li, Youwen Zhang, Lu Lu:
M-Estimate Based Normalized Subband Adaptive Filter Algorithm: Performance Analysis and Improvements. 225-239
- Haoxiang Wen, Senquan Yang, Yuanquan Hong, Huan Luo:
A Partial Update Adaptive Algorithm for Sparse System Identification. 240-255
- Martin Bo Møller, Jan Østergaard:
A Moving Horizon Framework for Sound Zones. 256-265
- Stylianos Ioannis Mimilakis, Konstantinos Drossos, Estefanía Cano, Gerald Schuller:
Examining the Mapping Functions of Denoising Autoencoders in Singing Voice Separation. 266-278
- Lachlan Birnie, Thushara D. Abhayapala, Prasanga N. Samarasinghe:
Reflection Assisted Sound Source Localization Through a Harmonic Domain MUSIC Framework. 279-293
- Wenhao Ding, Liang He:
Adaptive Multi-Scale Detection of Acoustic Events. 294-306
- Weijian Zhang, Peng Song:
Transfer Sparse Discriminant Subspace Learning for Cross-Corpus Speech Emotion Recognition. 307-318
- Bidisha Sharma, Ye Wang:
Automatic Evaluation of Song Intelligibility Using Singing Adapted STOI and Vocal-Specific Features. 319-331
- Hai Morgenstern, Boaz Rafaely:
Perceptually-Transparent Online Estimation of Two-Channel Room Transfer Function for Sound Calibration. 332-342
- Shaojin Ding, Guanlong Zhao, Christopher Liberatore, Ricardo Gutierrez-Osuna:
Learning Structured Sparse Representations for Voice Conversion. 343-354
- Mireia Díez, Lukás Burget, Federico Landini, Jan Cernocký:
Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors. 355-368
- Jia-Chen Gu, Zhen-Hua Ling, Quan Liu:
Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots. 369-379
- Ke Tan, DeLiang Wang:
Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement. 380-390
- Richeng Duan, Tatsuya Kawahara, Masatake Dantsuji, Hiroaki Nanjo:
Cross-Lingual Transfer Learning of Non-Native Acoustic Modeling for Pronunciation Error Detection and Diagnosis. 391-401
- Xin Wang, Shinji Takaki, Junichi Yamagishi:
Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis. 402-415
- Sanjeel Parekh, Slim Essid, Alexey Ozerov, Ngoc Q. K. Duong, Patrick Pérez, Gaël Richard:
Weakly Supervised Representation Learning for Audio-Visual Scene Analysis. 416-428
- Jianfei Yu, Jing Jiang, Rui Xia:
Entity-Sensitive Attention and Fusion Network for Entity-Level Multimodal Sentiment Classification. 429-439
- John G. Beerends, Niels M. P. Neumann, Egon L. van den Broek, Anna Llagostera Casanovas, Jovana Torres Menendez, Christian Schmidmer, Jens Berger:
Subjective and Objective Assessment of Full Bandwidth Speech Quality. 440-449
- Vikram C. Mathad, S. R. Mahadeva Prasanna:
Vowel Onset Point Based Screening of Misarticulated Stops in Cleft Lip and Palate Speech. 450-460
- Minh Nguyen, Gia H. Ngo, Nancy F. Chen:
Hierarchical Character Embeddings: Learning Phonological and Semantic Representations in Languages of Logographic Origin Using Recursive Neural Networks. 461-473
- Dani Cherkassky, Sharon Gannot:
Successive Relative Transfer Function Identification Using Blind Oblique Projection. 474-486
- Ivo Trowitzsch, Christopher Schymura, Dorothea Kolossa, Klaus Obermayer:
Joining Sound Event Detection and Localization Through Spatial Segregation. 487-502
- Shinichi Mogami, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo, Nobutaka Ono:
Independent Low-Rank Matrix Analysis Based on Time-Variant Sub-Gaussian Source Model for Determined Blind Source Separation. 503-518
- Hamzeh Ghasemzadeh, Meisam Khalil Arjmandi:
Toward Optimum Quantification of Pathology-Induced Noises: An Investigation of Information Missed by Human Auditory System. 519-528
- Fei Ma, Wen Zhang, Thushara Dheemantha Abhayapala:
Active Control of Outgoing Broadband Noise Fields in Rooms. 529-539
- Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai:
Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations. 540-552
- Tao Dai, Li Zhu, Yaxiong Wang, Kathleen M. Carley:
Attentive Stacked Denoising Autoencoder With Bi-LSTM for Personalized Context-Aware Citation Recommendation. 553-568
- Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, Satoshi Nakamura:
Multi-Source Neural Machine Translation With Missing Data. 569-580
- Jin Wang, Liang-Chih Yu, K. Robert Lai, Xuejie Zhang:
Tree-Structured Regional CNN-LSTM Model for Dimensional Sentiment Analysis. 581-591
- Abul Azad, Lamine Mili:
Robust Speech Filter and Voice Encoder Parameter Estimation Using the Phase-Phase Correlator. 592-604
- Abdullah Fahim, Prasanga N. Samarasinghe, Thushara D. Abhayapala:
Multi-Source DOA Estimation Through Pattern Recognition of the Modal Coherence of a Reverberant Soundfield. 605-618
- Yaron Laufer, Bracha Laufer-Goldshtein, Sharon Gannot:
ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in Rank-Deficient Noise Field. 619-634
- Zhongqing Wang, Qingying Sun, Shoushan Li, Qiaoming Zhu, Guodong Zhou:
Neural Stance Detection With Hierarchical Linguistic Representations. 635-645
- Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky:
Multi-Stream End-to-End Speech Recognition. 646-655
- Yu Maeno, Yuki Mitsufuji, Prasanga N. Samarasinghe, Naoki Murata, Thushara D. Abhayapala:
Spherical-Harmonic-Domain Feedforward Active Noise Control Using Sparse Decomposition of Reference Signals from Distributed Sensor Arrays. 656-670
- Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao:
A Joint Sentence Scoring and Selection Framework for Neural Extractive Document Summarization. 671-681
- Ivan Kukanov, Trung Ngo Trong, Ville Hautamäki, Sabato Marco Siniscalchi, Valerio Mario Salerno, Kong Aik Lee:
Maximal Figure-of-Merit Framework to Detect Multi-Label Phonetic Features for Spoken Language Recognition. 682-695
- Shoichi Koyama, Gilles Chardon, Laurent Daudet:
Optimizing Source and Sensor Placement for Sound Field Control: An Overview. 696-714
- Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono, Tomoki Toda:
Customer Satisfaction Estimation in Contact Center Calls Based on a Hierarchical Multi-Task Model. 715-728
- Thomas Dietzen, Simon Doclo, Marc Moonen, Toon van Waterschoot:
Integrated Sidelobe Cancellation and Linear Prediction Kalman Filter for Joint Multi-Microphone Speech Dereverberation, Interfering Speech Cancellation, and Noise Reduction. 740-754
- Thomas Dietzen, Simon Doclo, Marc Moonen, Toon van Waterschoot:
Square Root-Based Multi-Source Early PSD Estimation and Recursive RETF Update in Reverberant Environments by Means of the Orthogonal Procrustes Problem. 755-769
- Liwen Zhang, Ziqiang Shi, Jiqing Han:
Pyramidal Temporal Pooling With Discriminative Mapping for Audio Classification. 770-784
- Mengfan Zhang, Zhongshu Ge, Tiejun Liu, Xihong Wu, Tianshu Qu:
Modeling of Individual HRTFs Based on Spatial Principal Component Analysis. 785-797
- Laureano Moro-Velázquez, Estefanía Hernández-García, Jorge Andrés Gómez García, Juan Ignacio Godino-Llorente, Najim Dehak:
Analysis of the Effects of Supraglottal Tract Surgical Procedures in Automatic Speaker Recognition Performance. 798-812
- Yijia Liu, Wanxiang Che, Bing Qin, Ting Liu:
Exploring Segment Representations for Neural Semi-Markov Conditional Random Fields. 813-824
- Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen:
On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement. 825-838
- Yang Ai, Zhen-Hua Ling:
A Neural Vocoder With Hierarchical Generation of Amplitude and Phase Spectra for Statistical Parametric Speech Synthesis. 839-851
- Dongyan Yu, Huiping Duan, Jun Fang, Bing Zeng:
Predominant Instrument Recognition Based on Deep Neural Network With Auxiliary Classification. 852-861
- Ali Aroudi, Simon Doclo:
Cognitive-Driven Binaural Beamforming Using EEG-Based Auditory Attention Decoding. 862-875
- Christopher Gribben, Hyunkook Lee:
The Perception of Band-Limited Decorrelation Between Vertically Oriented Loudspeakers. 876-888
- Olivier Perrotin, Ian Vince McLoughlin:
Glottal Flow Synthesis for Whisper-to-Speech Conversion. 889-900
- Gongping Huang, Jacob Benesty, Israel Cohen, Jingdong Chen:
Differential Beamforming on Graphs. 901-913
- Bracha Laufer-Goldshtein, Ronen Talmon, Sharon Gannot:
Global and Local Simplex Representations for Multichannel Source Separation. 914-928
- Henning F. Schepker, Sven Nordholm, Simon Doclo:
Acoustic Feedback Suppression for Multi-Microphone Hearing Devices Using a Soft-Constrained Null-Steering Beamformer. 929-940
- Zhong-Qiu Wang, DeLiang Wang:
Deep Learning Based Target Cancellation for Speech Dereverberation. 941-950
- Yeongseok Kim, Youngjin Park:
Blockwise Weighted Least Square Active Noise Control for CPU-GPU Architecture. 951-963
- Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller:
Speech Technology for Unwritten Languages. 964-975
- Andros Tjandra, Sakriani Sakti, Satoshi Nakamura:
Machine Speech Chain. 976-989
- M. Khadem-hosseini, Shahrokh Ghaemmaghami, Azra Abtahi, Saeed Gazor, Farrokh Marvasti:
Error Correction in Pitch Detection Using a Deep Learning Based Classification. 990-999
- Enzo De Sena, Zoran Cvetkovic, Hüseyin Hacihabiboglu, Marc Moonen, Toon van Waterschoot:
Localization Uncertainty in Time-Amplitude Stereophonic Reproduction. 1000-1015
- Vera Erbes, Sascha Spors:
Localisation Properties of Wave Field Synthesis in a Listening Room. 1016-1024
- Jia Pan, Genshun Wan, Jun Du, Zhongfu Ye:
Online Speaker Adaptation Using Memory-Aware Networks for Speech Recognition. 1025-1037
- Weicheng Cai, Jinkun Chen, Jun Zhang, Ming Li:
On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition. 1038-1051
- George Sterpu, Christian Saam, Naomi Harte:
How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition. 1052-1064
- Christopher Schymura, Dorothea Kolossa:
Audiovisual Speaker Tracking Using Nonlinear Dynamical Systems With Dynamic Stream Weights. 1065-1078
- Gongping Huang, Jacob Benesty, Israel Cohen, Jingdong Chen:
A Simple Theory and New Method of Differential Beamforming With Uniform Linear Microphone Arrays. 1079-1093
- Chung-Ying Ho, Kuo-Kai Shyu, Cheng-Yuan Chang, Sen M. Kuo:
Efficient Narrowband Noise Cancellation System Using Adaptive Line Enhancer. 1094-1103
- Aditya Arie Nugraha, Kouhei Sekiguchi, Kazuyoshi Yoshii:
A Flow-Based Deep Latent Variable Model for Speech Spectrogram Modeling and Enhancement. 1104-1117
- Beat Gfeller, Christian Havnø Frank, Dominik Roblek, Matthew Sharifi, Marco Tagliasacchi, Mihajlo Velimirovic:
SPICE: Self-Supervised Pitch Estimation. 1118-1128
- Christoph Urbanietz, Gerald Enzner:
Direct Spatial-Fourier Regression of HRIRs from Multi-Elevation Continuous-Azimuth Recordings. 1129-1142
- Yaakov Buchris, Israel Cohen, Jacob Benesty, Alon Amar:
Joint Sparse Concentric Array Design for Frequency and Rotationally Invariant Beampattern. 1143-1158
- Tharindu Fernando, Sridha Sridharan, Mitchell McLaren, Darshana Priyasad, Simon Denman, Clinton Fookes:
Temporarily-Aware Context Modeling Using Generative Adversarial Networks for Speech Activity Detection. 1159-1169
- Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao:
Unsupervised Neural Machine Translation With Cross-Lingual Language Representation Agreement. 1170-1182
- Qiaoling Zhang, WeiQiang Xu, Weiwei Zhang, Jie Feng, Zhiyong Chen:
Multi-Hypothesis Square-Root Cubature Kalman Particle Filter for Speaker Tracking in Noisy and Reverberant Environments. 1183-1197
- Yinhe Zheng, Guanyi Chen, Minlie Huang:
Out-of-Domain Detection for Natural Language Understanding in Dialog Systems. 1198-1209
- Ina Kodrasi, Hervé Bourlard:
Spectro-Temporal Sparsity Characterization for Dysarthric Speech Detection. 1210-1222
- Bharat Padi, Anand Mohan, Sriram Ganapathy:
Towards Relevance and Sequence Modeling in Language Recognition. 1223-1232
- Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen:
Improved External Speaker-Robust Keyword Spotting for Hearing Assistive Devices. 1233-1247
- Vishnuvardhan Varanasi, Harshit Gupta, Rajesh M. Hegde:
A Deep Learning Framework for Robust DOA Estimation Using Spherical Harmonic Decomposition. 1248-1259
- Sahar Hashemgeloogerdi, Mark F. Bocko:
Adaptive Feedback Cancellation in Hearing Aids Based on Orthonormal Basis Functions With Prediction-Error Method Based Prewhitening. 1260-1269
- Maximo Cobos, Fabio Antonacci, Luca Comanducci, Augusto Sarti:
Frequency-Sliding Generalized Cross-Correlation: A Sub-Band Time Delay Estimation Approach. 1270-1281
- Yingying Zhu, Haiquan Zhao, Xiangping Zeng, Badong Chen:
Robust Generalized Maximum Correntropy Criterion Algorithms for Active Noise Control. 1282-1292
- Hassan Taherian, Zhong-Qiu Wang, Jorge Chang, DeLiang Wang:
Robust Speaker Recognition Based on Single-Channel and Multi-Channel Speech Enhancement. 1293-1302
- Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen, Xuefei Liu:
End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features. 1303-1314
- T. Lavanya, T. Nagarajan, P. Vijayalakshmi:
Multi-Level Single-Channel Speech Enhancement Using a Unified Framework for Estimating Magnitude and Phase Spectra. 1315-1327
- Adrien Ycart, Emmanouil Benetos:
Learning and Evaluation Methodologies for Polyphonic Music Sequence Prediction With LSTMs. 1328-1341
- Takatomo Kano, Sakriani Sakti, Satoshi Nakamura:
End-to-End Speech Translation With Transcoding by Multi-Task Learning for Distant Language Pairs. 1342-1355
- Huanyu Zuo, Prasanga N. Samarasinghe, Thushara D. Abhayapala:
Intensity Based Spatial Soundfield Reproduction Using an Irregular Loudspeaker Array. 1356-1369
- Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li:
SpEx: Multi-Scale Time Domain Speaker Extraction Network. 1370-1384
- Wangyou Zhang, Xuankai Chang, Yanmin Qian, Shinji Watanabe:
Improving End-to-End Single-Channel Multi-Talker Speech Recognition. 1385-1394
- Alakananda Vempala, Eduardo Blanco:
Extracting Biographical Spatial Timelines: Corpus and Experiments. 1395-1403
- Qiquan Zhang, Aaron Nicolson, Mingjiang Wang, Kuldip K. Paliwal, Chenxu Wang:
DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation. 1404-1415
- Dhananjay Ram, Lesly Miculicich, Hervé Bourlard:
Neural Network Based End-to-End Query by Example Spoken Term Detection. 1416-1427
- Enea Ceolini, Ilya Kiselev, Shih-Chii Liu:
Evaluating Multi-Channel Multi-Device Speech Separation Algorithms in the Wild: A Hardware-Software Solution. 1428-1439
- Su Zhu, Zijian Zhao, Rao Ma, Kai Yu:
Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding. 1440-1451
- Haoran Miao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan:
Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture. 1452-1465
- Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian:
Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection. 1466-1478
- Dong-Yuan Shi, Woon-Seng Gan, Bhan Lam, Shulin Wen:
Feedforward Selective Fixed-Filter Active Noise Control: Algorithm and Implementation. 1479-1492
- Zhihao Du, Xueliang Zhang, Jiqing Han:
A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement. 1493-1505
- Yue Zhang, Yile Wang, Jie Yang:
Lattice LSTM for Chinese Sentence Representation. 1506-1519
- Zhuo Tang, Boyan Wan, Li Yang:
Word-Character Graph Convolution Network for Chinese Named Entity Recognition. 1520-1532
- Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen:
Speaker Verification by Partial AUC Optimization With Mahalanobis Distance Metric Learning. 1533-1548
- Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna, Prithwijit Guha:
Speech/Music Classification Using Features From Spectral Peaks. 1549-1559
- Liming Wang, Mark Hasegawa-Johnson:
Multimodal Word Discovery and Retrieval With Spoken Descriptions and Visual Concepts. 1560-1573
- Yang Fan, Fei Tian, Yingce Xia, Tao Qin, Xiang-Yang Li, Tie-Yan Liu:
Searching Better Architectures for Neural Machine Translation. 1574-1585
- Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao, Muyun Yang, Hai Zhao:
Towards More Diverse Input Representation for Neural Machine Translation. 1586-1597
- Yan Zhao, DeLiang Wang, Buye Xu, Tao Zhang:
Monaural Speech Dereverberation Using Temporal Convolutional Networks With Self Attention. 1598-1607
- Yanhui Tu, Jun Du, Tian Gao, Chin-Hui Lee:
A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement. 1608-1619
- Christine Evers, Heinrich W. Löllmann, Heinrich Mellmann, Alexander Schmidt, Hendrik Barfuss, Patrick A. Naylor, Walter Kellermann:
The LOCATA Challenge: Acoustic Source Localization and Tracking. 1620-1643
- Hiroaki Tsushima, Eita Nakamura, Kazuyoshi Yoshii:
Bayesian Melody Harmonization Based on a Tree-Structured Generative Model of Chord Sequences and Melodies. 1644-1655
- Keunhyoung Luke Kim, Jongpil Lee, Sangeun Kum, Chae Lin Park, Juhan Nam:
Semantic Tagging of Singing Voices in Popular Music Recordings. 1656-1668
- Liner Yang, Cunliang Kong, Yun Chen, Yang Liu, Qinan Fan, Erhong Yang:
Incorporating Sememes into Chinese Definition Modeling. 1669-1677
- Ryo Nishikimi, Eita Nakamura, Masataka Goto, Katsutoshi Itoyama, Kazuyoshi Yoshii:
Bayesian Singing Transcription Based on a Hierarchical Generative Model of Keys, Musical Notes, and F0 Trajectories. 1678-1691
- Byeongho Jo, Franz Zotter, Jung-Woo Choi:
Extended Vector-Based EB-ESPRIT Method. 1692-1705
- Andros Tjandra, Sakriani Sakti, Satoshi Nakamura:
Corrections to "Machine Speech Chain". 1706
- Zaixiang Zheng, Shujian Huang, Rongxiang Weng, Xin-Yu Dai, Jiajun Chen:
Improving Self-Attention Networks With Sequential Relations. 1707-1716
- Parvaneh Janbakhshi, Ina Kodrasi, Hervé Bourlard:
Automatic Pathological Speech Intelligibility Assessment Exploiting Subspace-Based Analyses. 1717-1728
- Cagdas Tuna, Antonio Canclini, Federico Borra, Philipp Götz, Fabio Antonacci, Andreas Walther, Augusto Sarti, Emanuël A. P. Habets:
3D Room Geometry Inference Using a Linear Loudspeaker Array and a Single Microphone. 1729-1744
- Xianjun Xia, Roberto Togneri, Ferdous Sohel, Yuanjun Zhao, Defeng David Huang:
Sound Event Detection Using Multiple Optimized Kernels. 1745-1754
- Federico Borra, Alberto Bernardini, Fabio Antonacci, Augusto Sarti:
Efficient Implementations of First-Order Steerable Differential Microphone Arrays With Arbitrary Planar Geometry. 1755-1766
- Moti Lugasi, Boaz Rafaely:
Speech Enhancement Using Masking for Binaural Reproduction of Ambisonics Signals. 1767-1777
- Zhong-Qiu Wang, Peidong Wang, DeLiang Wang:
Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR. 1778-1787
- Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud:
Audio-Visual Speech Enhancement Using Conditional Variational Auto-Encoders. 1788-1800
- Kai Song, Xiaoqing Zhou, Heng Yu, Zhongqiang Huang, Yue Zhang, Weihua Luo, Xiangyu Duan, Min Zhang:
Towards Better Word Alignment in Transformer. 1801-1812
- Lian Huang, Chi-Man Pun:
Audio Replay Spoof Attack Detection by Joint Segment-Based Linear Filter Bank Feature Extraction and Attention-Enhanced DenseNet-BiLSTM Network. 1813-1825
- Yang Xiang, Changchun Bao:
A Parallel-Data-Free Speech Enhancement Method Using Multi-Objective Learning Cycle-Consistent Generative Adversarial Network. 1826-1838
- Hao Fei, Donghong Ji, Yue Zhang, Yafeng Ren:
Topic-Enhanced Capsule Network for Multi-Label Emotion Classification. 1839-1848
- Hirokazu Kameoka, Kou Tanaka, Damian Kwasny, Takuhiro Kaneko, Nobukatsu Hojo:
ConvS2S-VC: Fully Convolutional Sequence-to-Sequence Voice Conversion. 1849-1863
- Huayang Li, Guoping Huang, Deng Cai, Lemao Liu:
Neural Machine Translation With Noisy Lexical Constraints. 1864-1874
- Chien-Yao Wang, Tzu-Chiang Tai, Jia-Ching Wang, Andri Santoso, Seksan Mathulaprangsan, Chin-Chin Chiang, Chung-Hsien Wu:
Sound Events Recognition and Retrieval Using Multi-Convolutional-Channel Sparse Coding Convolutional Neural Networks. 1875-1887
- Chang-Le Liu, Sze-Wei Fu, You-Jin Li, Jen-Wei Huang, Hsin-Min Wang, Yu Tsao:
Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks. 1888-1900
- Dhananjaya N. Gowda, Sudarsana Reddy Kadiri, Brad H. Story, Paavo Alku:
Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals. 1901-1914
- Sebastian J. Schlecht, Emanuël A. P. Habets:
Scattering in Feedback Delay Networks. 1915-1924
- Irene Martín-Morató, Maximo Cobos, Francesc J. Ferri:
Adaptive Distance-Based Pooling in Convolutional Neural Networks for Audio Event Classification. 1925-1935
- Su Zhu, Ruisheng Cao, Kai Yu:
Dual Learning for Semi-Supervised Natural Language Understanding. 1936-1947
- Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari:
Blind Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation With Multivariate Generalized Gaussian Distribution. 1948-1963
- Vinayak Abrol, Pulkit Sharma:
Learning Hierarchy Aware Embedding From Raw Audio for Acoustic Scene Classification. 1964-1973
- Daniele Mirabilii, Emanuël A. P. Habets:
Spatial Coherence-Aware Multi-Channel Wind Noise Reduction. 1974-1987
- Yougen Yuan, Lei Xie, Cheung-Chi Leung, Hongjie Chen, Bin Ma:
Fast Query-by-Example Speech Search Using Attention-Based Deep Binary Embeddings. 1988-2000
- Daniele Salvati, Carlo Drioli, Gian Luca Foresti:
Diagonal Unloading Beamforming in the Spherical Harmonic Domain for Acoustic Source Localization in Reverberant Environments. 2001-2012
- Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien:
Variational Domain Adversarial Learning With Mutual Information Maximization for Speaker Verification. 2013-2024
- Thomas Sgouros, Nikolaos Mitianoudis:
A novel Directional Framework for Source Counting and Source Separation in Instantaneous Underdetermined Audio Mixtures. 2025-2035
- Kenta Niwa, Hironobu Chiba, Noboru Harada, Guoqiang Zhang, W. Bastiaan Kleijn:
Microphone Array Wiener Post Filtering Using Monotone Operator Splitting. 2036-2046
- Hui Luo, Jiqing Han:
Nonnegative Matrix Factorization Based Transfer Subspace Learning for Cross-Corpus Speech Emotion Recognition. 2047-2060
- Ming-Hsiang Su, Chung-Hsien Wu, Hao-Tse Cheng:
A Two-Stage Transformer-Based Approach for Variable-Length Abstractive Summarization. 2061-2072
- Boqing Zhu, Kele Xu, Qiuqiang Kong, Huaimin Wang, Yuxing Peng:
Audio Tagging by Cross Filtering Noisy Labels. 2073-2083
- Sangeeta Bagha, Debi Prasad Das, Santosh Kumar Behera:
An Efficient Narrowband Active Noise Control System for Accommodating Frequency Mismatch. 2084-2094
- Weiwei Zhang, Zhe Chen, Fuliang Yin:
Multi-Pitch Estimation of Polyphonic Music Based on Pseudo Two-Dimensional Spectrum. 2095-2108
- Yuzhou Liu, DeLiang Wang:
Causal Deep CASA for Monaural Talker-Independent Speaker Separation. 2109-2118
- Huanyu Zuo, Thushara D. Abhayapala, Prasanga N. Samarasinghe:
Particle Velocity Assisted Three Dimensional Sound Field Reproduction Using a Modal-Domain Approach. 2119-2133
- Shun Kiyono, Jun Suzuki, Tomoya Mizumoto, Kentaro Inui:
Massive Exploration of Pseudo Data for Grammatical Error Correction. 2134-2145
- Bin Wang, C.-C. Jay Kuo:
SBERT-WK: A Sentence Embedding Method by Dissecting BERT-Based Word Models. 2146-2157
- Guillaume Carbajal, Romain Serizel, Emmanuel Vincent, Eric Humbert:
Joint NN-Supported Multichannel Reduction of Acoustic Echo, Reverberation and Noise. 2158-2173
- Qi Liu, Zhehuai Chen, Hao Li, Mingkun Huang, Yizhou Lu, Kai Yu:
Modular End-to-End Automatic Speech Recognition Framework for Acoustic-to-Word Model. 2174-2183
- Hanan Beit-On, Boaz Rafaely:
Focusing and Frequency Smoothing for Arbitrary Arrays With Application to Speaker Localization. 2184-2193
- Christoph Pörschmann, Johannes M. Arend, Fabian Brinkmann:
Correction to "Directional Equalization of Sparse Head-Related Transfer Function Sets for Spatial Upsampling". 2194
- Tomi Kinnunen, Héctor Delgado, Nicholas W. D. Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang, Md. Sahidullah, Junichi Yamagishi, Douglas A. Reynolds:
Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals. 2195-2210
- Sheng-Hua Zhong, Peiqi Liu, Zhong Ming, Yan Liu:
How to Evaluate Single-Round Dialogues Like Humans: An Information-Oriented Metric. 2211-2223
- Wilmer Lobato, Márcio Holsbach Costa:
Worst-Case-Optimization Robust-MVDR Beamformer for Stereo Noise Reduction in Hearing Aids. 2224-2237
- Luca Comanducci, Federico Borra, Paolo Bestagini, Fabio Antonacci, Stefano Tubaro, Augusto Sarti:
Source Localization Using Distributed Microphones in Reverberant Environments Based on Deep Learning and Ray Space Transform. 2238-2251
- Feiran Yang, Jianfeng Guo, Jun Yang:
Stochastic Analysis of the Filtered-x LMS Algorithm for Active Noise Control. 2252-2266
- Tomohiro Nakatani, Christoph Böddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach:
Jointly Optimal Denoising, Dereverberation, and Source Separation. 2267-2282
- Haytham M. Fayek, Justin Johnson:
Temporal Reasoning via Audio Question Answering. 2283-2294
- Amulya Gupta, Zhu (Drew) Zhang:
Swings and Roundabouts: Attention-Structure Interaction Effect in Deep Semantic Matching. 2295-2307
- Chang Huai You, Jichen Yang:
Device Feature Extraction Based on Parallel Neural Network Training for Replay Spoofing Detection. 2308-2318
- Santosh Kesiraju, Oldrich Plchot, Lukás Burget, Suryakanth V. Gangashetty:
Learning Document Embeddings Along With Their Uncertainties. 2319-2332
- Mirco Pezzoli, Federico Borra, Fabio Antonacci, Stefano Tubaro, Augusto Sarti:
A Parametric Approach to Virtual Miking for Sources of Arbitrary Directivity. 2333-2348
- Shengbei Wang, Weitao Yuan, Masashi Unoki:
Multi-Subspace Echo Hiding Based on Time-Frequency Similarities of Audio Signals. 2349-2363
- Yujia Qin, Fanchao Qi, Sicong Ouyang, Zhiyuan Liu, Cheng Yang, Yasheng Wang, Qun Liu, Maosong Sun:
Improving Sequence Modeling Ability of Recurrent Neural Networks via Sememes. 2364-2373
- Yuchen Dong, Jie Chen, Wen Zhang:
Distributed Wave-Domain Active Noise Control Based on the Diffusion Adaptation. 2374-2385
- Fatemeh Pishdadian, Gordon Wichern, Jonathan Le Roux:
Finding Strength in Weakness: Learning to Separate Sounds With Weak Supervision. 2386-2399
- Zhi Chen, Lu Chen, Xiaoyuan Liu, Kai Yu:
Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management. 2400-2411
- Taewoong Lee, Jesper Kjær Nielsen, Mads Græsbøll Christensen:
Signal-Adaptive and Perceptually Optimized Sound Zones With Variable Span Trade-Off Filters. 2412-2426
- Hao Fei, Meishan Zhang, Fei Li, Donghong Ji:
Cross-Lingual Semantic Role Labeling With Model Transfer. 2427-2437
- Kai Yu, Rao Ma, Kaiyu Shi, Qi Liu:
Neural Network Language Model Compression With Product Quantization and Soft Binarization. 2438-2449
- Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley:
Sound Event Detection of Weakly Labelled Data With CNN-Transformer and Automatic Threshold Optimization. 2450-2460
- Adrian Herzog, Emanuël A. P. Habets:
Direction and Reverberation Preserving Noise Reduction of Ambisonics Signals. 2461-2475
- Yu Wang, Yun Li, Ziye Zhu, Hanghang Tong, Yue Huang:
Adversarial Learning for Multi-Task Sequence Labeling With Attention Mechanism. 2476-2488
- Ashutosh Pandey, DeLiang Wang:
On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement. 2489-2499
- Mantong Zhou, Minlie Huang, Xiaoyan Zhu:
Robust Reading Comprehension With Linguistic Constraints via Posterior Regularization. 2500-2510
- Michael Saxon, Ayush Tripathi, Yishan Jiao, Julie M. Liss, Visar Berisha:
Robust Estimation of Hypernasality in Dysarthria With Acoustic Model Likelihood Features. 2511-2522
- Lin Wang, Andrea Cavallaro:
A Blind Source Separation Framework for Ego-Noise Reduction on Multi-Rotor Drones. 2523-2537
- Bowen Zhang, Xutao Li, Xiaofei Xu, Ka-Cheong Leung, Zhiyao Chen, Yunming Ye:
Knowledge Guided Capsule Attention Network for Aspect-Based Sentiment Analysis. 2538-2551
- Qi Qi, Xiaolu Wang, Haifeng Sun, Jingyu Wang, Xiao Liang, Jianxin Liao:
A Novel Multi-Task Learning Framework for Semi-Supervised Semantic Parsing. 2552-2560
- Haisong Ding, Kai Chen, Qiang Huo:
Improving Knowledge Distillation of CTC-Trained Acoustic Models With Alignment-Consistent Ensemble and Target Delay. 2561-2571
- Ayana, Yun Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun:
Reinforced Zero-Shot Cross-Lingual Neural Headline Generation. 2572-2584
- Mingming Yang, Rui Wang, Kehai Chen, Xing Wang, Tiejun Zhao, Min Zhang:
A Novel Sentence-Level Agreement Architecture for Neural Machine Translation. 2585-2597 - Shuai Wang
, Yexin Yang
, Zhanghao Wu
, Yanmin Qian
, Kai Yu
:
Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition. 2598-2609 - Kouhei Sekiguchi
, Yoshiaki Bando
, Aditya Arie Nugraha
, Kazuyoshi Yoshii
, Tatsuya Kawahara
:
Fast Multichannel Nonnegative Matrix Factorization With Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation. 2610-2625 - Thi Ngoc Tho Nguyen
, Woon-Seng Gan
, Rishabh Ranjan, Douglas L. Jones:
Robust Source Counting and DOA Estimation Using Spatial Pseudo-Spectrum and Convolutional Neural Network. 2626-2637 - Ondrej Cífka
, Umut Simsekli, Gaël Richard
:
Groove2Groove: One-Shot Music Style Transfer With Supervision From Synthetic Data. 2638-2650 - Judy Najnudel
, Thomas Hélie
, David Roze, Henri Boutin
:
Simulation of an Ondes Martenot Circuit. 2651-2660 - R. Jyothi
, Prabhu Babu:
SOLVIT: A Reference-Free Source Localization Technique Using Majorization Minimization. 2661-2673 - Peng Shen
, Xugang Lu, Sheng Li
, Hisashi Kawai:
Knowledge Distillation-Based Representation Learning for Short-Utterance Spoken Language Identification. 2674-2683 - Vicent Molés-Cases
, Gema Piñero
, Maria de Diego
, Alberto González
:
Personal Sound Zones by Subband Filtering and Time Domain Optimization. 2684-2696 - Srinivas Parthasarathy
, Carlos Busso
:
Semi-Supervised Speech Emotion Recognition With Ladder Networks. 2697-2709 - Artuur Leeuwenberg
, Marie-Francine Moens
:
Towards Extracting Absolute Event Timelines From English Clinical Reports. 2710-2719 - Lin Sun
, Yuxuan Sun
, Fule Ji, Chi Wang
:
Joint Learning of Token Context and Span Feature for Span-Based Nested NER. 2720-2730 - Jamal Amini
, Richard Christian Hendriks
, Richard Heusdens
, Meng Guo
, Jesper Jensen
:
Spatially Correct Rate-Constrained Noise Reduction for Binaural Hearing Aids in Wireless Acoustic Sensor Networks. 2731-2742 - Zuchao Li
, Chaoyu Guan, Hai Zhao, Rui Wang
, Kevin Parnow, Zhuosheng Zhang
:
Memory Network for Linguistic Structure Parsing. 2743-2755 - Cheng Yu, Ryandhimas E. Zezario
, Syu-Siang Wang
, Jonathan Sherman, Yi-Yen Hsieh
, Xugang Lu, Hsin-Min Wang
, Yu Tsao
:
Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders. 2756-2769 - Xin Liu
, Qingcai Chen
, Xiangping Wu
, Yang Hua, Jing Chen, Dongfang Li, Buzhou Tang, Xiaolong Wang:
Gated Semantic Difference Based Sentence Semantic Equivalence Identification. 2770-2780 - Gilles Boulianne
:
A Study of Inductive Biases for Unsupervised Speech Representation Learning. 2781-2795 - Yu-Te Wu, Berlin Chen, Li Su
:
Multi-Instrument Automatic Music Transcription With Self-Attention-Based Instance Segmentation. 2796-2809 - Weiwei Lin
, Man-Wai Mak, Na Li, Dan Su, Dong Yu:
A Framework for Adapting DNN Speaker Embedding Across Languages. 2810-2822 - Purvi Agrawal
, Sriram Ganapathy
:
Interpretable Representation Learning for Speech and Audio Signals Based on Relevance Weighting. 2823-2836 - Xingwei Sun
, Ze-Feng Gao
, Zhong-Yi Lu, Junfeng Li, Yonghong Yan:
A Model Compression Method With Matrix Product Operators for Speech Enhancement. 2837-2847 - Koby Weisberg, Bracha Laufer-Goldshtein
, Sharon Gannot
:
Simultaneous Tracking and Separation of Multiple Sources Using Factor Graph Model. 2848-2864 - Chao Pan
, Jingdong Chen
, Guangming Shi
:
On Estimation of Time-Varying Variances of Source and Noise for Sensor Array Processing. 2865-2879 - Qiuqiang Kong
, Yin Cao, Turab Iqbal
, Yuxuan Wang, Wenwu Wang
, Mark D. Plumbley
:
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition. 2880-2894 - Shuyang Zhao
, Toni Heittola
, Tuomas Virtanen
:
Active Learning for Sound Event Detection. 2895-2905 - Ondrej Mokrý
, Pavel Rajmic
:
Audio Inpainting: Revisited and Reweighted. 2906-2918 - Christof Weiß
, Hendrik Schreiber
, Meinard Müller
:
Local Key Estimation in Music Recordings: A Case Study Across Songs, Versions, and Annotators. 2919-2932 - Leilei Gan
, Yue Zhang
:
Investigating Self-Attention Network for Chinese Word Segmentation. 2933-2941 - Nico Gößling
, Elior Hadad
, Sharon Gannot
, Simon Doclo
:
Binaural LCMV Beamforming With Partial Noise Estimation. 2942-2955 - Yiming Wu
, Tristan Carsault, Eita Nakamura
, Kazuyoshi Yoshii
:
Semi-Supervised Neural Chord Estimation Based on a Variational Autoencoder With Latent Chord Labels and Features. 2956-2966 - Hieu-Thi Luong
, Junichi Yamagishi
:
NAUTILUS: A Versatile Voice Cloning System. 2967-2981 - Hirokazu Kameoka
, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo:
Nonparallel Voice Conversion With Augmented Classifier Star Generative Adversarial Networks. 2982-2995 - Pierre Lecomte
, Manuel Melon
, Laurent Simon
:
Spherical Fraction Beamforming. 2996-3009 - Constantinos Papayiannis
, Christine Evers
, Patrick A. Naylor
:
End-to-End Classification of Reverberant Rooms Using DNNs. 3010-3017 - Bhusan Chettri
, Emmanouil Benetos
, Bob L. T. Sturm
:
Dataset Artefacts in Anti-Spoofing Systems: A Case Study on the ASVspoof 2017 Benchmark. 3018-3028 - Alexios Gidiotis
, Grigorios Tsoumakas
:
A Divide-and-Conquer Approach to the Summarization of Long Documents. 3029-3040 - Huiyuan Sun
, Thushara D. Abhayapala
, Prasanga N. Samarasinghe
:
A Realistic Multiple Circular Array System for Active Noise Control Over 3D Space. 3041-3052 - Sasan Asadiabadi
, Engin Erzin
:
Vocal Tract Contour Tracking in rtMRI Using Deep Temporal Regression Network. 3053-3064 - Hung-Shin Lee
, Yu Tsao
, Shyh-Kang Jeng
, Hsin-Min Wang
:
Subspace-Based Representation and Learning for Phonotactic Spoken Language Recognition. 3065-3079 - Juan M. Martín-Doñas
, Jesper Jensen
, Zheng-Hua Tan
, Angel M. Gomez
, Antonio M. Peinado
:
Online Multichannel Speech Enhancement Based on Recursive EM and DNN-Based Speech Presence Estimation. 3080-3094 - Rui Wang
, Zhe Chen
, Fuliang Yin
:
Active Sampling Rate Calibration Method for Acoustic Sensor Networks. 3095-3107 - Yonggang Hu
, Prasanga N. Samarasinghe
, Sharon Gannot
, Thushara D. Abhayapala
:
Semi-Supervised Multiple Source Localization Using Relative Harmonic Coefficients Under Noisy and Reverberant Environments. 3108-3123 - Ashwin Bellur, Mounya Elhilali
:
Audio Object Classification Using Distributed Beliefs and Attention. 729-739
