22nd Interspeech 2021: Brno, Czechia
- Hynek Hermansky, Honza Cernocký, Lukás Burget, Lori Lamel, Odette Scharenborg, Petr Motlícek: Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021. ISCA 2021
Speech Synthesis: Other Topics
- Michael Pucher, Thomas Woltron: Conversion of Airborne to Bone-Conducted Speech with Deep Neural Networks. 1-5
- Markéta Rezácková, Jan Svec, Daniel Tihelka: T5G2P: Using Text-to-Text Transfer Transformer for Grapheme-to-Phoneme Conversion. 6-10
- Olivier Perrotin, Hussein El Amouri, Gérard Bailly, Thomas Hueber: Evaluating the Extrapolation Capabilities of Neural Vocoders to Extreme Pitch Values. 11-15
- Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers: A Systematic Review and Analysis of Multilingual Data Strategies in Text-to-Speech for Low-Resource Languages. 16-20
Disordered Speech
- Tanya Talkar, Nancy Pearl Solomon, Douglas S. Brungart, Stefanie E. Kuchinsky, Megan M. Eitel, Sara M. Lippa, Tracey A. Brickell, Louis M. French, Rael T. Lange, Thomas F. Quatieri: Acoustic Indicators of Speech Motor Coordination in Adults With and Without Traumatic Brain Injury. 21-25
- Juan Camilo Vásquez-Correa, Julian Fritsch, Juan Rafael Orozco-Arroyave, Elmar Nöth, Mathew Magimai-Doss: On Modeling Glottal Source Information for Phonation Assessment in Parkinson's Disease. 26-30
- Khalid Daoudi, Biswajit Das, Solange Milhé de Saint Victor, Alexandra Foubert-Samier, Anne Pavy-Le Traon, Olivier Rascol, Wassilios G. Meissner, Virginie Woisard: Distortion of Voiced Obstruents for Differential Diagnosis Between Parkinson's Disease and Multiple System Atrophy. 31-35
- Pu Wang, Bagher BabaAli, Hugo Van hamme: A Study into Pre-Training Strategies for Spoken Language Understanding on Dysarthric Speech. 36-40
- Rosanna Turrisi, Arianna Braccia, Marco Emanuele, Simone Giulietti, Maura Pugliatti, Mariachiara Sensi, Luciano Fadiga, Leonardo Badino: EasyCall Corpus: A Dysarthric Speech Dataset. 41-45
Speech Signal Analysis and Representation II
- Xiaoyu Bie, Laurent Girin, Simon Leglaive, Thomas Hueber, Xavier Alameda-Pineda: A Benchmark of Dynamical Variational Autoencoders Applied to Speech Spectrogram Modeling. 46-50
- Metehan Yurt, Pavan Kantharaju, Sascha Disch, Andreas Niedermeier, Alberto N. Escalante-B., Veniamin I. Morgenshtern: Fricative Phoneme Detection Using Deep Neural Networks and its Comparison to Traditional Methods. 51-55
- RaviShankar Prasad, Mathew Magimai-Doss: Identification of F1 and F2 in Speech Using Modified Zero Frequency Filtering. 56-60
- Yann Teytaut, Axel Roebel: Phoneme-to-Audio Alignment with Recurrent Neural Networks for Speaking and Singing Voice. 61-65
Feature, Embedding and Neural Architecture for Speaker Recognition
- Seong-Hu Kim, Yong-Hwa Park: Adaptive Convolutional Neural Network for Text-Independent Speaker Recognition. 66-70
- Jiajun Qi, Wu Guo, Bin Gu: Bidirectional Multiscale Feature Aggregation for Speaker Verification. 71-75
- Yu-Jia Zhang, Yih-Wen Wang, Chia-Ping Chen, Chung-Li Lu, Bo-Cheng Chan: Improving Time Delay Neural Network Based Speaker Recognition with Convolutional Block and Feature Aggregation Methods. 76-80
- Yanfeng Wu, Junan Zhao, Chenkai Guo, Jing Xu: Improving Deep CNN Architectures with Variable-Length Training Samples for Text-Independent Speaker Verification. 81-85
- Tinglong Zhu, Xiaoyi Qin, Ming Li: Binary Neural Network for Speaker Verification. 86-90
- Youzhi Tu, Man-Wai Mak: Mutual Information Enhanced Training for Speaker Embedding. 91-95
- Ge Zhu, Fei Jiang, Zhiyao Duan: Y-Vector: Multiscale Waveform Encoder for Speaker Embedding. 96-100
- Yan Liu, Zheng Li, Lin Li, Qingyang Hong: Phoneme-Aware and Channel-Wise Attentive Learning for Text Dependent Speaker Verification. 101-105
- Hongning Zhu, Kong Aik Lee, Haizhou Li: Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding. 106-110
Speech Synthesis: Toward End-to-End Synthesis II
- Cheng Gong, Longbiao Wang, Ju Zhang, Shaotong Guo, Yuguang Wang, Jianwu Dang: TacoLPCNet: Fast and Stable TTS by Conditioning LPCNet on Mel Spectrogram Predictions. 111-115
- Taejun Bak, Jae-Sung Bae, Hanbin Bae, Young-Ik Kim, Hoon-Young Cho: FastPitchFormant: Source-Filter Based Decomposed Modeling for Speech Synthesis. 116-120
- Taiki Nakamura, Tomoki Koriyama, Hiroshi Saruwatari: Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer. 121-125
- Naoto Kakegawa, Sunao Hara, Masanobu Abe, Yusuke Ijima: Phonetic and Prosodic Information Estimation from Texts for Genuine Japanese End-to-End Text-to-Speech. 126-130
- Xudong Dai, Cheng Gong, Longbiao Wang, Kaili Zhang: Information Sieve: Content Leakage Reduction in End-to-End Prosody Transfer for Expressive Speech Synthesis. 131-135
- Qingyun Dou, Xixin Wu, Moquan Wan, Yiting Lu, Mark J. F. Gales: Deliberation-Based Multi-Pass Speech Synthesis. 136-140
- Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, R. J. Skerry-Ryan, Yonghui Wu: Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling. 141-145
- Chunyang Wu, Zhiping Xiu, Yangyang Shi, Ozlem Kalinli, Christian Fuegen, Thilo Köhler, Qing He: Transformer-Based Acoustic Modeling for Streaming Speech Synthesis. 146-150
- Ye Jia, Heiga Zen, Jonathan Shen, Yu Zhang, Yonghui Wu: PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS. 151-155
- Zhenhao Ge, Lakshmish Kaushik, Masanori Omote, Saket Kumar: Speed up Training with Variable Length Inputs by Efficient Batching Strategies. 156-160
Speech Enhancement and Intelligibility
- Yuhang Sun, Linju Yang, Huifeng Zhu, Jie Hao: Funnel Deep Complex U-Net for Phase-Aware Speech Enhancement. 161-165
- Qiquan Zhang, Qi Song, Aaron Nicolson, Tian Lan, Haizhou Li: Temporal Convolutional Network with Frequency Dimension Adaptive Attention for Speech Enhancement. 166-170
- Changjie Pan, Feng Yang, Fei Chen: Perceptual Contributions of Vowels and Consonant-Vowel Transitions in Understanding Time-Compressed Mandarin Sentences. 171-175
- Ritujoy Biswas, Karan Nathwani, Vinayak Abrol: Transfer Learning for Speech Intelligibility Improvement in Noisy Environments. 176-180
- Ayako Yamamoto, Toshio Irino, Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani: Comparison of Remote Experiments Using Crowdsourcing and Laboratory Experiments on Speech Intelligibility. 181-185
- Wenzhe Liu, Andong Li, Yuxuan Ke, Chengshi Zheng, Xiaodong Li: Know Your Enemy, Know Yourself: A Unified Two-Stage Framework for Speech Enhancement. 186-190
- Qiuqiang Kong, Haohe Liu, Xingjian Du, Li Chen, Rui Xia, Yuxuan Wang: Speech Enhancement with Weakly Labelled Data from AudioSet. 191-195
- Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao: Improving Perceptual Quality by Phone-Fortified Perceptual Loss Using Wasserstein Distance for Speech Enhancement. 196-200
- Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao: MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement. 201-205
- Amin Edraki, Wai-Yip Chan, Jesper Jensen, Daniel Fogerty: A Spectro-Temporal Glimpsing Index (STGI) for Speech Intelligibility Prediction. 206-210
- Yuanhang Qiu, Ruili Wang, Satwinder Singh, Zhizhong Ma, Feng Hou: Self-Supervised Learning Based Phone-Fortified Speech Enhancement. 211-215
- Khandokar Md. Nayem, Donald S. Williamson: Incorporating Embedding Vectors from a Human Mean-Opinion Score Prediction Model for Monaural Speech Enhancement. 216-220
- Jianwei Zhang, Suren Jayasuriya, Visar Berisha: Restoring Degraded Speech via a Modified Diffusion Model. 221-225
Spoken Dialogue Systems I
- Hoang Long Nguyen, Vincent Renkens, Joris Pelemans, Srividya Pranavi Potharaju, Anil Kumar Nalamalapu, Murat Akbacak: User-Initiated Repetition-Based Recovery in Multi-Utterance Dialogue Systems. 226-230
- Nuo Chen, Chenyu You, Yuexian Zou: Self-Supervised Dialogue Learning for Spoken Conversational Question Answering. 231-235
- Ruolin Su, Ting-Wei Wu, Biing-Hwang Juang: Act-Aware Slot-Value Predicting in Multi-Domain Dialogue State Tracking. 236-240
- Yuya Chiba, Ryuichiro Higashinaka: Dialogue Situation Recognition for Everyday Conversation Using Multimodal Information. 241-245
- Yoshihiro Yamazaki, Yuya Chiba, Takashi Nose, Akinori Ito: Neural Spoken-Response Generation Using Prosodic and Linguistic Context for Conversational Systems. 246-250
- Weiyuan Xu, Peilin Zhou, Chenyu You, Yuexian Zou: Semantic Transportation Prototypical Network for Few-Shot Intent Detection. 251-255
- Li Tang, Yuke Si, Longbiao Wang, Jianwu Dang: Domain-Specific Multi-Agent Dialog Policy Learning in Multi-Domain Task-Oriented Scenarios. 256-260
- Haoyu Wang, John Chen, Majid Laali, Kevin Durda, Jeff King, William Campbell, Yang Liu: Leveraging ASR N-Best in Deep Entity Retrieval. 261-265
Topics in ASR: Robustness, Feature Extraction, and Far-Field ASR
- Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Xuefei Liu, Zhengqi Wen: End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition. 266-270
- Kathleen Siminyu, Xinjian Li, Antonios Anastasopoulos, David R. Mortensen, Michael R. Marlo, Graham Neubig: Phoneme Recognition Through Fine Tuning of Phonetic Representations: A Case Study on Luhya Language Varieties. 271-275
- Erfan Loweimi, Zoran Cvetkovic, Peter Bell, Steve Renals: Speech Acoustic Modelling Using Raw Source and Filter Components. 276-280
- Masakiyo Fujimoto, Hisashi Kawai: Noise Robust Acoustic Modeling for Single-Channel Speech Recognition Based on a Stream-Wise Transformer Architecture. 281-285
- Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha: IR-GAN: Room Impulse Response Generator for Far-Field Speech Recognition. 286-290
- Junqi Chen, Xiao-Lei Zhang: Scaling Sparsemax Based Channel Selection for Speech Recognition with ad-hoc Microphone Arrays. 291-295
- Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo: Multi-Channel Transformer Transducer for Speech Recognition. 296-300
- Emiru Tsunoo, Kentaro Shibata, Chaitanya Narisetty, Yosuke Kashiwagi, Shinji Watanabe: Data Augmentation Methods for End-to-End Speech Recognition on Distant-Talk Scenarios. 301-305
- Guodong Ma, Pengfei Hu, Jian Kang, Shen Huang, Hao Huang: Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition. 306-310
- Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Paden Tomasello, Jacob Kahn, Gilad Avidov, Ronan Collobert, Gabriel Synnaeve: Rethinking Evaluation in ASR: Are Our Models Robust Enough? 311-315
- Max W. Y. Lam, Jun Wang, Chao Weng, Dan Su, Dong Yu: Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition. 316-320
Voice Activity Detection and Keyword Spotting
- Yuanbo Hou, Zhesong Yu, Xia Liang, Xingjian Du, Bilei Zhu, Zejun Ma, Dick Botteldooren: Attention-Based Cross-Modal Fusion for Audio-Visual Voice Activity Detection in Musical Video Streams. 321-325
- Ui-Hyun Kim: Noise-Tolerant Self-Supervised Learning for Audio-Visual Voice Activity Detection. 326-330
- Hyun-Jin Park, Pai Zhu, Ignacio Lopez-Moreno, Niranjan Subrahmanya: Noisy Student-Teacher Training for Robust Keyword Spotting. 331-335
- Osamu Ichikawa, Kaito Nakano, Takahiro Nakayama, Hajime Shirouzu: Multi-Channel VAD for Transcription of Group Discussion. 336-340
- Hengshun Zhou, Jun Du, Hang Chen, Zijun Jing, Shifu Xiong, Chin-Hui Lee: Audio-Visual Information Fusion Using Cross-Modal Teacher-Student Learning for Voice Activity Detection in Realistic Environments. 341-345
- Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura: Enrollment-Less Training for Personalized Voice Activity Detection. 346-350
- Yuto Nonaka, Chee Siang Leow, Akio Kobayashi, Takehito Utsuro, Hiromitsu Nishizaki: Voice Activity Detection for Live Speech of Baseball Game Based on Tandem Connection with Speech/Noise Separation Model. 351-355
- Young D. Kwon, Jagmohan Chauhan, Cecilia Mascolo: FastICARL: Fast Incremental Classifier and Representation Learning with Efficient Budget Allocation in Audio Sensing Applications. 356-360
- Bo Wei, Meirong Yang, Tao Zhang, Xiao Tang, Xing Huang, Kyuhong Kim, Jaeyun Lee, Kiho Cho, Sung-Un Park: End-to-End Transformer-Based Open-Vocabulary Keyword Spotting with Location-Guided Local Attention. 361-365
- Saurabhchand Bhati, Jesús Villalba, Piotr Zelasko, Laureano Moro-Velázquez, Najim Dehak: Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation. 366-370
- Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu: A Lightweight Framework for Online Voice Activity Detection in the Wild. 371-375
Voice and Voicing
- Aurélie Chlébowski, Nicolas Ballier: "See what I mean, huh?" Evaluating Visual Inspection of F0 Tracking in Nasal Grunts. 376-380
- Bruce Xiao Wang, Vincent Hughes: System Performance as a Function of Calibration Methods, Sample Size and Sampling Variability in Likelihood Ratio-Based Forensic Voice Comparison. 381-385
- Anne Bonneau: Voicing Assimilations by French Speakers of German in Stop-Fricative Sequences. 386-390
- Titas Chakraborty, Vaishali Patil, Preeti Rao: The Four-Way Classification of Stops with Voicing and Aspiration for Non-Native Speech Evaluation. 391-395
- Saba Urooj, Benazir Mumtaz, Sarmad Hussain, Ehsan ul Haq: Acoustic and Prosodic Correlates of Emotions in Urdu Speech. 396-400
- Nour Tamim, Silke Hamann: Voicing Contrasts in the Singleton Stops of Palestinian Arabic: Production and Perception. 401-405
- Thomas Coy, Vincent Hughes, Philip Harrison, Amelia Jane Gully: A Comparison of the Accuracy of Dissen and Keshet's (2016) DeepFormants and Traditional LPC Methods for Semi-Automatic Speaker Recognition. 406-410
- Michael Jessen: MAP Adaptation Characteristics in Forensic Long-Term Formant Analysis. 411-415
- Justin J. H. Lo: Cross-Linguistic Speaker Individuality of Long-Term Formant Distributions: Phonetic and Forensic Perspectives. 416-420
- Rachel Soo, Khia A. Johnson, Molly Babel: Sound Change in Spontaneous Bilingual Speech: A Corpus Study on the Cantonese n-l Merger in Cantonese-English Bilinguals. 421-425
- Wendy Lalhminghlui, Priyankoo Sarmah: Characterizing Voiced and Voiceless Nasals in Mizo. 426-430
The INTERSPEECH 2021 Computational Paralinguistics Challenge (ComParE) - COVID-19 Cough, COVID-19 Speech, Escalation & Primates
- Björn W. Schuller, Anton Batliner, Christian Bergler, Cecilia Mascolo, Jing Han, Iulia Lefter, Heysem Kaya, Shahin Amiriparian, Alice Baird, Lukas Stappen, Sandra Ottl, Maurice Gerczuk, Panagiotis Tzirakis, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Léon J. M. Rothkrantz, Joeri A. Zwerts, Jelle Treep, Casper S. Kaandorp: The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates. 431-435
- Rubén Solera-Ureña, Catarina Botelho, Francisco Teixeira, Thomas Rolland, Alberto Abad, Isabel Trancoso: Transfer Learning-Based Cough Representations for Automatic Detection of COVID-19. 436-440
- Philipp Klumpp, Tobias Bocklet, Tomas Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Sebastian P. Bayerl, Juan Rafael Orozco-Arroyave, Elmar Nöth: The Phonetic Footprint of Covid-19? 441-445
- Edresson Casanova, Arnaldo Candido Jr., Ricardo Corso Fernandes Junior, Marcelo Finger, Lucas Rafael Stefanel Gris, Moacir Antonelli Ponti, Daniel Peixoto Pinto da Silva: Transfer Learning and Data Augmentation Techniques to the COVID-19 Identification Tasks in ComParE 2021. 446-450
- Steffen Illium, Robert Müller, Andreas Sedlmeier, Claudia Linnhoff-Popien: Visual Transformers for Primates Classification and Covid Detection. 451-455
- Thomas Pellegrini: Deep-Learning-Based Central African Primate Species Classification with MixUp and SpecAugment. 456-460
- Robert Müller, Steffen Illium, Claudia Linnhoff-Popien: A Deep and Recurrent Architecture for Primate Vocalization Classification. 461-465
- Joeri A. Zwerts, Jelle Treep, Casper S. Kaandorp, Floor Meewis, Amparo C. Koot, Heysem Kaya: Introducing a Central African Primate Vocalisation Dataset for Automated Species Classification. 466-470
- Georgios Rizos, Jenna Lawson, Zhuoda Han, Duncan Butler, James Rosindell, Krystian Mikolajczyk, Cristina Banks-Leite, Björn W. Schuller: Multi-Attentive Detection of the Spider Monkey Whinny in the (Actual) Wild. 471-475
- José Vicente Egas López, Mercedes Vetráb, László Tóth, Gábor Gosztolya: Identifying Conflict Escalation and Primates by Using Ensemble X-Vectors and Fisher Vector Features. 476-480
- Oxana Verkholyak, Denis Dresvyanskiy, Anastasia Dvoynikova, Denis Kotov, Elena Ryumina, Alena Velichko, Danila Mamontov, Wolfgang Minker, Alexey Karpov: Ensemble-Within-Ensemble Classification for Escalation Prediction from Speech. 481-485
- Dominik Schiller, Silvan Mertes, Pol van Rijn, Elisabeth André: Analysis by Synthesis: Using an Expressive TTS Model as Feature Extractor for Paralinguistic Speech Classification. 486-490
Survey Talk 1: Heidi Christensen
- Heidi Christensen: Towards Automatic Speech Recognition for People with Atypical Speech.
Embedding and Network Architecture for Speaker Recognition
- Chau Luu, Peter Bell, Steve Renals: Leveraging Speaker Attribute Information Using Multi Task Learning for Speaker Verification and Diarization. 491-495
- Magdalena Rybicka, Jesús Villalba, Piotr Zelasko, Najim Dehak, Konrad Kowalczyk: Spine2Net: SpineNet with Res2Net and Time-Squeeze-and-Excitation Blocks for Speaker Recognition. 496-500
- Themos Stafylakis, Johan Rohdin, Lukás Burget: Speaker Embeddings by Modeling Channel-Wise Correlations. 501-505
- Weipeng He, Petr Motlícek, Jean-Marc Odobez: Multi-Task Neural Network for Robust Multiple Speaker Embedding Extraction. 506-510
- Junyi Peng, Xiaoyang Qu, Jianzong Wang, Rongzhi Gu, Jing Xiao, Lukás Burget, Jan Cernocký: ICSpk: Interpretable Complex Speaker Embedding Extractor from Raw Waveform. 511-515
Speech Perception I
- Xiao Xiao, Nicolas Audibert, Grégoire Locqueville, Christophe d'Alessandro, Barbara Kuhnert, Claire Pillot-Loiseau: Prosodic Disambiguation Using Chironomic Stylization of Intonation with Native and Non-Native Speakers. 516-520
- Aleese Block, Michelle Cohn, Georgia Zellou: Variation in Perceptual Sensitivity and Compensation for Coarticulation Across Adult and Child Naturally-Produced and TTS Voices. 521-525
- Mohammad Jalilpour-Monesi, Bernd Accou, Tom Francart, Hugo Van hamme: Extracting Different Levels of Speech Information from EEG Using an LSTM-Based Model. 526-530
- Louis ten Bosch, Lou Boves: Word Competition: An Entropy-Based Approach in the DIANA Model of Human Word Comprehension. 531-535
- Louis ten Bosch, Lou Boves: Time-to-Event Models for Analyzing Reaction Time Sequences. 536-540
- Sophie Brand, Kimberley Mulder, Louis ten Bosch, Lou Boves: Models of Reaction Times in Auditory Lexical Decision: RTonset versus RToffset. 541-545
Acoustic Event Detection and Acoustic Scene Classification
- Gwantae Kim, David K. Han, Hanseok Ko: SpecMix: A Mixed Sample Data Augmentation Method for Training with Time-Frequency Domain Features. 546-550
- Helin Wang, Yuexian Zou, Wenwu Wang: SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification. 551-555
- Xu Zheng, Yan Song, Li-Rong Dai, Ian McLoughlin, Lin Liu: An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection. 556-560
- Ritika Nandi, Shashank Shekhar, Manjunath Mulimani: Acoustic Scene Classification Using Kervolution-Based SubSpectralNet. 561-565
- Harshavardhan Sundar, Ming Sun, Chao Wang: Event Specific Attention for Polyphonic Sound Event Detection. 566-570
- Yuan Gong, Yu-An Chung, James R. Glass: AST: Audio Spectrogram Transformer. 571-575
- Soonshin Seo, Donghyun Lee, Ji-Hwan Kim: Shallow Convolution-Augmented Transformer with Differentiable Neural Computer for Low-Complexity Classification of Variable-Length Acoustic Scene. 576-580
- Helen L. Bear, Veronica Morfi, Emmanouil Benetos: An Evaluation of Data Augmentation Methods for Sound Scene Geotagging. 581-585