17th Interspeech 2016: San Francisco, CA, USA
- Nelson Morgan:
17th Annual Conference of the International Speech Communication Association, Interspeech 2016, San Francisco, CA, USA, September 8-12, 2016. ISCA 2016
Keynote 1: ISCA Medalist: John Makhoul
- John Makhoul:
A 50-Year Retrospective on Speech and Language Processing. 1
Neural Networks in Speech Recognition
- Ivan Medennikov, Alexey Prudnikov, Alexander Zatvornitskiy:
Improving English Conversational Telephone Speech Recognition. 2-6
- George Saon, Tom Sercu, Steven J. Rennie, Hong-Kwang Jeff Kuo:
The IBM 2016 English Conversational Telephone Speech Recognition System. 7-11
- Liang Lu, Steve Renals:
Small-Footprint Deep Neural Networks with Highway Connections for Speech Recognition. 12-16
- Dong Yu, Wayne Xiong, Jasha Droppo, Andreas Stolcke, Guoli Ye, Jinyu Li, Geoffrey Zweig:
Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention. 17-21
- Golan Pundak, Tara N. Sainath:
Lower Frame Rate Neural Network Acoustic Models. 22-26
- Gakuto Kurata, Brian Kingsbury:
Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling. 27-31
Special Session: Auditory-Visual Expressive Speech and Gesture in Humans and Machines
- Lei Chen, Gary Feng, Michelle P. Martin-Raugh, Chee Wee Leong, Christopher Kitchen, Su-Youn Yoon, Blair Lehman, Harrison Kell, Chong Min Lee:
Automatic Scoring of Monologue Video Interviews Using Multimodal Cues. 32-36
- Chee Seng Chong, Jeesun Kim, Chris Davis:
The Sound of Disgust: How Facial Expression May Influence Speech Production. 37-41
- Zhaojun Yang, Shrikanth S. Narayanan:
Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions. 42-46
- Attigodu C. Ganesh, Frédéric Berthommier, Jean-Luc Schwartz:
Audiovisual Speech Scene Analysis in the Context of Competing Sources. 47-51
- Najmeh Sadoughi, Carlos Busso:
Head Motion Generation with Synthetic Speech: A Data Driven Approach. 52-56
- Jeesun Kim, Chris Davis:
The Consistency and Stability of Acoustic and Visual Cues for Different Prosodic Attitudes. 57-61
- Jeesun Kim, Gérard Bailly:
Introduction to Poster Presentation of Part II.
Prosody
- Irene Vogel, Laura Spinu:
The Unit of Speech Encoding: The Case of Romanian. 62-66
- Jeanin Jügler, Frank Zimmerer, Jürgen Trouvain, Bernd Möbius:
The Perceptual Effect of L1 Prosody Transplantation on L2 Speech: The Case of French Accented German. 67-71
- Bijun Ling, Jie Liang:
Organizing Syllables into Sandhi Domains - Evidence from F0 and Duration Patterns in Shanghai Chinese. 72-76
- Neville Ryant, Mark Y. Liberman:
Automatic Analysis of Phonetic Speech Style Dimensions. 77-81
- Angeliki Athanasopoulou, Irene Vogel:
The Acoustic Manifestation of Prominence in Stressless Languages. 82-86
- Wei Lai, Jiahong Yuan, Ya Li, Xiaoying Xu, Mark Y. Liberman:
The Rhythmic Constraint on Prosodic Boundaries in Mandarin Chinese Based on Corpora of Silent Reading and Speech Perception. 87-91
Speech and Language Processing for Clinical Health Applications
- Fu-Sheng Tsai, Ya-Ling Hsu, Wei-Chen Chen, Yi-Ming Weng, Chip-Jin Ng, Chi-Chun Lee:
Toward Development and Evaluation of Pain Level-Rating Scale for Emergency Triage based on Vocal Characteristics and Facial Expressions. 92-96
- Tan Lee, Yuanyuan Liu, Yu Ting Yeung, Thomas K. T. Law, Kathy Y. S. Lee:
Predicting Severity of Voice Disorder from DNN-HMM Acoustic Posteriors. 97-101
- Klaske E. van Sluis, Michiel W. M. van den Brekel, Frans J. M. Hilgers, Rob J. J. H. van Son:
Long-Term Stability of Tracheoesophageal Voices. 102-106
- Gábor Gosztolya, László Tóth, Tamás Grósz, Veronika Vincze, Ildikó Hoffmann, Gréta Szatlóczki, Magdolna Pákáski, János Kálmán:
Detecting Mild Cognitive Impairment from Spontaneous Speech by Correlation-Based Phonetic Feature Selection. 107-111
- Jen J. Gong, Maryann Gong, Dina Levy-Lambert, Jordan R. Green, Tiffany P. Hogan, John V. Guttag:
Towards an Automated Screening Tool for Developmental Speech and Language Impairments. 112-116
- Vikram C. M., Nagaraj Adiga, S. R. Mahadeva Prasanna:
Spectral Enhancement of Cleft Lip and Palate Speech. 117-121
Speech Coding and Audio Processing for Noise Reduction
- Tian Guan, Guangxing Chu, Fei Chen, Feng Yang:
Assessing Level-Dependent Segmental Contribution to the Intelligibility of Speech Processed by Single-Channel Noise-Suppression Algorithms. 122-125
- Tudor-Catalin Zorila, Sheila Flanagan, Brian C. J. Moore, Yannis Stylianou:
Effectiveness of Near-End Speech Enhancement Under Equal-Loudness and Equal-Level Constraints. 126-130
- Bidisha Sharma, S. R. Mahadeva Prasanna:
Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence. 131-135
- Lei Wang, Shufeng Zhu, Diliang Chen, Yong Feng, Fei Chen:
Relative Contributions of Amplitude and Phase to the Intelligibility Advantage of Ideal Binary Masked Sentences. 136-139
- Qingju Liu, Yan Tang, Philip J. B. Jackson, Wenwu Wang:
Predicting Binaural Speech Intelligibility from Signals Estimated by a Blind Source Separation Algorithm. 140-144
- Petko Nikolov Petkov, Norbert Braunschweiler, Yannis Stylianou:
Automated Pause Insertion for Improved Intelligibility Under Reverberation. 145-149
Speech Analysis
- Jean-Luc Rouas, Leonidas Ioannidis:
Automatic Classification of Phonation Modes in Singing Voice: Towards Singing Style Characterisation and Application to Ethnomusicological Recordings. 150-154
- Himanshu N. Bhavsar, Tanvina B. Patel, Hemant A. Patil:
Novel Nonlinear Prediction Based Features for Spoofed Speech Detection. 155-159
- Sri Harsha Dumpala, Bhanu Teja Nellore, Raghu Ram Nevali, Suryakanth V. Gangashetty, B. Yegnanarayana:
Robust Vowel Landmark Detection Using Epoch-Based Features. 160-164
- Johannes Töger, Yongwan Lim, Sajan Goud Lingala, Shrikanth S. Narayanan, Krishna S. Nayak:
Sensitivity of Quantitative RT-MRI Metrics of Vocal Tract Dynamics to Image Reconstruction Settings. 165-169
- Milos Cernak, Afsaneh Asaei, Pierre-Edouard Honnet, Philip N. Garner, Hervé Bourlard:
Sound Pattern Matching for Automatic Prosodic Event Detection. 170-174
- Mostafa Ali Shahin, Julien Epps, Beena Ahmed:
Automatic Classification of Lexical Stress in English and Arabic Languages Using Deep Learning. 175-179
First and Second Language Acquisition
- Fei Chen, Nan Yan, Xunan Huang, Hao Zhang, Lan Wang, Gang Peng:
Development of Mandarin Onset-Rime Detection in Relation to Age and Pinyin Instruction. 180-184
- Xinyi Wen, Yuan Jia:
Joint Effect of Dialect and Mandarin on English Vowel Production: A Case Study in Changsha EFL Learners. 185-189
- Tamami Katayama:
Effects of L1 Phonotactic Constraints on L2 Word Segmentation Strategies. 190-194
- Jane Wottawa, Martine Adda-Decker, Frédéric Isel:
Putting German [ʃ] and [ç] in Two Different Boxes: Native German vs L2 German of French Learners. 195-199
- Dean Luo, Ruxin Luo, Lixin Wang:
Naturalness Judgement of L2 English Through Dubbing Practice. 200-203
- Yasuaki Shinohara:
Audiovisual Training Effects for Japanese Children Learning English /r/-/l/. 204-207
- Sarah Harper, Louis Goldstein, Shrikanth S. Narayanan:
L2 Acquisition and Production of the English Rhotic Pharyngeal Gesture. 208-212
Speech and Hearing Disorders & Perception
- Alexandre Hennequin, Amélie Rochet-Capellan, Marion Dohen:
Auditory-Visual Perception of VCVs Produced by People with Down Syndrome: Preliminary Results. 213-217
- Emre Yilmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik:
Combining Non-Pathological Data of Different Language Varieties to Improve DNN-HMM Performance on Pathological Speech. 218-222
- Imed Laaridh, Corinne Fredouille, Christine Meunier:
Evaluation of a Phone-Based Anomaly Detection Approach for Dysarthric Speech. 223-227
- Chitralekha Bhat, Bhavik Vachhani, Sunil Kumar Kopparapu:
Recognition of Dysarthric Speech Using Voice Parameters for Speaker Adaptation and Multi-Taper Spectral Estimation. 228-232
- Fei Chen, Nan Yan, Xiaojie Pan, Feng Yang, Zhuanzhuan Ji, Lan Wang, Gang Peng:
Impaired Categorical Perception of Mandarin Tones and its Relationship to Language Ability in Autism Spectrum Disorders. 233-237
- Kathleen F. Nagle, James T. Heaton:
Perceived Naturalness of Electrolaryngeal Speech Produced Using sEMG-Controlled vs. Manual Pitch Modulation. 238-242
- Shamima Najnin, Bonny Banerjee, Lisa Lucks Mendel, Masoumeh Heidari Kapourchali, Jayanta Kumar Dutta, Sungmin Lee, Chhayakanta Patro, Monique Pousson:
Identifying Hearing Loss from Learned Speech Kernels. 243-247
- Panying Rong, Yana Yunusova, Jordan R. Green:
Differential Effects of Velopharyngeal Dysfunction on Speech Intelligibility During Early and Late Stages of Amyotrophic Lateral Sclerosis. 248-252
- Véronique Delvaux, Virginie Roland, Kathy Huet, Myriam Piccaluga, Marie-Claire Haelewyck, Bernard Harmegnies:
The Production of Intervocalic Glides in Non Dysarthric Parkinsonian Speech. 253-256
- Yang Feng, Zhang Lu:
Auditory Processing Impairments Under Background Noise in Children with Non-Syndromic Cleft Lip and/or Palate. 257-261
- Zhi Zhu, Ryota Miyauchi, Yukiko Araki, Masashi Unoki:
Modulation Spectral Features for Predicting Vocal Emotion Recognition by Simulated Cochlear Implants. 262-266
- Keiko Ochi, Koichi Mori, Naomi Sakai, Nobutaka Ono:
Automatic Discrimination of Soft Voice Onset Using Acoustic Features of Breathy Voicing. 267-271
- Jing Shao, Caicai Zhang, Gang Peng, Yike Yang, William S.-Y. Wang:
Effect of Noise on Lexical Tone Perception in Cantonese-Speaking Amusics. 272-276
- Yuki Takashima, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki, Nobuyuki Mitani, Kiyohiro Omori, Kaoru Nakazono:
Audio-Visual Speech Recognition Using Bimodal-Trained Bottleneck Features for a Person with Severe Hearing Loss. 277-281
- Yuling Gu, Boon Pang Lim, Nancy F. Chen:
Perception of Tone in Whispered Mandarin Sentences: The Case for Singapore Mandarin. 282-286
Speech Synthesis Poster
- Feng-Long Xie, Frank K. Soong, Haifeng Li:
A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences. 287-291
- Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki:
Parallel Dictionary Learning for Voice Conversion Using Discriminative Graph-Embedded Non-Negative Matrix Factorization. 292-296
- Yu Gu, Zhen-Hua Ling, Li-Rong Dai:
Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks. 297-301
- Yi Yang, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu:
Voice Conversion Based on Matrix Variate Gaussian Mixture Model Using Multiple Frame Features. 302-306
- Naoki Hosaka, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
Voice Conversion Based on Trajectory Model Training of Neural Networks Considering Global Variance. 307-311
- Sandesh Aryal, Ricardo Gutierrez-Osuna:
Comparing Articulatory and Acoustic Strategies for Reducing Non-Native Accents. 312-316
- Seyyed Saeed Sarfjoo, Cenk Demiroglu:
Cross-Lingual Speaker Adaptation for Statistical Speech Synthesis Using Limited Data. 317-321
- Lifa Sun, Hao Wang, Shiyin Kang, Kun Li, Helen M. Meng:
Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams. 322-326
- Anusha Prakash, Jeena J. Prakash, Hema A. Murthy:
Acoustic Analysis of Syllables Across Indian Languages. 327-331
- Teng Zhang, Zhipeng Chen, Ji Wu, Sam Lai, Wenhui Lei, Carsten Isert:
Objective Evaluation Methods for Chinese Text-To-Speech Systems. 332-336
- Yusuke Ijima, Taichi Asami, Hideyuki Mizuno:
Objective Evaluation Using Association Between Dimensions Within Spectral Features for Statistical Parametric Speech Synthesis. 337-341
- Takenori Yoshimura, Gustav Eje Henter, Oliver Watts, Mirjam Wester, Junichi Yamagishi, Keiichi Tokuda:
A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks. 342-346
- Monika Podsiadlo, Shweta Chahar:
Text-to-Speech for Individuals with Vision Loss: A User Study. 347-351
- Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki, Junichi Yamagishi:
Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks. 352-356
- Erica Cooper, Alison Chang, Yocheved Levitan, Julia Hirschberg:
Data Selection and Adaptation for Naturalness in HMM-Based Speech Synthesis. 357-361
Topics in Speech Processing
- Fei Tao, Louis Daudet, Christian Poellabauer, Sandra L. Schneider, Carlos Busso:
A Portable Automatic PA-TA-KA Syllable Detection System to Derive Biomarkers for Neurological Disorders. 362-366
- Omid Ghahabi, Antonio Bonafonte, Javier Hernando, Asunción Moreno:
Deep Neural Networks for i-Vector Language Identification of Short Utterances in Cars. 367-371
- Abraham Woubie, Jordi Luque, Javier Hernando:
Improving i-Vector and PLDA Based Speaker Clustering with Long-Term Features. 372-376
Show & Tell Session 1
- Aaron Lawson, Mitchell McLaren, Harry Bratt, Martin Graciarena, Horacio Franco, Christopher George, Allen R. Stauffer, Chris Bartels, Julien van Hout:
Open Language Interface for Voice Exploitation (OLIVE). 377-378
- Lubos Smídl, Adam Chýlek, Jan Svec:
A Multimodal Dialogue System for Air Traffic Control Trainees Based on Discrete-Event Simulation. 379-380
- Elodie Gauthier, David Blachon, Laurent Besacier, Guy-Noël Kouarata, Martine Adda-Decker, Annie Rialland, Gilles Adda, Grégoire Bachman:
Lig-Aikuma: A Mobile App to Collect Parallel Speech for Under-Resourced Language Studies. 381-382
- Martin Gruber, Jindrich Matousek, Zdenek Hanzlícek, Zdenek Krnoul, Zbynek Zajíc:
ARET - Automatic Reading of Educational Texts for Visually Impaired Students. 383-384
New Trends in Neural Networks for Speech Recognition
- Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals:
Segmental Recurrent Neural Networks for End-to-End Speech Recognition. 385-389
- Markus Nußbaum-Thom, Jia Cui, Bhuvana Ramabhadran, Vaibhava Goel:
Acoustic Modeling Using Bidirectional Gated Recurrent Convolutional Units. 390-394
- Wei-Ning Hsu, Yu Zhang, Ann Lee, James R. Glass:
Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition. 395-399
- Chunyang Wu, Penny Karanasou, Mark J. F. Gales, Khe Chai Sim:
Stimulated Deep Neural Network for Speech Recognition. 400-404
- Leonardo Badino:
Phonetic Context Embeddings for DNN-HMM Phone Recognition. 405-409
- Ying Zhang, Mohammad Pezeshki, Philémon Brakel, Saizheng Zhang, César Laurent, Yoshua Bengio, Aaron C. Courville:
Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks. 410-414
Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances
- Guangsen Wang, Kong-Aik Lee, Trung Hieu Nguyen, Hanwu Sun, Bin Ma:
Joint Speaker and Lexical Modeling for Short-Term Characterization of Speaker. 415-419
- Md. Jahangir Alam, Patrick Kenny, Vishwa Gupta:
Tandem Features for Text-Dependent Speaker Verification on the RedDots Corpus. 420-424
- Achintya Kumar Sarkar, Zheng-Hua Tan:
Text Dependent Speaker Verification Using Un-Supervised HMM-UBM and Temporal GMM-UBM. 425-429
- Tomi Kinnunen, Md. Sahidullah, Ivan Kukanov, Héctor Delgado, Massimiliano Todisco, Achintya Kumar Sarkar, Nicolai Bæk Thomsen, Ville Hautamäki, Nicholas W. D. Evans, Zheng-Hua Tan:
Utterance Verification for Text-Dependent Speaker Recognition: A Comparative Assessment Using the RedDots Corpus. 430-434
- Jianbo Ma, Saad Irtza, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah:
Parallel Speaker and Content Modelling for Text-Dependent Speaker Verification. 435-439
- Hossein Zeinali, Hossein Sameti, Lukás Burget, Jan Cernocký, Nooshin Maghsoodi, Pavel Matejka:
i-Vector/HMM Based Text-Dependent Speaker Verification System for RedDots Challenge. 440-444
- Rohan Kumar Das, Sarfaraz Jelil, S. R. Mahadeva Prasanna:
Exploring Session Variability and Template Aging in Speaker Verification for Fixed Phrase Short Utterances. 445-449
Articulatory Measurements and Analysis
- Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu:
Prediction of the Articulatory Movements of Unseen Phonemes of a Speaker Using the Speech Structure of Another Speaker. 450-454
- Ganesh Sivaraman, Vikramjit Mitra, Hosung Nam, Mark K. Tiede, Carol Y. Espy-Wilson:
Vocal Tract Length Normalization for Speaker Independent Acoustic-to-Articulatory Speech Inversion. 455-459
- Adam C. Lammert, Christine H. Shadle, Shrikanth S. Narayanan, Thomas F. Quatieri:
Investigation of Speed-Accuracy Tradeoffs in Speech Production Using Real-Time Magnetic Resonance Imaging. 460-464
- Tanner Sorensen, Asterios Toutios, Louis Goldstein, Shrikanth S. Narayanan:
Characterizing Vocal Tract Dynamics Across Speakers Using Real-Time MRI. 465-469
- Mathieu Labrunie, Pierre Badin, Dirk Voit, Arun A. Joseph, Laurent Lamalle, Coriandre Vilain, Louis-Jean Boë, Jens Frahm:
Tracking Contours of Orofacial Articulators from Real-Time MRI of Speech. 470-474
- Sajan Goud Lingala, Asterios Toutios, Johannes Töger, Yongwan Lim, Yinghua Zhu, Yoon-Chul Kim, Colin Vaz, Shrikanth S. Narayanan, Krishna S. Nayak:
State-of-the-Art MRI Protocol for Comprehensive Assessment of Vocal Tract Structure and Function. 475-479
Automatic Assessment of Emotions
- Rui Xia, Yang Liu:
DBN-ivector Framework for Acoustic Emotion Recognition. 480-484
- Brian Stasak, Julien Epps, Nicholas Cummins, Roland Goecke:
An Investigation of Emotional Speech in Depression Classification. 485-489
- Reza Lotfian, Carlos Busso:
Retrieving Categorical Emotions Using a Probabilistic Framework to Define Preference Learning Samples. 490-494
- Maximilian Schmitt, Fabien Ringeval, Björn W. Schuller:
At the Border of Acoustics and Linguistics: Bag-of-Audio-Words for the Recognition of Emotions in Speech. 495-499
- Arodami Chorianopoulou, Polychronis Koutsakis, Alexandros Potamianos:
Speech Emotion Recognition Using Affective Saliency. 500-504
- Rahul Gupta, Nishant Nath, Taruna Agrawal, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan:
Laughter Valence Prediction in Motivational Interviewing Based on Lexical and Acoustic Cues. 505-509
Acoustic and Articulatory Phonetics
- Marcin Wlodarczak, Mattias Heldner:
Respiratory Belts and Whistles: A Preliminary Study of Breathing Acoustics for Turn-Taking. 510-514
- Constantijn Kaland, Vincenzo Galatà, Lorenzo Spreafico, Alessandro Vietti:
/r/ as Language Marker in Bilingual Speech Production and Perception. 515-519
- Manfred Pützer, Frank Zimmerer, Wolfgang Wokurek, Jeanin Jügler:
Evaluation of Phonatory Behavior of German and French Speakers in Native and Non-Native Speech. 520-524
- Sofia Strömbergsson:
Today's Most Frequently Used F0 Estimation Methods, and Their Accuracy in Estimating Male and Female Pitch in Clean Speech. 525-529
- Lei He, Volker Dellwo:
A Praat-Based Algorithm to Extract the Amplitude Envelope and Temporal Fine Structure Using the Hilbert Transform. 530-534
- Ewald Enzinger:
Likelihood Ratio Calculation in Acoustic-Phonetic Forensic Voice Comparison: Comparison of Three Statistical Modelling Approaches. 535-539
Source Separation and Spatial Audio
- Xiaoke Qi, Jianhua Tao:
A Sparse Spherical Harmonic-Based Model in Subbands for Head-Related Transfer Functions. 540-544
- Yusuf Ziya Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey:
Single-Channel Multi-Speaker Separation Using Deep Clustering. 545-549
- Hao Li, Shuai Nie, Xueliang Zhang, Hui Zhang:
Jointly Optimizing Activation Coefficients of Convolutive NMF Using DNN for Speech Separation. 550-554
- Masood Delfarah, DeLiang Wang:
A Feature Study for Masking-Based Reverberant Speech Separation. 555-559
- Chung-Chien Hsu, Tai-Shih Chi, Jen-Tzung Chien:
Discriminative Layered Nonnegative Matrix Factorization for Speech Separation. 560-564
- Arpita Gang, Pravesh Biyani:
On Discriminative Framework for Single Channel Audio Source Separation. 565-569
Special Session: Auditory-Visual Expressive Speech and Gesture in Humans and Machines
- Qin Jin, Junwei Liang, Xiaozhu Lin:
Generating Natural Video Descriptions via Multimodal Processing. 570-574
- Martin Heckmann:
Feature-Level Decision Fusion for Audio-Visual Word Prominence Detection. 575-579
- Slim Ouni, Vincent Colotte, Sara Dahmani, Soumaya Azzi:
Acoustic and Visual Analysis of Expressive Speech: A Case Study of French Acted Speech. 580-584
- Adela Barbulescu, Rémi Ronfard, Gérard Bailly:
Characterization of Audiovisual Dramatic Attitudes. 585-589
- Yuyun Huang, Emer Gilmartin, Nick Campbell:
Conversational Engagement Recognition Using Auditory and Visual Cues. 590-594
- Theodora Chaspari, Jill Fain Lehman:
An Acoustic Analysis of Child-Child and Child-Robot Interactions for Understanding Engagement during Speech-Controlled Computer Games. 595-599
- Benjawan Kasisopa, Chutamanee Onsuwan, Charturong Tantibundhit, Nittayapa Klangpornkun, Suparak Techacharoenrungrueang, Sudaporn Luksaneeyanawin, Denis Burnham:
Auditory-Visual Lexical Tone Perception in Thai Elderly Listeners with and without Hearing Impairment. 600-604
- Hossein Khaki, Engin Erzin:
Use of Agreement/Disagreement Classification in Dyadic Interactions for Continuous Emotion Recognition. 605-609
Special Session: Intelligibility Under the Microscope
- Marc René Schädler, David Hülsmeier, Anna Warzybok, Sabine Hochmuth, Birger Kollmeier:
Microscopic Multilingual Matrix Test Predictions Using an ASR-Based Speech Recognition Model. 610-614
- Mats Exter, Bernd T. Meyer:
DNN-Based Automatic Speech Recognition as a Model for Human Phoneme Perception. 615-619
- Máté Attila Tóth, Martin Cooke:
Undoing Misperceptions: A Microscopic Analysis of Consistent Confusions Through Signal Modifications. 620-624
- Mahdie Karbasi, Ahmed Hussen Abdelaziz, Hendrik Meutzner, Dorothea Kolossa:
Blind Non-Intrusive Speech Intelligibility Prediction Using Twin-HMMs. 625-629
- Máté Attila Tóth, Martin Cooke, Jon Barker:
Misperceptions Arising from Speech-in-Babble Interactions. 630-634
- Anja Eichenauer, Mathias Dietz, Bernd T. Meyer, Tim Jürgens:
Introducing Temporal Rate Coding for Speech in Cochlear Implants: A Microscopic Evaluation in Humans and Models. 635-639
- María Luisa García Lecumberri, Jon Barker, Ricard Marxer, Martin Cooke:
Language Effects in Noise-Induced Word Misperceptions. 640-644
- Léo Varnet, Fanny Meunier, Michel Hoen:
Speech Reductions Cause a De-Weighting of Secondary Acoustic Cues. 645-649
- Lionel Fontan, Isabelle Ferrané, Jérôme Farinas, Julien Pinquier, Xavier Aumont:
Using Phonologically Weighted Levenshtein Distances for the Prediction of Microscopic Intelligibility. 650-654
- Mayuki Matsui:
The Impact of Manner of Articulation on the Intelligibility of Voicing Contrast in Noise: Cross-Linguistic Implications. 655-659
- Michael I. Mandel:
Directly Comparing the Listening Strategies of Humans and Machines. 660-664
Spoken Documents, Spoken Understanding and Semantic Analysis
- Marc-Antoine Rondeau, Yi Su:
LSTM-Based NeuroCRFs for Named Entity Recognition. 665-669
- Shih-Hung Liu, Kuan-Yu Chen, Yu-Lun Hsieh, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu:
Exploring Word Mover's Distance and Semantic-Aware Embedding Techniques for Extractive Broadcast News Summarization. 670-674
- Imran A. Sheikh, Irina Illina, Dominique Fohr, Georges Linarès:
Improved Neural Bag-of-Words Model to Retrieve Out-of-Vocabulary Words in Speech Recognition. 675-679
- Jérémy Trione, Benoît Favre, Frédéric Béchet:
Beyond Utterance Extraction: Summary Recombination for Speech Summarization. 680-684
- Bing Liu, Ian R. Lane:
Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling. 685-689
- Aaron Jaech, Larry P. Heck, Mari Ostendorf:
Domain Adaptation of Recurrent Neural Networks for Natural Language Understanding. 690-694
- Faisal Ladhak, Ankur Gandhe, Markus Dreyer, Lambert Mathias, Ariya Rastrow, Björn Hoffmeister:
LatticeRnn: Recurrent Neural Networks Over Lattices. 695-699
- Santosh Kesiraju, Lukás Burget, Igor Szöke, Jan Cernocký:
Learning Document Representations Using Subspace Multinomial Model. 700-704
- Zhiwei Zhao, Youzheng Wu:
Attention-Based Convolutional Neural Networks for Sentence Classification. 705-709
- Mohamed Morchid, Mohamed Bouaziz, Waad Ben Kheder, Killian Janod, Pierre-Michel Bousquet, Richard Dufour, Georges Linarès:
Spoken Language Understanding in a Latent Topic-Based Subspace. 710-714
- Dilek Hakkani-Tür, Gökhan Tür, Asli Celikyilmaz, Yun-Nung Chen, Jianfeng Gao, Li Deng, Ye-Yi Wang:
Multi-Domain Joint Semantic Frame Parsing Using Bi-Directional RNN-LSTM. 715-719
- Killian Janod, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori:
Deep Stacked Autoencoders for Spoken Language Understanding. 720-724
- Gakuto Kurata, Bing Xiang, Bowen Zhou:
Labeled Data Generation with Encoder-Decoder LSTM for Semantic Slot Filling. 725-729
- Sabrina Stehwien, Ngoc Thang Vu:
Exploring the Correlation of Pitch Accents and Semantic Slots for Spoken Language Understanding. 730-734
- Yaodong Tang, Zhiyong Wu, Helen M. Meng, Mingxing Xu, Lianhong Cai:
Analysis on Gated Recurrent Unit Based Question Detection Approach. 735-739
Spoken Term Detection
- Shuji Oishi, Tatsuya Matsuba, Mitsuaki Makino, Atsuhiko Kai:
Combining State-Level Spotting and Posterior-Based Acoustic Match for Improved Query-by-Example Spoken Term Detection. 740-744
- Zhiqiang Lv, Meng Cai, Wei-Qiang Zhang, Jia Liu:
A Novel Discriminative Score Calibration Method for Keyword Search. 745-749
- Jorge Proença, Fernando Perdigão:
Segmented Dynamic Time Warping for Spoken Query-by-Example Search. 750-754
- Shi-wook Lee, Kazuyo Tanaka, Yoshiaki Itoh:
Generating Complementary Acoustic Model Spaces in DNN-Based Sequence-to-Frame DTW Scheme for Out-of-Vocabulary Spoken Term Detection. 755-759
- Sankaran Panchapagesan, Ming Sun, Aparna Khare, Spyros Matsoukas, Arindam Mandal, Björn Hoffmeister, Shiv Vitaladevuni:
Multi-Task Learning and Weighted Cross-Entropy for DNN-Based Keyword Spotting. 760-764
- Yu-An Chung, Chao-Chung Wu, Chia-Hao Shen, Hung-yi Lee, Lin-Shan Lee:
Audio Word2Vec: Unsupervised Learning of Audio Segment Representations Using Sequence-to-Sequence Autoencoder. 765-769
- Zhong Meng, Biing-Hwang Juang:
Non-Uniform Boosted MCE Training of Deep Neural Networks for Keyword Spotting. 770-774
- Arseniy Gorin, Rasa Lileikyte, Guangpu Huang, Lori Lamel, Jean-Luc Gauvain, Antoine Laurent:
Language Model Data Augmentation for Keyword Spotting in Low-Resourced Training Conditions. 775-779
Show & Tell Session 2
- Lyan Verwimp, Brecht Desplanques, Kris Demuynck, Joris Pelemans, Marieke Lycke, Patrick Wambacq:
STON: Efficient Subtitling in Dutch Using State-of-the-Art Tools. 780-781
- Petr Stanislav, Lubos Smídl, Jan Svec:
An Automatic Training Tool for Air Traffic Control Training. 782-783
- Reima Karhila, Aku Rouhe, Peter Smit, André Mansikkaniemi, Heini Kallio, Erik Lindroos, Raili Hildén, Martti Vainio, Mikko Kurimo:
Digitala: An Augmented Test and Review Process Prototype for High-Stakes Spoken Foreign Language Examination. 784-785
- Géraldine Damnati, Delphine Charlet, Marc Denjean:
Exploring Collections of Multimedia Archives Through Innovative Interfaces in the Context of Digital Humanities. 786-787
Feature Extraction and Acoustic Modeling Using Neural Networks for ASR
- Yougen Yuan, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Learning Neural Network Representations Using Cross-Lingual Bottleneck Features with Word-Pair Information. 788-792
- Yuzong Liu, Katrin Kirchhoff:
Novel Front-End Features Based on Neural Graph Embeddings for DNN-HMM and LSTM-CTC Acoustic Modeling. 793-797
- Basil Abraham, Srinivasan Umesh, Neethu Mariam Joy:
Articulatory Feature Extraction Using CTC to Build Articulatory Classifiers Without Forced Frame Alignments for Speech Recognition. 798-802
- Tasha Nagamine, Michael L. Seltzer, Nima Mesgarani:
On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models. 803-807
- Ehsan Variani, Tara N. Sainath, Izhak Shafran, Michiel Bacchiani:
Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling. 808-812
- Tara N. Sainath, Bo Li:
Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks. 813-817
Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge
- Mitchell McLaren, Luciana Ferrer, Diego Castán, Aaron Lawson:
The Speakers in the Wild (SITW) Speaker Recognition Database. 818-822
- Mitchell McLaren, Luciana Ferrer, Diego Castán, Aaron Lawson:
The 2016 Speakers in the Wild Speaker Recognition Evaluation. 823-827
- Ondrej Novotný, Pavel Matejka, Oldrich Plchot, Ondrej Glembek, Lukás Burget, Jan Cernocký:
Analysis of Speaker Recognition Systems in Realistic Scenarios of the SITW 2016 Challenge. 828-832
- Oleg Kudashev, Sergey Novoselov, Konstantin Simonchik, Alexander Kozlov:
A Speaker Recognition System for the SITW Challenge. 833-837
- Houman Ghaemmaghami, Md. Hafizur Rahman, Ivan Himawan, David Dean, Ahilan Kanagasundaram, Sridha Sridharan, Clinton Fookes:
Speakers In The Wild (SITW): The QUT Speaker Recognition System. 838-842
- Abbas Khosravani, Mohammad Mehdi Homayounpour:
AUT System for SITW Speaker Recognition Challenge. 843-847
- Waad Ben Kheder, Moez Ajili, Pierre-Michel Bousquet, Driss Matrouf, Jean-François Bonastre:
LIA System for the SITW Speaker Recognition Challenge. 848-852
- Yi Liu, Yao Tian, Liang He, Jia Liu:
Investigating Various Diarization Algorithms for Speaker in the Wild (SITW) Speaker Recognition Challenge. 853-857
Non-Native Speech Perception
- Odette Scharenborg, Juul Coumans, Sofoklis Kakouros, Roeland van Hout:
Does the Importance of Word-Initial and Word-Final Information Differ in Native versus Non-Native Spoken-Word Recognition? 858-862
- Odette Scharenborg, Elea Kolkman, Sofoklis Kakouros, Brechtje Post:
The Effect of Sentence Accent on Non-Native Speech Perception in Noise. 863-867
- Martin Cooke, María Luisa García Lecumberri:
The Effects of Modified Speech Styles on Intelligibility for Non-Native Listeners. 868-872
- Hao Zhang, Fei Chen, Nan Yan, Lan Wang, Feng Shi, Manwa L. Ng:
The Influence of Language Experience on the Categorical Perception of Vowels: Evidence from Mandarin and Korean. 873-877
- Dominic W. Massaro:
Multiple Influences on Vocabulary Acquisition: Parental Input Dominates. 878-882
- Jian Gong, María Luisa García Lecumberri, Martin Cooke:
Can Intensive Exposure to Foreign Language Sounds Affect the Perception of Native Sounds? 883-887
Behavioral Signal Processing and Speaker State and Traits Analytics
- Nikoletta Bassiou, Andreas Tsiartas, Jennifer Smith, Harry Bratt, Colleen Richey, Elizabeth Shriberg, Cynthia M. D'Angelo, Nonye Alozie:
Privacy-Preserving Speech Analytics for Automatic Assessment of Student Collaboration. 888-892
- Md. Nasir, Brian R. Baucom, Shrikanth S. Narayanan, Panayiotis G. Georgiou:
Complexity in Prosody: A Nonlinear Dynamical Systems Approach for Dyadic Conversations; Behavior and Outcomes in Couples Therapy. 893-897
- Shao-Yen Tseng, Sandeep Nallan Chakravarthula, Brian R. Baucom, Panayiotis G. Georgiou:
Couples Behavior Modeling and Annotation Using Low-Resource LSTM Language Models. 898-902
- Laura Fernández Gallardo, Benjamin Weiss:
Speech Likability and Personality-Based Social Relations: A Round-Robin Analysis over Communication Channels. 903-907
- Bo Xiao, Dogan Can, James Gibson, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Behavioral Coding of Therapist Language in Addiction Counseling Using Recurrent Neural Networks. 908-912
- Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah:
Factor Analysis Based Speaker Normalisation for Continuous Emotion Prediction. 913-917
Spoken Term Detection
- Dhananjay Ram, Afsaneh Asaei, Hervé Bourlard:
Subspace Detection of DNN Posterior Probabilities via Sparse Representation for Query by Example Spoken Term Detection. 918-922 - Hongjie Chen, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li:
Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection. 923-927 - Amir Hossein Harati Nejad Torbati, Joseph Picone:
A Nonparametric Bayesian Approach for Spoken Term Detection by Example Query. 928-932 - Van Tung Pham, Haihua Xu, Xiong Xiao, Nancy F. Chen, Eng Siong Chng, Haizhou Li:
Rescoring Hypothesized Detections of Out-of-Vocabulary Keywords Using Subword Samples. 933-937 - Yimeng Zhuang, Xuankai Chang, Yanmin Qian, Kai Yu:
Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC. 938-942 - Yen-Chen Wu, Tzu-Hsiang Lin, Yang-De Chen, Hung-yi Lee, Lin-Shan Lee:
Interactive Spoken Content Retrieval by Deep Reinforcement Learning. 943-947
Co-Inference of Production and Acoustics
- Elizabeth Godoy, Andrew Dumas, Jennifer Melot, Nicolas Malyska, Thomas F. Quatieri:
Relating Estimated Cyclic Spectral Peak Frequency to Measured Epilarynx Length Using Magnetic Resonance Imaging. 948-952 - Patrick Lumban Tobing, Tomoki Toda, Hirokazu Kameoka, Satoshi Nakamura:
Acoustic-to-Articulatory Inversion Mapping Based on Latent Trajectory Gaussian Mixture Model. 953-957 - Yehoshua Dissen, Joseph Keshet:
Formant Estimation and Tracking Using Deep Learning. 958-962 - Colin Vaz, Asterios Toutios, Shrikanth S. Narayanan:
Convex Hull Convolutive Non-Negative Matrix Factorization for Uncovering Temporal Patterns in Multivariate Time-Series Data. 963-967 - Lauri Juvela, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi, Paavo Alku:
Majorisation-Minimisation Based Optimisation of the Composite Autoregressive System with Application to Glottal Inverse Filtering. 968-972 - Xiaoyun Wang, Xugang Lu, Hisashi Kawai, Seiichi Yamamoto:
F0 Contour Analysis Based on Empirical Mode Decomposition for DNN Acoustic Modeling in Mandarin Speech Recognition. 973-977
Acoustic and Articulatory Phonetics
- Fang Hu, Chunyu Ge:
Vowels and Diphthongs in Cangnan Southern Min Chinese Dialect. 978-982 - Wenqi Hu, Fang Hu, Jian Jin:
Diphthongization of Nuclear Vowels and the Emergence of a Tetraphthong in Hetang Cantonese. 983-987 - Milos Cernak, Philip N. Garner:
PhonVoc: A Phonetic and Phonological Vocoding Toolkit. 988-992 - Liping Xia, Fang Hu:
Vowels and Diphthongs in the Taiyuan Jin Chinese Dialect. 993-997 - Giuseppina Turco, Cécile Fougeron, Nicolas Audibert:
The Effects of Prosody on French V-to-V Coarticulation: A Corpus-Based Study. 998-1001 - Vincenzo Galatà, Lorenzo Spreafico, Alessandro Vietti, Constantijn Kaland:
An Acoustic Analysis of /r/ in Tyrolean. 1002-1006 - Seung-Eun Chang, Minsook Kim:
Hyperarticulated Production of Korean Glides by Age Group. 1007-1010 - Ho-hsien Pan, Hsiao-tung Huang, Shao-Ren Lyu:
Coda Stop and Taiwan Min Checked Tone Sound Changes. 1011-1015
Prosody, Phonation and Voice Quality
- Sarah E. Fenwick, Catherine T. Best, Chris Davis, Michael D. Tyler:
The Influence of Modality and Speaking Style on the Assimilation Type and Categorization Consistency of Non-Native Speech. 1016-1020 - Margaret Zellers:
Prosodic Convergence with Spoken Stimuli in Laboratory Data. 1021-1025 - Charalambos Themistocleous, Angelandria Savva, Andrie Aristodemou:
Effects of Stress on Fricatives: Evidence from Standard Modern Greek. 1026-1029 - Yue Sun, Shudon Hsiao, Yoshinori Sagisaka, Jinsong Zhang:
Analysis of Chinese Syllable Durations in Running Speech of Japanese L2 Learners. 1030-1033 - Catherine Lai, Mireia Farrús, Johanna D. Moore:
Automatic Paragraph Segmentation with Lexical and Prosodic Features. 1034-1038 - Manu Airaksinen, Lauri Juvela, Tom Bäckström, Paavo Alku:
Automatic Glottal Inverse Filtering with Non-Negative Matrix Factorization. 1039-1043 - Soo Jin Park, Caroline Sigouin, Jody Kreiman, Patricia A. Keating, Jinxi Guo, Gary Yeung, Fang-Yu Kuo, Abeer Alwan:
Speaker Identity and Voice Quality: Modeling Human Responses and Automatic Speaker Recognition. 1044-1048 - Sishir Kalita, Luke Horo, Priyankoo Sarmah, S. R. Mahadeva Prasanna, Samarendra Dandapat:
Analysis of Glottal Stop in Assam Sora Language. 1049-1053 - Marc Garellek, Scott Seyfarth:
Acoustic Differences Between English /t/ Glottalization and Phrasal Creak. 1054-1058 - Anders Eriksson, Pier Marco Bertinetto, Mattias Heldner, Rosalba Nodari, Giovanna Lenoci:
The Acoustics of Lexical Stress in Italian as a Function of Stress Level and Speaking Style. 1059-1063 - Antje Schweitzer, Ngoc Thang Vu:
Cross-Gender and Cross-Dialect Tone Recognition for Vietnamese. 1064-1068 - Karthika Vijayan, K. Sri Rama Murty:
Prosody Modification Using Allpass Residual of Speech Signals. 1069-1073 - Sofoklis Kakouros, Joris Pelemans, Lyan Verwimp, Patrick Wambacq, Okko Räsänen:
Analyzing the Contribution of Top-Down Lexical and Bottom-Up Acoustic Cues in the Detection of Sentence Prominence. 1074-1078 - Jeffrey Kallay, Melissa A. Redford:
A Longitudinal Study of Children's Intonation in Narrative Speech. 1079-1083
Speech Production Analysis and Modeling
- Reed Blaylock, Louis Goldstein, Shrikanth S. Narayanan:
Velum Control for Oral Sounds. 1084-1088 - Gayeon Son:
F0 Development in Acquiring Korean Stop Distinction. 1089-1093 - Clara Cohen, Matt Carlson:
Phonetic Reduction Can Lead to Lengthening, and Enhancement Can Lead to Shortening. 1094-1098 - Takayuki Arai:
Mechanical Production of [b], [m] and [w] Using Controlled Labial and Velopharyngeal Gestures. 1099-1103 - Qiang Fang, Yun Chen, Haibo Wang, Jianguo Wei, Jianrong Wang, Xiyu Wu, Aijun Li:
An Improved 3D Geometric Tongue Model. 1104-1107 - Mikko Tiainen, Fatima M. Felisberti, Kaisa Tiippana, Martti Vainio, Juraj Simko, Jirí Lukavský, Lari Vainio:
Congruency Effect Between Articulation and Grasping in Native English Speakers. 1108-1112 - Shamima Najnin, Bonny Banerjee:
Emergence of Vocal Developmental Sequences in a Predictive Coding Model of Speech Acquisition. 1113-1117 - Julien Meyer, Laure Dentel, Fanny Meunier:
Categorization of Natural Spanish Whistled Vowels by Naïve Spanish Listeners. 1118-1121 - Rob Voigt, Dan Jurafsky, Meghan Sumner:
Between- and Within-Speaker Effects of Bilingualism on F0 Variation. 1122-1126 - Calbert Graham, Paula Buttery, Francis Nolan:
Vowel Characteristics in the Assessment of L2 English Pronunciation. 1127-1131 - Ahmed Geneid, Anne-Maria Laukkanen, Anita McAllister, Robert Eklund:
Kulning (Swedish Cattle Calls): Acoustic, EGG, Stroboscopic and High-Speed Video Analyses of an Unusual Singing Style. 1132-1135 - Mísa Hejná, Pertti Palo, Scott Moisik:
Glottal Squeaks in VC Sequences. 1136-1140 - Naoya Takahashi, Tofigh Naghibi, Beat Pfister:
Automatic Pronunciation Generation by Utilizing a Semi-Supervised Deep Neural Networks. 1141-1145
Spoken Dialogue Systems
- Xiaohu Liu, Ruhi Sarikaya, Liang Zhao, Yong Ni, Yi-Cheng Pan:
Personalized Natural Language Understanding. 1146-1150 - Layla El Asri, Jing He, Kaheer Suleman:
A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems. 1151-1155 - Spiros Georgiladakis, Georgia Athanasopoulou, Raveesh Meena, José Lopes, Arodami Chorianopoulou, Elisavet Palogiannidi, Elias Iosif, Gabriel Skantze, Alexandros Potamianos:
Root Cause Analysis of Miscommunication Hotspots in Spoken Dialogue Systems. 1156-1160 - Omar Zia Khan, Ruhi Sarikaya:
Making Personal Digital Assistants Aware of What They Do Not Know. 1161-1165 - Rivka Levitan, Stefan Benus, Ramiro H. Gálvez, Agustín Gravano, Florencia Savoretti, Marián Trnka, Andreas Weise, Julia Hirschberg:
Implementing Acoustic-Prosodic Entrainment in a Conversational Avatar. 1166-1170 - Annika Silvervarg, Sofia Lindvall, Jonatan Andersson, Ida Esberg, Christian Jernberg, Filip Frumerie, Arne Jönsson:
Perceived Usability and Cognitive Demand of Secondary Tasks in Spoken Versus Visual-Manual Automotive Interaction. 1171-1175
Show & Tell Session 3
- Pascale Fung, Anik Dey, Farhad Bin Siddique, Ruixi Lin, Yang Yang, Yan Wan, Ricky Ho Yin Chan:
Zara: An Empathetic Interactive Virtual Agent. 1176-1177 - Cristian Tejedor García, David Escudero Mancebo, Enrique Cámara Arenas, César González Ferreras, Valentín Cardeñoso-Payo:
Measuring Pronunciation Improvement in Users of CAPT Tool TipTopTalk! 1178-1179 - Hideki Kawahara:
SparkNG: Interactive MATLAB Tools for Introduction to Speech Production, Perception and Processing Fundamentals and Application of the Aliasing-Free L-F Model Component. 1180-1181 - Erik Marchi, Florian Eyben, Gerhard Hagerer, Björn W. Schuller:
Real-Time Tracking of Speakers' Emotions, States, and Traits on Mobile Platforms. 1182-1183
Special Event: Mindfulness
- Nikki Mirghafori:
Mindfulness Special Event.
Keynote 2: Edward Chang
- Edward Chang:
The Human Speech Cortex. 1184
Special Event: Speaker Comparison for Forensic and Investigative Applications II
- Jean-François Bonastre, Joseph P. Campbell, Anders P. Eriksson, Hirotaka Nakasone, Reva Schwartz:
Speaker Comparison for Forensic and Investigative Applications II.
Special Session: Clinical and Neuroscience-Inspired Vocal Biomarkers of Neurological and Psychiatric Disorders
- Daniel Bone, Somer Bishop, Rahul Gupta, Sungbok Lee, Shrikanth S. Narayanan:
Acoustic-Prosodic and Turn-Taking Features in Interactions with Children with Neurodevelopmental Disorders. 1185-1189 - Daria Hemmerling, Juan Rafael Orozco-Arroyave, Andrzej Skalski, Janusz Gajda, Elmar Nöth:
Automatic Detection of Parkinson's Disease Based on Modulated Vowels. 1190-1194 - Jun Wang, Prasanna V. Kothalkar, Beiming Cao, Daragh Heitzman:
Towards Automatic Detection of Amyotrophic Lateral Sclerosis from Speech Acoustic and Articulatory Samples. 1195-1199 - Gregory A. Ciccarelli, Thomas F. Quatieri, Satrajit S. Ghosh:
Neurophysiological Vocal Source Modeling for Biomarkers of Disease. 1200-1204 - Rachelle L. Horwitz-Martin, Thomas F. Quatieri, Adam C. Lammert, James R. Williamson, Yana Yunusova, Elizabeth Godoy, Daryush D. Mehta, Jordan R. Green:
Relation of Automatically Extracted Formant Trajectories with Intelligibility Loss and Speaking Rate Decline in Amyotrophic Lateral Sclerosis. 1205-1209 - Fabien Ringeval, Erik Marchi, Charline Grossard, Jean Xavier, Mohamed Chetouani, David Cohen, Björn W. Schuller:
Automatic Analysis of Typical and Atypical Encoding of Spontaneous Emotion in the Voice of Children. 1210-1214 - Soheil Khorram, John Gideon, Melvin G. McInnis, Emily Mower Provost:
Recognition of Depression in Bipolar Disorder: Leveraging Cohort and Person-Specific Knowledge. 1215-1219 - Bahman Mirheidari, Daniel Blackburn, Markus Reuber, Traci Walker, Heidi Christensen:
Diagnosing People with Dementia Using Automatic Conversation Analysis. 1220-1224
Special Session: Singing Synthesis Challenge: Fill-In the Gap
- Paul Yaozhu Chan, Minghui Dong, Grace Xue Hui Ho, Haizhou Li:
SERAPHIM: A Wavetable Synthesis System with 3D Lip Animation for Real-Time Speech and Singing Applications on Mobile Platforms. 1225-1229 - Jordi Bonada, Martí Umbert, Merlijn Blaauw:
Expressive Singing Synthesis Based on Unit Selection for the Singing Synthesis Challenge 2016. 1230-1234 - Olivier Perrotin, Christophe d'Alessandro:
Vocal Effort Modification for Singing Synthesis. 1235-1239 - Eder del Blanco, Inma Hernáez, Eva Navas, Xabier Sarasola, Daniel Erro:
Bertsokantari: a TTS Based Singing Synthesis System. 1240-1244 - Lionel Feugère, Christophe d'Alessandro, Samuel Delalez, Luc Ardaillon, Axel Roebel:
Evaluation of Singing Synthesis: Methodology and Case Study with Concatenative and Performative Systems. 1245-1249 - Luc Ardaillon, Celine Chabot-Canet, Axel Roebel:
Expressive Control of Singing Voice Synthesis Using Musical Contexts and a Parametric F0 Model. 1250-1254 - Marius Cotescu:
Optimal Unit Stitching in a Unit Selection Singing Synthesis System. 1255-1259
Conversation and Interaction
- Katherine Hilton:
The Perception of Overlapping Speech: Effects of Speaker Prosody and Listener Attitudes. 1260-1264 - Agustín Gravano, Pablo Brusco, Stefan Benus:
Who Do You Think Will Speak Next? Perception of Turn-Taking Cues in Slovak and Argentine Spanish. 1265-1269 - Juan Manuel Pérez, Ramiro H. Gálvez, Agustín Gravano:
Disentrainment may be a Positive Thing: A Novel Measure of Unsigned Acoustic-Prosodic Synchrony, and its Relation to Speaker Engagement. 1270-1274 - Marcin Wlodarczak, Mattias Heldner:
Respiratory Turn-Taking Cues. 1275-1279 - Emma Rennie, Rebecca Lunsford, Peter A. Heeman:
The Discourse Marker "so" in Turn-Taking and Turn-Releasing Behavior. 1280-1284 - Ethan Sherr-Ziarko:
Acoustic Properties of Formality in Conversational Japanese. 1285-1289
Automatic Learning of Representations
- Thomas Pellegrini, Sandrine Mouysset:
Inferring Phonemic Classes from CNN Activation Maps Using Clustering Techniques. 1290-1294 - Neil Zeghidour, Gabriel Synnaeve, Nicolas Usunier, Emmanuel Dupoux:
Joint Learning of Speaker and Phonetic Similarities with Siamese Networks. 1295-1299 - Vikramjit Mitra, Dimitra Vergyri, Horacio Franco:
Unsupervised Learning of Acoustic Units Using Autoencoders and Kohonen Nets. 1300-1304 - Zhenyao Zhu, Jesse H. Engel, Awni Y. Hannun:
Learning Multiscale Features Directly from Waveforms. 1305-1309 - Michael Heck, Sakriani Sakti, Satoshi Nakamura:
Supervised Learning of Acoustic Models in a Zero Resource Setting to Improve DPGMM Clustering. 1310-1314 - Haihua Xu, Hang Su, Chongjia Ni, Xiong Xiao, Hao Huang, Eng Siong Chng, Haizhou Li:
Semi-Supervised and Cross-Lingual Knowledge Transfer Learnings for DNN Hybrid Acoustic Models Under Low-Resource Conditions. 1315-1319
Language Modeling for Conversational Speech and Confidence Measures
- Taichi Asami, Ryo Masumura, Yushi Aono, Koichi Shinoda:
Recurrent Out-of-Vocabulary Word Detection Using Distribution of Features. 1320-1324 - Naoyuki Kanda, Shoji Harada, Xugang Lu, Hisashi Kawai:
Investigation of Semi-Supervised Acoustic Model Training Based on the Committee of Heterogeneous Neural Networks. 1325-1329 - Sahar Ghannay, Yannick Estève, Nathalie Camelin, Paul Deléglise:
Acoustic Word Embeddings for ASR Error Detection. 1330-1334 - Axel Horndasch, Anton Batliner, Caroline Kaufhold, Elmar Nöth:
Combining Semantic Word Classes and Sub-Word Unit Speech Recognition for Robust OOV Detection. 1335-1339 - Chuandong Xie, Wu Guo, Guoping Hu, Junhua Liu:
Web Data Selection Based on Word Embedding for Low-Resource Speech Recognition. 1340-1344
Topics in Speech Perception
- Jianjing Kuang, Mark Y. Liberman:
Pitch-Range Perception: The Dynamic Interaction Between Voice Quality and Fundamental Frequency. 1350-1354 - Fei Chen, Benson C. L. Chiao:
Comparing the Contributions of Amplitude and Phase to Speech Intelligibility in a Vocoder-Based Speech Synthesis Model. 1355-1358 - Fei Chen:
Modeling Noise Influence to Speech Intelligibility Non-Intrusively by Reduced Speech Dynamic Range. 1359-1362 - Gábor Pintér, Hiroki Watanabe:
Do GMM Phoneme Classifiers Perceive Synthetic Sibilants as Humans Do? 1363-1367 - Marina Frye, Cristiano Micheli, Inga M. Schepers, Gerwin Schalk, Jochem W. Rieger, Bernd T. Meyer:
Neural Responses to Speech-Specific Modulations Derived from a Spectro-Temporal Filter Bank. 1368-1372 - Kimberley Mulder, Louis ten Bosch, Lou Boves:
Comparing Different Methods for Analyzing ERP Signals. 1373-1377 - Robert Eklund, Martin Ingvar:
Supplementary Motor Area Activation in Disfluency Perception: An fMRI Study of Listener Neural Responses to Spontaneously Produced Unfilled and Filled Pauses. 1378-1381 - Daniel Fogerty, Fei Chen:
Vowel Fundamental and Formant Frequency Contributions to English and Mandarin Sentence Intelligibility. 1382-1386
Behavioral Signal Processing and Speaker State and Traits Analytics
- Che-Wei Huang, Shrikanth S. Narayanan:
Attention Assisted Discovery of Sub-Utterance Structure in Speech Emotion Recognition. 1387-1391 - Linchuan Li, Zhiyong Wu, Mingxing Xu, Helen M. Meng, Lianhong Cai:
Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition. 1392-1396 - Jürgen Trouvain, Zofia Malisz:
Inter-Speech Clicks in an Interspeech Keynote. 1397-1401 - Joanna Grzybowska, Stanislaw Kacprzak:
Speaker Age Classification and Regression Using i-Vectors. 1402-1406 - Haoqi Li, Brian R. Baucom, Panayiotis G. Georgiou:
Sparsely Connected and Disjointly Trained Deep Neural Networks for Low Resource Behavioral Annotation: Acoustic Classification in Couples' Therapy. 1407-1411 - Guozhen An, Sarah Ita Levitan, Rivka Levitan, Andrew Rosenberg, Michelle Levine, Julia Hirschberg:
Automatically Classifying Self-Rated Personality Scores from Speech. 1412-1416 - Jill Fain Lehman, Rita Singh:
Estimation of Children's Physical Characteristics from Their Voices. 1417-1421 - Hayakawa Akira, Saturnino Luz, Nick Campbell:
Talking to a System and Talking to a Human: A Study from a Speech-to-Speech, Machine Translation Mediated Map Task. 1422-1426 - Rahul Gupta, Shrikanth S. Narayanan:
Predicting Affective Dimensions Based on Self Assessed Depression Severity. 1427-1431 - Wen-Yu Huang, Shan-Wen Hsiao, Hung-Ching Sun, Ming-Chuan Hsieh, Ming-Hsueh Tsai, Chi-Chun Lee:
Enhancement of Automatic Oral Presentation Assessment System Using Latent N-Grams Word Representation and Part-of-Speech Information. 1432-1436 - Sri Harsha Dumpala, P. Gangamohan, Suryakanth V. Gangashetty, B. Yegnanarayana:
Use of Vowels in Discriminating Speech-Laugh from Laughter and Neutral Speech. 1437-1441 - Kan Kawabata, Visar Berisha, Anna Scaglione, Amy LaCross:
A Convex Model for Linguistic Influence in Group Conversations. 1442-1446 - James Gibson, Dogan Can, Bo Xiao, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
A Deep Learning Approach to Modeling Empathy in Addiction Counseling. 1447-1451 - Kun-Yi Huang, Chung-Hsien Wu, Yu-Ting Kuo, Fong-Lin Jang:
Unipolar Depression vs. Bipolar Disorder: An Elicitation-Based Approach to Short-Term Detection of Mood Disorder. 1452-1456
Speech Synthesis Poster
- Abir Masmoudi, Mariem Ellouze, Fethi Bougares, Yannick Estève, Lamia Hadrich Belguith:
Conditional Random Fields for the Tunisian Dialect Grapheme-to-Phoneme Conversion. 1457-1461 - Sittipong Saychum, Sarawoot Kongyoung, Anocha Rugchatjaroen, Patcharika Chootrakool, Sawit Kasuriya, Chai Wutiwiwatchai:
Efficient Thai Grapheme-to-Phoneme Conversion Using CRF-Based Joint Sequence Modeling. 1462-1466 - Aurore Jaumard-Hakoun, Kele Xu, Clémence Leboullenger, Pierre Roussel-Ragot, Bruce Denby:
An Articulatory-Based Singing Voice Synthesis Using Tongue and Lips Imaging. 1467-1471 - Xu Li, Zhiyong Wu, Helen M. Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai:
Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis. 1472-1476 - Xu Li, Zhiyong Wu, Helen M. Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai:
Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data. 1477-1481 - Sarah Taylor, Akihiro Kato, Iain A. Matthews, Ben P. Milner:
Audio-to-Visual Speech Conversion Using Deep Neural Networks. 1482-1486 - Toru Nakashika, Yasuhiro Minami:
Generative Acoustic-Phonemic-Speaker Model Based on Three-Way Restricted Boltzmann Machine. 1487-1491 - Asterios Toutios, Tanner Sorensen, Krishna Somandepalli, Rachel Alexander, Shrikanth S. Narayanan:
Articulatory Synthesis Based on Real-Time Magnetic Resonance Imaging Data. 1492-1496 - Xurong Xie, Xunying Liu, Lan Wang:
Deep Neural Network Based Acoustic-to-Articulatory Inversion Using Phone Sequence Information. 1497-1501 - Zheng-Chen Liu, Zhen-Hua Ling, Li-Rong Dai:
Articulatory-to-Acoustic Conversion with Cascaded Prediction of Spectral and Excitation Features Using Neural Networks. 1502-1506 - Christopher Liberatore, Ricardo Gutierrez-Osuna:
Generating Gestural Scores from Acoustics Through a Sparse Anchor-Based Representation of Speech. 1507-1511 - David Guennec, Damien Lolive:
On the Suitability of Vocalic Sandwiches in a Corpus-Based TTS Engine. 1512-1516 - Decha Moungsri, Tomoki Koriyama, Takao Kobayashi:
Unsupervised Stress Information Labeling Using Gaussian Process Latent Variable Model for Statistical Speech Synthesis. 1517-1521 - Jinfu Ni, Yoshinori Shiga, Hisashi Kawai:
Using Zero-Frequency Resonator to Extract Multilingual Intonation Structure. 1522-1526
Resources and Annotation of Resources
- Jia Yu, Xiong Xiao, Lei Xie, Eng Siong Chng, Haizhou Li:
A DNN-HMM Approach to Story Segmentation. 1527-1531 - Jean-Philippe Goldman, Pierre-Edouard Honnet, Robert A. J. Clark, Philip N. Garner, Maria Ivanova, Alexandros Lazaridis, Hui Liang, Tiago Macedo, Beat Pfister, Manuel Sam Ribeiro, Eric Wehrli, Junichi Yamagishi:
The SIWIS Database: A Multilingual Speech Database with Acted Emphasis. 1532-1535 - Emre Yilmaz, Henk van den Heuvel, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, David A. van Leeuwen:
Open Source Speech and Language Resources for Frisian. 1536-1540 - Andreas Kathol, Elizabeth Shriberg, Massimiliano de Zambotti:
The SRI CLEO Speaker-State Corpus. 1541-1544 - Nancy F. Chen, Rong Tong, Darren Wee, Pei Xuan Lee, Bin Ma, Haizhou Li:
SingaKids-Mandarin: Speech Corpus of Singaporean Children Speaking Mandarin Chinese. 1545-1549 - Colleen Richey, Cynthia M. D'Angelo, Nonye Alozie, Harry Bratt, Elizabeth Shriberg:
The SRI Speech-Based Collaborative Learning Corpus. 1550-1554 - Anil Ramakrishna, Rahul Gupta, Ruth B. Grossman, Shrikanth S. Narayanan:
An Expectation Maximization Approach to Joint Modeling of Multidimensional Ratings Derived from Multiple Annotators. 1555-1559 - Jindrich Matousek, Daniel Tihelka:
Voting Detector: A Combination of Anomaly Detectors to Reveal Annotation Errors in TTS Corpora. 1560-1564
Show & Tell Session 4
- Mario Corrales-Astorgano, David Escudero Mancebo, César González Ferreras, Yurena Gutiérrez-González, Valle Flores-Lucas, Valentín Cardeñoso-Payo, Lourdes Aguilar-Cuevas:
The Magic Stone: A Video Game to Improve Communication Skills of People with Intellectual Disabilities. 1565-1566 - Finnian Kelly, Anil Alexander, Oscar Forth, Samuel Kent, Jonas Lindh, Joel Åkesson:
Identifying Perceptually Similar Voices with a Speaker Recognition System Using Auto-Phonetic Features. 1567-1568 - Kristy James, Alexander Hewer, Ingmar Steiner, Stefanie Wuhrer:
A Real-Time Framework for Visual Feedback of Articulatory Data Using Statistical Shape Models. 1569-1570 - Alex Marin, Paul A. Crook, Omar Zia Khan, Vasiliy Radostev, Khushboo Aggarwal, Ruhi Sarikaya:
Flexible, Rapid Authoring of Goal-Orientated, Multi-Turn Dialogues Using the Task Completion Platform. 1571-1572
Acoustic Model Adaptation
- Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Takuya Yoshioka, Dung T. Tran, Tomohiro Nakatani:
Context Adaptive Neural Network for Rapid Adaptation of Deep CNN Based Acoustic Models. 1573-1577 - Boon Pang Lim, Faith Wong, Yuyao Li, Jia Wei Bay:
Transfer Learning with Bottleneck Feature Networks for Whispered Speech Recognition. 1578-1582 - Tasha Nagamine, Zhuo Chen, Nima Mesgarani:
Adaptation of Neural Networks Constrained by Prior Statistics of Node Co-Activations. 1583-1587 - Masayuki Suzuki, Ryuki Tachibana, Samuel Thomas, Bhuvana Ramabhadran, George Saon:
Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings. 1588-1592 - Lahiru Samarakoon, Khe Chai Sim:
Subspace LHUC for Fast Adaptation of Deep Neural Network Acoustic Models. 1593-1597 - Joachim Fainberg, Peter Bell, Mike Lincoln, Steve Renals:
Improving Children's Speech Recognition Through Out-of-Domain Data Augmentation. 1598-1602
Special Session: Sharing Research and Education Resources for Understanding Speech Processing
- Florian Metze, Eric Riebling, Anne S. Warlaumont, Elika Bergelson:
Virtual Machines and Containers as a Platform for Experimentation. 1603-1607 - Phil D. Green, Ricard Marxer, Stuart P. Cunningham, Heidi Christensen, Frank Rudzicz, Maria Yancheva, André Coy, Massimiliano Malavasi, Lorenzo Desideri, Fabio Tamburini:
CloudCAST - Remote Speech Technology for Speech Professionals. 1608-1612 - Thomas Hain, Jeremy Christian, Oscar Saz, Salil Deena, Madina Hasan, Raymond W. M. Ng, Rosanna Milner, Mortaza Doulaty, Yulan Liu:
webASR 2 - Improved Cloud Based Speech Technology. 1613-1617 - Andrew R. Plummer, Mary E. Beckman:
Sharing Speech Synthesis Software for Research and Education Within Low-Tech and Low-Resource Communities. 1618-1622 - Ronald L. Sprouse, Keith Johnson:
The Berkeley Phonetics Machine. 1623-1626 - Rebecca Bates, Eric Fosler-Lussier, Florian Metze, Martha A. Larson, Gina-Anne Levow, Emily Mower Provost:
Experiences with Shared Resources for Research and Education in Speech and Language Processing. 1627-1631
Special Session: Voice Conversion Challenge
- Tomoki Toda, Ling-Hui Chen, Daisuke Saito, Fernando Villavicencio, Mirjam Wester, Zhizheng Wu, Junichi Yamagishi:
The Voice Conversion Challenge 2016. 1632-1636 - Mirjam Wester, Zhizheng Wu, Junichi Yamagishi:
Analysis of the Voice Conversion Challenge 2016 Evaluation Results. 1637-1641 - Ling-Hui Chen, Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Li-Rong Dai:
The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion. 1642-1646 - Seyed Hamidreza Mohammadi, Alexander Kain:
A Voice Conversion Mapping Function Based on a Stacked Joint-Autoencoder. 1647-1651 - Yi-Chiao Wu, Hsin-Te Hwang, Chin-Cheng Hsu, Yu Tsao, Hsin-Min Wang:
Locally Linear Embedding for Exemplar-Based Spectral Conversion. 1652-1656 - Fernando Villavicencio, Junichi Yamagishi, Jordi Bonada, Felipe Espic:
Applying Spectral Normalisation and Efficient Envelope Estimation and Statistical Transformation for the Voice Conversion Challenge 2016. 1657-1661 - Daniel Erro, Agustín Alonso, Luis Serrano, David Tavarez, Igor Odriozola, Xabier Sarasola, Eder del Blanco, Jon Sánchez, Ibon Saratxaga, Eva Navas, Inma Hernáez:
ML Parameter Generation with a Reformulated MGE Training Criterion - Participation in the Voice Conversion Challenge 2016. 1662-1666 - Kazuhiro Kobayashi, Shinnosuke Takamichi, Satoshi Nakamura, Tomoki Toda:
The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016. 1667-1671
Intelligibility and Masking
- Maury Lander-Portnoy:
Release from Energetic Masking Caused by Repeated Patterns of Glimpsing Windows. 1672-1676 - Bobby Gibbs II, Daniel Fogerty:
Glimpsing Predictions for Natural and Vocoded Sentence Intelligibility During Modulation Masking: Effect of the Glimpse Cutoff Criterion. 1677-1681 - Li Xu:
Temporal Envelopes in Sine-Wave Speech Recognition. 1682-1686 - Jing Liu, Rosanna H. N. Tong, Fei Chen:
Understanding Periodically Interrupted Mandarin Speech. 1687-1691 - Fei Chen, Daniel Fogerty:
Factors Affecting the Intelligibility of Sine-Wave Speech. 1692-1695 - Nao Hodoshima:
Effects of Urgent Speech and Preceding Sounds on Speech Intelligibility in Noisy and Reverberant Environments. 1696-1699
Robust Speaker Recognition and Anti-Spoofing
- Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Hong Yu, Tomi Kinnunen, Nicholas W. D. Evans, Zheng-Hua Tan:
Integrated Spoofing Countermeasures and Automatic Speaker Verification: An Evaluation on ASVspoof 2015. 1700-1704 - Pavel Korshunov, Sébastien Marcel:
Cross-Database Evaluation of Audio-Based Spoofing Detection Systems. 1705-1709 - Kaavya Sriskandaraja, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah:
Investigation of Sub-Band Discriminative Information Between Spoofed and Genuine Speech. 1710-1714 - Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li:
An Investigation of Spoofing Speech Detection Under Additive Noise and Reverberant Conditions. 1715-1719 - Md. Sahidullah, Rosa González Hautamäki, Dennis Alexander Lehmann Thomsen, Tomi Kinnunen, Zheng-Hua Tan, Ville Hautamäki, Robert Parts, Martti Pitkänen:
Robust Speaker Recognition with Combined Use of Acoustic and Throat Microphone Speech. 1720-1724 - Zhong Meng, Biing-Hwang Juang:
Statistical Modeling of Speaker's Voice with Temporal Co-Location for Active Voice Authentication. 1725-1729
Speech Enhancement and Applications
- Johannes Fischer, Tom Bäckström:
Joint Enhancement and Coding of Speech by Incorporating Wiener Filtering in a CELP Codec. 1730-1734 - Hong Liu, Xiuling Wang, Miao Sun, Cheng Pang:
Multi-Channel Linear Prediction Based on Binaural Coherence for Speech Dereverberation. 1735-1739 - Martin Blass, Pejman Mowlaee, W. Bastiaan Kleijn:
Single-Channel Speech Enhancement Using Double Spectrum. 1740-1744 - Lukas Drude, Bhiksha Raj, Reinhold Haeb-Umbach:
On the Appropriateness of Complex-Valued Neural Networks for Speech Enhancement. 1745-1749 - Steffen Zeiler, Hendrik Meutzner, Ahmed Hussen Abdelaziz, Dorothea Kolossa:
Introducing the Turbo-Twin-HMM for Audio-Visual Speech Enhancement. 1750-1754 - Constantin Spille, Hendrik Kayser, Hynek Hermansky, Bernd T. Meyer:
Assessing Speech Quality in Speech-Aware Hearing Aids Based on Phoneme Posteriorgrams. 1755-1759
Speech Analysis
- Dhananjaya N. Gowda, Paavo Alku:
Time-Varying Quasi-Closed-Phase Weighted Linear Prediction Analysis of Speech for Accurate Formant Detection and Tracking. 1760-1764 - Yongwan Lim, Sajan Goud Lingala, Asterios Toutios, Shrikanth S. Narayanan, Krishna S. Nayak:
Improved Depiction of Tissue Boundaries in Vocal Tract Real-Time MRI Using Automatic Off-Resonance Correction. 1765-1769 - Merlijn Blaauw, Jordi Bonada:
Modeling and Transforming Speech Using Variational Autoencoders. 1770-1774 - Chandra Sekhar Seelamantula:
Phase-Encoded Speech Spectrograms. 1775-1779 - Peter Birkholz, Petko Bakardjiev, Steffen Kürbis, Rico Petrick:
Towards Minimally Invasive Velar State Detection in Normal and Silent Speech. 1780-1784 - Jianshu Zhang, Jian Tang, Li-Rong Dai:
RNN-BLSTM Based Multi-Pitch Estimation. 1785-1789 - Masanori Morise, Hideki Kawahara:
TUSK: A Framework for Overviewing the Performance of F0 Estimators. 1790-1794 - Pradeep Rengaswamy, Gurunath Reddy M., K. Sreenivasa Rao, Pallab Dasgupta:
A Robust Non-Parametric and Filtering Based Approach for Glottal Closure Instant Detection. 1795-1799
Speaker Recognition
- Rahim Saeidi, Ilkka Huhtakallio, Paavo Alku:
Analysis of Face Mask Effect on Speaker Recognition. 1800-1804 - Elliot Singer, Tyler Campbell, Douglas A. Reynolds:
Data Selection for Within-Class Covariance Estimation. 1805-1809 - Marc Ferras, Srikanth R. Madikeri, Subhadeep Dey, Petr Motlícek, Hervé Bourlard:
Inter-Task System Fusion for Speaker Recognition. 1810-1814 - Zhenchun Lei, Yanhong Wan, Jian Luo, Yingen Yang:
Mahalanobis Metric Scoring Learned from Weighted Pairwise Constraints in I-Vector Speaker Recognition System. 1815-1819 - Meet H. Soni, Tanvina B. Patel, Hemant A. Patil:
Novel Subband Autoencoder Features for Detection of Spoofed Speech. 1820-1824 - Mitchell McLaren, Diego Castán, Luciana Ferrer, Aaron Lawson:
On the Issue of Calibration in DNN-Based Speaker Recognition Systems. 1825-1829 - Waad Ben Kheder, Driss Matrouf, Moez Ajili, Jean-François Bonastre:
Probabilistic Approach Using Joint Long and Short Session i-Vectors Modeling to Deal with Short Utterances for Speaker Recognition. 1830-1834 - Ahilan Kanagasundaram, David Dean, Sridha Sridharan, Clinton Fookes, Ivan Himawan:
Short Utterance Variance Modelling and Utterance Partitioning for PLDA Speaker Verification. 1835-1838 - Nicolai Bæk Thomsen, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Børge Lindberg, Søren Holdt Jensen:
Speaker-Dependent Dictionary-Based Speech Enhancement for Text-Dependent Speaker Verification. 1839-1843 - Chengzhu Yu, Chunlei Zhang, Finnian Kelly, Abhijeet Sangwan, John H. L. Hansen:
Text-Available Speaker Recognition System for Forensic Applications. 1844-1847 - Qingyang Hong, Lin Li, Lihong Wan, Jun Zhang, Feng Tong:
Transfer Learning for Speaker Verification on Short Utterances. 1848-1852 - Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong-Aik Lee:
Twin Model G-PLDA for Duration Mismatch Compensation in Text-Independent Speaker Verification. 1853-1857 - Xiao-Lei Zhang:
Universal Background Sparse Coding and Multilayer Bootstrap Network for Speaker Clustering. 1858-1862 - Yao Tian, Meng Cai, Liang He, Wei-Qiang Zhang, Jia Liu:
Improving Deep Neural Networks Based Speaker Verification Using Unlabeled Data. 1863-1867
Decoding, System Combination
- Naoyuki Kanda, Xugang Lu, Hisashi Kawai:
Maximum a posteriori Based Decoding for CTC Acoustic Models. 1868-1872 - Afsaneh Asaei, Gil Luyet, Milos Cernak, Hervé Bourlard:
Phonetic and Phonological Posterior Search Space Hashing Exploiting Class-Specific Sparsity Structures. 1873-1877 - George Tucker, Minhua Wu, Ming Sun, Sankaran Panchapagesan, Gengshen Fu, Shiv Vitaladevuni:
Model Compression Applied to Small-Footprint Keyword Spotting. 1878-1882 - Angel Mario Castro Martinez, Marc René Schädler:
Why do ASR Systems Despite Neural Nets Still Depend on Robust Features. 1883-1887 - Qing He, Gregory W. Wornell, Wei Ma:
An Adaptive Multi-Band System for Low Power Voice Command Recognition. 1888-1892 - Michael Price, Anantha P. Chandrakasan, James R. Glass:
Memory-Efficient Modeling and Search Techniques for Hardware ASR Decoders. 1893-1897 - Jingzhou Yang, Anton Ragni, Mark J. F. Gales, Kate M. Knill:
Log-Linear System Combination Using Structured Support Vector Machines. 1898-1902 - Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu:
Efficient Segmental Cascades for Speech Recognition. 1903-1907 - Sirui Xu, Eric Fosler-Lussier:
A WFST Framework for Single-Pass Multi-Stream Decoding. 1908-1912 - William Hartmann, Le Zhang, Kerri Barnes, Roger Hsiao, Stavros Tsakalidis, Richard M. Schwartz:
Comparison of Multiple System Combination Techniques for Keyword Spotting. 1913-1917 - Masato Obara, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh:
Rescoring by Combination of Posteriorgram Score and Subword-Matching Score for Use in Query-by-Example. 1918-1922 - Zhehuai Chen, Wei Deng, Tao Xu, Kai Yu:
Phone Synchronous Decoding with CTC Lattice. 1923-1927
Special Session: Clinical and Neuroscience-Inspired Vocal Biomarkers of Neurological and Psychiatric Disorders
- Saurabh Sahu, Carol Y. Espy-Wilson:
Speech Features for Depression Detection. 1928-1932 - Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Jesús Francisco Vargas-Bonilla, Elmar Nöth:
Parkinson's Disease Progression Assessment from Speech Using GMM-UBM. 1933-1937 - Jochen Weiner, Christian Herff, Tanja Schultz:
Speech-Based Detection of Alzheimer's Disease in Conversational German. 1938-1942 - Sharifa Alghowinem, Roland Goecke, Julien Epps, Michael Wagner, Jeffrey F. Cohn:
Cross-Cultural Depression Recognition from Vocal Biomarkers. 1943-1947 - Luke Zhou, Kathleen C. Fraser, Frank Rudzicz:
Speech Recognition in Alzheimer's Disease and in its Assessment. 1948-1952 - Florian B. Pokorny, Peter B. Marschik, Christa Einspieler, Björn W. Schuller:
Does She Speak RTT? Towards an Earlier Identification of Rett Syndrome Through Intelligent Pre-Linguistic Vocalisation Analysis. 1953-1957 - Massimo Pettorino, Maria Grazia Busà, Elisa Pellegrino:
Speech Rhythm in Parkinson's Disease: A Study on Italian. 1958-1961
Show & Tell Session 5
- Xavier Anguera, Vu Van:
English Language Speech Assistant. 1962-1963 - Allen Guo, Arlo Faria, Korbinian Riedhammer:
Remeeting - Deep Insights to Conversations. 1964-1965 - Paul Yaozhu Chan, Minghui Dong, Grace Xue Hui Ho, Haizhou Li:
SERAPHIM Live! - Singing Synthesis for the Performer, the Composer, and the 3D Game Developer. 1966-1967 - Fabrice Malfrère, Olivier Deroo, Emmanuelle Franques, Jonathan Hourez, Nicolas Mazars, Vincent Pagel, Geoffrey Wilfart:
My-Own-Voice: A Web Service That Allows You to Create a Text-to-Speech Voice From Your Own Voice. 1968-1969
Keynote 3: Anne Fernald
- Anne Fernald:
Talking with Kids Really Matters: Early Language Experience Shapes Later Life Chances. 1970
Far-Field Speech Processing
- Tara N. Sainath, Arun Narayanan, Ron J. Weiss, Ehsan Variani, Kevin W. Wilson, Michiel Bacchiani, Izhak Shafran:
Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction. 1971-1975 - Bo Li, Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Michiel Bacchiani:
Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition. 1976-1980 - Hakan Erdogan, John R. Hershey, Shinji Watanabe, Michael I. Mandel, Jonathan Le Roux:
Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks. 1981-1985 - Cristina Guerrero, Georgina Tryfou, Maurizio Omologo:
Channel Selection for Distant Speech Recognition Exploiting Cepstral Distance. 1986-1990 - Michael I. Mandel, Jon Barker:
Multichannel Spatial Clustering for Robust Far-Field Automatic Speech Recognition in Mismatched Conditions. 1991-1995 - Vijayaditya Peddinti, Vimal Manohar, Yiming Wang, Daniel Povey, Sanjeev Khudanpur:
Far-Field ASR Without Parallel Data. 1996-2000
Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception, Sincerity & Native Language
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron C. Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini:
The INTERSPEECH 2016 Computational Paralinguistics Challenge: Deception, Sincerity & Native Language. 2001-2005 - Björn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron C. Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini:
The Deception Sub-Challenge: The Data. - Sarah Ita Levitan, Guozhen An, Min Ma, Rivka Levitan, Andrew Rosenberg, Julia Hirschberg:
Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection. 2006-2010 - Shahin Amiriparian, Jouni Pohjalainen, Erik Marchi, Sergey Pugachevskiy, Björn W. Schuller:
Is Deception Emotional? An Emotion-Driven Predictive Approach. 2011-2015 - Claude Montacié, Marie-José Caraty:
Prosodic Cues and Answer Type Detection for the Deception Sub-Challenge. 2016-2020 - Björn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini:
The Sincerity Sub-Challenge: The Data. - Brandon M. Booth, Rahul Gupta, Pavlos Papadopoulos, Ruchir Travadi, Shrikanth S. Narayanan:
Automatic Estimation of Perceived Sincerity from Spoken Language. 2021-2025 - Gábor Gosztolya, Tamás Grósz, György Szaszák, László Tóth:
Estimating the Sincerity of Apologies in Speech by DNN Rank Learning and Prosodic Analysis. 2026-2030 - Hung-Shin Lee, Yu Tsao, Chi-Chun Lee, Hsin-Min Wang, Wei-Cheng Lin, Wei-Chen Chen, Shan-Wen Hsiao, Shyh-Kang Jeng:
Minimization of Regression and Ranking Losses with Shallow Neural Networks on Automatic Sincerity Evaluation. 2031-2035 - Robert Herms:
Prediction of Deception and Sincerity from Speech Using Automatic Phone Recognition-Based Features. 2036-2040 - Yue Zhang, Felix Weninger, Zhao Ren, Björn W. Schuller:
Sincerity and Deception in Speech: Two Sides of the Same Coin? A Transfer- and Multi-Task Learning Perspective. 2041-2045 - Heysem Kaya, Alexey A. Karpov:
Fusing Acoustic Feature Representations for Computational Paralinguistics Tasks. 2046-2050
Special Session: Speech, Audio, and Language Processing Techniques Applied to Bird and Animal Vocalizations
- Naomi Harte, Peter Jancovic, Karl-L. Schuchmann:
Introduction. - Naomi Harte, Peter Jancovic, Karl-L. Schuchmann:
Poster Overview Presentations. - Naomi Harte, Peter Jancovic, Karl-L. Schuchmann:
Discussion. - Naomi Harte, Peter Jancovic, Karl-L. Schuchmann:
Closing Remarks.
Dialogue Systems and Analysis of Dialogue
- Merwan Barlier, Romain Laroche, Olivier Pietquin:
A Stochastic Model for Computer-Aided Human-Human Dialogue. 2051-2055 - Gaël Lejeune, François Rioult, Bruno Crémilleux:
Highlighting Psychological Features for Predicting Child Interjections During Story Telling. 2056-2059 - Kai Sun, Su Zhu, Lu Chen, Siqiu Yao, Xueyang Wu, Kai Yu:
Hybrid Dialogue State Tracking for Real World Human-to-Human Dialogues. 2060-2064 - Gaurav Fotedar, Aditya Gaonkar P., Saikat Chatterjee, Prasanta Kumar Ghosh:
Automatic Recognition of Social Roles Using Long Term Role Transitions in Small Group Interactions. 2065-2069 - Paul Van Eecke, Raquel Fernández:
On the Influence of Gender on Interruptions in Multiparty Dialogue. 2070-2074 - Ian Beaver, Cynthia Freeman:
Detection of User Escalation in Human-Computer Interactions. 2075-2079
Interaction between Speech Production and Perception
- Marie-Lou Barnaud, Julien Diard, Pierre Bessière, Jean-Luc Schwartz:
Assessing Idiosyncrasies in a Bayesian Model of Speech Communication. 2080-2084 - Maria K. Wolters, Najoung Kim, Jung-Ho Kim, Sarah E. MacPherson, Jong-Chan Park:
Prosodic and Linguistic Analysis of Semantic Fluency Data: A Window into Speech Production and Cognition. 2085-2089 - William F. Katz, Divya Prabhakaran:
Sensorimotor Response to Visual Imagery of Tongue Displacement. 2090-2094 - Tiphaine Caudrelier, Pascal Perrier, Jean-Luc Schwartz, Amélie Rochet-Capellan:
Does Auditory-Motor Learning of Speech Transfer from the CV Syllable to the CVCV Word? 2095-2099 - Antje Schweitzer, Michael Walsh:
Exemplar Dynamics in Phonetic Convergence of Speech Rate. 2100-2104 - Outi Tuomainen, Valérie Hazan:
Articulation Rate in Adverse Listening Conditions in Younger and Older Adults. 2105-2109
Multimodal Processing
- Julia Olcoz, Oscar Saz, Thomas Hain:
Error Correction in Lightly Supervised Alignment of Broadcast Subtitles. 2110-2114 - Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain:
Automatic Genre and Show Identification of Broadcast Media. 2115-2119 - Guan-Lin Chao, William Chan, Ian R. Lane:
Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments. 2120-2124 - Amit Aides, Hagai Aronowitz:
Text-Dependent Audiovisual Synchrony Detection for Spoofing Detection in Mobile Person Recognition. 2125-2129 - Fei Tao, John H. L. Hansen, Carlos Busso:
Improving Boundary Estimation in Audiovisual Speech Activity Detection Using Bayesian Information Criterion. 2130-2134 - Sebastian Gergen, Steffen Zeiler, Ahmed Hussen Abdelaziz, Robert M. Nickel, Dorothea Kolossa:
Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR. 2135-2139
Pitch, Tone, and Music
- Anna M. Kruspe:
Retrieval of Textual Song Lyrics from Sung Inputs. 2140-2144 - Jiahong Yuan, Mark Y. Liberman:
Phoneme, Phone Boundary, and Tone in Automatic Scoring of Mandarin Proficiency. 2145-2149 - Charles Chen, Razvan C. Bunescu, Li Xu, Chang Liu:
Tone Classification in Mandarin Chinese Using Convolutional Neural Networks. 2150-2154 - Vishala Pannala, G. Aneeja, Sudarsana Reddy Kadiri, B. Yegnanarayana:
Robust Estimation of Fundamental Frequency Using Single Frequency Filtering Approach. 2155-2159 - Ryunosuke Daido, Yuji Hisaminato:
A Fast and Accurate Fundamental Frequency Estimator Using Recursive Moving Average Filters. 2160-2164 - Prateek Verma, Ronald W. Schafer:
Frequency Estimation from Waveforms Using Multi-Layered Neural Networks. 2165-2169
Speaker Diarization and Recognition
- Douglas E. Sturim, William M. Campbell:
Speaker Linking and Applications Using Non-Parametric Hashing Methods. 2170-2174 - Gaël Le Lan, Delphine Charlet, Anthony Larcher, Sylvain Meignier:
Iterative PLDA Adaptation for Speaker Diarization. 2175-2179 - Harishchandra Dubey, Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen:
A Speaker Diarization System for Studying Peer-Led Team Learning Groups. 2180-2184 - Rosanna Milner, Thomas Hain:
DNN-Based Speaker Clustering for Speaker Diarisation. 2185-2189 - Itshak Lapidot, Jean-François Bonastre:
On the Importance of Efficient Transition Modeling for Speaker Diarization. 2190-2193 - Gregory Sell, Alan McCree, Daniel Garcia-Romero:
Priors for Speaker Counting and Diarization with AHC. 2194-2198 - Nauman Dawalatabad, Srikanth R. Madikeri, C. Chandra Sekhar, Hema A. Murthy:
Two-Pass IB Based Speaker Diarization System Using Meeting-Specific ANN Based Features. 2199-2203 - Zeyan Oo, Yuta Kawakami, Longbiao Wang, Seiichi Nakagawa, Xiong Xiao, Masahiro Iwahashi:
DNN-Based Amplitude and Phase Feature Enhancement for Noise Robust Speaker Identification. 2204-2208 - Ulrich Scherhag, Andreas Nautsch, Christian Rathgeb, Christoph Busch:
Unit-Selection Attack Detection Based on Unfiltered Frequency-Domain Features. 2209-2213 - Mairym Lloréns Monteserín, Jason D. Zevin:
Investigating the Impact of Dialect Prestige on Lexical Decision. 2214-2218 - Jinxi Guo, Gary Yeung, Deepak Muralidharan, Harish Arsikere, Amber Afshan, Abeer Alwan:
Speaker Verification Using Short Utterances with DNN-Based Estimation of Subglottal Acoustic Features. 2219-2222 - Hang Su, Steven Wegmann:
Factor Analysis Based Speaker Verification Using ASR. 2223-2227 - Jeroen Zegers, Hugo Van hamme:
Joint Sound Source Separation and Speaker Recognition. 2228-2232 - Naveen Kumar, Md. Nasir, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Robust Multichannel Gender Classification from Speech in Movie Audio. 2233-2237
Speech Synthesis Poster
- Xavi Gonzalvo, Siamak Tazari, Chun-an Chan, Markus Becker, Alexander Gutkin, Hanna Silén:
Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer. 2238-2242 - Wenfu Wang, Shuang Xu, Bo Xu:
First Step Towards End-to-End Parametric TTS Synthesis: Generating Spectral Parameters with Neural Attention. 2243-2247 - Zhengqi Wen, Ya Li, Jianhua Tao:
The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network Based Speech Synthesis. 2248-2252 - Eunwoo Song, Frank K. Soong, Hong-Goo Kang:
Improved Time-Frequency Trajectory Excitation Vocoder for DNN-Based Speech Synthesis. 2253-2257 - Yamato Ohtani, Koichiro Mori, Masahiro Morita:
Voice Quality Control Using Perceptual Expressions for Statistical Parametric Speech Synthesis Based on Cluster Adaptive Training. 2258-2262 - Felipe Espic, Cassia Valentini-Botinhao, Zhizheng Wu, Simon King:
Waveform Generation Based on Signal Reshaping for Statistical Parametric Speech Synthesis. 2263-2267 - Yi Zhao, Daisuke Saito, Nobuaki Minematsu:
Speaker Representations for Speaker Adaptation in Multiple Speakers' BLSTM-RNN-Based Speech Synthesis. 2268-2272 - Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, Przemyslaw Szczepaniak:
Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices. 2273-2277 - Nobukatsu Hojo, Yusuke Ijima, Hideyuki Mizuno:
An Investigation of DNN-Based Speech Synthesis Using Speaker Codes. 2278-2282 - Lauri Juvela, Xin Wang, Shinji Takaki, Manu Airaksinen, Junichi Yamagishi, Paavo Alku:
Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks. 2283-2287 - Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai:
Model Integration for HMM- and DNN-Based Speech Synthesis Using Product-of-Experts Framework. 2288-2292 - Blaise Potard, Matthew P. Aylett, David A. Baude, Petr Motlícek:
Idlak Tangle: An Open Source Kaldi Based Parametric Speech Synthesiser Based on DNN. 2293-2297 - Alexandros Lazaridis, Milos Cernak, Philip N. Garner:
Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody. 2298-2302 - Chen-Yu Chiang:
On Smoothing and Enhancing Dynamics of Pitch Contours Represented by Discrete Orthogonal Polynomials for Prosody Generation. 2303-2307 - Anandaswarup Vadapalli, Suryakanth V. Gangashetty:
An Investigation of Recurrent Neural Network Architectures Using Word Embeddings for Phrase Break Prediction. 2308-2312 - Hao Liu, Heng Lu, Xu Shao, Yi Xu:
Model-Based Parametric Prosody Synthesis with Deep Neural Network. 2313-2317
Language Model Adaptation
- Thomas Drugman, Janne Pylkkönen, Reinhard Kneser:
Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models. 2318-2322 - Vitaly Kuznetsov, Hank Liao, Mehryar Mohri, Michael Riley, Brian Roark:
Learning N-Gram Language Models from Uncertain Data. 2323-2327 - Barlas Oguz, Issac Alphonso, Shuangyu Chang:
Entropy Based Pruning for Non-Negative Matrix Based Language Models with Contextual Features. 2328-2332 - Siva Reddy Gangireddy, Pawel Swietojanski, Peter Bell, Steve Renals:
Unsupervised Adaptation of Recurrent Neural Network Language Models. 2333-2337 - Yoni Halpern, Keith B. Hall, Vlad Schogol, Michael Riley, Brian Roark, Gleb Skobeltsyn, Martin Bäuml:
Contextual Prediction Models for Speech Recognition. 2338-2342 - Salil Deena, Madina Hasan, Mortaza Doulaty, Oscar Saz, Thomas Hain:
Combining Feature and Model-Based Adaptation of RNNLMs for Multi-Genre Broadcast Speech Recognition. 2343-2347
Show & Tell Session 6
- Michael C. Brady:
A Low Cost Desktop Robot and Tele-Presence Device for Interactive Speech Research. 2348-2349 - Simon Stone, Peter Birkholz:
Silent-Speech Command Word Recognition Using Electro-Optical Stomatography. 2350-2351 - Petr Stanislav, Jan Svec, Pavel Ircing:
An Engine for Online Video Search in Large Archives of the Holocaust Testimonies. 2352-2353 - Piero Cosi, Giulio Paci, Giacomo Sommavilla, Fabio Tesser:
MIVOQ-PTTS - A Revolutionary New Way of Thinking TTS. 3888-3889
Robustness in Speech Processing
- Katerina Zmolíková, Martin Karafiát, Karel Veselý, Marc Delcroix, Shinji Watanabe, Lukás Burget, Jan Cernocký:
Data Selection by Sequence Summarizing Neural Network in Mismatch Condition Training. 2354-2358 - Souvik Kundu, Khe Chai Sim, Mark J. F. Gales:
Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition. 2359-2363 - Konstantin Markov, Tomoko Matsui:
Robust Speech Recognition Using Generalized Distillation Framework. 2364-2368 - Yusuke Shinohara:
Adversarial Multi-Task Learning of Deep Neural Networks for Robust Speech Recognition. 2369-2372 - Víctor Poblete, Juan Pablo Escudero, Josué Fredes, José Novoa, Richard M. Stern, Simon King, Néstor Becerra Yoma:
The Use of Locally Normalized Cepstral Coefficients (LNCC) to Improve Speaker Recognition Accuracy in Highly Reverberant Rooms. 2373-2377 - William Hartmann, Tim Ng, Roger Hsiao, Stavros Tsakalidis, Richard M. Schwartz:
Two-Stage Data Augmentation for Low-Resourced Speech Recognition. 2378-2382
Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception, Sincerity & Native Language
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini:
The Native Language Sub-Challenge: The Data. - Avni Rajpal, Tanvina B. Patel, Hardik B. Sailor, Maulik C. Madhavi, Hemant A. Patil, Hiroya Fujisaki:
Native Language Identification Using Spectral and Source-Based Features. 2383-2387 - Yishan Jiao, Ming Tu, Visar Berisha, Julie M. Liss:
Accent Identification by Combining Deep Neural Networks and Recurrent Neural Networks Trained on Long and Short Term Features. 2388-2392 - Gil Keren, Jun Deng, Jouni Pohjalainen, Björn W. Schuller:
Convolutional Neural Networks with Data Augmentation for Classifying Speakers' Native Language. 2393-2397 - Mohammed Senoussaoui, Patrick Cardinal, Najim Dehak, Alessandro L. Koerich:
Native Language Detection Using the I-Vector Framework. 2398-2402 - Mark A. Huckvale:
Within-Speaker Features for Native Language Recognition in the Interspeech 2016 Computational Paralinguistics Challenge. 2403-2407 - Prashanth Gurunath Shivakumar, Sandeep Nallan Chakravarthula, Panayiotis G. Georgiou:
Multimodal Fusion of Multirate Acoustic, Prosodic, and Lexical Speaker Characteristics for Native Language Identification. 2408-2412 - Alberto Abad, Eugénio Ribeiro, Fábio N. Kepler, Ramón Fernandez Astudillo, Isabel Trancoso:
Exploiting Phone Log-Likelihood Ratio Features for the Detection of the Native Language of Non-Native English Speakers. 2413-2417 - Gábor Gosztolya, Tamás Grósz, Róbert Busa-Fekete, László Tóth:
Determining Native Language and Deception Using Phonetic Features and Classifier Combination. 2418-2422 - Björn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini:
The INTERSPEECH 2016 Computational Paralinguistics Challenge: A Summary of Results. - Björn W. Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini:
Discussion.
Acoustic and Articulatory Phonetics
- Marija Tabain, Richard Beare:
A Preliminary Ultrasound Study of Nasal and Lateral Coronals in Arrernte. 2423-2427 - Asterios Toutios, Sajan Goud Lingala, Colin Vaz, Jangwon Kim, John H. Esling, Patricia A. Keating, Matthew Gordon, Dani Byrd, Louis Goldstein, Krishna S. Nayak, Shrikanth S. Narayanan:
Illustrating the Production of the International Phonetic Alphabet Sounds Using Fast Real-Time Magnetic Resonance Imaging. 2428-2432 - Margaret E. L. Renwick, Ioana Vasilescu, Camille Dutrey, Lori Lamel, Bianca Vieru:
Marginal Contrast Among Romanian Vowels: Evidence from ASR and Functional Load. 2433-2437 - Shuanglin Fan, Kiyoshi Honda, Jianwu Dang, Hui Feng:
Effects of Subglottal-Coupling and Interdental-Space on Formant Trajectories During Front-to-Back Vowel Transitions in Chinese. 2438-2442 - Mairym Lloréns Monteserín, Shrikanth S. Narayanan, Louis Goldstein:
Perceptual Lateralization of Coda Rhotic Production in Puerto Rican Spanish. 2443-2447 - Hao Yi, Sam Tilsen:
Interaction Between Lexical Tone and Intonation: An EMA Study. 2448-2452
Speech Synthesis Oral I: Neural Networks
- Huaiping Ming, Dong-Yan Huang, Lei Xie, Jie Wu, Minghui Dong, Haizhou Li:
Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion. 2453-2457 - Ausdang Thangthai, Ben Milner, Sarah Taylor:
Visual Speech Synthesis Using Dynamic Visemes, Contextual Features and DNNs. 2458-2462 - Srikanth Ronanki, Gustav Eje Henter, Zhizheng Wu, Simon King:
A Template-Based Approach for Speech Synthesis Intonation Generation Using LSTMs. 2463-2467 - Bo Li, Heiga Zen:
Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis. 2468-2472 - Manu Airaksinen, Bajibabu Bollepalli, Lauri Juvela, Zhizheng Wu, Simon King, Paavo Alku:
GlottDNN - A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis. 2473-2477 - Masanari Nishimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
Singing Voice Synthesis Based on Deep Neural Networks. 2478-2482
Speech Quality & Intelligibility
- Tom Bäckström, Florin Ghido, Johannes Fischer:
Blind Recovery of Perceptual Models in Distributed Speech and Audio Coding. 2483-2487 - Yan Tang, Martin Cooke:
Glimpse-Based Metrics for Predicting Speech Intelligibility in Additive Noise Conditions. 2488-2492 - Friedemann Köster, Sebastian Möller:
Analyzing the Relation Between Overall Quality and the Quality of Individual Phases in a Telephone Conversation. 2493-2497 - Emma Jokinen, Paavo Alku:
Intelligibility Enhancement at the Receiving End of the Speech Transmission System - Effects of Far-End Noise Reduction. 2498-2502 - Mario Ganzeboom, Marjoke Bakker, Catia Cucchiarini, Helmer Strik:
Intelligibility of Disordered Speech: Global and Detailed Scores. 2503-2507 - Maria Koutsogiannaki, Yannis Stylianou:
Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility in Noise. 2508-2512
Speech Translation and Metadata for Linguistic/Discourse Structure
- Jan Niehues, Thai Son Nguyen, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Müller, Matthias Sperber, Sebastian Stüker, Alex Waibel:
Dynamic Transcription for Low-Latency Speech Translation. 2513-2517 - Oliver Adams, Graham Neubig, Trevor Cohn, Steven Bird:
Learning a Translation Model from Word Lattices. 2518-2522 - Vicky Zayats, Mari Ostendorf, Hannaneh Hajishirzi:
Disfluency Detection Using a Bidirectional LSTM. 2523-2527 - Xiaoyin Che, Sheng Luo, Haojin Yang, Christoph Meinel:
Sentence Boundary Detection Based on Parallel Lexical and Acoustic Models. 2528-2532 - Quoc Truong Do, Sakriani Sakti, Graham Neubig, Satoshi Nakamura:
Transferring Emphasis in Speech Translation Using Hard-Attentional Neural Network Models. 2533-2537 - Ngoc-Tien Le, Christophe Servan, Benjamin Lecouteux, Laurent Besacier:
Better Evaluation of ASR in Speech Translation Context Using Word Embeddings. 2538-2542
Speech Coding and Audio Processing for Noise Reduction
- Srikanth Korse, Tobias Jähnel, Tom Bäckström:
Entropy Coding of Spectral Envelopes for Speech and Audio Coding Using Distribution Quantization. 2543-2547 - Stéphane Villette, Sen Li, Pravin Ramadas, Daniel J. Sinder:
An Objective Evaluation Methodology for Blind Bandwidth Extension. 2548-2552 - Anssi Rämö, Antti Kurittu, Henri Toukomaa:
EVS Channel Aware Mode Robustness to Frame Erasures. 2553-2557 - Shadi Pirhosseinloo, Kostas Kokkinakis:
An Interaural Magnification Algorithm for Enhancement of Naturally-Occurring Level Differences. 2558-2561 - Hendrik Kayser, Niko Moritz, Jörn Anemüller:
Probabilistic Spatial Filter Estimation for Signal Enhancement in Multi-Channel Automatic Speech Recognition. 2562-2566 - Youna Ji, Young-Cheol Park:
Improved a priori SAP Estimator in Complex Noisy Environment for Dual Channel Microphone System. 2567-2571 - Kah-Meng Cheong, Yuh-Yuan Wang, Tai-Shih Chi:
A Spectral Modulation Sensitivity Weighted Pre-Emphasis Filter for Active Noise Control System. 2572-2576 - Ganji Sreeram, Rohit Sinha:
Semi-Coupled Dictionary Based Automatic Bandwidth Extension Approach for Enhancing Children's ASR. 2577-2581
Special Session: Speech, Audio, and Language Processing Techniques Applied to Bird and Animal Vocalizations
- Jordi Bonada, Robert Lachlan, Merlijn Blaauw:
Bird Song Synthesis Based on Hidden Markov Models. 2582-2586 - Kantapon Kaewtip, Charles E. Taylor, Abeer Alwan:
Noise-Robust Hidden Markov Models for Limited Training Data for Within-Species Bird Phrase Classification. 2587-2591 - Alan Wisler, Laura J. Brattain, Rogier Landman, Thomas F. Quatieri:
A Framework for Automated Marmoset Vocalization Detection and Classification. 2592-2596 - Ikkyu Aihara, Takeshi Mizumoto, Hiromitsu Awano, Hiroshi G. Okuno:
Call Alternation Between Specific Pairs of Male Frogs Revealed by a Sound-Imaging Method in Their Natural Habitat. 2597-2601 - Patrice Guyot, Alice Eldridge, Ying Chen Eyre-Walker, Alison Johnston, Thomas Pellegrini, Mika Peck:
Sinusoidal Modelling for Ecoacoustics. 2602-2606 - Dan Stowell, Veronica Morfi, Lisa F. Gill:
Individual Identity in Songbirds: Signal Representations and Metric Learning for Locating the Information in Complex Corvid Calls. 2607-2611 - Peter Jancovic, Münevver Köküer:
Recognition of Multiple Bird Species Based on Penalised Maximum Likelihood and HMM-Based Modelling of Individual Vocalisation Elements. 2612-2616 - Ciira Wa Maina:
Cost Effective Acoustic Monitoring of Bird Species. 2617-2620 - Daniel Kohlsdorf, Denise Herzing, Thad Starner:
Feature Learning and Automatic Segmentation for Dolphin Communication Analysis. 2621-2625 - Reiji Suzuki, Shiho Matsubayashi, Kazuhiro Nakadai, Hiroshi G. Okuno:
Localizing Bird Songs Using an Open Source Robot Audition System with a Microphone Array. 2626-2630 - Frank Kurth:
Robust Detection of Multiple Bioacoustic Events with Repetitive Structures. 2631-2635 - Roger K. Moore:
A Real-Time Parametric General-Purpose Mammalian Vocal Synthesiser. 2636-2640 - Colm O'Reilly, Nicola M. Marples, David J. Kelly, Naomi Harte:
YIN-Bird: Improved Pitch Tracking for Bird Vocalisations. 2641-2645
Learning, Education and Different Speech
- Yao-Chi Hsu, Ming-Han Yang, Hsiao-Tsung Hung, Berlin Chen:
Mispronunciation Detection Leveraging Maximum Performance Criterion Training of Acoustic Models and Decision Functions. 2646-2650 - Peter A. Heeman, Rebecca Lunsford, Andy McMillin, J. Scott Yaruss:
Using Clinician Annotations to Improve Automatic Speech Recognition of Stuttered Speech. 2651-2655 - Simin Xie, Nan Yan, Ping Yu, Manwa L. Ng, Lan Wang, Zhuanzhuan Ji:
Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale. 2656-2660 - Lauren Ward, Alessandro Stefani, Daniel V. Smith, Andreas Duenser, Jill Freyne, Barbara Dodd, Angela Morgan:
Automated Screening of Speech Development Issues in Children by Identifying Phonological Error Patterns. 2661-2665 - Ju Lin, Yanlu Xie, Jinsong Zhang:
Automatic Pronunciation Evaluation of Non-Native Mandarin Tone by Using Multi-Level Confidence Measures. 2666-2670 - Myung Jong Kim, Jun Wang, Hoirin Kim:
Dysarthric Speech Recognition Using Kullback-Leibler Divergence-Based Hidden Markov Model. 2671-2675 - Anne S. Warlaumont, Heather L. Ramsdell-Hudock:
Detection of Total Syllables and Canonical Syllables in Infant Vocalizations. 2676-2680 - Duc Le, Emily Mower Provost:
Improving Automatic Recognition of Aphasic Speech with AphasiaBank. 2681-2685 - Vincent Laborde, Thomas Pellegrini, Lionel Fontan, Julie Mauclair, Halima Sahraoui, Jérôme Farinas:
Pronunciation Assessment of Japanese Learners of French with GOP Scores and Phonetic Information. 2686-2690 - Sean Robertson, Cosmin Munteanu, Gerald Penn:
Pronunciation Error Detection for New Language Learners. 2691-2695 - Hongwei Ding, Xinping Xu:
L2 English Rhythm in Read Speech by Chinese Students. 2696-2700
Dialogue Systems and Analysis of Dialogue
- Miao Li, Zhipeng Chen, Ji Wu:
Improving the Probabilistic Framework for Representing Dialogue Systems with User Response Model. 2701-2705 - Yiping Song, Lili Mou, Rui Yan, Li Yi, Zinan Zhu, Xiaohua Hu, Ming Zhang:
Dialogue Session Segmentation by Embedding-Enhanced TextTiling. 2706-2710 - Miao Li, Zhiyang He, Ji Wu:
Target-Based State and Tracking Algorithm for Spoken Dialogue System. 2711-2715 - Sheng-syun Shen, Hung-yi Lee:
Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection. 2716-2720 - Manoj Kumar, Rahul Gupta, Daniel Bone, Nikolaos Malandrakis, Somer Bishop, Shrikanth S. Narayanan:
Objective Language Feature Analysis in Children with Neurodevelopmental Disorders During Autism Assessment. 2721-2725 - Iñigo Casanueva, Thomas Hain, Phil D. Green:
Improving Generalisation to New Speakers in Spoken Dialogue State Tracking. 2726-2730 - Bo-Hsiang Tseng, Sheng-syun Shen, Hung-yi Lee, Lin-Shan Lee:
Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by Machine. 2731-2735
Topics in Speech Recognition
- Suman V. Ravuri, Steven Wegmann:
How Neural Network Depth Compensates for HMM Conditional Independence Assumptions in DNN-HMM Acoustic Models. 2736-2740 - Dimitri Palaz, Gabriel Synnaeve, Ronan Collobert:
Jointly Learning to Locate and Classify Words Using Convolutional Networks. 2741-2745 - Raziel Alvarez, Rohit Prabhavalkar, Anton Bakhtin:
On the Efficient Representation and Execution of Deep Acoustic Models. 2746-2750 - Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur:
Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI. 2751-2755 - Martin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf:
Virtual Adversarial Training Applied to Neural Higher-Order Factors for Phone Classification. 2756-2760 - Jeremy Heng Meng Wong, Mark J. F. Gales:
Sequence Student-Teacher Training of Deep Neural Networks. 2761-2765
Special Session: Realism in Robust Speech Processing
- John H. L. Hansen, Hynek Boril:
Robustness in Speech, Speaker, and Language Recognition: "You've Got to Know Your Limitations". 2766-2770 - Emma Jokinen, Ulpu Remes, Paavo Alku:
The Use of Read versus Conversational Lombard Speech in Spectral Tilt Modeling for Intelligibility Enhancement in Near-End Noise Conditions. 2771-2775 - Douglas E. Sturim, Pedro A. Torres-Carrasquillo, Joseph P. Campbell:
Corpora for the Evaluation of Robust Speaker Recognition Systems. 2776-2780 - Nancy Bertin, Ewen Camberlein, Emmanuel Vincent, Romain Lebarbenchon, Stéphane Peillon, Éric Lamande, Sunit Sivasankaran, Frédéric Bimbot, Irina Illina, Ariane Tom, Sylvain Fleury, Éric Jamet:
A French Corpus for Distant-Microphone Speech Processing in Real Homes. 2781-2785 - Mirco Ravanelli
, Piergiorgio Svaizer
, Maurizio Omologo
:
Realistic Multi-Microphone Data Simulation for Distant Speech Recognition. 2786-2790 - Hannes Gamper, Mark R. P. Thomas, Lyle Corbin, Ivan Tashev:
Synthesis of Device-Independent Noise Corpora for Realistic ASR Evaluation. 2791-2795 - Fred Richardson, Michael S. Brandstein, Jennifer Melot, Douglas A. Reynolds:
Speaker Recognition Using Real vs Synthetic Parallel Data for DNN Channel Compensation. 2796-2800 - Dayana Ribas, Emmanuel Vincent, John H. L. Hansen, Emma Jokinen, Mirco Ravanelli, Hannes Gamper, Fred Richardson:
Discussion.
Spoken Word Recognition
- Louis ten Bosch, Lou Boves, Mirjam Ernestus:
Combining Data-Oriented and Process-Oriented Approaches to Modeling Reaction Time Data. 2801-2805 - Michael McAuliffe, Molly Babel, Charlotte Vaughn:
Do Listeners Learn Better from Natural Speech? 2806-2810 - Polina Drozdova, Roeland van Hout, Odette Scharenborg:
Processing and Adaptation to Ambiguous Sounds during the Course of Perceptual Learning. 2811-2815 - Florian Hintz, Odette Scharenborg:
The Effect of Background Noise on the Activation of Phonological and Semantic Information During Spoken-Word Recognition. 2816-2820 - Shinae Kang, Clara Cohen:
Relationships Between Functional Load and Auditory Confusability Under Different Speech Environments. 2821-2825 - Jasmeen Kanwal, Amanda Ritchart:
The Role of Pitch in Punjabi Word Identification. 2826-2830
Speech Synthesis Oral: High Level Linguistic Features
- Marie Tahon, Raheel Qader, Gwénolé Lecorvé, Damien Lolive:
Improving TTS with Corpus-Specific Pronunciation Adaptation. 2831-2835 - Amr El-Desoky Mousa, Björn W. Schuller:
Deep Bidirectional Long Short-Term Memory Recurrent Neural Networks for Grapheme-to-Phoneme Conversion Utilizing Complex Many-to-Many Alignments. 2836-2840 - Daan van Esch, Mason Chua, Kanishka Rao:
Predicting Pronunciations with Syllabification and Stress with Recurrent Neural Networks. 2841-2845 - Maël Pouget, Olha Nahorna, Thomas Hueber, Gérard Bailly:
Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech Synthesis. 2846-2850 - Rasmus Dall, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
Redefining the Linguistic Context Feature Set for HMM and DNN TTS Through Position and Parsing. 2851-2855 - Xin Wang, Shinji Takaki, Junichi Yamagishi:
Enhance the Word Vector with Prosodic Information for the Recurrent Neural Network Based TTS System. 2856-2860
Speech Enhancement
- Kwang Myung Jeon, Hong Kook Kim:
Local Sparsity Based Online Dictionary Learning for Environment-Adaptive Speech Enhancement with Nonnegative Matrix Factorization. 2861-2865 - Pavlos Papadopoulos, Colin Vaz, Shrikanth S. Narayanan:
Noise Aware and Combined Noise Models for Speech Denoising in Unknown Noise Conditions. 2866-2869 - Seyedmahdad Mirsamadi, Ivan Tashev:
Causal Speech Enhancement Combining Data-Driven Learning and Suppression Rule Estimation. 2870-2874 - Alessio Brutti, Antigoni Tsiami, Athanasios Katsamanis, Petros Maragos:
A Phase-Based Time-Frequency Masking for Multi-Channel Speech Enhancement in Domestic Environments. 2875-2879 - Petko Nikolov Petkov, Yannis Stylianou:
Generalizing Steady State Suppression for Enhanced Intelligibility Under Reverberation. 2880-2884 - Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani:
Speech Intelligibility Prediction Based on the Envelope Power Spectrum Model with the Dynamic Compressive Gammachirp Auditory Filterbank. 2885-2889
Dialogue: Backchannels and Turntaking
- Tatsuya Kawahara, Takashi Yamaguchi, Koji Inoue, Katsuya Takanashi, Nigel G. Ward:
Prediction and Generation of Backchannel Form for Attentive Listening Systems. 2890-2894 - Rebecca Lunsford, Peter A. Heeman, Emma Rennie:
Measuring Turn-Taking Offsets in Human-Human Dialogues. 2895-2899 - Tomer Meshorer, Peter A. Heeman:
Using Past Speaker Behavior to Better Predict Turn Transitions. 2900-2904 - Gérard Bailly, Frédéric Elisei, Alexandra Juphard, Olivier Moreaud:
Quantitative Analysis of Backchannels Uttered by an Interviewer During Neuropsychological Tests. 2905-2909 - Shammur Absar Chowdhury, Evgeny A. Stepanov, Giuseppe Riccardi:
Predicting User Satisfaction from Turn-Taking in Spoken Conversations. 2910-2914 - Catharine Oertel, Joakim Gustafson, Alan W. Black:
Towards Building an Attentive Artificial Listener: On the Perception of Attentiveness in Feedback Utterances. 2915-2919
Language Recognition
- Youngjune L. Gwon, William M. Campbell, Douglas E. Sturim, H. T. Kung:
Language Recognition via Sparse Coding. 2920-2924 - Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah:
A Feature Normalisation Technique for PLLR Based Language Identification Systems. 2925-2929 - Mounika K. V., Sivanand Achanta, Lakshmi H. R., Suryakanth V. Gangashetty, Anil Kumar Vuppala:
An Investigation of Deep Neural Network Architectures for Language Recognition in Indian Languages. 2930-2933 - Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James R. Glass, Peter Bell, Steve Renals:
Automatic Dialect Detection in Arabic Broadcast Speech. 2934-2938 - Raymond W. M. Ng, Bhusan Chettri, Thomas Hain:
Combining Weak Tokenisers for Phonotactic Language Recognition in a Resource-Constrained Setting. 2939-2943 - Wang Geng, Wenfu Wang, Yuanyuan Zhao, Xinyuan Cai, Bo Xu:
End-to-End Language Identification Using Attention-Based Recurrent Neural Networks. 2944-2948 - Hesam Sagha, Pavel Matejka, Maryna Gavryukova, Filip Povolný, Erik Marchi, Björn W. Schuller:
Enhancing Multilingual Recognition of Emotion in Speech by Language Identification. 2949-2953
Speech and Audio Segmentation and Classification
- Seongkyu Mun, Suwon Shon, Wooil Kim, Hanseok Ko:
Deep Neural Network Bottleneck Features for Acoustic Event Recognition. 2954-2957 - Antonio Origlia, Francesco Cutugno:
Combining Energy and Cross-Entropy Analysis for Nuclear Segments Detection. 2958-2962 - Roland Maas, Sree Hari Krishnan Parthasarathi, Brian John King, Ruitong Huang, Björn Hoffmeister:
Anchored Speech Detection. 2963-2967 - Mahesh Kumar Nandwana, Taufiq Hasan:
Towards Smart-Cars That Can Listen: Abnormal Acoustic Event Detection on the Road. 2968-2971 - K. V. Vijay Girish, A. G. Ramakrishnan, T. V. Ananthapadmanabha:
Hierarchical Classification of Speaker and Background Noise and Estimation of SNR Using Sparse Representation. 2972-2976 - Haomin Zhang, Ian McLoughlin, Yan Song:
Robust Sound Event Detection in Continuous Audio Environments. 2977-2981 - Naoya Takahashi, Michael Gygli, Beat Pfister, Luc Van Gool:
Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Recognition. 2982-2986 - Stefan Meier, Walter Kellermann:
Artificial Neural Network-Based Feature Combination for Spatial Voice Activity Detection. 2987-2991 - Tomi Kinnunen, Alexey Sholokhov, Elie Khoury, Dennis Alexander Lehmann Thomsen, Md. Sahidullah, Zheng-Hua Tan:
HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors. 2992-2996 - Florian B. Pokorny, Robert Peharz, Wolfgang Roth, Matthias Zöhrer, Franz Pernkopf, Peter B. Marschik, Björn W. Schuller:
Manual versus Automated: The Challenging Routine of Infant Vocalisation Segmentation in Home Videos to Study Neuro(mal)development. 2997-3001 - Luciana Ferrer, Martin Graciarena:
Minimizing Annotation Effort for Adaptation of Speech-Activity Detection Systems. 3002-3006
New Products and Services
- Roger K. Moore, Hui Li, Shih-Hao Liao:
Progress and Prospects for Spoken Language Technology: What Ordinary People Think. 3007-3011 - Roger K. Moore, Ricard Marxer:
Progress and Prospects for Spoken Language Technology: Results from Four Sexennial Surveys. 3012-3016 - Purushotam G. Radadia, Rahul Kumar, Kanika Kalra, Shirish Karande, Sachin Lodha:
On Employing a Highly Mismatched Crowd for Speech Transcription. 3017-3021 - Roger Hsiao, Ralf Meermeier, Tim Ng, Zhongqiang Huang, Maxwell Jordan, Enoch Kan, Tanel Alumäe, Jan Silovský, William Hartmann, Francis Keith, Omer Lang, Man-Hung Siu, Owen Kimball:
Sage: The New BBN Speech Processing Platform. 3022-3026 - Kang Hyun Lee, Tae Gyoon Kang, Woo Hyun Kang, Nam Soo Kim:
DNN-Based Feature Enhancement Using Joint Training Framework for Robust Multichannel Speech Recognition. 3027-3031 - Michael Wand, Jürgen Schmidhuber:
Deep Neural Network Frontend for Continuous EMG-Based Speech Recognition. 3032-3036 - Basil Abraham, Srinivasan Umesh, Neethu Mariam Joy:
Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages. 3037-3041 - Anton Ragni, Edgar Dakin, Xie Chen, Mark J. F. Gales, Kate M. Knill:
Multi-Language Neural Network Language Models. 3042-3046 - Ottokar Tilk, Tanel Alumäe:
Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration. 3047-3051