INTERSPEECH 2011: Florence, Italy
- 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011, Florence, Italy, August 27-31, 2011. ISCA 2011
Keynote Sessions
Keynote 1
- Julia Hirschberg:
Speaking More Like You: Entrainment in Conversational Speech. 4001
Keynote 2
- Tom M. Mitchell:
Neural Representations of Word Meanings. 4002
Keynote 3
- Alex Pentland:
Signals and Speech. 1-4
Keynote 4: Roundtable - Future and Applications of Speech and Language Technologies for the Good Health of Society
- Gabriele Miceli:
Language Disorders: Viewpoints on a Complex Object.
- Björn Granström:
Speech Technology in (Re)Habilitation of Persons with Communication Disabilities.
- Hiroshi Ishiguro:
From Teleoperated Androids to Cellphones as Surrogates.
Regular Oral Sessions
Speaker Recognition - Modeling
- Avi Matza:
Skew Gaussian Mixture Models for Speaker Recognition. 5-8
- Orith Toledo-Ronen, Hagai Aronowitz, Ron Hoory, Jason W. Pelecanos, David Nahamoo:
Towards Goat Detection in Text-Dependent Speaker Verification. 9-12
- Jean-François Bonastre, Xavier Anguera Miró, Gabriel Hernández Sierra, Pierre-Michel Bousquet:
Speaker Modeling Using Local Binary Decisions. 13-16
- Hagai Aronowitz, Ron Hoory, Jason W. Pelecanos, David Nahamoo:
New Developments in Voice Biometrics for User Authentication. 17-20
- Miranti Indar Mandasari, Mitchell McLaren, David A. van Leeuwen:
Evaluation of i-vector Speaker Recognition Systems for Forensic Application. 21-24
- Mohammed Senoussaoui, Patrick Kenny, Niko Brümmer, Edward de Villiers, Pierre Dumouchel:
Mixture of PLDA Models in i-vector Space for Gender-Independent Speaker Recognition. 25-28
Speech Perception - Speech Intelligibility
- Nandini Iyer, Douglas Brungart, Brian D. Simpson:
Segregation of Whispered Speech Interleaved with Noise or Speech Maskers. 29-32
- Roi Kliper, Hendrik Kayser, Daphna Weinshall, Israel Nelken, Jörn Anemüller:
Monaural Azimuth Localization Using Spectral Dynamics of Speech. 33-36
- Jan Rennies, Thomas Brand, Birger Kollmeier:
Prediction of Binaural Intelligibility Level Differences in Reverberation. 37-40
- Aurore Gautreau, Michel Hoen, Fanny Meunier:
Let's All Speak Together! Exploring the Impact of Various Languages on the Comprehension of Speech in Multi-Linguistic Babble. 41-44
- Valeriy Shafiro, Stanley Sheft, Robert Risley:
Cross-Rate Variation in the Intelligibility of Dual-Rate Gated Speech in Older Listeners. 45-48
- Chia-ying Lee, James R. Glass, Oded Ghitza:
An Efferent-Inspired Auditory Model Front-End for Speech Recognition. 49-52
Speech Representation and Modelling
- Faten Ben Ali, Laurent Girin, Sonia Djaziri Larbi:
A Long-Term Harmonic Plus Noise Model for Speech Signals. 53-56
- Alan Ó Cinnéide, David Dorran, Mikel Gainza, Eugene Coyle:
A Frequency Domain Approach to ARX-LF Voiced Speech Parameterization and Synthesis. 57-60
- Vikram Ramanarayanan, Athanasios Katsamanis, Shrikanth S. Narayanan:
Automatic Data-Driven Learning of Articulatory Primitives from Real-Time MRI Data Using Convolutive NMF with Sparseness Constraints. 61-64
- Dong Wang, Ravichander Vipperla, Nicholas W. D. Evans:
Online Pattern Learning for Non-Negative Convolutive Sparse Coding. 65-68
- Nicolas Malyska, Thomas F. Quatieri, Robert B. Dunn:
Sinewave Representations of Nonmodality. 69-72
- Ch. Srikanth Raj, Thippur V. Sreenivas:
Time-Varying Signal Adaptive Transform and IHT Recovery of Compressive Sensed Speech. 73-76
Emotion, Speaking Style, and Social Behavior
- Martin Wöllmer, Felix Weninger, Florian Eyben, Björn W. Schuller:
Acoustic-Linguistic Recognition of Interest in Speech with Bottleneck-BLSTM Nets. 77-80
- Mustafa Erden, Levent M. Arslan:
Automatic Detection of Anger in Human-Human Call Center Dialogs. 81-84
- Keng-hao Chang, Howard Lei, John F. Canny:
Improved Classification of Speaking Styles for Mental Health Monitoring Using Phoneme Dynamics. 85-88
- Matthew Black, Panayiotis G. Georgiou, Athanasios Katsamanis, Brian R. Baucom, Shrikanth S. Narayanan:
"You made me do it": Classification of Blame in Married Couples' Interactions by Fusing Automatically Derived Speech and Language Information. 89-92
- Martijn Goudbeek, Marie Nilsenová:
Context and Priming Effects in the Recognition of Emotion of Old and Young Listeners. 93-96
- Agustín Gravano, Rivka Levitan, Laura Willson, Stefan Benus, Julia Hirschberg, Ani Nenkova:
Acoustic and Prosodic Correlates of Social Behavior. 97-100
HMM-based Speech Synthesis I
- Kyung Hwan Oh, June Sig Sung, Doo Hwa Hong, Nam Soo Kim:
Decision Tree-Based Clustering with Outlier Detection for HMM-Based Speech Synthesis. 101-104
- Hanna Silén, Elina Helander, Moncef Gabbouj:
Prediction of Voice Aperiodicity Based on Spectral Representations in HMM Speech Synthesis. 105-108
- Takashi Nose, Takao Kobayashi:
A Perceptual Expressivity Modeling Technique for Speech Synthesis Based on Multiple-Regression HSMM. 109-112
- Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda:
Multi-Speaker Modeling with Shared Prior Distributions and Model Structures for Bayesian Speech Synthesis. 113-116
- Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi:
Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-Based Speech Synthesis. 117-120
- Matt Shannon, Heiga Zen, William J. Byrne:
The Effect of Using Normalized Models in Statistical Speech Synthesis. 121-124
Speaker Recognition - Modeling, Automatic Procedures, Analysis I
- Ce Zhang, Rong Zheng, Bo Xu:
Restoring the Residual Speaker Information in Total Variability Modeling for Speaker Verification. 125-128
- Hagai Aronowitz, Oren Barkan:
New Developments in Joint Factor Analysis for Speaker Verification. 129-132
- Joaquin Gonzalez-Rodriguez:
Speaker Recognition Using Temporal Contours in Linguistic Units: The Case of Formant and Formant-Bandwidth Trajectories. 133-136
- Ondrej Glembek, Lukás Burget, Niko Brümmer, Oldrich Plchot, Pavel Matejka:
Discriminatively Trained i-vector Extractor for Speaker Verification. 137-140
- Michelle Hewlett Sanchez, Luciana Ferrer, Elizabeth Shriberg, Andreas Stolcke:
Constrained Cepstral Speaker Recognition Using Matched UBM and JFA Training. 141-144
- Alan McCree, Douglas E. Sturim, Douglas A. Reynolds:
A New Perspective on GMM Subspace Compensation Based on PPCA and Wiener Filtering. 145-148
Speech Perception - Perceptual Learning and Cross-Language Perception
- Odette Scharenborg, Holger Mitterer, James M. McQueen:
Perceptual Learning of Liquids. 149-152
- Annelie Tuinman, Holger Mitterer, Anne Cutler:
The Efficiency of Cross-Dialectal Word Recognition. 153-156
- Minoru Tsuzaki, Keiichi Tokuda, Hisashi Kawai, Jinfu Ni:
Estimation of Perceptual Spaces for Speaker Identities Based on the Cross-Lingual Discrimination Task. 157-160
- Sharon Peperkamp, Camillia Bouchon:
The Relation Between Perception and Production in L2 Phonological Processing. 161-164
- Maria Paola Bissiri, María Luisa García Lecumberri, Martin Cooke, Jan Volín:
The Role of Word-Initial Glottal Stops in Recognizing English Words. 165-168
- Caicai Zhang, Gang Peng, William S.-Y. Wang:
Effect of Language Experience on the Categorical Perception of Cantonese Vowel Duration. 169-172
Speech Analysis
- Christian Fischer Pedersen, Ove Andersen, Paul Dalsgaard:
Adaptive Estimation of Zeros of Time-Varying Z-Transforms. 173-176
- John Kane, Christer Gobl:
Identifying Regions of Non-Modal Phonation Using Features of the Wavelet Transform. 177-180
- Xing Fan, Keith W. Godin, John H. L. Hansen:
Acoustic Analysis of Whispered Speech for Phoneme and Speaker Dependency. 181-184
- Afsaneh Asaei, Mohammad Javad Taghizadeh, Hervé Bourlard, Volkan Cevher:
Multi-Party Speech Recovery Exploiting Structured Sparsity Models. 185-188
- Sri Harish Reddy Mallidi, Sriram Ganapathy, Hynek Hermansky:
Modulation Spectrum Analysis for Recognition of Reverberant Speech. 189-192
- Petko Nikolov Petkov, W. Bastiaan Kleijn, Bert de Vries:
Discrete Choice Models for Non-Intrusive Quality Assessment. 193-196
Speech Enhancement and Dereverberation
- Keisuke Kinoshita, Mehrez Souden, Marc Delcroix, Tomohiro Nakatani:
Single Channel Dereverberation Using Example-Based Speech Enhancement with Uncertainty Decoding Technique. 197-200
- Jan S. Erkelens, Richard Heusdens:
A Statistical Room Impulse Response Model with Frequency Dependent Reverberation Time for Single-Microphone Late Reverberation Suppression. 201-204
- Chenxi Zheng, Tiago H. Falk, Wai-Yip Chan:
An Assessment of the Improvement Potential of Time-Frequency Masking for Speech Dereverberation. 205-208
- Thiago de M. Prego, Amaro A. de Lima, Sergio L. Netto:
Perceptual Improvement of a Two-Stage Algorithm for Speech Dereverberation. 209-212
- Najib Hadir, Friedrich Faubel, Dietrich Klakow:
A Model-Based Spectral Envelope Wiener Filter for Perceptually Motivated Speech Enhancement. 213-216
- Jorge I. Marin-Hurtado, Devangi N. Parikh, David V. Anderson:
Binaural Noise-Reduction Method Based on Blind Source Separation and Perceptual Post Processing. 217-220
ASR - Feature Extraction II
- Tim Ng, Bing Zhang, Spyridon Matsoukas, Long Nguyen:
Region Dependent Transform on MLP Features for Speech Recognition. 221-224
- Martin Heckmann, Claudius Gläser:
Discriminant Sub-Space Projection of Spectro-Temporal Speech Features Based on Maximizing Mutual Information. 225-228
- Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura:
Combining Feature Space Discriminative Training with Long-Term Spectro-Temporal Features for Noise-Robust Speech Recognition. 229-232
- Sumit Chopra, Patrick Haffner, Dimitrios Dimitriadis:
Combining Frame and Segment Level Processing via Temporal Pooling for Phonetic Classification. 233-236
- Dong Yu, Michael L. Seltzer:
Improved Bottleneck Features Using Pretrained Deep Neural Networks. 237-240
- Yuan-Fu Liao, Chia-Hsing Lin, We-Der Fang:
Minimum Classification Error Based Spectro-Temporal Feature Extraction for Robust Audio Classification. 241-244
Speaker Recognition - Modeling, Automatic Procedures, Analysis II
- Ce Zhang, Rong Zheng, Bo Xu:
Data-Driven Gaussian Component Selection for Fast GMM-Based Speaker Verification. 245-248
- Daniel Garcia-Romero, Carol Y. Espy-Wilson:
Analysis of i-vector Length Normalization in Speaker Recognition Systems. 249-252
- Weiwu Jiang, Zhifeng Li, Helen M. Meng:
An Analysis Framework Based on Random Subspace Sampling for Speaker Verification. 253-256
- Nicolas Scheffer, Yun Lei, Luciana Ferrer:
Factor Analysis Back Ends for MLLR Transforms in Speaker Recognition. 257-260
- Craig S. Greenberg, Alvin F. Martin, Bradford Barr, George R. Doddington:
Report on Performance Results in the NIST 2010 Speaker Recognition Evaluation. 261-264
- Marcel Kockmann, Luciana Ferrer, Lukás Burget, Jan Cernocký:
iVector Fusion of Prosodic and Cepstral Features for Speaker Verification. 265-268
Speech Production - Articulatory Measurements
- Yoon-Chul Kim, Michael I. Proctor, Shrikanth S. Narayanan, Krishna S. Nayak:
Visualization of Vocal Tract Shape Using Interleaved Real-Time MRI of Multiple Scan Planes. 269-272
- Ralf Winkler, Susanne Fuchs, Pascal Perrier, Mark Tiede:
Biomechanical Tongue Models: An Approach to Studying Inter-Speaker Variability. 273-276
- Jun Wang, Jordan R. Green, Ashok Samal, David Marx:
Quantifying Articulatory Distinctiveness of Vowels. 277-280
- Michael I. Proctor, Adam C. Lammert, Athanasios Katsamanis, Louis M. Goldstein, Christina Hagedorn, Shrikanth S. Narayanan:
Direct Estimation of Articulatory Kinematics from Real-Time Magnetic Resonance Image Sequences. 281-284
- Peter Birkholz, Christiane Neuschaefer-Rube:
Combined Optical Distance Sensing and Electropalatography to Measure Articulation. 285-288
- Santitham Prom-on, Yi Xu, Fang Liu:
Simulating Post-L F0 Bouncing by Modeling Articulatory Dynamics. 289-292
Acoustic Event Detection
- Jürgen T. Geiger, Mohamed Anouar Lakhal, Björn W. Schuller, Gerhard Rigoll:
Learning New Acoustic Events in an HMM-Based System Using MAP Adaptation. 293-296
- Yi Ren Leng, Tran Huy Dat, Norihide Kitaoka, Haizhou Li:
Alternative Frequency Scale Cepstral Coefficient for Robust Sound Event Recognition. 297-300
- Akinori Ito, Akihito Aiba, Masashi Ito, Shozo Makino:
Evaluation of Abnormal Sound Detection Using Multi-Stage GMM in Various Environments. 301-304
- Joerg Schmalenstroeer, Markus Bartek, Reinhold Haeb-Umbach:
Unsupervised Learning of Acoustic Events Using Dynamic Time Warping and Hierarchical K-Means++ Clustering. 305-308
- Pradeep Natarajan, Stavros Tsakalidis, Vasant Manohar, Rohit Prasad, Premkumar Natarajan:
Unsupervised Audio Analysis for Categorizing Heterogeneous Consumer Domain Videos. 313-316
Speech Synthesis - Unit Selection and Hybrid Approaches
- Vivek Kumar Rangarajan Sridhar, Ann K. Syrdal, Alistair Conkie, Srinivas Bangalore:
Enriching Text-to-Speech Synthesis Using Automatic Dialog Act Tags. 317-320
- Lukas Latacz, Wesley Mattheyses, Werner Verhelst:
Joint Target and Join Cost Weight Training for Unit Selection Synthesis. 321-324
- Andreas Windmann, Igor Jauk, Fabio Tamburini, Petra Wagner:
Prominence-Based Prosody Prediction for Unit Selection Speech Synthesis. 325-328
- Sathish Pammi, Marc Schröder:
Evaluating the Meaning of Synthesized Listener Vocalizations. 329-332
- Iñaki Sainz, Daniel Erro, Eva Navas, Inma Hernáez:
A Hybrid TTS Approach for Prosody and Acoustic Modules. 333-336
- Alexander Sorin, Slava Shechtman, Vincent Pollet:
Uniform Speech Parameterization for Multi-Form Segment Synthesis. 337-340
Speech Enhancement Analysis and Evaluation
- Ryoichi Miyazaki, Hiroshi Saruwatari, Kiyohiro Shikano:
Theoretical Analysis of Musical Noise and Speech Distortion in Structure-Generalized Parametric Blind Spatial Subtraction Array. 341-344
- Yan Tang, Martin Cooke:
Subjective and Objective Evaluation of Speech Intelligibility Enhancement Under Constant Energy and Duration Constraints. 345-348
- Nagarjuna Reddy Muraka, Chandra Sekhar Seelamantula:
A Risk-Estimation-Based Comparison of Mean Square Error and Itakura-Saito Distortion Measures for Speech Enhancement. 349-352
- Mahdi Triki:
On Noise Tracking for Noise Floor Estimation. 353-356
- Ben Milner:
Maximum a posteriori Estimation of Noise from Non-Acoustic Reference Signals in Very Low Signal-to-Noise Ratio Environments. 357-360
- Ryo Wakisaka, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani:
Blind Speech Prior Estimation for Generalized Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator. 361-364
Speaker Recognition - Analysis and Statistics I
- Kornel Laskowski, Qin Jin:
Harmonic Structure Transform for Speaker Recognition. 365-368
- Hemant A. Patil, Maulik C. Madhavi, Keshab K. Parhi:
Combining Evidence from Spectral and Source-Like Features for Person Recognition from Humming. 369-372
- Yanhua Long, Zhi-Jie Yan, Frank K. Soong, Li-Rong Dai, Wu Guo:
Improvements in Speaker Characterization Using Spectral Subband Energy Based on Harmonic plus Noise Model. 373-376
- Yosef A. Solewicz, Hagai Aronowitz:
Implicit Segmentation in Two-Wire Speaker Recognition. 377-380
- Sibel Yaman, Jason W. Pelecanos, Mohamed Kamal Omar:
Boosting Speaker Recognition Performance with Compact Representations. 381-384
- Carlos Vaquero, Alfonso Ortega, Eduardo Lleida:
Partitioning of Two-Speaker Conversation Datasets. 385-388
Speech Production - Coarticulation and Speech Timing
- Stefan Benus, Marianne Pouplier:
Jaw Movement in Vowels and Liquids Forming the Syllable Nucleus. 389-392
- Barbara Gili Fivela, Antonio Stella, Sonia D'Apolito, Francesco Sigona:
Coarticulation Across Prosodic Domains in Italian: An Ultrasound Investigation. 393-396
- Juraj Simko, Fred Cummins, Stefan Benus:
Investigating the Stability of Intergestural Timing Relations. 397-400
- Claudio Zmarich, Barbara Gili Fivela, Pascal Perrier, Christophe Savariaux, Graziano Tisato:
Speech Timing Organization for the Phonological Length Contrast in Italian Consonants. 401-404
- Chiara Celata, Silvia Calamai:
Timing in Italian VNC Sequences at Different Speech Rates. 405-408
- Christina Hagedorn, Michael I. Proctor, Louis Goldstein:
Automatic Analysis of Singleton and Geminate Consonant Articulation Using Real-Time Magnetic Resonance Imaging. 409-412
Speech Segmentation
- Yih-Ru Wang:
A Two-Stage Sample-Based Phone Boundary Detector Using Segmental Similarity Features. 413-416
- Qiang Huang, Stephen J. Cox:
Iterative Improvement of Speaker Segmentation in a Noisy Environment Using High-Level Knowledge. 417-420
- Diego Castán, Carlos Vaquero, Alfonso Ortega, David Martínez González, Jesús Antonio Villalba López, Eduardo Lleida:
Hierarchical Audio Segmentation with HMM and Factor Analysis in Broadcast News Domain. 421-424
- Ozlem Kalinli:
Syllable Segmentation of Continuous Speech Using Auditory Attention Cues. 425-428
- Vijayaditya Peddinti, Kishore Prahallad:
Exploiting Phone-Class Specific Landmarks for Refinement of Segment Boundaries in TTS Databases. 429-432
- Agnès Pedone, Juan José Burred, Simon Maller, Pierre Leveau:
Phoneme-Level Text to Audio Synchronization on Speech Signals with Background Music. 433-436
ASR - Acoustic Models II
- Frank Seide, Gang Li, Dong Yu:
Conversational Speech Transcription Using Context-Dependent Deep Neural Networks. 437-440
- Guangsen Wang, Khe Chai Sim:
Sequential Classification Criteria for NNs in Automatic Speech Recognition. 441-444
- Mathew Magimai-Doss, Ramya Rasipuram, Guillermo Aradilla, Hervé Bourlard:
Grapheme-Based Automatic Speech Recognition Using KL-HMM. 445-448
- Joseph Keshet, Chih-Chieh Cheng, Mark Stoehr, David A. McAllester:
Direct Error Rate Minimization of Hidden Markov Models. 449-452
- Xie Sun, Xin Chen, Yunxin Zhao:
On the Effectiveness of Statistical Modeling Based Template Matching Approach for Continuous Speech Recognition. 453-456
- Guangsen Wang, Khe Chai Sim:
Comparison of Smoothing Techniques for Robust Context Dependent Acoustic Modelling in Hybrid NN/HMM Systems. 457-460
Robust Speech Recognition II
- Ramón Fernandez Astudillo, João Paulo da Silva Neto:
Propagation of Uncertainty Through Multilayer Perceptrons for Robust Automatic Speech Recognition. 461-464
- Katariina Mahkonen, Antti Hurmalainen, Tuomas Virtanen, Jort F. Gemmeke:
Mapping Sparse Representation to State Likelihoods in Noise-Robust Automatic Speech Recognition. 465-468
- Heikki Kallasjoki, Ulpu Remes, Jort F. Gemmeke, Tuomas Virtanen, Kalle J. Palomäki:
Uncertainty Measures for Improving Exemplar-Based Source Separation. 469-472
- Hsien-Cheng Liao, Yuan-Fu Liao, Chin-Hui Lee:
Maximum Confidence Measure Based Interaural Phase Difference Estimation for Noise Masking in Dual-Microphone Robust Speech Recognition. 473-476
- Shirin Badiezadegan, Richard C. Rose:
A Performance Monitoring Approach to Fusing Enhanced Spectrogram Channels in Robust Speech Recognition. 477-480
- Ning Cheng, Xunying Liu, Lan Wang:
Generalized Variable Parameter HMMs for Noise Robust Speech Recognition. 481-484