


INTERSPEECH 2010: Makuhari, Japan
- Takao Kobayashi, Keikichi Hirose, Satoshi Nakamura: 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, Makuhari, Chiba, Japan, September 26-30, 2010. ISCA 2010
Keynotes
- Steve J. Young: Still talking to machines (cognitively speaking). 1-10
- Tohru Ifukube: Sound-based assistive technology supporting "seeing", "hearing" and "speaking" for the disabled and the elderly. 11-19
- Chiu-yu Tseng: Beyond sentence prosody. 20-29
Special Session: Models of Speech - In Search of Better Representations
- Hosung Nam, Vikramjit Mitra, Mark Tiede, Elliot Saltzman, Louis Goldstein, Carol Y. Espy-Wilson, Mark Hasegawa-Johnson: A procedure for estimating gestural scores from natural speech. 30-33
- Yen-Liang Shue, Gang Chen, Abeer Alwan: On the interdependencies between voice quality, glottal gaps, and voice-source related acoustic measures. 34-37
- Hideki Kawahara, Masanori Morise, Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino: Simplification and extension of non-periodic excitation source representations for high-quality speech manipulation systems. 38-41
- Sadao Hiroya, Takemi Mochida: Phase equalization-based autoregressive model of speech signals. 42-45
- Yi Xu, Santitham Prom-on: Articulatory-functional modeling of speech prosody: a review. 46-49
- Humberto M. Torres, Hansjörg Mixdorff, Jorge A. Gurlekian, Hartmut R. Pfitzinger: Two new estimation methods for a superpositional intonation model. 50-53
ASR: Acoustic Models I-III
- Simon Wiesler, Georg Heigold, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney: A discriminative splitting criterion for phonetic decision trees. 54-57
- Mark J. F. Gales, Kai Yu: Canonical state models for automatic speech recognition. 58-61
- Pierre L. Dognin, John R. Hershey, Vaibhava Goel, Peder A. Olsen: Restructuring exponential family mixture models. 62-65
- Françoise Beaufays, Vincent Vanhoucke, Brian Strope: Unsupervised discovery and training of maximally dissimilar cluster models. 66-69
- Khe Chai Sim: Probabilistic state clustering using conditional random field for context-dependent acoustic modelling. 70-73
- Xie Sun, Yunxin Zhao: Integrate template matching and statistical modeling for speech recognition. 74-77
- George Saon, Hagen Soltau: Boosting systems for LVCSR. 1341-1344
- Vaibhava Goel, Tara N. Sainath, Bhuvana Ramabhadran, Peder A. Olsen, David Nahamoo, Dimitri Kanevsky: Incorporating sparse representation phone identification features in automatic speech recognition using exponential families. 1345-1348
- Xin Chen, Yunxin Zhao: Integrating MLP features and discriminative training in data sampling based ensemble acoustic modeling. 1349-1352
- Jui-Ting Huang, Mark Hasegawa-Johnson: Semi-supervised training of Gaussian mixture models by conditional entropy minimization. 1353-1356
- Guangchuan Shi, Yu Shi, Qiang Huo: A study of irrelevant variability normalization based training and unsupervised online adaptation for LVCSR. 1357-1360
- Roger Hsiao, Florian Metze, Tanja Schultz: Improvements to generalized discriminative feature transformation for speech recognition. 1361-1364
- Karel Veselý, Lukás Burget, Frantisek Grézl: Parallel training of neural networks for speech recognition. 2934-2937
- Rita Singh, Benjamin Lambert, Bhiksha Raj: The use of sense in unsupervised training of acoustic models for ASR systems. 2938-2941
- Jun Du, Yu Hu, Hui Jiang: Boosted mixture learning of Gaussian mixture HMMs for speech recognition. 2942-2945
- Volker Leutnant, Reinhold Haeb-Umbach: On the exploitation of hidden Markov models and linear dynamic models in a hybrid decoder architecture for continuous speech recognition. 2946-2949
- Alberto Abad, Thomas Pellegrini, Isabel Trancoso, João Paulo Neto: Context dependent modelling approaches for hybrid speech recognizers. 2950-2953
- Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi: A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination. 2954-2957
- Hank Liao, Christopher Alberti, Michiel Bacchiani, Olivier Siohan: Decision tree state clustering with word and syllable features. 2958-2961
- Hiroshi Fujimura, Takashi Masuko, Mitsuyoshi Tachimori: A duration modeling technique with incremental speech rate normalization. 2962-2965
- Martin Wöllmer, Yang Sun, Florian Eyben, Björn W. Schuller: Long short-term memory networks for noise robust speech recognition. 2966-2969
- Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada: One-model speech recognition and synthesis based on articulatory movement HMMs. 2970-2973
- Xiaodong Cui, Jian Xue, Pierre L. Dognin, Upendra V. Chaudhari, Bowen Zhou: Acoustic modeling with bootstrap and restructuring for low-resourced languages. 2974-2977
- Tetsuo Kosaka, Keisuke Goto, Takashi Ito, Masaharu Katoh: Lecture speech recognition by combining word graphs of various acoustic models. 2978-2981
- Khe Chai Sim, Shilin Liu: Semi-parametric trajectory modelling using temporally varying feature mapping for speech recognition. 2982-2985
- Dong Yu, Li Deng: Deep-structured hidden conditional random fields for phonetic recognition. 2986-2989
- Jonathan Malkin, Jeff A. Bilmes: Semi-supervised learning for improved expression of uncertainty in discriminative classifiers. 2990-2993
- Peder A. Olsen, Vaibhava Goel, Charles A. Micchelli, John R. Hershey: Modeling posterior probabilities using the linear exponential family. 2994-2997
Spoken Dialogue Systems I, II
- Fabrice Lefèvre, François Mairesse, Steve J. Young: Cross-lingual spoken language understanding from unaligned data using discriminative classification models and machine translation. 78-81
- Rajesh Balchandran, Leonid Rachevsky, Bhuvana Ramabhadran, Miroslav Novak: Techniques for topic detection based processing in spoken dialog systems. 82-85
- Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin: Optimizing spoken dialogue management with fitted value iteration. 86-89
- Filip Jurcícek, Blaise Thomson, Simon Keizer, François Mairesse, Milica Gasic, Kai Yu, Steve J. Young: Natural belief-critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems. 90-93
- Alexander Schmitt, Michael Scholz, Wolfgang Minker, Jackson Liscombe, David Suendermann: Is it possible to predict task completion in automated troubleshooters? 94-97
- David Suendermann, Jackson Liscombe, Roberto Pieraccini: Minimally invasive surgery for spoken dialog systems. 98-101
Spoken Dialogue Systems II
- Ramón López-Cózar, David Griol: New technique to enhance the performance of spoken dialogue systems based on dialogue states-dependent language models and grammatical rules. 2998-3001
- Lluís F. Hurtado, Joaquin Planells, Encarna Segarra, Emilio Sanchis, David Griol: A stochastic finite-state transducer approach to spoken dialog management. 3002-3005
- Romain Laroche, Philippe Bretier, Ghislain Putois: Enhanced monitoring tools and online dialogue optimisation merged into a new spoken dialogue system design experience. 3006-3009
- Romain Laroche, Ghislain Putois, Philippe Bretier: Optimising a handcrafted dialogue system design. 3010-3013
- Felix Putze, Tanja Schultz: Utterance selection for speech acts in a cognitive tourguide scenario. 3014-3017
- Gabriel Parent, Maxine Eskénazi: Lexical entrainment of real users in the let's go spoken dialog system. 3018-3021
- Silvia Quarteroni, Meritxell González, Giuseppe Riccardi, Sebastian Varges: Combining user intention and error modeling for statistical dialog simulators. 3022-3025
- Jaakko Hakulinen, Markku Turunen, Raúl Santos de la Cámara, Nigel T. Crook: Parallel processing of interruptions and feedback in companions affective dialogue system. 3026-3029
- Antoine Raux, Neville Mehta, Deepak Ramachandran, Rakesh Gupta: Dynamic language modeling using Bayesian networks for spoken dialog systems. 3030-3033
- Sunao Hara, Norihide Kitaoka, Kazuya Takeda: Automatic detection of task-incompleted dialog for spoken dialog system based on dialog act n-gram. 3034-3037
- Wei-Bin Liang, Chung-Hsien Wu, Yu-Cheng Hsiao: Dialogue act detection in error-prone spoken dialogue systems using partial sentence tree and latent dialogue act matrix. 3038-3041
- Tatsuya Kawahara, Kouhei Sumi, Zhi-Qiang Chang, Katsuya Takanashi: Detection of hot spots in poster conversations based on reactive tokens of audience. 3042-3045
- Yoichi Matsuyama, Shinya Fujie, Hikaru Taniyama, Tetsunori Kobayashi: Psychological evaluation of a group communication activation robot in a party game. 3046-3049
- Kyoko Matsuyama, Kazunori Komatani, Ryu Takeda, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno: Analyzing user utterances in barge-in-able spoken dialogue system for improving identification accuracy. 3050-3053
- Mattias Heldner, Jens Edlund, Julia Hirschberg: Pitch similarity in the vicinity of backchannels. 3054-3057
- Khiet P. Truong, Ronald Poppe, Dirk Heylen: A rule-based backchannel prediction model using pitch and pause information. 3058-3061
Speech Perception: Factors Influencing Perception
- Paul Boersma, Katerina Chládková: Detecting categorical perception in continuous discrimination data. 102-105
- Titia Benders, Paola Escudero: The interrelation between the stimulus range and the number of response categories in vowel categorization. 106-109
- Marie Nilsenová, Martijn Goudbeek, Luuk Kempen: The relation between pitch perception preference and emotion identification. 110-113
- Takashi Otake, James M. McQueen, Anne Cutler: Competition in the perception of spoken Japanese words. 114-117
- Makiko Sadakata, Lotte van der Zanden, Kaoru Sekiyama: Influence of musical training on perception of L2 speech. 118-121
- Donald Derrick, Bryan Gick: Full body aero-tactile integration in speech perception. 122-125
Prosody: Models
- Tomás Dubeda, Katalin Mády: Nucleus position within the intonation phrase: a typological study of English, Czech and Hungarian. 126-129
- Yong-cheol Lee, Satoshi Nambu: Focus-sensitive operator or focus inducer: always and only. 130-133
- Jiahong Yuan, Mark Y. Liberman: F0 declination in English and Mandarin broadcast news speech. 134-137
- Katrin Schweitzer, Michael Walsh, Bernd Möbius, Hinrich Schütze: Frequency of occurrence effects on pitch accent realisation. 138-141
- César González Ferreras, Carlos Vivaracho-Pascual, David Escudero Mancebo, Valentín Cardeñoso-Payo: On the automatic toBI accent type identification from data. 142-145
- Andrew Rosenberg: AutoBI - a tool for automatic toBI annotation. 146-149
Speech Synthesis: Unit Selection and Others
- Volker Strom, Simon King: A classifier-based target cost for unit selection speech synthesis trained on perceptual data. 150-153
- Wei Zhang, Xiaodong Cui: Applying scalable phonetic context similarity in unit selection of concatenative text-to-speech. 154-157
- Mitsuaki Isogai, Hideyuki Mizuno: Speech database reduction method for corpus-based TTS system. 158-161
- Heng Lu, Zhen-Hua Ling, Si Wei, Li-Rong Dai, Ren-Hua Wang: Automatic error detection for unit selection speech synthesis using log likelihood ratio based SVM classifier. 162-165
- Hanna Silén, Elina Helander, Jani Nurminen, Konsta Koppinen, Moncef Gabbouj: Using robust viterbi algorithm and HMM-modeling in unit selection TTS to replace units of poor quality. 166-169
- Yeon-Jun Kim, Marc C. Beutnagel: Automatic detection of abnormal stress patterns in unit selection synthesis. 170-173
- Daniel Tihelka, Jirí Kala, Jindrich Matousek: Enhancements of viterbi search for fast unit selection synthesis. 174-177
- Thomas Ewender, Beat Pfister: Accurate pitch marking for prosodic modification of speech segments. 178-181
- Shifeng Pan, Meng Zhang, Jianhua Tao: A novel hybrid approach for Mandarin speech synthesis. 182-185
- Josafá de Jesus Aguiar Pontes, Sadaoki Furui: Modeling liaison in French by using decision trees. 186-189
- Jian Luan, Jian Li: Improvement on plural unit selection and fusion. 190-193
- Alok Parlikar, Alan W. Black, Stephan Vogel: Improving speech synthesis of machine translation output. 194-197
- Ghislain Putois, Jonathan Chevelu, Cédric Boidin: Paraphrase generation to improve text-to-speech synthesis. 198-201
ASR: Search, Decoding and Confidence Measures I, II
- Chang Woo Han, Shin Jae Kang, Chul Min Lee, Nam Soo Kim: Phone mismatch penalty matrices for two-stage keyword spotting via multi-pass phone recognizer. 202-205
- Petr Motlícek, Fabio Valente, Philip N. Garner: English spoken term detection in multilingual recordings. 206-209
- Icksang Han, Chiyoun Park, Jeongmi Cho, Jeongsu Kim: A hybrid approach to robust word lattice generation via acoustic-based word detection. 210-213
- Volker Steinbiss, Martin Sundermeyer, Hermann Ney: Direct observation of pruning errors (DOPE): a search analysis tool. 214-217
- David Rybach, Michael Riley: Direct construction of compact context-dependency transducers from data. 218-221
- Miroslav Novak: Incremental composition of static decoding graphs with label pushing. 222-225
- Zhanlei Yang, Wenju Liu: A novel path extension framework using steady segment detection for Mandarin speech recognition. 226-229
- Ralf Schlüter, Markus Nußbaum-Thom, Hermann Ney: On the relation of Bayes risk, word error, and word posteriors in ASR. 230-233
- David Nolden, Hermann Ney, Ralf Schlüter: Time conditioned search in automatic speech recognition reconsidered. 234-237
- Satoshi Kobashikawa, Taichi Asami, Yoshikazu Yamaguchi, Hirokazu Masataki, Satoshi Takahashi: Efficient data selection for speech recognition based on prior confidence estimation using speech and context independent models. 238-241
- Atsunori Ogawa, Atsushi Nakamura: A novel confidence measure based on marginalization of jointly estimated error cause probabilities. 242-245
- Julien Fayolle, Fabienne Moreau, Christian Raymond, Guillaume Gravier, Patrick Gros: CRF-based combination of contextual features to improve a posteriori word-level confidence measures. 1942-1945
- Martin Wöllmer, Florian Eyben, Björn W. Schuller, Gerhard Rigoll: Recognition of spontaneous conversational speech using long short-term memory phoneme predictions. 1946-1949
- Thomas Pellegrini, Isabel Trancoso: Improving ASR error detection with non-decoder based features. 1950-1953
- Ladan Golipour, Douglas D. O'Shaughnessy: Phoneme classification and lattice rescoring based on a k-NN approach. 1954-1957
- Jeff A. Bilmes, Hui Lin: Online adaptive learning for speech recognition decoding. 1958-1961
- Takaaki Hori, Shinji Watanabe, Atsushi Nakamura: Improvements of search error risk minimization in viterbi beam search for speech recognition. 1962-1965
Special-Purpose Speech Applications
- Robin Hofe, Stephen R. Ell, Michael J. Fagan, James M. Gilbert, Phil D. Green, Roger K. Moore, Sergey I. Rybchenko: Evaluation of a silent speech interface based on magnetic sensing. 246-249
- Rubén San Segundo, Verónica López-Ludeña, Raquel Martín, Syaheerah L. Lutfi, Javier Ferreiros, Ricardo de Córdoba, José Manuel Pardo: Advanced speech communication system for deaf people. 250-253
- Sethserey Sam, Eric Castelli, Laurent Besacier: Unsupervised acoustic model adaptation for multi-origin non native ASR. 254-257
- Dilek Hakkani-Tür, Dimitra Vergyri, Gökhan Tür: Speech-based automated cognitive status assessment. 258-261
- Toru Imai, Shinichi Homma, Akio Kobayashi, Takahiro Oku, Shoei Sato: Speech recognition with a seamlessly updated language model for real-time closed-captioning. 262-265
- Takuya Nishimoto, Takayuki Watanabe: The comparison between the deletion-based methods and the mixing-based methods for audio CAPTCHA systems. 266-269
- Martine Adda-Decker, Lori Lamel, Natalie D. Snoeren: Comparing mono- & multilingual acoustic seed models for a low e-resourced language: a case-study of luxembourgish. 270-273
- R. J. J. H. van Son, Irene Jacobi, Frans J. M. Hilgers: Manipulating tracheoesophageal speech. 274-277
- David Imseng, Hervé Bourlard, Mathew Magimai-Doss: Towards mixed language speech recognition systems. 278-281
- Etienne Barnard, Johan Schalkwyk, Charl Johannes van Heerden, Pedro J. Moreno: Voice search for development. 282-285
- Gina-Anne Levow, Susan Duncan, Edward T. King: Cross-cultural investigation of prosody in verbal feedback in interactional rapport. 286-289
- Mary Tai Knox, Gerald Friedland: Multimodal speaker diarization using oriented optical flow histograms. 290-293
- Catherine Middag, Yvan Saeys, Jean-Pierre Martens: Towards an ASR-free objective analysis of pathological speech. 294-297
Speech Analysis
- Keith W. Godin, John H. L. Hansen: Session variability contrasts in the MARP corpus. 298-301
- Kazuhiro Kondo, Yusuke Takano: Estimation of two-to-one forced selection intelligibility scores by speech recognizers using noise-adapted models. 302-305
- Thomas Schaaf, Florian Metze: Analysis of gender normalization using MLP and VTLN features. 306-309
- Guillaume Aimetti, Roger K. Moore, Louis ten Bosch: Discovering an optimal set of minimally contrasting acoustic speech units: a point of focus for whole-word pattern matching. 310-313
- Themos Stafylakis, Xavier Anguera: Improvements to the equal-parameter BIC for speaker diarization. 314-317
- Nima Mesgarani, Samuel Thomas, Hynek Hermansky: A multistream multiresolution framework for phoneme recognition. 318-321
- Giampiero Salvi, Fabio Tesser, Enrico Zovato, Piero Cosi: Cluster analysis of differential spectral envelopes on emotional speech. 322-325
- Samuel R. Bowman, Karen Livescu: Modeling pronunciation variation with context-dependent articulatory feature decision trees. 326-329
- Bhiksha Raj, Kevin W. Wilson, Alexander Krueger, Reinhold Haeb-Umbach: Ungrounded independent non-negative factor analysis. 330-333
- John R. Hershey, Peder A. Olsen, Steven J. Rennie: Signal interaction and the devil function. 334-337
Systems for LVCSR
- Yuya Akita, Masato Mimura, Graham Neubig, Tatsuya Kawahara: Semi-automated update of automatic transcription system for the Japanese national congress. 338-341
- Xunying Liu, Mark J. F. Gales, Philip C. Woodland: Language model cross adaptation for LVCSR system combination. 342-345
- Shinji Watanabe, Takaaki Hori, Atsushi Nakamura: Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data. 346-349
- Pavel Kveton, Miroslav Novak: Accelerating hierarchical acoustic likelihood computation on graphics processors. 350-353
- Jiulong Shan, Genqing Wu, Zhihong Hu, Xiliu Tang, Martin Jansche, Pedro J. Moreno: Search by voice in Mandarin Chinese. 354-357
- Thomas Hain, Lukás Burget, John Dines, Philip N. Garner, Asmaa El Hannani, Marijn Huijbregts, Martin Karafiát, Mike Lincoln, Vincent Wan: The AMIDA 2009 meeting transcription system. 358-361
Speaker Characterization and Recognition I-IV
- William M. Campbell, Zahi N. Karam: Simple and efficient speaker comparison using approximate KL divergence. 362-365
- Hanwu Sun, Bin Ma, Chien-Lin Huang, Trung Hieu Nguyen, Haizhou Li: The IIR NIST SRE 2008 and 2010 summed channel speaker recognition systems. 366-369
- Chien-Lin Huang, Hanwu Sun, Bin Ma, Haizhou Li: Speaker characterization using long-term and temporal information. 370-373
- Sergio Perez-Gomez, Daniel Ramos, Javier Gonzalez-Dominguez, Joaquin Gonzalez-Rodriguez: Score-level compensation of extreme speech duration variability in speaker verification. 374-377
- Alberto Abad, Isabel Trancoso: Speaker recognition experiments using connectionist transformation network features. 378-381
- Yun Lei, John H. L. Hansen: Speaker recognition using supervised probabilistic principal component analysis. 382-385
- Benjamin Bigot, Julien Pinquier, Isabelle Ferrané, Régine André-Obrecht: Looking for relevant features for speaker role recognition. 1057-1060
- Marcel Kockmann, Lukás Burget, Ondrej Glembek, Luciana Ferrer, Jan Cernocký: Prosodic speaker verification using subspace multinomial models with intersession compensation. 1061-1064
- Eryu Wang, Kong-Aik Lee, Bin Ma, Haizhou Li, Wu Guo, Li-Rong Dai: The estimation and kernel metric of spectral correlation for text-independent speaker verification. 1065-1068
- Rahim Saeidi, Pejman Mowlaee, Tomi Kinnunen, Zheng-Hua Tan, Mads Græsbøll Christensen, Søren Holdt Jensen, Pasi Fränti: Improving monaural speaker identification by double-talk detection. 1069-1072
- B. Avinash, Sunitha Guruprasad, B. Yegnanarayana: Exploring subsegmental and suprasegmental features for a text-dependent speaker verification in distant speech signals. 1073-1076
- Qingsong Liu, Wei Huang, Dongxing Xu, Hongbin Cai, Beiqian Dai: A fast implementation of factor analysis for speaker verification. 1077-1080
- Ce Zhang, Rong Zheng, Bo Xu: An investigation into direct scoring methods without SVM training in speaker verification. 1437-1440
- Reda Jourani, Khalid Daoudi, Régine André-Obrecht, Driss Aboutajdine: Large margin Gaussian mixture models for speaker identification. 1441-1444
- Rong Zheng, Bo Xu: On the use of Gaussian component information in the generative likelihood ratio estimation for speaker verification. 1445-1448
- Man-Wai Mak, Wei Rao: Acoustic vector resampling for GMMSVM-based speaker verification. 1449-1452
- Konstantin Biatov: A fast speaker indexing using vector quantization and second order statistics with adaptive threshold computation. 1453-1456
- Gang Wang, Xiaojun Wu, Thomas Fang Zheng: Using phoneme recognition and text-dependent speaker verification to improve speaker segmentation for Chinese speech. 1457-1460
- Claudio Garretón, Néstor Becerra Yoma: On enhancing feature sequence filtering with filter-bank energy transformation in speaker verification with telephone speech. 1461-1464
- Donglai Zhu, Bin Ma, Kong-Aik Lee, Cheung-Chi Leung, Haizhou Li: MAP estimation of subspace transform for speaker recognition. 1465-1468
- Ayeh Jafari, Ramji Srinivasan, Danny Crookes, Ji Ming: A longest matching segment approach for text-independent speaker recognition. 1469-1472
- Ville Hautamäki, Tomi Kinnunen, Mohaddeseh Nosratighods, Kong-Aik Lee, Bin Ma, Haizhou Li: Approaching human listener accuracy with modern speaker verification. 1473-1476
- Jouni Pohjalainen, Rahim Saeidi, Tomi Kinnunen, Paavo Alku: Extended weighted linear prediction (XLP) analysis of speech and its application to speaker verification in adverse conditions. 1477-1480
- Guoli Ye, Brian Mak: The use of subvector quantization and discrete densities for fast GMM computation for speaker verification. 1481-1484
- Fred S. Richardson, Joseph P. Campbell: Transcript-dependent speaker recognition using mixer 1 and 2. 2102-2105
- Thomas Drugman, Thierry Dutoit: On the potential of glottal signatures for speaker recognition. 2106-2109
- R. Padmanabhan, Hema A. Murthy: Acoustic feature diversity and speaker verification. 2110-2113
- Omid Dehzangi, Bin Ma, Engsiong Chng, Haizhou Li: A discriminative performance metric for GMM-UBM speaker identification. 2114-2117
- Xavier Anguera, Jean-François Bonastre: A novel speaker binary key derived from anchor models. 2118-2121
- Weiqiang Zhang, Yan Deng, Liang He, Jia Liu: Variant time-frequency cepstral features for speaker recognition. 2122-2125
- Ning Wang, P. C. Ching, Tan Lee: Exploitation of phase information for speaker recognition. 2126-2129
- Yanhua Long, Li-Rong Dai, Bin Ma, Wu Guo: Effects of the phonological relevance in speaker verification. 2130-2133
- Gabriel Hernández Sierra, Jean-François Bonastre, Driss Matrouf, José R. Calvo: Topological representation of speech for speaker recognition. 2134-2137
- Seyed Omid Sadjadi, John H. L. Hansen: Assessment of single-channel speech enhancement techniques for speaker identification under mismatched conditions. 2138-2141
- Xiang Zhang, Chuan Cao, Lin Yang, Hongbin Suo, Jianping Zhang, Yonghong Yan: Speaker recognition using the resynthesized speech via spectrum modeling. 2142-2145
Source Separation
- Robert Peharz, Michael Stark, Franz Pernkopf, Yannis Stylianou: A factorial sparse coder model for single channel source separation. 386-389
- Yasmina Benabderrahmane, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy: Oriented PCA method for blind speech separation of convolutive mixtures. 390-393
- Hsin-Lung Hsieh, Jen-Tzung Chien: Online Gaussian process for nonstationary speech separation. 394-397
- Meng Yu, Wenye Ma, Jack Xin, Stanley J. Osher: Convexity and fast speech extraction by split bregman method. 398-401
- Wenye Ma, Meng Yu, Jack Xin, Stanley J. Osher: Reducing musical noise in blind source separation by time-domain sparse filters and split bregman method. 402-405
- John Woodruff, Rohit Prabhavalkar, Eric Fosler-Lussier, DeLiang Wang: Combining monaural and binaural evidence for reverberant speech segregation. 406-409
Speech Synthesis: HMM-Based Speech Synthesis I, II
- Heiga Zen: Speaker and language adaptive training for HMM-based polyglot speech synthesis. 410-413
- Kai Yu, Heiga Zen, François Mairesse, Steve J. Young: Context adaptive training with factorized decision trees for HMM-based speech synthesis. 414-417
- Junichi Yamagishi, Oliver Watts, Simon King, Bela Usabaev: Roles of the average voice in speaker-adaptive HMM-based speech synthesis. 418-421
- Yao Qian, Zhi-Jie Yan, Yi-Jian Wu, Frank K. Soong, Xin Zhuang, Shengyi Kong: An HMM trajectory tiling (HTT) approach to high quality TTS. 422-425
- Yining Chen, Zhi-Jie Yan, Frank K. Soong: A perceptual study of acceleration parameters in HMM-based TTS. 426-429
- Shuji Yokomizo, Takashi Nose, Takao Kobayashi: Evaluation of prosodic contextual factors for HMM-based speech synthesis. 430-433
- Slava Shechtman, Alexander Sorin: Sinusoidal model parameterization for HMM-based TTS system. 805-808
- Yoshinori Shiga, Tomoki Toda, Shinsuke Sakai, Hisashi Kawai: Improved training of excitation for HMM-based parametric speech synthesis. 809-812
- June Sig Sung, Doo Hwa Hong, Kyung Hwan Oh, Nam Soo Kim: Excitation modeling based on waveform interpolation for HMM-based speech synthesis. 813-816
- Xin Zhuang, Yao Qian, Frank K. Soong, Yi-Jian Wu, Bo Zhang: Formant-based frequency warping for improving speaker adaptation in HMM TTS. 817-820
- Hongwei Hu, Martin J. Russell: Improved modelling of speech dynamics using non-linear formant trajectories for HMM-based speech synthesis. 821-824
- Zhen-Hua Ling, Yu Hu, Li-Rong Dai: Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis. 825-828
- Matt Shannon, William Byrne: Autoregressive clustering for HMM speech synthesis. 829-832
- Nicholas Pilkington, Heiga Zen: An implementation of decision tree-based context clustering on graphics processing units. 833-836
- Alexander Gutkin, Xavi Gonzalvo, Stefan Breuer, Paul Taylor: Quantized HMMs for low footprint text-to-speech synthesis. 837-840
- Oliver Watts, Junichi Yamagishi, Simon King: The role of higher-level linguistic features in HMM-based speech synthesis. 841-844
- Ayami Mase, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda: HMM-based singing voice synthesis system using pitch-shifted pseudo training data. 845-848
- Jinfu Ni, Hisashi Kawai: An unsupervised approach to creating web audio contents-based HMM voices. 849-852
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi: Conversational spontaneous speech synthesis using average voice model. 853-856
Multi-Modal Signal Processing
- Jonas Hörnstein, José Santos-Victor: Learning words and speech units through natural interactions. 434-437
- Qingju Liu, Wenwu Wang, Philip J. B. Jackson: Bimodal coherence based scale ambiguity cancellation for target speech extraction and enhancement. 438-441
- Hiroaki Kawashima, Yu Horii, Takashi Matsuyama: Speech estimation in non-stationary noise environments using timing structures between mouth movements and sound signals. 442-445
- Lijuan Wang, Xiaojun Qian, Wei Han, Frank K. Soong: Synthesizing photo-real talking head via trajectory-guided sample selection. 446-449
- Victoria M. Florescu, Lise Crevier-Buchman, Bruce Denby, Thomas Hueber, Antonia Colazo-Simon, Claire Pillot-Loiseau, Pierre Roussel-Ragot, Cédric Gendrot, Sophie Quattrocchi: Silent vs vocalized articulation for a portable ultrasound-based silent speech interface. 450-453
- Gregor Hofer, Korin Richmond: Comparison of HMM and TMDN methods for lip synchronisation. 454-457
Paralanguage
- Florian Schiel, Christian Heinrich, Veronika Neumeyer: Rhythm and formant features for automatic alcohol detection. 458-461
- Irena Yanushevskaya, Christer Gobl, John Kane, Ailbhe Ní Chasaide: An exploration of voice source correlates of focus. 462-465
- James D. Harnsberger, Rahul Shrivastav, W. S. Brown Jr.: Modeling perceived vocal age in american English. 466-469
- Marie-José Caraty, Claude Montacié: Multivariate analysis of vocal fatigue in continuous reading. 470-473
- Alexander Kain, Jan P. H. van Santen: Frequency-domain delexicalization using surrogate vowels. 474-477
- Florian Metze, Anton Batliner, Florian Eyben, Tim Polzehl, Björn W. Schuller, Stefan Steidl: Emotion recognition using imperfect speech recognition. 478-481
- Gang Liu, Yun Lei, John H. L. Hansen: A novel feature extraction strategy for multi-stream robust emotion identification. 482-485
- Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, Marie-Odile Berger: Setup for acoustic-visual speech synthesis by concatenating bimodal units. 486-489
- Bart Jochems, Martha A. Larson, Roeland Ordelman, Ronald Poppe, Khiet P. Truong: Towards affective state modeling in narrative and conversational settings. 490-493
- Narichika Nomoto, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi: Detection of anger emotion in dialog speech using prosody feature and temporal relation of utterances. 494-497
- Benjamin Roustan, Marion Dohen: Gesture and speech coordination: the influence of the relationship between manual gesture and speech. 498-501
- Hynek Boril, Seyed Omid Sadjadi, Tristan Kleinschmidt, John H. L. Hansen: Analysis and detection of cognitive load and frustration in drivers' speech. 502-505
- Akira Sasou, Yasuharu Hashimoto, Katsuhiko Sakaue: Acoustic-based recognition of head gestures accompanying speech. 506-509
- Sandro Castronovo, Angela Mahr, Margarita Pentcheva, Christian A. Müller: Multimodal dialog in the car: combining speech and turn-and-push dial to control comfort functions. 510-513
- Danil Korchagin, Philip N. Garner, Petr Motlícek: Hands free audio analysis from home entertainment. 514-517
- Shaikh Mostafa Al Masum, Antonio Rui Ferreira Rebordão, Keikichi Hirose: Affective story teller: a TTS system for emotional expressivity. 518-521
ASR: Speaker Adaptation, Robustness Against Reverberation
- Shweta Ghai, Rohit Sinha: Enhancing children's speech recognition under mismatched condition by explicit acoustic normalization. 522-525
- Bo Li, Khe Chai Sim: Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems. 526-529
- Ravichander Vipperla, Steve Renals, Joe Frankel: Augmentation of adaptation data. 530-533
- Lukás Machlica, Zbynek Zajíc, Ludek Müller: Discriminative adaptation based on fast combination of DMAP and dfMLLR. 534-537
- Doddipatla Rama Sanand, Ralf Schlüter, Hermann Ney: Revisiting VTLN using linear transformation on conventional MFCC. 538-541
- Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda: Speaker adaptation based on nonlinear spectral transform for speech recognition. 542-545
- Tetsuo Kosaka, Takashi Ito, Masaharu Katoh, Masaki Kohda: Speaker adaptation based on system combination using speaker-class models. 546-549
- Yongwon Jeong, Young Rok Song, Hyung Soon Kim: Speaker adaptation in transformation space using two-dimensional PCA. 550-553
- Jan Trmal, Jan Zelinka, Ludek Müller: On speaker adaptive training of artificial neural networks. 554-557
- Yongjun He, Jiqing Han: Model synthesis for band-limited speech recognition. 558-561
- Takahiro Fukumori, Masanori Morise, Takanobu Nishiura: Performance estimation of reverberant speech recognition based on reverberant criteria RSR-dn with acoustic parameters. 562-565
- Armin Sehr, Christian Hofmann, Roland Maas, Walter Kellermann: A novel approach for matched reverberant training of HMMs using data pairs. 566-569
- Hari Krishna Maganti, Marco Matassoni: An auditory based modulation spectral feature for reverberant speech recognition. 570-573
- Martin Wolf, Climent Nadeu: On the potential of channel selection for recognition of reverberated speech with multiple microphones. 574-577
- Randy Gomez, Tatsuya Kawahara: An improved wavelet-based dereverberation for robust automatic speech recognition. 578-581
- Rico Petrick, Thomas Fehér, Masashi Unoki, Rüdiger Hoffmann: Methods for robust speech recognition in reverberant environments: a comparison. 582-585
Language Learning, TTS, and Other Applications
- Masayuki Suzuki, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose:

Integration of multilayer regression analysis with structure-based pronunciation assessment. 586-589 - Joost van Doremalen, Catia Cucchiarini, Helmer Strik:

Using non-native error patterns to improve pronunciation verification. 590-593 - Dean Luo, Yu Qiao, Nobuaki Minematsu, Yutaka Yamauchi, Keikichi Hirose:

Regularized-MLLR speaker adaptation for computer-assisted language learning system. 594-597 - Kuniaki Hirabayashi, Seiichi Nakagawa:

Automatic evaluation of English pronunciation by Japanese speakers using various acoustic features and pattern recognition techniques. 598-601 - Hsien-Cheng Liao, Jiang-Chun Chen, Sen-Chia Chang, Ying-Hua Guan, Chin-Hui Lee:

Decision tree based tone modeling with corrective feedbacks for automatic Mandarin tone assessment. 602-605 - Jingli Lu, Ruili Wang, Liyanage C. De Silva, Yang Gao, Jia Liu:

CASTLE: a computer-assisted stress teaching and learning environment for learners of English as a second language. 606-609 - Shen Huang, Hongyan Li, Shijin Wang, Jiaen Liang, Bo Xu:

Automatic reference independent evaluation of prosody quality using multiple knowledge fusions. 610-613 - Su-Youn Yoon, Mark Hasegawa-Johnson, Richard Sproat:

Landmark-based automated pronunciation error detection. 614-617 - Zhiwei Shuang, Shiyin Kang, Yong Qin, Li-Rong Dai, Lianhong Cai:

HMM based TTS for mixed language text. 618-621 - Hui Liang, John Dines:

An analysis of language mismatch in HMM state mapping-based cross-lingual speaker adaptation. 622-625 - Tatsuya Kawahara, Norihiro Katsumaru, Yuya Akita, Shinsuke Mori:

Classroom note-taking system for hearing impaired students using automatic speech recognition adapted to lectures. 626-629 - Paul R. Dixon, Sadaoki Furui:

Exploring web-browser based runtime engines for creating ubiquitous speech interfaces. 630-632
Pitch and Glottal-Waveform Estimation and Modeling I, II
- Xuejing Sun, Sameer Gadre:

Efficient three-stage pitch estimation for packet loss concealment. 633-636 - Keiichi Funaki:

On evaluation of the f0 estimation based on time-varying complex speech analysis. 637-640 - Feng Huang, Tan Lee:

Pitch estimation in noisy speech based on temporal accumulation of spectrum peaks. 641-644 - Tianyu T. Wang, Thomas F. Quatieri:

Multi-pitch estimation by a joint 2-d representation of pitch and pitch dynamics. 645-648 - Pirros Tsiakoulis, Alexandros Potamianos:

On the effect of fundamental frequency on amplitude and frequency modulation patterns in speech resonances. 649-652 - M. Shahidur Rahman, Tetsuya Shimamura:

Pitch determination using autocorrelation function in spectral domain. 653-656 - Thomas Drugman, Thierry Dutoit:

Chirp complex cepstrum-based decomposition for asynchronous glottal analysis. 657-660 - Alan Ó Cinnéide, David Dorran, Mikel Gainza, Eugene Coyle:

Exploiting glottal formant parameters for glottal inverse filtering and parameterization. 661-664 - Nicolas Sturmel, Christophe d'Alessandro, Boris Doval:

Glottal parameters estimation on speech using the zeros of the z-transform. 665-668 - Sri Harish Reddy Mallidi, Kishore Prahallad, Suryakanth V. Gangashetty, B. Yegnanarayana:

Significance of pitch synchronous analysis for speaker recognition using AANN models. 669-672 - Gang Chen, Xue Feng, Yen-Liang Shue, Abeer Alwan:

On using voice source measures in automatic gender classification of children's speech. 673-676 - Wei Chu, Abeer Alwan:

SAFE: a statistical algorithm for F0 estimation for both clean and noisy speech. 2590-2593 - Jung Ook Hong, Patrick J. Wolfe:

Robust and efficient pitch estimation using an iterative ARMA technique. 2594-2597 - Yasunori Ohishi, Hirokazu Kameoka, Daichi Mochihashi, Hidehisa Nagano, Kunio Kashino:

Statistical modeling of F0 dynamics in singing voices based on Gaussian processes with multiple oscillation bases. 2598-2601 - Martin Heckmann, Claudius Gläser, Frank Joublin, Kazuhiro Nakadai:

Applying geometric source separation for improved pitch extraction in human-robot interaction. 2602-2605 - John Kane, Mark Kane, Christer Gobl:

A spectral LF model based approach to voice source parameterisation. 2606-2609 - Thomas Drugman, Thierry Dutoit:

Glottal-based analysis of the Lombard effect. 2610-2613
Open Vocabulary Spoken Document Retrieval (Special Session)
- Yoshiaki Itoh, Hiromitsu Nishizaki, Xinhui Hu, Hiroaki Nanjo, Tomoyosi Akiba, Tatsuya Kawahara, Seiichi Nakagawa, Tomoko Matsui, Yoichi Yamashita, Kiyoaki Aikawa:

Constructing Japanese test collections for spoken term detection. 677-680 - Satoshi Natori, Hiromitsu Nishizaki, Yoshihiro Sekiguchi:

Japanese spoken term detection using syllable transition network derived from multiple speech recognizers' outputs. 681-684 - Sha Meng, Weiqiang Zhang, Jia Liu:

Combining Chinese spoken term detection systems via side-information conditioned linear logistic regression. 685-688 - Taisuke Kaneko, Tomoyosi Akiba:

Metric subspace indexing for fast spoken term detection. 689-692 - Chun-an Chan, Lin-Shan Lee:

Unsupervised spoken-term detection with spoken queries using segment-based dynamic time warping. 693-696 - Daniel Schneider, Timo Mertens, Martha A. Larson, Joachim Köhler:

Contextual verification for open vocabulary spoken term detection. 697-700 - Javier Tejedor, Doroteo T. Toledano, Miguel Bautista, Simon King, Dong Wang, José Colás:

Augmented set of features for confidence estimation in spoken term detection. 701-704 - Xinhui Hu, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:

Cluster-based language model for spoken document retrieval using NMF-based document clustering. 705-708
Robust ASR
- Rogier C. van Dalen, Mark J. F. Gales:

Asymptotically exact noise-corrupted speech likelihoods. 709-712 - Ramón Fernandez Astudillo, Reinhold Orglmeister:

A MMSE estimator in mel-cepstral domain for robust large vocabulary automatic speech recognition using uncertainty propagation. 713-716 - Bhiksha Raj, Tuomas Virtanen, Sourish Chaudhuri, Rita Singh:

Non-negative matrix factorization based compensation of music for automatic speech recognition. 717-720 - Kris Demuynck, Xueru Zhang, Dirk Van Compernolle, Hugo Van hamme:

Feature versus model based noise robustness. 721-724 - Ji Hun Park, Seon Man Kim, Jae Sam Yoon, Hong Kook Kim, Sung Joo Lee, Yunkeun Lee:

SNR-based mask compensation for computational auditory scene analysis applied to speech recognition in a car environment. 725-728 - Chanwoo Kim, Richard M. Stern, Kiwan Eom, Jaewon Lee:

Automatic selection of thresholds for signal separation algorithms based on interaural delay. 729-732
Language and Dialect Identification
- Florian Verdet, Driss Matrouf, Jean-François Bonastre, Jean Hennebert:

Channel detectors for system fusion in the context of NIST LRE 2009. 733-736 - Rong Tong, Bin Ma, Haizhou Li, Engsiong Chng:

Selecting phonotactic features for language recognition. 737-740 - Abualsoud Hanani, Michael J. Carey, Martin J. Russell:

Improved language recognition using mixture components statistics. 741-744 - Mikel Peñagarikano, Amparo Varona, Luis Javier Rodríguez-Fuentes, Germán Bordel:

Using cross-decoder co-occurrences of phone n-grams in SVM-based phonotactic language recognition. 745-748 - Oscar Koller, Alberto Abad, Isabel Trancoso, Céu Viana:

Exploiting variety-dependent phones in Portuguese variety identification applied to broadcast news transcription. 749-752 - Fadi Biadsy, Julia Hirschberg, Michael Collins:

Dialect recognition using a phone-GMM-supervector-based SVM kernel. 753-756
Technologies for Learning and Education
- Xiaojun Qian, Frank K. Soong, Helen M. Meng:

Discriminative acoustic model for improving mispronunciation detection and diagnosis in computer-aided pronunciation training (CAPT). 757-760 - Liang-Yu Chen, Jyh-Shing Roger Jang:

Automatic pronunciation scoring using learning to rank and DP-based score segmentation. 761-764 - Wai Kit Lo, Shuang Zhang, Helen M. Meng:

Automatic derivation of phonological rules for mispronunciation detection in a computer-assisted pronunciation training system. 765-768 - Minh Duong, Jack Mostow:

Adapting a duration synthesis model to rate children's oral reading prosody. 769-772 - Su-Youn Yoon, Lei Chen, Klaus Zechner:

Predicting word accuracy for the automatic speech recognition of non-native speech. 773-776 - Taotao Zhu, Dengfeng Ke, Zhenbiao Chen, Bo Xu:

A new approach for automatic tone error detection in strong accented Mandarin based on dominant set. 777-780
Emotional Speech
- S. R. Mahadeva Prasanna, D. Govind:

Analysis of excitation source information in emotional speech. 781-784 - Dongrui Wu, Thomas D. Parsons, Shrikanth S. Narayanan:

Acoustic feature analysis in speech emotion primitives estimation. 785-788 - Lan-Ying Yeh, Tai-Shih Chi:

Spectro-temporal modulations for robust speech emotion recognition. 789-792 - Chi-Chun Lee, Matthew Black, Athanasios Katsamanis, Adam C. Lammert, Brian R. Baucom, Andrew Christensen, Panayiotis G. Georgiou, Shrikanth S. Narayanan:

Quantification of prosodic entrainment in affective spontaneous spoken interactions of married couples. 793-796 - Emily Mower, Kyu Jeong Han, Sungbok Lee, Shrikanth S. Narayanan:

A cluster-profile representation of emotion using agglomerative hierarchical clustering. 797-800 - Björn W. Schuller, Laurence Devillers:

Incremental acoustic valence recognition: an inter-corpus perspective on features, matching, and performance in a gating paradigm. 801-804
New Paradigms in ASR I, II
- Xiaodong Wang, Kunihiko Owa, Makoto Shozakai:

Mandarin digit recognition assisted by selective tone distinction. 857-860 - Kazuhiko Abe, Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:

Brazilian Portuguese acoustic model training based on data borrowing from other language. 861-864 - Ngoc Thang Vu, Tim Schlippe, Franziska Kraus, Tanja Schultz:

Rapid bootstrapping of five Eastern European languages using the Rapid Language Adaptation Toolkit. 865-868 - Houwei Cao, Tan Lee, P. C. Ching:

Cross-lingual speaker adaptation via Gaussian component mapping. 869-872 - Mohamed Elmahdy, Rainer Gruhn, Wolfgang Minker, Slim Abdennadher:

Cross-lingual acoustic modeling for dialectal Arabic speech recognition. 873-876 - Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:

Cross-lingual and multi-stream posterior features for low resource LVCSR systems. 877-880 - Shiva Sundaram, Jerome R. Bellegarda:

Latent perceptual mapping: a new acoustic modeling framework for speech recognition. 881-884 - Richard Dufour, Fethi Bougares, Yannick Estève, Paul Deléglise:

Unsupervised model adaptation on targeted speech segments for LVCSR system combination. 885-888 - Irene Ayllón Clemente, Martin Heckmann, Alexander Denecke, Britta Wrede, Christian Goerick:

Incremental word learning using large-margin discriminative training and variance floor estimation. 889-892 - Tuomas Virtanen, Jort F. Gemmeke, Antti Hurmalainen:

State-based labelling for a sparse representation of speech and its application to robust speech recognition. 893-896 - Mirko Hannemann, Stefan Kombrink, Martin Karafiát, Lukás Burget:

Similarity scoring for recognizing repeated out-of-vocabulary words. 897-900 - Dino Seppi, Dirk Van Compernolle:

Data pruning for template-based automatic speech recognition. 901-904 - Man-Hung Siu, Herbert Gish, Arthur Chan, William Belfield:

Improved topic classification and keyword discovery using an HMM-based speech recognizer trained without supervision. 2838-2841 - Dimitri Kanevsky, Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo:

An analysis of sparseness and regularization in exemplar-based methods for speech classification. 2842-2845 - Abdel-rahman Mohamed, Dong Yu, Li Deng:

Investigation of full-sequence training of deep belief networks for speech recognition. 2846-2849 - Yow-Bang Wang, Lin-Shan Lee:

Mandarin tone recognition using affine-invariant prosodic features and tone posteriorgram. 2850-2853 - Geoffrey Zweig, Patrick Nguyen, Jasha Droppo, Alex Acero:

Continuous speech recognition with a TF-IDF acoustic model. 2854-2857 - Geoffrey Zweig, Patrick Nguyen:

SCARF: a segmental conditional random field toolkit for speech recognition. 2858-2861
Speech Production: Various Approaches
- Akiko Amano-Kusumoto, John-Paul Hosom, Alexander Kain:

Speaking style dependency of formant targets. 905-908 - Tatsuya Kitamura:

Similarity of effects of emotions on the speech organ configuration with and without speaking. 909-912 - Daniel Bone, Samuel Kim, Sungbok Lee, Shrikanth S. Narayanan:

A study of intra-speaker and inter-speaker affective variability using electroglottograph and inverse filtered glottal waveforms. 913-916 - Ken-Ichi Sakakibara, Hiroshi Imagawa, Miwako Kimura, Hisayuki Yokonishi, Niro Tayama:

Modal analysis of vocal fold vibrations using laryngotopography. 917-920 - Martti Vainio, Matti Airas, Juhani Järvikivi, Paavo Alku:

Laryngeal voice quality in the expression of focus. 921-924 - Masako Fujimoto, Kikuo Maekawa, Seiya Funatsu:

Laryngeal characteristics during the production of geminate consonants. 925-928 - Julien Cisonni, Kazunori Nozaki, Annemie Van Hirtum, Shigeo Wada:

Numerical study of turbulent flow-induced sound production in presence of a tooth-shaped obstacle: towards sibilant [s] physical modeling. 929-932 - Iris Hanique, Barbara Schuppler, Mirjam Ernestus:

Morphological and predictability effects on schwa reduction: the case of Dutch word-initial syllables. 933-936 - Samer Al Moubayed, Gopal Ananthakrishnan:

Acoustic-to-articulatory inversion based on local regression. 937-940 - Mirjam Broersma:

Korean lenis, fortis, and aspirated stops: effect of place of articulation on acoustic realization. 941-944 - Toru Nakashika, Ryuki Tachibana, Masafumi Nishimura, Tetsuya Takiguchi, Yasuo Ariki:

Speech synthesis by modeling harmonics structure with multiple function. 945-948 - Makoto Otani, Tatsuya Hirahara:

Physics of body-conducted silent speech - production, propagation and representation of non-audible murmur. 949-952
Speech Enhancement
- Subhojit Chakladar, Nam Soo Kim, Yu Gwang Jin, Tae Gyoon Kang:

Multichannel noise reduction using low order RTF estimate. 953-956 - Inho Lee, Jongsung Yoon, Yoonjae Lee, Hanseok Ko:

Reinforced blocking matrix with cross channel projection for speech enhancement. 957-960 - Ning Cheng, Wenju Liu, Lan Wang:

Masking property based microphone array post-filter design. 961-964 - Yusuke Sato, Tetsuya Hoya, Hovagim Bakardjian, Andrzej Cichocki:

Reduction of broadband noise in speech signals by multilinear subspace analysis. 965-968 - Jungpyo Hong, Seung Ho Han, Sangbae Jeong, Minsoo Hahn:

Novel probabilistic control of noise reduction for improved microphone array beamforming. 969-972 - Kai Li, Qiang Fu, Yonghong Yan:

Speech enhancement using improved generalized sidelobe canceller in frequency domain with multi-channel postfiltering. 973-976 - Jani Even, Carlos Toshinori Ishi, Hiroshi Saruwatari, Norihiro Hagita:

Close speaker cancellation for suppression of non-stationary background noise for hands-free speech interface. 977-980 - Ajay Srinivasamurthy, Thippur V. Sreenivas:

Multi-channel iterative dereverberation based on codebook constrained iterative multi-channel Wiener filter. 981-984 - Anand Joseph Xavier Medabalimi, Sri Harish Reddy Mallidi, B. Yegnanarayana:

Speaker-dependent mapping of source and system features for enhancement of throat microphone speech. 985-988 - Jun Cai, Stefano Marini, Pierre Malarme, Francis Grenez, Jean Schoentgen:

An analytic modeling approach to enhancing throat microphone speech commands for keyword spotting. 989-992 - Stephen So, Kamil K. Wójcicki, Kuldip K. Paliwal:

Single-channel speech enhancement using Kalman filtering in the modulation domain. 993-996 - Miao Yao, Weiqian Liang:

Integrated feedback and noise reduction algorithm in digital hearing aids via oscillation detection. 997-1000 - Charles Mercier, Roch Lefebvre:

A blind signal-to-noise ratio estimator for high noise speech recordings. 1001-1004
Special Session: Fact and Replica of Speech Production
- Hiroshi Imagawa, Ken-Ichi Sakakibara, Isao T. Tokuda, Mamiko Otsuka, Niro Tayama:

Estimation of glottal area function using stereo-endoscopic high-speed digital imaging. 1005-1008 - Kazunori Nozaki, Youhei Ohnishi, Takashi Suda, Shigeo Wada, Shinji Shimojo:

Toward aero-acoustical analysis of the sibilant /s/: an oral cavity modeling. 1009-1012 - Kunitoshi Motoki:

Effects of wall impedance on transmission and attenuation of higher-order modes in vocal-tract model. 1013-1016 - Peter Birkholz, Bernd J. Kröger, Christiane Neuschaefer-Rube:

Articulatory synthesis and perception of plosive-vowel syllables with virtual consonant targets. 1017-1020 - Kotaro Fukui, Toshihiro Kusano, Yoshikazu Mukaeda, Yuto Suzuki, Atsuo Takanishi, Masaaki Honda:

Speech robot mimicking human articulatory motion. 1021-1024 - Takayuki Arai:

Mechanical vocal-tract models for speech dynamics. 1025-1028 - Michael C. Brady:

Prosodic timing analysis for articulatory re-synthesis using a bank of resonators with an adaptive oscillator. 1029-1032
ASR: Language Modeling
- Ahmad Emami, Stanley F. Chen, Abraham Ittycheriah, Hagen Soltau, Bing Zhao:

Decoding with shrinkage-based language models. 1033-1036 - Stanley F. Chen, Stephen M. Chu:

Enhanced word classing for model M. 1037-1040 - Junho Park, Xunying Liu, Mark J. F. Gales, Philip C. Woodland:

Improved neural network based language modelling and adaptation. 1041-1044 - Tomás Mikolov, Martin Karafiát, Lukás Burget, Jan Cernocký, Sanjeev Khudanpur:

Recurrent neural network based language model. 1045-1048 - Preethi Jyothi, Eric Fosler-Lussier:

Discriminative language modeling using simulated ASR errors. 1049-1052 - Graham Neubig, Masato Mimura, Shinsuke Mori, Tatsuya Kawahara:

Learning a language model from continuous speech. 1053-1056
Single-Channel Speech Enhancement
- Stephen So, Kuldip K. Paliwal:

Fast converging iterative Kalman filtering for speech enhancement using long and overlapped tapered windows with large side lobe attenuation. 1081-1084 - Xuejing Sun, Kuan-Chieh Yen, Rogerio Guedes Alves:

Robust noise estimation using minimum correction with harmonicity control. 1085-1088 - Mahdi Triki:

New insights into subspace noise tracking. 1089-1092 - Mahdi Triki, Kees Janse:

Bias considerations for minimum subspace noise tracking. 1093-1096 - Ji Ming, Ramji Srinivasan, Danny Crookes:

A corpus-based approach to speech enhancement from nonstationary noise. 1097-1100 - Zhe Chen, You-Chi Cheng, Fuliang Yin, Chin-Hui Lee:

Bandwidth expansion of speech based on wavelet transform modulus maxima vector mapping. 1101-1104
Speech Synthesis: Miscellaneous Topics
- Kalu U. Ogbureke, Peter Cahill, Julie Carson-Berndsen:

Hidden Markov models with context-sensitive observations for grapheme-to-phoneme conversion. 1105-1108 - Brian Langner, Stephan Vogel, Alan W. Black:

Evaluating a dialog language generation system: comparing the mountain system to other NLG approaches. 1109-1112 - Wesley Mattheyses, Lukas Latacz, Werner Verhelst:

Active appearance models for photorealistic visual speech synthesis. 1113-1116 - Jerome R. Bellegarda:

Latent affective mapping: a novel framework for the data-driven analysis of emotion in text. 1117-1120 - Anna C. Janska, Robert A. J. Clark:

Native and non-native speaker judgements on the quality of synthesized speech. 1121-1124 - Dominic Espinosa, Michael White, Eric Fosler-Lussier, Chris Brew:

Machine learning for text selection with expressive unit-selection voices. 1125-1128
Prosody: Basics and Applications
- Alexei V. Ivanov, Giuseppe Riccardi, Sucheta Ghosh, Sara Tonelli, Evgeny A. Stepanov:

Acoustic correlates of meaning structure in conversational speech. 1129-1132 - Nicolas Obin, Xavier Rodet, Anne Lacheret:

HMM-based prosodic structure model using rich linguistic context. 1133-1136 - Charlotte Wollermann, Bernhard Schröder, Ulrich Schade:

Audiovisual congruence and pragmatic focus marking. 1137-1140 - Margaret Zellers, Michele Gubian, Brechtje Post:

Redescribing intonational categories with functional data analysis. 1141-1144 - Shen Huang, Hongyan Li, Shijin Wang, Jiaen Liang, Bo Xu:

Exploring goodness of prosody by diverse matching templates. 1145-1148 - Mickael Rouvier, Richard Dufour, Georges Linarès, Yannick Estève:

A language-identification inspired method for spontaneous speech detection. 1149-1152 - Gérard Bailly, Amélie Lelong:

Speech dominoes and phonetic convergence. 1153-1156 - Mátyás Brendel, Riccardo Zaccarelli, Laurence Devillers:

A quick sequential forward floating feature selection algorithm for emotion detection from speech. 1157-1160 - Géza Kiss, Jan P. H. van Santen:

Automated vocal emotion recognition using phoneme class specific features. 1161-1164 - Adrian Pass, Jianguo Zhang, Darryl Stewart:

Feature selection for pose invariant lip biometrics. 1165-1168 - Hussein Hussein, Rüdiger Hoffmann:

Signal-based accent and phrase marking using the Fujisaki model. 1169-1172 - Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan:

A study of interplay between articulatory movement and prosodic characteristics in emotional speech production. 1173-1176
ASR: Feature Extraction I, II
- Shang-wen Li, Liang-Che Sun, Lin-Shan Lee:

Improved phoneme recognition by integrating evidence from spectro-temporal and cepstral features. 1177-1180 - Suman V. Ravuri, Nelson Morgan:

Using spectro-temporal features to improve AFE feature extraction for ASR. 1181-1184 - Ibon Saratxaga, Inma Hernáez, Igor Odriozola, Eva Navas, Iker Luengo, Daniel Erro:

Using harmonic phase information to improve ASR rate. 1185-1188 - Kazumasa Yamamoto, Eiichi Sueyoshi, Seiichi Nakagawa:

Speech recognition using long-term phase information. 1189-1192 - Jan Zelinka, Jan Trmal, Ludek Müller:

Low-dimensional space transforms of posteriors in speech recognition. 1193-1196 - Christian Plahl, Ralf Schlüter, Hermann Ney:

Hierarchical bottle neck features for LVCSR. 1197-1200 - Frantisek Grézl, Martin Karafiát:

Hierarchical neural net architectures for feature extraction in ASR. 1201-1204 - Vivek Kumar Rangarajan Sridhar, Rohit Prasad, Prem Natarajan:

Mutual information analysis for feature and sensor subset selection in surface electromyography based speech recognition. 1205-1208 - Bernd T. Meyer, Birger Kollmeier:

Learning from human errors: prediction of phoneme confusions based on modified ASR training. 1209-1212 - Bo Li, Khe Chai Sim:

Hidden logistic linear regression for support vector machine based phone verification. 2614-2617 - Tim Ng, Bing Zhang, Long Nguyen:

Jointly optimized discriminative features for speech recognition. 2618-2621 - Florian Müller, Alfred Mertins:

Invariant integration features combined with speaker-adaptation methods. 2622-2625 - Mark Raugas, Vivek Kumar Rangarajan Sridhar, Rohit Prasad, Prem Natarajan:

Multi resolution discriminative models for subvocalic speech recognition. 2626-2629 - Fabio Valente, Mathew Magimai-Doss, Christian Plahl, Suman V. Ravuri, Wen Wang:

A comparative large scale study of MLP features for Mandarin ASR. 2630-2633 - Cong-Thanh Do, Dominique Pastor, Gaël Le Lan, André Goalic:

Recognizing cochlear implant-like spectrally reduced speech with HMM-based ASR: experiments with MFCCs and PLP coefficients. 2634-2637
Speech Perception: Cross Language and Age
- Kazuhiro Kondo, Takayuki Kanda, Yosuke Kobayashi, Hiroyuki Yagyu:

Speech intelligibility of diagonally localized speech with competing noise using bone-conduction headphones. 1213-1216 - Pierre L. Divenyi:

Masking of vowel-analog transitions by vowel-analog distracters. 1217-1220 - François Pellegrino, Emmanuel Ferragne, Fanny Meunier:

2010, a speech oddity: phonetic transcription of reversed speech. 1221-1224 - Hsin-Yi Lin, Janice Fon:

Perception on pitch reset at discourse boundaries. 1225-1228 - Marjorie Dole, Michel Hoen, Fanny Meunier:

Effect of spatial separation on speech-in-noise comprehension in dyslexic adults. 1229-1232 - Ellen Marklund, Francisco Lacerda, Anna Ericsson:

Speech categorization context effects in seven- to nine-month-old infants. 1233-1236 - Diane Kewley-Port, Larry E. Humes, Daniel Fogerty:

Changes in temporal processing of speech across the adult lifespan. 1237-1240 - Jared Bernstein, Jian Cheng, Masanori Suzuki:

Fluency and structural complexity as predictors of L2 oral proficiency. 1241-1244 - Marco van de Ven, Benjamin V. Tucker, Mirjam Ernestus:

Semantic facilitation in bilingual everyday speech comprehension. 1245-1248 - Bo-ren Hsieh, Ho-hsien Pan:

L2 experience and non-native vowel categorization of L1-Mandarin speakers. 1249-1252 - Mirjam Wester:

Cross-lingual talker discrimination. 1253-1256 - Takashi Otake:

Dajare is not the lowest form of wit. 1257-1260
SLP Systems
- Rafael Torres, Shota Takeuchi, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano:

Comparison of methods for topic classification in a speech-oriented guidance system. 1261-1264 - Pere Comas, Jordi Turmo, Lluís Màrquez:

Using dependency parsing and machine learning for factoid question answering on spoken documents. 1265-1268 - Carolina Parada, Abhinav Sethy, Mark Dredze, Frederick Jelinek:

A spoken term detection framework for recovering out-of-vocabulary words using the web. 1269-1272 - Hung-yi Lee, Chia-Ping Chen, Ching-feng Yeh, Lin-Shan Lee:

Improved spoken term detection by discriminative training of acoustic models based on user relevance feedback. 1273-1276 - Sebastian Tschöpel, Daniel Schneider:

A lightweight keyword and tag-cloud retrieval algorithm for automatic speech recognition transcripts. 1277-1280 - Noboru Kanedera, Tetsuo Funada, Seiichi Nakagawa:

Lecture subtopic retrieval by retrieval keyword expansion using subordinate concept. 1281-1284 - Hiroaki Nanjo, Yusuke Iyonaga, Takehiko Yoshimi:

Spoken document retrieval for oral presentations integrating global document similarities into local document similarities. 1285-1288 - Joseph Polifroni, Stephanie Seneff:

Combining word-based features, statistical language models, and parsing for named entity recognition. 1289-1292 - Azeddine Zidouni, Sophie Rosset, Hervé Glotin:

Efficient combined approach for named entity recognition in spoken language. 1293-1296 - Sree Harsha Yella, Vasudeva Varma, Kishore Prahallad:

Prominence based scoring of speech segments for automatic speech-to-speech summarization. 1297-1300 - Zihan Liu, Lei Xie, Wei Feng:

Maximum lexical cohesion for fine-grained news story segmentation. 1301-1304 - Xiaoxuan Wang, Lei Xie, Bin Ma, Engsiong Chng, Haizhou Li:

Phoneme lattice based TextTiling towards multilingual story segmentation. 1305-1308
Quality of Experiencing Speech Services (Special Session)
- Anton Schlesinger, Marinus M. Boone:

The characterization of the relative information content by spectral features for the objective intelligibility assessment of nonlinearly processed speech. 1309-1312 - Marcel Wältermann, Alexander Raake, Sebastian Möller:

Analytical assessment and distance modeling of speech transmission quality. 1313-1316 - Nicolas Côté, Vincent Koehl, Valérie Gautier-Turbin, Alexander Raake, Sebastian Möller:

An intrusive super-wideband speech quality model: DIAL. 1317-1320 - Sebastian Egger, Raimund Schatz, Stefan Scherer:

It takes two to tango - assessing the impact of delay on conversational interactivity on perceived speech quality. 1321-1324 - Sebastian Möller, Florian Hinterleitner, Tiago H. Falk, Tim Polzehl:

Comparison of approaches for instrumentally predicting the quality of text-to-speech systems. 1325-1328 - Imre Kiss, Joseph Polifroni, Chao Wang, Ghinwa F. Choueiter, Mike Phillips:

A hybrid architecture for mobile voice user interfaces. 1329-1332 - Markku Turunen, Jaakko Hakulinen, Tomi Heimonen:

Assessment of spoken and multimodal applications: lessons learned from laboratory and field studies. 1333-1336 - Klaus-Peter Engelbrecht, Hamed Ketabdar, Sebastian Möller:

Improving cross database prediction of dialogue quality using mixture of experts. 1337-1340
Language Processing
- Camille Guinaudeau, Guillaume Gravier, Pascale Sébillot:

Improving ASR-based topic segmentation of TV programs with confidence measures and semantic relations. 1365-1368 - Saturnino Luz, Jing Su:

The relevance of timing, pauses and overlaps in dialogues: detecting topic changes in scenario based meetings. 1369-1372 - Richard Dufour, Benoît Favre:

Semi-supervised part-of-speech tagging in speech applications. 1373-1376 - Frédéric Tantini, Christophe Cerisara, Claire Gardent:

Memory-based active learning for French broadcast news. 1377-1380 - Dan Gillick:

Can conversational word usage be used to predict speaker demographics? 1381-1384 - Chao-Hong Liu, Chung-Hsien Wu:

Prosodic word-based error correction in speech recognition using prosodic word expansion and contextual information. 1385-1388
Speech and Audio Segmentation
- Sarah Hoffmann, Beat Pfister:

Fully automatic segmentation for prosodic speech corpora. 1389-1392 - Vahid Khanagha, Khalid Daoudi, Oriol Pont, Hussein M. Yahia:

A novel text-independent phonetic segmentation algorithm based on the microcanonical multiscale formalism. 1393-1396 - You-Yu Lin, Yih-Ru Wang, Yuan-Fu Liao:

Phone boundary detection using sample-based acoustic parameters. 1397-1400 - Utpala Musti, Asterios Toutios, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, Marie-Odile Berger:

HMM-based automatic visual speech segmentation using facial data. 1401-1404 - David Wang, Robert Vogt, Sridha Sridharan:

Bayes factor based speaker segmentation for speaker diarization. 1405-1408 - Qiang Huang, Stephen J. Cox:

Using high-level information to detect key audio events in a tennis game. 1409-1412
Prosody: Analysis
- Catherine Lai:

What do you mean, you're uncertain?: the interpretation of cue words and rising intonation in dialogue. 1413-1416 - Yi-Fen Liu, Shu-Chuan Tseng, Jyh-Shing Roger Jang, C.-H. Alvin Chen:

Coping imbalanced prosodic unit boundary detection with linguistically-motivated prosodic features. 1417-1420 - Zhigang Chen, Guoping Hu, Wei Jiang:

Improving prosodic phrase prediction by unsupervised adaptation and syntactic features extraction. 1421-1424 - Yujia Li, Tan Lee:

Perception-based automatic approximation of F0 contours in Cantonese speech. 1425-1428 - Raul Fernandez, Bhuvana Ramabhadran:

Discriminative training and unsupervised adaptation for labeling prosodic events with limited training data. 1429-1432 - Erin Cvejic

, Jeesun Kim, Chris Davis, Guillaume Gibert:
Prosody for the eyes: quantifying visual prosody using guided principal component analysis. 1433-1436
Systems for LVCSR and Rich Transcription
- Naveen Parihar, Ralf Schlüter, David Rybach, Eric A. Hansen:

Parallel lexical-tree based LVCSR on multi-core processors. 1485-1488 - Jike Chong, Ekaterina Gonina, Kisun You, Kurt Keutzer:

Exploring recognition network representations for efficient speech inference on highly parallel platforms. 1489-1492 - Diamantino Caseiro:

WFST compression for automatic speech recognition. 1493-1496 - Ivan Bulyko:

Speech recognizer optimization under speed constraints. 1497-1500 - Florian Metze, Roger Hsiao, Qin Jin, Udhyakumar Nallasamy, Tanja Schultz:

The 2010 CMU GALE speech-to-text system. 1501-1504 - Tin Lay Nwe, Hanwu Sun, Bin Ma, Haizhou Li:

Speaker diarization in meeting audio for single distant microphone. 1505-1508 - Fernando Batista, Helena Moniz, Isabel Trancoso, Hugo Meinedo, Ana Isabel Mata, Nuno J. Mamede:

Extending the punctuation module for European Portuguese. 1509-1512 - Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:

Utilizing a noisy-channel approach for Korean LVCSR. 1513-1516 - Markus Nußbaum-Thom, Simon Wiesler, Martin Sundermeyer, Christian Plahl, Stefan Hahn, Ralf Schlüter, Hermann Ney:

The RWTH 2009 Quaero ASR evaluation system for English and German. 1517-1520
Phonetics
- Benjamin Munson, Renata Solum:

When is indexical information about speech activated? Evidence from a cross-modal priming experiment. 1521-1524 - Benjamin Munson:

The influence of actual and perceived sexual orientation on diadochokinetic rate in women and men. 1525-1528 - Kristine M. Yu:

Laryngealization and features for Chinese tonal recognition. 1529-1532 - Viet Son Nguyen, Eric Castelli, René Carré:

Production and perception of Vietnamese short vowels in V1V2 context. 1533-1536 - Gertraud Fenk-Oczlon, August Fenk:

Measuring basic tempo across languages and some implications for speech rhythm. 1537-1540 - Yukari Hirata, Shigeaki Amano:

Durational structure of Japanese single/geminate stops in three- and four-mora words spoken at varied rates. 1541-1544 - Shin-ichiro Sano, Tomohiko Ooigawa:

Distribution and trichotomic realization of voiced velars in Japanese - an experimental study. 1545-1548 - Jagoda Sieczkowska, Bernd Möbius, Grzegorz Dogil:

Specification in context - devoicing processes in Polish, French, American English and German sonorants. 1549-1552 - Kuniko Y. Nielsen:

Phonetic imitation of Japanese vowel devoicing. 1553-1556 - Mary Stevens, John Hajek:

Post-aspiration in standard Italian: some first cross-regional acoustic evidence. 1557-1560 - Mirko Grimaldi, Andrea Calabrese, Francesco Sigona, Luigia Garrapa, Bianca Sisinni:

Articulatory grounding of southern Salentino harmony processes. 1561-1564 - Yuuki Tanida, Taiji Ueno, Satoru Saito, Matthew A. Lambon Ralph:

Effects of accent typicality and phonotactic frequency on nonword immediate serial recall performance in Japanese. 1565-1567 - Osamu Fujimura:

How abstract is phonetics? 1568-1571
Speech Production: Vocal Tract Modeling and Imaging
- Adam C. Lammert, Michael I. Proctor, Shrikanth S. Narayanan:

Data-driven analysis of realtime vocal tract MRI using correlated image regions. 1572-1575 - Michael I. Proctor, Daniel Bone, Athanasios Katsamanis, Shrikanth S. Narayanan:

Rapid semi-automatic segmentation of real-time magnetic resonance images for parametric vocal tract analysis. 1576-1579 - Yoon-Chul Kim, Shrikanth S. Narayanan, Krishna S. Nayak:

Improved real-time MRI of oral-velar coordination using a golden-ratio spiral view order. 1580-1583 - Erik Bresch, Athanasios Katsamanis, Louis Goldstein, Shrikanth S. Narayanan:

Statistical multi-stream modeling of real-time MRI articulatory speech data. 1584-1587 - Gopal Ananthakrishnan, Pierre Badin, Julián Andrés Valdés Vargas, Olov Engwall:

Predicting unseen articulations from multi-speaker articulatory models. 1588-1591 - Chao Qin, Miguel Á. Carreira-Perpiñán:

Estimating missing data sequences in x-ray microbeam recordings. 1592-1595 - Chao Qin, Miguel Á. Carreira-Perpiñán, Mohsen Farhadloo:

Adaptation of a tongue shape model by local feature transformations. 1596-1599 - Sungbok Lee, Shrikanth S. Narayanan:

Vocal tract contour analysis of emotional speech by the functional data curve representation. 1600-1603 - Adam C. Lammert, Louis Goldstein, Khalil Iskarous:

Locally-weighted regression for estimating the forward kinematics of a geometric vocal tract model. 1604-1607 - Michael Reimer, Frank Rudzicz:

Identifying articulatory goals from kinematic data using principal differential analysis. 1608-1611 - Zuheng Ming, Denis Beautemps, Gang Feng, Sébastien Schmerber:

Estimation of speech lip features from discrete cosine transform. 1612-1615 - Farzaneh Ahmadi, Ian Vince McLoughlin, Hamid R. Sharifzadeh:

Autoregressive modelling for linear prediction of ultrasonic speech. 1616-1619
Speech Intelligibility Enhancement for All Ages, Health Conditions and Environments (Special Session)
- Takayuki Arai, Nao Hodoshima:

Enhanced speech yielding higher intelligibility for all listeners and environments. 1620-1623 - Seyed Omid Sadjadi, Sanjay A. Patil, John H. L. Hansen:

Quality conversion of non-acoustic signals for facilitating human-to-human speech communication under harsh acoustic conditions. 1624-1627 - Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:

The use of air-pressure sensor in electrolaryngeal speech enhancement based on statistical voice conversion. 1628-1631 - Gibak Kim, Philipos C. Loizou:

A new binary mask based on noise constraints for improved speech intelligibility. 1632-1635 - Yan Tang, Martin Cooke:

Energy reallocation strategies for speech enhancement in known noise conditions. 1636-1639 - Jing Chen, Thomas Baer, Brian C. J. Moore:

Effects of enhancement of spectral changes on speech quality and subjective speech intelligibility. 1640-1643
ASR: Acoustic Model Adaptation
- Catherine Breslin, K. K. Chin, Mark J. F. Gales, Kate M. Knill, Haitian Xu:

Prior information for rapid speaker adaptation. 1644-1647 - Jonas Lööf, Ralf Schlüter, Hermann Ney:

Discriminative adaptation for log-linear acoustic models. 1648-1651 - Dimitra Vergyri, Lori Lamel, Jean-Luc Gauvain:

Automatic speech recognition of multiple accented English data. 1652-1655 - Jinyu Li, Yu Tsao, Chin-Hui Lee:

Shrinkage model adaptation in automatic speech recognition. 1656-1659 - Jinyu Li, Dong Yu, Yifan Gong, Li Deng:

Unscented transform with online distortion estimation for HMM adaptation. 1660-1663 - Michael L. Seltzer, Alex Acero:

HMM adaptation using linear spline interpolation with integrated spline parameter training for robust speech recognition. 1664-1667
SLP Systems for Information Extraction/Retrieval
- Dong Wang, Simon King, Nicholas W. D. Evans, Raphaël Troncy:

CRF-based stochastic pronunciation modeling for out-of-vocabulary spoken term detection. 1668-1671 - Chia-Ping Chen, Hung-yi Lee, Ching-feng Yeh, Lin-Shan Lee:

Improved spoken term detection by feature space pseudo-relevance feedback. 1672-1675 - Aren Jansen, Kenneth Church, Hynek Hermansky:

Towards spoken term discovery at scale with zero resources. 1676-1679 - Evandro B. Gouvêa, Tony Ezzat:

Vocabulary independent spoken query: a case for subword units. 1680-1683 - Shih-Hsiang Lin, Yao-Ming Yeh, Berlin Chen:

Extractive speech summarization - from the view of decision theory. 1684-1687 - Gabriel Murray, Giuseppe Carenini, Raymond T. Ng:

The impact of ASR on abstractive vs. extractive meeting summaries. 1688-1691
Speech Representation
- Li Deng, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, Geoffrey E. Hinton:

Binary coding of speech spectrograms using a deep auto-encoder. 1692-1695 - Juhan Nam, Gautham J. Mysore, Joachim Ganseman, Kyogu Lee, Jonathan S. Abel:

A super-resolution spectrogram using coupled PLCA. 1696-1699 - Georgios Tzedakis, Yannis Pantazis, Olivier Rosec, Yannis Stylianou:

Fast least-squares solution for sinusoidal, harmonic and quasi-harmonic models. 1700-1703 - Afsaneh Asaei, Hervé Bourlard, Philip N. Garner:

Sparse component analysis for speech recognition in multi-speaker environment. 1704-1707 - Trond Skogstad, Torbjørn Svendsen:

Intra-frame variability as a predictor of frame classifiability. 1708-1711 - Tetsuya Shimamura, Ngoc Dinh Nguyen:

Autocorrelation and double autocorrelation based spectral representations for a noisy word recognition system. 1712-1715
Voice Conversion
- Elina Helander, Hanna Silén, Joaquín Míguez, Moncef Gabbouj:

Maximum a posteriori voice conversion using sequential Monte Carlo methods. 1716-1719 - Pierre Lanchantin, Xavier Rodet:

Dynamic model selection for spectral voice conversion. 1720-1723 - Takashi Nose, Takao Kobayashi:

Speaker-independent HMM-based voice conversion using quantized fundamental frequency. 1724-1727 - Daisuke Saito, Shinji Watanabe, Atsushi Nakamura, Nobuaki Minematsu:

Probabilistic integration of joint density model and speaker model for voice conversion. 1728-1731 - Zhizheng Wu, Tomi Kinnunen, Engsiong Chng, Haizhou Li:

Text-independent F0 transformation with non-parallel data for voice conversion. 1732-1735 - Xiaodan Zhuang, Lijuan Wang, Frank K. Soong, Mark Hasegawa-Johnson:

A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion. 1736-1739
Prosody: Language-Specific Models
- Anastasia Karlsson, David House, Jan-Olof Svantesson, Damrong Tayanin:

Influence of lexical tones on intonation in Kammu. 1740-1743 - Satoshi Nambu, Yong-cheol Lee:

Phonetic realization of second occurrence focus in Japanese. 1744-1747 - Jianjing Kuang:

Prosodic grouping and relative clause disambiguation in Mandarin. 1748-1751 - Ya Li, Jianhua Tao, Meng Zhang, Shifeng Pan, Xiaoying Xu:

Text-based unstressed syllable prediction in Mandarin. 1752-1755 - Tomás Dubeda:

"Flat pitch accents" in Czech. 1756-1759 - Tomás Dubeda:

Positional variability of pitch accents in Czech. 1760-1763 - Shyamal Kr. Das Mandal, Arup Saha, Tulika Basu, Keikichi Hirose, Hiroya Fujisaki:

Modeling of sentence-medial pauses in Bangla readout speech: occurrence and duration. 1764-1767 - Adrian Leemann, Lucy Zuberbühler:

Declarative sentence intonation patterns in 8 Swiss German dialects. 1768-1771 - Je Hun Jeon, Yang Liu:

Syllable-level prominence detection with acoustic evidence. 1772-1775 - Sankalan Prasad, Kalika Bali:

Prosody cues for classification of the discourse particle "hã" in Hindi. 1776-1779 - Yuan Jia, Aijun Li:

Interaction of syntax-marked focus and wh-question induced focus in standard Chinese. 1780-1783 - Samer Al Moubayed, Jonas Beskow:

Prominence detection in Swedish using syllable correlates. 1784-1787 - Na Zhi, Daniel Hirst, Pier Marco Bertinetto:

Automatic analysis of the intonation of a tone language. Applying the Momel algorithm to spontaneous standard Chinese (Beijing). 1788-1791 - Raymond W. M. Ng, Cheung-Chi Leung, Ville Hautamäki, Tan Lee, Bin Ma, Haizhou Li:

Towards long-range prosodic attribute modeling for language recognition. 1792-1795 - Robert Schubert, Oliver Jokisch, Diane Hirschfeld:

A modified parameterization of the Fujisaki model. 1796-1799
ASR: Language Modeling and Speech Understanding I
- Saeedeh Momtazi, Friedrich Faubel, Dietrich Klakow:

Within and across sentence boundary language model. 1800-1803 - Ruhi Sarikaya, Stanley F. Chen, Abhinav Sethy, Bhuvana Ramabhadran:

Impact of word classing on shrinkage-based language models. 1804-1807 - Stanislas Oger, Vladimir Popescu, Georges Linarès:

Combination of probabilistic and possibilistic language models. 1808-1811 - Brandon Ballinger, Cyril Allauzen, Alexander Gruenstein, Johan Schalkwyk:

On-demand language model interpolation for mobile speech input. 1812-1815 - Tim Schlippe, Chenfei Zhu, Jan Gebhardt, Tanja Schultz:

Text normalization based on statistical machine translation and internet user support. 1816-1819 - Tanel Alumäe, Mikko Kurimo:

Efficient estimation of maximum entropy language models with n-gram features: an SRILM extension. 1820-1823 - Christian Gillot, Christophe Cerisara, David Langlois, Jean Paul Haton:

Similar n-gram language model. 1824-1827 - Markpong Jongtaveesataporn, Sadaoki Furui:

Topic and style-adapted language modeling for Thai broadcast news ASR. 1828-1831 - Ahmad Emami, Hong-Kwang Jeff Kuo, Imed Zitouni, Lidia Mangu:

Augmented context features for Arabic speech recognition. 1832-1835 - Lucía Ortega, Isabel Galiano, Lluís F. Hurtado, Emilio Sanchis, Encarna Segarra:

A statistical segment-based approach for spoken language understanding. 1836-1839 - Benjamin Lecouteux, Raphaël Rubino, Georges Linarès:

Improving back-off models with bag of words and hollow-grams. 2418-2421 - Ciprian Chelba, Thorsten Brants, Will Neveitt, Peng Xu:

Study on interaction between entropy pruning and Kneser-Ney smoothing. 2422-2425 - Hitoshi Yamamoto, Ken Hanazawa, Kiyokazu Miki, Koichi Shinoda:

Dynamic language model adaptation using keyword category classification. 2426-2429 - Welly Naptali, Masatoshi Tsuchiya, Seiichi Nakagawa:

Integration of cache-based model and topic dependent class model with soft clustering and soft voting. 2430-2433 - Frédéric Duvert, Renato de Mori:

Conditional models for detecting lambda-functions in a spoken language understanding system. 2434-2437 - Md. Akmal Haidar, Douglas D. O'Shaughnessy:

Novel weighting scheme for unsupervised language model adaptation using latent Dirichlet allocation. 2438-2441 - Qun Feng Tan, Kartik Audhkhasi, Panayiotis G. Georgiou, Emil Ettelaie, Shrikanth S. Narayanan:

Automatic speech recognition system channel modeling. 2442-2445 - Takanobu Oba, Takaaki Hori, Atsushi Nakamura:

Round-robin discrimination model for reranking ASR hypotheses. 2446-2449 - Hasim Sak, Murat Saraclar, Tunga Güngör:

On-the-fly lattice rescoring for real-time automatic speech recognition. 2450-2453
First and Second Language Acquisition
- Angela Cooper, Yue Wang:

Cantonese tone word learning by tone and non-tone language speakers. 1840-1843 - Anne Cutler, Janise Shanley:

Validation of a training method for L2 continuous-speech segmentation. 1844-1847 - Jiahong Yuan:

Linguistic rhythm in foreign accent. 1848-1849 - Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka:

The effect of a word embedded in a sentence and speaking rate variation on the perceptual training of geminate and singleton consonant distinction. 1850-1853 - Chiharu Tsurutani:

Foreign accent matters most when timing is wrong. 1854-1857 - Hyejin Hong, Jina Kim, Minhwa Chung:

Effects of Korean learners' consonant cluster reduction strategies on English speech recognition performance. 1858-1861 - June S. Levitt, William F. Katz:

The effects of EMA-based augmented visual feedback on the English speakers' acquisition of the Japanese flap: a perceptual study. 1862-1865 - Hinako Masuda, Takayuki Arai:

Perception of voiceless fricatives by Japanese listeners of advanced and intermediate level English proficiency. 1866-1869 - Lya Meister, Einar Meister:

Perception of Estonian vowel categories by native and non-native speakers. 1870-1873 - Qin Shi, Kun Li, Shilei Zhang, Stephen M. Chu, Ji Xiao, Zhijian Ou:

Spoken English assessment system for non-native speakers using acoustic and prosodic features. 1874-1877 - Elena E. Lyakso, Olga V. Frolova, Anna V. Kurazhova, Julia S. Gaikova:

Russian infants and children's sounds and speech corpuses for language acquisition studies. 1878-1881 - Julia Monnin, Hélène Loevenbruck:

Language-specific influence on phoneme development: French and Drehu data. 1882-1885 - Jeffrey J. Holliday, Mary E. Beckman, Chanelle Mays:

Did you say susi or shushi? Measuring the emergence of robust fricative contrasts in English- and Japanese-acquiring children. 1886-1889
Spoken Language Resources, Systems and Evaluation I, II
- Josef R. Novak, Paul R. Dixon, Sadaoki Furui:

An empirical comparison of the t3, juicer, HDecode and sphinx3 decoders. 1890-1893 - Philip N. Garner, John Dines:

Tracter: a lightweight dataflow framework. 1894-1897 - Marelie H. Davel, Febe de Wet:

Verifying pronunciation dictionaries using conflict analysis. 1898-1901 - Brandon Roy, Soroush Vosoughi, Deb Roy:

Automatic estimation of transcription accuracy and difficulty. 1902-1905 - Benjamin Lambert, Rita Singh, Bhiksha Raj:

Creating a linguistic plausibility dataset with non-expert annotators. 1906-1909 - Xinhui Hu, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:

Construction and evaluations of an annotated Chinese conversational corpus in travel domain for the language model of speech recognition. 1910-1913 - Thad Hughes, Kaisuke Nakajima, Linne Ha, Atul Vasu, Pedro J. Moreno, Mike LeBeau:

Building transcribed speech corpora quickly and cheaply for many languages. 1914-1917 - Heidi Christensen, Jon Barker, Ning Ma, Phil D. Green:

The CHiME corpus: a resource and a challenge for computational hearing in multisource environments. 1918-1921 - Wen Cao, Dongning Wang, Jinsong Zhang, Ziyu Xiong:

Developing a Chinese L2 speech database of Japanese learners with narrow-phonetic labels for computer assisted pronunciation training. 1922-1925 - Shogo Ishikawa, Shinya Kiriyama, Yoichi Takebayashi, Shigeyoshi Kitazawa:

How children acquire situation understanding skills?: a developmental analysis utilizing multimodal speech behavior corpus. 1926-1929 - Ina Wechsung, Stefan Schaffer, Robert Schleicher, Anja Naumann, Sebastian Möller:

The influence of expertise and efficiency on modality selection strategies and perceived mental effort. 1930-1933 - Christine Kühnel, Benjamin Weiss, Sebastian Möller:

Parameters describing multimodal interaction - definitions and three usage scenarios. 1934-1937 - Alexander Zgorzelski, Alexander Schmitt, Tobias Heinroth, Wolfgang Minker:

Repair strategies on trial: which error recovery do users like best? 1938-1941 - Maryam Kamvar, Doug Beeferman:

Say what? Why users choose to speak their web queries. 1966-1969 - Jonathan Teutenberg, Catherine Inez Watson:

The effect of audience familiarity on the perception of modified accent. 1970-1973 - Korin Richmond, Robert A. J. Clark, Susan Fitt:

On generating Combilex pronunciations via morphological analysis. 1974-1977 - Florian Gödde, Sebastian Möller:

Say it as you mean it - analyzing free user comments in the VOICE awards corpus. 1978-1981 - Viktor Rozgic, Bo Xiao, Athanasios Katsamanis, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth S. Narayanan:

A new multichannel multimodal dyadic interaction database. 1982-1985 - Dau-Cheng Lyu, Tien Ping Tan, Engsiong Chng, Haizhou Li:

SEAME: a Mandarin-English code-switching speech corpus in South-East Asia. 1986-1989
Speech Production: Analysis
- Daniel Felps, Christian Geng, Michael Berger, Korin Richmond, Ricardo Gutierrez-Osuna:

Relying on critical articulators to estimate vocal tract spectra in an articulatory-acoustic database. 1990-1993 - Vikram Ramanarayanan, Dani Byrd, Louis Goldstein, Shrikanth S. Narayanan:

Investigating articulatory setting - pauses, ready position, and rest - using real-time MRI. 1994-1997 - Chao Qin, Miguel Á. Carreira-Perpiñán:

Articulatory inversion of American English /ɹ/ by conditional density modes. 1998-2001 - Atef Ben Youssef, Pierre Badin, Gérard Bailly:

Can tongue be recovered from face? the answer of data-driven statistical models. 2002-2005 - Francisco Torreira, Mirjam Ernestus:

Phrase-medial vowel devoicing in spontaneous French. 2006-2009 - Chierh Cheng, Yi Xu, Michele Gubian:

Exploring the mechanism of tonal contraction in Taiwan Mandarin. 2010-2013
Paralanguage Cognition
- Benjamin Weiss, Felix Burkhardt:

Voice attributes affecting likability perception. 2014-2017 - Kristiina Jokinen, Kazuaki Harada, Masafumi Nishida, Seiichi Yamamoto:

Turn-alignment using eye-gaze and speech in conversational interaction. 2018-2021 - Tet Fei Yap, Julien Epps, Eliathamby Ambikairajah, Eric H. C. Choi:

An investigation of formant frequencies for cognitive load classification. 2022-2025 - Martijn Goudbeek, Mirjam Broersma:

Language specific effects of emotion on phoneme duration. 2026-2029 - Matthew Black, Athanasios Katsamanis, Chi-Chun Lee, Adam C. Lammert, Brian R. Baucom, Andrew Christensen, Panayiotis G. Georgiou, Shrikanth S. Narayanan:

Automatic classification of married couples' behavior using audio features. 2030-2033 - Gideon Kowadlo, Patrick Ye, Ingrid Zukerman:

Influence of gestural salience on the interpretation of spoken requests. 2034-2037
Robust ASR Against Noise
- Vikramjit Mitra, Hosung Nam, Carol Y. Espy-Wilson, Elliot Saltzman, Louis Goldstein:

Robust word recognition using articulatory trajectories and gestures. 2038-2041 - Takeshi Yamada, Tomohiro Nakajima, Nobuhiko Kitawaki, Shoji Makino:

Performance estimation of noisy speech recognition considering recognition task complexity. 2042-2045 - Friedrich Faubel, Dietrich Klakow:

Estimating noise from noisy speech features with a monte carlo variant of the expectation maximization algorithm. 2046-2049 - Satoshi Tamura, Eriko Hishikawa, Wataru Taguchi, Satoru Hayamizu:

Template-based spectral estimation using microphone array for speech recognition. 2050-2053 - Aleem Mushtaq, Yu Tsao, Chin-Hui Lee:

A particle filter feature compensation approach to robust speech recognition. 2054-2057 - Chanwoo Kim, Richard M. Stern:

Nonlinear enhancement of onset for robust speech recognition. 2058-2061 - Shirin Badiezadegan, Richard C. Rose:

Mask estimation in non-stationary noise environments for missing feature based robust speech recognition. 2062-2065 - Lae-Hoon Kim, Kyung-Tae Kim, Mark Hasegawa-Johnson:

Robust automatic speech recognition with decoder oriented ideal binary mask estimation. 2066-2069 - Gökhan Ince, Kazuhiro Nakadai, Tobias Rodemann, Hiroshi Tsujino, Jun-ichi Imura:

A robust speech recognition system against the ego noise of a robot. 2070-2073 - Kuo-Hao Wu, Chia-Ping Chen:

Empirical mode decomposition for noise-robust automatic speech recognition. 2074-2077 - Wooil Kim, Jun-Won Suh, John H. L. Hansen:

An effective feature compensation scheme tightly matched with speech recognizer employing SVM-based GMM generation. 2078-2081 - Jort F. Gemmeke, Tuomas Virtanen:

Artificial and online acquired noise dictionaries for noise robust ASR. 2082-2085 - Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda:

Voice activity detection based on conditional random fields using multiple features. 2086-2089 - Yong Zhao, Biing-Hwang Juang:

A comparative study of noise estimation algorithms for VTS-based robust speech recognition. 2090-2093 - Frank Seide, Pei Zhao:

On using missing-feature theory with cepstral features - approximations to the multivariate integral. 2094-2097 - Yang Sun, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves:

Using a DBN to integrate sparse classification and GMM-based ASR. 2098-2101
Voice Conversion and Speech Synthesis
- Axel Röbel:

Shape-invariant speech transformation with the phase vocoder. 2146-2149 - Kayoko Yanagisawa, Mark A. Huckvale:

A phonetic alternative to cross-language voice conversion in a text-dependent context: evaluation of speaker identity. 2150-2153 - Esther Klabbers, Alexander Kain, Jan P. H. van Santen:

Evaluation of speaker mimic technology for personalizing SGD voices. 2154-2157 - Kumi Ohta, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano:

Adaptive voice-quality control based on one-to-many eigenvoice conversion. 2158-2161 - Fernando Villavicencio, Jordi Bonada:

Applying voice conversion to concatenative singing-voice synthesis. 2162-2165 - Miaomiao Wang, Miaomiao Wen, Keikichi Hirose, Nobuaki Minematsu:

Improved generation of fundamental frequency in HMM-based speech synthesis using generation process model. 2166-2169 - Ming Lei, Yi-Jian Wu, Frank K. Soong, Zhen-Hua Ling, Li-Rong Dai:

A hierarchical F0 modeling method for HMM-based speech synthesis. 2170-2173 - Javier Latorre, Mark J. F. Gales, Heiga Zen:

Training a parametric-based logF0 model with the minimum generation error criterion. 2174-2177 - Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, Nobuaki Minematsu:

Improving Mandarin segmental duration prediction with automatically extracted syntax features. 2178-2181 - Daniel R. van Niekerk, Etienne Barnard:

An intonation model for TTS in Sepedi. 2182-2185 - Michael Pucher, Dietmar Schabus, Junichi Yamagishi:

Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners. 2186-2189 - Gabriel Webster, Sacha Krstulovic, Kate M. Knill:

A comparison of pronunciation modeling approaches for HMM-TTS. 2190-2193 - Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi:

HMM-based text-to-articulatory-movement prediction and analysis of critical articulators. 2194-2197
Detection, Classification, and Segmentation
- Jiaxing Ye, Takumi Kobayashi, Tetsuya Higuchi:

Audio-based sports highlight detection by Fourier local auto-correlations. 2198-2201 - Hynek Boril, Abhijeet Sangwan, Taufiq Hasan, John H. L. Hansen:

Automatic excitement-level detection for sports highlights generation. 2202-2205 - Jörg-Hendrik Bach, Jörn Anemüller:

Detecting novel objects in acoustic scenes through classifier incongruence. 2206-2209 - Stavros Ntalampiras, Ilyas Potamitis, Nikos Fakotakis:

A multidomain approach for automatic home environmental sound classification. 2210-2213 - Patrick Cardinal, Vishwa Gupta, Gilles Boulianne:

Content-based advertisement detection. 2214-2217 - Stavros Ntalampiras, Ilyas Potamitis, Nikos Fakotakis:

Identification of abnormal audio events based on probabilistic novelty detection. 2218-2221 - Norbert Braunschweiler, Mark J. F. Gales, Sabine Buchholz:

Lightly supervised recognition for automatic alignment of large coherent speech recordings. 2222-2225 - Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman:

Incremental diarization of telephone conversations. 2226-2229 - Srikanth Cherla, V. Ramasubramanian:

Audio analytics by template modeling and 1-pass DP based decoding. 2230-2233 - Mariusz Ziólko, Jakub Galka, Bartosz Ziólko, Tomasz Drwiega:

Perceptual wavelet decomposition for speech segmentation. 2234-2237 - Venkatesh Keri, Kishore Prahallad:

A comparative study of constrained and unconstrained approaches for segmentation of speech signal. 2238-2241 - Morgan Sonderegger, Joseph Keshet:

Automatic discriminative measurement of voice onset time. 2242-2245 - Yi Ren Leng, Tran Huy Dat, Norihide Kitaoka, Haizhou Li:

Selective gammatone filterbank feature for robust sound event recognition. 2246-2249
Compressive Sensing for Speech and Language Processing (Special Session)
- Allen Y. Yang, Zihan Zhou, Yi Ma, Shankar Sastry:

Towards a robust face recognition system using compressive sensing. 2250-2253 - Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky, Abhinav Sethy:

Sparse representation features for speech recognition. 2254-2257 - Abhinav Sethy, Tara N. Sainath, Bhuvana Ramabhadran, Dimitri Kanevsky:

Data selection for language modeling using sparse representations. 2258-2261 - Jort F. Gemmeke, Ulpu Remes, Kalle J. Palomäki:

Observation uncertainty measures for sparse imputation. 2262-2265 - Tara N. Sainath, Sameer Maskey, Dimitri Kanevsky, Bhuvana Ramabhadran, David Nahamoo, Julia Hirschberg:

Sparse representations for text categorization. 2266-2269 - Garimella S. V. S. Sivaram, Sriram Ganapathy, Hynek Hermansky:

Sparse auto-associative neural networks: theory and application to speech recognition. 2270-2273
ASR: Lexical and Pronunciation Modeling
- Chi Hu, Xiaodan Zhuang, Mark Hasegawa-Johnson:

FSM-based pronunciation modeling using articulatory phonological code. 2274-2277 - Denis Jouvet, Dominique Fohr, Irina Illina:

Detailed pronunciation variant modeling for speech transcription. 2278-2281 - Line Adde, Bert Réveil, Jean-Pierre Martens, Torbjørn Svendsen:

A minimum classification error approach to pronunciation variation modeling of non-native proper names. 2282-2285 - Antoine Laurent, Sylvain Meignier, Téva Merlin, Paul Deléglise:

Acoustics-based phonetic transcription method for proper nouns. 2286-2289 - Tim Schlippe, Sebastian Ochs, Tanja Schultz:

Wiktionary as a source for automatic pronunciation extraction. 2290-2293 - Ibrahim Badr, Ian McGraw, James R. Glass:

Learning new word pronunciations from spoken examples. 2294-2297
Speaker Recognition and Diarization
- I-Fan Chen, Shih-Sian Cheng, Hsin-Min Wang:

Phonetic subspace mixture model for speaker diarization. 2298-2301 - Martin Zelenák, Carlos Segura, Javier Hernando:

Overlap detection for speaker diarization by fusing spectral and spatial features. 2302-2305 - Alfred Dielmann, Giulia Garau, Hervé Bourlard:

Floor holder detection and end of speaker turn prediction in meetings. 2306-2309 - Carlos Vaquero, Alfonso Ortega, Jesús Antonio Villalba López, Antonio Miguel, Eduardo Lleida:

Confidence measures for speaker segmentation and their relation to speaker verification. 2310-2313 - Anthony Larcher, Christophe Lévy, Driss Matrouf, Jean-François Bonastre:

Decoupling session variability modelling and speaker characterisation. 2314-2317 - Cheung-Chi Leung, Donglai Zhu, Kong-Aik Lee, Bin Ma, Haizhou Li:

Incorporating MAP estimation and covariance transform for SVM based speaker recognition. 2318-2321
Speech and Audio Classification
- Stéphane Rossignol, Olivier Pietquin:

Single-speaker/multi-speaker co-channel speech classification. 2322-2325 - Oriol Vinyals, Gerald Friedland, Nelson Morgan:

Discriminative training for hierarchical clustering in speaker diarization. 2326-2329 - Jürgen T. Geiger, Frank Wallhoff, Gerhard Rigoll:

GMM-UBM based open-set online speaker diarization. 2330-2333 - Ladan Golipour, Douglas D. O'Shaughnessy:

A segment-based non-parametric approach for monophone recognition. 2334-2337 - Taras Butko, Climent Nadeu:

A fast one-pass-training feature selection technique for GMM-based acoustic event detection with audio-visual data. 2338-2341 - Nobuhide Yamakawa, Tetsuro Kitahara, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:

Effects of modelling within- and between-frame temporal variations in power spectra on non-verbal sound recognition. 2342-2345
Emotion Recognition
- Ling He, Margaret Lech, Nicholas B. Allen:

On the importance of glottal flow spectral energy for the recognition of emotions in speech. 2346-2349 - Laurence Devillers, Christophe Vaudable, Clément Chastagnol:

Real-life emotion-related states detection in call centers: a cross-corpora study. 2350-2353 - Ali Hassan, Robert I. Damper:

Multi-class and hierarchical SVMs for emotion recognition. 2354-2357 - David Philippou-Hübner, Bogdan Vlasenko, Tobias Grosser, Andreas Wendemuth:

Determining optimal features for emotion recognition from speech by applying an evolutionary algorithm. 2358-2361 - Martin Wöllmer, Angeliki Metallinou, Florian Eyben, Björn W. Schuller, Shrikanth S. Narayanan:

Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling. 2362-2365 - Kartik Audhkhasi, Shrikanth S. Narayanan:

Data-dependent evaluator modeling and its application to emotional valence classification from speech. 2366-2369
Speech Coding, Modeling, and Transmission
- Zhanyu Ma, Arne Leijon:

Modelling speech line spectral frequencies with Dirichlet mixture models. 2370-2373 - Zhanyu Ma, Arne Leijon:

PDF-optimized LSF vector quantization based on beta mixture models. 2374-2377 - José Enrique García Laínez, Alfonso Ortega, Antonio Miguel, Eduardo Lleida:

Non-linear predictive vector quantization of feature vectors for distributed speech recognition. 2378-2381 - Lasse Laaksonen, Mikko Tammi, Vladimir Malenovsky, Tommy Vaillancourt, Mi Suk Lee, Tomofumi Yamanashi, Masahiro Oshikiri, Claude Lamblin, Balázs Kövesi, Lei Miao, Deming Zhang, Jon Gibbs, Holly Francois:

Superwideband extension of G.718 and G.729.1 speech codecs. 2382-2385 - José L. Carmona, Angel M. Gomez, Antonio M. Peinado, José L. Pérez-Córdoba, José A. González:

A multipulse FEC scheme based on amplitude estimation for CELP codecs over packet networks. 2386-2389 - Anssi Rämö, Henri Toukomaa:

Voice quality evaluation of recent open source codecs. 2390-2393 - Bengt J. Borgström, Per Henrik Borgström, Abeer Alwan:

Efficient HMM-based estimation of missing features, with applications to packet loss concealment. 2394-2397 - Xiaoqiang Xiao, Robert M. Nickel

:
Speech inventory based discriminative training for joint speech enhancement and low-rate speech coding. 2398-2401 - Qipeng Gong, Peter Kabal:

Quality-based playout buffering with FEC for conversational voIP. 2402-2405 - Masatsune Tamura, Takehiko Kagoshima, Masami Akamine:

Sub-band basis spectrum model for pitch-synchronous log-spectrum and phase based on approximation of sparse coding. 2406-2409 - Sundar Harshavardhan, Chandra Sekhar Seelamantula, Thippur V. Sreenivas:

A multimodal density function estimation approach to formant tracking. 2410-2413 - Heikki Rasilo, Unto K. Laine, Okko Johannes Räsänen:

Estimation studies of vocal tract shape trajectory using a variable length and lossy kelly-lochbaum model. 2414-2417
Speech Perception: Processing and Intelligibility
- Serajul Haque, Roberto Togneri:
A feature extraction method for automatic speech recognition based on the cochlear nucleus. 2454-2457
- Samuel Thomas, Kailash Patil, Sriram Ganapathy, Nima Mesgarani, Hynek Hermansky:
A phoneme recognition framework based on auditory spectro-temporal receptive fields. 2458-2461
- Amy V. Beeston, Guy J. Brown:
Perceptual compensation for effects of reverberation in speech identification: a computer model based on auditory efferent processing. 2462-2465
- Barbara Schuppler, Mirjam Ernestus, Wim A. van Dommelen, Jacques C. Koreman:
Predicting human perception and ASR classification of word-final [t] by its acoustic sub-segmental properties. 2466-2469
- Matthew Robertson, Guy J. Brown, Wendy Lecluyse, Manasa Panda, Christine M. Tan:
A speech-in-noise test based on spoken digits: comparison of normal and impaired listeners using a computer model. 2470-2473
- Takayuki Kagomiya, Seiji Nakagawa:
Evaluation of bone-conducted ultrasonic hearing-aid regarding transmission of paralinguistic information: a comparison with cochlear implant simulator. 2474-2477
- Tim Jürgens, Stefan Fredelake, Ralf M. Meyer, Birger Kollmeier, Thomas Brand:
Challenging the speech intelligibility index: macroscopic vs. microscopic prediction of sentence recognition in normal and hearing-impaired listeners. 2478-2481
- Verena N. Uslar, Thomas Brand, Mirko Hanke, Rebecca Carroll, Esther Ruigendijk, Cornelia Hamann, Birger Kollmeier:
Does sentence complexity interfere with intelligibility in noise? Evaluation of the Oldenburg linguistically and audiologically controlled sentence test (OLACS). 2482-2485
- Juan-Pablo Ramirez, Hamed Ketabdar, Alexander Raake:
Intelligibility predictions for speech against fluctuating masker. 2486-2489

