INTERSPEECH 2009: Brighton, UK
10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, Brighton, United Kingdom, September 6-10, 2009. ISCA 2009
Keynotes
- Sadaoki Furui: Selected topics from 40 years of research on speech and speaker recognition. 1-8
- Thomas L. Griffiths: Connecting human and machine learning via probabilistic models of cognition. 9-12
- Deb Roy: New horizons in the study of child language acquisition. 13-20
- Mari Ostendorf: Transcribing human-directed speech for spoken language processing. 21-27
ASR: Features for Noise Robustness
- Chanwoo Kim, Richard M. Stern: Feature extraction for robust speech recognition using a power-law nonlinearity and power-bias subtraction. 28-31
- Yu-Hsiang Bosco Chiu, Bhiksha Raj, Richard M. Stern: Towards fusion of feature extraction and acoustic model training: a top down process for robust speech recognition. 32-35
- Hong You, Abeer Alwan: Temporal modulation processing of speech signals for noise robust ASR. 36-39
- Luz García, Roberto Gemello, Franco Mana, José C. Segura: Progressive memory-based parametric non-linear feature equalization. 40-43
- Osamu Ichikawa, Takashi Fukuda, Ryuki Tachibana, Masafumi Nishimura: Dynamic features in the linear domain for robust automatic speech recognition in a reverberant environment. 44-47
- Antonio Miguel, Alfonso Ortega, Luis Buera, Eduardo Lleida: Local projections and support vector based feature selection in speech recognition. 48-51
Production: Articulatory Modelling
- Qiang Fang, Akikazu Nishikido, Jianwu Dang, Aijun Li: Feedforward control of a 3D physiological articulatory model for vowel production. 52-55
- Jun Cai, Yves Laprie, Julie Busset, Fabrice Hirsch: Articulatory modeling based on semi-polar coordinates and guided PCA technique. 56-59
- Juraj Simko, Fred Cummins: Sequencing of articulatory gestures using cost optimization. 60-63
- Xiao Bo Lu, William Thorpe, Kylie Foster, Peter Hunter: From experiments to articulatory motion - a three-dimensional talking head model. 64-67
- Javier Pérez, Antonio Bonafonte: Towards robust glottal source modeling. 68-71
- Takayuki Arai: Sliding vocal-tract model and its application for vowel production. 72-75
Systems for LVCSR and Rich Transcription
- Haihua Xu, Daniel Povey, Jie Zhu, Guanyong Wu: Minimum hypothesis phone error as a decoding method for speech recognition. 76-79
- Stefan Kombrink, Lukás Burget, Pavel Matejka, Martin Karafiát, Hynek Hermansky: Posterior-based out-of-vocabulary word detection in telephone speech. 80-83
- Yuya Akita, Masato Mimura, Tatsuya Kawahara: Automatic transcription system for meetings of the Japanese national congress. 84-87
- Jonas Lööf, Christian Gollan, Hermann Ney: Cross-language bootstrapping for unsupervised acoustic model training: rapid development of a Polish speech recognition system. 88-91
- Alberto Abad, Isabel Trancoso, Nelson Neto, Céu Viana: Porting a European Portuguese broadcast news recognition system to Brazilian Portuguese. 92-95
- Julien Despres, Petr Fousek, Jean-Luc Gauvain, Sandrine Gay, Yvan Josse, Lori Lamel, Abdelkhalek Messaoudi: Modeling northern and southern varieties of Dutch for STT. 96-99
Speech Analysis and Processing I-III
- Thomas Ewender, Sarah Hoffmann, Beat Pfister: Nearly perfect detection of continuous F0 contour and frame classification for TTS synthesis. 100-103
- Yannis Pantazis, Olivier Rosec, Yannis Stylianou: AM-FM estimation for speech based on a time-varying sinusoidal model. 104-107
- Jón Guðnason, Mark R. P. Thomas, Patrick A. Naylor, Daniel P. W. Ellis: Voice source waveform analysis and synthesis using principal component analysis and Gaussian mixture modelling. 108-111
- Jung Ook Hong, Patrick J. Wolfe: Model-based estimation of instantaneous pitch in noisy speech. 112-115
- Thomas Drugman, Baris Bozkurt, Thierry Dutoit: Complex cepstrum-based decomposition of speech for glottal source estimation. 116-119
- Frank Tompkins, Patrick J. Wolfe: Approximate intrinsic Fourier analysis of speech. 120-123
- Stephen A. Zahorian, Hongbing Hu, Zhengqing Chen, Jiang Wu: Spectral and temporal modulation features for phonetic recognition. 1071-1074
- Ibon Saratxaga, Daniel Erro, Inmaculada Hernáez, Iñaki Sainz, Eva Navas: Use of harmonic phase information for polarity detection in speech signals. 1075-1078
- Michael Wohlmayr, Franz Pernkopf: Finite mixture spectrogram modeling for multipitch tracking using a factorial hidden Markov model. 1079-1082
- Anthony P. Stark, Kuldip K. Paliwal: Group-delay-deviation based spectral analysis of speech. 1083-1086
- Joseph M. Anand, B. Yegnanarayana, Sanjeev Gupta, M. R. Kesheorey: Speaker dependent mapping for low bit rate coding of throat microphone speech. 1087-1090
- G. Bapineedu, B. Avinash, Suryakanth V. Gangashetty, B. Yegnanarayana: Analysis of Lombard speech using excitation source information. 1091-1094
- Andrew Errity, John McKenna: A comparison of linear and nonlinear dimensionality reduction methods applied to synthetic speech. 1095-1098
- Christian Fischer Pedersen, Ove Andersen, Paul Dalsgaard: ZZT-domain immiscibility of the opening and closing phases of the LF GFM under frame length variations. 1099-1102
- Hongjun Sun, Jianhua Tao, Huibin Jia: Dimension reducing of LSF parameters based on radial basis function neural network. 1103-1106
- A. N. Harish, D. Rama Sanand, Srinivasan Umesh: Characterizing speaker variability using spectral envelopes of vowel sounds. 1107-1110
- Tharmarajah Thiruvaran, Eliathamby Ambikairajah, Julien Epps: Analysis of band structures for speaker-specific information in FM feature extraction. 1111-1114
- Karl Schnell, Arild Lacroix: Artificial nasalization of speech sounds based on pole-zero models of spectral relations between mouth and nose signals. 1115-1118
- Andrew Hines, Naomi Harte: Error metrics for impaired auditory nerve responses of different phoneme groups. 1119-1122
- Chatchawarn Hansakunbuntheung, Hiroaki Kato, Yoshinori Sagisaka: Model-based automatic evaluation of L2 learner's English timing. 2871-2874
- Petko Nikolov Petkov, Iman S. Mossavat, W. Bastiaan Kleijn: A Bayesian approach to non-intrusive quality assessment of speech. 2875-2878
- Ladan Baghai-Ravary, Greg Kochanski, John S. Coleman: Precision of phoneme boundaries derived using hidden Markov models. 2879-2882
- Lakshmish Kaushik, Douglas D. O'Shaughnessy: A novel method for epoch extraction from speech signals. 2883-2886
- Jia Min Karen Kua, Julien Epps, Eliathamby Ambikairajah, Eric H. C. Choi: LS regularization of group delay features for speaker recognition. 2887-2890
- Thomas Drugman, Thierry Dutoit: Glottal closure and opening instant detection from speech signals. 2891-2894
Speech Perception I, II
- Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano: Relative importance of formant and whole-spectral cues for vowel perception. 124-127
- Chihiro Takeshima, Minoru Tsuzaki, Toshio Irino: Influences of vowel duration on speaker-size estimation and discrimination. 128-131
- Václav Jonás Podlipský, Radek Skarnitzl, Jan Volín: High front vowels in Czech: a contrast in quantity or quality? 132-135
- Marjorie Dole, Michel Hoen, Fanny Meunier: Effect of contralateral noise on energetic and informational masking on speech-in-speech intelligibility. 136-139
- Heidi Christensen, Jon Barker: Using location cues to track speaker changes from mobile, binaural microphones. 140-143
- Ioana Vasilescu, Martine Adda-Decker, Lori Lamel, Pierre A. Hallé: A perceptual investigation of speech transcription errors involving frequent near-homophones in French and American English. 144-147
- Etienne Gaudrain, Su Li, Vin Shen Ban, Roy D. Patterson: The role of glottal pulse rate and vocal tract length in the perception of speaker identity. 148-151
- Victoria Medina, Willy Serniclaes: Development of voicing categorization in deaf children with cochlear implant. 152-155
- Annie Tremblay: Processing liaison-initial words in native and non-native French: evidence from eye movements. 156-159
- Nigel G. Ward, Benjamin H. Walker: Estimating the potential of signal and interlocutor-track information for language modeling. 160-163
- Antje Heinrich, Sarah Hawkins: Effect of r-resonance information on intelligibility. 804-807
- Hsin-Yi Lin, Janice Fon: Perception of temporal cues at discourse boundaries. 808-811
- Zhanyu Ma, Arne Leijon: Human audio-visual consonant recognition analyzed with three bimodal integration models. 812-815
- Hanny den Ouden, Hugo Quené: Effects of tempo in radio commercials on young and elderly listeners. 816-819
- Sofia Strömbergsson: Self-voice recognition in 4- to 5-year-old children. 820-823
- Olov Engwall, Preben Wik: Are real tongue movements easier to speech read than synthesized? 824-827
- Carmen Peláez-Moreno, Ana I. García-Moral, Francisco J. Valverde-Albacete: Eliciting a hierarchical structure of human consonant perception task errors using formal concept analysis. 828-831
- Takeshi Saitou, Masataka Goto: Acoustic and perceptual effects of vocal training in amateur male singing. 832-835
Accent and Language Recognition
- Florian Verdet, Driss Matrouf, Jean-François Bonastre, Jean Hennebert: Factor analysis and SVM for language recognition. 164-167
- Sabato Marco Siniscalchi, Jeremy Reed, Torbjørn Svendsen, Chin-Hui Lee: Exploring universal attribute characterization of spoken languages for spoken language recognition. 168-171
- Abhijeet Sangwan, John H. L. Hansen: On the use of phonological features for automatic accent analysis. 172-175
- Fabio Castaldo, Sandro Cumani, Pietro Laface, Daniele Colibro: Language recognition using language factors. 176-179
- Je Hun Jeon, Yang Liu: Automatic accent detection: effect of base units and boundary information. 180-183
- Ron M. Hecht, Omer Hezroni, Amit Manna, Ruth Aloni-Lavi, Gil Dobry, Amir Alfandary, Yaniv Zigel: Age verification using a hybrid speech processing approach. 184-187
- Ron M. Hecht, Omer Hezroni, Amit Manna, Gil Dobry, Yaniv Zigel, Naftali Tishby: Information bottleneck based age verification. 188-191
- Fred S. Richardson, William M. Campbell, Pedro A. Torres-Carrasquillo: Discriminative n-gram selection for dialect recognition. 192-195
- Linsen Loots, Thomas Niesler: Data-driven phonetic comparison and conversion between South African, British and American English pronunciations. 196-199
- Rong Tong, Bin Ma, Haizhou Li, Engsiong Chng, Kong-Aik Lee: Target-aware language models for spoken language recognition. 200-203
- Daniel Chung Yong Lim, Ian R. Lane: Language identification for speech-to-speech translation. 204-207
- Fadi Biadsy, Julia Hirschberg: Using prosody and phonotactics in Arabic dialect identification. 208-211
ASR: Acoustic Model Training and Combination
- Pierre L. Dognin, John R. Hershey, Vaibhava Goel, Peder A. Olsen: Refactoring acoustic models using variational expectation-maximization. 212-215
- Georg Heigold, David Rybach, Ralf Schlüter, Hermann Ney: Investigations on convex optimization using log-linear HMMs for digit string recognition. 216-219
- Janne Pylkkönen: Investigations on discriminative training in large scale acoustic model estimation. 220-223
- Erik McDermott, Shinji Watanabe, Atsushi Nakamura: Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training. 224-227
- Etienne Marcheret, Jia-Yu Chen, Petr Fousek, Peder A. Olsen, Vaibhava Goel: Compacting discriminative feature space transforms for embedded devices. 228-231
- Hung-An Chang, James R. Glass: A back-off discriminative acoustic model for automatic speech recognition. 232-235
- Junho Park, Frank Diehl, Mark J. F. Gales, Marcus Tomalin, Philip C. Woodland: Efficient generation and use of MLP features for Arabic speech recognition. 236-239
- Xiaodong Cui, Jian Xue, Bing Xiang, Bowen Zhou: A study of bootstrapping with multiple acoustic features for improved automatic speech recognition. 240-243
- Scott Novotney, Richard M. Schwartz: Analysis of low-resource acoustic model self-training. 244-247
- Björn Hoffmeister, Ruoying Liang, Ralf Schlüter, Hermann Ney: Log-linear model combination with word-dependent scaling factors. 248-251
Spoken Dialogue Systems
- Kyoko Matsuyama, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno: Enabling a user to specify an item at any time during system enumeration - item identification for barge-in-able conversational dialogue systems. 252-255
- Tomoyuki Yamagata, Tetsuya Takiguchi, Yasuo Ariki: System request detection in human conversation based on multi-resolution Gabor wavelet features. 256-259
- Stefan Schwärzler, Stefan Maier, Joachim Schenk, Frank Wallhoff, Gerhard Rigoll: Using graphical models for mixed-initiative dialog management systems with realtime policies. 260-263
- Shinya Fujie, Yoichi Matsuyama, Hikaru Taniyama, Tetsunori Kobayashi: Conversation robot participating in and activating a group communication. 264-267
- Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, Hideki Kashioka, Satoshi Nakamura: Recent advances in WFST-based dialog system. 268-271
- David Griol, Giuseppe Riccardi, Emilio Sanchis: A statistical dialog manager for the LUNA project. 272-275
- Heriberto Cuayáhuitl, Juventino Montiel-Hernández: A policy-switching learning approach for adaptive spoken dialogue agents. 276-279
- Luis Fernando D'Haro, Ricardo de Córdoba, Rubén San Segundo, Javier Macías Guarasa, José Manuel Pardo: Strategies for accelerating the design of dialogue applications using heuristic information from the backend database. 280-283
- Florian Pinault, Fabrice Lefèvre, Renato de Mori: Feature-based summary space for stochastic dialogue modeling with hierarchical semantic frames. 284-287
- Rajesh Balchandran, Leonid Rachevsky, Larry Sansone: Language modeling and dialog management for address recognition. 288-291
- Ea-Ee Jan, Hong-Kwang Kuo, Osamuyimen Stewart, David M. Lubensky: A framework for rapid development of conversational natural language call routing systems for call centers. 292-295
- Jonas Beskow, Jens Edlund, Björn Granström, Joakim Gustafson, Gabriel Skantze, Helena Tobiasson: The MonAMI reminder: a spoken dialogue system for face-to-face interaction. 296-299
- Julia Seebode, Stefan Schaffer, Ina Wechsung, Florian Metze: Influence of training on direct and indirect measures for the evaluation of multimodal systems. 300-303
- Christine Kühnel, Benjamin Weiss, Sebastian Möller: Talking heads for interacting with spoken dialog smart-home systems. 304-307
- Aki Kunikoshi, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose: Speech generation from hand gestures based on space mapping. 308-311
Special Session: INTERSPEECH 2009 Emotion Challenge
- Björn W. Schuller, Stefan Steidl, Anton Batliner: The INTERSPEECH 2009 emotion challenge. 312-315
- Santiago Planet, Ignasi Iriondo Sanz, Joan Claudi Socoró, Carlos Monzo, Jordi Adell: GTM-URL contribution to the INTERSPEECH 2009 emotion challenge. 316-319
- Chi-Chun Lee, Emily Mower, Carlos Busso, Sungbok Lee, Shrikanth S. Narayanan: Emotion recognition using a hierarchical binary decision tree approach. 320-323
- Elif Bozkurt, Engin Erzin, Çigdem Eroglu Erdem, A. Tanju Erdem: Improving automatic emotion recognition from speech signals. 324-327
- Thurid Vogt, Elisabeth André: Exploring the benefits of discretization of acoustic features for speech emotion recognition. 328-331
- Iker Luengo, Eva Navas, Inmaculada Hernáez: Combining spectral and prosodic information for emotion recognition in the INTERSPEECH 2009 emotion challenge. 332-335
- Roberto Barra-Chicote, Fernando Fernández Martínez, Syaheerah L. Lutfi, Juan Manuel Lucas-Cuesta, Javier Macías Guarasa, Juan Manuel Montero, Rubén San Segundo, José Manuel Pardo: Acoustic emotion recognition using dynamic Bayesian networks and multi-space distributions. 336-339
- Tim Polzehl, Shiva Sundaram, Hamed Ketabdar, Michael Wagner, Florian Metze: Emotion classification in children's speech using fusion of acoustic and linguistic features. 340-343
- Pierre Dumouchel, Najim Dehak, Yazid Attabi, Réda Dehak, Narjès Boufaden: Cepstral and long-term features for emotion recognition. 344-347
- Marcel Kockmann, Lukás Burget, Jan Cernocký: Brno University of Technology system for INTERSPEECH 2009 emotion challenge. 348-351
Automatic Speech Recognition: Language Models I, II
- Boulos Harb, Ciprian Chelba, Jeffrey Dean, Sanjay Ghemawat: Back-off language model compression. 352-355
- Tobias Kaufmann, Thomas Ewender, Beat Pfister: Improving broadcast news transcription with a precision grammar and discriminative reranking. 356-359
- Xunying Liu, Mark J. F. Gales, Philip C. Woodland: Use of contexts in language model interpolation and adaptation. 360-363
- Jim L. Hieronymus, Xunying Liu, Mark J. F. Gales, Philip C. Woodland: Exploiting Chinese character models to improve speech recognition performance. 364-367
- Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot: Constraint selection for topic-based MDI adaptation of language models. 368-371
- Chuang-Hua Chueh, Jen-Tzung Chien: Nonstationary latent Dirichlet allocation for speech recognition. 372-375
- Sopheap Seng, Laurent Besacier, Brigitte Bigi, Eric Castelli: Multiple text segmentation for statistical language modeling. 2663-2666
- Denis Filimonov, Mary P. Harper: Measuring tagging performance of a joint language model. 2667-2670
- Langzhou Chen, K. K. Chin, Kate M. Knill: Improved language modelling using bag of word pairs. 2671-2674
- Frank Diehl, Mark J. F. Gales, Marcus Tomalin, Philip C. Woodland: Morphological analysis and decomposition for Arabic speech-to-text systems. 2675-2678
- Amr El-Desoky, Christian Gollan, David Rybach, Ralf Schlüter, Hermann Ney: Investigating the use of morphological decomposition and diacritization for improving Arabic LVCSR. 2679-2682
- Welly Naptali, Masatoshi Tsuchiya, Seiichi Nakagawa: Topic dependent language model based on topic voting on noun history. 2683-2686
- Péter Mihajlik, Balázs Tarján, Zoltán Tüske, Tibor Fegyó: Investigation of morph-based speech recognition improvements across speech genres. 2687-2690
- Kengo Ohta, Masatoshi Tsuchiya, Seiichi Nakagawa: Effective use of pause information in language modelling for speech recognition. 2691-2694
- Songfang Huang, Steve Renals: A parallel training algorithm for hierarchical pi