INTERSPEECH 2008: Brisbane, Australia
9th Annual Conference of the International Speech Communication Association (INTERSPEECH 2008), Brisbane, Australia, September 22-26, 2008. ISCA 2008
Keynote Sessions
- Hiroya Fujisaki: In search of models in speech communication research. 1-10
- Abeer Alwan: Dealing with limited and noisy data in ASR: a hybrid knowledge-based and statistical approach. 11-15
- Joaquin Gonzalez-Rodriguez: Forensic automatic speaker recognition: fiction or science? 16-17
- Justine Cassell: Modelling rapport in embodied conversational agents. 18-19
Segmentation and Classification
- Kyu Jeong Han, Shrikanth S. Narayanan: Agglomerative hierarchical speaker clustering using incremental Gaussian mixture cluster modeling. 20-23
- Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman: Weighted segmental k-means initialization for SOM-based speaker clustering. 24-27
- Shajith Ikbal, Karthik Visweswariah: Learning essential speaker sub-space using hetero-associative neural networks for speaker clustering. 28-31
- Kofi Boakye, Oriol Vinyals, Gerald Friedland: Two's a crowd: improving speaker diarization by automatically identifying and excluding overlapped speech. 32-35
- Trung Hieu Nguyen, Engsiong Chng, Haizhou Li: T-test distance and clustering criterion for speaker diarization. 36-39
- Deepu Vijayasenan, Fabio Valente, Hervé Bourlard: Integration of TDOA features in information bottleneck framework for fast speaker diarization. 40-43
Speech Coding
- V. Ramasubramanian, D. Harish: Low complexity near-optimal unit-selection algorithm for ultra low bit-rate speech coding based on n-best lattice and Viterbi search. 44
- Vaclav Eksler, Redwan Salami, Milan Jelinek: A new fast algebraic fixed codebook search algorithm in CELP speech coding. 45-48
- Hao Xu, Changchun Bao: A novel transcoding algorithm between 3GPP AMR-NB (7.95 kbit/s) and ITU-T G.729A (8 kbit/s). 49-52
- Amr H. Nour-Eldin, Peter Kabal: Mel-frequency cepstral coefficient-based bandwidth extension of narrowband speech. 53-56
- Jean-Luc Garcia, Claude Marro, Balázs Kövesi: A PCM coding noise reduction for ITU-T G.711.1. 57-60
- Marcel Wältermann, Kirstin Scholz, Sebastian Möller, Lu Huo, Alexander Raake, Ulrich Heute: An instrumental measure for end-to-end speech transmission quality based on perceptual dimensions: framework and realization. 61-64
Human Conversation and Communication
- Benno Peters, Hartmut R. Pfitzinger: Duration and F0 interval of utterance-final intonation contours in the perception of German sentence modality. 65-68
- Bettina Braun, Lara Tagliapietra, Anne Cutler: Contrastive utterances make alternatives salient - cross-modal priming evidence. 69
- Masato Ishizaki, Yasuharu Den, Senshi Fukashiro: Exploring a mechanism of speech synchronization using auditory delayed experiments. 70-73
- Heather Pon-Barry: Prosodic manifestations of confidence and uncertainty in spoken language. 74-77
- Raquel Fernández, Matthew Frampton, John Dowding, Anish Adukuzhiyil, Patrick Ehlen, Stanley Peters: Identifying relevant phrases to summarize decisions in spoken meetings. 78-81
- Kornel Laskowski, Tanja Schultz: Recovering participant identities in meetings from a probabilistic description of vocal interaction. 82-85
OzPhon08 - Phonetics and Phonology of Australian Aboriginal Languages (Special Session)
- Janet Fletcher, Deborah Loakes, Andrew Butcher: Coarticulation in nasal and lateral clusters in Warlpiri. 86-89
- Deborah Loakes, Andrew Butcher, Janet Fletcher, Hywel Stoakes: Phonetically prestopped laterals in Australian languages: a preliminary investigation of Warlpiri. 90-93
- John Ingram, Mary Laughren, Jeff Chapman: Connected speech processes in Warlpiri. 94
- Christina Pentland: Consonant enhancement in Lamalama, an initial-dropping language of Cape York Peninsula, North Queensland. 95
- Myfany Turpin: Text, rhythm and metrical form in an Aboriginal song series. 96-98
Acoustic Activity Detection, Pitch Tracking and Analysis
- Kentaro Ishizuka, Shoko Araki, Tatsuya Kawahara: Statistical speech activity detection based on spatial power distribution for analyses of poster presentations. 99-102
- Sang-Ick Kang, Ji-Hyun Song, Kye-Hwan Lee, Yun-Sik Park, Joon-Hyuk Chang: A statistical model-based voice activity detection employing minimum classification error technique. 103-106
- Hongfei Ding, Koichi Yamamoto, Masami Akamine: Comparative evaluation of different methods for voice activity detection. 107-110
- Soheil Shafiee, Farshad Almasganj, Ayyoob Jafari: Speech/non-speech segments detection based on chaotic and prosodic features. 111-114
- Christian Zieger, Maurizio Omologo: Acoustic event classification using a distributed microphone network with a GMM/SVM combined algorithm. 115-118
- Yasunari Obuchi, Masahito Togami, Takashi Sumiyoshi: Intentional voice command detection for completely hands-free speech interface in home environments. 119-122
- Taras Butko, Andrey Temko, Climent Nadeu, Cristian Canton-Ferrer: Fusion of audio and video modalities for detection of acoustic events. 123-126
- Ron J. Weiss, Trausti T. Kristjansson: DySANA: dynamic speech and noise adaptation for voice activity detection. 127-130
- Rico Petrick, Masashi Unoki, Anish Mittal, Carlos Segura, Rüdiger Hoffmann: A comprehensive study on the effects of room reverberation on fundamental frequency estimation. 131-134
- Hussein Hussein, Matthias Wolff, Oliver Jokisch, Frank Duckhorn, Guntram Strecha, Rüdiger Hoffmann: A hybrid speech signal based algorithm for pitch marking using finite state machines. 135-138
- Yasunori Ohishi, Hirokazu Kameoka, Kunio Kashino, Kazuya Takeda: Parameter estimation method of F0 control model for singing voices. 139-142
- Srikanth Vishnubhotla, Carol Y. Espy-Wilson: An algorithm for multi-pitch tracking in co-channel speech. 143-146
- Michael Wohlmayr, Franz Pernkopf: Multipitch tracking using a factorial hidden Markov model. 147-150
- Ming Li, Chuan Cao, Di Wang, Ping Lu, Qiang Fu, Yonghong Yan: Cochannel speech separation using multi-pitch estimation and model based voiced sequential grouping. 151-154
- Philippe Martin: Crosscorrelation of adjacent spectra enhances fundamental frequency tracking. 155-158
Single- and Multichannel Speech Enhancement I, II
- Jirí Málek, Zbynek Koldovský, Jindrich Zdánský, Jan Nouza: Enhancement of noisy speech recordings via blind source separation. 159-162
- Takaaki Ishibashi, Hidetoshi Nakashima, Hiromu Gotanda: Studies on estimation of the number of sources in blind source separation. 163-166
- V. Ramasubramanian, Deepak Vijaywargi: Speech enhancement based on hypothesized Wiener filtering. 167-170
- Junfeng Li, Hui Jiang, Masato Akagi: Psychoacoustically-motivated adaptive β-order generalized spectral subtraction based on data-driven optimization. 171-174
- Krishna Nand K., T. V. Sreenivas: Two stage iterative Wiener filtering for speech enhancement. 175-178
- Pei Ding, Jie Hao: Assessment of correlation between objective measures and speech recognition performance in the evaluation of speech enhancement. 179-182
Spoken Language Systems I, II
- Kazunori Komatani, Tatsuya Kawahara, Hiroshi G. Okuno: Predicting ASR errors by exploiting barge-in rate of individual users for spoken dialogue systems. 183-186
- Masaki Katsumaru, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno: Expanding vocabulary for recognizing user's abbreviations of proper nouns without increasing ASR error rates in spoken dialogue systems. 187-190
- Jason D. Williams: Exploiting the ASR n-best by tracking multiple dialog state hypotheses. 191-194
- Enes Makalic, Ingrid Zukerman, Michael Niemann: A spoken language interpretation component for a robot dialogue system. 195-198
- Federico Cesari, Horacio Franco, Gregory K. Myers, Harry Bratt: MUESLI: multiple utterance error correction for a spoken language interface. 199-202
- Sarah Conrod, Sara H. Basson, Dimitri Kanevsky: Methods to optimize transcription of on-line media. 203-206
- Akinori Ito, Toyomi Meguro, Shozo Makino, Motoyuki Suzuki: Discrimination of task-related words for vocabulary design of spoken dialog systems. 207-210
- Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, Hideki Kashioka, Satoshi Nakamura: Dialog management using weighted finite-state transducers. 211-214
- Yoshitaka Yoshimi, Ryota Kakitsuba, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda: Probabilistic answer selection based on conditional random fields for spoken dialog system. 215-218
- Maxine Eskénazi, Alan W. Black, Antoine Raux, Brian Langner: Let's go lab: a platform for evaluation of spoken dialog systems with real world users. 219
- Fernando Batista, Nuno J. Mamede, Isabel Trancoso: The impact of language dynamics on the capitalization of broadcast news. 220-223
- Matthias Paulik, Alex Waibel: Lightly supervised acoustic model training on EPPS recordings. 224-227
- Christophe Servan, Frédéric Béchet: Fast call-classification system development without in-domain training data. 228-231
- Björn Hoffmeister, Ralf Schlüter, Hermann Ney: iCNC and iROVER: the limits of improving system combination with classification? 232-235
- Stefan Hahn, Patrick Lehnen, Hermann Ney: System combination for spoken language understanding. 236-239
Emotion and Expression I, II
- Tomoko Suzuki, Machiko Ikemoto, Tomoko Sano, Toshihiko Kinoshita: Multidimensional features of emotional speech. 240
- Narjès Boufaden, Pierre Dumouchel: Leveraging emotion detection using emotions from yes-no answers. 241-244
- Thomas John Millhouse, Dianna T. Kenny: Vowel placement during operatic singing: 'come si parla' or 'aggiustamento'? 245-248
- Yumiko O. Kato, Yoshifumi Hirose, Takahiro Kamai: Study on strained rough voice as a conveyer of rage. 249-252
- Mumtaz Begum, Raja Noor Ainon, Roziati Zainuddin, Zuraidah M. Don, Gerry Knowles: Integrating rule and template-based approaches for emotional Malay speech synthesis. 253-256
- Carlos Busso, Shrikanth S. Narayanan: The expression and perception of emotions: comparing assessments of self versus others. 257-260
- Emiel Krahmer, Marc Swerts: On the role of acting skills for the collection of simulated emotional speech. 261-264
- Björn W. Schuller, Matthias Wimmer, Dejan Arsic, Tobias Moosmayr, Gerhard Rigoll: Detection of security related affect and behaviour in passenger transport. 265-268
Automatic Speech Recognition: Acoustic Models I-III
- Jinyu Li, Zhi-Jie Yan, Chin-Hui Lee, Ren-Hua Wang: Soft margin estimation with various separation levels for LVCSR. 269-272
- Georg Heigold, Patrick Lehnen, Ralf Schlüter, Hermann Ney: On the equivalence of Gaussian and log-linear HMMs. 273-276
- Dimitri Kanevsky, Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo: Generalization of extended Baum-Welch parameter estimation for discriminative training and decoding. 277-280
- Peng Liu, Frank K. Soong: An ellipsoid constrained quadratic programming perspective to discriminative training of HMMs. 281-284
- Dong Yu, Li Deng, Yifan Gong, Alex Acero: Discriminative training of variable-parameter HMMs for noise robust speech recognition. 285-288
- Jasha Droppo, Michael L. Seltzer, Alex Acero, Yu-Hsiang Bosco Chiu: Towards a non-parametric acoustic model: an acoustic decision tree for observation probability calculation. 289-292
Accent and Language Identification
- Shona D'Arcy, Martin J. Russell: Experiments with the ABI (Accents of the British Isles) speech corpus. 293-296
- Fabio Castaldo, Emanuele Dalmasso, Pietro Laface, Daniele Colibro, Claudio Vair: Politecnico di Torino system for the 2007 NIST language recognition evaluation. 297-300
- Valiantsina Hubeika, Lukás Burget, Pavel Matejka, Petr Schwarz: Discriminative training and channel compensation for acoustic language recognition. 301-304
- Tingyao Wu, Peter Karsmakers, Hugo Van hamme, Dirk Van Compernolle: Comparison of variable selection methods and classifiers for native accent identification. 305-308
- William M. Campbell, Douglas E. Sturim, Pedro A. Torres-Carrasquillo, Douglas A. Reynolds: A comparison of subspace feature-domain methods for language recognition. 309-312
- Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel: Context-dependent phone models and models adaptation for phonotactic language recognition. 313-316
Emotion and Expression I, II
- Martijn Goudbeek, Jean-Philippe Goldman, Klaus R. Scherer: Emotions and articulatory precision. 317
- Khiet P. Truong, Mark A. Neerincx, David A. van Leeuwen: Assessing agreement of observer- and self-annotations in spontaneous multimodal emotion data. 318-321
- Yoshiko Arimoto, Hiromi Kawatsu, Sumio Ohno, Hitoshi Iida: Emotion recognition in spontaneous emotional speech for anonymity-protected voice chat systems. 322-325
- Shaikh Mostafa Al Masum, M. Khademul Islam Molla, Keikichi Hirose: Assigning suitable phrasal tones and pitch accents by sensing affective information from text to synthesize human-like speech. 326-329
- Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl: Cross-language study of vocal correlates of affective states. 330-333
- Marc Swerts, Emiel Krahmer: Gender-related differences in the production and perception of emotion. 334-337
Special Session: PANZE 2008 - Phonetics and Phonology of Australian and New Zealand English
- Catherine Inez Watson, Margaret Maclagan, Jeanette King, Ray Harlow: The English pronunciation of successive groups of Maori speakers. 338-341
- Felicity Cox, Sallyanne Palethorpe: Reversal of short front vowel raising in Australian English. 342-345
- Jennifer Price: GOOSE on the move: a study of /u/-fronting in Australian news speech. 346
- Andrew Butcher, Victoria Anderson: The vowels of Australian Aboriginal English. 347-350
- Robert H. Mannell: Perception and production of /i:/, /i@/ and /e:/ in Australian English. 351-354
Speaker Recognition and Diarisation
- Zbynek Zajíc, Lukás Machlica, Ales Padrta, Jan Vanek, Vlasta Radová: An expert system in speaker verification task. 355-358
- David Dean, Sridha Sridharan, Patrick Lucey: Cascading appearance-based features for visual speaker verification. 359-362
- Konstantin Markov, Satoshi Nakamura: Improved novelty detection for online GMM based speaker diarization. 363-366
- Salah Eddine Mezaache, Jean-François Bonastre, Driss Matrouf: Analysis of impostor tests with high scores in NIST-SRE context. 367-370
- Anthony Larcher, Jean-François Bonastre, John S. D. Mason: Reinforced temporal structure information for embedded utterance-based speaker recognition. 371-374
- Michael Gerber, Beat Pfister: Fast search for common segments in speech signals for speaker verification. 375-378
- Girija Chetty, Michael Wagner: Audio-visual multilevel fusion for speech and speaker recognition. 379-382
- Jordi Luque, Carlos Segura, Javier Hernando: Clustering initialization based on spatial information for speaker diarization of meetings. 383-386
Single- and Multichannel Speech Enhancement I, II
- James G. Lyons, Kuldip K. Paliwal: Effect of compressing the dynamic range of the power spectrum in modulation filtering based speech enhancement. 387-390
- Stephen So, Kuldip K. Paliwal: A long state vector Kalman filter for speech enhancement. 391-394
- Achintya Kundu, Saikat Chatterjee, T. V. Sreenivas: Subspace based speech enhancement using Gaussian mixture model. 395-398
- Amit Das, John H. L. Hansen: Generalized parametric spectral subtraction using weighted Euclidean distortion. 399-402
- Nobuyuki Miyake, Tetsuya Takiguchi, Yasuo Ariki: Sudden noise reduction based on GMM with noise power estimation. 403-406
- Md. Jahangir Alam, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy, Sofia Ben Jebara: Speech enhancement using a Wiener denoising technique and musical noise reduction. 407-410
- Kevin W. Wilson, Bhiksha Raj, Paris Smaragdis: Regularized non-negative matrix factorization with temporal dependencies for speech denoising. 411-414
- Xin Zou, Peter Jancovic, Münevver Köküer, Martin J. Russell: ICA-based MAP speech enhancement with multiple variable speech distribution models. 415-418
- Ron J. Weiss, Michael I. Mandel, Daniel P. W. Ellis: Source separation based on binaural cues and source model constraints. 419-422
- Ken'ichi Kumatani, John W. McDonough, Barbara Rauch, Philip N. Garner, Weifeng Li, John Dines: Maximum kurtosis beamforming with the generalized sidelobe canceller. 423-426
- Ken'ichi Furuya, Akitoshi Kataoka, Youichi Haneda: Noise robust speech dereverberation using constrained inverse filter. 427-430
- Mohsen Rahmani, Ahmad Akbari, Beghdad Ayad: A dual microphone coherence based method for speech enhancement in headsets. 431-434
- Ivan Tashev, Slavy Mihov, Tyler Gleghorn, Alex Acero: Sound capture system and spatial filter for small devices. 435-438
- Ning Cheng, Wenju Liu, Peng Li, Bo Xu: An effective microphone array post-filter in arbitrary environments. 439-442
- Kook Cho, Hajime Okumura, Takanobu Nishiura, Yoichi Yamashita: Localization of multiple sound sources based on inter-channel correlation using a distributed microphone system. 443-446
- Heng Zhang, Qiang Fu, Yonghong Yan: A frequency domain approach for speech enhancement with directionality using compact microphone array. 447-450
Spoken Language Systems I, II
- Shota Takeuchi, Tobias Cincarek, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano: Question and answer database optimization using speech recognition results. 451-454
- Hiroshi Saruwatari, Yu Takahashi, Hiroyuki Sakai, Shota Takeuchi, Tobias Cincarek, Hiromichi Kawanami, Kiyohiro Shikano: Development and evaluation of hands-free spoken dialogue system for railway station guidance. 455-458
- Amanda J. Stent, Srinivas Bangalore: Statistical shared plan-based dialog management. 459-462
- Ota Herm, Alexander Schmitt, Jackson Liscombe: When calls go wrong: how to detect problematic calls based on log-files and emotions? 463-466
- Daniel Gillick, Dilek Hakkani-Tür, Michael Levit: Unsupervised learning of edit parameters for matching name variants. 467-470
- Mert Cevik, Fuliang Weng, Chin-Hui Lee: Detection of repetitions in spontaneous speech in dialogue sessions. 471-474
- Nathalie Camelin, Géraldine Damnati, Frédéric Béchet, Renato de Mori: Automatic customer feedback processing: alarm detection in open question spoken messages. 475-478
- Mithun Balakrishna, Marta Tatu, Dan I. Moldovan: Minimal training based semantic categorization in a voice activated question answering (VAQA) system. 479-482