ICASSP 2005: Philadelphia, Pennsylvania, USA
- 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05, Philadelphia, Pennsylvania, USA, March 18-23, 2005. IEEE 2005, ISBN 0-7803-8874-7
Volume 1
Voice Morphing
- Javier Latorre, Koji Iwano, Sadaoki Furui:
Polyglot Synthesis Using a Mixture of Monolingual Corpora. 1-4 - Ashish Verma, Arun Kumar:
Introducing Roughness in Individuality Transformation through Jitter Modeling and Modification. 5-8 - Tomoki Toda, Alan W. Black, Keiichi Tokuda:
Spectral Conversion Based on Maximum Likelihood Estimation Considering Global Variance of Converted Parameter. 9-12 - David Suendermann, Antonio Bonafonte, Hermann Ney, Harald Höge:
A Study on Residual Prediction Techniques for Voice Conversion. 13-16 - Patrick Perrot, Guido Aversano, Raphaël Blouet, Maurice Charbit, Gérard Chollet:
Voice Forgery Using ALISP: Indexation in a Client Memory. 17-20 - Long Qin, Gao Peng Chen, Zhen-Hua Ling, Li-Rong Dai:
An Improved Spectral and Prosodic Transformation Method in STRAIGHT-based Voice Conversion. 21-24
Spoken Language Understanding and Dialog
- Ryuichiro Higashinaka, Katsuhito Sudoh, Mikio Nakano:
Incorporating Discourse Features into Confidence Scoring of Intention Recognition Results in Spoken Dialogue Systems. 25-28 - Christian Raymond, Frédéric Béchet, Nathalie Camelin, Renato De Mori, Géraldine Damnati:
Semantic Interpretation With Error Correction. 29-32 - Gang Ji, Jeff A. Bilmes:
Dialog Act Tagging Using Graphical Models. 33-36 - Charles Lewis, Giuseppe Di Fabbrizio:
A Clarification Algorithm for Spoken Dialogue Systems. 37-40 - Gökhan Tür:
Model Adaptation For Spoken Language Understanding. 41-44 - Xiao Li, Asela Gunawardana, Alex Acero:
Unsupervised Semantic Intent Discovery from Call Log Acoustics. 45-48
Speech Perception and Psychoacoustics
- Chiharu Morioka, Atsuko Kurashima, Akira Takahashi:
Proposal on Objective Speech Quality Assessment for Wideband IP Telephony. 49-52 - Qiang Fu, Mark A. Clements, Klaus Mewes:
Neural Cell Type Recognition Between Globus Pallidus Externus and Globus Pallidus Internus By Gaussian Mixture Modeling. 53-56 - Hitoshi Aoki, Akira Takahashi:
Analysis of Relationship Between Overall Quality and Psychological Factors Affecting High-Quality Speech Communication Services. 57-60 - Maria Schuster, Elmar Nöth, Tino Haderlein, Stefan Steidl, Anton Batliner, Frank Rosanowski:
Can you Understand him? Let's Look at his Word Accuracy - Automatic Evaluation of Tracheoesophageal Speech. 61-64 - Marc A. Boillot, John G. Harris:
A Warped Bandwidth Expansion Filter. 65-68 - Sungyub Yoo, J. Robert Boston, John D. Durrant, Kristie Kovacyk, Stacey Karn, Susan Shaiman, Amro El-Jaroudi, Ching-Chung Li:
Relative Energy And Intelligibility Of Transient Speech Information. 69-72
Confidence Measures and Rejection Algorithms
- Enrico Bocchieri, Sarangarajan Parthasarathy:
Rejection Using Rank Statistics Based on HMM State Shortlists. 73-76 - Taeyoon Kim, Hanseok Ko:
Speaker Adaptive Confidence Scoring Using Bayesian Combining. 77-80 - Graham Greenland, Willy Wong, Hans Kunov:
Improving utterance verification using additional confidence measures in isolated speech recognition interfaces. 81-84 - Wai Kit Lo, Frank K. Soong:
Generalized Posterior Probability for Minimum Error Verification of Recognized Sentences. 85-88 - Soundararajan Srinivasan, DeLiang Wang:
Robust Speech Recognition by Integrating Speech Separation and Hypothesis Testing. 89-92 - Yue-wen Fu, Limin Du:
Combination of Multiple predictors to Improve Confidence Measure Based on Local Posterior Probabilities. 93-96
Discriminative Training
- Khe Chai Sim, Mark J. F. Gales:
Adaptation of Precision Matrix Models on Large Vocabulary Continuous Speech Recognition. 97-100 - Chaojun Liu, Hui Jiang, Xinwei Li:
Discriminative Training of CDHMMs for Maximum Relative Separation Margin. 101-104 - Mohamed Afify, Xinwei Li, Hui Jiang:
Statistical Performance Analysis of MCE/GPD Learning in Gaussian Classifiers and Hidden Markov Models. 105-108 - Lambert Mathias, Girija Yegnanarayanan, Jürgen Fritsch:
Discriminative Training of Acoustic Models Applied to Domains with Unreliable Transcripts. 109-112 - Erik McDermott, Shigeru Katagiri:
Minimum Classification Error for Large Scale Speech Recognition Tasks using Weighted Finite State Transducers. 113-116 - Bo Liu, Hui Jiang, Jian-Lai Zhou, Ren-Hua Wang:
Discriminative Training Based on the Criterion of Least Phone Competing Tokens for Large Vocabulary Speech Recognition. 117-120
Quantization and Quality Measurement
- Stephen So, Kuldip K. Paliwal:
Multi-Frame GMM-Based Block Quantisation of Line Spectral Frequencies for Wideband Speech Coding. 121-124 - Tiago H. Falk, Qingfeng Xu, Wai-Yip Chan:
Non-Intrusive GMM-Based Speech Quality Measurement. 125-128 - Stephen D. Voran:
A Multiple-Description PCM Speech Coder using Structured Dual Vector Quantizers. 129-132 - Minoru Kohata, Motoyuki Suzuki, Shozo Makino:
A New Segment Quantizer for Line Spectral Frequencies Using Lempel-Ziv Algorithm. 133-136 - Hiroyuki Ehara, Toshiyuki Morii, Masahiro Oshikiri, Koji Yoshida:
Predictive VQ for Bandwidth Scalable LSP Quantization. 137-140 - Yannis Agiomyrgiannakis, Yannis Stylianou:
Coding with Side Information Techniques for LSF Reconstruction in Voice Over IP. 141-144
Speech Enhancement with Noise Reduction
- Changhuai You, Soo Ngee Koh, Susanto Rahardja:
Signal Subspace Speech Enhancement for Audible Noise Reduction. 145-148 - Ning Ma, Martin Bouchard, Rafik A. Goubran:
A Wavelet Kalman Filter with Perceptual Masking for Speech Enhancement in Colored Noise. 149-152 - Richard C. Hendriks, Richard Heusdens, Jesper Jensen:
Adaptive Time Segmentation of Noisy Speech for Improved Speech Enhancement. 153-156 - Cyril Plapous, Claude Marro, Pascal Scalart:
Speech Enhancement Using Harmonic Regeneration. 157-160 - Zhong Lin, Rafik A. Goubran:
Instant Noise Estimation Using Fourier Transform of AMDF and Variable Start Minima Search. 161-164 - Guo-Hong Ding, Xia Wang, Yang Cao, Feng Ding, Yuezhong Tang:
Speech Enhancement Based on Speech Spectral Complex Gaussian Mixture Model. 165-168
Speaker Recognition Using Acoustic and Higher Level Features
- Andrew O. Hatch, Barbara Peskin, Andreas Stolcke:
Improved Phonetic Speaker Recognition Using Lattice Decoding. 169-172 - Sachin S. Kajarekar, Luciana Ferrer, Elizabeth Shriberg, M. Kemal Sönmez, Andreas Stolcke, Anand Venkataraman, Jing Zheng:
SRI's 2004 NIST Speaker Recognition Evaluation System. 173-176 - Douglas A. Reynolds, William M. Campbell, Terry T. Gleason, Carl Quillen, Douglas E. Sturim, Pedro A. Torres-Carrasquillo, André Adami:
The 2004 MIT Lincoln Laboratory Speaker Recognition System. 177-180 - Ka-Yee Leung, Man-Wai Mak, Man-Hung Siu, Sun-Yuan Kung:
Speaker Verification Using Adapted Articulatory Feature-based Conditional Pronunciation Modeling. 181-184 - Zi-He Chen, Yuan-Fu Liao, Yau-Tarng Juang:
Prosody Modeling and Eigen-Prosody Analysis for Robust Speaker Recognition. 185-188 - André Gustavo Adami:
Prosodic Modeling for Speaker Recognition Based on Sub-Band Energy Temporal Trajectories. 189-192
Large Vocabulary ASR
- Jeff Siu-Kei Au-Yeung, Chak-Fai Li, Man-Hung Siu:
Sub-phonetic Polynomial Segment Model for Large Vocabulary Continuous Speech Recognition. 193-196 - Olivier Siohan, Bhuvana Ramabhadran, Brian Kingsbury:
Constructing Ensembles of ASR Systems Using Randomized Decision Trees. 197-200 - Mike Schuster, Takaaki Hori:
Efficient Generation of high-order context-dependent Weighted Finite State Transducers for Speech Recognition. 201-204 - Hagen Soltau, Brian Kingsbury, Lidia Mangu, Daniel Povey, George Saon, Geoffrey Zweig:
The IBM 2004 Conversational Telephony System for Rich Transcription. 205-208 - Gunnar Evermann, Ho Yin Chan, Mark J. F. Gales, Bin Jia, David Mrva, Philip C. Woodland, Kai Yu:
Training LVCSR Systems on Thousands of Hours of Data. 209-212 - Mark Hasegawa-Johnson, James Baker, Sarah Borys, Ken Chen, Emily Coogan, Steven Greenberg, Amit Juneja, Katrin Kirchhoff, Karen Livescu, Srividya Mohan, Jennifer Muller, M. Kemal Sönmez, Tianyu Wang:
Landmark-Based Speech Recognition: Report of the 2004 Johns Hopkins Summer Workshop. 213-216
Novel Methods for Speech Analysis
- Venkatraman Atti, Andreas Spanias:
Speech Analysis by Estimating Perceptually Relevant Pole Locations. 217-220 - Steven M. Schimmel, Les E. Atlas:
Coherent Envelope Detection for Modulation Filtering of Speech. 221-224 - Kentaro Ishizuka, Hiroko Kato Solvang, Tomohiro Nakatani:
Speech Signal Analysis with Exponential Autoregressive Model. 225-228 - Robert W. Morris, Jon A. Arrowood, Mark A. Clements:
Comparison of Autoregressive Parameter Estimation Algorithms for Speech Processing and Recognition. 229-232 - Princy Dikshit, Stephen A. Zahorian, Shivaram Nagulapati:
An Algorithm for Locating Fundamental Frequency Markers in Speech Signals. 233-236 - Akira Sasou, Masataka Goto, Satoru Hayamizu, Kazuyo Tanaka:
An Auto-Regressive, Non-Stationary Excited Signal Parameter Estimation Method and an Evaluation of a Singing-Voice Recognition. 237-240
Noise Robust Speech Recognition
- Chen Yang, Frank K. Soong, Tan Lee:
Static and Dynamic Spectral Features: Their Noise Robustness and Optimal Weights for ASR. 241-244 - Weizhong Zhu, Douglas D. O'Shaughnessy:
Log-Energy Dynamic Range Normalization for Robust Speech Recognition. 245-248 - Jethran Guinness, Bhiksha Raj, Bent Schmidt-Nielsen, Lorenzo Turicchia, Rahul Sarpeshkar:
A Companding Front End for Noise-Robust Automatic Speech Recognition. 249-252 - Hemant Misra, Shajith Ikbal, Sunil Sivadas, Hervé Bourlard:
Multi-resolution Spectral Entropy Feature for Robust ASR. 253-256 - Masakiyo Fujimoto, Satoshi Nakamura:
Particle Filter Based Non-Stationary Noise Tracking for Robust Speech Recognition. 257-260 - Tor André Myrvoll, Satoshi Nakamura:
Online cepstral filtering using a sequential EM approach with Polyak averaging and feedback. 261-264
Prosody and Speech Synthesis
- Brian Langner, Alan W. Black:
Improving the Understandability of Speech Synthesis by Modeling Speech in Noise. 265-268 - Sankaranarayanan Ananthakrishnan, Shrikanth S. Narayanan:
An Automatic Prosody Recognizer using a Coupled Multi-Stream Acoustic Model and a Syntactic-Prosodic Language Model. 269-272 - Yoko Kokenawa, Minoru Tsuzaki, Hiroaki Kato, Yoshinori Sagisaka:
F0 control characterization by perceptual impressions on speaking attitudes using Multiple Dimensional Scaling analysis. 273-276 - Shinsuke Sakai:
Additive Modeling of English F0 Contour for Speech Synthesis. 277-280 - Dan-Ning Jiang, Wei Zhang, Liqin Shen, Lianhong Cai:
Prosody Analysis and Modeling for Emotional Speech Synthesis. 281-284 - Jianfeng Li, Guoping Hu, Ren-Hua Wang, Li-Rong Dai:
Sliding Window Smoothing For Maximum Entropy Based Intonational Phrase Prediction In Chinese. 285-288 - Wentao Gu, Keikichi Hirose, Hiroya Fujisaki:
Identification and Synthesis of Cantonese Tones Based on the Command-Response Model for F0 Contour Generation. 289-292 - Joram Meron, Peter Veprek:
Compression of Exception Lexicons for Small Footprint Grapheme-To-Phoneme Conversion. 293-296 - Christina L. Bennett, Alan W. Black:
Prediction of Pronunciation Variations for Speech Synthesis: A Data-Driven Approach. 297-300 - Mitsuaki Isogai, Hideyuki Mizuno, Kazunori Mano:
Recording Script Design for Corpus-Based TTS System Based on Coverage of Various Phonetic Elements. 301-304 - Jilei Tian, Jani Nurminen, Imre Kiss:
Optimal Subset Selection from Text Databases. 305-308 - Jordi Adell, Antonio Bonafonte, Jon Ander Gómez, María José Castro:
Comparative study of Automatic Phone Segmentation methods for TTS. 309-312
General Topics in ASR
- Brian Delaney:
Increased Robustness Against Bit Errors for Distributed Speech Recognition in Wireless Environments. 313-316 - Stefan Steidl, Michael Levit, Anton Batliner, Elmar Nöth, Heinrich Niemann:
"Of All Things the Measure Is Man": Automatic Classification of Emotions and Inter-Labeler Consistency. 317-320 - Lingyun Gu, John G. E. Harris, Rahul Shrivastav, Christine Sapienza:
Disordered Speech Evaluation Using Objective Quality Measures. 321-324 - Björn W. Schuller, Raquel Jiménez Villar, Gerhard Rigoll, Manfred K. Lang:
Meta-Classifiers in Acoustic and Linguistic Feature Fusion-Based Affect Recognition. 325-328 - Antonio M. Peinado, Angel M. Gomez, Victoria E. Sánchez, José L. Pérez-Córdoba, Antonio J. Rubio:
Packet Loss Concealment Based on VQ Replicas and MMSE Estimation Applied to Distributed Speech Recognition. 329-332 - Valentin Ion, Reinhold Haeb-Umbach:
A Comparison of Soft-Feature Distributed Speech Recognition with Candidate Codecs for Speech Enabled Mobile Services. 333-336 - Li Deng, Xiang Li, Dong Yu, Alex Acero:
A Hidden Trajectory Model with Bi-directional Target-Filtering: Cascaded vs. Integrated Implementation for Phonetic Recognition. 337-340 - Izhak Shafran, Mehryar Mohri:
A Comparison of Classifiers for Detecting Emotion from Speech. 341-344 - Alastair Bruce James, Ben Milner:
Soft Decoding of Temporal Derivatives for Robust Distributed Speech Recognition in Packet Loss. 345-348 - Xin Lei, Gang Ji, Tim Ng, Jeff A. Bilmes, Mari Ostendorf:
DBN-Based Multi-stream Models for Mandarin Toneme Recognition. 349-352 - Amaro A. de Lima, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda, Tadashi Kitamura, Fernando Gil Resende:
Sparse KPCA for Feature Extraction in Speech Recognition. 353-356 - Evan Ruzanski, John H. L. Hansen, James Meyerhoff, George Saviolakis, Michael Koenig:
Effects of Phoneme Characteristics on TEO Feature-based Automatic Stress Detection in Speech. 357-360
Speech Analysis and Synthesis
- Masatsune Tamura, Tatsuya Mizutani, Takehiko Kagoshima:
Scalable Concatenative Speech Synthesis Based on the Plural Unit Selection and Fusion Method. 361-364 - Junichi Yamagishi, Takao Kobayashi:
Adaptive Training for Hidden Semi-Markov Model. 365-368 - Mohammad Firouzmand, Laurent Girin:
Perceptually Weighted Long Term Modeling of Sinusoidal Speech Amplitude Trajectories. 369-372 - Toshiyuki Sekiya, Tetsunori Kobayashi:
Speech recognition in the blind condition based on multiple directivity patterns using a microphone array. 373-376 - Dagen Wang, Shrikanth S. Narayanan:
An Unsupervised Quantitative Measure for Word Prominence in Spontaneous Speech. 377-380 - Kostas Kokkinakis, Asoke K. Nandi:
Speech Modelling Based On Generalized Gaussian Probability Density Functions. 381-384 - Guo Chen, Vijay Parsa:
Bayesian Model Based Non-Intrusive Speech Quality Evaluation. 385-388 - Celia Shahnaz, Wei-Ping Zhu, M. Omair Ahmad:
Robust Pitch Estimation At Very Low SNR Exploiting Time and Frequency Domain Cues. 389-392 - Laurence Cnockaert, Francis Grenez, Jean Schoentgen:
Fundamental Frequency Estimation and Vocal Tremor Analysis by means of Morlet Wavelet Transforms. 393-396 - Anindya Sarkar, Thippur V. Sreenivas:
Automatic Speech Segmentation Using Average Level Crossing Rate Information. 397-400 - Van Tuan Pham, Gernot Kubin:
DWT-Based Phonetic Groups Classification Using Neural Networks. 401-404 - Francesco Gianfelici, Giorgio Biagetti, Paolo Crippa, Claudio Turchetti:
A Novel KLT Algorithm Optimized for Small Signal Sets. 405-408 - Yasser A. Mahgoub, Richard M. Dansereau:
Voicing-State Classification of Co-Channel Speech Using Nonlinear State-Space Reconstruction. 409-412 - Shrikanth S. Narayanan, Dagen Wang:
Speech Rate Estimation via Temporal Correlation and Selected Sub-Band Correlation. 413-416
Model-Based Robust Speech Recognition
- Xianyu Zhao, Zhijian Ou, Minhua Chen, Zuoying Wang:
Closely Coupled Array Processing and Model-Based Compensation for Microphone Array Speech Recognition. 417-420 - Daniel Willett:
Context-Dependent Duration Modeling. 421-424 - André Coy, Jon Barker:
Recognising Speech in the Presence of a Competing Speaker using a 'Speech Fragment Decoder'. 425-428 - Jian Wu, Qiang Huo, Donglai Zhu:
An Environment Compensated Maximum Likelihood Training Approach Based on Stochastic Vector Mapping. 429-432 - Veronique Stouten, Hugo Van hamme, Patrick Wambacq:
Effect of Phase-Sensitive Environment Model and Higher Order VTS on Noisy Speech Feature Enhancement. 433-436 - Pamornpol Jinachitra, Ramon Prieto:
Towards Speech Recognition Oriented Dereverberation. 437-440 - Zhipeng Zhang, Sadaoki Furui:
Noisy Speech Recognition Based on Robust End-point Detection and Model Adaptation. 441-444 - Hiroshi Fujimura, Chiyomi Miyajima, Katsunobu Itou, Kazuya Takeda, Fumitada Itakura:
Analysis of a large in-car speech corpus and its application to the multimodel ASR. 445-448 - Goshu Nagino, Makoto Shozakai:
Building an Effective Corpus By Using Acoustic Space Visualization (COSMOS) Method. 449-452 - Shajith Ikbal, Hervé Bourlard, Mathew Magimai-Doss:
HMM/ANN Based Spectral Peak Location Estimation for Noise Robust Speech Recognition. 453-456 - András Zolnay, Ralf Schlüter, Hermann Ney:
Acoustic Feature Combination for Robust Speech Recognition. 457-460 - Stavros Tsakalidis, William Byrne:
Acoustic Training from Heterogeneous Data Sources: Experiments in Mandarin Conversational Telephone Speech Transcription. 461-464
Speech Mining and Audio-Visual Information Processing
- Kishan Thambiratnam, Sridha Sridharan:
Dynamic Match Phone-Lattice Searches For Very Fast And Accurate Unrestricted Vocabulary Keyword Spotting. 465-468 - Satoshi Tamura, Koji Iwano, Sadaoki Furui:
A Stream-Weight Optimization Method for Multi-Stream HMMs Based on Likelihood Value Normalization. 469-472 - Jesus F. Guitarte Perez, Alejandro F. Frangi, Eduardo Lleida-Solano, Klaus Lukas:
Lip Reading for Robust Speech Recognition on Embedded Devices. 473-476 - Simon Tucker, Steve Whittaker:
Novel Techniques For Time-Compressing Speech: An Exploratory Study. 477-480 - Peng Yu, Frank Seide:
Fast Two-Stage Vocabulary-Independent Search In Spontaneous Speech. 481-484 - Takafumi Koshinaka, Ken-ichi Iso, Akitoshi Okumura:
An HMM-based Text Segmentation Method Using Variational Bayes Approach and Its Application to LVCSR for Broadcast News. 485-488 - Daniel Gatica-Perez, Iain McCowan, Dong Zhang, Samy Bengio:
Detecting Group Interest-Level in Meetings. 489-492