ICASSP 2004: Montreal, Quebec, Canada
- 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2004, Montreal, Quebec, Canada, May 17-21, 2004. IEEE 2004, ISBN 0-7803-8484-9
Volume 1
Voice Conversion and Morphing Algorithms for TTS Systems
- Athanasios Mouchtaris, Jan Van der Spiegel, Paul Mueller:
Non-parallel training for voice conversion by maximum likelihood constrained adaptation. 1-4
- Junichi Yamagishi, Makoto Tachibana, Takashi Masuko, Takao Kobayashi:
Speaking style adaptation using context clustering decision tree for HMM-based speech synthesis. 5-8
- Hui Ye, Steve J. Young:
High quality voice morphing. 9-12
- Hideki Kawahara, Hideki Banno, Toshio Irino, Parham Zolfaghari:
Algorithm amalgam: morphing waveform based methods, sinusoidal models and STRAIGHT. 13-16
- Matthias Eichner, Matthias Wolff, Rüdiger Hoffmann:
Voice characteristics conversion for TTS using reverse VTLN. 17-20
- Dimitrios Rentzos, Saeed Vaseghi, Qin Yan, Ching-Hsiang Ho:
Voice conversion through transformation of spectral and intonation features. 21-24
Modeling Approaches in Speaker Recognition
- Q. Y. Hong, Sam Kwong:
Discriminative training for speaker identification based on maximum model distance algorithm. 25-28
- Hiroyoshi Yamamoto, Yoshihiko Nankaku, Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura:
Parameter sharing and minimum classification error training of mixtures of factor analyzers for speaker identification. 29-32
- Qi Li:
Discovering relations among discriminative training objectives [speak recognition applications]. 33-36
- Patrick Kenny, Pierre Dumouchel:
Disentangling speaker and channel effects in speaker verification. 37-40
- Todor Ganchev, Nikos Fakotakis, Dimitris K. Tasoulis, Michael N. Vrahatis:
Generalized locally recurrent probabilistic neural networks for text-independent speaker verification. 41-44
- Siu Man Chan, Man-Hung Siu:
Discrimination power weighted subword-based speaker verification. 45-48
Distributed Speech Recognition
- Antonio Cardenal López, Laura Docío Fernández, Carmen García-Mateo:
Soft decoding strategies for distributed speech recognition over IP networks. 49-51
- Ruhi Sarikaya, Yuqing Gao, George Saon:
Fractional Fourier transform features for speech recognition. 52
- Tenkasi Ramabadran, Alexander Sorin, Michael J. McLaughlin, Dan Chazan, David Pearce, Ron Hoory:
The ETSI extended distributed speech recognition (DSR) standards: server-side speech reconstruction. 53-56
- Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg:
A subvector-based error concealment algorithm for speech recognition over mobile networks. 57-60
- Jin-Yu Li, Bo Liu, Ren-Hua Wang, Li-Rong Dai:
A complexity reduction of ETSI advanced front-end for DSR. 61-64
- Lionel Delphin-Poulat:
Robust speech recognition techniques evaluation for telephony server based in-car applications. 65-68
- Wei-Hao Hsu, Lin-Shan Lee:
Efficient and robust distributed speech recognition (DSR) over wireless fading channels: 2D-DCT compression, iterative bit allocation, short BCH code and interleaving. 69-72
Higher-Level Knowledge in Speaker Recognition
- William M. Campbell, Joseph P. Campbell, Douglas A. Reynolds, Douglas A. Jones, Timothy R. Leek:
High-level speaker verification with support vector machines. 73-76
- Nengheng Zheng, P. C. Ching:
Using Haar transformed vocal source information for automatic speaker recognition. 77-80
- Seiichi Nakagawa, Wei Zhang, Mitsuo Takahashi:
Text-independent speaker recognition by combining speaker-specific GMM with speaker adapted syllable-based HMM. 81-84
- Ka-Yee Leung, Man-Wai Mak, Sun-Yuan Kung:
Applying articulatory features to telephone-based speaker verification. 85-88
- Farhad Farahani, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Speaker identification using supra-segmental pitch pattern dynamics. 89-92
- Shi-Han Chen, Hsiao-Chuan Wang:
Improvement of speaker recognition by combining residual and prosodic features with acoustic features. 93-96
Pitch and Tone Based Speech Analysis
- Xu Shao, Ben Milner:
Pitch prediction from MFCC vectors for speech reconstruction. 97-100
- Elliot Moore, Mark Clements:
Algorithm for automatic glottal waveform estimation without the reliance on precise glottal closure information. 101-104
- Ye Tian, Jian-Lai Zhou, Min Chu, Eric Chang:
Tone recognition with fractionized models and outlined features. 105-108
- S. R. M. Prasanna, B. Yegnanarayana:
Extraction of pitch in adverse conditions. 109-112
- Luca Armani, Maurizio Omologo:
Weighted autocorrelation-based F0 estimation for distant-talking interaction with a distributed microphone network. 113-116
- Om Deshmukh, Jawahar Singh, Carol Y. Espy-Wilson:
A novel method for computation of periodicity, aperiodicity and pitch of speech signals. 117-120
Feature Analysis for Speech Recognition
- S. V. Bharath Kumar, Srinivasan Umesh, Rohit Sinha:
Non-uniform speaker normalization using affine-transformation. 121-124
- Donglai Zhu, Kuldip K. Paliwal:
Product of power spectrum and group delay function for speech recognition. 125-128
- Alexander Sorin, Tenkasi Ramabadran, Dan Chazan, Ron Hoory, Michael J. McLaughlin, David Pearce, Fan Wang, Yaxin Zhang:
The ETSI extended distributed speech recognition (DSR) standards: client side processing and tonal language recognition evaluation. 129-132
- Shantanu Chakrabartty, Yunbin Deng, Gert Cauwenberghs:
Robust speech feature extraction by growth transformation in reproducing kernel Hilbert space. 133-136
- Xiao-Bing Li, Jin-Yu Li, Ren-Hua Wang:
Dimensionality reduction using MCE-optimized LDA transformation. 137-140
- Kentaro Ishizuka, Noboru Miyazaki:
Speech feature extraction method representing periodicity and aperiodicity in sub bands for robust speech recognition. 141-144
Quantization Techniques in Speech Coding
- Yongwon Shin, Sangwon Kang, Thomas R. Fischer, Changyong Son, Yongbeom Lee:
Low-complexity predictive trellis coded quantization of wideband speech LSF parameters. 145-148
- Kuldip K. Paliwal, Stephen So:
Multiple frame block quantisation of line spectral frequencies using Gaussian mixture models. 149-152
- Jonas Lindblom, Per Hedelin:
Variable-dimension quantization of sinusoidal amplitudes using Gaussian mixture models. 153-156
- Fredrik Nordén, Thomas Eriksson:
On split quantization of LSF parameters. 157-160
- Ethan Robert Duni, Anand D. Subramaniam, Bhaskar D. Rao:
Improved quantization structures using generalized HMM modelling with application to wideband speech coding. 161-164
- Jonas Samuelsson:
Waveform quantization of speech using Gaussian mixture models. 165-168
Acoustic Modeling: New Search Features and Supervised Training
- Ram Sundaram, Joseph Picone:
Effects on transcription errors on supervised learning in speech recognition. 169-172
- Scott Axelrod, Benoît Maison:
Combination of hidden Markov models with dynamic time warping for speech recognition. 173-176
- Mathew Magimai-Doss, Samy Bengio, Hervé Bourlard:
Joint decoding for phoneme-grapheme continuous speech recognition. 177-180
- Mathias De Wachter, Kris Demuynck, Patrick Wambacq, Dirk Van Compernolle:
A locally weighted distance measure for example based speech recognition. 181-184
- Long Nguyen, Bing Xiang:
Light supervision in acoustic model training. 185-188
- Langzhou Chen, Lori Lamel, Jean-Luc Gauvain:
Lightly supervised acoustic model training using consensus networks. 189-192
Robust Features for Speech Recognition
- Hemant Misra, Shajith Ikbal, Hervé Bourlard, Hynek Hermansky:
Spectral entropy based feature for robust ASR. 193-196
- Chang-Wen Hsu, Lin-Shan Lee:
Higher order cepstral moment normalization (HOCMN) for robust speech recognition. 197-200
- Sid-Ahmed Selouani, Douglas D. O'Shaughnessy:
Robustness of speech recognition using genetic algorithms and a Mel-cepstral subspace approach. 201-204
- Shajith Ikbal, Hemant Misra, Hervé Bourlard, Hynek Hermansky:
Phase autocorrelation (PAC) features in entropy based multi-stream for robust speech recognition. 205-208
- Shingo Yoshizawa, Noboru Hayasaka, Naoya Wada, Yoshikazu Miyanaga:
Cepstral gain normalization for noise robust speech recognition. 209-212
- Hugo Van hamme:
Robust speech recognition using cepstral domain missing data techniques and noisy masks. 213-216
Multichannel Speech Enhancement
- Kostas Kokkinakis, Asoke K. Nandi:
Optimal blind separation of convolutive audio mixtures without temporal constraints. 217-220
- Jean-Marc Valin, Jean Rouat, François Michaud:
Microphone array post-filter for separation of simultaneous non-stationary sources. 221-224
- Tsuyoki Nishikawa, Hiroshi Abe, Hiroshi Saruwatari, Kiyohiro Shikano:
Overdetermined blind separation for convolutive mixtures of speech based on multistage ICA using subarray processing. 225-228
- Xianxian Zhang, John H. L. Hansen, Kathryn Hoberg Arehart:
Speech enhancement based on a combined multi-channel array with constrained iterative and auditory masked processing. 229-232
- Calvin Yiu-Kit Lai, Parham Aarabi:
Multiple-microphone time-varying filters for robust speech recognition. 233-236
- Martin Fuchs, Tim Haulick, Gerhard Schmidt:
Noise suppression for automotive applications based on directional information. 237-240
Language Modeling and Search
- Michiel Bacchiani, Brian Roark:
Meta-data conditional language modeling. 241-244
- Ahmad Emami, Frederick Jelinek:
Exact training of a neural syntactic language model. 245-248
- Gunnar Evermann, Ho Yin Chan, Mark J. F. Gales, Thomas Hain, Xunying Liu, David Mrva, Lan Wang, Philip C. Woodland:
Development of the 2003 CU-HTK conversational telephone speech transcription system. 249-252
- Frank Seide, Peng Yu, Chengyuan Ma, Eric Chang:
Vocabulary-independent search in spontaneous speech. 253-256
- Woosung Kim, Sanjeev Khudanpur:
Cross-lingual latent semantic analysis for language modeling. 257-260
- Wen Wang, Andreas Stolcke, Mary P. Harper:
The use of a linguistically motivated language model in conversational speech recognition. 261-264
Speech Coding for Networks / Single-Channel Speech Enhancement
- Roch Lefebvre, Philippe Gournay, Redwan Salami:
A study of design compromises for speech coders in packet networks. 265-268
- Jin-Kyu Choi, Chang-Heon Lee, Hong-Goo Kang, Young-Cheol Park, Dae Hee Youn:
Improvement issues on transcoding algorithms: for the flexible usage to the various pairs of speech codec. 269-272
- Balázs Kövesi, Dominique Massaloux, Aurélien Sollaud:
A scalable speech and audio coding scheme with continuous bitrate flexibility. 273-276
- Hui Dong, Allen Gersho, Jerry D. Gibson, Vladimir Cuperman:
A multiple description speech coder based on AMR-WB for mobile ad hoc networks. 277-280
- Milan Jelinek, Redwan Salami, Sassan Ahmadi, Bruno Bessette, Philippe Gournay, Claude Laflamme:
On the architecture of the cdma2000® variable-rate multimode wideband (VMR-WB) speech coding standard. 281-284
- Sung-Kyo Jung, Kyung-Tae Kim, Hong-Goo Kang:
A bit-rate/bandwidth scalable speech coder based on ITU-T G.723.1 standard. 285-288
- Cyril Plapous, Claude Marro, Laurent Mauuary, Pascal Scalart:
A two-step noise reduction technique. 289-292
- Israel Cohen:
On the decision-directed estimation approach of Ephraim and Malah. 293-296
- Saeed Gazor:
Employing Laplacian-Gaussian densities for speech enhancement. 297-300
- Marcel Gabrea:
Robust adaptive Kalman filtering-based speech enhancement algorithm. 301-304
- Sundarrajan Rangachari, Philipos C. Loizou, Yi Hu:
A noise estimation algorithm with rapid adaptation for highly nonstationary environments. 305-308
- Ningping Fan:
Low distortion speech denoising using an adaptive parametric Wiener filter. 309-312
Speaker Adaptation
- John W. McDonough, Alex Waibel:
Performance comparisons of all-pass transform adaptation with maximum likelihood linear regression. 313-316
- Kai Yu, Mark J. F. Gales:
Adaptive training using structured transforms. 317-320
- Lan Wang, Philip C. Woodland:
MPE-based discriminative linear transform for speaker adaptation. 321-324
- Brian Mak, James T. Kwok, Simon Ka-Lung Ho:
A study of various composite kernels for kernel eigenvoice speaker adaptation. 325-328
- George Saon, Satya Dharanipragada, Daniel Povey:
Feature space Gaussianization. 329-332
- Daben Liu, Francis Kubala:
Online speaker clustering. 333-336
- Xiaodong He, Yunxin Zhao:
Prior knowledge guided MEL based model selection and adaptation for nonnative speech recognition. 337-340
- Sabine Deligne, Satya Dharanipragada:
Enrollment in low-resource speech recognition systems. 341-344
- Srinivasan Umesh, Rohit Sinha, S. V. Bharath Kumar:
An investigation into front-end signal processing for speaker normalization. 345-348
- Xavier L. Aubert:
Eigen-MLLRs applied to unsupervised speaker enrollment for large vocabulary continuous speech recognition. 349-352
- Masafumi Nishida, Tatsuya Kawahara:
Speaker indexing and adaptation using speaker clustering based on statistical model selection. 353-356
- Vlasios Doumpiotis, Yonggang Deng:
Eigenspace-based MLLR with speaker adaptive training in large vocabulary conversational speech recognition. 357-360
Topics in Speaker and Language Recognition
- Nikki Mirghafori, Matthieu Hébert:
Parameterization of the score threshold for a text-dependent adaptive speaker verification system. 361-364
- Matthieu Hébert, Nikki Mirghafori:
Desperately seeking impostors: data-mining for competitive impostor testing in a text-dependent speaker verification system. 365-368
- Luis Pérez-Freire, Carmen García-Mateo:
A multimedia approach for audio segmentation in TV broadcast news. 369-372
- Daniel Moraru, Sylvain Meignier, Corinne Fredouille, Laurent Besacier, Jean-François Bonastre:
The ELISA consortium approaches in broadcast news speaker segmentation during the NIST 2003 rich transcription evaluation. 373-376
- Waleed Fakhr, Ahmed Abdelsalam, Nadder Hamdy:
Enhancement of mismatched conditions in speaker recognition for multimedia applications. 377-380
- Chi-Jiun Shia, Yu-Hsien Chiu, Jia-Hsin Hsieh, Chung-Hsien Wu:
Language boundary detection and identification of mixed-language speech based on MAP estimation. 381-384
- Jorge Gutiérrez, Jean-Luc Rouas, Régine André-Obrecht:
Fusing language identification systems using performance confidence indexes. 385-388
- Mohamed Faouzi BenZeghiba, Hervé Bourlard:
Confidence measures in multiple pronunciations modeling for speaker verification. 389-392
- Pongtep Angkititrakul, John H. L. Hansen:
Identifying in-set and out-of-set speakers using neighborhood information. 393-396
- Sylvain Meignier, Daniel Moraru, Corinne Fredouille, Laurent Besacier, Jean-François Bonastre:
Benefits of prior acoustic segmentation for automatic speaker segmentation. 397-400
- Nagarajan Thangavelu, Hema A. Murthy:
Language identification using parallel syllable-like unit recognition. 401-404
- Samuel Kim, Thomas Eriksson, Hong-Goo Kang, Dae Hee Youn:
A pitch synchronous feature extraction method for speaker recognition. 405-408
Topics in Speech Understanding Systems
- Maximilian Bisani, Hermann Ney:
Bootstrap estimates for confidence intervals in ASR performance evaluation. 409-412
- Kuansan Wang:
A detection based approach to robust speech understanding. 413-416
- Srinivas Bangalore, Michael Johnston:
Robust multimodal understanding. 417-420
- Iker Arizmendi, Richard C. Rose:
A distributed framework for enterprise level speech recognition services. 421-424
- Christian Raymond, Frédéric Béchet, Renato De Mori, Géraldine Damnati, Yannick Estève:
Automatic learning of interpretation strategies for spoken dialogue systems. 425-428
- Dilek Hakkani-Tür, Gökhan Tür, Mazin G. Rahim, Giuseppe Riccardi:
Unsupervised and active learning in automatic speech recognition for call classification. 429-432
- Ryuichi Nisimura, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Public speech-oriented guidance system with adult and child discrimination capability. 433-436
- Gökhan Tür, Dilek Hakkani-Tür, Giuseppe Riccardi:
Extending boosting for call classification using word confusion networks. 437-440
- Alicia Abella, Jerry H. Wright, Allen L. Gorin:
Dialog trajectory analysis. 441-444
- Qiang Huang, Stephen J. Cox:
Improving phoneme recognition of telephone quality speech. 445-448
- Hiroaki Nanjo, Tasuku Kitade, Tatsuya Kawahara:
Automatic indexing of key sentences for lecture archives using statistics of presumed discourse markers. 449-452
- Shin-ya Ishikawa, Takahiro Ikeda, Kiyokazu Miki, Fumihiro Adachi, Ryosuke Isotani, Ken-ichi Iso, Akitoshi Okumura:
Speech-activated text retrieval system for multimodal cellular phones. 453-456
Topics in Speech Coding
- Volodya Grancharov, Jonas Samuelsson, W. Bastiaan Kleijn:
Noise-dependent postfiltering. 457-460
- Wei Zha, Wai-Yip Geoffrey Chan:
A data mining approach to objective speech quality measurement. 461-464
- Christoffer Rødbro, Jesper Jensen, Richard Heusdens:
Adaptive time-segmentation for speech coding with limited delay. 465-468
- Yannis Agiomyrgiannakis, Yannis Stylianou:
Combined estimation/coding of highband spectral envelopes for speech spectrum expansion. 469-472
- V. Ramasubramanian, Thippur V. Sreenivas:
Automatically derived units for segment vocoders. 473-476
- Kevin Brady, Thomas F. Quatieri, Joseph P. Campbell, William M. Campbell, Michael S. Brandstein, Clifford J. Weinstein:
Multisensor MELPe using parameter substitution. 477-480
- Masahiro Oshikiri, Hiroyuki Ehara, Koji Yoshida:
Efficient spectrum coding for super-wideband speech and its application to 7/10/15 kHz bandwidth scalable coders. 481-484
- Naveen Srinivasamurthy, Antonio Ortega, Shrikanth S. Narayanan:
Enhanced standard compliant distributed speech recognition (Aurora encoder) using rate allocation. 485-488
- Heping Ding:
Wideband audio over narrowband low-resolution media. 489-492
- Deep Sen: