INTERSPEECH 2013: Lyon, France
- Frédéric Bimbot, Christophe Cerisara, Cécile Fougeron, Guillaume Gravier, Lori Lamel, François Pellegrino, Pascal Perrier:
14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013, Lyon, France, August 25-29, 2013. ISCA 2013
Systems for Search/Retrieval of Speech Documents
- Xavier Anguera:
Information retrieval-based dynamic time warping. 1-5
- Dogan Can, Shrikanth S. Narayanan:
On the computation of document frequency statistics from spoken corpora using factor automata. 6-10
- Kouichi Katsurada, Seiichi Miura, Kheang Seng, Yurie Iribe, Tsuneo Nitta:
Acceleration of spoken term detection using a suffix array by assigning optimal threshold values to sub-keywords. 11-14
- Arindam Mandal, Julien van Hout, Yik-Cheung Tam, Vikramjit Mitra, Yun Lei, Jing Zheng, Dimitra Vergyri, Luciana Ferrer, Martin Graciarena, Andreas Kathol, Horacio Franco:
Strategies for high accuracy keyword detection in noisy channels. 15-19
- Alberto Abad, Luis Javier Rodríguez-Fuentes, Mikel Peñagarikano, Amparo Varona, Germán Bordel:
On the calibration and fusion of heterogeneous spoken term detection systems. 20-24
- Shiro Narumi, Kazuma Konno, Takuya Nakano, Yoshiaki Itoh, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee:
Intensive acoustic models constructed by integrating low-occurrence models for spoken term detection. 25-28
Speech Analysis I-IV
- John Kane, Irena Yanushevskaya, John Dalton, Christer Gobl, Ailbhe Ní Chasaide:
Using phonetic feature extraction to determine optimal speech regions for maximising the effectiveness of glottal source analysis. 29-33
- Hideki Kawahara, Masanori Morise, Tomoki Toda, Ryuichi Nisimura, Toshio Irino:
Beyond bandlimited sampling of speech spectral envelope imposed by the harmonic structure of voiced sounds. 34-38
- JeeSok Lee, Frank K. Soong, Hong-Goo Kang:
A source-filter based adaptive harmonic model and its application to speech prosody modification. 39-43
- K. Ramesh, S. R. Mahadeva Prasanna, D. Govind:
Detection of glottal opening instants using Hilbert envelope. 44-48
- Dhananjaya N. Gowda, Jouni Pohjalainen, Mikko Kurimo, Paavo Alku:
Robust formant detection using group delay function and stabilized weighted linear prediction. 49-53
- Thomas Hézard, Thomas Hélie, Boris Doval:
A source-filter separation algorithm for voiced sounds based on an exact anticausal/causal pole decomposition for the class of periodic signals. 54-58
Language and Dialect Recognition
- Weiwei Liu, Wei-Qiang Zhang, Zhiyi Li, Jia Liu:
Parallel absolute-relative feature based phonotactic language recognition. 59-63
- Mireia Díez, Amparo Varona, Mikel Peñagarikano, Luis Javier Rodríguez-Fuentes, Germán Bordel:
Dimensionality reduction of phone log-likelihood ratio features for spoken language recognition. 64-68
- Jeff Z. Ma, Bing Zhang, Spyros Matsoukas, Sri Harish Reddy Mallidi, Feipeng Li, Hynek Hermansky:
Improvements in language identification on the RATS noisy speech corpus. 69-73
- Mehdi Soufifar, Lukás Burget, Oldrich Plchot, Sandro Cumani, Jan Cernocký:
Regularized subspace n-gram model for phonotactic ivector extraction. 74-78
- Hamid Behravan, Ville Hautamäki, Tomi Kinnunen:
Foreign accent detection from spoken Finnish using i-vectors. 79-83
- Mitchell McLaren, Aaron Lawson, Yun Lei, Nicolas Scheffer:
Adaptive Gaussian backend for robust language identification. 84-88
ASR - Neural Networks
- Matthias Paulik:
Lattice-based training of bottleneck feature extraction neural networks. 89-93
- Jonas Gehring, Wonkyum Lee, Kevin Kilgour, Ian R. Lane, Yajie Miao, Alex Waibel:
Modular combination of deep neural networks for acoustic modeling. 94-98
- Shuo-Yiin Chang, Nelson Morgan:
Informative spectro-temporal bottleneck features for noise-robust speech recognition. 99-103
- Zhi-Jie Yan, Qiang Huo, Jian Xu:
A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR. 104-108
- Shakti P. Rath, Daniel Povey, Karel Veselý, Jan Cernocký:
Improved feature processing for deep neural networks. 109-113
- Oriol Vinyals, Nelson Morgan:
Deep vs. wide: depth on a budget for robust speech recognition. 114-118
Speech Acoustics
- Angelika Braun:
An early case of "VOT". 119-122
- Robert Allen Fox, Ewa Jacewicz, Jessica Hart:
Pitch pattern variations in three regional varieties of American English. 123-127
- Jean-Sylvain Liénard, Claude Barras:
Fine-grain voice strength estimation from vowel spectral cues. 128-132
- Elizabeth Godoy, Catherine Mayo, Yannis Stylianou:
Linking loudness increases in normal and Lombard speech to decreasing vowel formant separation. 133-137
- Kunitoshi Motoki:
Three-dimensional rectangular vocal-tract model with asymmetric wall impedances. 138-142
- Manu Airaksinen, Brad H. Story, Paavo Alku:
Quasi closed phase analysis for glottal inverse filtering. 143-147
Paralinguistic Challenge (Special Session)
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Alessandro Vinciarelli, Klaus R. Scherer, Fabien Ringeval, Mohamed Chetouani, Felix Weninger, Florian Eyben, Erik Marchi, Marcello Mortillaro, Hugues Salamin, Anna Polychroniou, Fabio Valente, Samuel Kim:
The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism. 148-152
- Artur Janicki:
Non-linguistic vocalisation recognition based on hybrid GMM-SVM approach. 153-157
- Jieun Oh, Eunjoon Cho, Malcolm Slaney:
Characteristic contours of syllabic-level units in laughter. 158-162
- Teun F. Krikke, Khiet P. Truong:
Detection of nonverbal vocalizations using Gaussian mixture models: looking for fillers and laughter in conversational speech. 163-167
- Johannes Wagner, Florian Lingenfelser, Elisabeth André:
Using phonetic patterns for detecting social cues in natural conversations. 168-172
- Rahul Gupta, Kartik Audhkhasi, Sungbok Lee, Shrikanth S. Narayanan:
Paralinguistic event detection from speech using probabilistic time-series smoothing and masking. 173-177
- Gouzhen An, David Guy Brizan, Andrew Rosenberg:
Detecting laughter and filled pauses using syllable-based features. 178-181
- Daniel Bone, Theodora Chaspari, Kartik Audhkhasi, James Gibson, Andreas Tsiartas, Maarten Van Segbroeck, Ming Li, Sungbok Lee, Shrikanth S. Narayanan:
Classifying language-related developmental disorders from speech cues: the promise and the potential confounds. 182-186
- Katrin Kirchhoff, Yuzong Liu, Jeff A. Bilmes:
Classification of developmental disorders from speech signals using submodular feature selection. 187-190
- Meysam Asgari, Alireza Bayestehtashk, Izhak Shafran:
Robust and accurate features for detecting and diagnosing autism spectrum disorders. 191-194
- David Martínez González, Dayana Ribas, Eduardo Lleida, Alfonso Ortega, Antonio Miguel:
Suprasegmental information modelling for autism disorder spectrum and specific language impairment classification. 195-199
- Félix Grèzes, Justin Richards, Andrew Rosenberg:
Let me finish: automatic conflict detection using speaker overlap. 200-204
- Vidhyasaharan Sethu, Julien Epps, Eliathamby Ambikairajah, Haizhou Li:
GMM based speaker variability compensated system for INTERSPEECH 2013 ComParE emotion challenge. 205-209
- Okko Räsänen, Jouni Pohjalainen:
Random subset feature selection in automatic recognition of developmental disorders, affective states, and level of conflict from speech. 210-214
- Hung-yi Lee, Ting-Yao Hu, How Jing, Yun-Fan Chang, Yu Tsao, Yu-Cheng Kao, Tsang-Long Pao:
Ensemble of machine learning and acoustic segment model techniques for speech emotion and autism spectrum disorders recognition. 215-219
- Gábor Gosztolya, Róbert Busa-Fekete, László Tóth:
Detecting autism, emotions and social signals using AdaBoost. 220-224
Perception of Prosody
- Oliver Niebuhr:
Resistance is futile - the intonation between continuation rise and calling contour in German. 225-229
- Hansjörg Mixdorff, Oliver Niebuhr:
The influence of F0 contour continuity on prominence perception. 230-234
- Caroline L. Smith, Paul Edmunds:
Native English listeners' perceptions of prosody in L1 and L2 reading. 235-238
- Chiharu Tsurutani, Dean Luo:
Naturalness judgement of L2 Mandarin Chinese - does timing matter? 239-242
- Daniel Aalto, Juraj Simko, Martti Vainio:
Language background affects the strength of the pitch bias in a duration discrimination task. 243-247
- Margaret Zellers:
Pitch and lengthening as cues to turn transition in Swedish. 248-252
- Maria Paola Bissiri, Margaret Zellers:
Perception of glottalization in varying pitch contexts across languages. 253-257
- Michael Walsh, Katrin Schweitzer, Nadja Schauffler:
Exemplar-based pitch accent categorisation using the generalized context model. 258-262
- Bettina Braun, Yuki Asano:
Double contrast is signalled by prenuclear and nuclear accent types alone, not by f0-plateaux. 263-266
- Susana Correia, Sónia Frota, Joseph Butler, Marina Vigário:
Word stress perception in European Portuguese. 267-271
- Denis Arnold, Petra Wagner, R. Harald Baayen:
Using generalized additive models and random forests to model prosodic prominence in German. 272-276
- Hartmut R. Pfitzinger, Hansjörg Mixdorff:
Perceiving speech rate differences between natural and time-scale modified utterances. 277-281
Prosody, Phonetics of Language Varieties
- Plínio A. Barbosa, Anders Eriksson, Joel Åkesson:
On the robustness of some acoustic parameters for signalling word stress across styles in Brazilian Portuguese. 282-286
- Shao-Ren Lyu, Ho-hsien Pan:
Reexamine the sandhi rules and the merging tones in Hakka language. 287-290
- Marija Tabain, Richard Beare, Andrew Butcher:
A preliminary spectral analysis of palatal and velar stop bursts in Pitjantjatjara. 291-295
- Shakuntala Mahanta, A. I. Twaha:
Presentational focus realisation in Nalbaria variety of Assamese. 296-299
- Marisa Cruz, Sónia Frota:
On the relation between intonational phrasing and pitch accent distribution. evidence from European Portuguese varieties. 300-304
- Rena Nemoto, Martine Adda-Decker:
How are word-final schwas different in the north and south of France? 305-309
- Simone Ashby, Sílvia Barbosa, Catarina Silva, Paulino Fumo, José Pedro Ferreira:
Modeling postcolonial language varieties: challenges and lessons learned from Mozambican Portuguese. 310-314
- Heete Sahkai, Mari-Liis Kalvik, Meelis Mihkla:
Prosody of contrastive focus in Estonian. 315-319
- Thomas Kisler, Uwe D. Reichel:
Exploring the connection of acoustic and distinctive features. 320-324
- Conceição Cunha, Jonathan Harrington, Phil Hoole:
A physiological analysis of the tense/lax vowel contrast in two varieties of German. 325-329
- Einar Meister, Lya Meister:
Production of Estonian quantity contrasts by native speakers of Finnish. 330-334
- Yohann Meynadier, Yulia Gaydina:
Aerodynamic and durational cues of phonological voicing in whisper. 335-339
- Uwe D. Reichel:
Information theoretic syllable structure and its relation to the c-center effect. 340-344
- Bistra Andreeva, William J. Barry, Jacques C. Koreman:
The Bulgarian stressed and unstressed vowel system. a corpus study. 345-348
Speech Synthesis I, II
- Santitham Prom-on, Peter Birkholz, Yi Xu:
Training an articulatory synthesizer with continuous acoustic data. 349-353
- Géza Kiss, Jan P. H. van Santen:
Estimating speaker-specific intonation patterns using the linear alignment model. 354-358
- June Sig Sung, Doo Hwa Hong, Hyun Woo Koo, Nam Soo Kim:
Factored maximum likelihood kernelized regression for HMM-based singing voice synthesis. 359-363
- Shinnosuke Takamichi, Tomoki Toda, Yoshinori Shiga, Sakriani Sakti, Graham Neubig, Satoshi Nakamura:
Improvements to HMM-based speech synthesis based on parameter generation with rich context models. 364-368
- Toru Nakashika, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki:
Voice conversion in high-order eigen space using deep belief nets. 369-372
- Hanna Silén, Jani Nurminen, Elina Helander, Moncef Gabbouj:
Voice conversion for non-parallel datasets using dynamic kernel partial least squares regression. 373-377
- Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi:
A style control technique for singing voice synthesis based on multiple-regression HSMM. 378-382
- Florian Hinterleitner, Christoph Norrenbrock, Sebastian Möller, Ulrich Heute:
Predicting the quality of text-to-speech systems from a large-scale feature set. 383-387
- Jani Nurminen, Hanna Silén, Moncef Gabbouj:
Speaker-specific retraining for enhanced compression of unit selection text-to-speech databases. 388-391
- Mark A. Huckvale, Julian Leff, Geoff Williams:
Avatar therapy: an audio-visual dialogue system for treating auditory hallucinations. 392-396
- Prasanna Kumar Muthukumar, Alan W. Black, H. Timothy Bunnell:
Optimizations and fitting procedures for the Liljencrants-Fant model for statistical parametric speech synthesis. 397-401
- Dirk Hovy, Gopala Krishna Anumanchipalli, Alok Parlikar, Caroline Vaughn, Adam C. Lammert, Eduard H. Hovy, Alan W. Black:
Analysis and modeling of "focus" in context. 402-406
Perception, Dialectal Differences
- Thi Anh Xuan Tran, Viet Son Nguyen, Eric Castelli, René Carré:
Production and perception of pseudo-V1CV2 outside the vowel triangle: speech illusion effects. 407-411
- Maria Candea, Martine Adda-Decker, Lori Lamel:
Recent evolution of non-standard consonantal variants in French broadcast news. 412-416
- Frank Zimmerer, Rei Yasuda, Henning Reetz:
Architekt or archtekt? perception of devoiced vowels produced by Japanese speakers of German. 417-420
- Andrew R. Plummer, Lucie Ménard, Benjamin Munson, Mary E. Beckman:
Comparing vowel category response surfaces over age-varying maximal vowel spaces within and across language communities. 421-425
- Molly Babel, Grant McGuire:
Perceived vocal attractiveness across dialects is similar but not uniform. 426-430
- Hongyan Wang, Vincent J. van Heuven:
Mutual intelligibility of American, Chinese and Dutch-accented speakers of English tested by SUS and SPIN sentences. 431-435
Speech Enhancement - Single Channel
- Xugang Lu, Yu Tsao, Shigeki Matsuda, Chiori Hori:
Speech enhancement based on deep denoising autoencoder. 436-440
- Hiroshi Saruwatari, Suzumi Kanehara, Ryoichi Miyazaki, Kiyohiro Shikano, Kazunobu Kondo:
Musical noise analysis for Bayesian minimum mean-square error speech amplitude estimators based on higher-order statistics. 441-445
- Nikolay Lyubimov, Mikhail Kotov:
Non-negative matrix factorization with linear constraints for single-channel speech enhancement. 446-450
- Hung-Wei Tseng, Srikanth Vishnubhotla, Mingyi Hong, Xiangfeng Wang, Jinjun Xiao, Zhi-Quan Luo, Tao Zhang:
A single channel speech enhancement approach by combining statistical criterion and multi-frame sparse dictionary learning. 451-455
- Majid Mirbagheri, Yanbo Xu, Sahar Akram, Shihab A. Shamma:
Speech enhancement using convolutive nonnegative matrix factorization with cosparsity regularization. 456-459
- Matthew C. McCallum, Bernard J. Guillemin:
Joint stochastic-deterministic Wiener filtering with recursive Bayesian estimation of deterministic speech. 460-464
Dialog Modeling
- Juha Knuuttila, Okko Räsänen, Unto K. Laine:
Automatic self-supervised learning of associations between speech and text. 465-469
- Lucie Daubigney, Matthieu Geist, Olivier Pietquin:
Particle swarm optimisation of spoken dialogue system strategies. 470-474
- Pierre Lison:
Model-based Bayesian reinforcement learning for dialogue management. 475-479
- Fabrizio Ghigi, M. Inés Torres, Raquel Justo, José-Miguel Benedí:
Evaluating spoken dialogue models under the interactive pattern recognition framework. 480-484
- Yun-Nung Chen, Florian Metze:
Multi-layer mutually reinforced random walk with hidden parameters for improved multi-party meeting summarization. 485-489
- Pei-hao Su, Yow-Bang Wang, Tsung-Hsien Wen, Tien-han Yu, Lin-Shan Lee:
A recursive dialogue game framework with optimal policy offering personalized computer-assisted language learning. 490-494
ASR - Lexical, Prosodic and Cross/Multi-Lingual
- Stefan Hahn, Patrick Lehnen, Simon Wiesler, Ralf Schlüter, Hermann Ney:
Improving LVCSR with hidden conditional random fields for grapheme-to-phoneme conversion. 495-499
- Van Hai Do, Xiong Xiao, Engsiong Chng, Haizhou Li:
Context-dependent phone mapping for LVCSR of under-resourced languages. 500-504
- Ramya Rasipuram, Mathew Magimai-Doss:
Improving grapheme-based ASR by probabilistic lexical modeling approach. 505-509
- Petr Motlícek, David Imseng, Philip N. Garner:
Crosslingual tandem-SGMM: exploiting out-of-language data for acoustic model and feature level adaptation. 510-514
- Ngoc Thang Vu, Tanja Schultz:
Multilingual multilayer perceptron for rapid language adaptation between and across language families. 515-519
- Andrew Rosenberg:
Modeling prosodic sequences with k-means and Dirichlet process GMMs. 520-524
Phonetic Convergence
- Antje Schweitzer, Natalie Lewandowski:
Convergence of articulation rate in spontaneous speech. 525-529
- Jennifer S. Pardo:
Phonetic convergence in shadowed speech: a comparison of perceptual and acoustic measures. 530-534
- Marcin Wlodarczak, Juraj Simko, Petra Wagner:
Pitch and duration as a basis for entrainment of overlapped speech onsets. 535-538
- Francesca Bonin, Céline De Looze, Sucheta Ghosh, Emer Gilmartin, Carl Vogel, Anna Polychroniou, Hugues Salamin, Alessandro Vinciarelli, Nick Campbell:
Investigating fine temporal dynamics of prosodic and lexical accommodation. 539-543
- Jeesun Kim, Ruben Demirdjian, Chris Davis:
Spontaneous and explicit speech imitation. 544-547
- Václav Jonás Podlipský, Sárka Simácková, Katerina Chládková:
Imitation interacts with one's second-language phonology but it does not operate cross-linguistically. 548-552
Speech Production, Acquisition and Development I, II
- Po-jen Hsieh:
Prosodic markings of semantic predictability in Taiwan Mandarin. 553-557
- Rüdiger Hoffmann, Dieter Mehnert, Rolf Dietzel:
How did it work? historic phonetic devices explained by coeval photographs. 558-562
- Lea S. Kohtz, Oliver Niebuhr:
Eliciting speech with sentence lists - a critical evaluation with special emphasis on segmental anchoring. 563-567
- Yuguang Wang, Jianwu Dang, Xi Chen, Jianguo Wei, Hongcui Wang, Kiyoshi Honda:
An MRI-based acoustic study of Mandarin vowels. 568-571
- Daniel Hirst:
Melody metrics for prosodic typology: comparing English, French and Chinese. 572-576
- Michael I. Proctor, Louis Goldstein, Adam C. Lammert, Dani Byrd, Asterios Toutios, Shrikanth S. Narayanan:
Velic coordination in French nasals: a real-time magnetic resonance imaging study. 577-581
- Mark A. Huckvale, Amrita Sharma:
Learning to imitate adult speech with the KLAIR virtual infant. 582-586
- Jorge C. Lucero, Jean Schoentgen, Mara Behlau:
Physics-based synthesis of disordered voices. 587-591
- Sonia D'Apolito, Barbara Gili Fivela:
Place assimilation and articulatory strategies: the case of sibilant sequences in French as L1 and L2. 592-596
- Barbara Samlowski, Petra Wagner, Bernd Möbius:
Effects of lexical class and lemma frequency on German homographs. 597-601