


12th SSW 2023: Grenoble, France
- Gérard Bailly, Thomas Hueber, Damien Lolive, Nicolas Obin, Olivier Perrotin: 12th ISCA Speech Synthesis Workshop, SSW 2023, Grenoble, France, August 26-28, 2023. ISCA 2023
Orals 1: TTS input
- Gérard Bailly, Martin Lenglet, Olivier Perrotin, Esther Klabbers: Advocating for text input in multi-speaker text-to-speech systems. 1-7
- Jason Fong, Hao Tang, Simon King: Spell4TTS: Acoustically-informed spellings for improving text-to-speech pronunciations. 8-13
- Marcel Granero Moya, Penny Karanasou, Sri Karlapati, Bastian Schnell, Nicole Peinelt, Alexis Moinet, Thomas Drugman: A Comparative Analysis of Pretrained Language Models for Text-to-Speech. 14-20
- Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers: Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection. 21-26
Orals 2: Evaluation
- Lev Finkelstein, Joshua Camp, Rob Clark: Importance of Human Factors in Text-To-Speech Evaluations. 27-33
- Fritz Seebauer, Michael Kuhlmann, Reinhold Haeb-Umbach, Petra Wagner: Re-examining the quality dimensions of synthetic speech. 34-40
- Ambika Kirkland, Shivam Mehta, Harm Lameris, Gustav Eje Henter, Éva Székely, Joakim Gustafson: Stuck in the MOS pit: A critical analysis of MOS test methodology in TTS evaluation. 41-47
- Ondrej Plátek, Ondrej Dusek: MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module. 48-54
Orals 3: Beyond text-to-speech
- Johannes A. Louw: Cross-lingual transfer using phonological features for resource-scarce text-to-speech. 55-61
- Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari: Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion. 62-68
- Harm Lameris, Ambika Kirkland, Joakim Gustafson, Éva Székely: Situating Speech Synthesis: Investigating Contextual Factors in the Evaluation of Conversational TTS. 69-74
- Johannah O'Mahony, Catherine Lai, Simon King: Synthesising turn-taking cues using natural conversational data. 75-80
Orals 4: Voice conversion
- Arnab Das, Suhita Ghosh, Tim Polzehl, Ingo Siegert, Sebastian Stober: StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings. 81-87
- Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko: PRVAE-VC: Non-Parallel Many-to-Many Voice Conversion with Perturbation-Resistant Variational Autoencoder. 88-93
- Ryunosuke Hirai, Yuki Saito, Hiroshi Saruwatari: Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion. 94-99
- Anton Kashkin, Ivan Karpukhin, Svyatoslav Shishkin: HiFi-VC: High Quality ASR-based Voice Conversion. 100-105
Orals 5: Expressivity, emotion and styles
- Daria Diatlova, Vitalii Shutov: EmoSpeech: guiding FastSpeech2 towards Emotional Text to Speech. 106-112
- Arnaud Joly, Marco Nicolis, Ekaterina Peterova, Alessandro Lombardi, Ammar Abbas, Arent van Korlaar, Aman Hussain, Parul Sharma, Alexis Moinet, Mateusz Lajszczak, Penny Karanasou, Antonio Bonafonte, Thomas Drugman, Elena Sokolova: Controllable Emphasis with zero data for text-to-speech. 113-119
- Martin Lenglet, Olivier Perrotin, Gérard Bailly: Local Style Tokens: Fine-Grained Prosodic Representations For TTS Expressive Control. 120-126
- Sofoklis Kakouros, Juraj Simko, Martti Vainio, Antti Suni: Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody. 127-133
Orals 6: Long form, multimodal & multi-speaker TTS
- Adriana Stan, Johannah O'Mahony: An analysis on the effects of speaker embedding choice in non auto-regressive TTS. 134-138
- Weicheng Zhang, Cheng-chieh Yeh, Will Beckman, Tuomo Raitio, Ramya Rasipuram, Ladan Golipour, David Winarsky: Audiobook synthesis with long-form neural text-to-speech. 139-143
- Tuomo Raitio, Javier Latorre, Andrea Davis, Tuuli Morrill, Ladan Golipour: Improving the quality of neural TTS using long-form content and multi-speaker multi-style modeling. 144-149
- Shivam Mehta, Siyang Wang, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter: Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis. 150-156
Posters SSW
- Haolin Chen, Philip N. Garner: Diffusion Transformer for Adaptive Text-to-Speech. 157-162
- Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely: On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis. 163-169
- David Guennec, Lily Wadoux, Aghilas Sini, Nelly Barbot, Damien Lolive: Voice Cloning: Training Speaker Selection with Limited Multi-Speaker Corpus. 170-176
- Ravi Shankar, Archana Venkataraman: Adaptive Duration Modification of Speech using Masked Convolutional Networks and Open-Loop Time Warping. 177-183
- Jarod Duret, Yannick Estève, Titouan Parcollet: Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data. 184-190
- Kishor Kayyar Lakshminarayana, Christian Dittmar, Nicola Pia, Emanuël A. P. Habets: Subjective Evaluation of Text-to-Speech Models: Comparing Absolute Category Rating and Ranking by Elimination Tests. 191-196
- Sajad Shirali-Shahreza, Gerald Penn: Better Replacement for TTS Naturalness Evaluation. 197-203
- Mikey Elmers, Éva Székely: The Impact of Pause-Internal Phonetic Particles on Recall in Synthesized Lectures. 204-210
- Takenori Yoshimura, Takato Fujimoto, Keiichiro Oura, Keiichi Tokuda: SPTK4: An Open-Source Software Toolkit for Speech Signal Processing. 211-217
- Lev Finkelstein, Chun-an Chan, Vincent Wan, Heiga Zen, Rob Clark: FiPPiE: A Computationally Efficient Differentiable method for Estimating Fundamental Frequency From Spectrograms. 218-224
- Biel Tura Vecino, Adam Gabrys, Daniel Matwicki, Andrzej Pomirski, Tom Iddon, Marius Cotescu, Jaime Lorenzo-Trueba: Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications. 225-229
- Ibrahim Ibrahimov, Gábor Gosztolya, Tamás Gábor Csapó: Data Augmentation Methods on Ultrasound Tongue Images for Articulation-to-Speech Synthesis. 230-235
Late breaking reports (not peer reviewed)
- Shaimaa Alwaisi, Mohammed Salah Al-Radhi, Géza Németh: Universal Approach to Multilingual Multispeaker Child Speech Synthesis. 236-237
- Seraphina Fong, Marco Matassoni, Gianluca Esposito, Alessio Brutti: Towards Speaker-Independent Voice Conversion for Improving Dysarthric Speech Intelligibility. 238-239
- Maxime Jacquelin, Maeva Garnier, Laurent Girin, Rémy Vincent, Olivier Perrotin: Exploring the multidimensional representation of individual speech acoustic parameters extracted by deep unsupervised models. 240-241
- Zhu Li, Xiyuan Gao, Shekhar Nayak, Matt Coler: SarcasticSpeech: Speech Synthesis for Sarcasm in Low-Resource Scenarios. 242-243
- Nicholas Sanders, Korin Richmond: Recovering Discrete Prosody Inputs via Invert-Classify. 244-245
- Atli Sigurgeirsson, Simon King: Using a Large Language Model to Control Speaking Style for Expressive TTS. 246-247
- Emmett Strickland, Dana Aubakirova, Dorin Doncenco, Diego Torres, Marc Evrard: NaijaTTS: A pitch-controllable TTS model for Nigerian Pidgin. 248-249
