5th ISCA Speech Synthesis Workshop
14th-16th June 2004, Carnegie Mellon University
Pittsburgh, the City with TTS at its heart

Program Proceedings

The full workshop proceedings are available as a single .pdf file: ssw5_proceedings.pdf (10MB). The individual papers are linked into the program below.

  • Monday 14th June
    • 10:00- Registration opens
    • 1:00-1:30 Welcome and Opening Announcements
    • 1:30-3:00 Oral Session 1
      • 1:30-2:00 1025 "Assessing the Acceptability of the Smartkom Speech Synthesis Voices" by Antje Schweitzer, Norbert Braunschweiler, Grzegorz Dogil, Bernd Mobius
      • 2:00-2:30 1027 "Subjective Evaluation of Join Cost and Smoothing Methods", by Jithendra Vepa, Simon King
      • 2:30-3:00 1039 "Optionality in Evaluating Prosody Prediction", by Erwin Marsi
    • 3:00-3:30 Coffee
    • 3:30-5:00 Oral Session 2
      • 3:30-4:00 1041 "Accurate Spectral Envelope Estimation for Articulation-to-Speech Synthesis" by Yoshinori Shiga, Simon King
      • 4:00-4:30 1052 "Formant Re-Synthesis of Dysarthric Speech" by Alexander Kain, Xiaochuan Niu, John-Paul Hosom, Qi Miao, Jan van Santen
      • 4:30-5:00 1064 "Mapping from Articulatory Movements to Vocal Tract Spectrum with Gaussian Mixture Model for Articulatory Speech Synthesis", by Tomoki Toda, Alan W Black, Keiichi Tokuda
    • 5:00-6:00 Short Paper Posters
      • 2025 "rVoice Studio and ActivePrompts" by Peter Rutten, David Talkin
      • 2026 "Language independent phoneme mapping for foreign TTS" by Leonardo Badino, Claudia Barolo, Silvia Quazza
      • 2027 "Towards emotional speech synthesis: a rule based approach" by Enrico Zovato, Alberto Pacchiotti, Silvia Quazza, Stefano Sandri
      • 2028 "Corpora of Latin American Spanish for research in prosody and synthesis" by Alejandro Renato, Jose Alvarez
      • 2029 "CMU ARCTIC SPEECH DATABASES" by John Kominek, Alan W Black
      • 2030 "Forced Alignment for speech synthesis using duration and prosodic phrase breaks" by Arthur Toth
      • 2031 "Analysis of fundamental frequency contours of Cantonese based on a command response model" by Wentao Gu, Hiroya Fujisaki, Keikichi Hirose
      • 2032 "Creating a Database of Speech in Noise for Unit Selection Synthesis" by Brian Langner, Alan W Black
    • 6:30 Dinner
  • Tuesday 15th June
    • 9:00-10:30 Lightning Talks: 5 Minutes each, topics TBA and volunteers requested
    • 10:30-11:00 Coffee
    • 11:00-12:30 Oral Session 3
      • 11:00-11:30 1032 "Using 5 ms segments in concatenative speech synthesis" by Toshio Hirai, Seiichi Tenpaku
      • 11:30-12:00 1033 "Unit Selection using pitch synchronous cross correlation for Japanese concatenative speech synthesis", by Nobuo Nukaga, Ryota Kamoshida, Kenji Nagamatsu
      • 12:00-12:30 1043 "Data-Driven Perceptually-Based Join Costs", by Ann K. Syrdal, Alistair D. Conkie
    • 12:30-1:30 Lunch
    • 1:30-3:00 Poster Session 1
      • 1026 "Synthesising Contextually Appropriate Intonation in Limited Domains", by Rachel Baker, Robert A. J. Clark, Michael White
      • 1029 "Frisian TTS, an Example of Bootstrapping TTS for minority Languages", by Jelske Dijkstra, R.J.J.H. van Son, Louis C.W. Pols
      • 1065 "Unit Selection Voice for Amharic Using Festvox", by Sebsibe H/Mariam, S P Kishore, Alan W Black, Rohit Kumar, Rajeev Sangal
      • 1042 "Tools for the Development of a Hindi Speech Synthesis System", by A.G. Ramakrishnan, Kalika Bali, Partha Pratim Talukdar, N. Sridhar Krishna
      • 1038 "A concatenative Speech Synthesis Method using context dependent phoneme sequences with variable length as search units", by Hiroyuki Segi, Tohru Takagi
      • 1040 "Improving Pronunciation Dictionary Coverage Of Names By Modelling Spelling Variation", by Justin Fackrell, Wojciech Skut
      • 1044 "Improving TTS by higher agreement between predicted versus observed pronunciations", by Yeon-Jun Kim, Ann Syrdal, Matthias Jilka
      • 1046 "A Novel Discontinuity Metric for Unit Selection Text-to-Speech Synthesis", by Jerome R. Bellegarda
      • 1048 "Toward Phone Segmentation for Concatenative Speech Synthesis", by Jordi Adell, Antonio Bonafonte
      • 1050 "Voice Creation for Conversational Fairy-Tale Characters", by Kare Sjolander, Joakim Gustafson
    • 3:30-5:00 Oral Session 4
      • 3:30-4:00 1028 "Merging Data Driven and Rule Based Prosodic Models for Unit Selection TTS", by Matthew Aylett
      • 4:00-4:30 1037 "Estimating Phrase Curves in the General Superpositional Intonation Model" by Jan P.H. van Santen, Taniya Mishra, Esther Klabbers
      • 4:30-5:00 1045 "Intonation Modeling for TTS using a Joint Extraction and Prediction Approach" by Pablo Daniel Aguero, Antonio Bonafonte
    • 5:00-6:00 Panel Session: "What do we need for better speech synthesis"
    • 6:30 Dinner
  • Wednesday 16th June
    • 9:00-9:45 Tutorial: Overview of Voice Building by Alan W Black
    • 9:45-10:30 Tutorial: Overview of Voice Conversion by Tomoki Toda
    • 10:30-11:00 Coffee
    • 11:00-12:30 Poster Session 2
      • 1053 "F0 Modeling with Multi-Layer Additive Modeling Based on a Statistical Learning Technique", by Shinsuke Sakai
      • 1054 "Impact of Durational Outlier Removal from Unit Selection Catalogs", by John Kominek, Alan W Black
      • 1055 "Corpus-Based Synthesis of Fundamental Frequency Contours with Various Speaking Styles from Text Using F0 Contour Generation Process Model", by Keikichi Hirose, Kentaro Sato, Nobuaki Minematsu
      • 1056 "Multi-Source Based Acoustic Model for Speech Synthesis", by Jianhua Tao, Yongguo Kang
      • 1047 "Festival 2 - Build Your Own General Purpose Unit Selection Speech Synthesiser", by Robert A. J. Clark, Korin Richmond, Simon King
      • 1057 "Ximera: A New TTS from ATR Based on Corpus-Based Technologies", by Hisashi Kawai, Tomoki Toda, Jinfu Ni, Minoru Tsuzaki, Keiichi Tokuda
      • 1060 "Prosodic Data Driven Modelling of a Narrative Style in Festival TTS", by Fabio Tesser, Piero Cosi, Carlo Drioli, Graziano Tisato
      • 1061 "An introduction of trajectory model into HMM-based speech synthesis", by Heiga Zen, Keiichi Tokuda, Tadashi Kitamura
      • 1063 "Duration Modeling of Indian languages Hindi and Telugu", by N. Sridhar Krishna, Hema A. Murthy
      • 1066 "Prominence Prediction for Super-sentential Prosodic Modeling Based on a new Database", by Jason Y Zhang, Arthur R Toth, Kevyn Collins-Thompson, Alan W Black
      • 1070 "Aligning letters and phonemes for speech synthesis", by Robert Damper, Yannick Marchand, John-David Marsters, Alex Bazin
    • 12:30-1:30 Lunch
    • 1:30-3:00 Oral Session 5
      • 1:30-2:00 1051 "Clustering of foot-based pitch contours in expressive speech", by Esther Klabbers, Jan P.H. van Santen
      • 2:00-2:30 1049 "A Corpus-Based Approach to <AHEM/> Expressive Speech Synthesis" by E. Eide, A. Aaron, R. Bakis, W. Hamza, M. Picheny, J. Pitrelli
      • 2:30-3:00 1058 "Audiovisual Text-to-Cued Speech Synthesis", by Guillaume Gibert, Gerard Bailly, Frederic Elisei
    • 3:00-4:00 Town Hall Meeting / Closing Remarks
CMU/LTI This page is maintained by Alan W Black (awb@cs.cmu.edu)
and Kevin A. Lenzo (lenzo@cepstral.com)