Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion

Sittichai Jiampojamarn; Colin Cherry; Grzegorz Kondrak

Proceedings of ACL-08: HLT | June 2008

Published by Association for Computational Linguistics

We present a discriminative structure prediction model for the letter-to-phoneme task, a crucial step in text-to-speech processing. Our method encompasses three tasks that have been previously handled separately: input segmentation, phoneme prediction, and sequence modeling. The key idea is online discriminative training, which updates parameters according to a comparison of the current system output to the desired output, allowing us to train all of our components together. By folding the three steps of a pipeline approach into a unified dynamic programming framework, we are able to achieve substantial performance gains. Our results surpass the current state of the art on six publicly available data sets representing four different languages.
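
The two ideas in the abstract, joint decoding by dynamic programming and online updates driven by comparing system output to desired output, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the update rule, so a plain structured-perceptron update is swapped in as a stand-in, and the feature templates, the `MAX_CHUNK` limit, the `"<s>"` start symbol, the toy data, and the assumption that training words come with a gold letter-chunk alignment are all hypothetical choices made for illustration.

```python
from collections import defaultdict

MAX_CHUNK = 2  # assumption: a letter chunk spans 1 or 2 letters


def features(chunk, phoneme, prev_phoneme):
    # Hypothetical templates: chunk-to-phoneme emission and phoneme bigram.
    return [("emit", chunk, phoneme), ("trans", prev_phoneme, phoneme)]


def decode(word, weights, phoneme_set):
    """Pick segmentation and phonemes together with one dynamic program."""
    # best[(i, p)] = (score, backpointer) for covering word[:i], last phoneme p
    best = {(0, "<s>"): (0.0, None)}
    for i in range(1, len(word) + 1):
        for k in range(1, min(MAX_CHUNK, i) + 1):
            chunk = word[i - k:i]
            for (j, prev), (score, _) in list(best.items()):
                if j != i - k:
                    continue  # only extend hypotheses that end where chunk starts
                for p in phoneme_set:
                    s = score + sum(weights[f] for f in features(chunk, p, prev))
                    if (i, p) not in best or s > best[(i, p)][0]:
                        best[(i, p)] = (s, (j, prev, chunk))
    # Choose the best complete hypothesis and follow backpointers.
    finals = [key for key in best if key[0] == len(word)]
    state = max(finals, key=lambda key: best[key][0])
    output = []
    while best[state][1] is not None:
        j, prev, chunk = best[state][1]
        output.append((chunk, state[1]))
        state = (j, prev)
    return output[::-1]


def path_features(path):
    # Collect the features fired along a full (chunk, phoneme) sequence.
    feats, prev = [], "<s>"
    for chunk, p in path:
        feats.extend(features(chunk, p, prev))
        prev = p
    return feats


def train(data, phoneme_set, epochs=5):
    """Online training: decode, compare with the gold output, update on error."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for word, gold in data:
            guess = decode(word, weights, phoneme_set)
            if guess != gold:
                for f in path_features(gold):
                    weights[f] += 1.0  # reward features of the desired output
                for f in path_features(guess):
                    weights[f] -= 1.0  # penalize features of the current output
    return weights


# Toy data (hypothetical): each word is paired with gold (chunk, phoneme) pairs.
toy_data = [("ph", [("ph", "f")]), ("p", [("p", "p")])]
toy_phonemes = ["f", "p"]
toy_weights = train(toy_data, toy_phonemes)
```

Because a single set of weights scores segmentation, phoneme choice, and phoneme-to-phoneme transitions inside one search, an error in any of the three triggers an update that can adjust all of them, which is the sense in which the components are trained together.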