Detection-Based ASR in the Automatic Speech Attribute Transcription Project

Proc. Interspeech

We present methods for detector design in the Automatic Speech
Attribute Transcription project. This paper reports the results of
a student-led, cross-site collaboration between the Georgia Institute
of Technology, The Ohio State University, and Rutgers University.
We describe and evaluate the detection-based ASR paradigm and
discuss phonetic attribute classes, methods for detecting framewise
phonetic attributes, and methods for combining attribute detectors
for ASR.
We use Multi-Layer Perceptrons (MLPs), Hidden Markov Models (HMMs),
and Support Vector Machines (SVMs) to compute confidence scores for
several prescribed sets of phonetic attribute classes. To combine the
framewise detection scores for continuous phone recognition on the
TIMIT database, we use Conditional Random Fields (CRFs) and
knowledge-based rescoring of phone lattices. By incorporating all of
the attribute detectors discussed in the paper, the CRF system achieves
a phone accuracy of 70.63%, outperforming both the baseline and the
enhanced HMM systems.
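To make the CRF combination step concrete, the sketch below shows decoding in a generic linear-chain CRF whose per-frame inputs are attribute detector scores. It is a minimal illustration under stated assumptions, not the system described in the paper: the function name `crf_viterbi`, the two-label phone set, the two "vowel"/"stop" detector scores, and all weights are hypothetical and hand-set here, whereas a real CRF would learn its state and transition weights from training data. Only decoding is shown. Because the partition function does not change which label sequence scores highest, the best phone sequence is found by Viterbi search over the unnormalized sum of state scores (weights times detector outputs) and transition scores.

```python
from typing import List, Sequence


def crf_viterbi(
    frames: Sequence[Sequence[float]],         # T x D framewise detector scores
    state_weights: Sequence[Sequence[float]],  # K x D weights (hypothetical, normally learned)
    transition: Sequence[Sequence[float]],     # K x K label-to-label weights
) -> List[int]:
    """Return the highest-scoring label sequence of a linear-chain CRF."""
    K = len(state_weights)

    def state_score(k: int, x: Sequence[float]) -> float:
        # Linear state feature: dot product of label k's weights with the detector scores.
        return sum(w * v for w, v in zip(state_weights[k], x))

    # delta[k] is the best score of any partial path that ends in label k.
    delta = [state_score(k, frames[0]) for k in range(K)]
    backptr: List[List[int]] = []
    for x in frames[1:]:
        new_delta, ptrs = [], []
        for k in range(K):
            best_j = max(range(K), key=lambda j: delta[j] + transition[j][k])
            ptrs.append(best_j)
            new_delta.append(delta[best_j] + transition[best_j][k] + state_score(k, x))
        delta = new_delta
        backptr.append(ptrs)

    # Trace back from the best final label.
    best = max(range(K), key=lambda k: delta[k])
    path = [best]
    for ptrs in reversed(backptr):
        best = ptrs[best]
        path.append(best)
    return path[::-1]


# Toy example (hypothetical): label 0 is a vowel-like phone, label 1 a stop.
# Each frame holds [vowel_detector_score, stop_detector_score].
frames = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.95]]
state_weights = [[1.0, 0.0], [0.0, 1.0]]
transition = [[0.5, 0.0], [0.0, 0.5]]  # small bonus for staying in the same label

if __name__ == "__main__":
    print(crf_viterbi(frames, state_weights, transition))
```

On this toy input the decoder returns `[0, 0, 1, 1]`: the transition weights smooth the framewise decisions, and the detector scores decide where the label boundary falls.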