Recent Progresses in Deep Learning Based Acoustic Models

IEEE/CAA JOURNAL OF AUTOMATICA SINICA

In this paper, we summarize recent progress made
in deep learning based acoustic models and the motivation and
insights behind the surveyed techniques. We first discuss models
such as recurrent neural networks (RNNs) and convolutional
neural networks (CNNs) that can effectively exploit variable-length
contextual information, and their various combinations
with other models. We then describe models that are optimized
end-to-end and emphasize feature representations learned
jointly with the rest of the system, the connectionist temporal
classification (CTC) criterion, and the attention-based sequence-to-sequence
translation model. We further illustrate robustness
issues in speech recognition systems, and discuss acoustic model
adaptation, speech enhancement and separation, and robust
training strategies. We also cover modeling techniques that lead
to more efficient decoding and discuss possible future directions
in acoustic model research.
Index Terms—Attention model