Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training
- Eric Sun,
- Jinyu Li,
- Yuxuan Hu,
- Yimeng Zhu,
- Long Zhou,
- Jian Xue,
- Peidong Wang,
- Linquan Liu,
- Shujie Liu,
- Edward Lin,
- Yifan Gong
We propose gated language experts to improve multilingual transformer transducer models without requiring language identification (LID) input from the user during inference. We define a gating mechanism and an LID loss that let the transformer experts learn language-dependent information, construct the multilingual transformer block from gated transformer experts and shared transformer layers, and apply linear experts to better regularize the joint network. In addition, we propose a curriculum training scheme in which LID guides the gated experts to better serve their own languages. Evaluated on an English-Spanish bilingual task, our method achieves average relative word error rate reductions of 12.5% and 7.3% over the baseline bilingual and monolingual models, respectively, obtaining results similar to the upper-bound model trained and decoded with oracle LID. We further explore our method on trilingual, quadrilingual, and pentalingual models and observe advantages similar to those of the bilingual model, demonstrating that the method extends readily to more languages.
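The core idea, soft gating of per-language experts by an internal LID prediction trained with an LID loss, can be sketched as follows. This is a minimal PyTorch-style illustration under simplifying assumptions, not the authors' implementation: each expert is a single linear layer rather than a stack of transformer layers, and names such as `GatedLanguageExpertBlock`, `lid_head`, and `num_langs` are hypothetical.

```python
import torch
import torch.nn as nn


class GatedLanguageExpertBlock(nn.Module):
    """Illustrative gated-expert block: per-language experts whose outputs are
    mixed by soft LID gates, plus a shared language-independent path."""

    def __init__(self, d_model: int, num_langs: int):
        super().__init__()
        # One lightweight expert per language (a full model would use
        # transformer layers here instead of a single linear projection).
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(num_langs)]
        )
        self.shared = nn.Linear(d_model, d_model)      # shared (language-independent) layer
        self.lid_head = nn.Linear(d_model, num_langs)  # frame-level LID logits

    def forward(self, x: torch.Tensor):
        # x: (batch, time, d_model) acoustic frames from lower encoder layers
        lid_logits = self.lid_head(x)                  # (B, T, num_langs)
        gates = lid_logits.softmax(dim=-1)             # soft language gates, no user LID needed
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, T, D, L)
        gated = (expert_out * gates.unsqueeze(2)).sum(dim=-1)           # gate-weighted mixture
        out = gated + self.shared(x)                   # combine experts with shared path
        return out, lid_logits                         # logits also feed the LID loss


# Toy usage: the LID loss pushes the gates toward each utterance's true language.
block = GatedLanguageExpertBlock(d_model=256, num_langs=2)
feats = torch.randn(4, 100, 256)            # dummy frames: batch of 4, 100 frames each
lang_ids = torch.tensor([0, 1, 0, 1])       # oracle LID labels, used only in training
out, lid_logits = block(feats)
lid_loss = nn.functional.cross_entropy(
    lid_logits.mean(dim=1), lang_ids        # utterance-level LID loss
)
```

During training, the LID loss on `lid_logits` encourages the gates to route each utterance to its own language's expert; at inference the gates rely solely on the model's internal LID posteriors, so no user-supplied LID is required.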