SoundTRC: DNN-based Acoustic Target Region Control

  • Yuhang He,
  • Andrew Markham,
  • Okan Köpüklü

International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)


We propose a deep-neural-network-based framework for automatic acoustic target region control: in a multi-speaker conference room, the goal is to preserve all speech originating inside a designated target region while muting all speech outside it. We consider three kinds of target region, angle, angle-distance, and distance, which reflect common region-based speech control requests, and propose a unified encoding strategy that maps each of them into a discriminative and compact target region vector. We further propose a unified cross-attention Transformer that takes the mixed speech and the corresponding target region vector as input and outputs an ideal ratio mask, which removes speech outside the target region while simultaneously suppressing noise. Experiments on both simulated shoe-box-like 3D rooms and photo-realistic, complex 3D room scenes demonstrate the advantages of the proposed framework.
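The network's output is an ideal ratio mask (IRM) applied to the mixture spectrogram. As a minimal sketch of the masking step only (assuming the standard magnitude-domain IRM definition; the paper's exact mask formulation, STFT settings, and the region-conditioned network that predicts the mask are not specified here), the in-region speech can be recovered by elementwise multiplication of the mask with the mixture STFT:

```python
import numpy as np

def ideal_ratio_mask(target_spec, interference_spec, eps=1e-8):
    """Standard magnitude-domain IRM (assumption, not the paper's exact
    formulation): values near 1 keep in-region speech, values near 0
    attenuate out-of-region speech and noise."""
    t = np.abs(target_spec)
    i = np.abs(interference_spec)
    return t / (t + i + eps)

# Toy complex STFTs (257 frequency bins x 100 frames); in the real system the
# mask would be predicted by the cross-attention Transformer from the mixture
# and the target region vector, not computed from oracle sources.
rng = np.random.default_rng(0)
shape = (257, 100)
target = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
interf = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = target + interf

mask = ideal_ratio_mask(target, interf)      # values in [0, 1]
enhanced = mask * mixture                     # masked mixture keeps in-region speech
```

At inference time only the mixture and the target region vector are available; the oracle IRM above serves as the training target that the network learns to approximate.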