Two-stage speech enhancement algorithm incorporating dual-channel convolution and improved Conformer
Author:
Affiliation:

1. School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China; 2. Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education, Guilin University of Electronic Technology, Guilin 541004, China; 3. GUET-Nanning E-Tech Research Institute Co., Ltd., Nanning 530000, China

CLC Number:

TN912.35

    Abstract:

    To address insufficient extraction of key speech features and the limitations of a single model structure, a two-stage speech enhancement method incorporating multi-scale features and an improved gated Conformer is proposed. First, to address the insufficient extraction of key features, a dual-channel convolutional fusion module is proposed. It uses two-dimensional convolutions with different receptive fields to extract key speech information at multiple scales, and combines them with a gating mechanism to strengthen the network's modeling of short-term and long-term sequence correlations, improving enhancement performance in complex environments. An improved Conformer is also proposed, which applies time attention and frequency attention to model along the time and frequency dimensions, respectively, and incorporates a dilated convolution module to efficiently extract local and global context information, strengthening the network's ability to model speech sequences. Second, to address the limitations of a single model structure, a two-stage processing structure is adopted so that the complex problem is handled step by step. The first stage takes the magnitude spectrum of the noisy speech as input, makes a preliminary estimate of the clean speech magnitude, and combines that estimate with the noisy phase to reconstruct a coarse complex spectrum. The second stage builds on the coarse spectrum from the first stage and extracts finer features to improve the representation of detail in the speech signal. Finally, experiments on the VoiceBank+DEMAND dataset show that the objective quality metric and the short-time intelligibility of the proposed model increase by 50.25% and 3.26%, respectively, compared with the noisy speech. This indicates that the proposed algorithm improves speech intelligibility more effectively while also improving overall speech quality, and that it has strong noise reduction ability.
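The two-stage data flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' network: `toy_magnitude_estimator` and `toy_refiner` are hypothetical stand-ins for the learned first-stage and second-stage models, and the spectrogram shape is an arbitrary assumption. The sketch only shows how a first-stage magnitude estimate might be combined with the noisy phase to form a coarse complex spectrum, which a second stage then receives as real and imaginary parts.

```python
import numpy as np


def coarse_complex_spectrum(noisy_spec, estimate_magnitude):
    """Stage 1: estimate the clean magnitude, then reuse the noisy phase."""
    noisy_mag = np.abs(noisy_spec)         # magnitude of the noisy spectrum
    noisy_phase = np.angle(noisy_spec)     # phase of the noisy spectrum
    est_mag = estimate_magnitude(noisy_mag)
    return est_mag * np.exp(1j * noisy_phase)


def refine_complex_spectrum(coarse_spec, refine):
    """Stage 2: refine the coarse spectrum from its real and imaginary parts."""
    stacked = np.stack([coarse_spec.real, coarse_spec.imag])  # shape (2, F, T)
    refined = refine(stacked)
    return refined[0] + 1j * refined[1]


def toy_magnitude_estimator(mag):
    # Hypothetical stand-in for the learned stage-1 model: uniform attenuation.
    return 0.5 * mag


def toy_refiner(stacked):
    # Hypothetical stand-in for the learned stage-2 model: identity mapping.
    return stacked


# Random complex "spectrogram" with 257 frequency bins and 100 frames (assumed sizes).
rng = np.random.default_rng(0)
noisy = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))

coarse = coarse_complex_spectrum(noisy, toy_magnitude_estimator)
enhanced = refine_complex_spectrum(coarse, toy_refiner)
```

In a real system the two stand-in functions would be replaced by trained models, and the enhanced spectrum would be converted back to a waveform with an inverse short-time Fourier transform.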

History
  • Online: April 10, 2025