Abstract: The safe and stable operation of substation equipment is essential to power system reliability. In recent years, UAV inspection has become an important maintenance tool in substations because of its efficiency and safety. However, the UAV's own noise and ambient environmental sounds mix with the acoustic signatures of operating equipment, and this interference substantially hinders acoustic-based equipment status detection and early fault prognosis. To isolate substation equipment sounds from such complex mixtures, a multi-scale gated source separation network (GSN) is proposed. GSN adopts an encoder-separator-decoder architecture. The encoder uses parallel multi-scale 1D depthwise separable convolutions to capture features at several temporal scales. The separator uses a dual-path structure, with one branch for local temporal modeling and one for global contextual modeling, and integrates the two outputs through a gated fusion mechanism. The decoder uses layer-by-layer 1D transposed convolutions with skip connections to reconstruct the time-domain signal. Experiments were conducted on a three-source mixed dataset comprising substation equipment sounds, UAV noise, and environmental background noise. Compared with mainstream models, GSN improved SI-SDR by 0.8 to 7.1 dB, SIR by 1.3 to 9.7 dB, and PCC by 0.032 to 0.297. GSN also converged faster and more stably during training. These results indicate that GSN suppresses complex interference and faithfully reconstructs the target equipment sound, providing high-quality signals for acoustic inspection of substation equipment.
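For readers unfamiliar with the quantities named above, the following is a minimal NumPy sketch, not the authors' implementation. It shows the standard definition of SI-SDR, a correlation measure under the assumption that PCC denotes the Pearson correlation coefficient, and one common form of gated fusion between a local and a global branch. The gate formula, the function names, and the `weight` and `bias` parameters are hypothetical illustrations; the abstract does not specify how GSN computes its gate.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (standard definition)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Zero-mean both signals so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to find the optimally scaled target.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.sum(projection ** 2) + eps) / (np.sum(residual ** 2) + eps)
    return float(10.0 * np.log10(ratio))


def pcc(estimate, target):
    """Pearson correlation coefficient between two 1-D waveforms (assumed meaning of PCC)."""
    return float(np.corrcoef(estimate, target)[0, 1])


def gated_fusion(local_feat, global_feat, weight, bias):
    """Hypothetical gate, not taken from the paper.

    g = sigmoid([local, global] @ W + b); output = g * local + (1 - g) * global.
    local_feat, global_feat: (T, C); weight: (2C, C); bias: (C,).
    """
    stacked = np.concatenate([local_feat, global_feat], axis=-1)  # (T, 2C)
    gate = 1.0 / (1.0 + np.exp(-(stacked @ weight + bias)))       # (T, C), values in (0, 1)
    return gate * local_feat + (1.0 - gate) * global_feat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal(16000)                 # synthetic stand-in for an equipment sound
    noisy = target + 0.1 * rng.standard_normal(16000)   # synthetic stand-in for a separated estimate
    print(f"SI-SDR: {si_sdr(noisy, target):.2f} dB, PCC: {pcc(noisy, target):.4f}")
```

Because SI-SDR projects the estimate onto the target before comparing energies, rescaling the estimate leaves the score unchanged, which is why it is preferred over plain SDR when a separation network does not preserve absolute amplitude.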