Abstract: Multi-scale feature pyramids can alleviate common problems of semantic segmentation in complex traffic scenes, such as missed segmentation, incorrect segmentation, and blurred boundaries. However, existing multi-scale feature pyramids must downsample the feature maps, sacrificing spatial detail for rich semantic information, which limits the accuracy of the final segmentation result. To address this problem, a feature enhancement module is proposed that reinforces similar features, based on the cosine similarity between different feature vectors, before downsampling, thereby mitigating the negative effect of downsampling. In addition, drawing on the principles of dilated convolution and strip convolution, the large convolution kernel is modified to build a new multi-scale feature pyramid module that captures semantic information at different scales with larger receptive fields. The proposed segmentation method is efficient, runs in real time, and can meet the requirements of autonomous driving. Experiments on the VOC2012 dataset show that the proposed method reaches an mIoU of 74.36% at 43 FPS, outperforming currently prevailing semantic segmentation methods.
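As a rough illustration of why dilated and strip convolutions are attractive replacements for a large kernel, the following sketch computes two standard quantities: the spatial extent covered by a dilated kernel, and the per-channel weight count of a square kernel versus a pair of 1×k and k×1 strips. The function names are hypothetical and the kernel sizes are example values only; none of this is taken from the proposed module's actual configuration.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Extent along one axis covered by a kernel with the given dilation rate."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def square_kernel_weights(kernel_size: int) -> int:
    """Per-channel weights of a k x k kernel."""
    return kernel_size * kernel_size


def strip_pair_weights(kernel_size: int) -> int:
    """Per-channel weights of a 1 x k strip followed by a k x 1 strip."""
    return 2 * kernel_size


if __name__ == "__main__":
    # Example values only (assumed, not from the paper).
    for k, d in [(3, 1), (3, 2), (3, 4)]:
        extent = effective_kernel_size(k, d)
        print(f"k={k}, dilation={d}: covers {extent}x{extent} pixels")
    for k in (7, 11):
        print(f"k={k}: square={square_kernel_weights(k)} weights, "
              f"strip pair={strip_pair_weights(k)} weights")
```

Under these formulas, dilation enlarges the covered area without adding weights, while the strip pair grows linearly in k rather than quadratically, which is consistent with the abstract's aim of larger receptive fields at real-time cost.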