Abstract: Existing polyp segmentation models have difficulty segmenting complex regions, lose edge details, and generalize poorly. To address these problems, this paper proposes a polyp segmentation model based on the fusion of local and global features. A convolutional neural network and a Transformer serve as parallel encoders, so that the model captures both local detail features and global semantic features at multiple scales. An attention enhancement block and a multi-scale residual block are constructed at the skip connections. The former strengthens the model's focus on important information. The latter efficiently explores target regions, accurately predicts their boundaries, and promotes interaction between features at different levels. In the decoding stage, a residual-based stepwise upsampling feature fusion method aggregates the features of each stage, which further enhances the model's perception ability and enriches the polyp features. Finally, an efficient prediction head promotes the fusion of shallow features and outputs the segmentation results. In comparative experiments, the proposed model achieves the best performance. Compared with the second-best model, it improves mDice by an average of 1.21% and mIoU by an average of 1.82% on the Kvasir and CVC-ClinicDB datasets, and improves mDice by an average of 2.67% and mIoU by an average of 2.83% on the CVC-ColonDB and ETIS datasets. These results show that the proposed model achieves better segmentation accuracy and generalization performance than existing mainstream models.
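The results above are reported as mDice and mIoU. As a hedged illustration only, the sketch below computes the standard Dice coefficient, 2|P∩G| / (|P| + |G|), and IoU, |P∩G| / |P∪G|, on binary masks and averages each over a set of images. It assumes per-image averaging, a 0.5 binarization threshold, and a small smoothing term `eps`. The helper names (`binarize`, `dice`, `iou`, `mean_dice`, `mean_iou`) are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import List, Sequence

# Hypothetical mask type: a 2-D grid of 0/1 values (1 = polyp pixel, 0 = background).
Mask = Sequence[Sequence[int]]


def binarize(prob: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn a probability map into a 0/1 mask (the 0.5 threshold is an assumption)."""
    return [[1 if v >= threshold else 0 for v in row] for row in prob]


def _counts(pred: Mask, gt: Mask):
    """Return (|P ∩ G|, |P|, |G|) pixel counts for two same-shaped binary masks."""
    inter = pred_sum = gt_sum = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += 1 if (p and g) else 0
            pred_sum += 1 if p else 0
            gt_sum += 1 if g else 0
    return inter, pred_sum, gt_sum


def dice(pred: Mask, gt: Mask, eps: float = 1e-8) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|); eps makes two empty masks score 1."""
    inter, ps, gs = _counts(pred, gt)
    return (2 * inter + eps) / (ps + gs + eps)


def iou(pred: Mask, gt: Mask, eps: float = 1e-8) -> float:
    """IoU = |P ∩ G| / |P ∪ G|, with |P ∪ G| = |P| + |G| - |P ∩ G|."""
    inter, ps, gs = _counts(pred, gt)
    return (inter + eps) / (ps + gs - inter + eps)


def mean_dice(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average the per-image Dice scores (assumed averaging scheme)."""
    if not preds:
        raise ValueError("need at least one image")
    return sum(dice(p, g) for p, g in zip(preds, gts)) / len(preds)


def mean_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average the per-image IoU scores (assumed averaging scheme)."""
    if not preds:
        raise ValueError("need at least one image")
    return sum(iou(p, g) for p, g in zip(preds, gts)) / len(preds)


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    gt = [[1, 0, 0], [0, 1, 1]]
    print(round(dice(pred, gt), 4))  # 0.6667
    print(round(iou(pred, gt), 4))   # 0.5
```

For a single mask, Dice and IoU are linked by Dice = 2·IoU / (1 + IoU), so they rank individual predictions identically; their averages over a dataset, however, need not change by the same amount.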