Abstract: Traditional simultaneous localization and mapping (SLAM) algorithms face three challenges in dynamic occlusion scenarios: they cannot effectively label occluded objects, they cannot accurately determine the motion state of potentially dynamic objects, and the feature point count drops after dynamic objects are removed. To address these problems, this paper proposes an improved visual SLAM (VSLAM) algorithm based on the OneFormer segmentation network. The algorithm strengthens attention to occluded regions through feature-enhancing convolutions, feature enhancement modules, and occlusion attention modules, and it optimizes relative position encoding to improve the semantic accuracy of occluded object boundaries, enabling precise marking of potentially dynamic objects. Object motion is assessed in two stages: the camera position is first determined via camera pose estimation, and object motion is then estimated. An optimal nearest-neighbor pixel matching strategy repairs dynamic regions using static information from adjacent frames, so that feature points can be extracted from the repaired regions for pose estimation. Validation on the TUM public dataset and in real-world scenarios demonstrates superior trajectory accuracy: compared with the DS-SLAM and DynaSLAM algorithms, the mean root mean square error of the absolute trajectory error decreased by 84.08% and 22.29%, respectively.
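The repair step mentioned above (filling dynamic regions with static information from adjacent frames via nearest-neighbor pixel matching) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the brute-force nearest-neighbor search, and the assumption that the adjacent frame is already aligned with the current one are all simplifications for clarity.

```python
# Hedged sketch: repair a masked dynamic region by copying, for each
# dynamic pixel, the adjacent frame's value at the nearest static pixel.
# Frames are small 2-D lists of intensities; `mask` is 1 where dynamic.

def repair_dynamic_region(frame, mask, adjacent_frame):
    """Return a copy of `frame` in which pixels flagged dynamic in `mask`
    are replaced by the adjacent frame's value at the nearest static
    pixel (nearest neighbor by squared Euclidean distance)."""
    h, w = len(frame), len(frame[0])
    static = [(y, x) for y in range(h) for x in range(w) if not mask[y][x]]
    repaired = [row[:] for row in frame]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Brute-force search over static pixels; a real system
                # would use a distance transform or spatial index.
                ny, nx = min(static, key=lambda p: (p[0] - y) ** 2 + (p[1] - x) ** 2)
                repaired[y][x] = adjacent_frame[ny][nx]
    return repaired

# Tiny example: the center pixel is dynamic and gets filled from the
# adjacent frame at the nearest static location.
frame = [[10, 10, 10], [10, 99, 10], [10, 10, 10]]
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
adjacent = [[20, 21, 22], [23, 24, 25], [26, 27, 28]]
out = repair_dynamic_region(frame, mask, adjacent)
```

After the repair, feature extraction would run on `out` instead of the original frame, so the dynamic object no longer contaminates pose estimation.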