Abstract: Existing weakly supervised semantic segmentation models for point clouds struggle to balance local feature correlation, generalization, and feature utilization. To address these limitations, this paper proposes WS-MLF, a weakly supervised point cloud semantic segmentation model based on multi-scale local feature fusion and built on the RAC-Net baseline. First, the raw point cloud is taken as input, and a multi-scale spherical sampling method (MSSM) captures hierarchical features across varying spatial radii. Second, a multi-local feature aggregation enhancement module (MFA) refines geometric context within neighborhoods. Third, a spatial-channel-fused hybrid attention module (SCH-Att) prioritizes discriminative channels and key points. Finally, a decoder upsamples the features to generate point-level semantic labels, completing the segmentation task. The model is evaluated on two large-scale indoor scene datasets, S3DIS and ScanNet-v2. On S3DIS, with label ratios of 0.02% and 0.06%, the mIoU surpasses RAC-Net by 2.71% and 0.54%, respectively. On ScanNet-v2, with a label ratio of 20 pt, the mIoU is 1.55% higher than that of RAC-Net. These results validate WS-MLF's effectiveness in extracting key features under weak supervision and improving segmentation accuracy.
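The idea of sampling neighborhoods at several spherical radii can be illustrated with a minimal sketch. This is not the MSSM implementation from the paper; it only assumes that each scale gathers the points falling inside a ball of a given radius around a center point. The function names (`ball_query`, `multi_scale_neighborhoods`), the radii, and the `max_neighbors` cap are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def ball_query(points: Sequence[Point], center: Point,
               radius: float, max_neighbors: int) -> List[int]:
    """Return indices of up to max_neighbors points within radius of center,
    ordered from nearest to farthest (hypothetical helper)."""
    in_ball = []
    for idx, p in enumerate(points):
        d = math.dist(p, center)  # Euclidean distance (Python 3.8+)
        if d <= radius:
            in_ball.append((d, idx))
    in_ball.sort()
    return [idx for _, idx in in_ball[:max_neighbors]]


def multi_scale_neighborhoods(points: Sequence[Point], center: Point,
                              radii: Sequence[float],
                              max_neighbors: int = 16) -> Dict[float, List[int]]:
    """Group neighbors of center at each radius; assumed stand-in for
    multi-scale spherical sampling, not the paper's method."""
    return {r: ball_query(points, center, r, max_neighbors) for r in radii}


# Toy point cloud: four points near the origin and one far away.
points = [
    (0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0),
    (0.0, 0.3, 0.0),
    (0.0, 0.0, 0.9),
    (2.0, 2.0, 2.0),
]
groups = multi_scale_neighborhoods(points, points[0], radii=(0.2, 0.5, 1.0))
for radius, indices in groups.items():
    print(radius, indices)
```

In this toy example, larger radii enclose progressively more neighbors, which is the kind of hierarchical context that a later aggregation or attention stage could fuse. A practical version would typically use a spatial index rather than the brute-force loop shown here.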