Abstract: Food image recognition is difficult because inter-class differences are small, intra-class variations are large, and food structures are complex. To address these challenges, this paper proposes a food image recognition method that integrates multi-scale features with an attention mechanism. First, ConvNeXt, which has stronger feature extraction capabilities, is used as the backbone network to better capture the fine details of food images. Next, an improved ASPP module is introduced to expand the receptive field and exploit multi-scale information, enhancing the model's ability to capture features at different scales. Finally, an attention mechanism is added after each convolutional block to strengthen feature representation and the capture of contextual information. Experimental results show that the proposed method achieves accuracies of 91.56% on the extended Vireo Food172 dataset and 87.22% on the ETH Food101 dataset, improvements of 2.05% and 1.66%, respectively, over the original model, which verifies the effectiveness of the proposed method.
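As a rough illustration of how an ASPP-style module expands the receptive field, the sketch below computes the input span covered by a single dilated convolution kernel, using the standard relation k_eff = k + (k - 1)(d - 1). The 3x3 kernel size and the dilation rates (6, 12, 18) are assumptions borrowed from common DeepLab-style ASPP configurations, and the function names are hypothetical; the rates used by the improved ASPP module in this paper are not stated in the abstract.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Width, in input pixels, spanned by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Assumed dilation rates (common in DeepLab-style ASPP), not taken from the paper.
ASSUMED_RATES = (6, 12, 18)


def aspp_branch_spans(kernel_size: int = 3, rates=ASSUMED_RATES) -> dict:
    """Map each assumed dilation rate to the span of its parallel branch."""
    return {rate: effective_kernel_size(kernel_size, rate) for rate in rates}


if __name__ == "__main__":
    for rate, span in aspp_branch_spans().items():
        print(f"dilation {rate:>2}: 3x3 kernel spans {span}x{span} input pixels")
```

Because the parallel branches cover different spans without adding parameters per kernel, concatenating their outputs gives the classifier features computed at several scales at once.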