Abstract: Customer flow statistics from indoor video surveillance are affected by changes in background lighting and by crowding, which lower the accuracy of background updating, target extraction, and recognition. Moreover, existing algorithms cannot meet the real-time requirements of high-frame-rate video (60 frames per second), and their recognition and counting accuracy is below 95%. To address these issues, we propose an algorithm for indoor human detection and recognition. First, we combine the running average method with Gaussian mixture background modeling. Pixels with similar values are merged to reduce redundant computation, and a mean-based method handles noise points within a certain range, further improving the real-time performance of background extraction. Second, we use an adaptive thresholding technique that adjusts the segmentation threshold according to regional changes in illumination intensity, which avoids variations in the detection results caused by uneven lighting. For human recognition, we combine AdaBoost-based head-shoulder localization with a shortest distance classifier. We first roughly locate the head and shoulders based on the actual position of each moving object, then extract head-related features such as circularity and shoulder width ratio, and finally classify and recognize the human body with the shortest distance classifier. In experiments on high-frame-rate video, the processing time per frame for complex scenes containing multiple people is generally within 15 ms, and the accuracy of person recognition reaches 98%.
The experimental results demonstrate that the proposed method effectively handles multi-target human detection, recognition, and counting in high-frame-rate video with complex, changing backgrounds and crowded scenes.
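As an illustration of two of the building blocks named in the abstract, the following minimal Python sketch shows a per-pixel running-average background update and a shortest (Euclidean) distance classifier over two head-shoulder features. It is not the authors' implementation. The function names, the learning rate `alpha`, the reading of "shoulder width ratio" as head width divided by shoulder width, and the class centroids are all hypothetical values chosen for illustration; the Gaussian mixture model, the adaptive threshold, and the AdaBoost localization step are omitted.

```python
import math


def update_background(background, frame, alpha=0.05):
    """Running-average update per pixel: B <- (1 - alpha) * B + alpha * F.

    `background` and `frame` are equally sized 2D lists of grey levels.
    `alpha` is an assumed learning rate, not a value from the paper.
    """
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def circularity(area, perimeter):
    """Standard circularity 4*pi*A / P^2: 1.0 for a circle, smaller otherwise."""
    return 4.0 * math.pi * area / (perimeter ** 2)


def classify_shortest_distance(feature, centroids):
    """Return the label whose centroid is closest to `feature` (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(feature, centroids[label]))


# Hypothetical class centroids in (circularity, head/shoulder width ratio) space.
CENTROIDS = {
    "person": (0.85, 0.45),
    "non_person": (0.40, 0.90),
}

if __name__ == "__main__":
    background = [[100.0, 100.0]]
    frame = [[200.0, 100.0]]
    print(update_background(background, frame, alpha=0.1))

    # A candidate region with a nearly round head and a plausible width ratio.
    candidate = (0.80, 0.50)
    print(classify_shortest_distance(candidate, CENTROIDS))
```

In a real pipeline the centroids would be estimated from labelled training samples, and the feature vector would be computed from the head-shoulder region returned by the localization stage.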