Abstract: Two-person interaction action recognition based on skeleton sequence data has broad application prospects. To address the insufficient representation of interaction features and the redundancy of intra-class features in current recognition models, we propose a multi-scale deformable graph convolutional network (MD-GCN) for recognizing two-person interaction actions. First, we construct a two-person interaction hypergraph, comprising a person-pair hypergraph and an interaction relationship matrix. Unlike a conventional skeleton graph, this hypergraph captures the relationship between the two people more directly, enabling a more comprehensive representation of interaction features. Next, three input branches perform data preprocessing and feature extraction; the extracted features are then fused and fed into the main branch, a multi-scale deformable graph convolutional network, for action classification. This network learns deformable sampling positions in a multi-modal manner, effectively capturing key interaction features while avoiding feature redundancy and information loss. The proposed MD-GCN achieves a recognition accuracy of up to 98.41% on the 26 interaction action classes from the NTU RGB+D 60 and NTU RGB+D 120 datasets, effectively addressing the challenge of feature representation in two-person interaction action recognition. Experimental results show that the method not only maintains high recognition accuracy but also significantly reduces computational cost, achieving a good balance between inference speed and accuracy and making it valuable for practical applications.
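The abstract does not spell out how deformable sampling is realized on a skeleton graph, so the following is only a minimal, illustrative sketch of one common way to do it: a graph convolution whose adjacency is "deformed" by a learned global offset plus a data-dependent offset computed from the joint features. The class name `DeformableGraphConv`, the embedding width of 8, the choice of 50 joints for a two-person pair, and the placeholder identity adjacency are all assumptions for illustration and are not taken from the paper.

```python
# Illustrative sketch only -- not the authors' MD-GCN implementation.
import torch
import torch.nn as nn


class DeformableGraphConv(nn.Module):
    def __init__(self, in_channels, out_channels, num_joints, adjacency):
        super().__init__()
        # Fixed base adjacency encoding the skeleton / interaction graph.
        self.register_buffer("base_adj", adjacency)                  # (V, V)
        # Learnable global offset to the adjacency, shared across samples.
        self.learned_offset = nn.Parameter(torch.zeros(num_joints, num_joints))
        # 1x1 convolutions that embed joint features for the data-dependent offset.
        self.theta = nn.Conv2d(in_channels, 8, kernel_size=1)
        self.phi = nn.Conv2d(in_channels, 8, kernel_size=1)
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        # x: (N, C, T, V) -- batch, channels, frames, joints
        # Data-dependent adjacency offset from temporally pooled joint embeddings.
        q = self.theta(x).mean(dim=2)                                 # (N, 8, V)
        k = self.phi(x).mean(dim=2)                                   # (N, 8, V)
        dynamic_adj = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # (N, V, V)
        # "Deformed" adjacency: base graph + learned + data-dependent offsets.
        adj = self.base_adj + self.learned_offset + dynamic_adj      # (N, V, V)
        # Aggregate neighbor features over the joint dimension.
        out = torch.einsum("nctv,nvw->nctw", x, adj)
        return self.proj(out)


if __name__ == "__main__":
    V = 50                               # e.g. 25 joints per person x 2 people
    adjacency = torch.eye(V)             # placeholder skeleton graph
    layer = DeformableGraphConv(3, 64, V, adjacency)
    x = torch.randn(8, 3, 32, V)         # 8 clips, xyz coordinates, 32 frames
    print(layer(x).shape)                # torch.Size([8, 64, 32, 50])
```

Under these assumptions, the learned and data-dependent offsets let each joint draw features from joints outside its fixed skeletal neighborhood, which is the intuition behind "deformable sampling positions"; a multi-scale variant would stack such layers with adjacency powers or dilated neighborhoods.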