Human action recognition based on 2D CNN and Transformer
Affiliation: College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China

CLC Number: TP18

    Abstract:

    Human action recognition is a research hotspot in computer vision and is of far-reaching research significance in areas such as human-computer interaction and video surveillance. A 2D CNN alone cannot effectively model temporal relationships between frames. To address this, and to exploit the Transformer's strength in modeling long-term dependencies, a Transformer structure is combined with a 2D CNN for human action recognition so that temporal context is captured more effectively. First, a 2D CNN with an integrated channel-spatial attention module extracts spatial features from each frame. Then, a Transformer captures the temporal features between frames. Finally, an MLP head performs action classification. Experimental results show recognition accuracies of 69.4% on the HMDB-51 dataset and 95.5% on the UCF-101 dataset.
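The three-stage pipeline described in the abstract can be illustrated with a minimal sketch of the temporal stage. This is not the authors' implementation: it assumes the 2D CNN has already reduced each frame to a feature vector (random vectors stand in for those features here), uses a single attention head with no positional encoding, residual connections, or layer normalization, and replaces the MLP head with one linear layer followed by softmax. All names (`temporal_self_attention`, `classify`, `rand_matrix`) and sizes are hypothetical; only the class count of 51 is taken from HMDB-51.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def temporal_self_attention(frames, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention across frames.

    frames: list of T per-frame feature vectors (assumed CNN outputs).
    Returns T vectors, each a weighted mix of all frames' values.
    """
    Q = [matvec(Wq, f) for f in frames]
    K = [matvec(Wk, f) for f in frames]
    V = [matvec(Wv, f) for f in frames]
    scale = math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        # Similarity of this frame's query to every frame's key.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


def classify(frames, Wq, Wk, Wv, Wc):
    """Attend over time, average-pool the frames, then apply a linear head."""
    attended = temporal_self_attention(frames, Wq, Wk, Wv)
    T = len(attended)
    pooled = [sum(f[j] for f in attended) / T
              for j in range(len(attended[0]))]
    return softmax(matvec(Wc, pooled))


def rand_matrix(rows, cols, rng):
    """Random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, D, C = 8, 16, 51  # frames per clip, feature size (assumed), HMDB-51 classes
frames = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
Wq, Wk, Wv = (rand_matrix(D, D, rng) for _ in range(3))
Wc = rand_matrix(C, D, rng)

attended = temporal_self_attention(frames, Wq, Wk, Wv)
probs = classify(frames, Wq, Wk, Wv, Wc)
predicted_class = max(range(C), key=lambda i: probs[i])
```

Because every output frame is a weighted combination of all input frames, the attention step lets a feature from an early frame influence the representation of a late one, which is the long-range temporal modeling that a per-frame 2D CNN lacks.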

History
  • Online: April 08, 2024