Robot indoor scene recognition based on fusion of CNN and Transformer
CLC Number: TP242; TN98

Abstract:

To improve the accuracy of robot scene recognition in complex indoor environments, this paper proposes a scene recognition model that fuses a convolutional neural network (CNN) with a visual Transformer. The model uses the CNN to extract local features of the scene, while the visual Transformer captures long-range dependencies among those features. The proposed visual Transformer consists of three parts: a feature encoding structure (Attention Embedding), an Encoder, and a structure that converts high-level semantic features back into pixel-level features (Attention Project). In the proposed model, the CNN strengthens the visual Transformer's description of local detail features, while the visual Transformer helps the CNN build dependencies among distant features, so that the visual features of images of the robot's working scenes are characterized and exploited effectively. Finally, the effectiveness of the model is verified on a dataset collected by a robot in an actual working environment and on the open-source COLD dataset, where the proposed model achieves higher scene recognition accuracy.
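The pipeline the abstract describes (local feature extraction by convolution, embedding of features as tokens, self-attention over all positions to capture long-range dependencies, and projection back to a pixel grid) can be illustrated with a toy NumPy sketch. All names, shapes, and the single-head attention here are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def conv2d_local(x, kernel):
    """Naive 'valid' 2D convolution: the CNN's role of extracting local features."""
    H, W = x.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def self_attention(tokens):
    """Single-head self-attention: every token attends to every other token,
    which is how the Transformer captures long-range dependencies."""
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)            # pairwise similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)      # row-wise softmax
    return weights @ tokens

# Toy pipeline: convolution -> flatten to tokens (cf. Attention Embedding) ->
# self-attention (cf. Encoder) -> reshape back to a pixel grid (cf. Attention Project).
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
feat = conv2d_local(image, rng.standard_normal((3, 3)))  # 6x6 local feature map
tokens = feat.reshape(-1, 1)               # 36 one-dimensional tokens
attended = self_attention(tokens)          # global mixing across all positions
pixel_feat = attended.reshape(feat.shape)  # back to pixel-level features
print(pixel_feat.shape)  # -> (6, 6)
```

The point of the sketch is the division of labor: the convolution only ever sees a 3x3 neighborhood, while the attention step lets every spatial position influence every other, which is the complementary strength the fused model exploits.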

History
  • Online: September 18, 2023