Video description model of attention mechanism based on dilated convolution

Volume 44, Issue 23, 2021, pp. 99-104

Affiliation: School of Electronic Engineering, Guangxi Normal University, Guilin, Guangxi 541004, China
CLC Number: TP391.4; TP183

Abstract:

To address weak correlation between visual features and word features, low training efficiency, errors in the generated sentences, and low metric scores in video description, this paper proposes a video description model built on a dilated-convolution attention mechanism. In the encoding stage, Inception-v4 encodes the video features; the encoded visual features and the word features are then fed into the dilated-convolution attention mechanism. Finally, a long short-term memory (LSTM) network decodes the result to generate a natural-language description of the video. Comparative experiments were conducted on the public video description dataset MSVD, and the model was evaluated with the BLEU, ROUGE_L, CIDEr and METEOR metrics. The results show that the proposed model improves significantly on every metric. Compared with the baseline model SA-LSTM (Inception-V4), BLEU_4, ROUGE_L, CIDEr and METEOR increase by 4.23%, 4.73%, 2.11% and 2.45%, respectively.
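
The abstract does not specify the attention layer in detail, so the following is only a minimal sketch of one way attention over dilated-convolution features could be arranged. It assumes a 1D dilated convolution along the frame axis and additive (Bahdanau-style) scoring against the decoder hidden state; every name, shape and weight here (`dilated_conv1d`, `dilated_attention`, `w_conv`, `w_h`, `v`) is hypothetical and not taken from the paper.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Same-padded 1D dilated convolution along the time axis.

    x: (T, D_in) frame features; w: (K, D_in, D_out) kernel with K odd.
    """
    T = x.shape[0]
    K = w.shape[0]
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((T, w.shape[2]))
    for k in range(K):
        # Tap k reads the padded input shifted by k * dilation frames.
        out += xp[k * dilation : k * dilation + T] @ w[k]
    return out

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dilated_attention(frames, hidden, w_conv, w_h, v, dilation=2):
    """Assumed additive attention whose keys come from a dilated convolution."""
    keys = dilated_conv1d(frames, w_conv, dilation)   # (T, A)
    scores = np.tanh(keys + hidden @ w_h) @ v         # (T,)
    alpha = softmax(scores)                           # weights over frames
    context = alpha @ frames                          # (D_in,) weighted sum
    return context, alpha

# Toy shapes only: 8 frames, 16-dim features, 12-dim decoder state.
rng = np.random.default_rng(0)
T, D, H, A = 8, 16, 12, 10
frames = rng.standard_normal((T, D))
hidden = rng.standard_normal(H)
w_conv = rng.standard_normal((3, D, A)) * 0.1
w_h = rng.standard_normal((H, A)) * 0.1
v = rng.standard_normal(A)
context, alpha = dilated_attention(frames, hidden, w_conv, w_h, v)
```

In this sketch the dilation widens the temporal receptive field of each attention key without adding parameters: with a kernel of size 3 and dilation 2, each key sees frames t-2, t and t+2. In a full model the context vector would be passed to the LSTM decoder at each word step.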

Online: July 02, 2024