Dynamic routing algorithm for multi-campus network based on improved Dueling DQN

2022, Vol. 36, No. 11, pp. 211-220

Authors: Li Guoyan (李国燕), Shi Dongyu (史东雨), Zhang Zonghui (张宗辉)
Affiliation: School of Computer and Information Engineering, Tianjin Chengjian University
CLC number: TP181; TN91
Fund project: Supported by the Tianjin Science and Technology Plan Project (19YFZCGX00130)


Abstract:

To address the long transmission delays and network congestion caused by load imbalance in highly "centrally" connected multi-campus networks, a dynamic routing optimization algorithm based on an adaptive multi-sampling Dueling deep Q-network (AMD-DQN) is proposed. First, the idea of the dueling network (Dueling DQN) is introduced into the network model, and the structure of the multilayer perceptron is improved with a centering operation to prevent overestimation of the value function. Then, the experience replay mechanism adopts an adaptive multi-sampling scheme that combines random, nearest, and prioritized sampling, adjusts adaptively according to the load, and randomly selects the sampling mode according to weighted probabilities. Finally, the AMD-DQN network structure is combined with the reinforcement learning signal and stochastic gradient descent to train the neural network, and the maximum-value action is selected at each step until the transmission succeeds. Experimental results show that, compared with the traditional DQN and Dueling DQN algorithms, the AMD-DQN algorithm achieves an average delay of 128.046 ms and a throughput of 5.726 packets/s, effectively reducing packet transmission delay and improving throughput. Congestion is also evaluated from five aspects, with good results, indicating that network congestion is further alleviated.

Keywords: dynamic routing; deep reinforcement learning; dueling network; adaptive multi-sampling experience replay
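
The abstract names three building blocks: a dueling value/advantage split with centering, a replay buffer that picks one of several sampling modes by weighted probability, and greedy selection of the maximum-value action. The following Python sketch illustrates these ideas in isolation. It is not the authors' implementation: the centering shown is the standard mean-subtraction form of Dueling DQN, "nearest" sampling is assumed here to mean taking the most recent transitions, and all function names, mode names, and weights are hypothetical. The abstract does not specify how the mode weights depend on network load, so they are passed in as plain inputs.

```python
import random
from typing import Dict, List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine V(s) and A(s, a) with mean-centering (standard Dueling DQN form).

    Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a'))
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + (a - mean_adv) for a in advantages]


def greedy_action(q_values: Sequence[float]) -> int:
    """Return the index of the maximum-value action (for example, a next hop)."""
    return max(range(len(q_values)), key=q_values.__getitem__)


def choose_sampling_mode(weights: Dict[str, float], rng: random.Random) -> str:
    """Pick a sampling mode at random according to the given weights.

    How the weights adapt to load is not given in the abstract; they are
    treated as an input here (assumption).
    """
    modes = list(weights)
    return rng.choices(modes, weights=[weights[m] for m in modes], k=1)[0]


def sample_batch(buffer: list, priorities: Sequence[float], batch_size: int,
                 mode: str, rng: random.Random) -> list:
    """Draw a mini-batch from the replay buffer using the chosen mode."""
    if mode == "random":
        # Uniform sampling without replacement.
        return rng.sample(buffer, batch_size)
    if mode == "nearest":
        # Assumption: "nearest" means the most recently stored transitions.
        return buffer[-batch_size:]
    if mode == "priority":
        # Sampling with replacement, proportional to per-transition priority.
        return rng.choices(buffer, weights=priorities, k=batch_size)
    raise ValueError(f"unknown sampling mode: {mode}")


if __name__ == "__main__":
    rng = random.Random(0)
    q = dueling_q_values(1.0, [1.0, 2.0, 3.0])
    print("Q-values:", q, "greedy action:", greedy_action(q))

    replay = list(range(10))          # stand-in for stored transitions
    prios = [1.0] * len(replay)       # stand-in for per-transition priorities
    mode = choose_sampling_mode({"random": 0.5, "nearest": 0.2, "priority": 0.3}, rng)
    print("mode:", mode, "batch:", sample_batch(replay, prios, 3, mode, rng))
```

Subtracting the mean advantage keeps the average of the Q-values equal to the state value, which is the usual way a dueling head makes the value and advantage streams identifiable.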

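Cite this article

Li Guoyan, Shi Dongyu, Zhang Zonghui. Dynamic routing algorithm for multi-campus network based on improved Dueling DQN[J]. Journal of Electronic Measurement and Instrumentation (电子测量与仪器学报), 2022, 36(11): 211-220.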

Published online: 2023-03-29