OPTIMIZING DYNAMIC ROUTING IN 6G NETWORKS
USING A MULTI-AGENT MULTI-STEP DEEP Q-LEARNING ALGORITHM
Nachimuthu Senthil1 and Sumathi Arumugam2
1Department of Computer Science, KPR College of Arts, Science and Research,
Coimbatore 641407, Tamil Nadu, India
2Department of Information Technology, KPR College of Arts, Science and Research,
Coimbatore 641407, Tamil Nadu, India
ABSTRACT
The proliferation of 6G networks poses new challenges for traditional network management methods because of the massive volumes of data these networks generate and the wide variety of devices they connect. The shortcomings of these traditional methods call for a paradigm shift towards AI frameworks built on Machine Learning (ML) and Deep Learning (DL). A Speed-optimized Attention-based Hybrid Graph Convolutional Network-Long Short-Term Memory (SPAH-GCN-LSTM) model and a Reinforcement Learning (RL) framework based on Q-Learning (QL) were previously developed to forecast network congestion and to improve data transmission routes, respectively. Nevertheless, in a dynamic network, a single agent that switches between policies can introduce uncertainty into routing decisions, and the time needed to train a single agent becomes prohibitive as the network grows. Although multi-agent RL has been used to alleviate this problem, classical QL can suffer from non-stationarity arising from the concurrent learning of other agents in a multi-agent stochastic-game environment. Therefore, this manuscript presents a Multi-Agent Multi-Step Deep QL (MAMS-DQL) system for optimizing routes in 6G networks. The main goal is a decentralized mechanism in which every agent independently chooses its best routing strategy. The method adopts a multi-agent dueling deep Q-network architecture to optimize routing decisions and identify the most efficient route in the network. It also uses a multi-step experience-replay strategy that lets agents adjust their routing strategies using experiences spanning consecutive time steps of training. Finally, simulation results show that MAMS-DQL achieves higher routing efficiency than traditional RL approaches.
KEYWORDS
6G networks, SPAH-GCN-LSTM, Multi-agent RL, Deep Q-network, Multi-step experience replay strategy
1. INTRODUCTION
In earlier network architectures, static routing strategies were adequate. However, such strategies are insufficient for complex 6G networks. Because of its large data capacity and diverse device integration, 6G demands innovative network management techniques that lower latency and enable cutting-edge applications such as remote surgery and driverless cars [1]. 6G must build on the features established by its predecessor, even as communication service providers focus on improving monetization tactics for 5G [2]. Furthermore, congested networks cause data traffic delays, especially during rerouting, and such latency may hinder latency-sensitive tasks. This underscores the importance of adaptable and intelligent routing strategies for 6G networks [3, 4]. To address these issues, AI techniques, particularly ML and DL, have been developed for 6G networks [5, 6]. Among them, LSTM networks have transformed predictive analytics for 6G network management because they specialize in modelling complex, non-linear interactions in time-series data. Shi et al. [7] used LSTM models to forecast traffic and to adjust lightpaths in diverse data centre networks. Despite their strengths, LSTM networks face obstacles such as computational complexity and the difficulty of maintaining long-term dependencies.
To improve network performance and maximize resource use, Tshakwanda et al. [8] combined dynamic routing with predictive analytics. Their two-tier technique combines SP-LSTM and RL for 6G system forecasting and adaptive routing: SP-LSTM forecasts network congestion, facilitating proactive actions, whereas RL optimizes routing patterns according to the predicted outcomes. However, SP-LSTM might not handle sudden changes in network conditions. Moreover, since both spatial and temporal interdependencies are crucial in 6G networks, a model that mainly captures temporal patterns may forecast congestion less accurately in such networks. As a solution to this problem, SPAH-GCN-LSTM was proposed in [9] to anticipate congestion in 6G networks.
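The two-tier idea (forecast congestion first, then learn routes from the forecast) can be illustrated with a minimal sketch. It assumes a hypothetical five-node topology, a hand-written congestion forecast standing in for the forecasting tier, and a hypothetical reward that charges one unit per hop plus a penalty proportional to predicted congestion; plain tabular Q-learning plays the RL tier. None of the names or values below come from [8].

```python
import random

# Hypothetical toy topology: node -> neighbouring nodes.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

# Tier 1 placeholder: predicted congestion in [0, 1] per directed link.
# In a two-tier scheme these values would come from the forecaster;
# links not listed are assumed uncongested.
FORECAST = {("A", "B"): 0.9, ("B", "D"): 0.8}


def reward(u, v, dest):
    """Hypothetical tier-2 reward: -1 per hop, a congestion penalty, +10 on arrival."""
    r = -1.0 - 5.0 * FORECAST.get((u, v), 0.0)
    return r + (10.0 if v == dest else 0.0)


def train_q(src="A", dest="E", episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning where state = current node and action = next hop."""
    rng = random.Random(seed)
    q = {(u, v): 0.0 for u in GRAPH for v in GRAPH[u]}
    for _ in range(episodes):
        node = src
        for _ in range(20):  # cap episode length
            nbrs = GRAPH[node]
            if rng.random() < eps:  # explore
                nxt = rng.choice(nbrs)
            else:  # exploit current estimates
                nxt = max(nbrs, key=lambda v: q[(node, v)])
            r = reward(node, nxt, dest)
            done = nxt == dest
            future = 0.0 if done else max(q[(nxt, w)] for w in GRAPH[nxt])
            # Standard Q-learning update towards r + gamma * max_a' Q(s', a').
            q[(node, nxt)] += alpha * (r + gamma * future - q[(node, nxt)])
            node = nxt
            if done:
                break
    return q


def greedy_path(q, src="A", dest="E", max_hops=10):
    """Follow the highest-valued next hop from src until dest is reached."""
    path, node = [src], src
    while node != dest and len(path) <= max_hops:
        node = max(GRAPH[node], key=lambda v: q[(node, v)])
        path.append(node)
    return path


print(greedy_path(train_q()))  # ['A', 'C', 'D', 'E']
```

Both candidate paths have the same hop count, but because the forecast marks A-B and B-D as congested, the learned greedy policy routes around them via C.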
The model combines global spatial correlations with local spatial features of the traffic data through local and global spatial-temporal modules, which improves forecast accuracy. To capture the spatial-temporal dependencies, the global module incorporates SP-LSTM and global correlation. The local module captures local spatial interactions by integrating a GCN, SP-LSTM and a fully connected layer. The outputs of the two modules are combined using a soft-attention mechanism, which emphasizes the factors most important for accurate prediction. The predicted congestion scenarios, together with real-time feedback, inform dynamic routing in the RL framework, in which the QL agent continuously learns and updates the best routing policies.
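The soft-attention fusion step can be sketched as a softmax-weighted sum of the two module outputs. The dot-product scoring against a query vector, the vector sizes and all numbers below are assumptions for illustration; this section does not specify how SPAH-GCN-LSTM computes its attention scores.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention_fuse(outputs: List[List[float]], query: List[float]) -> List[float]:
    """Fuse module outputs h_k as sum_k w_k * h_k, with w = softmax(query . h_k).

    The dot-product score is an assumed choice; any learned scoring
    function could take its place.
    """
    scores = [sum(qi * hi for qi, hi in zip(query, out)) for out in outputs]
    weights = softmax(scores)
    dim = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(dim)]


# Hypothetical per-node congestion features from each module.
global_out = [0.2, 0.7, 0.4]  # global spatial-temporal module
local_out = [0.6, 0.1, 0.9]   # local (GCN-based) module
query = [1.0, 0.0, 1.0]       # hypothetical learned attention query

fused = soft_attention_fuse([global_out, local_out], query)
```

Here the local module scores higher against the query (1.5 versus 0.6), so it receives the larger weight and the fused vector leans towards its values.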
However, in dynamic network routing, a single agent can experience uncertainty in its routing decisions when it switches between policies or when the routing algorithm is part of a real-time application. Without a unified network view, a single agent can have difficulty maximizing several, often competing, objectives, an issue that becomes especially acute in large-scale 6G networks. In addition, training a single agent may become prohibitively long as the network grows. Multi-agent RL has recently been applied to address this issue. However, the conventional QL approach can face non-stationarity caused by the concurrent learning of other agents in a multi-agent stochastic game (SG) context.
To this end, this article proposes MAMS-DQL, a method for optimizing path selection and routing decisions. The problem is modelled as a multi-agent game in which the agents (each corresponding to a different route) learn an effective and scalable route selection strategy. The aim is to identify the optimal decentralized routing mechanism for independent agents. A multi-agent deep Q-network is used in this framework to improve routing decisions by choosing the best route in the network.
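In a decentralized mechanism, each agent decides from its own estimates and its own local observation, with no central controller. The sketch below assumes tabular Q-estimates and epsilon-greedy selection in place of each agent's deep Q-network; `RoutingAgent`, `decentralized_step`, the state labels and the route labels are hypothetical names.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RoutingAgent:
    """Hypothetical independent agent that keeps its own Q-estimates."""

    name: str
    actions: List[str]  # candidate routes for this agent
    epsilon: float = 0.1  # exploration rate
    q: Dict[Tuple[str, str], float] = field(default_factory=dict)
    seed: int = 0

    def __post_init__(self) -> None:
        self.rng = random.Random(self.seed)

    def act(self, local_state: str) -> str:
        # Epsilon-greedy over this agent's own Q-values only:
        # no central controller and no access to other agents' estimates.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((local_state, a), 0.0))


def decentralized_step(agents: List[RoutingAgent],
                       observations: Dict[str, str]) -> Dict[str, str]:
    """All agents choose routes in the same time step from their local observations."""
    return {ag.name: ag.act(observations[ag.name]) for ag in agents}


a1 = RoutingAgent("r1", ["via_B", "via_C"], epsilon=0.0, q={("busy", "via_C"): 1.0})
a2 = RoutingAgent("r2", ["via_D", "via_E"], epsilon=0.0, q={("idle", "via_D"): 0.5})
print(decentralized_step([a1, a2], {"r1": "busy", "r2": "idle"}))
# {'r1': 'via_C', 'r2': 'via_D'}
```

Because every agent acts and learns concurrently, the environment seen by one agent includes the other agents' changing policies, which is the source of the non-stationarity discussed earlier.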
The neural network (NN) models the state-value function and the action advantages and combines them to obtain the state-action value. Training the deep NN on system transitions to tune its trainable parameters improves the approximation of the Q-function. At every learning step, each agent feeds its current state to the DQN and computes the Q-value of every action. In addition, the system employs multi-step experience replay, which extends regular experience replay by letting agents train on experiences spanning several consecutive time steps to adjust their routing strategies. This makes full use of the temporal correlation between consecutive experiences and improves the learning efficiency of the model, which facilitates further optimization of the routing plans.
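How a dueling network combines the state value and the advantages can be sketched as follows. The sketch assumes the mean-subtracted aggregation of the standard dueling DQN, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'), and replaces the deep network with hypothetical linear value and advantage heads over hand-picked features; the actual layers and aggregation used in MAMS-DQL are not specified in this section.

```python
from typing import List, Sequence


def linear(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """A single linear unit, standing in for a learned network head."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def dueling_q(x: Sequence[float],
              value_w: Sequence[float], value_b: float,
              adv_w: Sequence[Sequence[float]], adv_b: Sequence[float]) -> List[float]:
    """Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a') (mean-subtracted dueling head)."""
    v = linear(value_w, value_b, x)                         # state value V(s)
    adv = [linear(w, b, x) for w, b in zip(adv_w, adv_b)]   # advantages A(s, a)
    mean_adv = sum(adv) / len(adv)
    return [v + a - mean_adv for a in adv]


# Hypothetical state features: predicted congestion on candidate routes 0 and 1.
state = [0.3, 0.8]
q = dueling_q(state,
              value_w=[-1.0, -1.0], value_b=0.5,
              adv_w=[[-2.0, 0.0], [0.0, -2.0]], adv_b=[0.0, 0.0])
best_route = max(range(len(q)), key=q.__getitem__)  # greedy action
```

Subtracting the mean advantage makes V(s) identifiable (the Q-values average to V(s)) without changing which action is greedy; here the less congested route 0 is selected.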
Consequently, the proposed MAMS-DQL can effectively improve routing decisions in 6G networks.
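The multi-step experience replay described above can be sketched as a buffer that turns each run of n consecutive transitions into one stored sample holding the discounted n-step return, plus a target that bootstraps from the state n steps later. The return G = r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1} and the target y = G + gamma^n * max_a Q(s_{t+n}, a) follow the common n-step Q-learning formulation; the class name, the value of n and the buffer capacity are assumptions, not values from this paper.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[int, int, float, int, bool]


class MultiStepReplay:
    """Hypothetical n-step replay storing (s_t, a_t, G, s_{t+k}, done, k), k <= n."""

    def __init__(self, n: int, gamma: float, capacity: int = 10_000, seed: int = 0):
        self.n, self.gamma = n, gamma
        self.window: Deque[Transition] = deque()          # last < n raw transitions
        self.memory: Deque[tuple] = deque(maxlen=capacity)  # n-step samples
        self.rng = random.Random(seed)

    def _emit(self) -> None:
        # Discounted sum of up to n rewards, starting at the oldest transition.
        ret, last_next, done, k = 0.0, None, False, 0
        for k, (_, _, r, s_next, d) in enumerate(self.window):
            ret += (self.gamma ** k) * r
            last_next, done = s_next, d
            if d or k + 1 == self.n:
                break
        s0, a0 = self.window[0][0], self.window[0][1]
        self.memory.append((s0, a0, ret, last_next, done, k + 1))
        self.window.popleft()

    def push(self, transition: Transition) -> None:
        self.window.append(transition)
        if len(self.window) == self.n:
            self._emit()
        if transition[4]:  # episode ended: flush the shorter tails
            while self.window:
                self._emit()

    def sample(self, batch_size: int) -> List[tuple]:
        return self.rng.sample(list(self.memory), batch_size)


def n_step_target(ret: float, next_q: Sequence[float], done: bool,
                  steps: int, gamma: float) -> float:
    """y = G + gamma^steps * max_a Q(s_{t+steps}, a); no bootstrap at episode end."""
    return ret if done else ret + (gamma ** steps) * max(next_q)


buf = MultiStepReplay(n=3, gamma=0.5)
for t in range(4):  # a 4-step episode with reward 1 per step
    buf.push((t, 0, 1.0, t + 1, t == 3))
```

After this episode the buffer holds four samples with returns 1.75, 1.75, 1.5 and 1.0; only the first bootstraps from a later state, because the others reach the end of the episode.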
The remaining sections are organized as follows. Section 2 reviews prior research. Section 3 describes the MAMS-DQL method in detail, and Section 4 demonstrates its effectiveness. Section 5 concludes the study and discusses future work.
2. LITERATURE SURVEY
Modern network management would be incomplete without dynamic routing, which adjusts immediately to new network conditions. It outperforms static routing by offering more flexibility in complex, dynamic networks. AI/ML technology has significantly improved the efficiency of routing operations. Recent studies on AI/ML-based dynamic routing for 6G networks are summarized here.
A novel routing method that accounts for the distribution of messages throughout the network was proposed in [10] to reduce the average time messages spend in the core network's queuing process and to select alternate routes. By consolidating several messages destined for the same router into a mailbag, the backbone network reduced processing overhead at intermediate routers and streamlined mailbag generation and route finding inside each router. However, the congestion problem remained unresolved, resulting in increased delay at intermediate routers. The authors in [11] investigated the use of the quantum approximate optimization algorithm to implement multi-objective routing in 6G networks. However, the available quantum computers were incapable of executing this procedure.
A new Energy-Aware Data Collection with Routing Planning system for 6G-enabled UAVs, EADCRP-6G, was presented in [12]. It used an Improved Red Deer Algorithm-based Clustering (IRDAC) scheme to choose the best cluster heads and an Artificial Fish Swarm-based Routing (AFSRP) technique to schedule routes. However, energy loss and delay remained high. In [13], a cooperative and feedback-based trustworthy energy-efficient routing protocol (CFTEERP) was developed. The global and local trust levels of each node were determined from node attributes using K-means-based feedback assessment. In addition, by routing data through the nearest secure node rather than simply the closest node, the protocol extended the network lifetime. Nonetheless, the computational burden remained high, motivating the integration of ML/DL methods to further enhance network performance.
In [14], the authors addressed routing congestion among Secondary Users (SUs) in multi-hop scenarios with a known destination in the presence of Primary Users (PUs) in 6G IoT systems. The routing traffic model was first constructed using a Poisson process derived from a Markov model. The routing problem was then formulated in terms of the non-cooperative interactions that affect the SUs' routing choices and solved with a distributed Non-Cooperative Learning (NCL) method. However, packet loss and latency remained high.
A Collaborative Energy-Efficient Routing Protocol (CEERP) was proposed in [15] to enhance transmission in 5G/6G wireless sensor networks. After the nodes were sorted by remaining energy, RL was used to choose the cluster head (CH) to improve data transmission. Network performance improved substantially when CEERP was combined with the Multi-Objective Improved Seagull Algorithm (MOISA); however, this increased energy usage. An algorithm for load-balancing satellite routing in low Earth orbit using the Markov Decision Process (MDP) was