Reinforcement learning for the traveling salesman problem: Performance comparison of three algorithms

doi:10.1049/tje2.12303

DOI: 10.1049/tje2.12303 ISSN:

Reinforcement learning for the traveling salesman problem: Performance comparison of three algorithms

Jiaying Wang, Chenglong Xiao, Shanshan Wang, Yaqi Ruan

General Engineering
Energy Engineering and Power Technology
Software

Show PDF Cite

Abstract

Travelling salesman problem (TSP) is one of the most famous problems in graph theory, as well as one of the typical nondeterministic polynomial time (NP)‐hard problems in combinatorial optimization. Reinforcement learning (RL) has been widely regarded as an effective tool for solving combinatorial optimization problems. This paper attempts to solve the TSP problem using different reinforcement learning algorithms and evaluated the performance of three RL algorithms (Q‐Learning, SARSA, and Double Q‐Learning) under different reward functions, ε‐greedy decay strategies, and running times. A comprehensive analysis and comparison of the three algorithms mentioned above were conducted in the experiment. First, the experimental results indicate that the Double Q‐Learning algorithm is the best algorithm. Among the eight TSP instances, the Double Q‐Learning algorithm outperforms the other two algorithms in five instances. Additionally, it has shorter runtimes compared to the SARSA algorithm and similar runtimes to the Q‐Learning algorithm across all instances. Second, upon analysing the results, it was found that using the reward strategy contributes to obtaining the best results for all algorithms. Among the 24 combinations of 3 algorithms and 8 instances, 17 combinations achieved the best results when the reward strategy was set to .

Outline

Reinforcement learning for the traveling salesman problem: Performance comparison of three algorithms

Abstract

More from our Archive