DOI: 10.1049/tje2.12303 ISSN:

Reinforcement learning for the traveling salesman problem: Performance comparison of three algorithms

Jiaying Wang, Chenglong Xiao, Shanshan Wang, Yaqi Ruan
  • General Engineering
  • Energy Engineering and Power Technology
  • Software


Travelling salesman problem (TSP) is one of the most famous problems in graph theory, as well as one of the typical nondeterministic polynomial time (NP)‐hard problems in combinatorial optimization. Reinforcement learning (RL) has been widely regarded as an effective tool for solving combinatorial optimization problems. This paper attempts to solve the TSP problem using different reinforcement learning algorithms and evaluated the performance of three RL algorithms (Q‐Learning, SARSA, and Double Q‐Learning) under different reward functions, ε‐greedy decay strategies, and running times. A comprehensive analysis and comparison of the three algorithms mentioned above were conducted in the experiment. First, the experimental results indicate that the Double Q‐Learning algorithm is the best algorithm. Among the eight TSP instances, the Double Q‐Learning algorithm outperforms the other two algorithms in five instances. Additionally, it has shorter runtimes compared to the SARSA algorithm and similar runtimes to the Q‐Learning algorithm across all instances. Second, upon analysing the results, it was found that using the reward strategy contributes to obtaining the best results for all algorithms. Among the 24 combinations of 3 algorithms and 8 instances, 17 combinations achieved the best results when the reward strategy was set to .

More from our Archive