A reinforcement learning-variable neighborhood search method for the capacitated vehicle routing problem

Authors:

Highlights:

• An online-learning, multi-agent hyper-heuristic framework applied to the CVRP.

• Five UCB algorithms are analyzed to find the optimum learning scheme.

• Introduction of a hybrid UCB method by combining different UCB algorithmic elements.

• Integration of a concept drift detector to auto-correct the learning procedure.
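As an illustration of the kind of bandit learning the highlights refer to, the sketch below shows the textbook UCB1 rule choosing among a fixed set of arms, which here stand in for neighborhood operators. It is not the paper's hybrid UCB method and it omits the concept drift detector. The names `UCB1Selector` and `simulate`, the 0/1 reward, and the success probabilities are all assumptions made for this example.

```python
import math
import random


class UCB1Selector:
    """Textbook UCB1 bandit over a fixed set of arms.

    In a hyper-heuristic setting each arm could stand for one
    neighborhood operator (an assumption for this sketch).
    """

    def __init__(self, n_arms, c=math.sqrt(2)):
        self.counts = [0] * n_arms   # number of times each arm was played
        self.means = [0.0] * n_arms  # running mean reward of each arm
        self.c = c                   # exploration weight; sqrt(2) gives classic UCB1

    def select(self):
        # Play every arm once before relying on the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + self.c * math.sqrt(math.log(total) / n)
            for mean, n in zip(self.means, self.counts)
        ]
        # Ties are broken in favor of the lowest arm index.
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the running mean for the chosen arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(probs, rounds, seed=0):
    """Run UCB1 against hypothetical arms that pay 1 with probability probs[i]."""
    rng = random.Random(seed)
    bandit = UCB1Selector(len(probs))
    for _ in range(rounds):
        arm = bandit.select()
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        bandit.update(arm, reward)
    return bandit.counts
```

Calling `simulate([0.1, 0.3, 0.8], 2000)` returns how often each hypothetical arm was chosen. In a real search loop the reward would more plausibly be derived from the improvement in route cost after applying the selected operator, which is again an assumption rather than something stated in the source.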

Keywords: Reinforcement Learning, Multi-Armed Bandits, Intelligent Optimization, Bandit Learning, Metaheuristics, Variable Neighborhood Search, Vehicle Routing Problem

Review history: Received 24 October 2021, Revised 12 August 2022, Accepted 8 September 2022, Available online 17 September 2022, Version of Record 30 September 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.118812