TradeBot: Bandit learning for hyper-parameters optimization of high frequency trading strategy
Authors:
Highlights:
• We formulate quantitative trading policy learning as a reinforcement learning problem and propose reward-agnostic UCB to learn the dynamically adjustable hyper-parameters of trading strategies with a powerful back-testing system.
• We leverage inverse reinforcement learning to learn a reward function that accurately estimates the profit of each trading order.
• We show promising performance on real-world high-frequency trading in the Chinese commodity futures market. To the best of our knowledge, this is the first reinforcement learning work deployed in an online trading system.
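The highlights name UCB as the bandit method for choosing hyper-parameters. As a rough illustration only, the sketch below implements the standard UCB1 rule, not the reward-agnostic variant proposed in the paper, whose details are not given here. Each arm stands for one candidate hyper-parameter setting, and the names `ucb1_select`, `run_ucb1`, `simulated_backtest` and the mean rewards in `HYPOTHETICAL_MEANS` are assumptions made up for this example.

```python
import math
import random


def ucb1_select(counts, sums, t, c=2.0):
    """Return the index of the arm with the highest UCB1 score."""
    # Play every arm once before relying on the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Score = empirical mean reward + exploration bonus.
    scores = [
        sums[a] / counts[a] + math.sqrt(c * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(reward_fn, n_arms, rounds):
    """Run UCB1 for `rounds` steps; reward_fn(arm) should return a value in [0, 1]."""
    counts = [0] * n_arms   # number of times each arm was played
    sums = [0.0] * n_arms   # total reward collected per arm
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, sums, t)
        reward = reward_fn(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums


# Hypothetical setup: three candidate settings with made-up mean rewards.
HYPOTHETICAL_MEANS = [0.2, 0.5, 0.8]
rng = random.Random(0)


def simulated_backtest(arm):
    # Stand-in for a back-test: Bernoulli reward with the arm's assumed mean.
    return 1.0 if rng.random() < HYPOTHETICAL_MEANS[arm] else 0.0


counts, sums = run_ucb1(simulated_backtest, n_arms=3, rounds=2000)
print("pulls per candidate setting:", counts)
```

In a real setting the reward would come from a back-testing system or a learned reward function rather than a simulated coin flip, and rewards would need to be scaled to a bounded range for the UCB1 bonus to be meaningful.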
Keywords: High-frequency trading, Hyper-parameter optimization, Multi-armed bandit learning, Inverse reinforcement learning
Review history: Received 1 April 2021, Revised 1 December 2021, Accepted 4 December 2021, Available online 7 December 2021, Version of Record 27 December 2021.
Official paper URL: https://doi.org/10.1016/j.patcog.2021.108490