Inhomogeneous deep Q-network for time sensitive applications

Authors:

Abstract

The deep Q-network (DQN) has attracted increasing attention from both industry and academia. Existing methods mostly formulate the decision process as a sequence of discrete agent-environment interactions, while the intervals between successive interactions are largely neglected, even though they may reveal important signals in real-world applications. To bridge this gap, this paper proposes to explicitly model the time intervals in DQN. Specifically, we first cast the agent-environment interactions onto a continuous time dimension, and then define a time-aware learning objective and the corresponding Bellman operator. For sample-efficient training, we approximate the Q-function with a neural network, where the time information is modeled by a point process. The intensity function of the point process and the Q-function are seamlessly integrated by sharing the same history summarization module, so that the time-interval information can directly influence the model optimization process. To close the gap between the approximated and optimal Q-functions, we theoretically analyze the sample complexity of our model by deriving a finite-time bound in continuous time. We conduct both simulation and real-world experiments to demonstrate our model's effectiveness.
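The abstract does not spell out the time-aware Bellman operator. A common continuous-time formulation, assuming exponential discounting at rate β over the random interval Δt between successive interactions, would take the following form; this is a sketch of the standard construction, not the paper's verbatim definition:

```latex
% One plausible time-aware Bellman operator (an assumption, not quoted from
% the paper): the fixed discount factor gamma of standard DQN is replaced by
% exponential decay over the random interval \Delta t between decisions.
(\mathcal{T} Q)(s, a)
  = \mathbb{E}\left[\, r(s, a) + e^{-\beta \Delta t} \max_{a'} Q(s', a') \;\middle|\; s, a \,\right]
```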
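To make the shared-module design concrete, below is a minimal sketch of how one history summarizer could feed both a point-process intensity head and a Q-value head, so interval information flows into Q-learning. The GRU encoder, layer sizes, and input layout are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (assumed architecture, not the paper's released code):
# a recurrent history summarizer is shared by the intensity and Q heads.
import torch
import torch.nn as nn

class TimeAwareDQN(nn.Module):
    def __init__(self, state_dim, num_actions, hidden_dim=64):
        super().__init__()
        # History summarization: encodes (state, one-hot action, interval) events.
        self.encoder = nn.GRU(state_dim + num_actions + 1, hidden_dim, batch_first=True)
        # Point-process head: softplus keeps the intensity lambda(t | history) positive.
        self.intensity_head = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Softplus())
        # Q-head: one value per action, conditioned on the shared history summary.
        self.q_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, event_seq):
        # event_seq: (batch, seq_len, state_dim + num_actions + 1)
        summary, _ = self.encoder(event_seq)
        h = summary[:, -1]  # last hidden state summarizes the interaction history
        return self.q_head(h), self.intensity_head(h)
```

In training, the TD error from the Q-head and the negative log-likelihood of the intensity head could be optimized jointly, so gradients from the interval model shape the shared summary.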

Keywords: Reinforcement learning, Point process, Time-sensitive applications

Article history: Received 8 May 2021, Revised 27 June 2022, Accepted 7 July 2022, Available online 15 July 2022, Version of Record 22 August 2022.

Paper link: https://doi.org/10.1016/j.artint.2022.103757