Dr. Howard M. Schwartz: Publication Abstract

Abstract: The main contribution of this work is a novel reinforcement learning algorithm for problems in which the agent's reinforcement signal is subject to a Poissonian stochastic time delay. Despite this noise in the reinforcement signal, the algorithm can craft a suitable control policy for the agent's environment. The approach can handle reinforcements that are received out of order in time or that overlap, a case not previously considered in the literature. The proposed algorithm is simulated and its performance is compared with that of a standard Q-learning algorithm. In simulation, the proposed method improves the performance of a learning agent in an environment with Poissonian-type stochastically delayed rewards.
Keywords: Reinforcement learning, Markov Decision Process, stochastic time delay, reward, cost, jitter, multiple models
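
The delay model described in the abstract can be illustrated with a short sketch. This is not the algorithm proposed in the paper, only a minimal, assumed model of the problem setting: each reward is held back for a Poisson-distributed number of time steps, so rewards may arrive out of order or several may arrive in the same step. The names DelayedRewardChannel, sample_poisson, and mean_delay are hypothetical and chosen for illustration.

```python
import math
import random
from collections import defaultdict


def sample_poisson(rng, lam):
    """Draw a Poisson(lam) integer using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class DelayedRewardChannel:
    """Hypothetical helper: holds each reward for a Poisson-distributed delay."""

    def __init__(self, mean_delay, seed=0):
        self.mean_delay = mean_delay      # assumed mean delay, in time steps
        self.rng = random.Random(seed)
        self.pending = defaultdict(list)  # arrival step -> [(sent step, reward)]

    def send(self, step, reward):
        """Queue a reward emitted at `step`; return the sampled delay."""
        delay = sample_poisson(self.rng, self.mean_delay)
        self.pending[step + delay].append((step, reward))
        return delay

    def receive(self, step):
        """Return every (sent step, reward) pair that arrives at `step`."""
        return self.pending.pop(step, [])


if __name__ == "__main__":
    channel = DelayedRewardChannel(mean_delay=2.0, seed=42)
    for t in range(10):
        channel.send(t, reward=float(t))  # reward value tagged with its step
        arrivals = channel.receive(t)
        # More than one arrival in a step is an "overlap"; a sent step that is
        # lower than one already seen means the rewards arrived out of order.
        print(t, arrivals)
```

A learner that assumes each reward belongs to the most recent action would credit these arrivals to the wrong state-action pairs, which is the difficulty the abstract refers to.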

Department of Systems and Computer Engineering
Ottawa, Canada
