Dr. Howard M. Schwartz: Publication Abstract

Abstract: This work proposes two multi-agent policy iteration learning algorithms. Both combine the Exponential Moving Average (EMA) approach with the Q-learning algorithm to update the learning agent's policy so that it converges to a Nash equilibrium policy against the policies of the other agents. The algorithms are analysed theoretically, and a mathematical proof of convergence is provided for each. They are examined on a variety of matrix and stochastic games. Simulation results show that the proposed EMA Q-learning algorithm converges in a wider variety of situations than state-of-the-art multi-agent reinforcement learning (MARL) algorithms.
Keywords: Q-Learning, Reinforcement Learning, Multi-Agent Learning
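
The abstract does not state the update rules, so the following is only a minimal sketch of the general idea of pairing a Q-learning value update with an exponential-moving-average policy update in a two-player matrix game. It is not the paper's algorithm. The function names (`q_step`, `ema_policy_step`, `run`), the step sizes `alpha` and `eta`, the exploration rate `epsilon`, and the zero-sum payoff convention are all assumptions made for illustration.

```python
import numpy as np


def q_step(q_values, action, reward, alpha):
    """Stateless Q-learning update for the action that was just played."""
    q = q_values.copy()
    q[action] += alpha * (reward - q[action])
    return q


def ema_policy_step(policy, q_values, eta):
    """Move the policy toward the greedy action by an exponential moving average.

    Assumption: the EMA target is a one-hot vector on argmax(Q). The paper's
    actual target may differ.
    """
    target = np.zeros_like(policy)
    target[np.argmax(q_values)] = 1.0
    return (1.0 - eta) * policy + eta * target


def sample(policy, epsilon, rng):
    """Sample from the policy, exploring uniformly with probability epsilon."""
    n = len(policy)
    if rng.random() < epsilon:
        return int(rng.integers(n))
    return int(rng.choice(n, p=policy / policy.sum()))


def run(payoff, steps=2000, alpha=0.1, eta=0.01, epsilon=0.1, seed=0):
    """Self-play on a zero-sum matrix game; payoff[a, b] is the row player's reward."""
    rng = np.random.default_rng(seed)
    n_row, n_col = payoff.shape
    q_row, q_col = np.zeros(n_row), np.zeros(n_col)
    pi_row = np.full(n_row, 1.0 / n_row)
    pi_col = np.full(n_col, 1.0 / n_col)
    for _ in range(steps):
        a = sample(pi_row, epsilon, rng)
        b = sample(pi_col, epsilon, rng)
        reward = payoff[a, b]
        q_row = q_step(q_row, a, reward, alpha)
        q_col = q_step(q_col, b, -reward, alpha)  # zero-sum assumption
        pi_row = ema_policy_step(pi_row, q_row, eta)
        pi_col = ema_policy_step(pi_col, q_col, eta)
    return pi_row, pi_col


if __name__ == "__main__":
    # Hypothetical game: the row player's first action always pays 1.
    game = np.array([[1.0, 1.0], [0.0, 0.0]])
    print(run(game))
```

Each EMA step is a convex combination of two probability vectors, so the policy remains a valid distribution without explicit projection. This naive greedy target is not expected to settle on a mixed equilibrium such as the one in matching pennies; handling those cases is the kind of behaviour the abstract claims for the proposed algorithms.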

Department of Systems and Computer Engineering
Ottawa, Canada
