On-policy State Distributions & Optimisation Objectives for Episodic / Continuing (Average-Reward, Discounting Settings) Tasks
This article summarises concepts related to on-policy Reinforcement Learning (RL) algorithms. Specifically the focus is on looking at the optimisation objectives, on-policy distributions and compare between different RL task settings.