Reinforcement Learning – The balance between exploration and exploitation
* Reinforcement learning is concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
* The focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
* The environment is typically formulated as a Markov decision process (MDP) utilizing…
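
To make the exploration/exploitation trade-off concrete, here is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit, which is a simplified setting and not a full MDP. The function name `epsilon_greedy_bandit`, the Gaussian reward model, and the parameter values are all assumptions chosen for illustration; they do not come from the post itself.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Run an epsilon-greedy agent on a Gaussian bandit (illustrative only).

    true_means: hypothetical mean reward of each arm, unknown to the agent.
    epsilon:    probability of exploring (picking a random arm) at each step.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try an arm regardless of current estimates.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the arm that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: Gaussian noise around the arm's true mean.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = epsilon_greedy_bandit([0.0, 0.0, 5.0])
    print("estimated means:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
```

Setting `epsilon` to 0 makes the agent purely greedy, so it can lock onto an arm that only looked good early on. Setting it to 1 makes the agent explore at random and never use what it has learned. Values in between trade one off against the other.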