Published on 08/04/11 at 16:52

License: Creative Commons CC-BY-NC-SA

Animal studies indicate that learning is driven by reward and punishment, leading to behaviour that is optimized for a given environment. Reinforcement learning is a formal computational model of such reward-based learning, enabling an autonomous agent to learn optimal control policies through trial-and-error interaction with a dynamic environment. The reinforcement learning problem can be solved with dynamic programming methods that estimate the Q-function, which represents the utility of taking action a in a given state s.
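As a concrete illustration of such trial-and-error estimation of the Q-function, the following is a minimal sketch of tabular one-step Q-learning on a hypothetical two-state chain MDP (the dynamics, rewards, and parameter values are illustrative, not taken from the work described here):

```python
import random

# Toy MDP: states 0 and 1; action 1 moves right, action 0 stays.
# Taking action 1 in state 1 reaches the goal (reward 1.0) and resets.
STATES, ACTIONS = [0, 1], [0, 1]
GAMMA, ALPHA = 0.9, 0.5  # discount factor and learning rate (illustrative)

def step(s, a):
    """Deterministic toy dynamics returning (next_state, reward)."""
    if s == 1 and a == 1:
        return 0, 1.0
    return min(s + a, 1), 0.0

# Q-function as a table: Q[(s, a)] estimates the utility of action a in state s.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

random.seed(0)
s = 0
for _ in range(500):
    a = random.choice(ACTIONS)  # explore uniformly at random
    s2, r = step(s, a)
    # One-step update: only the CURRENT feedback (s, a, r, s2) is used.
    target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
    s = s2

# Greedy policy derived from the learned Q-function
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
print(policy)
```

Note that each update discards the transition after using it once; this is the "current feedback only" behaviour that Fitted Q Iteration, discussed next, improves upon.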
Recently, Ernst introduced the Fitted Q Iteration algorithm, which makes efficient use of the data gathered from the system. While classical algorithms use only the current environment feedback to adapt the Q-function, Fitted Q Iteration implements a special form of long-term memory, such that all interaction experience can be used at each optimization step. This enables the agent to reflect on past decisions as new information about the system is revealed.
Fitted Q Iteration can be used with any function approximator to model the Q-function. Existing algorithms are based on regression trees, neural networks, or kernel methods. In this work, the performance of several variants of the algorithm is compared on a number of reinforcement learning benchmark problems.
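To make the contrast with one-step methods concrete, here is a minimal sketch of Fitted Q Iteration on a hypothetical two-state toy MDP. A simple per-(state, action) averaging "regressor" stands in for the function approximator; Ernst's original algorithm uses ensembles of regression trees, and any regressor could be substituted in `fit`. All names, dynamics, and parameters are illustrative assumptions:

```python
import random

STATES, ACTIONS, GAMMA = [0, 1], [0, 1], 0.9  # toy problem, illustrative values

def step(s, a):
    """Same toy dynamics: action 1 in state 1 reaches the goal (reward 1.0)."""
    if s == 1 and a == 1:
        return 0, 1.0
    return min(s + a, 1), 0.0

# 1) Gather a batch of one-step transitions (s, a, r, s') once, up front.
random.seed(0)
batch, s = [], 0
for _ in range(200):
    a = random.choice(ACTIONS)
    s2, r = step(s, a)
    batch.append((s, a, r, s2))
    s = s2

def fit(samples):
    """Stand-in regressor: predict the mean target per (s, a) pair."""
    sums, counts = {}, {}
    for (s, a), y in samples:
        sums[(s, a)] = sums.get((s, a), 0.0) + y
        counts[(s, a)] = counts.get((s, a), 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

# 2) Iterate: each pass re-fits the Q-function on targets built from the
#    FULL batch, so every stored experience informs every optimization step.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
for _ in range(50):
    targets = [((s, a), r + GAMMA * max(Q.get((s2, b), 0.0) for b in ACTIONS))
               for (s, a, r, s2) in batch]
    Q = fit(targets)

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
print(policy)
```

The key design point is that the batch is collected once and then reused at every iteration: swapping `fit` for a regression tree, neural network, or kernel method yields the variants compared in this work.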