Reinforcement learning algorithms aim to determine the ideal behavior within a specific context based on simple reward feedback on their actions; the self-driving car is a typical example.
Training is an iterative process in which machine learning (ML) algorithms try to find the optimal combination of input variables (referred to as features) and the weights assigned to them, with the goal of minimizing the training error, measured as the difference between the predicted outcome and the actual outcome 2).
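As a rough illustration of such an iterative search, the sketch below fits a linear model by batch gradient descent on the mean squared error. The function name `fit_linear`, the learning rate, the step count, and the synthetic data are all assumptions made for this example; they are not taken from the cited source.

```python
import numpy as np

def fit_linear(X, y, lr=0.1, steps=500):
    """Fit weights w and bias b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration only.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    history = []  # training error recorded before each update
    for _ in range(steps):
        err = X @ w + b - y                      # predicted minus actual outcome
        history.append(float(np.mean(err ** 2)))
        # Gradients of mean(err**2) with respect to w and b.
        w -= lr * (2.0 / n_samples) * (X.T @ err)
        b -= lr * 2.0 * float(np.mean(err))
    return w, b, history

# Synthetic, noise-free data (an assumption): two features with known weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -2.0]) + 0.5

w, b, history = fit_linear(X, y)
```

Each pass adjusts the weights slightly in the direction that reduces the training error, so `history` should shrink over the iterations and `w` and `b` should approach the values used to generate the data.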
Several studies have shown that natural gradient descent for on-line learning is much more efficient than standard gradient descent. In this article, we derive natural gradients in a slightly different manner and discuss implications for batch-mode learning and pruning, linking them to existing algorithms such as Levenberg-Marquardt optimization and optimal brain surgeon. The Fisher matrix plays an important role in all these algorithms. The second half of the article discusses a layered approximation of the Fisher matrix specific to multilayered perceptrons. Using this approximation rather than the exact Fisher matrix, we arrive at much faster “natural” learning algorithms and more robust pruning procedures 3).
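To make the role of the Fisher matrix more concrete, here is a minimal sketch that compares a plain gradient step with a natural gradient step on a linear model with unit-variance Gaussian noise, where the Fisher matrix reduces to the average outer product of the inputs. This is a simplified assumption chosen for illustration: it uses the exact Fisher matrix of a linear model, not the layered approximation for multilayered perceptrons described in the cited article. The function names, the `damping` term (a Levenberg-Marquardt-style ridge), and the synthetic data are hypothetical.

```python
import numpy as np

def plain_gradient_step(w, X, y, lr):
    """One standard gradient descent step on 0.5 * mean squared error."""
    grad = X.T @ (X @ w - y) / X.shape[0]
    return w - lr * grad

def natural_gradient_step(w, X, y, lr=1.0, damping=1e-8):
    """One natural gradient step: precondition the gradient with the inverse Fisher matrix."""
    n = X.shape[0]
    grad = X.T @ (X @ w - y) / n
    # Fisher matrix of a linear model with unit-variance Gaussian noise (assumption).
    fisher = X.T @ X / n
    # The small ridge keeps the linear solve well conditioned.
    direction = np.linalg.solve(fisher + damping * np.eye(len(w)), grad)
    return w - lr * direction

# Synthetic data with badly scaled features (an assumption for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([1.0, 10.0])
w_true = np.array([2.0, -1.0])
y = X @ w_true

w_plain = np.zeros(2)
for _ in range(20):
    w_plain = plain_gradient_step(w_plain, X, y, lr=0.01)

w_nat = natural_gradient_step(np.zeros(2), X, y)

err_plain = float(np.linalg.norm(w_plain - w_true))
err_nat = float(np.linalg.norm(w_nat - w_true))
```

In this toy setting the poorly scaled features force the plain update to use a small learning rate, so it is still far from `w_true` after 20 steps, while the Fisher-preconditioned update lands essentially on the solution in one step. Real multilayered perceptrons are not this simple, which is where approximations of the Fisher matrix become relevant.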