Reward maximization has been proposed as a sufficient requirement for general intelligence, a single objective said to unify abilities such as knowledge acquisition, perception, social interaction, and generalization. In this post, I share my personal reflections on both the strengths and the limitations of this claim. In the end, I believe that making progress toward general intelligence will require developing methods for selecting effective reward functions and advancing techniques for interpreting learned reward functions.
“Reward is Enough”
First of all, as shown in Section 6
On the other hand, some arguments remain subject to debate. For instance, the claim that maximizing a single reward can give rise to the specific behaviors associated with many distinct goals, and can therefore yield general intelligence, requires further justification. While the paper offers plausible explanations for various abilities, each section rests on informal argument rather than rigorous mathematical proof.

More importantly, even if we accept the claim, the challenge of defining the reward function remains. For example, with a fixed, predefined reward, the agent may struggle to adapt to drastic environmental changes, even while faithfully maximizing that reward. And even if the reward function is updated, establishing clear criteria for constructing a reward function that effectively guides the agent toward the ultimate goal remains difficult, given the complexity of the environment. Furthermore, it may not be feasible to fold all relevant factors into a single reward function. This challenge becomes even more pronounced in dynamically changing environments, which are characteristic of natural systems. In such cases, the ability to select among a set of candidate reward functions (whether predefined or not) and adapt that choice over time, or to apply strategies other than reinforcement learning, might serve the agent better than committing to a single reward function; a toy sketch of this selection idea follows below. This, in turn, brings us back to the fundamental question: how do we build such strategies for generalization?

Finally, the idea that reward maximization provides a deeper understanding of why such abilities arise does not seem convincing, as it once again leads to another question: how should we interpret the agent's behavior in terms of the reward function?
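To make the selection idea concrete, here is a minimal, hypothetical sketch of an agent that chooses among candidate reward functions based on how well acting under each one has recently served an external notion of task success. Everything here (the candidate rewards, the `task_success` signal, and the epsilon-greedy selector) is invented purely for illustration, and it optimistically assumes that such a success signal is observable at all, which is itself part of the difficulty raised above.

```python
import random

# Two hypothetical candidate reward functions over (state, action) pairs.
def reward_speed(state, action):
    # Candidate 1: rewards aggressive actions regardless of risk.
    return action

def reward_safety(state, action):
    # Candidate 2: discounts aggressive actions when the environment is risky.
    return action - 2.0 * state["risk"] * action

CANDIDATES = [reward_speed, reward_safety]
ACTIONS = [0.0, 0.5, 1.0]  # how aggressively to act

def task_success(state, action):
    # Assumed-observable "ultimate goal": progress, unless an aggressive action
    # in a risky state causes outright failure.
    return -1.0 if state["risk"] >= 0.5 and action >= 0.75 else action

def select_reward(stats, epsilon=0.1):
    # Epsilon-greedy choice of the reward function whose recent use led to the
    # highest average task success.
    if random.random() < epsilon or not all(stats.values()):
        return random.randrange(len(CANDIDATES))
    return max(stats, key=lambda i: sum(stats[i][-20:]) / len(stats[i][-20:]))

stats = {i: [] for i in range(len(CANDIDATES))}
for t in range(200):
    state = {"risk": 0.2 if t < 100 else 0.9}  # abrupt environment shift at t = 100
    i = select_reward(stats)
    reward_fn = CANDIDATES[i]
    action = max(ACTIONS, key=lambda a: reward_fn(state, a))  # act greedily under the chosen reward
    stats[i].append(task_success(state, action))  # track how well that choice served the real goal
```

In this toy loop, once the environment becomes risky, acting under the "speed" reward keeps producing failures, so the selector drifts toward the "safety" reward. The point is not the specific mechanism but that some process outside any single fixed reward is doing the adaptation.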
Therefore, to establish a robust foundation, we should focus on building a principled framework for selecting, constructing, and utilizing reward functions, so that they can genuinely serve as a key component of general intelligence. Recent work such as “To the Max: Reinventing Reward in Reinforcement Learning”, which revisits the objective itself by optimizing the maximum reward attained along a trajectory rather than the cumulative sum, seems like a promising step in this direction.
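To illustrate the contrast, here is a minimal sketch (my own simplified reading, not the authors' exact formulation or algorithm) of how a max-reward style objective differs from the standard discounted return. The function names and the toy reward sequence are hypothetical.

```python
def cumulative_return(rewards, gamma=0.99):
    # Standard RL objective: discounted sum of rewards along a trajectory.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def max_reward_return(rewards):
    # Alternative objective: the best reward ever reached along the trajectory.
    # Arguably a better fit when the goal is "reach a good state at least once"
    # (e.g., goal-reaching tasks) rather than "accumulate reward forever".
    return max(rewards)

trajectory = [0.0, 0.1, 0.0, 5.0, 0.0]   # toy reward sequence
print(cumulative_return(trajectory))      # ~4.95 (discounted sum)
print(max_reward_return(trajectory))      # 5.0  (peak reward)
```

Even this tiny example shows that the choice of objective, not just the reward values themselves, shapes what the agent is ultimately asked to do, which is exactly why a framework for choosing and interpreting reward functions matters.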