Agent57 became the first deep reinforcement learning agent to score above the human baseline on all 57 Atari 2600 games, a major milestone reported by DeepMind. Its success spanned the entire suite of classic games, including titles that had historically resisted AI, demonstrating that a single agent can master environments requiring very different, often nuanced strategies.
Yet these agents are achieving unprecedented performance and learning complex reasoning through self-discovered trial-and-error processes rather than explicit human instruction. This decoupling of problem-solving capability from human-defined reasoning represents a significant shift in AI development. Reinforcement learning, in which an agent learns optimal actions through interaction and feedback, is central to these advances, allowing the discovery of strategies no human would have designed.
This autonomous reasoning capability will likely lead AI to solve problems in novel, non-human-intuitive ways, accelerating progress in fields previously constrained by human cognitive biases. The ability of AI to set and achieve its own goals through iterative trial and error, potentially by 2026, could redefine how complex challenges are approached.
As DeepMind's report details, Agent57 matched and then exceeded human-level performance on the most challenging Atari games, not just the easier ones, and according to arXiv it outperformed all previous approaches on six of those games. These results demonstrate reinforcement learning's power to master tasks requiring nuanced strategy through trial-and-error interaction alone, without human demonstrations.
How AI Learns Through Trial and Error
In a demonstration of data efficiency, IRIS achieved a mean human normalized score of 1.046 on the Atari 100k benchmark with only two hours of gameplay, according to ICLR. IRIS outperformed humans on 10 of 26 Atari games, setting a new state of the art among methods without lookahead search. Reinforcement learning (RL) allows agents to learn optimal strategies by interacting with an environment and receiving rewards, a trial-and-error process that can surpass human performance on remarkably little data. While IRIS demonstrated superior performance on a subset of games, Agent57 was the first to surpass the human baseline on all 57, as reported by DeepMind. The difference highlights that "human-level performance" is a moving target: studies use varying benchmarks and game subsets, making direct comparisons difficult without further context.
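For context, the "mean human normalized score" cited for IRIS is conventionally computed per game against random and human baselines and then averaged. A minimal sketch of the metric follows; the example numbers are invented for illustration, with real values coming from each game's published baselines.

```python
def human_normalized_score(agent: float, random_baseline: float, human: float) -> float:
    # Standard Atari metric: 0.0 means random play, 1.0 means the
    # human baseline; values above 1.0 exceed human performance.
    return (agent - random_baseline) / (human - random_baseline)

# A mean of 1.046 across the 26 Atari 100k games, as IRIS reports,
# means the agent averages just above the human baseline.
print(human_normalized_score(agent=8500.0, random_baseline=150.0, human=7100.0))
```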
Beyond Games: AI That Explains Itself
Reinforcement learning is now extending beyond game environments into more complex cognitive tasks. Large language models (LLMs) can be taught to reason through a trial-and-error process, without ever being shown examples of human reasoning, as reported by Nature. When these LLMs are trained with reinforcement learning that rewards correct answers, they naturally learn to output their reasoning processes. This lets them discover their own reasoning behaviors that earn high rewards, rather than being limited to human-defined patterns or predefined logical structures. RL is crucial here because it enables AI systems to develop their own internal logic and explain their decisions, moving beyond mere pattern recognition to self-discovered reasoning.
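A schematic of that outcome-based reward, as a hedged sketch: only the final answer is checked, which leaves the reasoning text unconstrained. The "Answer:" convention and helper function below are illustrative assumptions, not the actual training code behind the Nature report.

```python
def extract_final_answer(output: str) -> str:
    # Assume the model ends its output with a line like "Answer: 42";
    # this convention is an illustrative assumption, not a fixed standard.
    for line in reversed(output.strip().splitlines()):
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return ""

def outcome_reward(output: str, ground_truth: str) -> float:
    # Only the final answer is scored; every token of reasoning between
    # the prompt and the answer is unconstrained, so the model is free to
    # discover whatever intermediate steps earn high reward.
    return 1.0 if extract_final_answer(output) == ground_truth else 0.0

print(outcome_reward("17 has no divisors below its root.\nAnswer: prime", "prime"))  # 1.0
```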
Redefining AI Performance Benchmarks
The progression of AI capabilities in complex environments suggests a rapid redefinition of what constitutes "human-level" performance. Results such as Agent57 surpassing human baselines on all 57 Atari games and IRIS outperforming humans on 10 of 26 with just two hours of data indicate that the ceiling for AI performance is far higher than previously imagined, and that "human-level" is fast becoming an outdated and insufficient benchmark for advanced AI. These systems discover problem-solving strategies that may not align with human intuition yet prove more effective. The focus is shifting from merely matching human abilities to exploring and optimizing solutions that transcend human cognitive limitations.
The Future of Autonomous AI Reasoning
The DeepSeek AI team released a model called DeepSeek-R1 in January 2025 that uses reinforcement learning to elicit reasoning steps, according to Nature. DeepSeek-R1's development points to a future in which AI systems not only provide answers but transparently articulate their self-discovered logic. Large language models already perform better at complex tasks when they write down their reasoning process before answering, a finding reinforced by recent research. Separately, the model discussed on arXiv surpasses a human expert on three games, though in a different evaluation context. The ability of AI systems to independently develop and articulate their reasoning, as seen in DeepSeek-R1, suggests a future where AI tackles highly complex, multi-step problems with greater autonomy, transparency, and potentially superior solutions.
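DeepSeek-R1, for example, emits its self-discovered reasoning inside <think> tags before the final answer. The toy parser below, an illustrative sketch rather than DeepSeek tooling, separates the two parts of such an output.

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    # R1-style outputs look like "<think>...reasoning...</think>answer";
    # return (reasoning, answer), or ("", output) if no tags are present.
    match = re.search(r"<think>(.*?)</think>(.*)", output, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", output.strip()

reasoning, answer = split_reasoning("<think>Check divisors up to 4.</think>17 is prime.")
print(reasoning, "->", answer)
```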
How does trial and error work in AI?
In artificial intelligence, trial and error, typically implemented as reinforcement learning, involves an agent interacting with an environment to learn optimal actions. The agent performs an action, receives feedback in the form of a reward or penalty, and uses that feedback to adjust its strategy so as to maximize cumulative reward over time, a concept detailed in OpenAI's Spinning Up documentation. This iterative process allows the AI to discover effective behaviors without explicit programming for every scenario.
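That loop can be made concrete with tabular Q-learning. The sketch below assumes a hypothetical environment exposing reset() -> state and step(action) -> (next_state, reward, done); it is a toy illustration of the trial-and-error cycle, not code from any cited system.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    # q[(state, action)] estimates the long-run reward of taking `action`
    # in `state`; all estimates start at zero.
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Trial: explore a random action with probability eps,
            # otherwise exploit the best-known action so far.
            if random.random() < eps:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Error correction: nudge the estimate toward the observed
            # reward plus the discounted value of the best next action.
            best_next = max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```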
What are the goals of reinforcement learning?
The primary goal of reinforcement learning is for an AI agent to learn a policy, or mapping from states to actions, that maximizes the total cumulative reward it receives from an environment over the long run. This involves balancing exploration of new actions with exploitation of known good actions. For example, in a robotic control task, the goal might be to learn efficient movement patterns to reach a target without crashing.
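The "total cumulative reward over the long run" is usually formalized as the discounted return, the quantity the learned policy maximizes in expectation. A minimal sketch:

```python
def discounted_return(rewards: list[float], gamma: float = 0.99) -> float:
    # G = r0 + gamma*r1 + gamma^2*r2 + ..., computed back to front.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A reward that arrives two steps late is worth gamma^2 today.
print(discounted_return([0.0, 0.0, 1.0]))  # 0.9801
```

The discount factor gamma below 1.0 is what makes the agent prefer sooner rewards over later ones while still valuing long-term payoff.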
Can AI learn through trial and error?
Yes, AI can effectively learn through trial and error, particularly through reinforcement learning algorithms. This method enables AI systems to acquire complex skills and strategies in dynamic environments, from mastering video games to controlling robotic systems. For instance, early AI research explored simple trial-and-error methods to solve mazes, demonstrating the fundamental viability of this learning paradigm.
Based on Nature's findings that LLMs learn to reason via trial-and-error without human examples, companies relying solely on supervised learning for AI reasoning risk developing systems limited by human cognitive biases. Competitors embracing reinforcement learning could unlock truly novel problem-solving approaches by allowing AI to discover its own optimal logical patterns. By Q4 2026, organizations that fail to integrate autonomous reasoning capabilities into their AI development pipelines may find their systems outpaced by those leveraging RL for self-discovered, non-human-like solutions, particularly in areas requiring advanced strategic planning or scientific discovery.