Unlocking Agentic Reasoning in LLMs
The Challenge and the Opportunity
Large language models (LLMs) have achieved remarkable progress in solving complex reasoning tasks, from math word problems to code generation and beyond. Yet, despite these advances, most LLMs remain fundamentally limited: they rely on static internal knowledge and text-only reasoning. This means they often struggle with tasks that require up-to-date information, precise computation, or dynamic interaction with external tools and environments.
Real-world problem solving is rarely a single-turn Q&A. It demands multi-step reasoning, adaptive decision making, and the ability to orchestrate external resources—whether that’s searching the web, running code, or interacting with APIs. The opportunity is clear: if LLMs could reason agentically, planning, adapting, and leveraging tools as needed, they could unlock a new frontier of robust, generalizable, and interpretable AI.
Introducing ARTIST: Agentic Reasoning and Tool Integration in Self-Improving Transformers
Our work introduces ARTIST (Agentic Reasoning and Tool Integration in Self-Improving Transformers), a unified framework that tightly couples agentic reasoning, reinforcement learning (RL), and dynamic tool integration for LLMs.

The ARTIST architecture. Agentic reasoning is achieved by interleaving text-based thinking, tool queries, and tool outputs, enabling dynamic coordination of reasoning, tool use, and environment interaction within a unified framework.
What makes ARTIST different?
- Agentic Reasoning: The model doesn’t just answer; it plans, adapts, and decides when, how, and which tools to use—mid-reasoning and across multiple steps.
- Tool Integration: ARTIST supports seamless interaction with a wide range of external tools and environments, from code interpreters and web search to APIs and file systems.
- Reinforcement Learning: Rather than relying on hand-crafted prompts or supervised fine-tuning, ARTIST uses outcome-based RL (specifically, Group Relative Policy Optimization, or GRPO) to teach the model robust strategies for tool use and environment interaction—without step-level supervision.
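The key mechanic behind GRPO is simple to sketch: sample a group of rollouts for each prompt, score each rollout with an outcome-based reward, and normalize rewards within the group to obtain advantages, so no learned value model is needed. The snippet below is a minimal illustration of that group-relative normalization, not ARTIST's actual training code; the function name and reward values are illustrative:

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize each rollout's outcome reward
    against the mean and std of its own sampled group (no critic needed)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 rollouts for one prompt; two reached a correct final answer.
rewards = [1.0, 0.0, 1.0, 0.0]
advs = grpo_advantages(rewards)
# Correct rollouts receive positive advantage, incorrect ones negative;
# the policy gradient then upweights the tokens of the positive rollouts.
```

Because the baseline is the group mean, a rollout is only reinforced for doing better than its peers on the same prompt, which is what lets outcome-only rewards shape multi-step tool-use behavior without step-level supervision.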
How Does It Work?
At its core, ARTIST treats tool usage and environment interaction as first-class operations within the reasoning process. For each task, the model alternates between:
- Text-based thinking (`<think>…</think>`)
- Tool queries (`<tool_name>…</tool_name>`)
- Tool outputs (`<output>…</output>`)
- Final answers (`<answer>…</answer>`)
This structure enables the model to reason, plan, and interact with external resources in a tightly coupled loop. During training, ARTIST uses RL to optimize not just for correct answers, but for coherent, efficient, and context-aware tool use.
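The alternating loop above can be sketched as: generate until a closing tag, execute any tool query, append the tool's output to the context, and repeat until an `<answer>` appears. Everything below is a hypothetical illustration, not the actual ARTIST implementation; the concrete tag names `python`/`search`, the `run_tool` helper, and the scripted model stand-in are all assumptions for the sake of a runnable example:

```python
import re

def run_tool(name, query):
    # Hypothetical tool executor; a real system would run code or search the web.
    return f"[{name} output for: {query}]"

def agentic_rollout(prompt, generate_until, max_steps=8):
    """Alternate model generation with tool execution until <answer> appears.
    generate_until(ctx) stands in for an LLM call that stops at a closing tag."""
    ctx = prompt
    for _ in range(max_steps):
        chunk = generate_until(ctx)
        ctx += chunk
        call = re.search(r"<(python|search)>(.*?)</\1>", chunk, re.S)
        if call:  # mid-reasoning tool query: run it, feed the result back in
            ctx += f"<output>{run_tool(call.group(1), call.group(2).strip())}</output>"
            continue
        done = re.search(r"<answer>(.*?)</answer>", chunk, re.S)
        if done:  # final answer closes the loop
            return done.group(1).strip()
    return None

# Scripted stand-in for the model: one tool call, then a final answer.
script = iter([
    "<think>Need to compute 6*7.</think><python>print(6*7)</python>",
    "<think>The tool returned the value.</think><answer>42</answer>",
])
print(agentic_rollout("Q: What is 6*7?", lambda ctx: next(script)))  # prints 42
```

During RL training, each such rollout is scored only by its outcome, so the model must learn for itself when a tool call is worth the extra steps.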

Overview of the ARTIST methodology. The framework illustrates how reasoning rollouts alternate between internal thinking, tool use, and environment interaction, with outcome-based rewards guiding learning. This enables the model to iteratively refine its reasoning and tool-use strategies through reinforcement learning.
Results: Extensive Experiments Across Two Domains
We conducted comprehensive experiments to rigorously evaluate ARTIST in two core domains: complex mathematical problem solving and multi-turn function calling. Benchmarks included MATH-500, AIME, AMC, and Olympiad Bench for math, and τ-bench and BFCL v3 for multi-turn tool use.
Key results:
- ARTIST achieves up to 22% absolute improvement over base models on the hardest math tasks.
- On multi-turn function calling, ARTIST more than doubles the accuracy of prompt-based baselines on τ-bench.
- ARTIST consistently outperforms state-of-the-art frontier LLMs and tool-augmented models across all evaluated settings.
Key Insights & Emergent Capabilities
Our analyses reveal that ARTIST’s agentic RL training leads to several emergent behaviors:
- Adaptive Tool Selection: The model learns to invoke the right tool at the right time, based on context.
- Iterative Self-Correction: ARTIST recovers from errors by reflecting on tool outputs and adjusting its strategy.
- Context-Aware Multi-Step Reasoning: The model generates longer, richer reasoning traces, showing deeper understanding.
- Efficient Task Completion: ARTIST completes tasks in fewer steps, thanks to smarter planning and error recovery.
Conclusion: The Future of Agentic LLMs
ARTIST demonstrates that tightly integrating agentic reasoning, reinforcement learning, and dynamic tool use marks a paradigm shift in LLM capabilities. By enabling models to reason adaptively, invoke tools strategically, and self-correct through environment interaction, ARTIST sets a new standard for robust, interpretable, and generalizable problem-solving in real-world scenarios.