AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play

Xu, Ran; Zhuang, Yuchen; Dong, Zihan; Wang, Ruiyu; Yu, Yue; Ho, Joyce; Zhang, Linjun; Wang, Haoyu; Shi, Wenqi; Yang, Carl

Advances in Neural Information Processing Systems 38 (NeurIPS 2025) Main Conference Track

Abstract

Search-augmented LLMs often struggle with complex reasoning tasks due to ineffective multi-hop retrieval and limited reasoning ability. We propose AceSearcher, a cooperative self-play framework that trains a single large language model (LLM) to alternate between two roles: a decomposer that breaks down complex queries and a solver that integrates retrieved contexts for answer generation. AceSearcher couples supervised fine-tuning on a diverse mixture of search, reasoning, and decomposition tasks with reinforcement fine-tuning optimized for final answer accuracy, eliminating the need for intermediate annotations. Extensive experiments on three reasoning-intensive tasks across 10 datasets show that AceSearcher outperforms state-of-the-art baselines, achieving an average exact match improvement of 7.6%. Remarkably, on document-level finance reasoning tasks, AceSearcher-32B matches the performance of the giant DeepSeek-V3 model using less than 5% of its parameters. Even at smaller scales (1.5B and 8B), AceSearcher often surpasses existing search-augmented LLMs with up to 9× more parameters, highlighting its exceptional efficiency and effectiveness in tackling complex reasoning tasks.
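
As a rough illustration of the two-role idea described in the abstract, the following minimal Python sketch shows one model callable alternating between a decomposer prompt and a solver prompt at inference time. It is not the authors' implementation: the prompt tags, the function names (decompose, solve, answer), the sequential handling of subquestions, and the toy model and retriever are all assumptions made for illustration, and the supervised and reinforcement fine-tuning stages are not shown.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: a text-in/text-out model and a passage retriever.
LLM = Callable[[str], str]
Retriever = Callable[[str], List[str]]


def decompose(llm: LLM, question: str) -> List[str]:
    """Decomposer role: ask the model for one subquestion per line."""
    output = llm(f"[DECOMPOSE]\n{question}")
    return [line.strip() for line in output.splitlines() if line.strip()]


def solve(llm: LLM, question: str, contexts: List[str],
          notes: List[Tuple[str, str]]) -> str:
    """Solver role: answer using retrieved contexts and earlier sub-answers."""
    context_text = "\n".join(contexts)
    notes_text = "\n".join(f"{q} -> {a}" for q, a in notes)
    prompt = f"[SOLVE]\n{question}\nContext:\n{context_text}\nNotes:\n{notes_text}"
    return llm(prompt).strip()


def answer(llm: LLM, retrieve: Retriever, question: str) -> str:
    """The same model alternates roles: decompose, solve each part, then combine."""
    notes: List[Tuple[str, str]] = []
    for sub in decompose(llm, question):
        # Assumption: subquestions are answered in order, each seeing prior notes.
        notes.append((sub, solve(llm, sub, retrieve(sub), notes)))
    return solve(llm, question, retrieve(question), notes)


# Toy stand-ins so the sketch runs without a real model or search index.
def toy_llm(prompt: str) -> str:
    if prompt.startswith("[DECOMPOSE]"):
        return "sub-question A\nsub-question B\n"
    question = prompt.splitlines()[1]  # line after the [SOLVE] tag
    return f"answer({question})"


def toy_retrieve(query: str) -> List[str]:
    return [f"passage about {query}"]


if __name__ == "__main__":
    print(answer(toy_llm, toy_retrieve, "Q"))
```

In a real system the two toy functions would be replaced by a fine-tuned LLM and a search backend; the point here is only that a single model serves both roles through different prompts.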
