Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu, Fan Zhou, Yutao Xie, Feng Yao, Yuexin Bian, Nilabjo Dey, Yonghao Zhuang, Yuheng Zha, Yi Gu, Kun Zhou, Yuqi Wang, Yuan Li, Richard Fan, Jianshu She, Chengqian Gao, Abulhair Saparov, Taylor W. Killian, Haonan Li, Mikhail Yurochkin, Eric P Xing, Zhengzhong Liu, Zhiting Hu

Advances in Neural Information Processing Systems 38 (NeurIPS 2025) Datasets and Benchmarks Track

Abstract

Reinforcement learning (RL) has shown promise in enhancing large language model (LLM) reasoning, yet progress towards broader capabilities is limited by the availability of high-quality, multi-domain datasets. This work introduces Guru, a 92K-example RL-for-reasoning dataset designed to address this gap. It covers six reasoning domains (Math, Code, Science, Logic, Simulation, and Tabular), each with a corresponding verifier. We build Guru via a careful data-curation pipeline that includes sourcing, deduplication, reward design, and domain-specific and difficulty-based filtering, to facilitate the systematic investigation of cross-domain RL generalization. Our study using Guru suggests the efficacy of a simple mixed-domain RL training approach and reveals several key aspects affecting cross-domain transferability. We further train two models, Guru-7B and Guru-32B, purely with RL on our curated data and observe substantially improved performance over leading open RL reasoning model baselines, with gains of 7.3% and 7.8%, respectively, on an extensive 17-task, six-domain evaluation suite. We are releasing our dataset, code, and evaluation suite to the community to support further research and development of more general RL-enhanced reasoning models.
