DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Huajian Xin, Z. Ren, Jun-Mei Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu (Benjamin Liu), Liyue Zhang, Xuan Lu, Qiushi Du, W. Gao, Qihao Zhu, Dejian Yang, Zhibin Gou, Z. F. Wu, Fuli Luo, C. Ruan
ICLR 2025
We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning on an enhanced formal theorem proving dataset derived from DeepSeek-Prover-V1. Further refinement is achieved through reinforcement learning from proof assistant feedback (RLPAF). Beyond the single-pass whole-proof generation approach of DeepSeek-Prover-V1, we propose RMaxTS, a variant of Monte-Carlo tree search that employs an intrinsic-reward-driven exploration strategy to generate diverse proof paths. DeepSeek-Prover-V1.5 demonstrates significant improvements over DeepSeek-Prover-V1, achieving new state-of-the-art results on the test set of the high-school-level miniF2F benchmark ($63.5\%$) and the undergraduate-level ProofNet benchmark ($25.3\%$).
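To make the intrinsic-reward-driven exploration idea behind RMaxTS concrete, the following is a minimal, hypothetical Python sketch of a tree search that rewards the discovery of previously unseen tactic states, in the spirit described in the abstract. All names (`Node`, `rmax_search`, `expand_fn`), the UCB1 selection rule, and the toy expansion function are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: tree search with an intrinsic (RMax-style) exploration
# bonus. The search earns reward 1 whenever an expansion reaches a tactic
# state it has not seen before, which pushes it toward diverse proof paths.
import math
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    state: str                      # serialized tactic state (illustrative)
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0              # accumulated intrinsic reward

def ucb1(node: Node, c: float = 1.4) -> float:
    # Standard UCB1 score; unvisited children are preferred unconditionally.
    if node.visits == 0:
        return float("inf")
    exploit = node.value / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore

def rmax_search(root_state: str, expand_fn, iterations: int = 100) -> Node:
    """expand_fn(state) -> list of successor tactic states. In practice this
    would wrap proof generation plus Lean 4 feedback; here it is abstract."""
    root = Node(root_state)
    seen_states = {root_state}          # tactic states discovered so far
    for _ in range(iterations):
        # 1. Selection: descend by UCB1 until a leaf is reached.
        node = root
        while node.children:
            node = max(node.children, key=ucb1)
        # 2. Expansion: query successors of the leaf's tactic state.
        for succ in expand_fn(node.state):
            node.children.append(Node(succ, parent=node))
        # 3. Intrinsic reward: 1 if this expansion uncovered any new state.
        new_states = {c.state for c in node.children} - seen_states
        reward = 1.0 if new_states else 0.0
        seen_states |= new_states
        # 4. Backpropagation of the intrinsic reward along the path.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return root

if __name__ == "__main__":
    # Toy stand-in for whole-proof generation and proof assistant feedback.
    def toy_expand(state: str):
        return [f"{state}.{random.randint(0, 2)}"] if len(state) < 12 else []
    tree = rmax_search("root", toy_expand, iterations=50)
    print("children of root:", len(tree.children))
```

The key design choice illustrated here is that the backed-up reward is intrinsic (novelty of tactic states) rather than a terminal proof-success signal, so the search keeps branching into unexplored regions of the proof space even before any complete proof is found.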