Zhenghai Xue

I am a 4th-year Ph.D. student at the College of Computing and Data Science, Nanyang Technological University, Singapore, supervised by Prof. Bo An. Previously, I obtained my B.Sc. in Artificial Intelligence from Nanjing University in 2022. During my undergraduate studies, I worked with Prof. Yang Yu at LAMDA. I also interned at the MMLab of the Chinese University of Hong Kong with Prof. Bolei Zhou, Kuaishou Technology with Dr. Qingpeng Cai, Kunlun 2050 Research with Prof. Shuicheng Yan, and TikTok with Dr. Qian Liu. I am now a research intern at Moonshot AI, working with Dr. Chenjun Xiao on instruction following in coding agents.

I study safe, robust, and generalizable decision-making algorithms and their applications to real-world problems, such as large language models, GUI navigation, video games, autonomous driving, robot locomotion, and recommendation systems.

News

Jul 4, 2025	We release SimpleTIR, an end-to-end solution for stable multi-turn tool-use RL training.
May 3, 2025	One paper is accepted at ICML 2025 as a Spotlight Poster!
Jan 23, 2025	One paper is accepted at WWW 2025. Two papers are accepted at ICLR 2025.
Nov 8, 2023	I will give a talk on Optimizing Long-term User Engagement at the Applied Artificial Intelligence Workshop of DAI 2023!
Oct 19, 2023	Our paper “State Regularized Policy Optimization on Data with Dynamics Shift” is accepted at NeurIPS 2023!

Selected publications

Policy Optimization under Imperfect Human Interactions with Agent-Gated Shared Autonomy

Zhenghai Xue, Bo An, and Shuicheng Yan

In The Thirteenth International Conference on Learning Representations, 2025

HTML
Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning

Zhenghai Xue, Lang Feng, Jiacheng Xu, Kang Kang, Xiang Wen, Bo An, and Shuicheng Yan

In Forty-Second International Conference on Machine Learning (Spotlight), 2025

arXiv
AURO: Reinforcement Learning for Adaptive User Retention Optimization in Recommender Systems

Zhenghai Xue, Qingpeng Cai, Tianyou Zuo, Bin Yang, Lantao Hu, Peng Jiang, Kun Gai, and Bo An

In Proceedings of the ACM Web Conference (Oral), 2025

arXiv
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Zhenghai Xue*, Longtao Zheng*, Qian Liu, Yingru Li, Xiaosen Zheng, Zejun Ma, and Bo An

In The Fourteenth International Conference on Learning Representations, 2026

arXiv HTML Code
Group-in-Group Policy Optimization for LLM Agent Training

Lang Feng, Zhenghai Xue, Tingcong Liu, and Bo An

Advances in Neural Information Processing Systems, 2025

arXiv Code
State Regularized Policy Optimization on Data with Dynamics Shift

Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, and Bo An

Advances in Neural Information Processing Systems, 2023

arXiv HTML Code
Guarded Policy Optimization with Imperfect Online Demonstrations

Zhenghai Xue, Zhenghao Peng, Quanyi Li, Zhihan Liu, and Bolei Zhou

In The Eleventh International Conference on Learning Representations (Spotlight), 2023

arXiv HTML Code
Regret Minimization Experience Replay in Off-Policy Reinforcement Learning

Xu-Hui Liu*, Zhenghai Xue*, Jingcheng Pang, Shengyi Jiang, Feng Xu, and Yang Yu

Advances in Neural Information Processing Systems, 2021

arXiv Code Video