Welcome to Hanning Zhang’s Personal Website
I am a second-year MSCS (thesis-track) student at the University of Illinois Urbana-Champaign (UIUC), advised by Professor Tong Zhang. Previously, I graduated from The Hong Kong University of Science and Technology (HKUST) in 2024, majoring in Computer Science. I also had the privilege of working as a research intern with Professor Heng Ji on LLM hallucination and alignment.
This webpage was last updated on 2025/12/01.
Research Interests
My research interests include Natural Language Processing (NLP) and Large Language Models (LLMs) with a focus on alignment and reasoning.
On reasoning, I have worked on Process Reward Models[1][2][3], Zero-style DPO Training for Reasoning[4], Self-Rewarding for Reasoning[5], and Dynamic Sample Allocation Strategy in RL for Reasoning[6].
On alignment, I have worked on Hallucination Mitigation via Refusal-Aware Tuning[7], Model Adaptive Merging and Ensembling for Generalization[8][9], Data Reweighting for LLM Training[10], and Reward Modeling for Open-ended Long-context Generation[11].
Recently, I have been working on Lean4 for Physics.
Open-Source Contribution
RLHF-Reward-Modeling
https://github.com/RLHFlow/RLHF-Reward-Modeling 1.5K Stars
I am the main contributor to the math-rm project, where we train process-supervised reward models (PRMs) and outcome-supervised reward models (ORMs) using next-token prediction. We open-source the data, code, hyperparameters, and models as a robust recipe that is easy to reproduce. This is the first open-source recipe for (generative) process reward models.
Selected Research Papers (* denotes equal contribution)
ScaleML-Prover: Advancing Automatic Theorem Proving for Physics
Hanning Zhang*, Ruida Wang*, Rui Pan*, Ke Lin, Wenyuan Wang, Qingyun Wang, Tong Zhang
Under Review at ACL 2026
OpenGenAlign: A Preference Dataset for Trustworthy Reward Modeling in Open-Ended, Long-Context Generation
Hanning Zhang, Juntong Song, Juno Zhu, Yuanhao Wu, Tong Zhang, Cheng Niu
Under Review at ACL 2026
R-Tuning: Teaching Large Language Models to Refuse Unknown Questions
Hanning Zhang*, Shizhe Diao*, Yong Lin*, Yi R. Fung, Qing Lian, Xingyao Wang, Yangyi Chen, Heng Ji, Tong Zhang.
NAACL-2024 (Oral)
Outstanding Paper Award, 6/2434 = 0.25%
Entropy-Regularized Process Reward Model
Hanning Zhang*, Pengcheng Wang*, Shizhe Diao, Yong Lin, Rui Pan, Hanze Dong, Dylan Zhang, Pavlo Molchanov, Tong Zhang.
Transactions on Machine Learning Research (TMLR)
Self-rewarding Correction for Mathematical Reasoning
Wei Xiong*, Hanning Zhang*, Chenlu Ye*, Lichang Chen, Nan Jiang, Tong Zhang.
Under Review
Online-DPO-R1: Unlocking Effective Reasoning Without the PPO Overhead
Hanning Zhang, Jiarui Yao, Chenlu Ye, Wei Xiong, Tong Zhang.
Notion Blog
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao*, Yifan Hao*, Hanning Zhang, Hanze Dong, Wei Xiong, Nan Jiang, Tong Zhang.
NeurIPS-2025
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Rui Pan*, Dylan Zhang*, Hanning Zhang*, Xingyuan Pan*, Minrui Xu, Jipeng Zhang, Renjie Pi, Xiaoyu Wang, Tong Zhang.
ACL-2025
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
Yifan Hao*, Xingyuan Pan*, Hanning Zhang*, Chenlu Ye, Rui Pan, Tong Zhang.
ICML-2025
Mitigating the Alignment Tax of RLHF
Yong Lin*, Hangyu Lin*, Wei Xiong*, Shizhe Diao*, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan Yao, and Tong Zhang
EMNLP-2024 (Main)
Towards Better Generalization via Distributional Input Projection Network
Yifan Hao*, Yanxin Lu*, Hanning Zhang, Xinwei Shen, Tong Zhang.
Under Review at ICLR 2026
Internship
Applied Scientist Intern, Palo Alto, CA
Awards
Outstanding Paper Award at NAACL 2024 (6/2434 = 0.25%)
Siebel Scholar, Class of 2026 ($35,000 scholarship, 1 of 76 students around the world)
Education
University of Illinois Urbana-Champaign (2024-2026)
Master of Science in Computer Science
The Hong Kong University of Science and Technology (2020-2024)
Bachelor of Science in Computer Science
University of Illinois Urbana-Champaign (2023)
Exchange Program in Computer Science
