Welcome to Hanning Zhang’s Personal Website

I am a second-year MSCS (thesis-track) student at the University of Illinois Urbana-Champaign (UIUC), advised by Professor Tong Zhang. I graduated from The Hong Kong University of Science and Technology (HKUST) in 2024 with a major in Computer Science. I also had the privilege of working as a research intern with Professor Heng Ji on LLM hallucination and alignment.

This webpage was last updated on 2025/12/01.

Research Interests

My research interests include Natural Language Processing (NLP) and Large Language Models (LLMs) with a focus on alignment and reasoning.

On the reasoning side, I have worked on Process Reward Models[1][2][3], Zero-style DPO Training for Reasoning[4], Self-Rewarding for Reasoning[5], and Dynamic Sample Allocation Strategy in RL for Reasoning[6].

On the alignment side, I have worked on Hallucination Mitigation via Refusal-Aware Tuning[7], Model Adaptive Merging and Ensembling for Generalization[8][9], Data Reweighting for LLM Training[10], and Reward Modeling for Open-ended Long-context Generation[11].

Recently, I have been working on Lean 4 for Physics.

Open-Source Contribution

RLHF-Reward-Modeling https://github.com/RLHFlow/RLHF-Reward-Modeling 1.5K Stars

I am the main contributor to the math-rm project, where we train process-supervised reward models (PRMs) and outcome-supervised reward models (ORMs) using next-token prediction. We open-source the data, code, hyperparameters, and models to provide a robust recipe that is easy to reproduce. This is the first open-source recipe for (generative) process reward models.

Selected Research Papers (* denotes equal contribution)

ScaleML-Prover: Advancing Automatic Theorem Proving for Physics
Hanning Zhang*, Ruida Wang*, Rui Pan, Wenyuan Wang, BingXu Meng, Tong Zhang
Under Review at ACL 2026
OpenGenAlign: A Preference Dataset for Trustworthy Reward Modeling in Open-Ended, Long-Context Generation
Hanning Zhang, Juntong Song, Juno Zhu, Yuanhao Wu, Tong Zhang, Cheng Niu
Under Review at ACL 2026
R-Tuning: Teaching Large Language Models to Refuse Unknown Questions
Hanning Zhang*, Shizhe Diao*, Yong Lin*, Yi R. Fung, Qing Lian, Xingyao Wang, Yangyi Chen, Heng Ji, Tong Zhang.
NAACL-2024 (Oral)
Outstanding Paper Award, 6/2434 = 0.25%
Entropy-Regularized Process Reward Model
Hanning Zhang*, Pengcheng Wang*, Shizhe Diao, Yong Lin, Rui Pan, Hanze Dong, Dylan Zhang, Pavlo Molchanov, Tong Zhang.
Transactions on Machine Learning Research (TMLR)
Self-rewarding Correction for Mathematical Reasoning
Wei Xiong*, Hanning Zhang*, Chenlu Ye*, Lichang Chen, Nan Jiang, Tong Zhang.
Under Review
Online-DPO-R1: Unlocking Effective Reasoning Without the PPO Overhead
Hanning Zhang, Jiarui Yao, Chenlu Ye, Wei Xiong, Tong Zhang.
Notion Blog
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao*, Yifan Hao*, Hanning Zhang, Hanze Dong, Wei Xiong, Nan Jiang, Tong Zhang.
NeurIPS-2025
ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Rui Pan*, Dylan Zhang*, Hanning Zhang*, Xingyuan Pan*, Minrui Xu, Jipeng Zhang, Renjie Pi, Xiaoyu Wang, Tong Zhang.
ACL-2025
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
Yifan Hao*, Xingyuan Pan*, Hanning Zhang*, Chenlu Ye, Rui Pan, Tong Zhang.
ICML-2025
Mitigating the Alignment Tax of RLHF
Yong Lin*, Hangyu Lin*, Wei Xiong*, Shizhe Diao*, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan Yao, and Tong Zhang
EMNLP-2024 (Main)
Towards Better Generalization via Distributional Input Projection Network
Yifan Hao*, Yanxin Lu*, Hanning Zhang, Xinwei Shen, Tong Zhang.
Under Review at ICLR 2026

Internship

Amazon, Rufus Team (May 2025 - Aug 2025)
Applied Scientist Intern, Palo Alto, CA

Awards

Outstanding Paper Award at NAACL 2024 (6/2434 = 0.25%)

Siebel Scholar, Class of 2026 ($35,000 scholarship, one of 76 students worldwide)

Education

University of Illinois Urbana-Champaign (2024-2026)
Master of Science in Computer Science
The Hong Kong University of Science and Technology (2020-2024)
Bachelor of Science in Computer Science
University of Illinois Urbana-Champaign (2023)
Exchange Program in Computer Science