Heavy-tailed Information-Theoretic Generalization Bounds with Applications to LLM Safety Alignment
Speaker: 张慧铭 (北京航空航天大学 / Beihang University)
Time: January 11, 2026, 16:30
Venue: 海韵园实验楼 S106
Abstract:
Classical information-theoretic generalization bounds, which link generalization error to the mutual information between an algorithm's input and output, typically rely on sub-Gaussian assumptions or finite moment generating functions (MGFs). These assumptions, however, are often violated in heavy-tailed scenarios such as adversarial training, reinforcement learning with rare high-reward events, and financial modeling. In this work, we bridge this gap by establishing a comprehensive framework for generalization under heavy-tailed sub-Weibull regimes. We show that standard KL-divergence bounds become vacuous in these settings because of the unboundedness of extreme events. To overcome this, we introduce a novel decorrelation lemma based on Rényi divergence and a generalized Young-type inequality, which circumvents the need for MGFs. Combining these tools with a refined chaining technique on the space of measures, we derive Dudley-type generalization bounds that depend explicitly on the tail parameter and the Rényi information. We additionally establish new maximal inequalities and information-theoretic generalization bounds under sub-Weibull conditions on the loss. The talk also explores applications of these results to large language models (LLMs), providing tail-adaptive reward guarantees for Reinforcement Learning from Human Feedback (RLHF) in LLM alignment, mitigating catastrophic Goodhart effects where KL regularization fails.
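For readers new to the setting, the following is a brief background sketch using standard definitions; it is illustrative context, not material quoted from the talk. A random variable $X$ is sub-Weibull with tail parameter $\theta > 0$ if, for some scale constant $K > 0$,

```latex
\[
  \Pr\bigl(|X| \ge t\bigr) \;\le\; 2\exp\!\bigl(-(t/K)^{1/\theta}\bigr),
  \qquad t \ge 0.
\]
% \theta = 1/2 recovers the sub-Gaussian case and \theta = 1 the
% sub-exponential case; for \theta > 1 the tails are heavier than
% exponential, so the MGF  E[e^{\lambda X}] = \infty  for every
% \lambda > 0 and MGF-based (Chernoff / Donsker--Varadhan) arguments
% break down.  The R\'enyi divergence of order \alpha > 1,
\[
  D_\alpha(P \,\|\, Q)
  \;=\; \frac{1}{\alpha - 1}
        \log \mathbb{E}_Q\!\left[\Bigl(\tfrac{dP}{dQ}\Bigr)^{\alpha}\right],
\]
% which recovers the KL divergence as \alpha \downarrow 1, is the
% information measure that replaces mutual information in the
% MGF-free bounds described in the abstract.
```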
Speaker Bio:
张慧铭 is an Associate Professor (pre-tenure) and master's supervisor at the Institute of Artificial Intelligence, Beihang University, and an adjunct doctoral supervisor at the School of Mathematical Sciences, Beihang University. He was a Macao Fellow (濠江学者) postdoctoral researcher at the University of Macau (2020-2022) and received his Ph.D. in Statistics from Peking University (2016-2020). His research interests include robust machine learning, statistical theory for AI (generalization error, non-asymptotic/small-sample theory), high-dimensional probability and statistics, functional data, subsampling estimation, and Lévy processes. He has published 30 SCI papers, including in top AI and automation journals (JMLR, IEEE-TAC), top statistics journals (JASA, Biometrika), the leading actuarial journal IME, and the Nature portfolio journal Scientific Reports, with over 900 Google Scholar citations. He has served as a reviewer for Mathematical Reviews and as a referee for leading journals in probability, statistics, AI, and machine learning (AOS, AOAP, JASA, JMLR, IEEE-TSP).
Contact: 陈俊彤
