Statistical Inference for Distributed Contextual Multi-armed Bandit
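Speaker: Yongchao Liu (刘永朝), Dalian University of Technology
Time: August 28, 2025, 9:30
Venue: Room S106, Laboratory Building, Haiyun Campus (海韵园实验楼S106)
Abstract: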
In this talk, we study online statistical inference for distributed contextual multi-armed bandit problems, in which agents collaboratively learn an optimal policy by exchanging local estimates of the global parameters with their neighbors over a communication network. We propose a distributed online decision-making algorithm that balances exploration and exploitation via the $\varepsilon$-greedy policy and updates the policy online using distributed stochastic gradient descent. We establish a pivotal limiting distribution for the estimator of the reward model parameter, viewed as a stochastic process, and then employ the random scaling method to construct an asymptotic confidence interval for that parameter. We also establish the asymptotic normality of the online inverse probability weighted value estimator and construct an asymptotic confidence interval for the value by the plug-in method. The proposed algorithm and theoretical results are evaluated in simulations and in a real-data application to a warfarin dosing problem.
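To make the ingredients named in the abstract concrete, the following is a minimal Python sketch, not the speaker's algorithm. It assumes a linear reward model with two arms, a ring network with lazy averaging weights, a step size of $0.5/t^{0.6}$, and an inverse probability weighted (IPW) estimate of the value of the greedy policy. The names `ring_weights`, `run_sketch`, and `true_theta` and all numerical constants are hypothetical choices for illustration; the confidence-interval construction by random scaling is not shown.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ring_weights(n):
    """Lazy ring mixing matrix (assumed here): symmetric, each row sums to 1."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] += 0.5
        W[i][(i + 1) % n] += 0.25
        W[i][(i - 1) % n] += 0.25
    return W

def run_sketch(n_agents=3, steps=2000, eps=0.1, seed=0):
    rng = random.Random(seed)
    # Hypothetical ground-truth reward parameters, one vector per arm.
    true_theta = [[0.0, 1.0], [0.0, -1.0]]
    n_arms, dim = len(true_theta), len(true_theta[0])
    W = ring_weights(n_agents)
    # theta[i][a] is agent i's local estimate of arm a's parameter.
    theta = [[[0.0] * dim for _ in range(n_arms)] for _ in range(n_agents)]
    ipw_sum = [0.0] * n_agents
    # Probability that epsilon-greedy ends up playing the greedy arm.
    p_greedy = 1.0 - eps + eps / n_arms
    for t in range(1, steps + 1):
        step = 0.5 / t ** 0.6  # assumed decaying step size
        for i in range(n_agents):
            x = [1.0, rng.uniform(-1.0, 1.0)]  # context with intercept
            scores = [dot(theta[i][a], x) for a in range(n_arms)]
            greedy = scores.index(max(scores))
            # Epsilon-greedy: explore uniformly with probability eps.
            arm = rng.randrange(n_arms) if rng.random() < eps else greedy
            reward = dot(true_theta[arm], x) + rng.gauss(0.0, 0.1)
            # Local SGD step on the squared loss of the chosen arm only.
            resid = scores[arm] - reward
            theta[i][arm] = [w - step * resid * xk
                             for w, xk in zip(theta[i][arm], x)]
            # IPW term: keep the reward only when the greedy arm was played.
            if arm == greedy:
                ipw_sum[i] += reward / p_greedy
        # Consensus step: each agent averages estimates with its neighbors.
        theta = [[[sum(W[i][j] * theta[j][a][k] for j in range(n_agents))
                   for k in range(dim)] for a in range(n_arms)]
                 for i in range(n_agents)]
    values = [s / steps for s in ipw_sum]
    return theta, values

if __name__ == "__main__":
    theta, values = run_sketch()
    print("agent 0 estimates:", theta[0])
    print("IPW value estimates:", values)
```

In this sketch each agent updates locally first and then averages with its neighbors; the opposite order is an equally plausible design, and the abstract does not say which one the talk uses.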
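Biography:
Yongchao Liu is a professor and doctoral supervisor at the School of Mathematical Sciences, Dalian University of Technology. He received his bachelor's and master's degrees from the Department of Mathematics at Dalian Maritime University in 2005 and 2008, respectively, and his Ph.D. from the School of Mathematical Sciences at Dalian University of Technology in 2011. From November 2014 to April 2016 he was a postdoctoral researcher at the University of Southampton. His main research area is stochastic optimization. He has published more than thirty papers, some of which appeared in Mathematical Programming, SIAM Journal on Optimization, Mathematics of Operations Research, and SIAM Journal on Numerical Analysis.
Contact: Wen Huang (黄文)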
