High-Dimensional Inference for Weak-Supervision with Feature-Dependent Label Noise


Speaker: Wang Zhaojun (Nankai University)
Time: 2024-05-16 16:00
Venue: Lecture Hall 105, Haiyun Campus Laboratory Building

Abstract:

This paper is concerned with a typical form of weak supervision: the label noise problem. A common setting for classification with label noise assumes that the noise level is known and independent of the features. We instead consider the setting where, in addition to a large dataset with noisy labels, a validation dataset with correct labels is available at learning time. We argue that classification with possibly feature-dependent noise in this weakly supervised setting can be handled naturally by a general logistic regression model. Rate-optimal estimators are obtained by maximizing a penalized joint likelihood function. A sample-splitting-based method is further proposed for constructing confidence intervals for individual components of the regression vector, which makes it possible to identify label-noise-related features with error rate control. The advantages of the method are demonstrated through asymptotic properties as well as numerical experiments. A real-data example is also presented to illustrate how to use the proposed method in practice.
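To make the idea of a penalized joint likelihood concrete, here is a minimal sketch in plain Python. It is not the speaker's estimator: the working model, the function names (`stack_design`, `fit_penalized_joint`), the L1 penalty, and the proximal-gradient solver are all assumptions chosen for illustration. The sketch stacks the clean validation sample and the noisy sample into one logistic regression in which a second coefficient block, `delta`, acts only on the noisy rows, so nonzero entries of `delta` flag features whose effect differs between clean and noisy labels. The sample-splitting confidence intervals mentioned in the abstract are not covered.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def stack_design(x_clean, x_noisy):
    # Hypothetical working model: clean rows become (x, 0) and noisy rows
    # become (x, x), so the second coefficient block (delta) only enters
    # the likelihood of the noisy-label sample.
    p = len(x_clean[0])
    rows = [list(x) + [0.0] * p for x in x_clean]
    rows += [list(x) + list(x) for x in x_noisy]
    return rows


def fit_penalized_joint(rows, labels, lam, step=0.5, n_iter=300):
    # Proximal gradient (ISTA) on the average negative logistic
    # log-likelihood of the stacked sample plus lam * ||theta||_1.
    n, d = len(rows), len(rows[0])
    theta = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        for row, y in zip(rows, labels):
            resid = sigmoid(sum(r * t for r, t in zip(row, theta))) - y
            for k in range(d):
                grad[k] += resid * row[k] / n
        theta = [soft_threshold(t - step * g, step * lam)
                 for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    # Simulated data (assumption): the flip probability depends on feature 0.
    rng = random.Random(0)
    beta_true = [1.5, -1.0, 0.0]

    def draw(n, noisy):
        xs, ys = [], []
        for _ in range(n):
            x = [rng.gauss(0.0, 1.0) for _ in beta_true]
            prob = sigmoid(sum(b * v for b, v in zip(beta_true, x)))
            y = 1 if rng.random() < prob else 0
            flip = 0.4 if x[0] > 0 else 0.05
            if noisy and rng.random() < flip:
                y = 1 - y
            xs.append(x)
            ys.append(y)
        return xs, ys

    x_clean, y_clean = draw(60, noisy=False)
    x_noisy, y_noisy = draw(240, noisy=True)
    theta = fit_penalized_joint(stack_design(x_clean, x_noisy),
                                y_clean + y_noisy, lam=0.02)
    p = len(beta_true)
    print("beta :", [round(t, 3) for t in theta[:p]])
    print("delta:", [round(t, 3) for t in theta[p:]])
```

The fixed step size of 0.5 is a conservative choice for roughly standardized features; a line search or a dedicated solver would be the more robust option on real data.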

Speaker bio:

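Wang Zhaojun is Executive Dean and Professor of the School of Statistics and Data Science at Nankai University, a member of the Statistics Discipline Evaluation Group of the Academic Degrees Committee of the State Council, and a member of the National Statistics Textbook Editorial Review Committee. He is Vice President of the China Society for Industrial and Applied Mathematics, Vice President of the China Statistical Education Society, Vice President of the China Industrial Statistics Teaching Research Society, and Vice President of the Chinese Society of Probability and Statistics. His main research interests are statistical quality control, change-point analysis, and statistical inference for high-dimensional data. He has published more than 70 SCI-indexed papers, including in leading international statistics journals such as Annals of Statistics, JASA, Biometrika, and Technometrics. He previously served as a member of the National Statistics Expert Advisory Committee, Vice President of the Chinese Association for Applied Statistics, President of the Tianjin Association for Applied Statistics, and President of the Tianjin Society for Industrial and Applied Mathematics. His honors include the State Council Special Government Allowance, recognition as advisor of a National Top 100 Excellent Doctoral Dissertation, the Second Prize in Natural Science from the Ministry of Education, and the First Prize in Natural Science of Tianjin.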

 

Contact: Zhou Da