南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 12 Stochastic Bandits - MAB, UCB, linear bandits, self-normalized concentration, generalized linear bandits

NJUAT 南京大学 人工智能学院 SCHOOL OF ARTFICIAL INTELUGENCE,NANJING UNFVERSITY Lecture 12.stochastic bandits Advanced Optimization(Fall 2023) Peng Zhao zhaop@lamda.nju.edu.cn Nanjing University
Lecture 12. Stochastic Bandits Peng Zhao zhaop@lamda.nju.edu.cn Nanjing University Advanced Optimization (Fall 2023)

Outline ·Multi-Armed Bandits Explore-Then-Exploit Upper Confidence Bound ·Linear Bandits ·LinUCB Algorithm Generalized Linear Bandits ·Advanced Topics Advanced Optimization(Fall 2023) Lecture 12.Stochastic Bandits 2
Advanced Optimization (Fall 2023) Lecture 12. Stochastic Bandits 2 Outline • Multi-Armed Bandits • Explore-Then-Exploit • Upper Confidence Bound • Linear Bandits • LinUCB Algorithm • Generalized Linear Bandits • Advanced Topics

Stochastic Multi-Armed Bandit (MAB) MAB:A player is facing K arms.At each time t,the player pulls one arm a∈[K]and then receives a reward rt(a)∈[O,l: Arm1 1(1) r2(1)】 0.6 r4(1) T5(1) Arm2 1 r2(2) r3(2) 0.2 r5(2) Arm3 r1(3) 0.7 r3(3) r4(3) 0.3 ●Stochastic: Each arm aEK]has an unknown distribution Da with mean u(a), such that rewards ri(a),r2(a),...,rT(a)are i.i.d samples from Da. Advanced Optimization(Fall 2023) Lecture 12.Stochastic Bandits 3
Advanced Optimization (Fall 2023) Lecture 12. Stochastic Bandits 3 Stochastic Multi-Armed Bandit (MAB) Arm 1 Arm 2 Arm 3 0.6 0.7 0.3 1 0.2

Stochastic MAB:Formulation At each round t=l,2,·… (1)the player first chooses an arm at E[K]; (2)and then environment reveals a reward rt(at)e[0,1]; (3)the player updates the model by the pair (at,rt(at)). ·The goal is to minimize the(pseudo小-regret: E[Regretr a)-rdan)) where a*=arg maxeK](a)is the best arm in the sense of expectation. Advanced Optimization(Fall 2023) Lecture 12.Stochastic Bandits 4
Advanced Optimization (Fall 2023) Lecture 12. Stochastic Bandits 4 Stochastic MAB: Formulation • The goal is to minimize the (pseudo)-regret:

Deploying Exp3 to Stochastic MAB Stochastic MAB is a special case of Adversarial MAB Directly deploying Exp3 for stochastic MAB achieves Theorem1.Suppose that∀t∈[T]andi∈[K],0≤Lt()≤l,then Exp3with learning raten=(In K)/(TK)guarantees EfRegretr=E) where the expectation is taken on the randomness of the algorithm. Not yet exploit benign stochastic assumption....instance-dependent analysis Advanced Optimization(Fall 2023) Lecture 12.Stochastic Bandits 5
Advanced Optimization (Fall 2023) Lecture 12. Stochastic Bandits 5 • Stochastic MAB is a special case of Adversarial MAB Deploying Exp3 to Stochastic MAB Directly deploying Exp3 for stochastic MAB achieves Not yet exploit benign stochastic assumption…. instance-dependent analysis

Regret Decomposition For stochastic MAB,a natural characterization of the arms: (i)Suboptimality gap:Aa=u(a*)-u(a); (ii)Number of times arm a is pulled int rounds:n(a)=>1=a). Regret can be reformulated as no-立=oi-a T E[Regretr)max E =】 ∑(u(a*)-μ(at)nr(a)=∑△anr(a aE[K] a∈[K] Advanced Optimization(Fall 2023) Lecture 12.Stochastic Bandits 6
Advanced Optimization (Fall 2023) Lecture 12. Stochastic Bandits 6 Regret Decomposition • For stochastic MAB, a natural characterization of the arms: • Regret can be reformulated as

A Natural Solution Explore-then-Exploit(ETE): (1)Do explore for the first To round by pulling each arm for To/K times; (2)Do exploit for the rest T-To round by always pulling a=arg maxael]r(a). Theorem1.Suppose that∀t∈[T]anda∈[K],0≤rt(a)≤l,then ETE with exploration period To guarantees eets(段+Tep()a a∈[K1 Advanced Optimization(Fall 2023) Lecture 12.Stochastic Bandits 7
Advanced Optimization (Fall 2023) Lecture 12. Stochastic Bandits 7 A Natural Solution • Explore-then-Exploit (ETE):

Proof of ETE Regret Bound Proof. E[Regretr]=∑△anr(a) aE[K] Exploration Exploitation nT(a)=To/K+(T-To)Pra=a} ≤T/K+(T-To)Pr{m(a)≥(a*)} orm.(a)≤u@ta≤m.(a) 2 snK+r-xwr{ao≥@专aua.o)≤@2"a} 2 snK+T-万(r{.o之uoa}+r{≤专a》 Union bound Pr{XUY}<Pr{X}+Pr{Y} Advanced Optimization(Fall 2023) Lecture 12.Stochastic Bandits 8
Advanced Optimization (Fall 2023) Lecture 12. Stochastic Bandits 8 Proof of ETE Regret Bound Proof.Exploration Exploitation

Proof of ETE Regret Bound Pof间s五/x+T-(r{ma之ota}+pr{aa)sata}》 Hoeffding's inequality.forX&∈[0,l,i∈m,X-a∑1X,we have Pr{x-E[X]≥e}≤exp(-2me2): Pr{X-E[X]≤-c}≤exp(-2me2). →n{a≥o2a}-nmo≥n0+As 2T△8 K △a=4(a*)-(a) 今g4@s(爱+7m(x)》 a∈[K Advanced Optimization(Fall 2023) Lecture 12.Stochastic Bandits 9
Advanced Optimization (Fall 2023) Lecture 12. Stochastic Bandits 9 Proof of ETE Regret Bound Proof

Issue of ETE Theorem1.Suppose that∀t∈[T]anda∈[K],0≤rt(a)≤l,then ETE with explore period To guarantees R≤(假+Tem()a a∈[K ·Need to tune To Tune To with prior of suboptimality gap A:ERegret]=(T) Tune To without prior of suboptimality gap A:E[Regret=O(T2/3) >Solution:do explore and exploit adaptively Advanced Optimization(Fall 2023) Lecture 12.Stochastic Bandits 10
Advanced Optimization (Fall 2023) Lecture 12. Stochastic Bandits 10 Issue of ETE • Need to tune Solution: do explore and exploit adaptively
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 11 Adversarial Bandits - MAB, IW estimator, Exp3, lower bound, BCO, gradient estimator, self-concordant barrier.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 10 Online Learning in Games - two-player zero-sum games, repeated play, minimax theorem, fast convergence.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 09 Optimistic Online Mirror Descent - optimistic online learning, predictable sequence, small-loss bound, gradient-variance bound, gradient-variation bound.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 08 Adaptive Online Convex Optimization - problem-dependent guarantee, small-loss bound, self-confident tuning, small-loss OCO, self-bounding property bound.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 07 Online Mirror Descent - OMD framework, regret analysis, primal-dual view, mirror map, FTRL, dual averaging.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 06 Prediction with Expert Advice - Hedge, minimax bound, lower bound; mirror descent(motivation and preliminary).pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 05 Online Convex Optimization - OGD, convex functions, strongly convex functions, online Newton step, exp-concave functions.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 04 GD Methods II - GD method, smooth optimization, Nesterov’s AGD, composite optimization.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 03 GD Methods I - GD method, Lipschitz optimization.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 02 Convex Optimization Basics; Function Properties.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 01 Introduction; Mathematical Background.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)11 图像特征分析.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)10 图像分割.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)09 形态学及其应用.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)08 压缩编码.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)07 频域滤波器.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)06 图像频域变换.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)05 代数运算与几何变换.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)04 图像复原及锐化.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)03 灰度直方图与点运算.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 13 Advanced Topics - non-stationary online learning, universal online learning, online ensemble, base algorithm, meta algorithm.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)课程简介 Combinatorics Introduction(主讲:尹一通).pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)基本计数 Basic enumeration.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)生成函数 Generating functions.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)筛法 Sieve methods.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)Cayley公式 Cayley's formula.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)Pólya计数法 Pólya's theory of counting.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)Ramsey理论 Ramsey theory.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)存在性问题 Existence problems.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)极值图论 Extremal graph theory.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)极值集合论 Extremal set theory.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)概率法 The probabilistic method.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)匹配论 Matching theory.pdf
- 南京大学:《高级机器学习》课程教学资源(课件讲稿)01 基础(主讲:詹德川).pdf
- 南京大学:《高级机器学习》课程教学资源(课件讲稿)02 典型方法.pdf
- 《互联网营销理论与工具运用》课程教学资源(教案)项目一 走进互联网营销.pdf
- 《互联网营销理论与工具运用》课程教学资源(教案)项目二 互联网营销市场调研.pdf
- 《互联网营销理论与工具运用》课程教学资源(教案)项目三 互联网搜索引擎营销.pdf
- 《互联网营销理论与工具运用》课程教学资源(教案)项目四 互联网电子邮件营销.pdf
- 《互联网营销理论与工具运用》课程教学资源(教案)项目五 互联网社交媒体营销.pdf