Nanjing University: Advanced Optimization, Course Teaching Resources (Lecture Slides), Lecture 11 Adversarial Bandits: MAB, IW estimator, Exp3, lower bound, BCO, gradient estimator, self-concordant barrier

NJUAT, School of Artificial Intelligence, Nanjing University
Lecture 11. Adversarial Bandits
Advanced Optimization (Fall 2023)
Peng Zhao
zhaop@lamda.nju.edu.cn
Nanjing University

Outline
• Problem Setup
• Multi-Armed Bandits
• Bandit Convex Optimization
• Advanced Topics

Online Convex Optimization
At each round $t = 1, 2, \ldots$
(1) the player first picks a model $\mathbf{x}_t$ from a convex set $\mathcal{X} \subseteq \mathbb{R}^d$;
(2) and environments pick an online convex function $f_t: \mathcal{X} \to \mathbb{R}$;
(3) the player suffers loss $f_t(\mathbf{x}_t)$, observes some information about $f_t$ and updates the model.

(columns: Problem / Domain / Online Functions)
- General OCO: convex set $\mathcal{X} \subseteq \mathbb{R}^d$; convex function $f_t(\cdot)$
- OCO with Strongly Convex Functions: convex set $\mathcal{X} \subseteq \mathbb{R}^d$; $\nabla^2 f_t(\mathbf{x}) \succeq \alpha I$
- OCO with Exp-concave Functions: convex set $\mathcal{X} \subseteq \mathbb{R}^d$; $\nabla^2 f_t(\mathbf{x}) \succeq \beta \nabla f_t(\mathbf{x}) \nabla f_t(\mathbf{x})^\top$
- Prediction with Experts' Advice: $\Delta_N = \{\mathbf{x} \in \mathbb{R}_+^N : \sum_{i=1}^N x_i = 1\}$; $f_t(\mathbf{x}) = \langle \boldsymbol{\ell}_t, \mathbf{x} \rangle$
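To make the protocol concrete, here is a minimal, self-contained Python sketch of the interaction loop above, instantiated with online gradient descent on toy quadratic losses; the loss family, step size, and all names are illustrative choices rather than anything prescribed by the slides.

```python
# Illustrative sketch of the OCO protocol (player vs. environment), not the slides' code.
import numpy as np

def oco_protocol(T=100, dim=3, eta=0.1, rng=np.random.default_rng(0)):
    x = np.zeros(dim)                                 # (1) the player's model x_t
    total_loss = 0.0
    for t in range(T):
        z_t = rng.normal(size=dim)                    # (2) environment picks f_t(x) = 0.5 * ||x - z_t||^2
        total_loss += 0.5 * np.sum((x - z_t) ** 2)    # (3) player suffers f_t(x_t) ...
        grad = x - z_t                                # ... observes first-order feedback ...
        x = x - eta * grad                            # ... and updates the model
    return total_loss
```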

OCO Algorithms learned so far
• Given first-order information oracle: worst-case bound

Online Mirror Descent
$$\mathbf{x}_{t+1} = \arg\min_{\mathbf{x}\in\mathcal{X}} \ \eta_t \langle \mathbf{x}, \nabla f_t(\mathbf{x}_t) \rangle + \mathcal{D}_\psi(\mathbf{x}, \mathbf{x}_t),$$
where $\mathcal{D}_\psi(\mathbf{x},\mathbf{y}) = \psi(\mathbf{x}) - \psi(\mathbf{y}) - \langle \nabla\psi(\mathbf{y}), \mathbf{x}-\mathbf{y}\rangle$ is the Bregman divergence.

Regret bound:
$$\sum_{t=1}^{T} f_t(\mathbf{x}_t) - \sum_{t=1}^{T} f_t(\mathbf{u}) \le \sum_{t=1}^{T} \eta_t \|\nabla f_t(\mathbf{x}_t)\|_*^2 + \sum_{t=1}^{T} \frac{1}{\eta_t}\Big(\mathcal{D}_\psi(\mathbf{u},\mathbf{x}_t) - \mathcal{D}_\psi(\mathbf{u},\mathbf{x}_{t+1})\Big)$$
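As a minimal sketch of how the OMD update above specializes, the following Python snippet implements the proximal step for two standard regularizers; the $\ell_2$-ball domain in the Euclidean case and all names are illustrative assumptions.

```python
# Illustrative OMD proximal steps for two standard mirror maps.
import numpy as np

def omd_step_euclidean(x, grad, eta, radius=1.0):
    """psi(x) = 0.5 * ||x||_2^2: gradient step followed by Euclidean projection
    (here onto an l2 ball of the given radius, as an assumed domain)."""
    y = x - eta * grad
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

def omd_step_entropy(x, grad, eta):
    """psi(x) = sum_i x_i log x_i on the simplex: the proximal step becomes
    a multiplicative-weights (Hedge-style) update."""
    w = x * np.exp(-eta * grad)
    return w / w.sum()
```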

OCO Algorithms learned so far
• Given first-order information oracle: worst-case bound

(columns: Algo. / OMD proximal form / $\psi(\mathbf{x})$ / $\eta_t$ / Regret$_T$)
- OGD for convex: $\mathbf{x}_{t+1} = \arg\min_{\mathbf{x}\in\mathcal{X}} \eta_t\langle\mathbf{x},\nabla f_t(\mathbf{x}_t)\rangle + \frac{1}{2}\|\mathbf{x}-\mathbf{x}_t\|_2^2$; $\psi(\mathbf{x})=\frac{1}{2}\|\mathbf{x}\|_2^2$; $\eta_t \propto 1/\sqrt{t}$; $\mathcal{O}(\sqrt{T})$
- OGD for strongly convex: $\mathbf{x}_{t+1} = \arg\min_{\mathbf{x}\in\mathcal{X}} \eta_t\langle\mathbf{x},\nabla f_t(\mathbf{x}_t)\rangle + \frac{1}{2}\|\mathbf{x}-\mathbf{x}_t\|_2^2$; $\psi(\mathbf{x})=\frac{1}{2}\|\mathbf{x}\|_2^2$; $\eta_t \propto 1/t$; $\mathcal{O}(\log T)$
- ONS for exp-concave: $\mathbf{x}_{t+1} = \arg\min_{\mathbf{x}\in\mathcal{X}} \eta_t\langle\mathbf{x},\nabla f_t(\mathbf{x}_t)\rangle + \frac{1}{2}\|\mathbf{x}-\mathbf{x}_t\|_{A_t}^2$; $\psi(\mathbf{x})=\frac{1}{2}\|\mathbf{x}\|_{A_t}^2$; $\mathcal{O}(d\log T)$
- Hedge for PEA: $\mathbf{x}_{t+1} = \arg\min_{\mathbf{x}\in\Delta_N} \eta_t\langle\mathbf{x},\nabla f_t(\mathbf{x}_t)\rangle + \mathrm{KL}(\mathbf{x}\,\|\,\mathbf{x}_t)$; $\psi(\mathbf{x})=\sum_{i=1}^N x_i\ln x_i$; $\eta_t \propto \sqrt{\ln N/T}$; $\mathcal{O}(\sqrt{T\log N})$
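For reference, the step-size schedules behind the regret rates in this table can be written as tiny Python helpers; constants are omitted, and `sigma` denotes an assumed strong-convexity parameter rather than a quantity defined on this slide.

```python
# Illustrative step-size schedules matching the table (constants suppressed).
import math

def eta_convex(t):
    """OGD for convex losses: eta_t ~ 1/sqrt(t), yielding O(sqrt(T)) regret."""
    return 1.0 / math.sqrt(t)

def eta_strongly_convex(t, sigma):
    """OGD for sigma-strongly convex losses: eta_t ~ 1/(sigma * t), yielding O(log T) regret."""
    return 1.0 / (sigma * t)

def eta_hedge(T, N):
    """Hedge over N experts with horizon T: eta ~ sqrt(ln N / T), yielding O(sqrt(T ln N)) regret."""
    return math.sqrt(math.log(N) / T)
```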

OCO Algorithms learned so far
• Given first-order information oracle: problem-dependent bound

Optimistic Online Mirror Descent
$$\mathbf{x}_t = \arg\min_{\mathbf{x}\in\mathcal{X}} \ \eta_t\langle \mathbf{x}, M_t\rangle + \mathcal{D}_\psi(\mathbf{x},\widehat{\mathbf{x}}_t), \qquad \widehat{\mathbf{x}}_{t+1} = \arg\min_{\mathbf{x}\in\mathcal{X}} \ \eta_t\langle \mathbf{x}, \nabla f_t(\mathbf{x}_t)\rangle + \mathcal{D}_\psi(\mathbf{x},\widehat{\mathbf{x}}_t)$$

Regret bound:
$$\sum_{t=1}^{T} f_t(\mathbf{x}_t) - \sum_{t=1}^{T} f_t(\mathbf{u}) \le \sum_{t=1}^{T} \eta_t \|\nabla f_t(\mathbf{x}_t) - M_t\|_*^2 + \sum_{t=1}^{T} \frac{1}{\eta_t}\Big(\mathcal{D}_\psi(\mathbf{u},\widehat{\mathbf{x}}_t) - \mathcal{D}_\psi(\mathbf{u},\widehat{\mathbf{x}}_{t+1})\Big) - \sum_{t=1}^{T} \frac{1}{\eta_t}\Big(\mathcal{D}_\psi(\widehat{\mathbf{x}}_{t+1},\mathbf{x}_t) + \mathcal{D}_\psi(\mathbf{x}_t,\widehat{\mathbf{x}}_t)\Big)$$
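A minimal sketch of the two-step optimistic update above, instantiated with the negative-entropy regularizer on the simplex (an illustrative choice, not the only one); both proximal steps then become multiplicative-weights updates.

```python
# Illustrative one round of optimistic OMD with the entropy regularizer on the simplex.
import numpy as np

def optimistic_omd_entropy_round(x_hat, hint_M, grad, eta):
    """Play x_t using the hint M_t, then update the auxiliary point x_hat
    with the observed gradient."""
    x_t = x_hat * np.exp(-eta * hint_M)        # x_t        = argmin eta <x, M_t>             + D_psi(x, x_hat_t)
    x_t /= x_t.sum()
    x_hat_next = x_hat * np.exp(-eta * grad)   # x_hat_{t+1} = argmin eta <x, grad f_t(x_t)>  + D_psi(x, x_hat_t)
    x_hat_next /= x_hat_next.sum()
    return x_t, x_hat_next
```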

OCO Algorithms learned so far
• Given first-order information oracle: problem-dependent bound

(columns: Assumption(s) / Optimism $M_t$ / Setting of $\eta_t$ / Problem-dependent regret bound)
- Small-loss bound: $L$-smooth + non-negative; $M_t = 0$; $\eta_t \propto 1/\sqrt{1+F_{t-1}}$; $\mathcal{O}(\sqrt{1+F_T})$
- Variance bound: (none); $M_t$ = average of past gradients; $\eta_t \propto 1/\sqrt{1+\mathrm{Var}_{t-1}}$; $\mathcal{O}(\sqrt{1+\mathrm{Var}_T})$
- Variation bound: $L$-smooth; $M_t = \nabla f_{t-1}(\mathbf{x}_{t-1})$; $\eta_t \propto 1/\sqrt{1+V_{t-1}}$; $\mathcal{O}(\sqrt{1+V_T})$
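All three rows share the same "self-confident" step-size pattern $\eta_t \propto 1/\sqrt{1 + (\text{accumulated quantity})}$; the following tiny helper is only an illustration of that pattern, where `history` stands for whatever the row accumulates (past losses for $F_{t-1}$, gradient variances for $\mathrm{Var}_{t-1}$, or gradient variations for $V_{t-1}$).

```python
# Illustrative self-confident step-size rule shared by the three rows above.
import math

def self_confident_eta(history, scale=1.0):
    """eta_t ~ scale / sqrt(1 + sum of accumulated problem-dependent terms)."""
    return scale / math.sqrt(1.0 + sum(history))
```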

Online Convex Optimization
At each round $t = 1, 2, \ldots$
(1) the player first picks a model $\mathbf{x}_t$ from a convex set $\mathcal{X} \subseteq \mathbb{R}^d$;
(2) and environments pick an online convex function $f_t: \mathcal{X} \to \mathbb{R}$;
(3) the player suffers loss $f_t(\mathbf{x}_t)$, observes some information about $f_t$ and updates the model.

on the feedback information:
- full information: observe the entire $f_t$ (or at least the gradient $\nabla f_t(\mathbf{x}_t)$), e.g., horse racing
- partial information (bandits): observe the function value $f_t(\mathbf{x}_t)$ only, e.g., multi-armed bandits
[Illustration: moving from full information to partial information, less information is available to the learner.]

Multi-Armed Bandit
[Illustration: K arms over three trials; only the chosen arm's loss is observed (Trial 1: loss 0.3, Trial 2: loss 0.2, Trial 3: loss 0.5), while the losses of the other arms remain unobserved.]

Formulation
At each round $t = 1, 2, \ldots$
(1) the player first picks an arm $a_t \in [K]$ from $K$ candidate arms;
(2) and simultaneously environments pick a loss vector $\boldsymbol{\ell}_t \in [0,1]^K$;
(3) the player suffers and only observes the loss $\ell_{t, a_t}$, then updates the model.

on the difficulty of environments:
• adversarial setting
  - oblivious: $\{\boldsymbol{\ell}_t\}_{t=1}^T$ are chosen before the game starts.
  - non-oblivious: $\boldsymbol{\ell}_t$ can depend on the past history $(a_1, \boldsymbol{\ell}_1, \ldots, a_{t-1}, \boldsymbol{\ell}_{t-1})$.
• stochastic setting: $\boldsymbol{\ell}_t \sim \mathcal{D}$, where $\mathcal{D}$ is a fixed unknown distribution.
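The lecture's keyword list mentions the importance-weighted (IW) estimator and Exp3; as a preview, here is a minimal Python sketch of an Exp3-style learner for this protocol. The step size and the absence of explicit uniform exploration are illustrative simplifications, not necessarily the exact algorithm presented later in the slides.

```python
# Illustrative Exp3-style learner: importance-weighted loss estimates fed into a Hedge update.
import numpy as np

def exp3(loss_vectors, K, eta, rng=np.random.default_rng(0)):
    """loss_vectors: iterable of length-K arrays with entries in [0, 1] (the l_t's)."""
    weights = np.ones(K)
    total_loss = 0.0
    for ell_t in loss_vectors:
        p_t = weights / weights.sum()          # sampling distribution over the K arms
        a_t = rng.choice(K, p=p_t)             # (1) pick an arm
        observed = ell_t[a_t]                  # (3) only l_{t, a_t} is observed
        total_loss += observed
        ell_hat = np.zeros(K)                  # importance-weighted estimate of the full loss vector
        ell_hat[a_t] = observed / p_t[a_t]
        weights *= np.exp(-eta * ell_hat)      # Hedge update on the estimated losses
    return total_loss
```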