
Nanjing University: Advanced Optimization Course Materials (Lecture Slides), Lecture 03 GD Methods I - GD method, Lipschitz optimization

Content Summary
• Gradient Descent • Convex and Lipschitz • Polyak Step Size • Convergence without Optimal Value • Optimal Time-Varying Step Sizes • Strongly Convex and Lipschitz

NJUAT, School of Artificial Intelligence, Nanjing University
Lecture 3. Gradient Descent Method
Advanced Optimization (Fall 2023)
Peng Zhao (zhaop@lamda.nju.edu.cn), Nanjing University


Outline
• Gradient Descent
• Convex and Lipschitz
• Polyak Step Size
• Convergence without Optimal Value
• Optimal Time-Varying Step Sizes
• Strongly Convex and Lipschitz


Part 1. Gradient Descent
• Convex Optimization Problem
• Gradient Descent
• Performance Measure
• The First Gradient Descent Lemma


Convex Optimization Problem
We adopt a minimization language:
$$\min\ f(x) \quad \text{s.t.}\ x \in \mathcal{X}$$
• optimization variable $x \in \mathbb{R}^d$
• objective function $f: \mathbb{R}^d \to \mathbb{R}$: convex and continuously differentiable
• feasible domain $\mathcal{X} \subseteq \mathbb{R}^d$: convex

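To make the abstract formulation concrete, here is a small illustrative instance in Python; the least-squares objective and the Euclidean-ball domain below are assumptions chosen for demonstration, not taken from the lecture.

```python
import numpy as np

# Illustrative instance of  min f(x)  s.t.  x in X  (assumed, for demonstration).
rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((20, d))
b = rng.standard_normal(20)

def f(x):
    """Convex, continuously differentiable objective: f(x) = 0.5 * ||Ax - b||_2^2."""
    return 0.5 * np.sum((A @ x - b) ** 2)

def grad_f(x):
    """Gradient of the objective: A^T (Ax - b)."""
    return A.T @ (A @ x - b)

# Convex feasible domain X = {x : ||x||_2 <= R}, represented by its projection map.
R = 1.0
def project(y):
    """Euclidean projection onto X (the l2 ball of radius R)."""
    norm = np.linalg.norm(y)
    return y if norm <= R else (R / norm) * y
```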

Goal
To output a sequence $\{x_t\}$ such that $\bar{x}_T$ approximates $x^*$ as $T$ grows.
• Function-value level: $f(\bar{x}_T) - f(x^*) \le \varepsilon(T)$
• Optimizer-value level: $\|\bar{x}_T - x^*\| \le \varepsilon(T)$
where $\bar{x}_T$ can be a statistic of the original sequence $\{x_t\}$ (e.g., the last iterate or an average of the iterates), and $\varepsilon(T)$ is the approximation error, a function of the number of iterations $T$.


Goal
In general, there are two performance measures (essentially the same):
• Convergence: $f(\bar{x}_T) - f(x^*) \le \varepsilon(T)$
  - Qualitatively: $\varepsilon(T) \to 0$ as $T \to \infty$
  - Quantitatively: $\mathcal{O}(1/\sqrt{T})$ / $\mathcal{O}(1/T)$ / $\mathcal{O}(1/T^2)$ / $\mathcal{O}(\exp(-T))$ / ...
• Complexity:
  - Definition: the number of iterations required to achieve $f(\bar{x}_T) - f(x^*) \le \varepsilon$.
  - Quantitatively: $\mathcal{O}(1/\varepsilon^2)$ / $\mathcal{O}(1/\varepsilon)$ / $\mathcal{O}(1/\sqrt{\varepsilon})$ / $\mathcal{O}(\ln(1/\varepsilon))$ / ...
    corresponding to $\mathcal{O}(1/\sqrt{T})$ / $\mathcal{O}(1/T)$ / $\mathcal{O}(1/T^2)$ / $\mathcal{O}(\exp(-T))$, respectively.

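To see why the two measures are essentially the same, here is the standard conversion for the first pair of rates above, written out for completeness (it is not part of the slide):

```latex
% Converting a convergence rate into an iteration complexity.
% Suppose the rate is \varepsilon(T) = c/\sqrt{T} for some constant c > 0. Then
f(\bar{x}_T) - f(x^*) \le \frac{c}{\sqrt{T}} \le \varepsilon
\quad \Longleftrightarrow \quad
T \ge \frac{c^2}{\varepsilon^2},
% so accuracy \varepsilon is reached after O(1/\varepsilon^2) iterations; the other
% pairs (1/T vs. 1/\varepsilon, 1/T^2 vs. 1/\sqrt{\varepsilon}, e^{-T} vs. \ln(1/\varepsilon))
% follow from the same inversion.
```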

Gradient Descent
• GD Template: $x_{t+1} = \Pi_{\mathcal{X}}[x_t - \eta_t \nabla f(x_t)]$
  - $x_1$ can be an arbitrary point inside the domain.
  - $\eta_t > 0$ is the potentially time-varying step size (also called the learning rate).
  - The projection $\Pi_{\mathcal{X}}[y] = \arg\min_{x \in \mathcal{X}} \|x - y\|$ ensures feasibility.

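A minimal Python sketch of this template follows; the quadratic objective, the unit-ball domain, and the constant step size are illustrative assumptions rather than choices made in the lecture.

```python
import numpy as np

def projected_gd(x1, grad_f, project, step_size, T):
    """GD template: x_{t+1} = Proj_X[ x_t - eta_t * grad f(x_t) ].

    step_size(t) returns eta_t > 0 (possibly time-varying);
    project(y) returns the Euclidean projection of y onto the domain X.
    """
    x = np.asarray(x1, dtype=float)
    iterates = [x]
    for t in range(1, T + 1):
        x = project(x - step_size(t) * grad_f(x))
        iterates.append(x)
    return iterates

# Illustrative usage (assumed instance): f(x) = 0.5 * ||x - c||_2^2 with c outside
# the unit ball, so the constrained minimizer lies on the boundary of X = {||x|| <= 1}.
c = np.array([2.0, 1.0])
grad_f = lambda x: x - c
project = lambda y: y if np.linalg.norm(y) <= 1.0 else y / np.linalg.norm(y)

xs = projected_gd(x1=np.zeros(2), grad_f=grad_f, project=project,
                  step_size=lambda t: 0.5, T=100)
print(xs[-1], c / np.linalg.norm(c))  # final iterate vs. the analytic minimizer
```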

Why Gradient Descent?
Let's focus on the unconstrained setting for simplicity.
• Idea: surrogate optimization
We aim to find a sequence of local upper bounds $U_1, \ldots, U_T$, where the surrogate function $U_t: \mathbb{R}^d \to \mathbb{R}$ may depend on $x_t$, such that
(i) $f(x_t) = U_t(x_t)$;
(ii) $f(x) \le U_t(x)$ holds for all $x \in \mathbb{R}^d$;
(iii) $U_t(x)$ should be simple enough to minimize.
Then, our proposed algorithm would be $x_{t+1} = \arg\min_{x \in \mathbb{R}^d} U_t(x)$.


Why Gradient Descent?
Following the "surrogate optimization" principle, let's invent GD for convex and smooth functions.

Proposition 1. Suppose that $f$ is convex and differentiable. Moreover, suppose that $f$ is $L$-smooth with respect to the $\ell_2$-norm. Define the surrogate $U_t: \mathbb{R}^d \to \mathbb{R}$ as
$$U_t(x) \triangleq f(x_t) + \langle \nabla f(x_t), x - x_t \rangle + \frac{L}{2}\|x - x_t\|_2^2.$$
Then, we have
(i) $f(x_t) = U_t(x_t)$;
(ii) $f(x) \le U_t(x)$ holds for all $x \in \mathbb{R}^d$;
(iii) $x_{t+1} = \arg\min_{x \in \mathbb{R}^d} U_t(x)$ is equivalent to $x_{t+1} = x_t - \frac{1}{L}\nabla f(x_t)$.

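Claim (iii) of Proposition 1 follows from a one-line first-order optimality condition; the calculation below is written out for completeness and is not part of the slide.

```latex
% U_t is a strongly convex quadratic in x, so its unique minimizer is
% characterized by \nabla U_t(x) = 0:
\nabla U_t(x) = \nabla f(x_t) + L\,(x - x_t) = 0
\quad \Longrightarrow \quad
x_{t+1} = x_t - \frac{1}{L} \nabla f(x_t),
% i.e., gradient descent with the constant step size \eta = 1/L.
```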

Gradient Descent
• GD Template: $x_{t+1} = \Pi_{\mathcal{X}}[x_t - \eta_t \nabla f(x_t)]$
  - $x_1$ can be an arbitrary point inside the domain.
  - $\eta_t > 0$ is the potentially time-varying step size (also called the learning rate).
  - The projection $\Pi_{\mathcal{X}}[y] = \arg\min_{x \in \mathcal{X}} \|x - y\|$ ensures feasibility.
This lecture focuses on the GD analysis for Lipschitz functions; the next lecture will discuss smooth functions.

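As a preview of the Lipschitz analysis developed in the rest of this lecture, here is a minimal sketch of GD with a time-varying step size $\eta_t \propto 1/\sqrt{t}$ and an averaged iterate. The Huber-style objective (convex, differentiable, with gradient norm bounded by 1) and the step-size constant are illustrative assumptions, not the lecture's choices.

```python
import numpy as np

def huber(z, delta=0.5):
    """Convex, differentiable scalar loss whose derivative is bounded by 1."""
    return np.where(np.abs(z) <= delta, 0.5 * z**2 / delta, np.abs(z) - 0.5 * delta)

def huber_grad(z, delta=0.5):
    return np.where(np.abs(z) <= delta, z / delta, np.sign(z))

# Illustrative convex Lipschitz problem: f(x) = sum_i huber(x_i - c_i), minimized at x* = c.
rng = np.random.default_rng(0)
d = 10
c = rng.uniform(-1.0, 1.0, size=d)
f = lambda x: float(np.sum(huber(x - c)))
grad_f = lambda x: huber_grad(x - c)

# GD with time-varying step size eta_t = eta0 / sqrt(t); the averaged iterate is the
# statistic typically analyzed in the convex-and-Lipschitz setting.
x = np.zeros(d)
avg = np.zeros(d)
T, eta0 = 1000, 1.0
for t in range(1, T + 1):
    x = x - (eta0 / np.sqrt(t)) * grad_f(x)
    avg += x
avg /= T

print("suboptimality of averaged iterate:", f(avg) - f(c))  # decays roughly as O(1/sqrt(T))
```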
