南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 01 Introduction; Mathematical Background

版套是 NJUAT 南京大学 人工智能学院 SCHODL OF ARTIFICIAL INTELUGENCE,NANJING UNIVERSITY Lecture 1.Mathematical Background Advanced Optimization(Fall 2023) Peng Zhao zhaop@lamda.nju.edu.cn Nanjing University
Lecture 1. Mathematical Background Peng Zhao zhaop@lamda.nju.edu.cn Nanjing University Advanced Optimization (Fall 2023)

Outline 。Calculus ·Linear Algebra Probability Statistics ·Information Theory Optimization in Machine Learning Advanced Optimization (Fall 2023) Lecture 1.Mathematical Background 2
Advanced Optimization (Fall 2023) Lecture 1. Mathematical Background 2 Outline • Calculus • Linear Algebra • Probability & Statistics • Information Theory • Optimization in Machine Learning

Notational Convention -[m={1,..,n} -x,y,v:vectors A.B:matrices -,K:domain d,m,n:dimensions -I:identity matrix X,Y:random variables p,q:probability distributions Advanced Optimization (Fall 2023) Lecture 1.Mathematical Background 3
Advanced Optimization (Fall 2023) Lecture 1. Mathematical Background 3 Notational Convention

Function ·Function mapping f:domf R"→yC Rm Definition 1(Continuous Function).A function f:R"->Rm is continuous at x E dom f if for all e>0 there exists a >0 with y E dom f,such that Ily-x2≤6→If(y)-f(x)2≤e. Advanced Optimization (Fall 2023) Lecture 1.Mathematical Background 4
Advanced Optimization (Fall 2023) Lecture 1. Mathematical Background 4 Function • Function mapping

Part 1.Calculus Gradient and Derivatives ·essian ·Chain Rule Advanced Optimization (Fall 2023) Lecture 1.Mathematical Background 5
Advanced Optimization (Fall 2023) Lecture 1. Mathematical Background 5 Part 1. Calculus • Gradient and Derivatives • Hessian • Chain Rule

Gradient and Derivatives(First Order) The gradient and derivative of a scalar function(f:RR)is the same. The derivative of vector functions (f C RdR)is the transpose of its gradient. we focus on the "gradient"language (i.e.,column vector) Definition 2(Gradient).Let f:C Rd-R be a differentiable function.Let x=[1,...,d].Then,the gradient of f at x is a vector in Rd denoted by Vf(x)and defined by Vf(x)= ... X Advanced Optimization (Fall 2023) Lecture 1.Mathematical Background 6
Advanced Optimization (Fall 2023) Lecture 1. Mathematical Background 6 Gradient and Derivatives (First Order) we focus on the “gradient” language (i.e., column vector)

Example Example1.The gradient off(x)=x≌∑1x号is [2x1 Vf(x)= =2x. 2xd Example 2.The gradient of f(x)=->Ini is 「-(lnx1+1) Vf(x)- -(Inza+1) Advanced Optimization (Fall 2023) Lecture 1.Mathematical Background 7
Advanced Optimization (Fall 2023) Lecture 1. Mathematical Background 7 Example

Hessian(Second Order) Definition 3(Hessian).Let f:CR->R be a twice differentiable function. Let x =[1,...,d]Then,the Hessian of f at x is the matrix in Rdxd denoted by V2f(x)and defined by Example 3.The Hessian of f(x)=-;n;is V2f(x)=diag(.). Example 4.The Hessian of f(x)=x-3x1+1is V2f(x)= 6a1x号 6ax片x2-6x2 6x2x2-9x号 2x3-18:x1x2 Advanced Optimization (Fall 2023) Lecture 1.Mathematical Background 8
Advanced Optimization (Fall 2023) Lecture 1. Mathematical Background 8 Hessian (Second Order)

Chain rule Consider scalar functions for simplicity. Chain Rule.For h(x)=f(g(x)), the gradient of h(x)is h'(x)=f'(g(x))g'(x). - the Hessian of h(x)is h"(x)=f"(g(x))(g'(x))2+f'(g(x))g"(x). Advanced Optimization (Fall 2023) Lecture 1.Mathematical Background 9
Advanced Optimization (Fall 2023) Lecture 1. Mathematical Background 9 • Consider scalar functions for simplicity. Chain Rule

Reference:The Matrix Cookbook 2 Derivatives The derivatives of vectors,matrices,norms, This section is covering differentiation of a number of expressions with respect to a matrix X.Note that it is always assumed that X has no special structure,i.e. determinants,etc can be found therein. that the elements of X are independent (e.g.not symmetric,Toeplitz,positive definite).See section 2.8 for differentiation of structured matrices.The basic assumptions can be written in a formula as 2.4.1 First Order 8X=65时 8Xi (32) (69) that is for e.g vectorforms Ox x 0aTXb 0x abT (70) 筒- [周-张 离。斋 8aTXTb X baT (71) The following rue are general and very useful when deriving the differential of an expression (191): 0aTXa 0aTXTa aaT (72 0A=0 (A is a constant) (33) ax a(ax) 8X 34 =Jij (73) 0(X+Y)=8X+Y (3 aXy a(T(X)】 =Tr(8X) 36 (XA) 8(XY=(8X)Y+X(8Y的 (37) 8Xmn =6im(A)ng =(JmnA) (74) X。Y)■ (8X)Y+Xo(8Y) 3 8X@Y)=(8X)⑧Y+X⑧(8Y) (39 8(XTA) =Bin(A)mj (J"mA) (75) 0(x-j = -X-(8x)x- 0) 0Xmn det(X)】 = Tr(adj(X)ax) 4 a(det(X))det(XyTr(X-aX) 8(In(det(X)))=Tr(X-8X) 8X =(8X)T (44 https://www.math.uwaterloo.ca/-hwolkowi/matrixcookbook.pdf 8xH=(8x" (45) Advanced Optimization (Fall 2023) Lecture 1.Mathematical Background 10
Advanced Optimization (Fall 2023) Lecture 1. Mathematical Background 10 Reference: The Matrix Cookbook https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf The derivatives of vectors, matrices, norms, determinants, etc can be found therein
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
- 南京大学:《数字图像处理》课程教学资源(课件讲义)11 图像特征分析.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)10 图像分割.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)09 形态学及其应用.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)08 压缩编码.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)07 频域滤波器.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)06 图像频域变换.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)05 代数运算与几何变换.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)04 图像复原及锐化.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)03 灰度直方图与点运算.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)02 二值图像与像素关系.pdf
- 南京大学:《数字图像处理》课程教学资源(课件讲义)01 概述 Digital Image Processing.pdf
- 人工智能相关文献资料:Adaptivity and Non-stationarity - Problem-dependent Dynamic Regret for Online Convex Optimization.pdf
- 北京大学出版社:21世纪全国应用型本科电子通信系列《MATLAB基础及其应用教程》实用规划教材(共八章,2007,编著:周开利等).pdf
- 《计算机应用基础》课程教学资源(参考资料)Mathematica CheatSheet.pdf
- 《计算机应用基础》课程教学资源(参考资料)MATLAB Reference Sheet, by Giordano Fusco & Jindich Soukup.pdf
- 《计算机应用基础》课程教学资源(参考资料)MATLAB Reference Sheet, by Sherman Wiggin & Dom Dal Bello.pdf
- 《计算机应用基础》课程教学资源(参考资料)MATLAB Reference Card, by Jesse Knight.pdf
- 《计算机应用基础》课程教学资源(参考资料)MATLAB Quick Reference, by Jialong He.pdf
- 《计算机应用基础》课程教学资源(参考资料)MATLAB_CheatSheet, by Thor Nielsen.pdf
- 《计算机应用基础》课程教学资源(参考资料)MATLAB常用命令和函数.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 02 Convex Optimization Basics; Function Properties.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 03 GD Methods I - GD method, Lipschitz optimization.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 04 GD Methods II - GD method, smooth optimization, Nesterov’s AGD, composite optimization.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 05 Online Convex Optimization - OGD, convex functions, strongly convex functions, online Newton step, exp-concave functions.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 06 Prediction with Expert Advice - Hedge, minimax bound, lower bound; mirror descent(motivation and preliminary).pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 07 Online Mirror Descent - OMD framework, regret analysis, primal-dual view, mirror map, FTRL, dual averaging.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 08 Adaptive Online Convex Optimization - problem-dependent guarantee, small-loss bound, self-confident tuning, small-loss OCO, self-bounding property bound.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 09 Optimistic Online Mirror Descent - optimistic online learning, predictable sequence, small-loss bound, gradient-variance bound, gradient-variation bound.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 10 Online Learning in Games - two-player zero-sum games, repeated play, minimax theorem, fast convergence.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 11 Adversarial Bandits - MAB, IW estimator, Exp3, lower bound, BCO, gradient estimator, self-concordant barrier.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 12 Stochastic Bandits - MAB, UCB, linear bandits, self-normalized concentration, generalized linear bandits.pdf
- 南京大学:《高级优化 Advanced Optimization》课程教学资源(讲稿)Lecture 13 Advanced Topics - non-stationary online learning, universal online learning, online ensemble, base algorithm, meta algorithm.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)课程简介 Combinatorics Introduction(主讲:尹一通).pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)基本计数 Basic enumeration.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)生成函数 Generating functions.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)筛法 Sieve methods.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)Cayley公式 Cayley's formula.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)Pólya计数法 Pólya's theory of counting.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)Ramsey理论 Ramsey theory.pdf
- 南京大学:《组合数学》课程教学资源(课堂讲义)存在性问题 Existence problems.pdf