
Guangdong University of Technology, "Machine Learning" course teaching resources (lecture slides), Lecture 10: Optimization of Neural Networks (activation functions, dropout)


Tips for Deep Learning

Recipe of Deep Learning

Step 1: define a set of functions (the neural network)
Step 2: goodness of function
Step 3: pick the best function

After training, ask: good results on training data? If NO, go back and revise the three steps. If YES, ask: good results on testing data? If NO, that is overfitting; if YES, done.
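As a concrete illustration of the three steps, here is a minimal sketch on a toy linear-regression problem; the data, the model, and the learning rate are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # toy inputs
y = X @ np.array([1.5, -2.0]) + 0.3        # toy targets

# Step 1: define a set of functions (here a linear model f(x) = x.w + b,
# parameterised by w and b).
w, b = np.zeros(2), 0.0

def f(X, w, b):
    return X @ w + b

# Step 2: goodness of function (mean squared error on the training data).
def loss(w, b):
    return np.mean((f(X, w, b) - y) ** 2)

# Step 3: pick the best function (gradient descent on the loss).
lr = 0.1
for _ in range(500):
    err = f(X, w, b) - y
    w -= lr * 2 * X.T @ err / len(X)
    b -= lr * 2 * err.mean()

print(loss(w, b))   # close to 0 -> good results on the training data
```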

Do not always blame overfitting.

[Figure: training error (left) and testing error (right) versus training iterations for a 20-layer and a 56-layer network. The 56-layer network has higher error even on the training data, so the real problem is that it is not well trained, not overfitting.]

Deep Residual Learning for Image Recognition, http://arxiv.org/abs/1512.03385

Recipe of Deep Learning

Different approaches for different problems; e.g. dropout is for getting good results on testing data.

Recipe of Deep Learning

For good results on training data: new activation function, adaptive learning rate.
For good results on testing data: early stopping, regularization, dropout.
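Dropout is the testing-data technique highlighted in this lecture's title. Below is a minimal sketch of inverted dropout applied to a layer's activations; the function name dropout_forward, the drop probability, and the array sizes are illustrative assumptions, not the lecture's code.

```python
import numpy as np

def dropout_forward(a, p_drop=0.5, training=True, rng=None):
    """Inverted dropout on activations a.

    During training each unit is zeroed with probability p_drop and the
    survivors are scaled by 1 / (1 - p_drop), so no rescaling is needed
    at test time.
    """
    if not training:
        return a
    if rng is None:
        rng = np.random.default_rng(0)
    mask = (rng.random(a.shape) >= p_drop) / (1.0 - p_drop)
    return a * mask

a = np.ones((2, 4))
print(dropout_forward(a))                  # some entries 0, the rest scaled to 2.0
print(dropout_forward(a, training=False))  # unchanged at test time
```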

Hard to get the power of Deep …

[Figure: handwritten digit classification, accuracy on the training data versus the number of layers (1 to 10).]

Deeper usually does not imply better, even on the training data.

Vanishing Gradient Problem

[Figure: a deep network with inputs x1 … xN and outputs y1 … yM.]

Layers near the input receive smaller gradients, learn very slowly, and stay almost random; layers near the output receive larger gradients, learn very fast, and already converge, but based on the almost random features produced by the earlier layers.
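To make the imbalance concrete, the sketch below multiplies together the per-layer factors w · σ'(z) that backpropagation applies through a chain of sigmoid units; since σ'(z) ≤ 0.25, the product typically shrinks geometrically with depth. The depth and the random weights and pre-activations are illustrative assumptions, not the lecture's numbers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
grad = 1.0                        # gradient arriving at the output layer
for layer in range(10):
    z = rng.normal()              # pre-activation of one unit on the path
    w = rng.normal()              # weight on the path
    grad *= w * sigmoid(z) * (1 - sigmoid(z))   # chain rule through one layer
    print(f"after passing {layer + 1} layers: |grad| = {abs(grad):.2e}")
```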

Vanishing Gradient Problem

Intuitive way to compute the derivatives: ∂l/∂w ≈ Δl/Δw.

Perturb a weight w in a layer near the input by a large +Δw. Each sigmoid squashes the resulting change, so only a small change reaches the output, and the change Δl in the loss is small. Hence the gradients of the layers near the input are small.
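The same picture can be checked numerically: perturb one input-layer weight of a deep sigmoid network by a large Δw and measure Δl/Δw at the output. The network shape, the weights, and the size of Δw below are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(8, 8)) for _ in range(8)]   # 8 sigmoid layers

def forward(x, Ws):
    a = x
    for W in Ws:
        a = sigmoid(W @ a)      # each layer squashes its input into (0, 1)
    return a.sum()              # scalar stand-in for the loss l

x = rng.normal(size=8)
l0 = forward(x, Ws)

dw = 1.0                        # large perturbation of one input-layer weight
Ws[0][0, 0] += dw
l1 = forward(x, Ws)

print((l1 - l0) / dw)           # Δl/Δw ≈ ∂l/∂w, typically very small
```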

ReLU

Rectified Linear Unit (ReLU): a = z for z > 0 and a = 0 for z ≤ 0, used in place of the sigmoid a = σ(z).

Reasons:
1. Fast to compute
2. Biological reason
3. Equivalent to an infinite number of sigmoids with different biases
4. Addresses the vanishing gradient problem

[Xavier Glorot, AISTATS'11] [Andrew L. Maas, ICML'13] [Kaiming He, arXiv'15]
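A minimal sketch of the ReLU activation and its derivative, contrasted with the sigmoid's derivative; these are the standard textbook definitions, not code from the lecture.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)            # a = z for z > 0, a = 0 otherwise

def relu_grad(z):
    return (z > 0).astype(float)         # 1 in the active region, 0 otherwise

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1 - s)                   # at most 0.25, which shrinks gradients

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(z))           # [0.  0.  0.5 2. ]
print(relu_grad(z))      # [0. 0. 1. 1.]  -> gradients pass through unscaled
print(sigmoid_grad(z))   # all values <= 0.25
```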

ReLU

[Figure: a network of ReLU units with inputs x1, x2 and outputs y1, y2; for a given input, each unit operates in either the a = z region or the a = 0 region.]
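One way to read this slide (a hedged interpretation of the figure): for a fixed input, the units with a = 0 can be removed and the remaining units behave as a = z, so the network reduces to a thinner, purely linear network on that input. The sketch below checks this numerically with illustrative layer sizes and random weights.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(5, 3)), rng.normal(size=(2, 5))   # a tiny 2-layer network

x = rng.normal(size=3)
z1 = W1 @ x
mask = (z1 > 0).astype(float)        # which hidden units are active for this x

y_relu = W2 @ relu(z1)               # original ReLU network
y_linear = (W2 * mask) @ z1          # same computation with the a = 0 units removed

print(np.allclose(y_relu, y_linear)) # True: for this input, the network is linear
```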
