《网络搜索和挖掘关键技术 Web Search and Mining》课程教学资源(PPT讲稿)Lecture 13 Matrix Factorization and Latent Semantic Indexing

Matrix Factorization and Latent Semantic Indexing Web Search and Mining Lecture 13: Matrix Factorization and latent Semantic Indexing
Matrix Factorization and Latent Semantic Indexing 1 Lecture 13: Matrix Factorization and Latent Semantic Indexing Web Search and Mining

Matrix Factorization and Latent Semantic Indexing Topic a Matrix factorization matrix decomposition Latent Semantic Indexing( LSi) Term-document matrices are very large But the number of topics that people talk about is small (in some sense) Clothes, movies, politics Can we represent the term-document space by a lower dimensional latent space?
Matrix Factorization and Latent Semantic Indexing 2 Topic ▪ Matrix factorization = matrix decomposition ▪ Latent Semantic Indexing (LSI) ▪ Term-document matrices are very large ▪ But the number of topics that people talk about is small (in some sense) ▪ Clothes, movies, politics, … ▪ Can we represent the term-document space by a lower dimensional latent space?

Matrix Factorization and Latent Semantic Indexing Background Linear Algebra background
Matrix Factorization and Latent Semantic Indexing 3 Linear Algebra Background Background

Matrix Factorization and Latent Semantic Indexing Background Eigenvalues eigenvectors Eigenvectors (for a square m xm matrix s) Sy=Av Example 2 (right)eigenvector eigenvalue(4 0)(2, 2 ∈R≠0A∈R a How many eigenvalues are there at most? Sv=v<→(S-Av=0 only has a non-zero solution if $ -AI=0 This is a mth order equation in n which can have at most m distinct solutions (roots of the characteristic polynomial)-can be complex even though S is real
Matrix Factorization and Latent Semantic Indexing 4 Eigenvalues & Eigenvectors ▪ Eigenvectors (for a square mm matrix S) ▪ How many eigenvalues are there at most? only has a non-zero solution if This is a mth order equation in λ which can have at most m distinct solutions (roots of the characteristic polynomial) – can be complex even though S is real. (right) eigenvector eigenvalue Example Background

Matrix Factorization and Latent Semantic Indexing Background Matrix-vector multiplication 3000 S=0200 has eigenvalues 30, 20, 1 with corresponding eigenvectors 001 On each eigenvector, S acts as a multiple of the identity matrix: but as a different multiple on each Any vector(say X= 1)can be viewed as a combination of the eigenvectors X=2v1+4v+6v
Matrix Factorization and Latent Semantic Indexing 5 Matrix-vector multiplication = 0 0 1 0 20 0 30 0 0 S has eigenvalues 30, 20, 1 with corresponding eigenvectors = 0 0 1 1 v = 0 1 0 2 v = 1 0 0 3 v On each eigenvector, S acts as a multiple of the identity matrix: but as a different multiple on each. Any vector (say x= ) can be viewed as a combination of the eigenvectors: x = 2v1 + 4v2 + 6v3 6 4 2 Background

Matrix Factorization and Latent Semantic Indexing Background Matrix vector multiplication Thus a matrix-vector multiplication such as Sx s, x as in the previous slide can be rewritten in terms of the eigenvalues/vectors. Sx=S(2v1+42+6v3) Sx=2Sv1+4S2+6S3=21v1+42+63 Sx=60v1+80v2+6v 3 Even though x is an arbitrary vector the action of s on x is determined by the eigenvalues/vectors
Matrix Factorization and Latent Semantic Indexing 6 Matrix vector multiplication ▪ Thus a matrix-vector multiplication such as Sx (S, x as in the previous slide) can be rewritten in terms of the eigenvalues/vectors: ▪ Even though x is an arbitrary vector, the action of S on x is determined by the eigenvalues/vectors. Sx = S(2v1 + 4v2 + 6v 3 ) Sx = 2Sv1 + 4Sv2 + 6Sv 3 = 21 v1 + 42 v2 + 6 3 v 3 Sx = 60v1 + 80v2 + 6v 3 Background

Matrix Factorization and Latent Semantic Indexing Background Matrix vector multiplication Suggestion: the effect of small"eigenvalues is small If we ignored the smallest eigenvalue(1), then instead of (60 we would get 60 80 80 6 These vectors are similar (in cosine similarity etc
Matrix Factorization and Latent Semantic Indexing 7 Matrix vector multiplication ▪ Suggestion: the effect of “small” eigenvalues is small. ▪ If we ignored the smallest eigenvalue (1), then instead of we would get ▪ These vectors are similar (in cosine similarity, etc.) 60 80 6 60 80 0 Background

Matrix Factorization and Latent Semantic Indexing Background Eigenvalues Eigenvectors For symmetric matrices, eigenvectors for distinct eigenvalues are orthogonal All eigenvalues of a real symmetric matrix are real All eigenvalues of a positive semidefinite matrix are non-negative 8
Matrix Factorization and Latent Semantic Indexing 8 Eigenvalues & Eigenvectors Sv{1,2}={1,2}v{1,2} ,and 121•v2=0 For symmetric matrices, eigenvectors for distinct eigenvalues are orthogonal All eigenvalues of a real symmetric matrix are real. n ,w TSw0, then ifSv=v0 All eigenvalues of a positive semidefinite matrix are non-negative Background

Matrix Factorization and Latent Semantic Indexing Background Example Let 2 Real, symmetric 2- Then s-n= 12-2 S-A|=(2-1)2-1=0 The eigenvalues are 1 and 3 (nonnegative, real The eigenvectors are orthogonal (and real Plug in these values and solve for eigenvectors
Matrix Factorization and Latent Semantic Indexing 9 Plug in these values and solve for eigenvectors. Example ▪ Let ▪ Then ▪ The eigenvalues are 1 and 3 (nonnegative, real). ▪ The eigenvectors are orthogonal (and real): = 1 2 2 1 S | | (2 ) 1 0. 1 2 2 1 2 − = − − = − − − = S I S I − 1 1 1 1 Real, symmetric. Background

Matrix Factorization and Latent Semantic Indexing Background Eigen/diagonal Decomposition etsEra be a square matrix with m linearly independent eigenvectors a"non-defective"matrix) Theorem: Exists an eigen decomposition Unique diagonal for S= UAU distinct (cf. matrix diagonalization theorem) eigen values Columns of u are eigenvectors of S Diagonal elements of a are eigenvalues of s A=diag(入1,…,Mm),A2≥入2+1
Matrix Factorization and Latent Semantic Indexing 10 ▪ Let be a square matrix with m linearly independent eigenvectors (a “non-defective” matrix) ▪ Theorem: Exists an eigen decomposition ▪ (cf. matrix diagonalization theorem) ▪ Columns of U are eigenvectors of S ▪ Diagonal elements of are eigenvalues of Eigen/diagonal Decomposition diagonal Unique for distinct eigenvalues Background
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
- 计算机网络技术基础(PPT课件讲稿).ppt
- PROGRAMMING METHDOLODGY AND SOFTWARE ENGINEERING(PPT讲稿)C Programming Review.ppt
- 《神经网络 Neural Networks》课程教学资源(PPT课件讲稿)Ch 8 Artificial Neural networks.pptx
- 电子工业出版社:《计算机网络》课程教学资源(第五版,PPT课件讲稿)第五章 运输层.ppt
- 《Web编程实用技术教程》课程教学资源(PPT课件讲稿)第5章 MFC WinSock类的编程.ppt
- 《数字图像处理》课程PPT教学课件(讲稿)第二章 图像获取、显示和表示.ppt
- 香港中文大学:《Topics in Theoretical Computer Science》课程教学资源(PPT课件讲稿)量子计算 Quantum computing.pptx
- 香港科技大学:深度学习导论(PPT讲稿)Introduction to Deep Learning.pptx
- 北京大学软件研究所:高级软件工程(PPT讲稿)云计算与平台即服务.ppt
- 合肥学院:《数据库原理与应用》课程教学资源(PPT课件)第1章 数据库系统概述(主讲:叶潮流).ppt
- 《数据库原理与应用》课程PPT教学课件(SQL Server)第9章 存储过程和触发器.ppt
- 《The C++ Programming Language》课程教学资源(PPT课件讲稿)Lecture 02 Procedure-Based Programming.ppt
- 东南大学:《数据结构》课程教学资源(PPT课件讲稿)第七章 图.ppt
- 北京大学:《高级软件工程》课程教学资源(PPT课件讲稿)第一讲 软件与软件开发.ppt
- 西安电子科技大学:《现代密码学》课程教学资源(PPT课件讲稿)第二章 流密码(主讲:董庆宽).pptx
- 《Photoshop基础教程与上机指导》教学资源(PPT讲稿)第18章 扫描和修饰图像.ppt
- 中国水利水电出版社:《单片机原理及应用》课程PPT教学课件(C语言版)第8章 单片机系统扩展(主编:周国运).ppt
- 西安电子科技大学:《操作系统 Operating Systems》课程教学资源(PPT课件讲稿)Chapter 04 Memory Management.ppt
- 《网页设计》课程教学资源:课程教学大纲.doc
- 《人工智能技术导论》课程教学资源(PPT课件讲稿)第3章 图搜索与问题求解.ppt
- 多媒体技术及应用(PPT讲稿)多媒体音频技术.ppt
- 山东大学:《微机原理及单片机接口技术》课程教学资源(PPT课件讲稿)第四章 指令系统及汇编语言程序设计(4.1-4.4).ppt
- 东南大学:《C++语言程序设计》课程教学资源(PPT课件讲稿)Chapter 13 Object-Oriented Programming - Polymorphism.ppt
- 《C++语言程序设计》课程教学资源(PPT课件)第14讲 运算符重载.ppt
- 淮阴工学院:《数据库原理》课程教学资源(PPT课件讲稿)第4章 结构化查询语言SQL.ppt
- 《计算机网络 COMPUTER NETWORKS》课程教学资源(PPT课件讲稿)Chapter 18 互联网协议 Internet Protocols(IP).ppt
- 计算机应用专业《计算机网络》教学大纲.doc
- 《计算机网络安全》课程教学资源(PPT课件讲稿)第四章 数据加密技术.ppt
- 西安培华学院:《计算机网络工程》课程教学资源(PPT课件讲稿)第1章 网络工程知识(主讲:张伟).ppt
- 对外经济贸易大学:《大学计算机基础》课程电子教案(PPT课件)第5章 PowerPoint幻灯片制作(PowerPoint 2010).pptx
- 中国地质大学(武汉):R语言入门教程(PPT讲稿).ppt
- 西南民族大学:软件需求分析与总体设计(PPT讲稿,主讲:殷锋).ppt
- 《软件测试 Software Testing》教学资源(PPT讲稿)Part 1 The Big Picture.ppt
- 系统编程工具REXX和CLIST.ppt
- 北京大学:基于信息利用的烟花算法研究(PPT讲稿)Research on Fireworks Algorithms from the Perspective of Information Utilization.pptx
- 《ARM嵌入式软件开发》课程教学资源(PPT课件讲稿)第三章 ARM体系结构及编程模型.ppt
- 《大型机系统管理技术》课程教学资源(PPT课件讲稿)第2章 大型服务器外存管理.ppt
- 《计算机组成原理》课程PPT教学课件(讲稿)第三章 计算机核心部件及其工作原理.ppt
- 《计算机网络概述》教学资源(PPT课件讲稿).ppt
- 面积对象编程(PPT讲稿)Object-Oriented Programming and Classes.ppt