上海交通大学:《挖掘海量数据集 Mining Massive Datasets》课程教学资源(PPT讲稿)Lecture 07 链接分析 Link Analysis

Link Analysis Mining Massive Datasets Wu-Jun li Department of Computer Science and Engineering Shanghai Jiao Tong University Lecture 7: Link Analysis
Link Analysis 1 Wu-Jun Li Department of Computer Science and Engineering Shanghai Jiao Tong University Lecture 7: Link Analysis Mining Massive Datasets

Link Analysis Link Analysis algorithms PageRank Hubs and authorities Topic-Sensitive PageRank Spam Detection Algorithms Other interesting topics we wont cover Detecting duplicates and mirrors Mining for communities community detection (Refer to Chapter 10 of the textbook)
Link Analysis 2 Link Analysis Algorithms ▪ PageRank ▪ Hubs and Authorities ▪ Topic-Sensitive PageRank ▪ Spam Detection Algorithms ▪ Other interesting topics we won’t cover ▪ Detecting duplicates and mirrors ▪ Mining for communities (community detection) (Refer to Chapter 10 of the textbook)

Link Analysis Outline PageRank Topic-Sensitive PageRank Hubs and authorities Spam detection
Link Analysis 3 Outline ▪ PageRank ▪ Topic-Sensitive PageRank ▪ Hubs and Authorities ▪ Spam Detection

Link Analysis PageRank Ranking web pages Web pages are not equally important www.joe-schmoe.comvwww.stanford.edu Inlinks as votes www.stanford.eduhas23,400inlinks www.joe-schmoe.comhas1inlink Are all inlinks equal? Recursive question!
Link Analysis 4 Ranking web pages ▪ Web pages are not equally “important” ▪ www.joe-schmoe.com v www.stanford.edu ▪ Inlinks as votes ▪ www.stanford.edu has 23,400 inlinks ▪ www.joe-schmoe.com has 1 inlink ▪ Are all inlinks equal? ▪ Recursive question! PageRank

Link Analysis PageRank Simple recursive formulation Each link's vote is proportional to the importance of Its source page If page P with importance x has n outlinks, each link gets x/n votes Page p's own importance is the sum of the votes on its inlinks
Link Analysis 5 Simple recursive formulation ▪ Each link’s vote is proportional to the importance of its source page ▪ If page P with importance x has n outlinks, each link gets x/n votes ▪ Page P’s own importance is the sum of the votes on its inlinks PageRank

Link Analysis PageRank Simple flow"model The web in 1839 y/2 Yahoo y=y/2+a/2 a=y 2+m a/2 y m=a/2 m Amazon Soft a/2 a m
Link Analysis 6 Simple “flow” model The web in 1839 Yahoo Amazon M’soft y a m y/2 y/2 a/2 a/2 m y = y /2 + a /2 a = y /2 + m m = a /2 PageRank

Link Analysis PageRank Solving the flow equations 3 equations 3 unknowns, no constants No unique solution All solutions equivalent modulo scale factor Additional constraint forces uniqueness yta+m 1 ny=2/5,a=2/5,m=1/5 Gaussian elimination method works for small examples but we need a better method for large graphs
Link Analysis 7 Solving the flow equations ▪ 3 equations, 3 unknowns, no constants ▪ No unique solution ▪ All solutions equivalent modulo scale factor ▪ Additional constraint forces uniqueness ▪ y+a+m = 1 ▪ y = 2/5, a = 2/5, m = 1/5 ▪ Gaussian elimination method works for small examples, but we need a better method for large graphs PageRank

Link Analysis PageRank Matrix formulation Matrix M has one row and one column for each web page Suppose page j has n outlinks fj→ i, then m=1/n ■E|seM:=0 M is a column stochastic matrix Columns sum to 1 Suppose r is a vector with one entry per web page r; is the importance score of page i Call it the rank vector
Link Analysis 8 Matrix formulation ▪ Matrix M has one row and one column for each web page ▪ Suppose page j has n outlinks ▪ If j i, then Mij=1/n ▪ Else Mij=0 ▪ M is a column stochastic matrix ▪ Columns sum to 1 ▪ Suppose r is a vector with one entry per web page ▪ ri is the importance score of page i ▪ Call it the rank vector ▪ |r| = 1 PageRank

Link Analysis PageRank Example Suppose page j links to 3 pages, including i
Link Analysis 9 Example Suppose page j links to 3 pages, including i i j M r r = i 1/3 PageRank

Link Analysis PageRank Eigenvector formulation The flow equations can be written r= Mr So the rank vector is an eigenvector of the stochastic web matrix In fact, its first or principal eigenvector, with corresponding eigenvalue 1
Link Analysis 10 Eigenvector formulation ▪ The flow equations can be written r = Mr ▪ So the rank vector is an eigenvector of the stochastic web matrix ▪ In fact, its first or principal eigenvector, with corresponding eigenvalue 1 PageRank
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
- 《计算机仿真技术》课程电子教案(PPT教学课件)第一章 绪论.ppt
- 《计算机网络》课程教学资源(PPT课件讲稿)第6章 IP路由.ppt
- 《计算机原理及应用》课程教学资源(PPT课件讲稿)第8章 单片机的存储器的扩展.ppt
- 《算法设计》课程教学资源(PPT课件讲稿)Lecture 6 Graph Traversal.ppt
- 香港浸会大学:《Data Communications and Networking》课程教学资源(PPT讲稿)Chapter 3 Data Transmission.ppt
- 南京大学:Decidability、Complexity(P、NP、NPC)、Reduce(P NP NPC).pptx
- 《计算机文化基础》课程教学资源(PPT课件讲稿)第四章 电子表格系统Excel 2003.ppt
- 西安电子科技大学:《信息系统安全》课程教学资源(PPT课件讲稿)第三章 信息安全保障体系、第四章 物理安全.ppt
- 《计算机网络》课程电子教案(PPT课件讲稿)第2章 数据通信与广域网技术.ppt
- 《计算机网络与互联网 Computer Networks and Internets》课程电子教案(PPT课件讲稿)Part IV 局域网 Local Area Networks(LANs).ppt
- 《人工智能导论》课程教学资源(PPT课件讲稿)群智能(Swarm Intelligence).ppt
- 山东大学:《微机原理及单片机接口技术》课程教学资源(PPT课件讲稿)第六章 中断 §6.1 中断的概念 §6.2 单片机的中断系统及其管理.ppt
- 3D computer vision techniques v.4b2 1.ppt
- 上海交通大学:《计算机控制技术》课程教学资源(PPT课件)第一章 计算机控制系统概述 Computer Control Technology.ppt
- 电子工业出版社:《计算机网络》课程教学资源(第五版,PPT课件讲稿)第一章 概述.ppt
- 《ARM原理与设计》课程教学资源(PPT课件讲稿)Lecture 04 Cortex M3指令集.pptx
- 西安电子科技大学:《微机原理与接口技术》课程教学资源(PPT课件讲稿)第八章 中断系统与可编程中断控制器8259A.pptx
- 《软件工程》课程教学资源(PPT课件讲稿)需求分析.ppt
- 《Data Warehousing & Data Mining》课程教学资源(PPT讲稿)Ch 2 Discovering Association Rules.ppt
- Microsoft .NET(PPT课件讲稿)Being Objects and A Glimpse into Coding.pptx
- 香港中文大学:《Probability and Statistics for Engineers》课程教学资源(PPT课件讲稿)Chapter 09 Classical Staistical Inference.pptx
- 西安交通大学:《微机原理与接口技术》课程教学资源(PPT课件讲稿)第6章 数字量输入输出接口(主讲:桂小林).ppt
- 《软件工程》课程教学资源(PPT课件)Lecture 6 设计概念和原则 Design Concepts and Principles.ppt
- 《网络编程实用教程》课程教学资源(PPT课件讲稿)第2章 套接字网络编程基础.ppt
- 《现代操作系统 Modern Operating Systems》课程教学资源(PPT课件讲稿,Third Edition)Chapter 3 内存管理 Memory Management.ppt
- 《数据结构》课程教学资源(PPT课件讲稿)第四章 串.ppt
- 东北大学:《计算机图形学》课程教学资源(PPT课件讲稿,主讲:闻时光).ppt
- 上海交通大学:超立方体 Hypercube(PPT讲稿)Low-Diameter Architectures.ppt
- 《数据结构》课程教学资源(PPT课件讲稿)第五章 多维数组与广义表.ppt
- 西南交通大学:《网络性能评估与测试 Network Performance Evaluation and Testing》(PPT课件讲稿)第2讲 网络测试技术基础(主讲:张新有).ppt
- 《Photoshop CS教程》教学资源(PPT课件)第7章 编辑文字.ppt
- 《编译原理》课程教学资源(PPT课件讲稿)语法制导的翻译(Syntax-Directed Translation).pptx
- 电子科技大学:《密码理论》课程教学资源(PPT课件讲稿)第2章 流密码.ppt
- 搜索引擎技术(PPT讲稿)Web Spam.ppt
- 四川大学:《计算机操作系统 Operating System Principles》课程教学资源(PPT课件讲稿)第1章 导论(主讲:段磊).ppt
- 赣南师范大学:《计算机网络原理》课程教学资源(PPT课件讲稿)第七章 网络层.ppt
- 《人工智能》课程电子教案(PPT课件讲稿)第9章 机器学习与知识发现.ppt
- 《数字图像处理》课程教学资源(PPT课件讲稿)第7章 图像分割.ppt
- 《编译原理》课程教学资源(PPT课件讲稿)第五章 语法制导的翻译 5.1 语法制导的定义 5.2 S属性定义的自下而上计算.ppt
- 四川大学:《操作系统 Operating System》课程教学资源(PPT课件讲稿)Chapter 5 互斥与同步(Mutual Exclusion and Synchronization)5.3 Semaphores.ppt