Tongji University: Big Data Analysis and Mining (大数据分析与数据挖掘), course teaching resources (PPT lecture slides): Clustering Basics (lecturer: 赵钦佩)

Outline
◼ Cluster Basics
◼ Clustering algorithms
  ◆ Hierarchical clustering
  ◆ k-means
  ◆ Expectation-Maximization (EM)
◼ Cluster Validity
  ◆ Determining the number of clusters
  ◆ Clustering evaluation

Clustering Analysis
◼ Definition: grouping data with similar features (物以类聚,人以群居: "birds of a feather flock together")
◼ It is a method of data exploration: a way of looking for patterns or structure of interest in the data.
◼ Properties: unsupervised; parameters needed
◼ Application fields: machine learning, pattern recognition, image analysis, data mining, information retrieval, bioinformatics, etc.
[Figure: k-means animation]
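The two properties named on this slide, "unsupervised" and "parameter needed", can be illustrated with a minimal k-means sketch. It assumes Euclidean distance, points given as tuples, and a deliberately simple deterministic initialisation (evenly spaced data points); practical implementations usually initialise randomly. The function name `kmeans` and the toy data are illustrative and not taken from the slides.

```python
import math


def kmeans(points, k, iterations=20):
    """Minimal k-means sketch: no labels are given, but k must be chosen."""
    n = len(points)
    # Simplified initialisation (an assumption): evenly spaced data points.
    centroids = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
```

No class labels are supplied anywhere (unsupervised), yet the caller has to pick `k` (parameter needed).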

Factors of Clustering
◼ What data could be used in clustering?
  ◆ Large or small, Gaussian or non-Gaussian, etc.
◼ Which clustering algorithm? (cost function)
  ◆ Partition-based (e.g. k-means)
  ◆ Model-based (e.g. EM algorithm)
  ◆ Density-based (e.g. DBSCAN)
  ◆ Genetic, spectral, ...
◼ Choosing (dis)similarity measures: a critical step in clustering
  ◆ Euclidean distance, ...
  ◆ Pearson linear correlation, ...
◼ How to evaluate the clustering result? (cluster validity)

Quality: What Is Good Clustering?
◼ A good clustering method will produce high-quality clusters with
  ◆ high intra-class similarity
  ◆ low inter-class similarity
◼ The quality of a clustering result depends on both the similarity measure used by the method and its implementation
◼ The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns
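One rough way to make "high intra-class similarity, low inter-class similarity" concrete is to compare the average pairwise distance within clusters with the average distance between clusters. The sketch below assumes Euclidean distance as the dissimilarity; the function name `mean_intra_inter` and the toy points are hypothetical.

```python
import math
from itertools import combinations


def mean_intra_inter(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Assumes at least one same-cluster pair and one cross-cluster pair exist.
    """
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (intra if lp == lq else inter).append(math.dist(p, q))
    return sum(intra) / len(intra), sum(inter) / len(inter)


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = mean_intra_inter(points, [0, 0, 1, 1])  # neighbours grouped together
bad = mean_intra_inter(points, [0, 1, 0, 1])   # neighbours split apart
```

For the sensible grouping the within-cluster mean is much smaller than the between-cluster mean; for the poor grouping the relation is reversed.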

Requirements of clustering in data mining (1)
◼ Scalability
◼ Ability to deal with different types of attributes
◼ Discovery of clusters with arbitrary shape
◼ Minimal requirements for domain knowledge to determine input parameters

Requirements of clustering in data mining (2)
◼ Ability to deal with noise and outliers
◼ Insensitivity to order of input records
◼ High dimensionality
◼ Incorporation of user-specified constraints
◼ Interpretability and usability

Similarity and dissimilarity
◼ There is no single definition of similarity or dissimilarity between data objects
◼ The definition of similarity or dissimilarity between objects depends on
  ◆ the type of the data considered
  ◆ what kind of similarity we are looking for

Similarity and dissimilarity
◼ Similarity/dissimilarity between objects is often expressed in terms of a distance measure $d(x, y)$
◼ Ideally, every distance measure should be a metric, i.e., it should satisfy the following conditions:
  1. $d(x, y) \ge 0$
  2. $d(x, y) = 0$ iff $x = y$
  3. $d(x, y) = d(y, x)$
  4. $d(x, z) \le d(x, y) + d(y, z)$
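The four metric conditions can be spot-checked numerically on a finite sample of points. The sketch below is only an illustration: the helper `is_metric_on` is hypothetical, and passing the check is evidence rather than proof, since the conditions must hold for all inputs.

```python
import math
from itertools import product


def is_metric_on(points, d, tol=1e-12):
    """Check the four metric conditions for d on a finite sample of points."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) < -tol:                       # 1. non-negativity
            return False
        if (d(x, y) <= tol) != (x == y):         # 2. zero iff x equals y
            return False
        if abs(d(x, y) - d(y, x)) > tol:         # 3. symmetry
            return False
        if d(x, z) > d(x, y) + d(y, z) + tol:    # 4. triangle inequality
            return False
    return True


def squared_euclidean(x, y):
    """Squared Euclidean distance: a dissimilarity that is not a metric."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


sample = [(0.0,), (1.0,), (2.0,)]
euclid_ok = is_metric_on(sample, math.dist)
squared_ok = is_metric_on(sample, squared_euclidean)
```

Euclidean distance passes on this sample, while squared Euclidean distance fails condition 4: the squared distance from 0 to 2 is 4, which exceeds 1 + 1.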

Euclidean distance
$$d_{euc}(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{D} (x_i - y_i)^2}$$
◼ Here $D$ is the number of dimensions in the data vector. For instance:
  ◆ RGB color channels in images
  ◆ Longitude and latitude in GPS data
[Figure: distance between x ∈ X and y ∈ Y, with n the number of dimensions, λ the degree of Minkowski, and W a weight vector (can be all 1): Minkowski, city block (λ = 1) and Euclidean distance formulas]
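A minimal sketch of the distance computation, written as a weighted Minkowski distance so that city block (degree 1) and Euclidean (degree 2) are special cases. Applying each weight to the powered absolute difference is an assumption made for illustration, as are the function name `minkowski` and the sample values.

```python
def minkowski(x, y, degree=2, weights=None):
    """Weighted Minkowski distance; degree=1 is city block, degree=2 Euclidean."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    if weights is None:
        weights = [1.0] * len(x)  # unweighted case: all weights equal to 1
    total = sum(w * abs(a - b) ** degree for w, a, b in zip(weights, x, y))
    return total ** (1.0 / degree)


# Two RGB colours (D = 3): only the green channel differs, by 165.
red = (255, 0, 0)
orange = (255, 165, 0)
d_rgb = minkowski(red, orange)

# The same pair of 2-D points under two degrees.
d_city_block = minkowski((0, 0), (3, 4), degree=1)  # |3| + |4|
d_euclidean = minkowski((0, 0), (3, 4), degree=2)   # sqrt(9 + 16)
```

The same pair of points is 7 apart under city block distance but 5 apart under Euclidean distance, so the choice of measure changes which objects count as close.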

Pearson Linear Correlation
$$\rho(\mathbf{x}, \mathbf{y}) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
◼ We are shifting the expression profiles down (subtracting the means) and scaling by the standard deviations (i.e., making the data have mean = 0 and std = 1)
◼ Always between -1 and +1 (perfectly anti-correlated and perfectly correlated)
[Figure: example scatter plots: no obvious relationship; a medium direct relationship; a strong inverse relationship]
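The correlation can be sketched in a few lines of plain Python: subtract the means, then divide the sum of products by the product of the root sums of squares. The function name `pearson` and the sample profiles are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]  # shift each profile to mean 0
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return cov / norm  # raises ZeroDivisionError for a constant profile


profile = [1.0, 2.0, 3.0, 4.0]
same_shape = pearson(profile, [10.0, 20.0, 30.0, 40.0])  # scaled copy
mirrored = pearson(profile, [4.0, 3.0, 2.0, 1.0])        # reversed trend
```

The scaled copy is far from `profile` in Euclidean distance, yet its correlation is +1, because Pearson correlation compares the shape of two profiles and ignores offset and scale.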