Chongqing University: "Data Warehouse and Data Mining" course PPT slides (English version), Chapter 8 Cluster Analysis: Basic Concepts and Methods

Chapter 8. Cluster Analysis: Basic Concepts and Methods
◼ Cluster Analysis: Basic Concepts
◼ Partitioning Methods
◼ Hierarchical Methods
◼ Density-Based Methods
◼ Grid-Based Methods
◼ Evaluation of Clustering
◼ Summary

What is Cluster Analysis?
◼ Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
[Figure: intra-cluster distances are minimized; inter-cluster distances are maximized]
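
The two quantities named in the figure can be computed directly on a toy example. The following is a minimal sketch, assuming Euclidean distance and a hand-made 2-D data set; the point values and the function names `mean_intra_cluster_distance` and `mean_inter_cluster_distance` are illustrative and do not come from the slides.

```python
from itertools import combinations
from math import dist

# Toy 2-D data: two hand-made groups (illustrative values, not from the slides).
clusters = {
    "A": [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)],
    "B": [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0)],
}


def mean_intra_cluster_distance(clusters):
    """Average Euclidean distance between pairs of points in the same cluster."""
    distances = [
        dist(p, q)
        for points in clusters.values()
        for p, q in combinations(points, 2)
    ]
    return sum(distances) / len(distances)


def mean_inter_cluster_distance(clusters):
    """Average Euclidean distance between pairs of points in different clusters."""
    distances = [
        dist(p, q)
        for a, b in combinations(clusters.values(), 2)
        for p in a
        for q in b
    ]
    return sum(distances) / len(distances)


intra = mean_intra_cluster_distance(clusters)
inter = mean_inter_cluster_distance(clusters)
print(f"mean intra-cluster distance: {intra:.2f}")
print(f"mean inter-cluster distance: {inter:.2f}")
```

For a grouping that matches the structure of the data, the intra-cluster value should come out much smaller than the inter-cluster value.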

What is Cluster Analysis?
◼ Cluster: a collection of data objects
  ◼ similar (or related) to one another within the same group
  ◼ dissimilar (or unrelated) to the objects in other groups
◼ Cluster analysis (or clustering, data segmentation, …)
  ◼ Finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters
◼ Unsupervised learning: no predefined classes (i.e., learning by observations, as opposed to supervised learning by examples)
◼ Typical applications
  ◼ As a stand-alone tool to get insight into data distribution
  ◼ As a preprocessing step for other algorithms
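
To make "grouping without predefined classes" concrete, here is a minimal sketch using k-means, a common partitioning algorithm that this slide does not itself describe. The data points, the starting centroids, and the function name `kmeans` are assumptions chosen for illustration; note that no class labels are supplied anywhere.

```python
from math import dist


def kmeans(points, initial_centroids, max_iterations=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = list(initial_centroids)
    groups = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: each point joins the group of its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else centroids[idx]
            for idx, group in enumerate(groups)
        ]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return groups, centroids


# Unlabeled toy data (illustrative values).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
# Deliberately poor starting centroids: both lie in the lower-left group.
groups, centroids = kmeans(points, initial_centroids=[(1.0, 1.0), (1.0, 2.0)])
for group in groups:
    print(group)
```

The result depends on the starting centroids in general; on this small data set the two visible groups are recovered even from a poor start.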

Clustering for Data Understanding and Applications
◼ Biology: taxonomy of living things: kingdom, phylum, class, order, family, genus and species
◼ Information retrieval: document clustering
◼ Land use: identification of areas of similar land use in an earth observation database
◼ Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
◼ City planning: identifying groups of houses according to their house type, value, and geographical location
◼ Earthquake studies: observed earthquake epicenters should be clustered along continental faults
◼ Climate: understanding Earth's climate; finding patterns in the atmosphere and ocean
◼ Economic science: market research

Clustering as a Preprocessing Tool (Utility)
◼ Summarization
  ◼ Preprocessing for regression, PCA, classification, and association analysis
◼ Compression
  ◼ Image processing: vector quantization
◼ Finding K-nearest neighbors
  ◼ Localizing search to one or a small number of clusters
◼ Outlier detection
  ◼ Outliers are often viewed as those “far away” from any cluster
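
The outlier-detection idea can be sketched as a distance check against cluster centroids. This is only one possible reading of “far away”: it assumes Euclidean distance, clusters summarized by their centroids, and a hand-picked cutoff. The `threshold` value and the helper names are hypothetical.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def nearest_centroid_distance(point, centroids):
    """Distance from a point to the closest cluster centroid."""
    return min(dist(point, c) for c in centroids)


def find_outliers(points, centroids, threshold):
    """Points whose nearest centroid is farther away than the threshold."""
    return [p for p in points if nearest_centroid_distance(p, centroids) > threshold]


# Two already-formed clusters (illustrative values).
cluster_a = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
cluster_b = [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids = [centroid(cluster_a), centroid(cluster_b)]

# One extra observation that sits far from both clusters.
candidates = cluster_a + cluster_b + [(5.0, 15.0)]
outliers = find_outliers(candidates, centroids, threshold=3.0)  # cutoff is an assumption
print(outliers)
```

In practice the cutoff would be chosen from the data, for example relative to the typical spread inside each cluster.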

Applications of Cluster Analysis
◼ Understanding
  ◼ Group related documents for browsing, group genes and proteins that have similar functionality, or group stocks with similar price fluctuations
◼ Summarization
  ◼ Reduce the size of large data sets
[Figure: clustering precipitation in Australia]

Clustering: Rich Applications and Multidisciplinary Efforts
◼ Pattern recognition
◼ Spatial data analysis
  ◼ Create thematic maps in GIS by clustering feature spaces
  ◼ Detect spatial clusters, or support other spatial mining tasks
◼ Image processing
◼ Economic science (especially market research)
◼ WWW
  ◼ Document classification
  ◼ Cluster Weblog data to discover groups of similar access patterns

Quality: What Is Good Clustering?
◼ A good clustering method will produce high quality clusters
  ◼ high intra-class similarity: cohesive within clusters
  ◼ low inter-class similarity: distinctive between clusters
◼ The quality of a clustering method depends on
  ◼ the similarity measure used by the method
  ◼ its implementation, and
  ◼ its ability to discover some or all of the hidden patterns
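
One way to put a number on "cohesive within clusters, distinctive between clusters" is the silhouette coefficient. The slide does not name this measure; it is used here only as an example of combining both criteria into one score, under the assumption of Euclidean distance. The data and the two labelings are illustrative.

```python
from math import dist
from statistics import fmean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (between -1 and 1)."""
    cluster_ids = set(labels)
    if len(cluster_ids) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the point's own cluster.
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention for single-member clusters
            continue
        a = fmean(own)
        # b: mean distance to the members of the closest other cluster.
        b = min(
            fmean([dist(p, q) for q, lab in zip(points, labels) if lab == other])
            for other in cluster_ids if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return fmean(scores)


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
good_labels = [0, 0, 0, 1, 1, 1]  # follows the two visible groups
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_score = silhouette_score(points, good_labels)
bad_score = silhouette_score(points, bad_labels)
print(f"good labeling: {good_score:.2f}")
print(f"bad labeling:  {bad_score:.2f}")
```

A score near 1 indicates cohesive, well-separated clusters; a score near 0 or below suggests that points sit as close to another cluster as to their own.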

What is not Cluster Analysis?
◼ Supervised classification
  ◼ Has class label information
◼ Simple segmentation
  ◼ Dividing students into different registration groups alphabetically, by last name
◼ Results of a query
  ◼ Groupings are a result of an external specification
◼ Graph partitioning
  ◼ Some mutual relevance and synergy, but the areas are not identical

Measure the Quality of Clustering
◼ Dissimilarity/similarity metric
  ◼ Similarity is expressed in terms of a distance function, typically a metric: d(i, j)
  ◼ The definitions of distance functions are usually rather different for interval-scaled, boolean, categorical, ordinal, ratio, and vector variables
  ◼ Weights should be associated with different variables based on applications and data semantics
◼ Quality of clustering
  ◼ There is usually a separate “quality” function that measures the “goodness” of a cluster
  ◼ It is hard to define “similar enough” or “good enough”
  ◼ The answer is typically highly subjective
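
The effect of per-variable weights on d(i, j) can be shown with a small example. This is a sketch under stated assumptions: it uses a weighted Euclidean distance for numeric variables only, and the customer records, the weights, and the function name `weighted_euclidean` are all hypothetical.

```python
from math import sqrt


def weighted_euclidean(x, y, weights):
    """d(i, j) = sqrt(sum_k w_k * (x_k - y_k)^2), with non-negative weights."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Hypothetical customers described by (age in years, yearly income in thousands).
i = (30.0, 50.0)
j = (32.0, 90.0)
k = (60.0, 52.0)

equal_weights = (1.0, 1.0)
age_heavy_weights = (1.0, 0.01)  # down-weights income; the values are an assumption

for name, w in [("equal", equal_weights), ("age-heavy", age_heavy_weights)]:
    d_ij = weighted_euclidean(i, j, w)
    d_ik = weighted_euclidean(i, k, w)
    print(f"{name}: d(i, j) = {d_ij:.2f}, d(i, k) = {d_ik:.2f}")
```

With equal weights, customer k is the closer neighbor of i because the income gap to j dominates; with income down-weighted, j becomes the closer one. The same data can therefore cluster differently depending on how the application weights its variables.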