Beihang University: Data Mining - Concepts and Techniques, course teaching resources (PPT lecture slides), Chapter 02 Getting to Know Your Data

Chapter 2: Getting to Know Your Data
◼ Data Objects and Attribute Types
◼ Basic Statistical Descriptions of Data
◼ Data Visualization
◼ Measuring Data Similarity and Dissimilarity
◼ Summary

Types of Data Sets
◼ Record
  ◼ Relational records
  ◼ Data matrix, e.g., numerical matrix, crosstabs
  ◼ Document data: text documents, term-frequency vector
  ◼ Transaction data
◼ Graph and network
  ◼ World Wide Web
  ◼ Social or information networks
  ◼ Molecular structures
◼ Ordered
  ◼ Video data: sequence of images
  ◼ Temporal data: time-series
  ◼ Sequential data: transaction sequences
  ◼ Genetic sequence data
◼ Spatial, image and multimedia
  ◼ Spatial data: maps
  ◼ Image data
  ◼ Video data

Document data (term-frequency vectors):

             team  coach  play  ball  score  game  win  lost  timeout  season
Document 1      3      0     5     0      2     6    0     2        0       2
Document 2      0      7     0     2      1     0    0     3        0       0
Document 3      0      1     0     0      1     2    2     0        3       0

Transaction data:

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk
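
As a rough illustration of two of the record types above, the sketch below builds a term-frequency vector from a text document and queries transaction data stored as item sets. The vocabulary, the sample sentence, and the function names are assumptions made for this example; the transactions are the five listed on the slide.

```python
from collections import Counter

# Hypothetical vocabulary, chosen only for illustration.
VOCABULARY = ["team", "coach", "play", "ball", "score", "game"]


def term_frequency_vector(text, vocabulary=VOCABULARY):
    """Count how often each vocabulary term occurs in a whitespace-split text."""
    counts = Counter(text.lower().split())
    # Counter returns 0 for terms that never occur.
    return [counts[term] for term in vocabulary]


# Transaction data from the slide: one set of items per transaction ID (TID).
TRANSACTIONS = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}


def transactions_containing(item, transactions=TRANSACTIONS):
    """Return the sorted TIDs of all transactions that contain the item."""
    return sorted(tid for tid, items in transactions.items() if item in items)


if __name__ == "__main__":
    print(term_frequency_vector("team play ball team score game play team"))
    print(transactions_containing("Milk"))
```

Each document becomes a fixed-length numeric vector, so documents can be compared like rows of a data matrix, while transactions keep a variable-length set of items per record.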

Important Characteristics of Structured Data
◼ Dimensionality
  ◼ Curse of dimensionality
◼ Sparsity
  ◼ Only presence counts
◼ Resolution
  ◼ Patterns depend on the scale
◼ Distribution
  ◼ Centrality and dispersion
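
One way to read "only presence counts" is that a sparse vector can be stored by its non-zero entries alone. The sketch below is a minimal illustration under that reading; the sample vector and the function names are assumptions, not part of the slides.

```python
def to_sparse(vector):
    """Keep only the non-zero entries, as a mapping {index: value}."""
    return {index: value for index, value in enumerate(vector) if value != 0}


def sparsity(vector):
    """Fraction of entries that are zero (expects a non-empty vector)."""
    return sum(1 for value in vector if value == 0) / len(vector)


if __name__ == "__main__":
    # Hypothetical term-frequency vector with mostly zero entries.
    dense = [0, 7, 0, 2, 1, 0, 0, 3, 0, 0]
    print(to_sparse(dense))
    print(sparsity(dense))
```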

Data Objects
◼ Data sets are made up of data objects.
◼ A data object represents an entity.
◼ Examples:
  ◼ sales database: customers, store items, sales
  ◼ medical database: patients, treatments
  ◼ university database: students, professors, courses
◼ Also called samples, examples, instances, data points, objects, tuples.
◼ Data objects are described by attributes.
◼ Database rows -> data objects; columns -> attributes

Attributes
◼ Attribute (or dimension, feature, variable): a data field representing a characteristic or feature of a data object.
  ◼ E.g., customer_ID, name, address
◼ Types:
  ◼ Nominal
  ◼ Binary
  ◼ Ordinal
  ◼ Numeric: quantitative
    ◼ Interval-scaled
    ◼ Ratio-scaled

Attribute Types
◼ Nominal: categories, states, or "names of things"
  ◼ Hair_color = {auburn, black, blond, brown, grey, red, white}
  ◼ Marital status, occupation, ID numbers, zip codes
◼ Binary
  ◼ Nominal attribute with only 2 states (0 and 1)
  ◼ Symmetric binary: both outcomes equally important
    ◼ e.g., gender
  ◼ Asymmetric binary: outcomes not equally important
    ◼ e.g., medical test (positive vs. negative)
    ◼ Convention: assign 1 to the most important outcome (e.g., HIV positive)
◼ Ordinal
  ◼ Values have a meaningful order (ranking), but the magnitude between successive values is not known.
  ◼ Size = {small, medium, large}, grades, army rankings
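
The distinctions above determine which operations make sense on a value. The following sketch is one possible illustration: nominal values support only equality tests, ordinal values can be compared by rank, and an asymmetric binary attribute codes the more important outcome as 1. All names here are assumptions introduced for the example.

```python
# Ordinal: the list order defines the ranking; gaps between ranks have no magnitude.
SIZE_ORDER = ["small", "medium", "large"]

# Asymmetric binary: by convention the more important outcome is coded as 1.
TEST_RESULT_CODES = {"positive": 1, "negative": 0}


def same_category(a, b):
    """Nominal values are labels, so equality is the only meaningful comparison."""
    return a == b


def is_larger(size_a, size_b):
    """Compare two ordinal values by their rank in SIZE_ORDER."""
    return SIZE_ORDER.index(size_a) > SIZE_ORDER.index(size_b)


def encode_test_result(result):
    """Map a medical test outcome to 0/1; unknown labels raise KeyError."""
    return TEST_RESULT_CODES[result]


if __name__ == "__main__":
    print(same_category("black", "brown"))
    print(is_larger("large", "small"))
    print(encode_test_result("positive"))
```

Subtracting ranks (for example, large minus small) would produce a number, but that number has no agreed meaning, which is why the sketch only exposes an order comparison.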

Numeric Attribute Types
◼ Quantity (integer or real-valued)
◼ Interval
  ◼ Measured on a scale of equal-sized units
  ◼ Values have order
    ◼ E.g., temperature in °C or °F, calendar dates
  ◼ No true zero-point
◼ Ratio
  ◼ Inherent zero-point
  ◼ We can speak of one value as being a multiple (or ratio) of another (10 K is twice as high as 5 K).
    ◼ E.g., temperature in Kelvin, length, counts, monetary quantities
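
A small worked example may help separate the two scales. Differences are meaningful on an interval scale such as Celsius, but ratios are only meaningful on a ratio scale such as Kelvin. The sketch below assumes the standard conversion K = °C + 273.15; the variable names and sample temperatures are chosen for illustration.

```python
import math


def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15


warm_c, cool_c = 10.0, 5.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Differences agree on both scales (up to floating-point rounding).
difference_c = warm_c - cool_c
difference_k = warm_k - cool_k

# The Celsius ratio is computable but has no physical meaning,
# because 0 °C is not a true zero-point.
naive_ratio_c = warm_c / cool_c

# The Kelvin ratio is meaningful: 10 °C is only slightly hotter than 5 °C.
ratio_k = warm_k / cool_k

if __name__ == "__main__":
    print(math.isclose(difference_c, difference_k))
    print(naive_ratio_c, ratio_k)
```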

Discrete vs. Continuous Attributes
◼ Discrete Attribute
  ◼ Has only a finite or countably infinite set of values
    ◼ E.g., zip codes, profession, or the set of words in a collection of documents
  ◼ Sometimes represented as integer variables
  ◼ Note: Binary attributes are a special case of discrete attributes
◼ Continuous Attribute
  ◼ Has real numbers as attribute values
    ◼ E.g., temperature, height, or weight
  ◼ Practically, real values can only be measured and represented using a finite number of digits
  ◼ Continuous attributes are typically represented as floating-point variables
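
The point about finite digits has a practical consequence: floating-point values should usually be compared with a tolerance, while discrete values can be compared exactly. The sketch below is a minimal illustration; the helper name and sample values are assumptions.

```python
import math


def approximately_equal(a, b, rel_tol=1e-9):
    """Compare two continuous values with a relative tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol)


# Continuous attribute stored as floating point: 0.1 and 0.2 have no exact
# binary representation, so their sum is not exactly 0.3.
total = 0.1 + 0.2
exact_match = total == 0.3
close_match = approximately_equal(total, 0.3)

# Discrete attribute stored as integers: exact comparison is safe.
count_match = (1 + 2) == 3

if __name__ == "__main__":
    print(exact_match, close_match, count_match)
```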

Basic Statistical Descriptions of Data
◼ Motivation
  ◼ To better understand the data: central tendency, variation and spread
◼ Data dispersion characteristics
  ◼ Median, max, min, quantiles, outliers, variance, etc.
◼ Numerical dimensions correspond to sorted intervals
  ◼ Data dispersion: analyzed with multiple granularities of precision
  ◼ Boxplot or quantile analysis on sorted intervals
◼ Dispersion analysis on computed measures
  ◼ Folding measures into numerical dimensions
  ◼ Boxplot or quantile analysis on the transformed cube
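
The dispersion measures listed above can be sketched with Python's standard `statistics` module. The example computes a five-number summary (the basis of a boxplot) and flags outliers with the common 1.5 × IQR rule. The sample values, the function names, and the choice of the "inclusive" quantile method are assumptions for illustration; other quantile conventions give slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return min, Q1, median, Q3 and max of a list of numbers."""
    data = sorted(values)
    # quantiles(n=4) returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    summary = five_number_summary(values)
    iqr = summary["q3"] - summary["q1"]
    low = summary["q1"] - k * iqr
    high = summary["q3"] + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical sample values (for example, salaries in thousands).
    sample = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]
    print(five_number_summary(sample))
    print("variance:", statistics.variance(sample))
    print("outliers:", iqr_outliers(sample))
```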