Tongji University: Big Data Analysis and Mining (大数据分析与数据挖掘), course teaching resources (PPT lecture slides): Getting to Know Your Data

Big Data Analysis and Mining
Lecture 2: Getting to Know Your Data
Weixiong Rao (饶卫雄)
School of Software Engineering, Tongji University (同济大学软件学院)
2015 Fall
wxrao@tongji.edu.cn
2021/2/9

Getting to Know Your Data
◼ Data Objects and Attribute Types
◼ Basic Statistical Descriptions of Data
◼ Data Visualization
◼ Measuring Data Similarity and Dissimilarity
◼ Summary

Types of Data Sets
◼ Record
  ◆ Relational records
  ◆ Data matrix, e.g., numerical matrix, crosstabs
  ◆ Document data: text documents represented as term-frequency vectors
  ◆ Transaction data
◼ Graph and network
  ◆ World Wide Web
  ◆ Social or information networks
  ◆ Molecular structures
◼ Ordered
  ◆ Video data: sequence of images
  ◆ Temporal data: time series
  ◆ Sequential data: transaction sequences
  ◆ Genetic sequence data
◼ Spatial, image and multimedia
  ◆ Spatial data: maps
  ◆ Image data
  ◆ Video data

Document data (term-frequency vectors):

            team  coach  play  ball  score  game  win  lost  timeout  season
Document 1     3      0     5     0      2     6    0     2        0       2
Document 2     0      7     0     2      1     0    0     3        0       0
Document 3     0      1     0     0      1     2    2     0        3       0

Transaction data:

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk
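As a rough illustration of two of the record types above, the sketch below builds a term-frequency vector for one document and stores transaction data as item sets. The vocabulary borrows six terms from the slide, the sample sentence is invented for the example, and the function name `term_frequency_vector` is hypothetical.

```python
from collections import Counter

# Vocabulary taken from the slide's column labels (first six terms only).
VOCABULARY = ["team", "coach", "play", "ball", "score", "game"]


def term_frequency_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in a whitespace-split text."""
    counts = Counter(text.lower().split())
    # Counter returns 0 for terms that never occur.
    return [counts[term] for term in vocabulary]


# Invented sample document, used only for illustration.
doc = "team play team score game play game game"
tf_vector = term_frequency_vector(doc, VOCABULARY)
print(tf_vector)

# Transaction data from the slide: each TID maps to a set of purchased items.
transactions = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}

# Number of transactions that contain Milk.
milk_count = sum(1 for items in transactions.values() if "Milk" in items)
print(milk_count)
```

Both representations keep one record per data object; they differ in whether a record is a fixed-length numeric vector or a variable-size set of items.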

Important Characteristics of Structured Data
◼ Dimensionality
  ◆ Curse of dimensionality
◼ Sparsity
  ◆ Only presence counts
◼ Resolution
  ◆ Patterns depend on the scale
◼ Distribution
  ◆ Centrality and dispersion

Data Objects
◼ Data sets are made up of data objects.
◼ A data object represents an entity.
◼ Examples:
  ◆ sales database: customers, store items, sales
  ◆ medical database: patients, treatments
  ◆ university database: students, professors, courses
◼ Also called samples, examples, instances, data points, objects, tuples.
◼ Data objects are described by attributes.
◼ Database rows -> data objects; columns -> attributes
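The row/column correspondence can be sketched in a few lines. The table below is a hypothetical university example (the attribute names and student records are invented), meant only to show that each row becomes one data object and each column one attribute.

```python
# Hypothetical attribute names (columns) for a university table.
attributes = ("student_id", "name", "major")

# Hypothetical rows; each tuple is one database row.
rows = [
    (1, "Alice", "Software Engineering"),
    (2, "Bob", "Statistics"),
]

# Each row becomes a data object: a mapping from attribute name to value.
data_objects = [dict(zip(attributes, row)) for row in rows]

# A column view collects the values of one attribute across all objects.
names = [obj["name"] for obj in data_objects]

print(data_objects[0])
print(names)
```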

Attributes
◼ Attribute (or dimension, feature, variable): a data field representing a characteristic or feature of a data object.
  ◆ E.g., customer_ID, name, address
◼ Types:
  ◆ Nominal
  ◆ Binary
  ◆ Numeric: quantitative
    □ Interval-scaled
    □ Ratio-scaled

Attribute Types
◼ Nominal: categories, states, or “names of things”
  ◆ Hair_color = {auburn, black, blond, brown, grey, red, white}
  ◆ marital status, occupation, ID numbers, zip codes
◼ Binary
  ◆ Nominal attribute with only 2 states (0 and 1)
  ◆ Symmetric binary: both outcomes equally important
    □ e.g., gender
  ◆ Asymmetric binary: outcomes not equally important
    □ e.g., medical test (positive vs. negative)
    □ Convention: assign 1 to the most important outcome (e.g., HIV positive)
◼ Ordinal
  ◆ Values have a meaningful order (ranking), but the magnitude between successive values is not known.
  ◆ Size = {small, medium, large}, grades, army rankings
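One way to see the difference between these types is to look at which comparisons make sense for each. The sketch below is an assumption-laden illustration, not a standard API: the helper names `same_category`, `is_larger`, and `encode_test_result` are hypothetical, and the rank mapping only encodes order, not distance.

```python
# Ordinal attribute: the order of the values is meaningful.
SIZE_ORDER = ["small", "medium", "large"]
SIZE_RANK = {value: rank for rank, value in enumerate(SIZE_ORDER)}


def same_category(a, b):
    """Nominal attributes support only equality tests."""
    return a == b


def is_larger(a, b):
    """Ordinal attributes additionally support order comparisons via ranks."""
    return SIZE_RANK[a] > SIZE_RANK[b]


def encode_test_result(result):
    """Asymmetric binary: code the more important outcome (positive) as 1."""
    return 1 if result == "positive" else 0


print(same_category("black", "blond"))
print(is_larger("large", "small"))
print(encode_test_result("positive"), encode_test_result("negative"))
```

The ranks 0, 1, 2 should not be read as saying that "large" is twice "medium"; for an ordinal attribute only the ordering carries information.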

Numeric Attribute Types
◼ Quantity (integer or real-valued)
◼ Interval
  ◆ Measured on a scale of equal-sized units
  ◆ Values have order
    ➢ E.g., temperature in °C or °F, calendar dates
  ◆ No true zero-point
◼ Ratio
  ◆ Inherent zero-point
  ◆ We can speak of one value as being a multiple of another (10 K is twice as high as 5 K).
    ➢ E.g., temperature in Kelvin, length, counts, monetary quantities
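A small numeric check can make the interval/ratio distinction concrete. The sketch below uses the standard conversion K = °C + 273.15; the function name `celsius_to_kelvin` is just a label chosen for this example. Differences survive the change of scale, while ratios computed on the Celsius scale do not.

```python
import math


def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin (K = C + 273.15)."""
    return celsius + 273.15


# Differences are meaningful on an interval scale: same gap on both scales.
diff_celsius = 10.0 - 5.0
diff_kelvin = celsius_to_kelvin(10.0) - celsius_to_kelvin(5.0)
print(math.isclose(diff_celsius, diff_kelvin))

# Ratios are only meaningful on a ratio scale with a true zero.
ratio_celsius = 10.0 / 5.0  # numerically 2.0, but physically meaningless
ratio_kelvin = celsius_to_kelvin(10.0) / celsius_to_kelvin(5.0)
print(ratio_celsius, round(ratio_kelvin, 3))
```

On the Kelvin scale, 10 °C is only about 1.8% hotter than 5 °C, which is why "twice as warm" is not a valid statement for Celsius values.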

Discrete vs. Continuous Attributes
◼ Discrete Attribute
  ◆ Has only a finite or countably infinite set of values
    □ E.g., zip codes, profession, or the set of words in a collection of documents
  ◆ Sometimes represented as integer variables
  ◆ Note: Binary attributes are a special case of discrete attributes
◼ Continuous Attribute
  ◆ Has real numbers as attribute values
    □ E.g., temperature, height, or weight
  ◆ Practically, real values can only be measured and represented using a finite number of digits
  ◆ Continuous attributes are typically represented as floating-point variables
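The point about finite digits shows up directly in floating-point arithmetic. The following sketch assumes Python floats are IEEE 754 double-precision values, which holds on mainstream platforms; it illustrates why continuous attributes are usually compared with a tolerance rather than with exact equality.

```python
import math

# 0.1 and 0.2 have no exact binary representation, so their sum is
# only an approximation of 0.3.
total = 0.1 + 0.2

exactly_equal = (total == 0.3)            # exact comparison
approx_equal = math.isclose(total, 0.3)   # comparison with a tolerance

print(exactly_equal, approx_equal)
```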
