"Systems Software and Software Security" course slides (PPT lecture notes, in English): Lecture-2-access-patterns-in-big-data

The Weakening and Delayed Effects of Long Tail Distributions in Big Data Accesses

Big Data and Power Law
[Figure: number of hits to each data object (y-axis) versus popularity rank of each data object (x-axis)]
To the right (the yellow region) is the long tail of the lower 80% of objects; to the left are the few that dominate (the top 20% of objects). With limited space to store objects and limited ability to search a large volume of objects, most attention and hits have to go to the top 20% of objects, ignoring the long tail.

The Change of Time (short search latency) and Space (unlimited storage capacity) for Big Data Creates Different Data Access Distributions
[Figure: the traditional long tail distribution versus the flattened distribution after the long tail can be easily accessed]
• The head is lowered and the tail drops off more and more slowly
• If the flattened distribution is no longer a power law, what is it?

Distribution Changes in DVDs in Netflix, 2000 to 2011
[Figure: percentage axis from 0% to 80% plotted against title rank; legend text: 2000, 2005, 2011, predicted. Annotations: less demand for the top 500; more demand for the "middle"; longer tail (15% of demand comes from beyond rank 3,000, where brick-and-mortar retailers run out of inventory)]
• The growth of Netflix selections (today: 30 million US users, 40 million users total, 1/3 of the streaming traffic of the Internet)
– 2000: 4,500 DVDs; 2005: 18,000 DVDs
– 2011: over 100,000 DVDs (the long tail would drop even more slowly as demand grows)
– Note: "brick-and-mortar retailers" are physical, face-to-face shops

Amazon Case: Growth of Sales from the Changes of Time/Space
[Figure: sales in billions of dollars ($0.0 to $7.0) by year, 02 to 10, for Amazon North America Media Sales, Barnes & Noble Chain Store Sales, and Borders Chain Store Sales. Source: www.fonerbooks.com/booksale.htm. BN sales are shown without BN.com to contrast online vs. offline. Borders and BN fiscal years end in Q1 2011 and are shown as 2010.]

We Must Find the New Distribution for Big Data Accesses
• The Internet stores all kinds of huge big data sets
– The rapid growth and wide distribution of Internet media content is a representative case study of big data
– The media content is carried by scalable distributed systems
• We hope the distribution model developed
– Is general purpose for other applications of big data
– Reflects the scalable nature of both data and systems

Zipf distribution is believed to be the general model of data access patterns
• Zipf distribution (power law)
– Characterizes the property of scale invariance
– Heavy tailed, scale free
– $y_i \propto i^{-\alpha}$, where $i$ is the rank of an object, $y_i$ is its number of references, and $\alpha$ is 0.6~0.8
• 80-20 rule
– Income distribution: 80% of social wealth is owned by 20% of people (Pareto law)
– Web traffic: 80% of Web requests access 20% of pages (Breslau, INFOCOM'99)
• System implications
– Objectively caching the working set in a proxy
– Significantly reduces network traffic
[Figure: $y$ versus $i$ on linear axes shows a heavy tail; $\log y$ versus $\log i$ is a straight line with slope $-\alpha$]
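The Zipf model above can be checked numerically. The following is a minimal sketch, not taken from the slides: it builds normalized Zipf weights proportional to i^(-alpha), measures what fraction of references falls on the top 20% of ranked objects, and recovers the log-log slope by least squares. The function names and the catalog size of 10,000 objects are assumptions chosen for illustration.

```python
import math

def zipf_weights(n, alpha):
    """Normalized access probabilities proportional to i**(-alpha) for ranks 1..n."""
    raw = [i ** (-alpha) for i in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def top_share(weights, fraction=0.2):
    """Fraction of all references that go to the top `fraction` of ranked objects."""
    k = max(1, int(len(weights) * fraction))
    return sum(weights[:k])

def loglog_slope(weights):
    """Least-squares slope of log(y_i) against log(i); for pure Zipf this is -alpha."""
    xs = [math.log(i) for i in range(1, len(weights) + 1)]
    ys = [math.log(w) for w in weights]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

if __name__ == "__main__":
    # Hypothetical catalog of 10,000 objects; alpha values span the slide's 0.6~0.8 range.
    for alpha in (0.6, 0.7, 0.8):
        w = zipf_weights(10_000, alpha)
        print(f"alpha={alpha}: top-20% share={top_share(w):.2f}, "
              f"log-log slope={loglog_slope(w):.2f}")
```

The share captured by the top 20% grows with alpha, so how closely a workload matches the 80-20 rule depends on both the exponent and the catalog size used in the sketch.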

Does Internet media traffic follow Zipf's law?
• Web media systems
– Chesire, USITS'01: Zipf-like
– Cherkasova, NOSSDAV'02: non-Zipf
• VoD media systems
– Acharya, MMCN'00: non-Zipf
– Yu, EUROSYS'06: Zipf-like
• Live streaming and IPTV systems
– Veloso, IMW'02: Zipf-like
– Sripanidkulchai, IMC'04: non-Zipf
• P2P media systems
– Gummadi, SOSP'03: non-Zipf
– Iamnitchi, INFOCOM'04: Zipf-like

Inconsistent media access pattern models
• Still based on the Zipf model (heuristic assumptions)
– Zipf with exponential cutoff
– Zipf-Mandelbrot distribution
– Generalized Zipf-like distribution
– Two-mode Zipf distribution
– Fetch-at-most-once effect
– Parabolic fractal distribution
– ...
• All case studies
– Are based on one or two workloads
– Differ from or even conflict with each other
• An insightful understanding is essential to
– Content delivery system design
– Internet resource provisioning
– Performance optimization
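Two of the Zipf variants named above have widely used closed forms, and a short sketch shows how each bends the straight log-log line of pure Zipf. The slide gives only the model names, so the formulas here are the commonly cited textbook forms rather than the definitions used in the cited studies, and the parameter names `q` and `cutoff` are illustrative assumptions.

```python
import math

def zipf(i, alpha):
    """Pure Zipf: y_i proportional to i**(-alpha)."""
    return i ** (-alpha)

def zipf_mandelbrot(i, alpha, q):
    """Zipf-Mandelbrot (common form): y_i proportional to (i + q)**(-alpha)."""
    return (i + q) ** (-alpha)

def zipf_exp_cutoff(i, alpha, cutoff):
    """Zipf with exponential cutoff (common form): i**(-alpha) * exp(-i / cutoff)."""
    return (i ** (-alpha)) * math.exp(-i / cutoff)

def local_slope(f, i, **params):
    """Slope of log f against log rank, measured between ranks i and 2i."""
    return (math.log(f(2 * i, **params)) - math.log(f(i, **params))) / math.log(2)

if __name__ == "__main__":
    alpha = 0.8  # illustrative value inside the 0.6~0.8 range quoted for Web traffic
    for rank in (1, 10, 100, 1000):
        print(rank,
              round(local_slope(zipf, rank, alpha=alpha), 3),
              round(local_slope(zipf_mandelbrot, rank, alpha=alpha, q=10), 3),
              round(local_slope(zipf_exp_cutoff, rank, alpha=alpha, cutoff=500), 3))
```

In this sketch the Zipf-Mandelbrot offset flattens the head (the slope near rank 1 is closer to zero than -alpha), while the exponential cutoff steepens the tail. These are two different departures from a single straight line, which is one way to see why models fitted to one or two workloads can disagree.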

Research Objectives
• Find a general distribution model of Internet media access patterns as a case for big data
– Comprehensive measurements and experiments
– Rigorous mathematical analysis and modeling
– Insights into media system designs
