
Course slides for "System Software and Software Security" (PPT lecture notes, in English): Lecture-2-access-patterns-in-big-data

Document information
Resource category: document library
Document format: PPTX
Pages: 34
File size: 4.41 MB


The Weakening and Delayed Effects of Long Tail Distributions in Big Data Accesses


Big Data and Power Law
(Figure: number of hits to each data object, plotted against the popularity rank of each data object.)
To the right (the yellow region) is the long tail of the lower 80% of objects; to the left are the few that dominate (the top 20% of objects). With limited space to store objects and limited ability to search a large volume of objects, most attention and hits have to go to the top 20% of objects, ignoring the long tail.
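The head/tail split described above can be checked numerically. The following is a minimal sketch, assuming that hits to the object of rank i are proportional to i^(-alpha); the function name `top_share`, the catalogue size of 10,000 objects, and the alpha values tried are illustrative assumptions, not figures from the slides.

```python
def top_share(num_objects: int, alpha: float, top_fraction: float = 0.2) -> float:
    """Fraction of all hits that go to the most popular `top_fraction` of objects.

    Assumption (not from the slides): hits to the object of rank i are
    proportional to i ** -alpha, with rank 1 the most popular object.
    """
    weights = [rank ** -alpha for rank in range(1, num_objects + 1)]
    head_size = max(1, int(num_objects * top_fraction))
    return sum(weights[:head_size]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical catalogue of 10,000 objects; alpha = 0 is the uniform case.
    for alpha in (0.0, 0.6, 0.8, 1.0):
        share = top_share(10_000, alpha)
        print(f"alpha={alpha}: top 20% of objects receive {share:.1%} of hits")
```

A larger alpha concentrates more hits in the head; with alpha = 0 every object is equally popular and the top 20% receive exactly 20% of the hits.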


The Change of Time (short search latency) and Space (unlimited storage capacity) for Big Data Creates Different Data Access Distributions
(Figure: the traditional long tail distribution compared with the flattened distribution after the long tail can be easily accessed.)
• The head is lowered and the tail drops more and more slowly
• If the flattened distribution is no longer a power law, what is it?

Distribution Changes in DVDs in Netflix 2000 to 2011
(Figure: demand curves by DVD popularity rank for 2000, 2005, and 2011 (predicted), on a percentage axis from 0% to 80%. Annotations: less demand for the top 500; more demand for the "middle"; longer tail, with 15% of demand coming from beyond rank 3,000, where brick and mortar retailers run out of inventory.)
• The growth of Netflix selections (today: 30 million US users, 40 million users total, 1/3 of the streaming traffic of the Internet)
– 2000: 4,500 DVDs
– 2005: 18,000 DVDs
– 2011: over 100,000 DVDs (the long tail would drop even more slowly with more demand)
– Note: "brick and mortar retailers" are physical, face-to-face shops.

Amazon Case: Growth of Sales from the Changes of Time/Space
(Figure: sales in billions of dollars, $0.0 to $7.0, for 2002 to 2010, for Amazon North America Media Sales, Barnes & Noble Chain Store Sales, and Borders Chain Store Sales. From www.fonerbooks.com/booksale.htm. BN sales are shown without BN.com to contrast online vs. offline. Borders and BN sales FY ends Q1 2011, shown as 2010.)


We Must Find the New Distribution for Big Data Accesses
• The Internet stores all kinds of huge data sets
– The rapid growth and wide distribution of Internet media content is a representative case study of big data
– The media content is carried by scalable distributed systems
• We hope the distribution model developed is
– General purpose, applicable to other applications of big data
– Scalable in nature for both data and systems


Zipf distribution is believed to be the general model of data access patterns
• Zipf distribution (power law)
– Characterizes the property of scale invariance
– Heavy tailed, scale free
– $y_i \propto i^{-\alpha}$, where $i$ is the rank of an object, $y_i$ is its number of references, and $\alpha$ is 0.6~0.8
– On a log-log plot ($\log y$ versus $\log i$) the distribution is a straight line with slope $-\alpha$; on linear axes ($y$ versus $i$) it shows a heavy tail
• 80-20 rule
– Income distribution: 80% of social wealth is owned by 20% of people (Pareto law)
– Web traffic: 80% of Web requests access 20% of pages (Breslau, INFOCOM'99)
• System implications
– Objectively caching the working set in proxy
– Significantly reduce network traffic
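The log-log slope and the caching implication on this slide can be illustrated together. This is a rough sketch on synthetic data, assuming reference counts follow the power law exactly; the helper names (`zipf_counts`, `loglog_slope`, `cache_hit_ratio`), the scale constant, and the catalogue size are hypothetical and not taken from the slides.

```python
import math


def zipf_counts(num_objects: int, alpha: float, scale: float = 1_000_000.0) -> list[float]:
    """Synthetic reference counts y_i = scale * i ** -alpha for ranks 1..num_objects."""
    return [scale * rank ** -alpha for rank in range(1, num_objects + 1)]


def loglog_slope(counts: list[float]) -> float:
    """Least-squares slope of log(y_i) against log(i); needs at least two objects."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


def cache_hit_ratio(counts: list[float], cache_size: int) -> float:
    """Fraction of references served when the cache_size most-referenced objects are cached."""
    ranked = sorted(counts, reverse=True)
    return sum(ranked[:cache_size]) / sum(ranked)


if __name__ == "__main__":
    counts = zipf_counts(num_objects=1000, alpha=0.7)  # alpha inside the 0.6~0.8 range
    print("fitted slope:", loglog_slope(counts))
    print("hit ratio when caching the top 20%:", cache_hit_ratio(counts, 200))
```

On real traces the fitted slope is only an estimate, and a straight line on the log-log plot is the visual test for Zipf-like behaviour; a curved plot suggests the workload is not Zipf.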


Does Internet media traffic follow Zipf's law?
• Web media systems
– Chesire, USITS'01: Zipf-like
– Cherkasova, NOSSDAV'02: non-Zipf
• VoD media systems
– Acharya, MMCN'00: non-Zipf
– Yu, EUROSYS'06: Zipf-like
• Live streaming and IPTV systems
– Veloso, IMW'02: Zipf-like
– Sripanidkulchai, IMC'04: non-Zipf
• P2P media systems
– Gummadi, SOSP'03: non-Zipf
– Iamnitchi, INFOCOM'04: Zipf-like


Inconsistent media access pattern models
• Still based on the Zipf model (heuristic assumptions)
– Zipf with exponential cutoff
– Zipf-Mandelbrot distribution
– Generalized Zipf-like distribution
– Two-mode Zipf distribution
– Fetch-at-most-once effect
– Parabolic fractal distribution
– …
• All case studies
– Based on one or two workloads
– Differ from or even conflict with each other
• An insightful understanding is essential to
– Content delivery system design
– Internet resource provisioning
– Performance optimization
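To make a few of the Zipf variants above concrete, here is a minimal sketch of their commonly used textbook forms as unnormalized popularity functions of rank. The parameter names (`shift`, `cutoff`, `b`, `c`) are illustrative assumptions, and the cited studies may parameterize their models differently. The generalized Zipf-like, two-mode Zipf, and fetch-at-most-once models are left out because their exact forms are not given here.

```python
import math


def zipf(rank: int, alpha: float) -> float:
    """Plain Zipf: popularity proportional to rank ** -alpha."""
    return rank ** -alpha


def zipf_mandelbrot(rank: int, alpha: float, shift: float) -> float:
    """Zipf-Mandelbrot: (rank + shift) ** -alpha; a positive shift flattens the head."""
    return (rank + shift) ** -alpha


def zipf_exponential_cutoff(rank: int, alpha: float, cutoff: float) -> float:
    """Zipf with exponential cutoff: rank ** -alpha * exp(-rank / cutoff); the tail drops faster."""
    return rank ** -alpha * math.exp(-rank / cutoff)


def parabolic_fractal(rank: int, b: float, c: float) -> float:
    """Parabolic fractal: log y = -b * log(rank) - c * log(rank) ** 2, a parabola on log-log axes."""
    log_rank = math.log(rank)
    return math.exp(-b * log_rank - c * log_rank ** 2)


if __name__ == "__main__":
    # Compare the models at a few ranks, using arbitrary illustrative parameters.
    for rank in (1, 10, 100, 1000):
        print(
            rank,
            zipf(rank, 0.8),
            zipf_mandelbrot(rank, 0.8, shift=5.0),
            zipf_exponential_cutoff(rank, 0.8, cutoff=500.0),
            parabolic_fractal(rank, b=0.8, c=0.05),
        )
```

Each variant reduces to plain Zipf in a limit (shift of 0, a very large cutoff, or c of 0), which is why they can all be read as heuristic corrections to the same power-law baseline.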


Research Objectives
• Find a general distribution model of Internet media access patterns as a case for big data
– Comprehensive measurements and experiments
– Rigorous mathematical analysis and modeling
– Insights into media system designs
