Multivariate Statistical Analysis, course teaching resources (reading material): A visual tour of interactive graphics with R

R Visuals iplots.tex, b2648c8 on 2011/04/01

A visual tour of interactive graphics with R

Christophe Lalanne*
March, 2011

* E-mail: ch.lalanne|at|gmail.com. Text available on www.aliquote.org, in /articles/tech/rvisuals

Abstract: We describe here simple uses of the iPlots R package for interactive data analysis. The idea is to use brushing in linked graphics to foster exploratory analysis and model diagnostics. Other R packages are also discussed.

Packages: iPlots • rgl • rggobi

1 Motivations

Far from being an exhaustive review of interactive and dynamic statistical graphics, this document reviews some of the capabilities available in R. A larger review is provided in Cook and Swayne (2007), using the GGobi software and its R interface.

We will focus on two aspects of interactive visualization, namely brushing (Becker and Cleveland, 1988) and 3D interactivity.

2 The iPlots eXtreme package

The iPlots eXtreme package, aka Acinonyx (Urbanek, 2009), is available from [...]. It should supersede the traditional iPlots package. Although its functionalities may appear rather limited at the moment, it already allows the user to explore data in an interactive manner, with linking and brushing enabled by default.

Let us assume a simple linear model of the form $y_i = 0.4 \times x_i + \varepsilon_i$, where $\varepsilon_i \sim \mathcal{N}(0, 1^2)$, that can be readily simulated in R as follows:

    set.seed(101)
    [...]

Well, it merely summarizes the type of object that is being plotted, and its address in memory. More information can be gathered by looking at its class:

    [1] "iScatterplot" "iPlot" "iVisual" "iObject"

In fact, our scatterplot is a subclass of iPlot. It does not support the 'formula' interface, so data must be entered separately as x and y. However, overplotting is handled through transparency, which results in nice-looking plots while giving a feel for the 2D density.

Now, adding a regression line is as simple as

    ip + lm(y ~ x)

If we ask for a histogram of the $x_i$, the new plot will be automatically linked to the previous one. Note that it opens a new graphics device, but we will learn shortly how to put them in a common frame.

    ihist(x)

The top panel shows a scatterplot and a histogram for the same data, after we selected a certain range of x values. On the bottom panel, we do the reverse and select statistical units in the scatterplot.

3 The rgl package

The rgl package uses OpenGL as a rendering engine, and provides interesting 3D viewing options that are otherwise lacking in R.

To get a feel for rgl capabilities, just try

    demo(bivar)

which displays a parametric density surface of a bivariate normal distribution.
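For readers without an OpenGL device at hand, the height of such a parametric surface can be computed in base R. The sketch below is only an illustration: the grid limits and resolution are arbitrary choices, and it assumes two independent standard normal margins, so that the joint density is the outer product of the marginal densities.

```r
# Grid over which the density is evaluated (limits and size are arbitrary)
xgrid <- seq(-3, 3, length.out = 41)
ygrid <- seq(-3, 3, length.out = 41)

# Joint density of two independent N(0, 1) variables:
# f(x, y) = dnorm(x) * dnorm(y), i.e. an outer product of the margins
bi.z <- outer(dnorm(xgrid), dnorm(ygrid))

# The surface peaks at the origin, where f(0, 0) = 1 / (2 * pi)
c(peak = max(bi.z), theoretical = 1 / (2 * pi))
```

A static view of the same matrix is available with persp(xgrid, ygrid, bi.z); what rgl adds is the possibility of rotating and zooming the surface interactively.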

The code to generate this figure is rather simple; here is a snipped version:

    library(MASS)   # provides kde2d()
    library(rgl)

    n <- 50; ngrid <- 40
    x <- rnorm(n); y <- rnorm(n)
    denobj <- kde2d(x, y, n=ngrid)
    den.z <- denobj$z
    xgrid <- denobj$x
    ygrid <- denobj$y
    bi.z <- dnorm(xgrid) %*% t(dnorm(ygrid))
    zscale <- 20
    # Draws simulated data
    spheres3d(x, y, rep(0,n), radius=0.1)
    # Draws non-parametric density
    surface3d(xgrid, ygrid, den.z*zscale, alpha=0.5)
    # Draws parametric density
    surface3d(xgrid, ygrid, bi.z*zscale, front="lines")

As an example, the following piece of code shows how PCA basically works. We first generate a matrix of random data with a specific covariance structure, and then show the first three principal axes. Part of the code shown below comes from the excellent tutorials on Information Visualisation by Ross Ihaka.

    sim.cor.data <- function(n=30, p=2, rho=0.6, sigma=1) {
      require(mvtnorm)
      # covariance between variables i and j is sigma * rho^|i-j|
      H <- abs(outer(1:p, 1:p, "-"))
      V <- sigma * rho^H
      X <- rmvnorm(n, rep(0,p), V)
      return(X)
    }
    X <- sim.cor.data(n=100, p=5)
    X.pca <- prcomp(X, scale=TRUE)

Now, constructing the 3D plots is done as follows.

    rgl.open()
    rgl.bg(color="white")
    # display the 3D cloud
    rgl.points(X.pca$x[,1:3], col="black", size=5, point_antialias=TRUE)
    # set up a reference plane
    xyz.lims <- apply(X.pca$x[,1:3], 2, range)
    # height of the plane: either just below the minimum of the third component ...
    bot.plane <- min(xyz.lims[1,3]) - diff(xyz.lims[,3])/10
    # ... or at its mean (this second assignment overrides the first)
    bot.plane <- mean(X.pca$x[,3])
    rgl.surface(seq(xyz.lims[1,1], xyz.lims[2,1], length=10),
                seq(xyz.lims[1,2], xyz.lims[2,2], length=10),
                rep(bot.plane, 10*10), color="#CCCCFF", front="lines")

To capture the output, we can use rgl.snapshot(filename), where filename is the name of the PNG file to be saved.

Instead of a reference plane, we could directly draw unit vectors

    rgl.lines(c(0,1), c(0,0), c(0,0), col="red", lwd=2)
    rgl.lines(c(0,0), c(0,1), c(0,0), col="red", lwd=2)
    rgl.lines(c(0,0), c(0,0), c(0,1), col="red", lwd=2)

or axes (ranging from min to max observed values)

    rgl.lines(xyz.lims[,1], c(0,0), c(0,0), col="red", lwd=2)
    rgl.lines(c(0,0), xyz.lims[,2], c(0,0), col="red", lwd=2)
    rgl.lines(c(0,0), c(0,0), xyz.lims[,3], col="red", lwd=2)
    rgl.texts(c(xyz.lims[2,1]+.5, -.15, -.15),
              c(-.15, xyz.lims[2,2]+.5, -.15),
              c(-.15, -.15, xyz.lims[2,3]+.5),
              letters[24:26], col="red")

Both results are shown below.

Finally, there is no possibility of brushing an rgl device, but we can spin the scene (here, by 360°) with:

    for(i in seq(0, 360, by = 1)) {
      rgl.viewpoint(theta = i, phi = 0)
      Sys.sleep(1/60)
    }

There are alternative and more practical ways to do the above, as found in, e.g., ordirgl in the vegan package, or the BiplotGUI package that provides a complete environment for manipulating biplots (Gower and Hand, 1996), in 2D or 3D. For those seeking a more direct application of the commands discussed here, try adapting the sphpca function in the psy package (Falissard, 1996).

4 Back to the basics

So far, we have only talked about dedicated environments for interactive visualization. However, base R functionalities might still prove useful in some cases. In fact, the tcltk package offers a simple way to attach interactive buttons to the current device.

Let's say we want to interactively display the most extreme individuals on a given matrix of scores. 'Extreme' could mean many things, but for now assume a percentile-based measure: for example, the 5th and 95th percentiles are used to flag individuals having extremely low or high scores.

    library(lattice)   # provides splom()

    filter.perc <- function(x, cutoff=c(.05,.95), id=NULL, collate=FALSE) {
      lh <- quantile(x, cutoff, na.rm=TRUE)
      out <- list(x.low=which(x < lh[1]), x.high=which(x > lh[2]))
      if (!is.null(id)) {
        out[["x.low"]] <- id[out[["x.low"]]]
        out[["x.high"]] <- id[out[["x.high"]]]
      }
      if (collate) out <- unique(c(out[["x.low"]], out[["x.high"]]))
      return(out)
    }
    n <- 500
    scores <- replicate(5, rnorm(n, mean=sample(20:40, 1)))
    idx <- apply(scores, 2, filter.perc, id=NULL, collate=TRUE)
    my.col <- as.numeric(1:n %in% unique(unlist(idx))) + 1
    splom(~ scores, pch=19, col=my.col, alpha=.5, cex=.6)

A simple display for the distribution of these five series of scores is shown below, with individuals in red being those that fall below the 5th or above the 95th percentile of at least one series. (Also, keep in mind that this is done in a purely univariate manner.)

Now, what about varying the thresholds for highlighting individuals? Instead of repeating the same steps, we could simply add a dynamic selector to this display. Using aplpack::slider, this can be implemented as follows:

    do.it <- function() {
      require(aplpack)
      update.display <- function(...) {
        value <- slider(no=1)
        idx <- apply(scores, 2, filter.perc, cutoff=c(value, 1-value),
                     id=NULL, collate=TRUE)
        [...]
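What such a callback recomputes at each move of the slider can be checked without any GUI toolkit. The sketch below is an illustration only: flag.extremes is a hypothetical helper (it is neither part of aplpack nor of the listing above), and the seed, the simulated scores and the candidate thresholds are arbitrary assumptions.

```r
set.seed(101)  # arbitrary seed, for reproducibility
n <- 500
scores <- replicate(5, rnorm(n, mean = sample(20:40, 1)))

# Hypothetical helper: row indices of individuals falling below the
# 'value' quantile or above the '1 - value' quantile of at least one column
flag.extremes <- function(scores, value) {
  flagged <- apply(scores, 2, function(x) {
    lh <- quantile(x, c(value, 1 - value), na.rm = TRUE)
    (x < lh[1]) | (x > lh[2])
  })
  which(rowSums(flagged) > 0)
}

# Number of highlighted individuals for a few candidate thresholds
thresholds <- c(0.01, 0.05, 0.10)
n.flagged <- sapply(thresholds, function(v) length(flag.extremes(scores, v)))
names(n.flagged) <- thresholds
n.flagged

# Colour vector for the scatterplot matrix: 2 (red) if flagged, 1 otherwise
my.col <- as.numeric(seq_len(n) %in% flag.extremes(scores, 0.05)) + 1
```

Inside a slider callback, the threshold would come from slider(no=1) instead of a fixed vector, and the scatterplot matrix would be redrawn with the updated my.col.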

[Figure: scatterplot matrix of the five score series V1 to V5; axis label "Matrice de nuages de points" (Scatter Plot Matrix).]

5 Miscellaneous

TODO

• discuss 3D PCA in psy
• mention BiplotGUI
• discuss ordirgl in vegan

    library(Rcmdr)
    attach(mtcars)
    scatter3d(wt, disp, mpg)
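As far as we understand the scatter3d interface, its second argument is drawn on the vertical axis and treated as the response of the fitted surface, so the call above would display a plane for disp given wt and mpg. Under that assumption, the plane itself can be inspected with base R alone. This is a minimal sketch, not a description of the function's internals; the 5 x 5 grid is an arbitrary choice.

```r
# Least-squares plane with disp as response, wt and mpg as predictors
# (assumption: this mirrors the default linear surface drawn by scatter3d)
fit <- lm(disp ~ wt + mpg, data = mtcars)
coef(fit)

# Fitted values on a coarse grid, i.e. points the plane passes through
grid <- expand.grid(
  wt  = seq(min(mtcars$wt),  max(mtcars$wt),  length.out = 5),
  mpg = seq(min(mtcars$mpg), max(mtcars$mpg), length.out = 5)
)
grid$disp <- predict(fit, newdata = grid)
head(grid)
```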

References

Cook, D. and Swayne, D. (2007). Interactive and Dynamic Graphics for Data Analysis With R and GGobi. Springer. http://www.ggobi.org/book/.

Becker, R. and Cleveland, W. (1988). Brushing scatterplots. In Cleveland, W. and McGill, M., editors, Dynamic Graphics for Statistics, pages 201-224. Wadsworth & Brooks/Cole, Belmont, CA.

Urbanek, S. (2009). iPlots eXtreme. Next-generation interactive graphics for analysis of large data. In UseR! 2009 Conference. http://www.r-project.org/conferences/useR-2009/slides/Urbanek.pdf.

Gower, J. and Hand, D. (1996). Biplots. Chapman & Hall, London, UK.

Falissard, B. (1996). A spherical representation of a correlation matrix. Journal of Classification, 13(2), 167-280.