Notes on Chapter 2


  • p. 40: there is now a third graphics system in R, called ggplot (for "Grammar of Graphics"; now distributed as the ggplot2 package). This package, by Hadley Wickham, implements L. Wilkinson's grammar of graphics, a way to abstract the details of specifying graphics. ggplot looks like an interesting compromise between the ordinariness of base graphics and the baroqueness of lattice/grid graphics …
  • p. 40: (I have a note to myself here to say something about 3D graphics, but I don't remember what it was!)
  • p. 43: in Figure 2.2, the y-axis tick marks don't quite line up in the two subfigures. Oh well.
  • p. 46: I thought mfrow=c(row,col) might be a little confusing here. The point is that row is the number of rows and col is the number of columns, so e.g. par(mfrow=c(2,3)) will set up a layout with 2 rows and 3 columns, filled row by row.
  • p. 49: see the Vanderbilt Biostatistics wiki for some nice dotplot code that lines the dots up when they would overlap horizontally (a little nicer than jittering). Also see the blog post by Andrew Gelman entitled "Does jittering suck?" (obviously a suggestion that it might, but you can read it for yourself).
  • p. 52: I'm not sure I was 100% consistent in referring to "mortality" or "survival" throughout this example: they are just complements (mortality = 1 - survival).
  • p. 61: Here are links to the data files (duncan_10m.csv and duncan_25m.csv) referred to in the chapter.
  • p. 67: did I already introduce lines and points?
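
In case the mfrow layout (p. 46) or lines and points (p. 67) are still unclear, here is a minimal sketch. It is not from the book: the data are simulated, and the variable names (x, y, layout_dims) are made up for illustration.

```r
# Sketch: a 2-row, 3-column layout with par(mfrow = c(nrow, ncol)).
# The data are simulated purely for illustration.
set.seed(1)
if (!interactive()) pdf(NULL)   # when run via Rscript, draw to a null device

op <- par(mfrow = c(2, 3))      # 2 rows, 3 columns; panels are filled row by row
for (i in 1:6) {
  x <- sort(runif(20))
  y <- x + rnorm(20, sd = 0.2)
  plot(x, y, main = paste("Panel", i))   # plot() starts the next panel
  lines(x, fitted(lm(y ~ x)))            # lines() adds a line to the current panel
  points(mean(x), mean(y), pch = 16)     # points() adds points to the current panel
}
layout_dims <- par("mfrow")     # query the layout currently in effect
par(op)                         # restore the previous graphics settings

if (!interactive()) dev.off()
```

Using mfcol instead of mfrow takes the same c(rows, columns) vector but fills the panels column by column.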

I didn't find room in this chapter to talk about Q-Q (quantile-quantile) plots, which are a useful (if initially somewhat mystifying) way to explore whether data follow a particular distribution or not. You can look at the Wikipedia entry (which uses R for graphics, by the way), or the NIST handbook definition, for more details. There are qqnorm (compare a sample vs. the normal distribution) and qqplot (compare two samples) in base R, and qqmath (compare a sample vs. a known distribution) and qq (compare two samples) in the lattice package. Here's one of the examples from ?qqmath (the complicated-looking bits are needed to draw the line on the graph for comparison):

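library(lattice)  # provides qqmath() and the singer data set
qqmath(~ height | voice.part, aspect = "xy", data = singer,
       prepanel = prepanel.qqmathline,
       panel = function(x, ...) {
           panel.qqmathline(x, ...)  # reference line for comparison
           panel.qqmath(x, ...)      # the Q-Q points themselves
       })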

The fact that the dots follow reasonably straight lines in each subplot suggests that the distributions within categories are approximately normal; the fact that the intercepts and slopes vary means that the means and standard deviations, respectively, differ among categories.
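
For the base-R versions, here is a rough sketch on simulated data (the samples and variable names are my own inventions, not from the book). It draws a normal Q-Q plot for a normal and a skewed sample, compares the two samples directly, and computes by hand the slope and intercept of the quartile-based reference line that qqline draws, to show how they relate to the standard deviation and mean.

```r
# Sketch: base-R Q-Q plots on simulated data (the samples are made up).
set.seed(101)
if (!interactive()) pdf(NULL)   # when run via Rscript, draw to a null device

x_norm <- rnorm(200, mean = 10, sd = 2)   # normally distributed sample
x_skew <- rexp(200, rate = 1)             # right-skewed sample

op <- par(mfrow = c(1, 3))
qqnorm(x_norm, main = "Normal sample")        # points should fall near the line
qqline(x_norm)                                # line through the 1st and 3rd quartiles
qqnorm(x_skew, main = "Exponential sample")   # curvature signals non-normality
qqline(x_skew)
qq <- qqplot(x_norm, x_skew, main = "Two samples")  # sample vs. sample
par(op)

# Slope and intercept of the quartile-based reference line, computed by hand:
# for a normal sample the slope estimates the standard deviation and the
# intercept estimates the mean.
probs <- c(0.25, 0.75)
slope <- unname(diff(quantile(x_norm, probs)) / diff(qnorm(probs)))
intercept <- unname(quantile(x_norm, 0.25)) - slope * qnorm(0.25)

if (!interactive()) dev.off()
```

With these settings, slope and intercept should come out near 2 and 10, the standard deviation and mean used to simulate x_norm, though not exactly, since they are estimated from the sample quartiles.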

