Styling Box Plots in R (with Examples)
I love the power that R (The R Project for Statistical Computing) offers for manipulating and analysing data, but I must admit that the plots & graphs it produces by default are, well, butt ugly.
Now there are many, many awesome libraries out there to produce amazing looking plots… but sometimes you just want to stick to the basics.
For many simple plot types, styling is pretty straightforward, but one of my favourite plot types (box plots!) requires a bit more knowledge.
This article aims to be an introductory guide to getting your box plots looking awesome… or at least passable. To help achieve this, a number of examples will be provided.
The above is how a box plot is styled by default. Functional, incredibly bland, but a great example to help us understand how R views the various elements of the plot:
Understanding what R calls each element of the plot is important, because there are a number of “hidden” (not obvious without reading the manual) parameters that allow you to style each element.
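The examples in this article refer to a data frame called example with a Value and a Label column, but never show how it was built. To follow along, here is a minimal sketch that creates a stand-in and draws it with the default styling. The contents of the data frame (random values, two groups named "A" and "B") are assumptions made purely for illustration.

```r
# Hypothetical stand-in for the `example` data frame used in this article.
set.seed(42)  # make the random draws reproducible
example <- data.frame(
  Label = rep(c("A", "B"), each = 50),
  Value = c(rnorm(50, mean = 10, sd = 2), rnorm(50, mean = 12, sd = 3))
)

# Default styling: all values in one box, then one box per label.
boxplot(example$Value)
res <- boxplot(example$Value ~ example$Label)
```

boxplot() invisibly returns the statistics it drew (group names, counts, outliers and so on), which is why the result is kept in res here.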
Box Style Parameters
The box is the outline/border drawn around the rectangle that encompasses the two middle quartiles and the median line.
- boxwex: a number (whole or fractional) that sets the scale of the box relative to the width of the column assigned to the box’s data group. This basically controls how wide the box is (e.g. at 0.5 the box would be 50% of the width of the group’s column).
- boxlty: a number representing the type of line you wish the box to be drawn with. This parameter can take any of the standard R line types. Set to 0 if you wish to turn the box off (do not set the line width to 0).
- boxlwd: a number (whole or fractional) representing the thickness of the line the box is drawn with.
- boxcol: a string representing the colour you wish the box to be drawn with. This can be any RGB colour in the form “#RRGGBB”.
It is worth noting that if you would like to change the background (fill) colour of the box, the easiest way is to use the standard colour parameter of the boxplot() function, without the box prefix (e.g. col = "#FF0000").
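As a quick illustration of the box parameters above, the sketch below styles only the box and leaves everything else at its defaults. The data vector and the colours are made up for the example.

```r
# Made-up values; 15 sits far enough out to be drawn as an outlier.
x <- c(2, 4, 4, 5, 6, 7, 7, 8, 9, 15)

res <- boxplot(
  x,
  col    = "#FFE4B5",  # fill colour of the box (no "box" prefix)
  boxcol = "#8B4513",  # colour of the box outline
  boxlty = 1,          # solid outline
  boxlwd = 2,          # outline thickness
  boxwex = 0.5         # box is half the width of its column
)
```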
Med (Median) Style Parameters
The “med” is the line that represents the median of the box plot.
- medlty: a number representing the type of line you wish the median line to be drawn with (line types). Set to 0 to disable.
- medlwd: a number (whole or fractional) representing the thickness of the median line.
- medcol: a string representing the colour you wish the median line to be drawn with. Can be any RGB colour in the form “#RRGGBB”.
You may be noticing a pattern… the parameters to style the various elements of the box plot are, basically, just the standard styling parameters with the element’s name/label prefixed.
Not all styling parameters are available for all elements, “wex” having no effect on the median line for example, and some elements have access to a broader range of options (see outliers), but with a little experimentation you should get a feel for things pretty fast.
So before we jump into simply listing the core parameters for each element, let’s see a basic example of what we can do:
Not a huge improvement, but this example demonstrates some of the basic concepts well.
boxplot(
    example$Value,
    col = "#EEEEEE",
    boxcol = "#727272", boxlty = 1, boxlwd = 3.25, boxwex = 0.2,
    medcol = "#727272", medlty = 1, medlwd = 8,
    whiskcol = "#727272", whisklty = 2, whisklwd = 1.75,
    staplecol = "#727272", staplelwd = 3.25, staplewex = 2.25,
    outpch = 20, outcex = 2, outcol = "#727272"
)
Before we jump into any other examples, let’s just quickly finish listing out the core parameters for each element:
Whisk (Whisker) Style Parameters
The “whisk” (whisker) is the line that extends out from the box in both directions.
- whisklty: the type of line you wish the whisker to be drawn with (line types).
- whisklwd: the thickness of the whisker line.
- whiskcol: the colour of the whisker line.
Staple Style Parameters
The staple is the horizontal line that caps each whisker.
- staplewex: the width of the staple in relation to the width of the box (e.g. 1.0 would be 100% the width of the box).
- staplelty: the type of line you wish the staple to be drawn with (line types).
- staplelwd: the thickness of the staple line.
- staplecol: the colour of the staple line.
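The whisker and staple parameters can be tried out on their own in the same way. The sketch below uses made-up data and colours, and gives the whiskers and the staples different colours so it is easy to see which parameter affects which element.

```r
# Made-up values; 15 is drawn as an outlier, so the upper whisker ends at 9.
x <- c(2, 4, 4, 5, 6, 7, 7, 8, 9, 15)

res <- boxplot(
  x,
  whisklty  = 1,          # solid whiskers instead of the default dashed
  whisklwd  = 2,
  whiskcol  = "#1F77B4",
  staplewex = 1,          # staples as wide as the box
  staplelty = 1,
  staplelwd = 3,
  staplecol = "#D62728"
)
```

The whiskers end at the most extreme values that are not outliers, which for this data are 2 and 9 (the first and last entries of res$stats).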
Out (Outlier) Style Parameters
The “out” (outlier) parameters control the look and feel of the markers for outliers on the plot. These parameters are somewhat different from the others.
- outpch: the “point symbol” that should be used to represent the outlier. This can be any of the standard R point symbols. You will probably need to play with this a bit to get used to it, but for example if you set the value to 10, all the outliers will be represented by a circle and cross symbol.
- outcex: the size of the symbol representing the outlier, relative to its default size (e.g. 2.0 would be 200% of default). Please note, this is not a typo… in this case it is “outcex” not “outwex”, as it controls the width and height.
- outcol: the colour of the symbol representing the outlier.
- outlty: the type of line used for an optional horizontal line segment drawn through each outlier. It is blank by default, so these segments only appear if you set it.
- outlwd: where relevant (will not work for all symbols), sets the width of the line to be used drawing the symbol to represent the outlier.
It is worth noting that many of the symbol related parameters will also work with the median prefix (e.g. medpch = 3)… but it is hard to get something that looks good paired with the median line.
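To get a feel for the outlier parameters, here is a small sketch that uses the circle and cross symbol mentioned above. The data and colour are made up; the value 15 is the only point far enough from the box to be treated as an outlier.

```r
# Made-up values with a single outlier (15).
x <- c(2, 4, 4, 5, 6, 7, 7, 8, 9, 15)

res <- boxplot(
  x,
  outpch = 10,         # circle and cross symbol
  outcex = 2,          # twice the default size
  outcol = "#D62728",  # symbol colour
  outlwd = 2           # thicker symbol outline
)

# The points drawn as outliers are also returned by boxplot().
res$out
```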
colours <- c("#00AEEF", "#F7941D")
boxplot(
    example$Value ~ example$Label,
    col = colours,
    boxcol = "#FFFFFF", boxlty = 0, boxwex = 0.25,
    medcol = "#FFFFFF", medlty = 1, medlwd = 4,
    whiskcol = colours, whisklty = 1, whisklwd = 4,
    staplecol = colours, staplelwd = 4, staplewex = 0.3,
    outpch = '*', outcex = 2.5, outcol = colours
)
In the example above, we have not only applied nearly everything we have learned, but we have also produced a plot with two groups, which are different colours.
This is very simple to achieve. All you need to do is pass a vector of colours to any colour parameter, and R will automatically cycle through the colours as it moves from group to group. When it runs out of colours it will just start the cycle again.
colours <- c("#00AEEF", "#F7941D")
...
whiskcol = colours
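The cycling is easiest to see when there are more groups than colours. The sketch below uses a made-up data frame with three groups (the name groups and its values are assumptions for illustration) and only two colours, so the third box reuses the first colour. The rep_len() call is just there to mimic the recycling so you can see the colour each group ends up with.

```r
# Hypothetical data: three groups, five values each.
groups <- data.frame(
  Label = rep(c("A", "B", "C"), each = 5),
  Value = c(1, 2, 3, 4, 5,  3, 4, 5, 6, 7,  5, 6, 7, 8, 9)
)

colours <- c("#00AEEF", "#F7941D")  # only two colours for three groups

res <- boxplot(
  Value ~ Label, data = groups,
  col      = colours,
  boxcol   = colours,
  whiskcol = colours,
  whisklty = 1
)

# Mimic the recycling: group C wraps around to the first colour.
per_group <- rep_len(colours, length(res$names))
```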
boxplot(
    example$Value,
    notch = TRUE, notch.frac = 0.5,
    col = "#F4BFCF",
    boxcol = "#992246", boxlty = 1, boxlwd = 3, boxwex = 0.5,
    medcol = "#992246", medlty = 1, medlwd = 6,
    whiskcol = "#992246", whisklty = 3, whisklwd = 3,
    staplecol = "#992246", staplelwd = 4, staplewex = 1,
    outpch = 5, outcex = 1, outlwd = 2, outcol = "#992246"
)
R provides the ability to add a notch to your box plot to represent the confidence interval of the median.
If you would like to do this, all you have to do is add a “notch” parameter and set it to “TRUE”.
notch = TRUE
If you would like to control the width of the notch, you can do this using the “notch.frac” parameter. The value of this parameter is the width of the notch relative to the box.
notch.frac = 0.5 # 0.5 = 50% the width of the box
You cannot directly control the height of the notch, as this is what is used to visually represent the confidence interval.
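Although the height of the notch cannot be styled, you can look at the values behind it. boxplot() returns the lower and upper notch limits in the conf component of its result, and R computes them as the median plus or minus 1.58 times the box height (the distance between the hinges) divided by the square root of the number of observations. The sketch below uses made-up data to check that by hand.

```r
# Made-up values: the whole numbers 1 to 41.
x <- 1:41

res <- boxplot(x, notch = TRUE, notch.frac = 0.5)

# Rebuild the notch limits from the returned box statistics.
lower_hinge <- res$stats[2, 1]
med         <- res$stats[3, 1]
upper_hinge <- res$stats[4, 1]
manual <- med + c(-1.58, 1.58) * (upper_hinge - lower_hinge) / sqrt(res$n)

res$conf  # notch limits as drawn
manual    # the same limits, computed by hand
```

With very small samples the notch can end up taller than the box itself, in which case R warns that the notches went outside the hinges.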
colours1 <- c("#87D987", "#FF9F9F")
colours2 <- c("#1E891E", "#AB2626")
boxplot(
    example$Value ~ example$Label,
    col = colours1,
    boxcol = colours2, boxlty = 1, boxlwd = 3, boxwex = 0.5,
    medcol = colours2, medlty = 1, medlwd = 8,
    whiskcol = colours2, whisklty = 3, whisklwd = 2.5,
    staplelty = 0,
    outpch = 20, outcex = 2, outcol = colours2
)