# Histogram review

Overview

Histograms are a powerful visualization tool, but they are often confused with bar charts. As we will see below, many bad histograms come from people trying to hack a histogram out of a bar chart.

Histograms are generally used for continuous data sets, where the goal is to understand how the data are distributed.

The data are grouped into bins, and the count of observations in each bin is plotted to show the distribution. Depending on the data set, the algorithm that assigns observations to bins, and that chooses the optimal number of bins given spatial constraints, can be relatively complex. Although this is a heavy layer of processing, most people rely on existing tools rather than working from first principles. Most of the data mapping happens in the bin selection and population stage; after that, apart from differences in the axes, the process is similar to working with a bar chart.
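The binning step can be sketched in a few lines. This is a minimal illustration, not any particular tool's algorithm: it assumes equal-width bins and uses Sturges' rule (k = ceil(log2 n) + 1) to choose the number of bins. The function names `sturges_bin_count` and `histogram_counts` are hypothetical, and the sketch ignores the display-space constraints that real tools may also weigh.

```python
import math


def sturges_bin_count(n: int) -> int:
    """Sturges' rule: k = ceil(log2(n)) + 1 bins for n observations."""
    return math.ceil(math.log2(n)) + 1


def histogram_counts(data, num_bins=None):
    """Return (edges, counts) for equal-width bins spanning [min, max].

    Sketch only: assumes numeric, non-empty data and equal-width bins.
    """
    if not data:
        raise ValueError("data must be non-empty")
    if num_bins is None:
        num_bins = sturges_bin_count(len(data))
    lo, hi = min(data), max(data)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / num_bins or 1.0
    edges = [lo + i * width for i in range(num_bins + 1)]
    counts = [0] * num_bins
    for x in data:
        # Bins are half-open [edge_i, edge_i+1); clamp so the maximum lands in the last bin.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    return edges, counts


edges, counts = histogram_counts([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])
print(counts)
```

In practice, NumPy's `numpy.histogram` accepts `bins="sturges"` (alongside other rule names such as `"auto"` and `"fd"`) and returns the counts and bin edges directly, which is the kind of existing tool most people lean on.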

Good Examples

The first example of a good histogram is one I created in the past. I was trying to control several elements in the image: (1) correct bin labeling, (2) summary data presented spatially, (3) the long right tail in the second-to-last bin, and (4) N/M data.
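Points (1), (3) and (4) can be illustrated with a small sketch. It assumes, hypothetically, that N/M values arrive as `None` or NaN and are tallied separately rather than plotted, and that the long tail is folded into a final overflow bin labeled `>= last edge`. The function `bin_with_overflow` is illustrative only, not the method behind the original chart.

```python
import math
from bisect import bisect_right


def bin_with_overflow(values, edges):
    """Bin values into half-open bins, fold the right tail into an overflow bin,
    and count N/M values separately.

    Hypothetical assumption: N/M values are represented as None or float NaN.
    Returns (labels, counts, n_not_meaningful).
    """
    # len(edges) - 1 regular bins plus one overflow bin.
    counts = [0] * len(edges)
    not_meaningful = 0
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            not_meaningful += 1
            continue
        # bisect_right - 1 gives i with edges[i] <= v < edges[i + 1];
        # values >= edges[-1] map to the last index, i.e. the overflow bin.
        idx = bisect_right(edges, v) - 1
        if idx < 0:
            raise ValueError(f"value {v} is below the first edge {edges[0]}")
        counts[idx] += 1
    # Labels spell out the half-open intervals so readers know where boundary values fall.
    labels = [f"[{edges[i]}, {edges[i + 1]})" for i in range(len(edges) - 1)]
    labels.append(f">= {edges[-1]}")
    return labels, counts, not_meaningful


labels, counts, n_nm = bin_with_overflow(
    [1, 5, 12, 18, 25, 47, 130, None, float("nan")], [0, 10, 20, 30]
)
print(labels, counts, n_nm)
```

Capping the tail this way keeps a few extreme values from stretching the x-axis, and reporting the N/M count next to the chart keeps those observations visible without pretending they belong to a bin.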

For the second good example, I selected a tiled histogram that compares alternative histograms and provides a useful high-level comparison across the data sets. Personally, I find tiled histograms highly valuable, especially in a three-tile layout that shows series A, series B, and the delta between A and B.
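The delta tile in a three-tile layout is only meaningful if A and B are binned on the same edges. A minimal sketch, assuming shared, pre-chosen edges and defining the delta as B minus A (both choices are assumptions; the function names are hypothetical):

```python
from bisect import bisect_right


def counts_on_edges(values, edges):
    """Count values into bins [edges[i], edges[i+1]); the last bin is closed on the right.

    Values outside [edges[0], edges[-1]] are ignored in this sketch.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v == edges[-1]:
            counts[-1] += 1
        elif edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts


def tiles_for_comparison(series_a, series_b, edges):
    """Return the data for three tiles: counts for A, counts for B, and per-bin B - A.

    Using one set of edges for both series keeps every delta bin aligned.
    """
    a = counts_on_edges(series_a, edges)
    b = counts_on_edges(series_b, edges)
    delta = [bi - ai for ai, bi in zip(a, b)]
    return a, b, delta


a, b, delta = tiles_for_comparison(
    [0.2, 0.5, 1.1, 2.7], [0.4, 1.5, 1.6, 2.2, 3.0], [0, 1, 2, 3]
)
print(a, b, delta)
```

If the two series had unequal sizes, plotting proportions rather than raw counts in each tile would likely make the delta easier to read; the sketch keeps raw counts for simplicity.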

For the final good example, I selected a histogram that displays multiple data series. While the chart has some design issues, I think its best feature is that every series is anchored at the 0 axis rather than stacked. In histograms with multiple series, people often stack the series, which eliminates the ability to examine each series in its own right. Stacking is sometimes useful, but I find that people default to it when they should be defaulting to the layout used in the chart below.