Histograms

Definition

The word histogram comes from the Greek 'histos', meaning "anything that stands upright", and 'gram', which means chart or graph (source: NCSS Statistical Software, Etymology online). The term was coined by Karl Peterson in 1895.

Histograms are accurate representations of continuous, numerical data on a singular axis. A histogram is a visualization of the frequency distribution in which the vertical axis represents the count (frequency) and the horizontal axis represents the possible range of the data values.

*The anatomy of a histogram. Source: Data Visualization Catalogue.*

Each bar in a histogram represents the frequency of scores within the intervals (also known as bins). Histograms help estimate where values are concentrated, what the extremes are and whether there are any gaps or unusual values. The bins are the main differentiator from a bar chart: a histogram can only plot the frequency of score occurrences whereas a bar chart can plot many other kinds of data. Histograms can also be normalized to display relative frequencies, which will quickly show how often data occur in relation to other data. Histograms are useful for:

Comparing data within intervals
Showing data over time
Displaying and analyzing distribution
Finding patterns across ranges

*An example of a simple histogram with distribution outlined. Source: OriginLab*

Creation

The bars produced in histograms are used to represent the shape of the distribution of data and as such are used to perform communicate business and math analysis. Bar shapes can vary in height or width, as long as the groups are defined in a way that clearly communicates any relevant patterns. Creating a histogram can be straightforward if the data is grouped correctly. It is important to try filtering the data several times to explore relevant patterns.

*A good binning method will clearly identify patterns within the data. Source: Chartio*

Several methods have been developed for optimizing data bins in creating histograms. Bayesian blocks, shown below, is a dynamic histogramming method utilizing probability density functions to program bin widths.

Examples of different grouping methods for histograms. *Source: AstroML.*

*An example of a log-scaled histogram. Source: Wolfram*

Histograms don't have to remain vertical; radial histograms, polar histogram and angle histograms are useful for examining peaks and identifying abnormalities in distribution from a similar starting point.

*Polar coordinates help communicate more data with clarity. Source: Yahia El Gamal*

Analysis

With the right bins, analyzing a histogram can be one of the quickest ways to communicate patterns across a data set. As histograms examine frequencies rather than other kinds of data, it is easier to tell if a distribution has any irregularities within it. Here are some examples of what different distributions might look like in a histogram and what they mean: