How to compare Histograms in Excel

Recently someone asked me to assist with a histogram chart. I have to admit, I had not done one for a couple of years and worried a bit. After a brief refresher online however, I was ready to help.

A histogram is done when you're trying to find out the distribution of a variable through a population. Online examples abound but  the most common one is grades in a population of students. I'll differ from that tradition. My variable is a medical condition across a population of patients of varying ages. High Blood Pressure or HBP.

Doing a histogram in Excel is not hard. It's a bit tricky, but have no fear, Marco is here...(sorry, could not resist, lol).

A histogram requires a few things:
         a) The data array (Patient Ages)
         b) The groupings or intervals: The lowest age of the group is 36 and the oldest is 88. So we have to find a way to segregate them into several groupings. Later we will see how to do this.
         c) The Bins: The highest number for each interval
         d) The Frequency: This is where Excel does it job using Array Formulas. The trick to Array      Formulas is knowing that you have to hold down 3 keys after entering the formula in the cell. Again, more of that later. Just know that this is very important.

Since you already have the data, you must define the rest. Below I prepared a graphic of the above rules using the example discussed:


Now that we have the outline, let's dig a little deeper into a real example with a second variable: another medical condition: Diabetes. 

This will make things a little interesting because by introducing a second variable, we now have 3 groupings. 
   1) Those patients with only HBP
   2) Those with Diabetes and 
   3) Those with both conditions. 

If the patients had none of the conditions, they would not be in our histogram, so that's not an option. 



A word about Excel's Frequency formula. The Formula calls for the Data Array and the Bin Array. While the Bin Array does not change for all 3 groupings in our example, the Data Array does change for each.

In other words, the HBP group formula only needs the patients where HBP = Yes AND Diabetes = No. Similarly, the Diabetes group's data array is the group of patients where HBP=No AND Diabetes = Yes.

Another point to highlight is that if you add up all the groupings, they total 197 patients. This is important as you must account for all the data.

And now for a shocker... A true Histogram cannot have gaps between the columns. In other words, what I have here is not a true Histogram because of the gaps. However, the methodology followed here is that of a Histogram.

While this chart is more of a Bar graph, it allows us to compare and contrast various groupings--something that a Histogram cannot do well by its very nature.

I hope this quick guide helps get you going.