Today I encountered what is perhaps one of the more common mistakes made even by those who have a decent background in statistics: unfounded conclusions drawn from simple, one-dimensional bar charts.
Here was the scenario: One of our departments produces automated charts for use by the call center and training organizations. The charts break phone agents into two groups: those who have been out of training for less than 50 days and those who have been out of training for 50 days or more. Each bar represents one of these two groups, and each chart shows the average of a key metric (e.g., average handle time, member satisfaction score, etc.) for each group.
What prompted me to view these charts (which have always seemed a bit one-dimensional, almost useless) is a current debate in our organization over average handle time and why it increased by over 60 seconds last fall and has remained elevated since then. One hypothesis that has almost ascended to the status of lore, and is accepted by a significant contingent, is that we have more new agents in the call center. The thinking continues that new agents are less experienced and take longer to complete technical support calls as a result.
One of the bar charts I described is often cited as proof of this hypothesis (never mind that, according to Popper, hypotheses are more easily rejected than proven). The chart shows agents who have been taking calls for less than 50 days with an average handle time almost two minutes higher than that of agents who have been taking calls for 50 days or more.
Proof! See? Not so fast.
Dismay struck me at first. All those scatter plots and regression analyses I had run in the past showed no relationship between agent tenure and agent average handle time. How could this be?
I ran the regression again using a sample of well over 1000. Lo and behold, there was a statistically significant relationship with an R-squared value of…3.7%. The scatter and fitted line plots were as unimpressive as those I had seen in the past. Samples this large usually provide enough power to find statistically significant results, but there are outstanding questions one must ask. Is this simply an artifact of my sampling technique? Did I truly meet all of the assumptions of a regression if it only explains 3.7% of the variance? Recall that a regression model is assumed to include all of the factors that relate to the output variable. Evidently that is not the case here, or we might expect to explain a larger chunk of the variance in handle time.
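To see how a result can be statistically significant while explaining almost nothing, here is a rough sketch of the arithmetic. It assumes a simple one-predictor regression (tenure against handle time) and a sample size of exactly 1000, which is a stand-in for "well over 1000". The function names are mine, and the p-value uses a normal approximation instead of the exact t distribution.

```python
import math

def slope_t_statistic(r_squared, n):
    """t statistic for the slope in a simple (one-predictor) regression."""
    r = math.sqrt(r_squared)
    return r * math.sqrt((n - 2) / (1 - r_squared))

def two_sided_p_normal(t):
    """Two-sided p-value via the normal approximation (reasonable for large n)."""
    return math.erfc(abs(t) / math.sqrt(2))

n = 1000            # assumption: the sample was only described as "well over 1000"
r_squared = 0.037   # the R-squared value reported above

t = slope_t_statistic(r_squared, n)
p = two_sided_p_normal(t)

print(f"t = {t:.2f}, approximate p = {p:.1e}")
print(f"variance left unexplained: {1 - r_squared:.1%}")
```

The t statistic comes out well beyond the usual cutoff of about 2, so the relationship is "significant", yet more than 96% of the variance in handle time is still unaccounted for.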
I started to wonder what might happen if I broke the agents into categories based on how long they had taken calls and ran an ANOVA. It occurred to me that taking a continuous variable and putting it into categories would tidy things up and remove some variance. Sometimes that is helpful and sometimes that is deceiving. The jury was still out in this case.
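For anyone who wants to try the same thing, a minimal sketch of bucketing a continuous tenure variable and computing a one-way ANOVA F statistic might look like the following. The cut points and the handful of records are hypothetical, made up purely for illustration, and are not my actual categories or data. A real analysis would also need the p-value for F, which a statistics package can supply.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean

def tenure_bucket(days, cut_points):
    """Index of the tenure category: 0 for days below the first cut point, and so on."""
    return bisect_right(cut_points, days)

def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical (tenure in days, handle time in seconds) records.
records = [(10, 700), (30, 760), (45, 640), (80, 610), (120, 650),
           (140, 590), (200, 600), (300, 640), (365, 580)]
cut_points = (50, 150)  # hypothetical category boundaries

by_bucket = defaultdict(list)
for days, handle_time in records:
    by_bucket[tenure_bucket(days, cut_points)].append(handle_time)

groups = [by_bucket[k] for k in sorted(by_bucket)]
print(f"F = {one_way_anova_f(groups):.2f} across {len(groups)} tenure groups")
```

Note that the bucketing step throws away the variation in tenure within each category, which is exactly why the result can look tidier than the scatter plot.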
The ANOVA also found a statistically significant difference between the agent tenure categories (e.g., 120 days tenure group. There was my answer and there was the flaw in the thinking of that contingent of people in my organization! While those groups did appear to have a higher handle time on average, there were very few of them in relation to the overall call center. Thus, when I ran a quick weighted average to combine the handle time of the two groups, I saw that the newbies only contributed a few seconds to the overall average handle time.
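The weighted average is simple enough to sketch. The headcounts and handle times below are hypothetical, chosen only so that new agents run about two minutes higher and make up a small share of the floor. Weighting by headcount also assumes each agent takes a similar number of calls; weighting by call volume would be more faithful if that data is available.

```python
def weighted_average(values, weights):
    """Average of `values`, each weighted by the matching entry in `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical figures, chosen only to illustrate the arithmetic.
veteran_agents, veteran_aht = 950, 600   # headcount, handle time in seconds
new_agents, new_aht = 50, 720            # about two minutes higher, as in the chart

overall_aht = weighted_average([veteran_aht, new_aht],
                               [veteran_agents, new_agents])
lift_from_new_agents = overall_aht - veteran_aht

print(f"overall AHT: {overall_aht:.0f} s")
print(f"added by new agents: {lift_from_new_agents:.0f} s")
```

With these made-up numbers the overall average is 606 seconds, so the new agents add only 6 seconds, nowhere near enough to account for an increase of over 60 seconds.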
Ok, Dean, don’t let those silly, out-of-context bar charts fool you or anyone else again.