Bar Charts Illuminated by Regression and ANOVA

Today I encountered what is perhaps one of the more common mistakes made even by those who have a decent background in statistics: unfounded conclusions drawn from simple, one-dimensional bar charts.

Here was the scenario: one of our departments produces automated charts for use by the call center and training organizations. The charts break phone agents into two groups: those who have been out of training for less than 50 days and those who have been out of training for 50 days or more. Each bar represents one of these two groups and shows that group's average for a key metric (e.g., average handle time, member satisfaction score, etc.).

What prompted me to view these charts (which have always seemed one-dimensional and almost useless) is a current debate in our organization over average handle time and why it increased by over 60 seconds last fall and has remained elevated since then. One hypothesis, which has almost ascended to the status of lore and is accepted by a significant contingent, is that we have more new agents in the call center. The thinking is that new agents are less experienced and therefore take longer to complete technical support calls.

One of the bar charts I described is often cited as proof of this hypothesis (never mind that, according to Popper, hypotheses are more easily rejected than proven). The chart shows agents who have been taking calls for less than 50 days with an average handle time almost two minutes higher than that of agents who have been taking calls for 50 days or more.

Proof! See? Not so fast.

Dismay struck me at first. All those scatter plots and regression analyses I had run in the past showed no relationship between agent tenure and agent average handle time. How could this be?
I ran the regression again using a sample of well over 1000. Lo and behold, there was a statistically significant relationship, with an R-squared value of…3.7%. The scatter and fitted line plots were as unimpressive as those I had seen in the past. Samples this large usually provide enough power to detect even very weak relationships, so statistical significance alone says little about practical importance. There are also outstanding questions one must ask. Is this simply an artifact of my sampling technique? Did I truly meet all of the assumptions of a regression if the model explains only 3.7% of the variance? Recall that a regression model is assumed to include all of the factors that relate to the output variable. Evidently that is not the case here, or we would expect to explain a larger chunk of the variance in handle time.
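
To see how a relationship can be statistically significant and still explain almost nothing, here is a minimal sketch of the arithmetic in Python. It assumes a simple one-predictor regression (tenure predicting handle time) and a sample of exactly 1000, since the real sample was only "well over 1000". The function names are mine, and the p-value uses a normal approximation rather than the exact t distribution.

```python
import math

def slope_t_statistic(r_squared: float, n: int) -> float:
    """t statistic for the slope in a one-predictor linear regression,
    derived from R-squared and the sample size."""
    return math.sqrt(r_squared * (n - 2) / (1.0 - r_squared))

def approx_two_sided_p(t: float) -> float:
    """Two-sided p-value from a normal approximation (reasonable for large n)."""
    return math.erfc(abs(t) / math.sqrt(2.0))

r_squared = 0.037  # the R-squared value from my regression
n = 1000           # assumption: the actual sample was "well over 1000"

t = slope_t_statistic(r_squared, n)
p = approx_two_sided_p(t)

print(f"t = {t:.2f}, approximate p = {p:.1e}")
print(f"variance left unexplained = {1 - r_squared:.1%}")
```

With a sample this size, even a tiny R-squared produces a large t statistic, which is why the significance test passes while the fitted line plot still looks like a cloud.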

I started to wonder what might happen if I broke the agents into categories based on how long they have taken calls and ran an ANOVA. It occurred to me that taking a continuous variable and putting it into categories would tidy things up and remove some variance. Sometimes that is helpful and sometimes it is deceiving. The jury was still out in this case.

The ANOVA also found a statistically significant difference between the agent tenure categories (e.g., the 120 days tenure group). There was my answer, and there was the flaw in the thinking of that contingent of people in my organization! While the newest groups did appear to have a higher handle time on average, there were very few of those agents in relation to the overall call center. Thus, when I ran a quick weighted average to combine the handle times of the two groups, I saw that the newbies contributed only a few seconds to the overall average handle time.
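
The weighted average is simple enough to sketch in Python. Every figure below is hypothetical and chosen only to illustrate the arithmetic: I am assuming 30 new agents out of 1000, an experienced-agent handle time of 600 seconds, and a new-agent handle time two minutes higher. The sketch also weights by headcount, whereas weighting by call volume would be more precise if that data were at hand.

```python
def blended_average(groups):
    """Weighted average of group means, weighting each group by its size.

    `groups` is a list of (size, mean) pairs.
    """
    total = sum(size for size, _ in groups)
    return sum(size * mean for size, mean in groups) / total

# Hypothetical figures, for illustration only.
experienced = (970, 600.0)  # (number of agents, average handle time in seconds)
new_agents = (30, 720.0)    # roughly two minutes higher, as the bar chart shows

overall = blended_average([experienced, new_agents])
lift_from_new_agents = overall - experienced[1]

print(f"overall average handle time = {overall:.1f} s")
print(f"new agents add {lift_from_new_agents:.1f} s to the overall average")
```

Under these assumptions the new agents raise the overall average by only a few seconds, nowhere near the 60-second increase everyone is trying to explain.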

OK, Dean, don’t let those silly, out-of-context bar charts fool you or anyone else again.

Developing New Measurement Systems

As a Six Sigma Black Belt at a company that does not embrace Six Sigma as part of its culture, I find that data are sometimes hard to come by, or useful data at least. As I described earlier, some types of data are hard to measure and collect, and many Black Belts move on to projects that can be completed using existing data. I am guilty again: I used this very tactic for my Black Belt project.

I have bumped up against this issue enough times that it got me thinking, and my thinking started as follows.

In my college years, I decided that I wanted to study psychology with an emphasis on family and couple dynamics. I completed an undergraduate degree in family psychology and entered a doctoral program with the intent to become either a research clinician or a family therapist. I have the utmost respect for those who perform this worthwhile work but found that my skill set did not lend itself to either of these careers. After completing my master’s coursework, I discontinued my studies and focused my efforts on my then-fledgling career as a business analyst.

The work that most impressed me during my graduate studies was that conducted by Dr. John Gottman. He built what is known as “The Love Lab” in Washington state where he and his research team observe couples interacting. Unlike many social scientists, however, Gottman and his team make more than the typical social science types of observations. They use heart rate monitors, measures of pulse amplitude, jitteriness, and skin conductivity. This is all in addition to a very detailed behavioral observation system he developed to characterize behaviors and emotions of couples as they interact.

But Gottman’s approach to measuring behavior and emotions is not the entire story. He and his team have made significant discoveries using these data. They are able to predict with incredible accuracy (around 90%) whether a couple will still be together a year from the date that they observe them in their laboratory!

How does this relate to Six Sigma and data measurement systems in business? I suspect that many of us find ourselves stuck using the same old types of measurement systems. I believe that one of the keys to breakthrough improvements lies in developing fundamentally different ways of collecting business data.

When I say fundamentally different, I mean that the type of data or the method of collecting it should be very different from what your business and mine currently collect and how we collect it. This is second-order change.

An example of second-order change might be to invest effort in text mining the written comments you receive from customers (rather than, or in addition to, just using Likert-scale surveys). A first-order change would be to simply add more questions to your customer surveys. First-order change can result in improvements, but when a process is stuck, second-order change is often the only thing that will allow a breakthrough result.
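
To make the text mining idea a little more concrete, here is a minimal sketch in Python that counts how many customer comments mention each term. The comments and the stop-word list are made up for illustration, and real text mining would go well beyond simple counting, but even this much turns free text into something you can chart and track.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "to", "was", "is", "my", "i", "it",
              "for", "of", "on", "but"}

def term_counts(comments):
    """Count how many comments mention each term, ignoring stop words."""
    counts = Counter()
    for comment in comments:
        # Use a set so a term counts at most once per comment.
        words = set(re.findall(r"[a-z']+", comment.lower()))
        counts.update(words - STOP_WORDS)
    return counts

# Made-up customer comments.
comments = [
    "The agent was friendly but the hold time was long.",
    "Long hold time, and my issue is still not fixed.",
    "Friendly agent, quick fix.",
]

for term, count in term_counts(comments).most_common(5):
    print(f"{term}: mentioned in {count} comments")
```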

Measurement and Collection of Data

In the service industry, many companies struggle to measure and collect data. The reasons for this are varied but the impact is the same: if you don’t have reliable, valid data, it will be nearly impossible to carry a Six Sigma project through to completion. The obvious reason is that the Measure, Analyze, and Improve steps of the DMAIC model require quantifiable data.

One of the main reasons that many service industry companies do not measure many of their inputs and outputs is that these variables are often difficult to measure. For example, how does a service company measure the satisfaction level of its customers? Customer surveys jump to mind, but these efforts are time-consuming and costly. Further, even if a company commits to the costs associated with surveying customers, it is very difficult to measure customer satisfaction in a valid, reliable fashion. Consider how often you, personally, fill out customer satisfaction surveys. If you filled one out recently, you most likely received some sort of incentive to do so (think big dollars for the company that wanted that survey from you). And if you did receive an incentive, did it alter the ratings you gave the company?

Another reason that service companies opt not to measure and collect data on some of their important inputs or outputs is that they do not believe that investing in the collection of such data will prove cost-effective. In some cases, the managers who make these decisions may be right. It does not make sense to measure every possible input or output variable. Key inputs and outputs, however, should be measured and tracked if a company intends to succeed.

Though I have not worked in the manufacturing industry, I suspect there is often less deliberating over whether to measure most input and output variables because most of them are relatively simple to measure. There are obviously still costs involved but many of the variables are a little more tangible than things like customer satisfaction.

I find the mental exercise of listing key input and output variables for a process useful. Then I review the list and identify which of the variables are already measured and collected. This way I can quickly identify projects with the greatest potential for success.

Making a business case to start measuring a key metric (that is not currently measured or tracked) can also be a useful strategy to create potential Six Sigma project opportunities.

While the service industry faces the challenge of measuring and collecting business data, there is hope, and there are plenty of opportunities for innovation, for those who are persistent.