Interpreting K-means cluster analysis
Metric Knowledge Management Services, Pvt. Ltd. provided the article below, which discusses the analysis required to interpret the results of PCA (principal component analysis) or factor analysis. The same principles apply to interpreting mTAB's K-Means Cluster Analysis results.
For a pictorial illustration of the use of mTAB's K-Means feature, refer to the mTAB version 5.4 Newsletter.
For more information on Metric Knowledge Management's services and consulting opportunities, refer to the Metric Knowledge website at www.metricknowledge.com.
mTAB would like to acknowledge and thank Metric Knowledge Management Services for granting their permission to include the article below.
Taking Chance with PCA / Factor Analysis
All of us know that PCA (Principal Component Analysis) and Factor Analysis are popular multivariate statistical techniques, used to summarize the information contained in a large number of variables into a smaller number of components or factors.
We use PCA or Factor Analysis to discover simple patterns in the complex web of relationships in multivariate data. Specifically, we try to see if the observed variables can be replaced largely or entirely by a much smaller number of variables called Components or Factors.
In Satisfaction Measurement studies, we use PCA / Factor Analysis to reduce the number of attributes to a few factors that account for the maximum variability contained in the full set of original attributes.
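To make the idea of components and "variability accounted for" concrete, here is a minimal sketch of PCA computed from a correlation matrix with NumPy. It is not the article's own procedure: the function name `pca_loadings` is hypothetical, the ratings are synthetic and generated only for illustration, and it assumes that the "weights" reported for each attribute are loadings (eigenvectors scaled by the square root of their eigenvalues).

```python
import numpy as np

def pca_loadings(data):
    """Return (loadings, cumulative_pct) for a respondents x attributes array."""
    # Work on the correlation matrix so every attribute has equal weight.
    corr = np.corrcoef(data, rowvar=False)
    # eigh returns eigenvalues in ascending order; re-sort so PC1 comes first.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings: each eigenvector scaled by the square root of its eigenvalue.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))
    # Cumulative percentage of total variance explained by PC1, PC1+PC2, ...
    cumulative_pct = 100 * np.cumsum(eigvals) / eigvals.sum()
    return loadings, cumulative_pct

# Synthetic ratings (illustrative only, NOT data from any study):
# four attributes driven by two hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
ratings = np.column_stack([
    latent[:, 0] + 0.3 * rng.normal(size=200),
    latent[:, 0] + 0.3 * rng.normal(size=200),
    latent[:, 1] + 0.3 * rng.normal(size=200),
    latent[:, 1] + 0.3 * rng.normal(size=200),
])

loadings, cumulative_pct = pca_loadings(ratings)
print(np.round(loadings, 2))        # rows = attributes, columns = PC1..PC4
print(np.round(cumulative_pct, 1))  # cumulative % variance per component
```

In this synthetic case the first two components capture most of the variance, which is the kind of reduction the paragraph above describes. The sign of each column is arbitrary, so only the magnitudes and relative signs within a column carry meaning.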
However, as practicing MR professionals, we have to in turn “sell” our findings to the skeptical line managers and hard-nosed decision makers.
And that is where the problem begins.
I don’t know how many of us, and how many times, have got a neat and beautiful table like the one below.
|ATTRIBUTE||PC1||PC2||PC3|
|SPEED OF RESPONSE||-0.26||0.92||-0.06|
|COST OF ACCESSORIES||0.54||0.05||0.75|
|% VARIANCE (CUMULATIVE)||36||66||82|
It’s almost a dream situation for two reasons.
First, the attribute weights are such that each component (PC1, PC2 and PC3) contains a different set of attributes, because each attribute's weights differ markedly across the three components. The first four attributes belong to PC1 because their weights are highest in PC1, and so on. You are spared the awkward decision of putting an attribute in one component when it might just as well belong to one or more of the others.
More importantly, the new components thus evolved (the synthetic components) correspond nicely to distinctly different real-life categories that are understandable and actionable for managerial decision-making (PC1 = Technology, PC2 = Service, PC3 = Cost).
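The assignment rule described above (an attribute belongs to the component where its weight is highest) can be sketched as follows. This is an illustration rather than the author's procedure: it uses only the two attribute rows that survive in the table above, the helper name `dominant_component` is hypothetical, and it assumes "highest" means largest in absolute value, since the sign of a component is arbitrary.

```python
# Weights copied from the two attribute rows visible in the table above.
weights = {
    "SPEED OF RESPONSE": [-0.26, 0.92, -0.06],
    "COST OF ACCESSORIES": [0.54, 0.05, 0.75],
}
components = ["PC1", "PC2", "PC3"]

def dominant_component(row, names):
    """Return the component whose weight has the largest absolute value."""
    magnitudes = [abs(w) for w in row]
    return names[magnitudes.index(max(magnitudes))]

assignment = {
    attribute: dominant_component(row, components)
    for attribute, row in weights.items()
}
print(assignment)
```

Under this rule the two visible rows fall into PC2 and PC3 respectively, which agrees with the Service and Cost labels given in the text.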
Our point becomes clear when this textbook case is contrasted with the following output for real-life data.
Firstly, you will notice that almost every attribute could belong to more than one component, because its weights in those components are so close.
More importantly, even if we force some division, the components that emerge do not necessarily correspond to distinctly different and meaningful real-life categories that are useful for managerial decision-making.
This is not to discard the technique altogether, but rather to understand its limitations so that we don’t make a fetish of it.
Now let me illustrate with a case study in which we managed to use PCA and come up with relevant inputs for managerial decision-making.
Case Study: Gas Station CS Study
A large oil sector company asked us to evaluate their 300 gas stations for customer satisfaction. Unlike in the West, in India the front end of a gas station, the customer interface, is yet to be mechanized. The entire service component is therefore still delivered by a team of deliverymen with the help of dispensing units, which are susceptible to tampering at most places. The price structure for oil products is such that there is a hefty incentive for mixing and adulteration.
A predetermined sample of visiting customers was interviewed at the exit point using a structured questionnaire. The list of 40 attributes contained attributes like accurate quantity of fuel, pure quality of fuel, courteous behavior of staff, value added services like windshield cleaning, payment by credit card, availability of auto accessories etc.
Analysis of the satisfaction data was done using METRIC’s proprietary model: The MOSTER System.
Despite being strong critics of the reckless use of traditional MVA tools for satisfaction data, we always try them out in each of our projects. In this case, we were lucky to get “some” meaningful insights using Principal Component Analysis (PCA). This is how it goes… (The table has been truncated for the sake of brevity.)
|Less waiting time||-0.369||0.025||-0.748||-0.030||-0.074||0.476|
|24 hour service||-0.394||-0.017||-0.256||0.553||-0.287||-0.555|
The three components, which together account for 85% of the variation, were termed the managerial dimensions:
- PC1: Product
- PC2: Human Interface
- PC3: Delivery
You will notice that we were unable to convert the other components, namely PC4, PC5 and PC6, into any meaningful managerial categories. In any case, they did not add substantially to the cumulative explained variation.
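The ambiguity discussed earlier (an attribute whose weights in two components are nearly equal) can be checked on the two rows shown in the truncated table above. The following is a rough sketch, not part of the original analysis: it assumes the six columns are PC1 to PC6 in order (the table header is not shown), the helper names are hypothetical, and the 0.10 gap threshold is an arbitrary illustrative choice.

```python
# Weights copied from the two rows of the truncated table above.
# Assumption: the six columns are PC1..PC6, in that order.
weights = {
    "Less waiting time": [-0.369, 0.025, -0.748, -0.030, -0.074, 0.476],
    "24 hour service": [-0.394, -0.017, -0.256, 0.553, -0.287, -0.555],
}

MIN_GAP = 0.10  # hypothetical threshold, chosen only for illustration

def top_two(row):
    """Return the two largest (|weight|, component) pairs, largest first."""
    ranked = sorted(
        ((abs(w), f"PC{i + 1}") for i, w in enumerate(row)), reverse=True
    )
    return ranked[0], ranked[1]

def is_ambiguous(row, min_gap=MIN_GAP):
    """True when the two strongest weights are closer than min_gap."""
    (first, _), (second, _) = top_two(row)
    return first - second < min_gap

for attribute, row in weights.items():
    (w1, c1), (w2, c2) = top_two(row)
    flag = "ambiguous" if is_ambiguous(row) else "clear"
    print(f"{attribute}: {c1} ({w1:.3f}) vs {c2} ({w2:.3f}) -> {flag}")
```

With these two rows, "Less waiting time" sits clearly on the third column, which matches the Delivery label, while "24 hour service" is split almost evenly between the fourth and sixth columns. A different threshold would flag more or fewer attributes, so the cut-off itself remains a judgment call.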
Based on the above study, the company initiated the following actions.
|Action||Target Component||Target Attribute|
|Tightened the supply chain||Product||Pure Quality|
|Installed a tight quality control regime||Product||Pure Quality|
|Training of the staff on work discipline and customer orientation||Human Interface||Courteous Behavior & Work Discipline|
|Installed New generation dispensing units which are tamper proof and fast||Product & Delivery||Accurate Quantity|
|Added man power||Human Interface & Delivery||Quick Service|
The moral of the story, as we see it, is…
The use of PCA and Factor Analysis is likely to be fraught with uncertainties, as explained above. We must make the user aware of these uncertainties. (Easier said than done! Users often demand the comfort of certainty.)
The practice of both PCA and Factor Analysis requires considerable human creativity and judgment at every step. It is not a blind plug-and-play software tool, as it is often made out to be.