Interpreting K-means cluster analysis

Metric Knowledge Management Services Pvt. Ltd. provided the article below, which discusses how to interpret the results of PCA (principal components analysis) or factor analysis. The same principles apply to interpreting the results of mTAB's K-Means Cluster Analysis.

For a pictorial illustration of mTAB's K-Means feature, refer to the mTAB version 5.4 Newsletter.

For more information on Metric Knowledge Management's services and consulting opportunities, visit the Metric Knowledge website at www.metricknowledge.com.

mTAB would like to acknowledge and thank Metric Knowledge Management Services for granting their permission to include the article below.

Taking Chance with PCA / Factor Analysis

As we all know, PCA (Principal Component Analysis) and Factor Analysis are popular multivariate statistical techniques used to summarize the information contained in a large number of variables into a smaller number of subsets or factors.

We use PCA or Factor Analysis to discover simple patterns in the complex web of relationships in multivariate data. Specifically, we try to see if the observed variables can be replaced largely or entirely by a much smaller number of variables called Components or Factors.
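The following minimal Python sketch makes the idea concrete. It runs PCA on the correlation matrix of a ratings table, assuming NumPy and synthetic data. The function name `pca_loadings` and the "technology"/"service" factors are illustrative only, and they come neither from the article nor from mTAB. Loadings are eigenvectors scaled by the square root of their eigenvalues, so each loading can be read as the correlation between an attribute and a component.

```python
import numpy as np


def pca_loadings(X):
    """PCA of the correlation matrix of X (rows = respondents, columns = attributes).

    Returns (loadings, explained), with components sorted by decreasing variance.
    Using the correlation matrix is equivalent to standardizing each attribute first.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]             # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loading = eigenvector * sqrt(eigenvalue): correlation of attribute and component
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    explained = eigvals / eigvals.sum()           # share of total variance per component
    return loadings, explained


# Synthetic ratings (illustrative only): three attributes driven by a hypothetical
# "technology" factor and three by a hypothetical "service" factor, plus noise.
rng = np.random.default_rng(0)
n = 500
tech, service = rng.normal(size=n), rng.normal(size=n)


def noisy(factor):
    return factor + rng.normal(scale=0.5, size=n)


X = np.column_stack([noisy(tech), noisy(tech), noisy(tech),
                     noisy(service), noisy(service), noisy(service)])
loadings, explained = pca_loadings(X)
print("cumulative % explained:", np.round(100 * explained.cumsum(), 1))
print("loadings on first two components:\n", np.round(loadings[:, :2], 2))
```

Working on the correlation matrix puts all attributes on a common scale. A covariance-based PCA would instead let the attributes with the largest variance dominate the leading components.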

In Satisfaction Measurement studies, we use PCA / Factor Analysis to reduce the number of attributes to a few factors that account for the maximum variability contained in the full set of original attributes.

However, as practicing MR professionals, we in turn have to “sell” our findings to skeptical line managers and hard-nosed decision makers.

And that is where the problem begins.

I wonder how many of us, and how many times, have been handed a neat and beautiful table like the one below.

VARIABLE	PC1	PC2	PC3
DESIGN	0.81	0.19	-0.31
RELIABILITY	0.81	0.22	-0.34
SETUP TIME	0.76	0.21	-0.38
QUALITY	0.76	0.14	0.05
COURTESY	-0.21	0.93	0.04
SPEED OF RESPONSE	-0.26	0.92	-0.06
KNOWLEDGE	-0.23	0.91	0.04
COST OF ACCESSORIES	0.54	0.05	0.75
SYSTEM COST	0.60	0.03	0.70
% VARIANCE (CUMULATIVE)	36	66	82

It’s almost a dream situation for two reasons.

The attribute weights are such that each component (PC1, PC2 and PC3) contains a different set of attributes, because each attribute's weights on the three components are quite different from one another. The first four attributes belong to PC1 because their weights are highest on PC1, and so on. You are spared the awkward decision of putting an attribute in one component when it might just as well belong to one or more others.
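Below is a Python/NumPy sketch of the "highest weight wins" rule, applied to the loadings from the textbook table. The function name `assign_by_max_loading` is hypothetical. Ranking by absolute value is an assumption the article does not state, although for this table it gives the same grouping as ranking by raw weight.

```python
import numpy as np

# Loadings copied from the textbook table (rows = attributes, columns = PC1..PC3)
attributes = ["DESIGN", "RELIABILITY", "SETUP TIME", "QUALITY", "COURTESY",
              "SPEED OF RESPONSE", "KNOWLEDGE", "COST OF ACCESSORIES", "SYSTEM COST"]
components = ["PC1", "PC2", "PC3"]
loadings = np.array([
    [0.81, 0.19, -0.31],
    [0.81, 0.22, -0.34],
    [0.76, 0.21, -0.38],
    [0.76, 0.14, 0.05],
    [-0.21, 0.93, 0.04],
    [-0.26, 0.92, -0.06],
    [-0.23, 0.91, 0.04],
    [0.54, 0.05, 0.75],
    [0.60, 0.03, 0.70],
])


def assign_by_max_loading(loadings, attributes, components):
    """Put each attribute in the component where its |loading| is largest.

    Absolute values are used because the sign of a principal component is
    arbitrary; flipping a component's sign does not change its meaning.
    """
    best = np.abs(loadings).argmax(axis=1)
    groups = {c: [] for c in components}
    for attr, j in zip(attributes, best):
        groups[components[j]].append(attr)
    return groups


groups = assign_by_max_loading(loadings, attributes, components)
for comp, attrs in groups.items():
    print(comp, attrs)
```

The result reproduces the article's grouping: four attributes in PC1, the three service attributes in PC2, and the two cost attributes in PC3.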

More importantly, the new (synthetic) components that emerge correspond nicely to distinctly different real-life categories that are understandable and actionable for managerial decision-making (PC1 = Technology, PC2 = Service, PC3 = Cost).

What we are getting at becomes clear when this textbook case is contrasted with the following output from real-life data.

Variable	PC1	PC2	PC3	PC4	PC5	PC6
P_1	-0.165	0.006	-0.345	0.083	0.109	0.092
P_2	-0.163	0.017	-0.267	0.166	-0.148	-0.023
P_3	-0.145	-0.040	-0.096	-0.059	-0.205	0.080
P_4	-0.166	0.294	0.004	0.083	-0.030	-0.333
P_5	-0.139	0.204	-0.063	-0.176	0.143	-0.043
P_6	-0.178	0.012	-0.100	0.165	-0.149	0.022
P_7	-0.142	0.372	-0.148	-0.257	-0.120	0.104
P_8	-0.144	-0.031	-0.111	0.084	-0.112	-0.041
P_9	-0.101	-0.147	-0.253	0.045	-0.010	-0.061
P_10	-0.186	0.179	-0.127	0.057	0.096	0.210

Firstly, you will notice that almost every attribute could belong to more than one component, because its weights on several components are close in value.
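One rough way to quantify "close in value" is the gap between each attribute's largest and second-largest absolute loading. The sketch below applies this measure to the real-life table. The function name `top_two_gap` and the 0.05 cut-off are hypothetical choices for illustration and are not part of the article's method.

```python
import numpy as np

# Loadings copied from the real-life table (rows P_1..P_10, columns PC1..PC6)
names = [f"P_{i}" for i in range(1, 11)]
W = np.array([
    [-0.165, 0.006, -0.345, 0.083, 0.109, 0.092],
    [-0.163, 0.017, -0.267, 0.166, -0.148, -0.023],
    [-0.145, -0.040, -0.096, -0.059, -0.205, 0.080],
    [-0.166, 0.294, 0.004, 0.083, -0.030, -0.333],
    [-0.139, 0.204, -0.063, -0.176, 0.143, -0.043],
    [-0.178, 0.012, -0.100, 0.165, -0.149, 0.022],
    [-0.142, 0.372, -0.148, -0.257, -0.120, 0.104],
    [-0.144, -0.031, -0.111, 0.084, -0.112, -0.041],
    [-0.101, -0.147, -0.253, 0.045, -0.010, -0.061],
    [-0.186, 0.179, -0.127, 0.057, 0.096, 0.210],
])


def top_two_gap(loadings):
    """Gap between the largest and second-largest |loading| of each attribute."""
    s = np.sort(np.abs(loadings), axis=1)  # ascending within each row
    return s[:, -1] - s[:, -2]


GAP_THRESHOLD = 0.05  # hypothetical cut-off, not from the article
gaps = top_two_gap(W)
ambiguous = [name for name, g in zip(names, gaps) if g < GAP_THRESHOLD]
print(np.round(gaps, 3))
print("ambiguous attributes:", ambiguous)
```

On this table, five of the ten attributes (P_4, P_5, P_6, P_8 and P_10) have their two largest loadings within 0.05 of each other.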

More importantly, even if we force some division, the components that emerge do not necessarily correspond to distinctly different, meaningful real-life categories that are useful for managerial decision-making.

This is not a reason to discard the technique altogether, but rather a reason to understand its limitations so that we don’t make a fetish of it.

Let me now illustrate with a case study in which we managed to use PCA to produce relevant inputs for managerial decision-making.

Case Study: Gas Station CS Study

CONTEXT

A large oil-sector company asked us to evaluate its 300 gas stations for customer satisfaction. Unlike in the West, the front end of a gas station in India (the customer interface) is yet to be mechanized. The entire service component is therefore still delivered by a team of deliverymen using dispensing units, and at most stations these units are susceptible to tampering. The price structure for oil products also creates a hefty incentive for mixing and adulteration.

DATA COLLECTED

A predetermined sample of visiting customers was interviewed at the exit point using a structured questionnaire. The list of 40 attributes included items such as accurate quantity of fuel, pure quality of fuel, courteous behavior of staff, value-added services like windshield cleaning, payment by credit card, and availability of auto accessories.

Analysis of the satisfaction data was done using METRIC’s proprietary model: The MOSTER System.

Although we are strong critics of the reckless use of traditional MVA (multivariate analysis) tools on satisfaction data, we always try them out in each of our projects. In this case, we were lucky to get “some” meaningful insights using Principal Component Analysis (PCA). This is how it went (the table has been truncated for brevity):

GAS STATION DATA
Variable	PC1	PC2	PC3	PC4	PC5	PC6
Courteous Behavior	-0.304	0.898	0.272	0.005	-0.111	0.075
Easy Access	-0.387	-0.328	0.316	-0.318	-0.524	-0.163
Quick Service	-0.373	-0.275	0.442	0.546	0.237	0.486
Accurate Quantity	-0.405	-0.018	-0.047	-0.212	0.753	-0.410
Less waiting time	-0.369	0.025	-0.748	-0.030	-0.074	0.476
24 hour service	-0.394	-0.017	-0.256	0.553	-0.287	-0.555
Pure Quality	-0.405	-0.096	0.065	-0.500	-0.041	0.167
Cumulative % Explained	68.6	77.7	85.1	91.1	94.9	97.5

The three components, which together account for 85% of the variation, were labeled as follows:

PC1: Product

PC2: Human Interface

PC3: Delivery

Together, these form the managerial dimensions.
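Listing the attributes with the largest absolute loadings on each retained component shows which attributes drive each label. This Python/NumPy sketch uses only the seven rows shown in the truncated table, so it cannot reflect the full 40-attribute analysis. The function name `top_attributes` is hypothetical.

```python
import numpy as np

attributes = ["Courteous Behavior", "Easy Access", "Quick Service", "Accurate Quantity",
              "Less waiting time", "24 hour service", "Pure Quality"]
# PC1..PC3 columns of the truncated gas station table
loadings = np.array([
    [-0.304, 0.898, 0.272],
    [-0.387, -0.328, 0.316],
    [-0.373, -0.275, 0.442],
    [-0.405, -0.018, -0.047],
    [-0.369, 0.025, -0.748],
    [-0.394, -0.017, -0.256],
    [-0.405, -0.096, 0.065],
])


def top_attributes(loadings, attributes, k=2):
    """List the k attributes with the largest |loading| on each component."""
    result = {}
    for j in range(loadings.shape[1]):
        # A stable sort keeps table order when two loadings tie
        order = np.argsort(-np.abs(loadings[:, j]), kind="stable")
        result[f"PC{j + 1}"] = [(attributes[i], float(loadings[i, j])) for i in order[:k]]
    return result


for comp, top in top_attributes(loadings, attributes).items():
    print(comp, top)
```

On these rows, the top PC1 loadings are Accurate Quantity and Pure Quality (tied at -0.405). PC2 is dominated by Courteous Behavior (0.898). PC3 is led by Less waiting time (-0.748), followed by Quick Service (0.442). All seven PC1 loadings lie between -0.304 and -0.405, so the margin behind the PC1 label is narrow.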

You will notice that we were unable to map the remaining components (PC4, PC5 and PC6) onto any meaningful managerial categories. That said, they were not adding substantially to the cumulative explained variation anyway.
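Both the 85% cut-off and the small contribution of PC4 to PC6 can be read directly from the cumulative row of the gas station table. The following minimal Python sketch uses only the standard library. The function names are hypothetical.

```python
# "Cumulative % Explained" row of the gas station table (PC1..PC6)
cumulative = [68.6, 77.7, 85.1, 91.1, 94.9, 97.5]


def marginal_variance(cum):
    """Percentage points of variance added by each successive component."""
    previous = [0.0] + cum[:-1]
    return [round(c - p, 1) for p, c in zip(previous, cum)]


def components_needed(cum, target):
    """Smallest number of leading components whose cumulative % reaches target."""
    for k, c in enumerate(cum, start=1):
        if c >= target:
            return k
    return None  # target never reached by the listed components


added = marginal_variance(cumulative)
print("added by each component:", added)
print("components needed for 85%:", components_needed(cumulative, 85.0))
print("added by PC4-PC6:", round(sum(added[3:]), 1))
```

With these figures, the three retained components reach 85.1%. PC4 to PC6 together add only about 12.4 percentage points, and the largest single contribution among them is 6.0.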

Based on the above study, the company initiated the following actions.

Action	Target Component	Target Attribute
Tightened the supply chain	Product	Pure Quality
Installed a tight quality control regime	Product	Pure Quality
Trained staff in work discipline and customer orientation	Human Interface	Courteous Behavior & Work Discipline
Installed new-generation dispensing units that are tamper-proof and fast	Product & Delivery	Accurate Quantity
Added manpower	Human Interface & Delivery	Quick Service

The moral of the story, as we see it:

The use of PCA and Factor Analysis is likely to be fraught with uncertainties, as explained above. We must make users aware of these uncertainties. (Easier said than done! Users often demand the comfort of certainty.)

The practice of both PCA and Factor Analysis requires considerable human creativity and judgment at every step. It is not a blind plug-and-play software tool, as it is often made out to be.

Feedback


If you have general comments or suggestions regarding this newsletter, e-mail us at newsletter@metricknowledge.com.

Office Contact: business@metricknowledge.com or metric@vsnl.com.
