Data mining – Key influencers
Posted by thomasivarssonmalmo on December 3, 2008
This is my first blog post that uses SSAS 2008. It is also valid for SSAS 2005 users.
It is about the data mining add-in for Excel 2007 and a simple algorithm, "Table Analysis Tools" and "Analyze Key Influencers", that I like because it is simple. You only have to decide which column you would like to predict and which columns you think might determine the value of that predicted column. It only works with discrete or categorized columns and not continuous data like sales or income.
Instead of using the well-known example of bike buyer patterns in the customer dimension, I would like to use this algorithm to find which attributes are most important for explaining the number of cars each customer has bought.
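If a column you care about is continuous, one way around the restriction is to categorize it yourself before running the tool. The following is only a minimal sketch of that idea in Python; the helper `make_binner`, the band names and the income thresholds are all made up for illustration and are not part of the Excel add-in.

```python
from bisect import bisect_right


def make_binner(cut_points, labels):
    """Return a function that maps a number to a category label.

    cut_points must be sorted ascending, and labels needs exactly one
    more entry than cut_points (one label per interval).
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly len(cut_points) + 1 labels")

    def to_bin(value):
        # bisect_right counts how many cut points are <= value,
        # which is the index of the interval the value falls into.
        return labels[bisect_right(cut_points, value)]

    return to_bin


# Hypothetical income bands; the thresholds are assumptions for illustration.
income_band = make_binner(
    [30000, 60000, 90000],
    ["Low", "Medium", "High", "Very high"],
)

incomes = [12000, 30000, 45500, 89999, 150000]
bands = [income_band(x) for x in incomes]
print(bands)
```

A value that is exactly equal to a cut point goes into the higher band here. That is a design choice of this sketch, so adjust it if your categories should be closed on the other side.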
This is the starting table in the data mining samples that are part of the installation of the Excel 2007 data mining add-ins for SSAS 2005 and SSAS 2008.
I select "Table Analysis Tools" and then "Analyze Key Influencers".
The first step in that wizard is to decide on what column to explain.
Optionally, you can deselect columns that are of no interest, like a source table key (like ID in the next picture). You should also deselect columns with duplicate information: avoid using both the key and the description columns (CustomerId and CustomerName). Select only one of these columns.
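If the table has many columns, it can help to check for these two cases before starting the wizard. Here is a rough sketch in Python, assuming the table has been loaded as a list of dictionaries. The functions `key_like_columns` and `one_to_one`, the `EducationCode` column and the sample rows are hypothetical and only illustrate the checks.

```python
def key_like_columns(rows):
    """Columns whose value is different on every row, such as an ID."""
    columns = rows[0].keys()
    return [c for c in columns if len({r[c] for r in rows}) == len(rows)]


def one_to_one(rows, a, b):
    """True if columns a and b determine each other (duplicate information)."""
    pairs = {(r[a], r[b]) for r in rows}
    distinct_a = {p[0] for p in pairs}
    distinct_b = {p[1] for p in pairs}
    # One-to-one means every value of a pairs with exactly one value of b
    # and the other way around.
    return len(pairs) == len(distinct_a) == len(distinct_b)


# Made-up sample rows; EducationCode is an assumed code column.
rows = [
    {"ID": 1, "EducationCode": "B", "Education": "Bachelors", "Cars": 0},
    {"ID": 2, "EducationCode": "H", "Education": "High School", "Cars": 1},
    {"ID": 3, "EducationCode": "B", "Education": "Bachelors", "Cars": 2},
    {"ID": 4, "EducationCode": "G", "Education": "Graduate Degree", "Cars": 0},
]

print(key_like_columns(rows))
print(one_to_one(rows, "EducationCode", "Education"))
print(one_to_one(rows, "Education", "Cars"))
```

A column flagged by the first check is a candidate to deselect, and for a pair flagged by the second check you would keep only one of the two columns.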
And this is the result, where I have filtered on the most important attributes below. The Naive Bayes algorithm that is used here considers the influencing attributes one by one. This means that you can only look at each number of cars and its influencers, and not across different numbers of cars.
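To make the "one attribute at a time, one outcome at a time" idea concrete, here is a small Python sketch. It scores each attribute value against each number of cars by comparing how often the value occurs among customers with that number of cars and among all other customers. The function `influence_scores`, the log-ratio score, the smoothing constant and the sample rows are my own assumptions for illustration. This is not the formula the add-in uses for its relative impact bars.

```python
from collections import Counter
from math import log


def influence_scores(rows, target, attribute, smoothing=1.0):
    """Score every (attribute value, target value) pair for ONE attribute.

    score = log( P(value | target == t) / P(value | target != t) )
    with additive smoothing. A positive score means the value favours t.
    """
    values = sorted({r[attribute] for r in rows})
    targets = sorted({r[target] for r in rows})
    joint = Counter((r[attribute], r[target]) for r in rows)
    target_totals = Counter(r[target] for r in rows)
    value_totals = Counter(r[attribute] for r in rows)
    n, k = len(rows), len(values)

    scores = {}
    for t in targets:
        in_t = target_totals[t]   # rows with this number of cars
        out_t = n - in_t          # all other rows
        for v in values:
            p_in = (joint[(v, t)] + smoothing) / (in_t + smoothing * k)
            p_out = (value_totals[v] - joint[(v, t)] + smoothing) / (out_t + smoothing * k)
            scores[(v, t)] = log(p_in / p_out)
    return scores


# Made-up rows, not taken from the sample workbook.
rows = (
    [{"Commute": "0-1 Miles", "Cars": 0}] * 3
    + [{"Commute": "10+ Miles", "Cars": 0}] * 1
    + [{"Commute": "0-1 Miles", "Cars": 2}] * 1
    + [{"Commute": "10+ Miles", "Cars": 2}] * 3
)

scores = influence_scores(rows, target="Cars", attribute="Commute")
for (value, cars), score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"Cars={cars}  Commute={value}  score={score:+.3f}")
```

Each score is computed within one target value, which is why the results are read per number of cars rather than compared across them.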
If the customers have zero cars, this can be explained by education and the commute distance. One car is explained by a different education level and commute distance. More than one car is explained by education, the number of children, commute distance and income.