Thomas.I Microsoft BI & Analytics

A Site Dedicated to General BI Issues and MS BI Issues: Architecture and Technology

Loading duplicates records into SSAS 2005

Posted by thomasivarssonmalmo on October 19, 2008

This post is about what will happen when you load duplicate dimension records into SSAS 2005. A following post will deal with duplicate fact records. Will SSAS 2005 generate an error when you try the first scenario? Not by default as we shall see in a simple cube.
 
Edit: After doing some additional testing I have updated this post at the end.
 
When you have primary keys in all dimensions, foreign keys in the fact table and a unique constraint on all the fact table foreign keys the problem with duplicate records will never occur in either the dimension tables or the fact tables. You will get errors when you load the tables in the ETL-process. In reality the data quality is not always that good or you might have performance objectives that will prevent you from having constraints in your starschema.
 
Run this script to construct two simple dimension tables and a fact table.
 
Create Table Product(
ProductId Int,
ProductName varchar(20),
ProductCat varchar(20))
 
Create Table Dates (
DateId Int,
DateKey DateTime,
YearNo Int)
 
Create Table FactSales(
ProductId Int,
DateId Int,
SalesQty Int,
SalesValue Decimal(10,2))
 
Insert Into Product(ProductId, ProductName, ProductCat) Values(1,’Beer’,’Alcohol’)
Insert Into Product(ProductId, ProductName, ProductCat) Values(2,’Water’,’NonAlcohol’)
Insert Into Product(ProductId, ProductName,ProductCat) Values(3,’Milk’,’NonAlcohol’)
 
Insert Into Dates(DateId,DateKey,YearNo) Values(1, ‘2007-06-01’,2007)
Insert Into Dates(DateId,DateKey,YearNo) Values(2, ‘2008-06-01’,2008)
Insert Into Dates(DateId,DateKey,YearNo) Values(3, ‘2008-07-01’,2008)
Insert Into Dates(DateId,DateKey,YearNo) Values(4, ‘2008-08-01’,2008)
Insert Into Dates(DateId,DateKey,YearNo) Values(5, ‘2008-09-01’,2008)
 
Insert Into FactSales(ProductId, DateId, SalesQty, SalesValue) Values(1,1,10,100)
Insert Into FactSales(ProductId, DateId, SalesQty, SalesValue) Values(2,2,100,1000)
Insert Into FactSales(ProductId, DateId, SalesQty, SalesValue) Values(3,3,1,10)
Insert Into FactSales(ProductId, DateId, SalesQty, SalesValue) Values(1,4,20,200)
Insert Into FactSales(ProductId, DateId, SalesQty, SalesValue) Values(2,5,30,300)
 
In the script we will have no primary key defined but we will asssume that the Id-columns in both dimension tables are keys.
 
Start BIDS and build a new cube project with the following data soure view
 
Dsv
 
This is the product dimension with a hierarchy
 
Product
 
This is the date dimension with a hierarchy
 
Date
 
Finally, this is the view in the cube browser when the cube has been processed.
 
Cube browser
 
Now we can start the "fun" of loading a duplicate key into the product dimension table. Run the following update:
 
Insert Into Product(ProductId, ProductName, ProductCat) Values(1,’Wine’,’Alcohol’)
 
Both beer and wine will have the same "key". Process the dimension first and the cube after that. No error messages will appear.
 
The duplicate member will not show up in the dimension browser. This means that SSAS 2005 ignores duplicates and take th first record in the table.
 
The duplicat key will not show in dim
 
In order to see that you have duplicates in the dimension you can download and use BIDS-helper and the dimension health check functionality. This is the message after the duplicate dimension record has been inserted.
 
 
Dimension health check after dim duplicate 
 
 
In order to stop the processing of the dimension, if a duplicate record appears you will have to change the error configuration for the dimension. Change the default setting to custom and ignore error to report and stop.
 
 
 Change to report and stop cube
 
 
If you process the dimension, with this error setting, you will get an error
 
 
Process with the custom error config 
 
 
What will happen if you process the cube?
 
Cube processing will work
 
The error configuration in the dimension will not stop the processing of the cube. The cube will process.
 
Update: One additional test was to check what happens if you process the database, from BIDS, instead of the cube when you have duplicates in the product dimension. Right click on the database in the solution explorer and select process. I get an error without having configured error handling in the cube.
 
Process the database with error
 
 
Summary:  If you add a duplicate record to a dimension, SSAS 2005 will ignore that. The dimension and the cube will be processed without errors. If the dimension is configured to not ignore duplicates and stop processing, the dimension will not process but the cube will. If the database is processed, the error configuration in the dimension will stop the processing of the database.
 
I am running SSAS 2005 SP2 with Service update 8. 
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: