Can you generate surrogate keys in TSQL?
Posted by thomasivarssonmalmo on November 22, 2007
In a good data warehouse design you substitute data source keys with surrogate keys. These keys are withoút any meaning and made by integers and in SQL Server you set the property "Is Identity" to get an auto incrementing generation of the surrogate keys. This is known and a prevailing practise when you build a data warehouse in SQL Server.
Now to the main story. Can you generate these surrogate keys without the identity property of an integer column?
If you search the web you will mostly be recommended to build this in SSIS(Integration Services) with a script component. Is there no way to do this this in TSQL and an Execute SQL Task?
My businessproblem is that I have a source system that will supply members to a dimensions table. At the same time it is possible to generate new members in a separate application and the key of this dimension is not an identity column. When my SSIS package insert new members into this dimension I will have to know about the key values in advance in the dimension table.
After spending much time reading blogs of different SSIS gurus I opened a book, "Inside MS SQL Server 2005: TSQL Querying"(MSPress, Itzik Ben-Gan) and had a look at chapter 8 " Data Modification". On page 452 I found what I was looking for, Assignment Update.
This is my solution after reading that chapter.
Declare @i as int
Set @i = (Select Max(MyDimSurrogateKey) From MyDimensionTable)
Set @i = MyDimSurrogateKey = @i + 1
My update statement will first search for the highest value in the dimension key and assign that to the variable @i . After that it will update the dimension key in the stage table with the highest dimension key value plus one.
I am actually assigning the surrogate keys in my stage table for the dimension and doing the insert into the data warehouse dimension table after that.
It works if I have to insert several records or only one. It relies on that these surrogate keys have no real meaning and that this small script can always secure to insert the record with the highest key value.
Since integers can have large values I am not afraid of the Max strategy to get out of values.
I would like to hear comments about this and if there are some negative implications of this? In my business scenario it works.
Be aware of that it will only work with a data warehouse in SQL Server 2005. This is a TSQL extension.
Thank’s Itzik Ben-Gan!