Thomas.I Microsoft BI & Analytics

A Site Dedicated to General BI Issues and MS BI Issues: Architecture and Technology

Archive for the ‘Integration Services’ Category

Get the count from the rowcount transform

Posted by thomasivarssonmalmo on November 27, 2007

This is a blog regarding how you can write the value of a rowcount transform to a simple log table. It was not that easy to find information about this even in SSIS books and I did not like the option to use script for this since I prefer a simple standard solution.
 
Start with building this simple log table in the database of you choice.
 
Create Table SimpleLog
( DataRowCount int,
  InsertedDate datetime)
 
We will simply export the content of the DimTable in the AdventureWorksDW database part of the samples when you install SQL Server 2005.
 
Do a simple Select * From DimTable.
 
Start management studio and connect to this database. Under tools and options in the meny you can check Query Results, SQL Server and Results to text. For output format I have choosen comma delimited. Choose export to file for the result set and save it as a *.txt-file.
 
Management studio
 
 
Now we can start SSIS in BIDS. First we will start by creating a variable that will hold the value of the rowcount transform. Right click in the area of the controlflow if the variable window is not seen to the left. Create a new variable with the name NoTimeRecords and set the data type to int.
 
After that you choose tools and pick the data flow task  and place it in the control flow. Click on this controlflow to go to the data flow. Set up a connection to your file, that you previously created with a flat file source connector.
 
Flat file connection
 
 
 Then you add a row count transform and connect it to the flat file source.
 
Rowcount
 
In here you only connect variable name property to the variable that you created previously.
 
Next you create an OLEDB destination, connect it to the rowcount and create a destination table.
 
 Finally we are at the point where we will write the value of the variable to the simple log table.
 
 Go to the eventviewer tab and choose the dataflow task you have created, not the package. Choose the OnPostExecute eventhandler to the right and click on the link in the middle of the pane. Drag an Execute SQL task from tools to the pane.
 
EventhandlerSQL
 
 
Paste this SQL into the SQLStatementbox and configure the general tab as in the picture below.
 
Insert Into SimpleLog
(RecordRowCount, InsertedDate)
Values(?,GetDate())
 
General
 
On the parameters tab you configure the Execute SQL task according to the last picture.
 
Parameters
 
 
Finally we can execute the task and check the SimpleLog table for a record.
 

Posted in Integration Services | Leave a Comment »

Can you generate surrogate keys in TSQL?

Posted by thomasivarssonmalmo on November 22, 2007

In a good data warehouse design you substitute data source keys with surrogate keys. These keys are withoút any meaning and made by integers and in SQL Server you set the property "Is Identity" to get an auto incrementing generation of the surrogate keys.  This is known and a prevailing practise when you build a data warehouse in SQL Server.

Now to the main story. Can you generate these surrogate keys without the identity property of an integer column?

If you search the web you will mostly be recommended to build this in SSIS(Integration Services) with a script component. Is there no way to do this this in TSQL and an Execute SQL Task?

My businessproblem is that I have a source system that will supply members to a dimensions table. At the same time it is possible to generate new members in a separate application and the key of this dimension is not an identity column. When my SSIS package insert new members into this dimension I will have to know about the key values in advance in the dimension table.

After spending much time reading blogs of different SSIS gurus I opened a book, "Inside MS SQL Server 2005: TSQL Querying"(MSPress, Itzik Ben-Gan) and had a look at chapter 8 " Data Modification". On page 452 I found what I was looking for, Assignment Update.

This is my solution after reading that chapter.

Declare @i as int
Set @i = (Select Max(MyDimSurrogateKey) From MyDimensionTable)

Update MyStageDimensionTable

Set @i = MyDimSurrogateKey = @i + 1

My update statement will first search for the highest value in the dimension key and assign that to the variable @i . After that it will update the dimension key in the stage table with the highest dimension key value plus one.

I am actually assigning the surrogate keys in my stage table for the dimension and doing the insert into the data warehouse dimension table after that.

It works if I have to insert several records or only one. It relies on that these surrogate keys have no real meaning and that this small script can always secure to insert the record with the highest key value.

Since integers can have large values I am not afraid of the Max strategy to get out of values.

I would like to hear comments about this and if there are some negative implications of this? In my business scenario it works.

Be aware of that it will only work with a data warehouse in SQL Server 2005. This is a TSQL extension.

Thank’s  Itzik Ben-Gan!

Posted in Integration Services | Leave a Comment »

How to create a time dimension in SQL Server 2005

Posted by thomasivarssonmalmo on October 2, 2007

A lot of posts in the SSAS2005 newsgroup have questions regarding how to construct a time dimension.

In SSAS2005 you can generate a server time dimension but having control over this is much better and it is not that hard.

Let’s start with creating a list of dates.  Jamie Thomson (http://blogs.conchango.com/jamiethomson/rss.aspx) has a TSQL solution with a new feature in SQL Server 2005 called common table expressions or CTE: s

Start management studio and the database engine. Start a new query and copy this statement:

 

with mycte as

 

(

 select cast(‘2003-01-01’ as datetime) DateValue

 union all

 select DateValue + 1

 from    mycte   

 where   DateValue + 1 < = GetDate() + 1000

 

  )

 select DateValue

from    mycte

OPTION (MAXRECURSION 0)

If you prefer you can switch the GETDATE () function to a fixed date. You can either copy the result set to excel or add an Into-Statement directly after (Select DateValue + 1 Into MyTimeDim) to create a table directly.

With this code you will have the leaf level ready in the time dimension. Notice also that the time part is 00:00:00:000 and that you will have to secure that your fact records use the same time in order to avoid strange results.

The next question is how I could change that date and take away the time part and only leave the date for presentation in a dimension? In TSQL you have the CONVERT() function for that. Normally you use it to change data types but it also have an important third argument where you can format to a specific date format.

Try this TSQL code in management studio:

Select Convert (Char(10),GetDate(), 112)

You will see todays date in ISO-style(20071002) the way we present a date in Sweden.  Change the last argument to a code that suits you. You can find more information about this in Books on Line if you search on CONVERT.

In the dimension editor you will use the full date column as key and the ISO Date as the name column.

To build the natural hierarchies or levels above the date you can use the TSQL DATEPART function.

Be aware of that DATEPART returns an integer and is only unique on the year level (2006, 2007) but not on the quarter (1, 2, 3, 4) and month levels (1-12). You will have to build collections for the key columns of each level in the time hierarchy in the dimension editor (BIDS). For quarter you will use year (2007) and quarter (1) as the collection key. The month level will need the Year (2007) and the month numbers to be unique.

When this is finished you should check that the type property (dimension editor) is properly set for Year, Quarter and Month in the user hierarchy. If not your MDX time calculations will not work.

In the picture below you can actually see that the dates will appear correctly in Excel 2007 and in the right order with this mix of different name and key column.  I have the ISO-date format as name column but the original date column, with the time part, as the key column.

Posted in Integration Services | 1 Comment »

New SSIS book for BI-developers

Posted by thomasivarssonmalmo on June 13, 2007

I received a paid copy of Wrox: "Expert SQL Server 2005 Integration Services"  by two SSIS MVP:s Knight and Veeman.
 
Since the coming days will be the end of a hot period here in southern Sweden I will have have some time to read this book and try the examples. No golf in the coming days only SSIS.
 
I have only done a brief survey of the book and it looks good from the chapters I have examined.
 
One thing I do not like, however, is many pages of code blocks in C# and VB.Net in the first chapters. Gives me the impression that SSIS is not good enough and sometimes it is not. Still it have been good enough in the projects I have worked with.
 
Still, this book is a promising contribution to knowledge about data warehouse and SSIS.
 
Two other books on the market have been a little disappointing from a data warehouse perspective:
  • Kirk Haselden: "Microsoft SQL Server 2005 Integration Services" (SAMS)
  • Wrox: "Professional SQL Server 2005 Integration Services"

Both these books concentrated a little bit to much on SSIS features and totally missed the context of data warehouse/BI.

I will return with a more deep review in the next weeks.

Katmai will have to wait until CT2.

Posted in Integration Services | 2 Comments »