BISM MD: Introduction to Evaluation Nodes

This is a first introduction to the concept of evaluation nodes in SQL Server Profiler and what they can tell you about what is going on inside the SSAS engine. I would also like to examine where the evaluation nodes show up in the Profiler trace. Evaluation nodes were introduced as Profiler events with the SQL Server 2012 release. They have been part of the engine since before that, but it was in SQL Server 2012 that they were exposed as Profiler events.

I must acknowledge Akshai Mirchandani, from Microsoft, who has answered my questions and given me feedback on this text. If anything is wrong here, blame me.

SSAS/BISM Multidimensional Query Plans

Like the SQL Server relational engine, SSAS produces query plans when it receives a query, before it does any real physical work to retrieve data. Query plans over multidimensional structures are complex and would require many chapters of a book to cover, so it is not possible to go into all the details of how this works in SSAS/BISM MD in this blog post. I refer to Microsoft SQL Server Analysis Services 2008 Unleashed for the details.

Two main parts work together in this process. The first is the Formula Engine (FE), which does most of the work except for going to the disk system to retrieve data. A simplified general picture is that the FE handles the MDX queries sent to SSAS and breaks each request down into smaller parts, called subcubes. It then tries to find the data in the FE cache; if it cannot, it sends the request on to the Storage Engine.
Retrieving the actual data from disk or from the file system cache is the work handled by the other part, the Storage Engine (SE), unless the data is already cached in the FE.

Logical and Physical Query Plans

The SSAS FE creates both logical and physical query plans. The SE has its own query plans, but they are not part of this blog post. The word plan is not fully accurate here, since the Profiler trace shows how the query was actually executed and not a planned execution. The SQL Server relational query optimizer works with estimated and actual query plans, but that is not the case for SSAS/BISM MD.

Evaluation nodes are essentially nodes in the logical plan; instead of the term logical query plan it is recommended to use evaluation node. They get converted into nodes in the physical plan and might then get executed. Evaluation nodes can be created without ever being executed, and they can also be cached and reused by other cells during query execution. All of this takes place in the Formula Engine.

Physical plan nodes in the FE implement calculation operations like multiplication and division, but none of that is exposed in the Profiler trace events for evaluation nodes.

Can Evaluation Nodes help in Query Execution Analysis?

Evaluation nodes are chatty events, like most SQL Server Profiler trace events for SSAS/BISM MD. A lot of data is generated, and not in a friendly user interface. With more complex calculations than in this example you can get a lot of evaluation node events that are hard to find patterns in. This blog post shows how the evaluation nodes appear as three groups of events, one group that gets executed and two that do not. The post will also help you find the XML tags that carry interesting information about the query plan.

One example of interesting information is whether an evaluation node is run in cell-by-cell mode or in block mode. Have a look at the table at the end of this blog post, where several of the XML tags in an evaluation node are explained.

 

The cube used and the queries

This is my sample cube with two measure groups (no calculations or measure expressions). It is the same cube that I used in my previous blog post.

The Cube Structure

There are no natural or user hierarchies, only attribute hierarchies. Each measure group has only one partition. There are no MDX scripts and no aggregations, and the aggregation property of each attribute is set to the default.

This is the first query that was executed on a non-cached cube.

The MDX Query

Below is the result of the simple query above.

The result of the MDX Query

The Complete Profiler Trace

To set up the trace with Profiler you need to add these events before running the query above.

ProfilerTraceSettings

I assume that the reader knows how to configure SQL Server Profiler.

Here is the first part of the Profiler Trace events.

FirstPartProfilerTrace

This is the second part of the Profiler trace, with the events following the last event above.

SecondPartProfilerTrace

Grouping the Trace Events

Let me first try to describe what is going on here with a high-level picture.

EvaluationNodesTopView

The VALUE, FORMAT_STRING and LANGUAGE groups are derived from an XML tag (CellProperty) in each evaluation node event, shown later. Each group of evaluation nodes has its own NodeIndex property, and the first group starts with NodeIndex 0.

EvalNodesNodeIndex

First the FE analyzes the sets on each axis and builds subcubes that cover those sets. Then an iteration over the cells begins with a request for the current cell's property value; each intersection of the axis sets points to a cell. The FE finds a subcube that was previously built and creates an evaluation node for that subcube, which triggers the Init-Build-Prepare events of the evaluation node in the Profiler trace. The data needed by this evaluation node is then requested from the SE, including data needed by other evaluation nodes that were created indirectly by the Build/Prepare phases (look at the Query Subcube Verbose events in the Profiler trace). Now it is time to run the evaluation node (event 7, RunEvalNode Start, and event 8, RunEvalNode End). This step is optional, because evaluation nodes do not always have to be run (for non-VALUE cell properties or in cell-by-cell mode). Finally the value for the current cell property is fetched as a result of running the evaluation node (or calculated as a cell-by-cell operation). These steps are then repeated for the next cell.

If we look at the Profiler trace earlier we can see that the evaluation node that was run was cached and reused by the other cells. The two later groups, with the CellProperties FORMAT_STRING and LANGUAGE, were never run, since they do not have the RunEvalNode events as their last step.

This is the first group of evaluation nodes, with CellProperty (VALUE) and NodeIndex (0).

FirstGroupOfEvaluationNodesValue

If I click on the Calculation Evaluation-2 InitEvalNodeEnd of the first group I can see this property:

InitEvalNodeEnd

 

Single Evaluation Nodes: Prefetch Operation

To sum up the discussion about the first group of evaluation nodes, we can say this:

  • The evaluation nodes are initialized and built.
  • The same group of evaluation nodes is then prepared.
  • The evaluation nodes are run.

Actually there is also a less obvious step included in this group: a prefetch operation, where the SE requests for all evaluation nodes that have been prepared are issued to the SE. The prefetch gathers together the subcubes that were detected during the prepare phase of one or more evaluation nodes, not just the last one. There is no event in the trace indicating the prefetch itself. Because this prefetch is a single SE request on behalf of all prepared evaluation nodes, the evaluation node events appear intermingled with SE queries. After the prefetching is finished the evaluation node can be run.

This is the second group with CellProperty (FORMAT_STRING) that was not run.

CellPropertyFormatString

There is no RunEvalNode event in this group.

This is the third group, with CellProperty (LANGUAGE), which also was not run.

CellPropertyLanguage

Like the previous group of evaluation node events there is no RunEvalNode event in this group.

Evaluation node tags for the run evaluation node

Here I have two fragments from the same evaluation node in Profiler.
I refer to the table for details about each tag. The most interesting here are:

  • <Exact>1</Exact>: the subcube is an exact match.
  • <LazyEvaluation>0</LazyEvaluation>: the evaluation node was run in bulk (block) mode.
  • <SparseIterator>1</SparseIterator>: the evaluation node did not have to iterate over the entire space of the subcube; it iterated over a smaller, sparse, space.

The collection of tags I refer to is shown here.

InterestingTags1

InterestingTags2

Summary

This blog post is an introduction to evaluation nodes in SQL Server Profiler with the scenario of a cube with no calculations. This is why the resulting information from the profiler trace in this scenario is limited. The next post will have a look at evaluation nodes when MDX expressions are involved.

Table: Evaluation Node Tags

EvaluationNodeTag

Explanation

Exact

Exact means that the subcube is an exact match of the cells that are needed by the query. Sometimes the server can build a subcube that is *larger* than the cells needed by the query, and in that case Exact will be false.

Empty

Empty means that the result of this evaluation node is empty; none of the cells have any data. This can happen if you have a subcube that is somehow not valid (e.g. perhaps the subcube has a coordinate that doesn't auto-exist with other coordinates, or perhaps some other situation where calculations get eliminated until nothing remains and the subspace is guaranteed to have no cells with data).

PlanWasRun

PlanWasRun indicates whether the execution plan for this evaluation node has been run yet. It will get run when the Run events show up – or it may never be run, because the FE ends up fetching cell values in cell-by-cell mode.

PlanMustBeRun

PlanMustBeRun is too internal to explain. But essentially there are some situations where we build an evaluation node, and it turns out that it matches another evaluation node that has already been executed – in that case, we can just point the evaluation node to the cached result of that earlier evaluation node, and the plan for the new evaluation node does not need to be run, and so the flag will be set to false.

NaiveEvaluation

This is on the evaluation node item. Use the value of the LazyEvaluation tag instead.

Status

PrepareStarted/Built/Uninitialized/RunStarted/RunSucceeded

CalcsAtSameGranularity

This indicates whether the calculations are on the same granularity or not.

SingleCell

This indicates whether the subcube of the evaluation node is for a single cell coordinate. This may translate into cell-by-cell evaluation.

LazyEvaluation

Indicates cell-by-cell mode or bulk evaluation.

SparseIterator

When in block mode, do we have to iterate over the entire (dense) space of the subcube, or can we iterate over a smaller (sparse) space? Remember that sparse iteration is good. For example, if you have a calculation with the expression [Measure1] * 2, and Measure1 is a storage engine measure, then we can execute a storage engine request and iterate just over the cells returned by that request – a much sparser space than the entire subspace, because many of the cell values are null and we do not need to calculate them at all.

EvaluationItemCount

The number of calculations that apply to the subspace – these can change as we proceed through the stages of Init/Build/Prepare, because some of the calculations will get overlapped/split/eliminated. The final set is what you see at the end of Prepare or the beginning of Run.

Overlaps

Overlaps means whether individual calculations overlap with each other. E.g. a scope on all states and a scope on the state of Washington – the item representing the first calculation has an overlapping calculation.

CoversFullSpace

CoversFullSpace means whether the calculation item covers the full space of the evaluation node, or whether the item has filters (e.g. State = Washington) that make it smaller than the full space of the evaluation node.

HasConditions

HasConditions refers to IF conditions in the calculations.

NoCalculations

0 -> Only Storage Engine Query

CellProperty

As discussed above.

CalculationLocation

This is the location of the calculation. When it is only SE queries this tag contains the measure group. In general it will contain the location and expression that applies to the evaluation node item.

VaryingAttributes

Drives the way the expression is evaluated and makes the expression dependent upon it. The 0s and 1s are bits, and the numbers from 0 to 140 are the bits that are turned on.

ISOWeeks in SSAS Revisited

I wrote a blog post a few years ago about ISO weeks in SSAS and now it is time for an update. Since SQL Server 2008 there is a TSQL DATEPART() argument that removes the need to build TSQL user defined functions to add this information to a date dimension table. You will see this in the code samples below, but have a look at the DATEPART() TSQL function in Books Online for more information.
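
As a quick illustration (not part of the original post), the ISO_WEEK argument can be tried directly in Management Studio:

-- ISO_WEEK is available as a DATEPART argument from SQL Server 2008.
-- 2012-01-01 (a Sunday) belongs to ISO week 52 of 2011, while the following Monday starts ISO week 1 of 2012.
Select DatePart(ISO_WEEK, '20120101') As IsoWeekSunday,  -- returns 52
       DatePart(ISO_WEEK, '20120102') As IsoWeekMonday   -- returns 1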

ISO weeks seem to be the fashion here in the Scandinavian countries. The main idea is that a year contains 52 or 53 ISO weeks, depending on which weekday the year starts on (ISO week 1 is the week that contains the first Thursday of the year). All ISO weeks have seven days, and the main problem for SSAS/BISM Multidimensional is that ISO weeks cross over calendar years. This means a many-to-many relation between the ISO week and the calendar year if you try to add calendar year as a hierarchy level above the ISO week. The question is whether you should add a year level above the ISO week at all. If you decide to add one, it must be a business-defined ISO year level and not the calendar year. In this blog post we assume that such a level is needed, and the business rule is that weeks 52 and 53 always belong to the previous year and week 1 to the new year. A consequence of this decision is that ISO years and calendar years will not sum up in the same way: totals over all years will be the same, but not the yearly distributions.

Enough said; now it is time to turn this ISO week requirement into code and a working solution. My code is a theoretical example with no real-world background except for the date dimension. It is simplified so that it is easy to check that the numbers are summed correctly, and it includes one order for each day in the date dimension.

I will start with the code for the date dimension below. I am using a CTE (a TSQL common table expression) with a start date (2009-01-01) and an end date at the last date of the current year (where Year(DateValue) <= Year(GetDate())). You can also see the DATEPART function with the ISO_WEEK argument. I use a SELECT INTO statement to create a new table from the CTE. The last column (1 as IsoYearIsoWeek) was added to create an integer column that we will fill in a second step, and the column created with the TSQL CASE statement implements the business rule mentioned earlier for the ISO year classification. I thank my colleague Eva Eriksson for her kind help.

 

DateDimensionCode
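
Since the script itself is shown as a screenshot, here is a minimal sketch of what it can look like. The table name dbo.TestDateDim and the columns DateValue, IsoWeekInt, IsoYear and IsoYearIsoWeek come from the rest of the post; the CalendarYear column and the exact layout are my assumptions.

-- Sketch of the date dimension script described above (not the original code)
With Dates (DateValue) As
(
    Select Cast('20090101' As datetime)
    Union All
    Select DateAdd(day, 1, DateValue)
    From Dates
    Where Year(DateAdd(day, 1, DateValue)) <= Year(GetDate())
)
Select DateValue,
       Year(DateValue) As CalendarYear,
       DatePart(ISO_WEEK, DateValue) As IsoWeekInt,
       -- Business rule: ISO weeks 52 and 53 belong to the previous year, week 1 to the new year
       Case When DatePart(ISO_WEEK, DateValue) >= 52 And Month(DateValue) = 1 Then Year(DateValue) - 1
            When DatePart(ISO_WEEK, DateValue) = 1 And Month(DateValue) = 12 Then Year(DateValue) + 1
            Else Year(DateValue)
       End As IsoYear,
       1 As IsoYearIsoWeek
Into dbo.TestDateDim
From Dates
Option (MaxRecursion 0)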

 

In a second step I use this TSQL statement to update the IsoYearIsoWeek column:

Update dbo.TestDateDim
Set IsoYearIsoWeek = IsoYear * 100 + IsoWeekInt

The result will look like this in Management Studio.

DateDimensionTable

 

Finally it is time to create the dummy fact table, with one order for each date in the date dimension. This is also done with a CTE, a feature that has been part of SQL Server since the 2005 version.

 

FactTable
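
The fact table script is also shown as a screenshot, so here is a minimal sketch of the idea: one order row per date. The original uses a CTE; in this sketch the rows are simply derived from the date dimension table instead, and the table name dbo.TestFactOrders is an assumption.

-- Sketch only (not the original code): one order for each date in the date dimension
Select DateValue,
       1 As OrderQty
Into dbo.TestFactOrders
From dbo.TestDateDim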

It is time to build the date dimension and the cube in SQL Server 2012 Data Tools, previously named BIDS. I assume that you know how to do this. I have added all attributes to the date dimension. In the wizard you should set the Type properties for the date attributes that will take part in MDX time calculations. Some of the attribute types are kept as Regular because they are not important for this example.

AttributeTypesDateDim

I have created the hierarchies below in the date dimension.

 

DateHierarchies

I have also created the attribute relations for the hierarchies like this.

 

Attribute relations

Finally it is time to add the fact table. In the cube structure tab it looks like this.

 

FactTable2

And in the dimension usage tab it looks like this.

Fact Table

 

Process the cube and start Excel to have a look at it. I am using Excel 2013 and have created two pivot tables from the same connection. The pivot table to the left uses the ISO week hierarchy and the one to the right uses the calendar hierarchy. We have the same totals, but the distributions differ. If you expand the ISO week hierarchy for ISO year 2012, all weeks should have 7 as the sum of OrderQty, as you can see in the second picture.

 

TheTwoHierrachies

 

IsoWeekExcel

In a future post I will cover a many-to-many requirement in a date dimension with weeks.

My Quick Guide to SSAS Query Tuning Part One

The intention of this blog post is to give a practical introduction to SSAS query tuning with the SQL Server Profiler tool. I think that SQL Server Profiler can be very useful, but it also has its issues, delivering too much information and in a rather unclear way. I hope that this post can be helpful and show how Profiler interacts with the SSAS engine.

The first step is to build a modified Adventure Works cube that only uses basic measures from the fact tables, no MDX scripts (except for the default Calculate()) and no measure expressions. With measure expressions you will get more Profiler events for a single query than you need when starting to learn this tool. I am using SSAS 2008 R2 SP1 and the Adventure Works DW database and cube project available on www.codeplex.com.

I have two measure groups in this model.

TheMeasureGroups

The following dimensions are used. They have no attribute relations yet, only flat connections with the dimension key in each dimension. You can use as many as you like. I have included more dimensions than I need for this blog post but I will refer back to this model in later posts.

TheDimensions

Finally, these are the relations between the dimensions and the measure groups. They are exactly as in the standard Adventure Works model, but without the measure expressions.

The relations

The measure groups have no extra partitions and no aggregations. We will add those in a later post and see their impact. I will improve this model step by step.

I assume that starting SQL Server Profiler is not an issue here. I have a saved template that I reuse for query tuning. The focus of this blog post is on some of the following events.

The Profiler Trace Events

Now it is time to start the fun. I assume that you have the SSAS service up and running and that you have started SQL Server Management Studio and connected to SSAS; otherwise you will see a lot of file load events that I have described in two previous blog posts. Start Profiler just before running each query, and after each query stop it and clear the events. What you will see in this blog post is storage engine events and nothing from the formula engine, since this model does not include MDX scripts yet.

I will start with this query:

Select {} on 0
From [Adventure Works DW2008R2 Performance];

Here is a subset of the events that you will see in Profiler.

FirstQueryNull

There are no Query Subcube events, so no activity has been going on in the storage engine. The query was sent to the server and a result was returned (Serialize Results Current). Resource usage shows zero for all counters. Thanks to Chris Webb's research about these counters here.

The second query will use a basic measure group measure on the columns axis and nothing more.

Select {[Measures].[Order Quantity]} on 0
from [Adventure Works DW2008R2 Performance]


Query2aPartitions

There is one read from the measure group and partition containing the [Order Quantity] measure.

A new event occurs that was not shown for the first query. The event subclass numbers 2 and 22 (query subcube and query subcube verbose) mean that the data was not retrieved from the storage engine cache.

Query 2 Query Subcube Verbose

If you look at the Query Subcube Verbose event in the detail window that appears when you select that event, you will see this.

Query 2 Subcube Verbose

We only sent a query that requested a measure from the cube, with no dimensions, as you can see above. For each dimension you have a line of zeroes, meaning that no specific dimension member was asked for in the query. Profiler only shows the dimensions related to the measure group of the measure. The line of zeroes is a vector representation of the subcube, and that is how SSAS currently shows its internals.

This is not the main point here, but the resource usage counters for this query are the following, explained in Chris Webb's blog post mentioned above.

Query 2 Resource Usage

The reads counter tells you the number of disk read operations for the query, and the reads in KB tells you the physical amount of data that was retrieved.

The number of rows scanned is the number of rows decoded/filtered/aggregated by the storage engine. The number of rows returned is the number of resulting rows from the same query. I will not go into detail about these counters.

The next query will retrieve the same measure as before but add a new measure from the second measure group.

Select {[Measures].[Order Quantity],[Measures].[Order Quantity – Fact Reseller Sales]} on 0
from [Adventure Works DW2008R2 Performance];

For the same measure that we asked for in the previous query we will get this event in Profiler.

Fråga3Cache

The data above is retrieved from the storage engine measure group cache. In the query subcube and query subcube verbose events you will see event subclasses 1 and 21, showing that the data was requested from the measure group cache.

The second measure goes through the same process as the measure in the previous query.

Query 3 second Measure

The data is not cached since it has never been retrieved before.

Both Query Subcube Verbose events still show only zeroes for all dimensions. This is the query subcube verbose event for the second measure.

Query 3 second measure query subcube verbose

If you like you can run the third query a second time to confirm that both measures will be retrieved from the storage engine cache.

Finally it is time to see what happens in Profiler when you run queries for dimension members. I will only use a date dimension to show this.

This is the first query.

Select {[Order Date].[Calendar Year].&[2010]} on 0
From [Adventure Works DW2008R2 Performance];

This is the clear message from the SSAS engine, in the Query Subcube event, that you have requested a dimension member.

Query Subcube query 4 query subcube

Can you see the 1 appearing at the end?

Query subcube verbose is a little bit more generous.

Query4 verbose

There is a number 2 on the order date dimension. This is the internal SSAS DataId of that member.

Finally if you run this query with many dimension members:

Select {[Order Date].[Calendar Year].&[2005],[Order Date].[Calendar Year].&[2006],[Order Date].[Calendar Year].&[2007]} on 0
From [Adventure Works DW2008R2 Performance];

Query 4 Many dimension members

Can you see the star (*) on dimension 4? This means that all members were requested from that attribute.

If you run this query with two dimension members:

Select {[Order Date].[Calendar Year].&[2005],[Order Date].[Calendar Year].&[2006]} on 0
From [Adventure Works DW2008R2 Performance];

Query4 two members

Here you will see the plus sign (+), which means that more than one member was requested.

I hope that this first part of the query tuning series explains how storage engine data retrieval appears in Profiler and how it describes the retrieval of dimension members.

When Analysis Services Wakes Up Part Two

This is the second part of a series of introductions to performance tuning of Analysis Services. The main tool used in this blog post is SQL Server Profiler, which we will use to see what happens when SSAS starts up after a restart of the service and the first user connects.

As I showed in the first part, a loading of files starts when you connect to SSAS with Management Studio (SSMS), before you even start a new MDX query. This is shown in the file load events in the picture below.

clip_image002[6]

If the first user connects to SSAS with a client tool like Excel, a lot more events will fire, so in order to digest the different groups of events my advice is to start with Management Studio.

What I did not mention in the previous post is that this is metadata being loaded, and the loading is always initiated by the first user connection to SSAS after a service restart. Thanks to Akshai Mirchandani of the SSAS team for clarifying this.

This will also happen after a cube has been changed, so that a new metadata file version needs to be loaded. You can try that by running a full process of all cubes and dimensions in a database.

This file loading process loads metadata for all cube databases that you have deployed on the SSAS server, so the number of files depends on that. If you have a lot of cube databases deployed on the server, the loading of this metadata can take significant time.

The next step is to continue from Management Studio and start a new MDX query, which will trigger some new events. In order to see this you will need to add the Execute MDX Script Begin and End events to the Profiler trace. The scripts are loaded into memory from the MDX script file. These events are not triggered when you connect to SSAS from SSMS like we did initially; it is only when you start a new MDX query in SSMS that they fire.

Actually you will see the file with the MDX scripts being loaded first.

\\?\C:\Program Files\Microsoft SQL Server\MSAS10_50.MSSQLSERVER\OLAP\Data\Adventure Works DW 2008R2.0.db\Adventure Works.0.cub\MdxScript.10.scr.xml

clip_image004[6]

Execute MDX Script Begin/End will also fire each time you clear the cache. If you have the Query Dimension and Calculate Non Empty events activated in the trace, they will appear together with the MDX Script Begin events, but this depends on the MDX script being evaluated internally (according to the Microsoft white paper "Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services").

If the first connection to SSAS is from a tool like Excel, a connection to a specific cube database and cube has been defined first, and this means that not all metadata files for all cube databases and cubes will be opened. Management Studio, on the other hand, opens all files related to all databases on the SSAS server.

Progress Report Begin Events after Excel 2010 is connected to SSAS 2008 R2.

clip_image006[6]

When all the files and the MDX script are loaded, let us clear the cache by running this command.

clip_image008[10]

After running the XMLA command Clear Cache we call the cube again by running this MDX query.

clip_image009[10]

That will cause the MDX script to be loaded again. This is a sample of the Profiler events being triggered.

clip_image011[6]

The Windows operating system has a file system cache that SSAS can benefit from: files are read from memory, which is much faster, instead of from the I/O system. The file system cache is never cleared by the XMLA ClearCache command.

If I clear the file system cache with the SSAS stored procedure, without restarting the SSAS service, the clearing of the file system cache will not unload the MDX scripts from memory. The Execute MDX Script events will not reappear.

clip_image012[10]

clip_image014[6]

The loading of the metadata files that I mentioned first in this blog post is not affected by the clear file system cache stored procedure.

You can download this stored procedure from www.codeplex.com and install it on your SSAS server as a new assembly. I have my doubts about whether you will be allowed to install it on a production system, so I recommend using it in your own development sandbox.

When Analysis Services 2008 R2 Wakes Up

This is the start of a small series about SSAS performance scenarios that might help in everyday performance discussions with users and customers.

The first post is about why SSAS can initially be perceived as slow after a restart of the SSAS service, when the first user connects. It does not matter if that first user is an SSAS administrator or a user connecting with Excel. I have used the Adventure Works 2008 R2 demo cube project, but I think that you will see the same behaviour on versions from SSAS 2005 onwards.

This post only uses the SQL Server Profiler performance tool, and Management Studio or Excel 2010 to connect to the Adventure Works cube after a restart of the SSAS service.

Go to SQL Server Profiler under Performance Tools and start a new trace with a connection to SSAS. Use the standard template and save it to a file to get a first view of what you can do with Profiler. On the second tab, Events Selection, I have selected the following events.

 

Profiler File Load Events

In a future post I will continue with more events in Profiler after a restart of the service, but this is enough for this scenario. Actually, you only have to select the File Load Begin and File Load End events.

Now you can stop SQL Server Profiler, if it is running, and push the clear trace window button. Restart the SSAS service in SQL Server Configuration Manager, and when the service is restarted you can start Profiler with the trace that you have created. When both the SSAS service and Profiler are started you can connect to SSAS with a client like Management Studio or Excel.

In Profiler you will see the file load events in the picture below; they only show up after a restart of the SSAS service. These are the events that can make SSAS feel slow during the first connection.

SSAS File Loading Profiler

If you play with ClearCache XMLA commands or clear the file system cache, the file load events will not show up again as long as you stay on the same cube database. When you connect with XMLA in Management Studio, all database files on the server are loaded from the SSAS files. With an MDX query it is only the target database's files that are loaded.

What can you do to improve the performance of SSAS file loading? Buy a fast I/O system (disks). With an unlimited budget I would recommend solid state disks, which will give you quite amazing performance after a restart.

In a coming blog post we will have a look at more events after a restart of the service and compare that to what happens when you clear the SSAS cache and the file system cache.

Reflections on the new version of the SSAS Performance Guide

It is here and you should read it. I know about the real-world scenarios of fixed-price projects where you set up something that is good enough, but in your own professional struggle to learn about SSAS internals and design trade-offs I would recommend any BI professional to read it. Even if you do not build terabyte cubes like the SQL CAT team does, the recommendations in this white paper are important for all SSAS implementations.

In the SSAS 2008 R2 version I like the following.

It is well written and clear and does not try to force a lot of abstract technical babble into your mind. It is actually the best performance guide I have read. The hard-core geek stuff is instead in the SSAS Operations Guide. That is also an important document, but by separating design and operations guidelines we get two more focused guides on SSAS development.

The most important line is this: "Achieving a balance between user freedom and scalable design will determine the success of a cube". Too much freedom and you will have cubes that do not meet performance objectives; too strict performance requirements will make your users unhappy.

The dimension design part has improved a lot. Read all of it, because it covers the design issues I meet daily. The graphics for the many design choices you have to make are very valuable additions to this white paper.

What I would like to see in a future release is a specific document about MDX performance considerations. MDX is covered in this document, but the topics regarding MDX are numerous and it is impossible to cover them all here.

Nice work SQL CAT and the reviewers.

Ignore processing errors in SSAS 2005 and 2008

When you start with an SSAS prototype or a proof-of-concept cube you seldom have the time to clean data in a comprehensive way. You want to show the customer a cube with their data, and if it is not perfect that is not important in a prototype. In this simple example I will show how you can temporarily ignore dimension key errors, such as when you have a fact table record with a dimension key that is missing in the dimension table. This example works in both SSAS 2005 and 2008. Remember that this is not a practice recommended for production systems. Also check my blog entry about inferred members, which shows how SSAS 2005 handles this problem in a better way in the long run.
 
I will start by building a time dimension table and a fact table with the following script in SQL Server 2005 Management Studio.
 
Create Table TimeDim
( FullDate datetime)
 
Create Table FactOrphan
(FullDate datetime,
 Qty int)
 
Insert Into TimeDim Values('2009-07-19')
Insert Into TimeDim Values('2009-07-20')
Insert Into TimeDim Values('2009-07-21')
 
Insert Into FactOrphan (FullDate, Qty) Values ('2009-07-19',1)
Insert Into FactOrphan (FullDate, Qty) Values ('2009-07-20',1)
Insert Into FactOrphan (FullDate, Qty) Values ('2009-07-21',1)
Insert Into FactOrphan (FullDate, Qty) Values ('2009-07-22',1)
 
When this has run correctly you will have the last record in the fact table, with the date '2009-07-22', as an orphan, which means that it is not related to any record in the time dimension table. If you have primary and foreign key constraints between the two tables this situation can never occur, because you will get an error when you insert the last fact record.
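
If you want to verify the orphan on the relational side, a quick check (not part of the original post) is:

-- List fact rows that have no matching row in the time dimension table
Select f.FullDate, f.Qty
From FactOrphan f
Left Join TimeDim t On t.FullDate = f.FullDate
Where t.FullDate Is Null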
 
Open BIDS and create a new SSAS project, connect to the database where the two tables reside and create a data source view with these two tables. Create a logical primary key in the dimension table and relate the fact table to the dimension table in the data source view. The finished result  will look like this:
 
 
After this you can right-click the Cubes folder in the Solution Explorer and build the cube with the wizard. It will build both the dimension and the cube. Hierarchies in the dimension are not important for this example.
 
Now it is time to process this cube and see what happens. Right-click the root folder at the top of the Solution Explorer and select Process. We will process this cube with the default settings.
 
 
Click run and see what happens. You will get the following long error message in the process log window.
 
"Errors in the OLAP storage engine: The attribute key cannot be found: Table: dbo_FactOrphan, Column: FullDate, Value: 2009-07-22. Errors in the OLAP storage engine: The record was skipped because the attribute key was not found. Attribute: Full Date of Dimension: Time Dim from Database: FactOrphans, Cube: FactOrphans, Measure Group: Fact Orphan, Partition: Fact Orphan, Record: 4. Errors in the OLAP storage engine: The process operation ended because the number of errors encountered during processing reached the defined limit of allowable errors for the operation. Errors in the OLAP storage engine: An error occurred while processing the ‘Fact Orphan’ partition of the ‘Fact Orphan’ measure group for the ‘FactOrphans’ cube from the FactOrphans database"
 
To force SSAS 2005 and later to ignore this error, we have to click on the Change Settings button when processing the cube again from the Solution Explorer. Below is the default setting of the Dimension key errors tab in the Change Settings dialogue; the first tab is not that interesting in this scenario. Activate "Use custom error configuration" and all the choices below become available. Change the settings to the ones in the next picture.
 
 
If you want to log the discarded records you must change the "Key not found" setting to "Report and continue". This will help you identify the records that were discarded during cube processing. The log file will look like the error message we got earlier when we did a default full processing.
 
 
Finally, when you process with these settings the cube processing will work and you will see the following processing window.
 
 
Summary: This is a way of building a prototype quickly when the ETL for the dimension and fact table loads is not finished or has not started yet. It is not a long-term solution, since fact records will be removed from the cube without any further notice unless you do something with the log file.

Loading duplicate records into SSAS 2005

This post is about what will happen when you load duplicate dimension records into SSAS 2005. A following post will deal with duplicate fact records. Will SSAS 2005 generate an error when you try the first scenario? Not by default as we shall see in a simple cube.
 
Edit: After doing some additional testing I have updated this post at the end.
 
When you have primary keys on all dimension tables, foreign keys in the fact table and a unique constraint on the combination of fact table foreign keys, the problem with duplicate records can never occur in either the dimension tables or the fact table; you will get errors when you load the tables in the ETL process. In reality the data quality is not always that good, or you might have performance objectives that prevent you from having constraints in your star schema. A sketch of such constraints is shown after the script below.
 
Run this script to construct two simple dimension tables and a fact table.
 
Create Table Product(
ProductId Int,
ProductName varchar(20),
ProductCat varchar(20))
 
Create Table Dates (
DateId Int,
DateKey DateTime,
YearNo Int)
 
Create Table FactSales(
ProductId Int,
DateId Int,
SalesQty Int,
SalesValue Decimal(10,2))
 
Insert Into Product(ProductId, ProductName, ProductCat) Values(1,'Beer','Alcohol')
Insert Into Product(ProductId, ProductName, ProductCat) Values(2,'Water','NonAlcohol')
Insert Into Product(ProductId, ProductName, ProductCat) Values(3,'Milk','NonAlcohol')
 
Insert Into Dates(DateId,DateKey,YearNo) Values(1, '2007-06-01',2007)
Insert Into Dates(DateId,DateKey,YearNo) Values(2, '2008-06-01',2008)
Insert Into Dates(DateId,DateKey,YearNo) Values(3, '2008-07-01',2008)
Insert Into Dates(DateId,DateKey,YearNo) Values(4, '2008-08-01',2008)
Insert Into Dates(DateId,DateKey,YearNo) Values(5, '2008-09-01',2008)
 
Insert Into FactSales(ProductId, DateId, SalesQty, SalesValue) Values(1,1,10,100)
Insert Into FactSales(ProductId, DateId, SalesQty, SalesValue) Values(2,2,100,1000)
Insert Into FactSales(ProductId, DateId, SalesQty, SalesValue) Values(3,3,1,10)
Insert Into FactSales(ProductId, DateId, SalesQty, SalesValue) Values(1,4,20,200)
Insert Into FactSales(ProductId, DateId, SalesQty, SalesValue) Values(2,5,30,300)
 
In the script no primary keys are defined, but we will assume that the Id columns in both dimension tables are the keys.
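
As a sketch only (these statements are not part of the original post), this is what the constraints mentioned above could look like. With them in place, the duplicate insert later in this post would fail already in the relational database, which is exactly why they are left out here.

-- The key columns must be made NOT NULL before primary keys can be added
Alter Table Product Alter Column ProductId Int Not Null
Alter Table Dates Alter Column DateId Int Not Null
Go
Alter Table Product Add Constraint PK_Product Primary Key (ProductId)
Alter Table Dates Add Constraint PK_Dates Primary Key (DateId)
Alter Table FactSales Add Constraint FK_FactSales_Product Foreign Key (ProductId) References Product (ProductId)
Alter Table FactSales Add Constraint FK_FactSales_Dates Foreign Key (DateId) References Dates (DateId)
Alter Table FactSales Add Constraint UQ_FactSales Unique (ProductId, DateId)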
 
Start BIDS and build a new cube project with the following data source view.
 
Dsv
 
This is the product dimension with a hierarchy
 
Product
 
This is the date dimension with a hierarchy
 
Date
 
Finally, this is the view in the cube browser when the cube has been processed.
 
Cube browser
 
Now we can start the "fun" of loading a duplicate key into the product dimension table. Run the following insert statement:
 
Insert Into Product(ProductId, ProductName, ProductCat) Values(1,'Wine','Alcohol')
 
Both beer and wine will have the same "key". Process the dimension first and the cube after that. No error messages will appear.
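
If you want to confirm the duplicate on the relational side first, a simple check (not part of the original post) is:

-- List dimension keys that occur more than once
Select ProductId, Count(*) As KeyCount
From Product
Group By ProductId
Having Count(*) > 1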
 
The duplicate member will not show up in the dimension browser. This means that SSAS 2005 ignores duplicates and takes the first record in the table.
 
The duplicate key will not show in the dimension
 
In order to see that you have duplicates in the dimension you can download and use BIDS Helper and its Dimension Health Check functionality. This is the message after the duplicate dimension record has been inserted.
 
 
Dimension health check after dim duplicate 
 
 
In order to stop the processing of the dimension if a duplicate record appears, you have to change the error configuration for the dimension: change the error configuration from default to custom and the duplicate key setting from ignore error to report and stop.
 
 
 Change to report and stop cube
 
 
If you process the dimension, with this error setting, you will get an error
 
 
Process with the custom error config 
 
 
What will happen if you process the cube?
 
Cube processing will work
 
The error configuration in the dimension will not stop the processing of the cube. The cube will process.
 
Update: One additional test was to check what happens if you process the database from BIDS, instead of the cube, when you have duplicates in the product dimension. Right-click the database in the Solution Explorer and select Process. I get an error without having configured any error handling in the cube.
 
Process the database with error
 
 
Summary:  If you add a duplicate record to a dimension, SSAS 2005 will ignore that. The dimension and the cube will be processed without errors. If the dimension is configured to not ignore duplicates and stop processing, the dimension will not process but the cube will. If the database is processed, the error configuration in the dimension will stop the processing of the database.
 
I am running SSAS 2005 SP2 with Service update 8. 

Get the ISO Week correct in the Adventure Works cube

A few months ago I wrote about how you can calculate the ISO week by creating a user defined function with TSQL in your database. You will need to read that blog post first and use the code for the UDF that you can find here. What I did not show was how to create a hierarchy in the date dimension of an SSAS 2005 cube.
 
If you simply add the calendar year level above the ISO week level (Date, ISO Week, Calendar Year) you will see a problem when you expand the dates between the last week of the previous year and the first week of the following year.
 
Missing days
 
The reason for this is the problem I have written about earlier: many-to-many relations in natural hierarchies. An ISO week can belong to more than one calendar year, and some dates will disappear.
 
How can we solve this problem? I have created a business rule that simply says that ISO weeks 52 and 53 always belong to the previous year and week 1 always belongs to the new year. To test it on the Adventure Works cube you can open the Adventure Works DW database in Management Studio and run the following TSQL script. You must have created the ISO week UDF before you run it.
 
Alter Table DimTime
Add  IsoWeekTest int,
        IsoYear int
Go
 
Set datefirst  1
 
Update DimTime
Set IsoWeekTest =  dbo.IsoWeek(FullDateAlternateKey),
     IsoYear =
 Case When dbo.IsoWeek(FullDateAlternateKey) = 1 and DatePart(dy,FullDateAlternateKey) <= 10 Then Year(FullDateAlternateKey)
     When dbo.IsoWeek(FullDateAlternateKey) = 1 and DatePart(dy,FullDateAlternateKey) > 362 Then Year(FullDateAlternateKey) + 1
     When dbo.IsoWeek(FullDateAlternateKey) = 52 and DatePart(dy,FullDateAlternateKey) >= 355  Then Year(FullDateAlternateKey)
     When dbo.IsoWeek(FullDateAlternateKey) = 52 and DatePart(dy,FullDateAlternateKey) Between 1 and 2  Then Year(FullDateAlternateKey)-1
     When dbo.IsoWeek(FullDateAlternateKey) = 53 and DatePart(dy,FullDateAlternateKey) Between 362 and 366 Then Year(FullDateAlternateKey)
     When dbo.IsoWeek(FullDateAlterNateKey) = 53 and DatePart(dy,FullDateAlterNateKey) Between 1 and 3 Then Year(FullDateAlterNateKey)- 1
Else Year(FullDateAlternateKey)
End
 
The SET DATEFIRST command is needed to set Monday as the first day of the week; the SQL Server default is Sunday. The CASE code uses the day-of-year value from the DATEPART function combined with the ISO week number to decide which ISO year a date belongs to.
 
If you identify a combination that does not work and places an ISO week under the incorrect year, please leave a comment. I have checked dates from 2003 to 2011 but may have missed something; the Adventure Works time dimension does not have that many members.
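
One way to check the result yourself (a query sketch, not part of the original post) is to list the dates around a year boundary and inspect the IsoWeekTest and IsoYear columns:

-- Adjust the date range to what exists in your DimTime table
Select FullDateAlternateKey, IsoWeekTest, IsoYear
From DimTime
Where FullDateAlternateKey Between '20051226' And '20060108'
Order By FullDateAlternateKey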
 
Open the Adventure Works cube project in BIDS and start with a refresh of the data source view so that the new columns appear in the time dimension table. Now we can start building the new hierarchy in the date dimension.
 
The changed hierarchy
 
I have created the hierarchy with Date, ISO Week and ISO Year as the levels. Since the ISO week number is not unique over years, you must add the ISO year as part of the key collection of the ISO Week attribute. See the picture below. I use the name IsoWeekTest in the script but have changed the name in the hierarchy.
 
Collectionkey
 
Do not forget to set the ISO week as the name column, or you will get an error message when you try to process the dimension. The ISO Week attribute's OrderBy property must also be changed to Key.
 
In the hierarchy you should add the ISO Year attribute as an attribute relation on the ISO Week level. This means that you will have to remove the ISO Year attribute relation from the dimension key attribute.
 
Process the dimension and browse it when it is finished. Expand the hierarchy as in the following picture.
 
All dates shows up
 
The business rule seems to work fine and all dates show up. Now you can process the cube and have a look at the hierarchy in a client tool. Here is how it looks in ProClarity.
 
ProClarity
 
The ISO year and week hierarchy is finally up and running in the Adventure Works cube project.
 
 
 
 
 
 
 

Null processing of measures in SSAS 2005

A few days ago a question regarding null processing of fact records was posted on the SSAS forum. It was also about nulls in dimension tables, but let us focus on what to do with empty measures in a fact record. Should you put zero in them or allow null? I answered that post with the assumption that you should never feed a cube with zeros, because that makes the NON EMPTY keyword useless in MDX SELECTs. Still, I had never checked the options in SSAS 2005, so this post is the outcome of the need to check this more carefully.
 
I recommend that you think twice about your fact table design, because if you have nulls or zeros for measures you might have designed it wrong. If not, you can continue with the technical solution that SSAS 2005 offers.
 
We will need to build a simple cube to understand how this works. Run the following TSQL script in Management Studio in order to create a time dimension and a simple fact table.
———————————————
Create Table TimeDim1
( FullDate datetime,
  MonthNo int,
  MonthNameT Char(3),
  YearNo Int)
Go
 
Insert Into TimeDim1 (FullDate) Values ('2008-01-01')
Insert Into TimeDim1 (FullDate) Values ('2008-02-01')
Insert Into TimeDim1 (FullDate) Values ('2008-03-01')
Insert Into TimeDim1 (FullDate) Values ('2008-04-01')
Insert Into TimeDim1 (FullDate) Values ('2008-05-01')
Insert Into TimeDim1 (FullDate) Values ('2008-06-01')
Go
 
Update TimeDim1
 Set MonthNo = Month(FullDate),
 MonthNameT = Left(DateName(mm,FullDate),3),
 YearNo = Year(FullDate)
Go
 
Create Table FactsWithNull
( FullDate datetime,
 SalesValue int,
 SalesQty int)
Go
 
Insert Into FactsWithNull (FullDate, SalesValue, SalesQty) Values('2008-01-01', 100, 1)
Insert Into FactsWithNull (FullDate, SalesValue, SalesQty) Values('2008-02-01', Null, Null)
Insert Into FactsWithNull (FullDate, SalesValue, SalesQty) Values('2008-03-01', 100, 1)
Insert Into FactsWithNull (FullDate, SalesValue, SalesQty) Values('2008-04-01', 100, 1)
Insert Into FactsWithNull (FullDate, SalesValue, SalesQty) Values('2008-05-01', 100, 1)
Insert Into FactsWithNull (FullDate, SalesValue, SalesQty) Values('2008-06-01', 0, 0)
—————————————–
 
When you run a SELECT * on the fact table you will see February 2008 with two null measures and June 2008 with two zero measures.
 
Copy of The fact table
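
A simple way to see the difference between the rows (not part of the original post) is to classify them in the select:

-- Classify each fact row so the NULL and zero rows are easy to spot
Select FullDate, SalesValue, SalesQty,
       Case When SalesValue Is Null Then 'NULL measures'
            When SalesValue = 0 Then 'Zero measures'
            Else 'Real values'
       End As RowKind
From FactsWithNull
Order By FullDate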
 
Next, you build the simple cube with one time dimension like this.
 
The cube Project
 
Now it is time to run the test on the two measures in the cube. The NullProcessing property of a measure is found under the Source property in the cube structure pane in BIDS. The settings available are Automatic, Preserve, Error and ZeroOrBlank. I will test all of them except Error.
 
NullprocessingProperty
 
The first case is NullProcessing = Automatic, which is also the default. You will see a result like the one below in ProClarity or in the cube browser in BIDS: the null measures are converted to zeros. If you add NON EMPTY on columns, February 2008 will still be shown. This is also the outcome of NullProcessing = ZeroOrBlank.
 
ProClarity Null processing automatic
 
The next case is NullProcessing = Preserve. In this case nulls are kept and presented as empty cells, and when you apply NON EMPTY on columns the empty month, February 2008, is removed.
 
ProClarity Null Preserve
 
Conclusion: I still do not see any reason to enter zeros for empty measures, because you have a setting in SSAS 2005 that can convert null to zero for you. If you would like to use the NON EMPTY keyword to remove empty rows or columns, you should keep the nulls with Preserve.
 
Edit: This geeky blog reached 20 000 hits this week. The next time I mention this will be when I reach 50 000 hits.