Thomas.I Microsoft BI & Analytics

A Site Dedicated to General BI Issues and MS BI Issues: Architecture and Technology

SQL Server 2012 Books

Posted by thomasivarssonmalmo on December 23, 2011

With a big release like SQL Server 2012, we can expect to see some new book titles on the shelves next year.

Here is the list that I have found so far:

The SQL Server RDBMS

Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions (MS Press, 30 April 2012): SQL Server 2012 T-SQL has some important additions to the language, like being able to navigate to previous and next records and compute running sums, things we in the MDX community know quite well. Itzik Ben-Gan is the author, and that tells me it will be a good book.

SQL Server 2012 Query Performance Tuning (Apress, 30 April 2012): This is always an interesting subject, and usually only a chapter in T-SQL books. No details about the content have been revealed yet.

Analysis Services

Professional Microsoft SQL Server 2012 Analysis Services with MDX (Wrox, 30 April 2012): The title hides the fact that this book will cover both BISM Multidimensional with MDX and BISM Tabular with DAX.

Applied Microsoft SQL Server 2012 Analysis Services: Tabular Modeling (Prologika, March 2012): Teo Lachev is the author of this book, which will describe the self-service concept in BISM, how to build applications in Excel and how to publish in SharePoint 2010. There will also be chapters dedicated to BI professionals.

Marco Russo, Alberto Ferrari and Chris Webb are working on a book about BISM Tabular. I will write more when I have the details. The title is “Microsoft SQL Server 2012 Analysis Services: The BISM Tabular Model”.

Reporting Services

Professional Microsoft SQL Server 2012 Reporting Services (Wrox, 30 April 2012): I think that Reporting Services is the MS BI product with the most new features added in each release, so a new book is usually mandatory. This release will also cover Power View.

Visualizing Data with Power View (McGraw-Hill Osborne, 1 June 2012): This is a book about the new MS SharePoint report building tool for BISM Tabular models.

Integration Services

Professional Microsoft SQL Server 2012 Integration Services (Wrox, 11 April 2012): Integration Services in SQL Server 2012 is a major release, so this book is also important. I am not sure whether it will cover Data Quality Services and Master Data Management as well.

General MS SQL Server 2012 BI Books

Programming SQL Server 2012 (MS Press, 31 May 2012): I have second thoughts about the title of this book. It is not about programming in T-SQL, although a book with almost the same title existed for SQL Server 2008. I can see some MS BI content, like integrating data mining into applications, building and maintaining data warehouses, and using Excel to build front ends for OLAP cubes. I will probably move this book to a general MS BI 2012 section in an update.

Introducing Microsoft SQL Server 2012 (O’Reilly/MS Press, March 2012): Half of the chapters in this book will cover MS BI topics like SSIS, DQS, PowerPivot, SSAS and MDX, as well as SSRS.

 

These are the titles that I have found on Amazon so far. I will update this list as soon as I find more interesting titles.

Happy Christmas Everybody!


Posted in Books, No Category | Tagged: | 3 Comments »

When Analysis Services Wakes Up Part Two

Posted by thomasivarssonmalmo on December 5, 2011

This is the second part of a series of introductions to performance tuning of Analysis Services. The main tool used in this blog post is SQL Server Profiler, which we will use to see what happens when SSAS starts up after a restart of the service and the first user connects.

As I showed in the first part, the loading of files starts when you connect to SSAS with Management Studio (SSMS), even before you start a new MDX query. This is shown in the File Load events in the picture below.

[Screenshot: File Load events in Profiler after connecting with Management Studio]

If the first user connects to SSAS with a client tool like Excel, a lot more events will fire, so in order to digest the different groups of events my advice is to start with Management Studio.

What I did not mention in the previous post was that this is metadata being loaded, and that the loading will always be initialized by the first user connection to SSAS after a service restart. Thanks to Akshai Mirchandani in the SSAS team for clarifying this.

This will also happen after a cube has been changed, so that a new metadata file version needs to be loaded. You can try that by running a full process of all cubes and dimensions in a database.

This file loading process will load metadata for all cube databases that you have deployed on the SSAS server, so the number of files depends on that. If you have a lot of cube databases deployed on the server, loading this metadata can take significant time.

The next step is to continue from Management Studio and start a new MDX query, which will trigger some new events. In order to see this you will need to add the “Execute MDX Script Begin” and End events to the Profiler trace. The scripts are loaded into memory from the MDX Script file. These events are not triggered when you connect to SSAS from SSMS like we did initially; it is only when you start a new MDX query in SSMS that they fire.

Actually you will see the file with the MDX scripts being loaded first.

\\?\C:\Program Files\Microsoft SQL Server\MSAS10_50.MSSQLSERVER\OLAP\Data\Adventure Works DW 2008R2.0.db\Adventure Works.0.cub\MdxScript.10.scr.xml

[Screenshot: Execute MDX Script Begin/End events in Profiler]

Execute MDX Script Begin/End will also happen each time you clear the cache. If you have the Query Dimension and Calculate Non Empty events activated in the trace, they will appear together with the MDX Script Begin events, but this depends on the MDX Script events being evaluated internally (according to the MS white paper “Identifying and resolving MDX query performance bottlenecks in SQL Server 2005 Analysis Services”).

If the first connection to SSAS comes from a tool like Excel, a connection to a specific cube database and cube has been defined first, which means that not all metadata files for all cube databases and cubes will be opened. Management Studio, on the other hand, opens all files related to all databases on the SSAS server.

Progress Report Begin Events after Excel 2010 is connected to SSAS 2008 R2.

[Screenshot: Progress Report Begin events]

When all files are loaded together with the MDX script, let us clear the cache by running this command.

[Screenshot: the XMLA Clear Cache command]
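The screenshot is not reproduced here, but a ClearCache command typically looks something like this (the DatabaseID is assumed to be Adventure Works DW 2008R2 and must match the ID of your deployed database):

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>Adventure Works DW 2008R2</DatabaseID>
  </Object>
</ClearCache>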

After running the XMLA command Clear Cache we call the cube again by running this MDX query.

[Screenshot: the MDX query]
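The exact query in the screenshot is not shown here, but any simple query against the cube will do, for example:

-- simple MDX query against the Adventure Works 2008 R2 sample cube
SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
       [Date].[Calendar Year].Members ON ROWS
FROM [Adventure Works];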

That will trigger the MDX script being loaded again. This is a sample of the Profiler events being triggered.

[Screenshot: Profiler events after the Clear Cache command and the new MDX query]

The Windows OS has a file system cache that SSAS can benefit from, by not having to read files from the I/O system but from memory, which is much faster. The file system cache is never cleared by the XMLA Clear Cache command.

If I clear the file system cache with the SSAS stored procedure, without a restart of the SSAS service, the clearing of the file system cache will not unload the MDX scripts from memory. The Execute MDX Script events will not reappear.

[Screenshot: calling the clear file system cache stored procedure]

[Screenshot: Profiler trace after clearing the file system cache]

The loading of the metadata files that I mentioned first in this blog post is not affected by the clear file system cache stored procedure.

You download this stored procedure from www.codeplex.com and install it on your SSAS server as a new assembly. I have my doubts about whether you will be allowed to install it on a production system, so I recommend using it in your own development sandbox.
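Assuming you register the Analysis Services Stored Procedures (ASSP) assembly under the name ASSP, the file system cache is typically cleared from an MDX query window with a call along these lines (check the assembly documentation for the exact name):

-- call into the ASSP assembly; name and signature assumed from the CodePlex project
call ASSP.ClearFileSystemCache()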

Posted in Analysis Services SSAS | Tagged: | Leave a Comment »

When Analysis Services 2008 R2 Wakes Up

Posted by thomasivarssonmalmo on November 13, 2011

This is the start of a small series about SSAS performance scenarios that might help in everyday performance discussions with users and customers.

The first post is about why SSAS can initially be perceived as slow after a restart of the SSAS service, when the first user connects after the restart. It does not matter if that first user is an SSAS administrator or a user connecting with Excel. I have used the Adventure Works 2008 R2 demo cube project, but I think that you will see the same behaviour on versions from SSAS 2005 onwards.

This post only uses the SQL Server Profiler performance tool, together with Management Studio or Excel 2010, to connect to the Adventure Works cube after a restart of the SSAS service.

Go to SQL Server Profiler under Performance Tools and start a new trace with a connection to SSAS. Use the standard template and save it to a file to get a first view of what you can do with Profiler. On the second tab I have selected the following events in Events Selection.

 

[Screenshot: Profiler Events Selection with the File Load events]

In a future post I will continue with more events in Profiler after a restart of the service but this can be enough for this scenario. Actually you will only have to select the File Load Begin and the File Load End events.

Now you can stop SQL Server Profiler, if it is running, and push the clear trace window button in Profiler. Restart the SSAS service in SQL Server Configuration Manager, and when the service is restarted you can start Profiler with the trace that you have created. When both the SSAS service and Profiler are started you can connect to SSAS with a client like Management Studio or Excel.

In Profiler you will see the file load events shown in the picture below; they will only show up after a restart of the SSAS service. These are the events that can make you perceive SSAS as slow during the first connection.

[Screenshot: File Load Begin/End events in Profiler after a restart of the SSAS service]

If you play with Clear Cache XMLA commands or clear the file system cache, the file load events will never show up again, as long as you stay on the same cube database. When you connect with XMLA in Management Studio, all database files on the server will be loaded from the SSAS files. With an MDX query it is only the target database files that will be loaded.

What can you do to improve the performance of the SSAS file loading? Buy a fast I/O system (disks). With an unlimited budget I would recommend solid state disks, which will give you quite amazing performance after a restart.

In a coming blog post we will have a look at more events after a restart of the service and compare that to what happens when you clear the SSAS cache and the file system cache.

Posted in Analysis Services SSAS, No Category | Tagged: | 1 Comment »

Tomislav’s MDX book

Posted by thomasivarssonmalmo on October 23, 2011

Tomislav appeared on the SSAS forum around 2007, if I am not totally wrong, and he soon became an authority and seems to have impressed even Mosha. Since he worked with the SoftPro Cube Player client, it was obvious that this guy knew MDX inside and out.

Now he has written a book about MDX that I can recommend. Actually, a few years ago I thought that we would only ever have two MDX books, since the topic is not a blockbuster. I like Tomislav’s book since it is focused on practical analytical challenges and problems and shows how to solve them, in 480 pages. This book is obviously the main MDX point of reference for all SSAS developers. It is not a book about SSAS engine internals or performance tuning of MDX.

You can buy it from here: https://www.amazon.co.uk/Microsoft-Server-Analysis-Services-Cookbook/dp/1849681309/ref=sr_1_3?ie=UTF8&qid=1319373673&sr=8-3

Posted in MDX for everyone | Leave a Comment »

Reflections on the new version of the SSAS Performance Guide

Posted by thomasivarssonmalmo on October 14, 2011

It is here and you should read it. I know about the real-world scenarios of fixed-price projects where you set up something that is good enough. Still, in your own professional struggle to learn about SSAS internals and design trade-offs, I would recommend any BI professional to read it. Even if you do not build terabyte cubes like the SQL CAT team does, the recommendations in this white paper are important for all SSAS implementations.

In the SSAS 2008 R2 version I like the following.

It is well written and clear, and does not try to force a lot of abstract technical babble into your mind. It is actually the best performance guide I have read. The hard-core geek stuff is instead in the SSAS Operations Guide. That is also an important part, but by separating the design and operations guidelines we get two more focused documents on SSAS development.

The most important line is this: “Achieving a balance between user freedom and scalable design will determine the success of a cube”. Too much freedom and you will have cubes that do not work from a performance perspective; too strict performance requirements will make your users unhappy.

The dimension design part has improved a lot. Read all of it, because it covers the design issues I meet daily. The graphics for the many design choices you will have to make are very valuable additions to this white paper.

What I would like to see in a future release is a specific document about MDX performance considerations. It is part of this document, but the topics regarding MDX are numerous and it is impossible to cover them all here.

Nice work SQL CAT and the reviewers.

Posted in Analysis Services SSAS | Leave a Comment »

SQL Server DENALI Short BI Guide

Posted by thomasivarssonmalmo on September 3, 2011

A lot is on my plate at the moment, so I have had limited time to blog, but since I am back in consulting you can expect to see more posts here after the end of September.

SQL Server Denali is a big release from the BI perspective and I will add my own reflections here, but Denali has also been commented on by a lot of good BI thinkers that I link to on my site.

First, I think you will hear the abbreviation BISM (Business Intelligence Semantic Model) as many times as you have heard about UDM (Unified Dimensional Model) in the past. Our old multidimensional friend Analysis Services, alias UDM, will be called BISM Multidimensional when Denali ships. If you read the product team blog, you will see that it is still called the Analysis Services and PowerPivot Team blog.

What about the news for BISM Multidimensional in Denali? Not much regarding new features, except that you can choose to bypass the 4 GB string store limit for a cube. This probably comes with a price, as usual in the trade-off between performance and features. There is no product that comes with both good performance and a lot of features, even if sales people claim that. The big news may be that this is still the best tool for building really large cubes on top of a big data mart. It is not dead, and you cannot exchange it for other MS BI servers when you need to construct complex business logic or browse large cubes.

BISM Tabular is the Vertipaq engine, or column store mode, for Analysis Services. When you install Analysis Services in Denali you will have to choose one of the modes, tabular or multidimensional, during the first installation. To add the second mode on the same server you will have to run the installation a second time. BISM Tabular is really a server version of PowerPivot for SharePoint outside of SharePoint. You do not need SharePoint to install BISM Tabular, or Analysis Services in tabular mode.

Some questions arise if you try to install BISM Tabular and BISM Multidimensional on the same box. BISM Tabular is totally memory based, so if you need both I would not run it on the same box as BISM Multidimensional or the SQL Server relational engine (RDBMS). In fact I think that you need several separate servers to support the SQL Server RDBMS, BISM Multidimensional, BISM Tabular and SharePoint 2010 (with Reporting Services). This will mean some additional decisions regarding license costs and hardware budgets. It will also add complexity to the infrastructure setup, with Kerberos knowledge being mandatory.

Why would you need BISM Tabular? First, it is more flexible than BISM Multidimensional. You can deploy Excel 2010 PowerPivot models to BISM Tabular, so users (or non-consultants) with a good knowledge of DAX can build their models in Excel and send them to an IT department for further work or deployment. I must admit that it is far quicker for me, as a professional, to build models in Excel 2010 PowerPivot version 2 than to do the same in BISM Multidimensional. In BIDS for tabular projects the UI experience is very close to building models directly in Excel 2010. I would recommend BISM Tabular for dimensional models that require more changes within shorter time frames than BISM Multidimensional can offer.

BISM Tabular models can be queried in Reporting Services with MDX. That can be a way to bypass the SharePoint 2010 requirement and integrate Reporting Services in native mode with other portals. Excel 2010 is also supported as a client, and Excel also uses MDX to query BISM Tabular models.

The second pro of using BISM Tabular is that it is the only supported data source for Crescent, the new reporting tool in Denali, aimed at non-IT professionals building quite complex reports. Crescent uses DAX, the query and expression language of BISM Tabular. This is a killer application for non-IT professionals and the first serious attempt from Microsoft to build a user-friendly reporting tool. The downside is the SharePoint 2010 Server requirement for Crescent.

DAX is now a limited query language and not only an expression language, but without the capabilities of MDX. It is like a miniature MDX that uses functions for the query part.

Reporting Services in Denali, and only in SharePoint 2010 integrated mode, also has a new event subscription model for KPI changes that users themselves can configure.

In T-SQL and the relational engine, new window functions are added for building cumulative values and for referring to the previous record, something we have had in MDX from the start. Reporting in T-SQL on the relational source will have more capable tools in the Denali release.
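A minimal sketch of what the new syntax looks like, with a hypothetical dbo.FactSales table and columns made up for illustration, computing a running total and the previous record’s value per order date:

-- running sum from the first row up to the current row, plus the previous row's value
SELECT OrderDate,
       SalesAmount,
       SUM(SalesAmount) OVER (ORDER BY OrderDate
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal,
       LAG(SalesAmount) OVER (ORDER BY OrderDate) AS PreviousAmount
FROM dbo.FactSales;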

PowerPivot in Excel comes with some great features that are also part of BISM Tabular. You can build natural hierarchies in the dimension tables. There is an ER diagram view where you can relate tables with a drag-and-drop column UI, build hierarchies and see DAX calculations. DAX measures can be formatted. Another important addition is that some DAX functions have been simplified, one example being the native DISTINCTCOUNT() function. No more COUNTROWS(DISTINCT()).
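For example, the distinct customer count that previously required the COUNTROWS(DISTINCT()) pattern (shown in the DAX post further down this page) can now, assuming the same Adventure Works column, be written directly as:

=DISTINCTCOUNT(FactInternetSales[CustomerKey])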

The relational engine in Denali also supports the Vertipaq column store model through special indexes. Vertipaq, or the column store, is managed by the relational engine query optimizer, so adding these indexes to a relational data mart can boost the performance of T-SQL based reports. I guess that this will make the ROLAP mode of BISM Multidimensional even more interesting in the near future.
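A minimal sketch, again assuming a hypothetical dbo.FactSales fact table in the data mart — in Denali the columnstore index is nonclustered and is created like this:

-- columnstore index over the columns typically scanned by reporting queries
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_FactSales_ColumnStore
ON dbo.FactSales (DateKey, ProductKey, SalesAmount, SalesQuantity);

Note that in Denali a table with a columnstore index becomes read-only, so the index has to be dropped or disabled before new data is loaded.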

SSIS will be a huge release in Denali, but I will have to refer to other good blogs for comments about the improvements in this product. I use SSIS not only for ETL but also for building recordsets for import into Excel, when you do not have the time to wait 30 minutes for SSRS to build a report with 50 thousand records.

Posted in No Category | 1 Comment »

Twenty signs that you have hired the wrong SSAS consultant

Posted by thomasivarssonmalmo on June 15, 2011

These are some of my observations during my years with Analysis Services since OLAP Services in 1999.

All dimensions have parent-child hierarchies. There is absolutely no excuse for this. Use parent-child hierarchies only when you have no other option.

The data source view has named queries with a lot of transformations, column name changes, derived tables, subqueries and unions in the same mess. The ETL system has moved into the cubes and the source data warehouse (if there is one) is not properly structured. It is a clear indication that the back-end data warehouse is a mess.

The measure names are still not right, so all measure group measures have Visible = False and are replaced by calculated measures.

The dimension sources consist of 5 to 15 views joined together and have several layers of views on top of each other, the last with different naming standards. This makes it very hard to follow the ETL flow back to the source tables.

The dimension keys are text based, not integers.

None of the dimensions have attribute relationships defined in their user or natural hierarchies.

The natural hierarchies in the source dimension tables or views have undetected many-to-many relations that will make the distribution of data across dimension members random. This can happen when you have hidden relations in the source table between attributes that are not primary keys (see the sketch after this list for a simple way to detect this).

The cube dimensions have a lot of report hierarchies, that is, hierarchies where the attributes do not have one-to-many relations from the top level down to the subordinate attributes. Color and size of a product is an example.

The cube has no aggregations.

The cube consists of more than 15 dimensions in a single measure group, and many of the dimensions consist mainly of single attributes with 1-10 members and no hierarchies.

The cube has 50-200 measures in a single measure group. Sometimes dimensions move into measures, like actual, budget and forecast measures.

The cube has 5000 rows of MDX code when you copy and paste it into Word with 8 as the font size. To be fair, you can be forced into this by business requirements, but you should warn about it from a maintenance perspective. It is very hard to debug.

Measures are null for many dimensions. This is a clear sign that the data model behind the cube is not properly designed.

The cube has no check for division-by-zero errors in calculated measures, so users will see ugly error messages in the table cells (a guarded calculated measure is sketched after this list).

The consultants do not know what UAT is, have never seen a test protocol and do not know how to do tests while developing the cube.

The consultant thinks that Kimball and Inmon are two soccer teams.

The consultants do not know how to write documentation for a cube except for dropping all the MDX scripts into tables in Word.

Dimension key errors during processing have been set to ignore errors and discard records in a production system. This is a sign that orphan dimension foreign keys in the fact table are not handled. SSAS has a great feature to take care of missing dimension keys. With the discard approach the cube will only contain a subset of the data in the source table.

The cube has 4-5 customer dimensions, not one customer dimension with 4-5 natural hierarchies. This is a migration from AS 2000 without a review of the cube model despite all the changes that came in SSAS 2005 and later.

There are no partitions in the cube when the fact table has 50-100 million records.
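Two of the signs above lend themselves to quick checks. For the hidden many-to-many relations in source attributes, a simple hypothetical T-SQL query (table and column names made up for illustration) counts how many parents each child attribute value maps to; anything above one will scramble the natural hierarchy:

-- subcategories that map to more than one category break the natural hierarchy
SELECT Subcategory, COUNT(DISTINCT Category) AS CategoryCount
FROM dbo.DimProduct
GROUP BY Subcategory
HAVING COUNT(DISTINCT Category) > 1;

And for the division-by-zero problem, the standard MDX guard in a calculated measure looks like this (the measure names are placeholders):

-- return NULL instead of an error when the denominator is zero
IIF([Measures].[Sales Quota] = 0, NULL,
    [Measures].[Sales Amount] / [Measures].[Sales Quota])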

Posted in BI War Stories | 4 Comments »

Performance and Query tools for SSAS

Posted by thomasivarssonmalmo on May 6, 2011

SSAS consultants and management experts are starved of good tools for analyzing MDX queries. We have nothing. Please vote on Connect to make a change to this via this link.

This is VERY IMPORTANT.

Posted in No Category | Leave a Comment »

Useful DAX Starter Functions and Expressions

Posted by thomasivarssonmalmo on April 25, 2011

I have been a little slow with the PowerPivot expression language DAX, mainly because I found it a little hard to understand. Maybe this problem comes from my mind being occupied by MDX, T-SQL and the expression language in Reporting Services.

Right now I have some starter functions that can help with getting to know the DAX syntax better and perhaps answer some common business questions that are not too complicated. All my examples come from the Adventure Works star schema model that you need to import into PowerPivot first.

Start with doing your calculations on top of a pivot table with PowerPivot tables as the source. Calculations in the PowerPivot tables will only make you more confused. You find this editor under the PowerPivot tab in the Excel ribbon, under New Measure. Make sure that it is the fact table that is selected as the table name at the top.

[Screenshot: the New Measure dialog]

DAX Aggregation formulas

Here I am thinking about the aggregation formulas that you get when you drop a measure in the values box in the Pivot Table design tool.  You do not need the editor above to create these standard aggregation formulas.

[Screenshot: a standard aggregation formula in the measure dialog]

The formula can be seen above the OK button in the picture above. If you drop a measure you get: SUM(FactOnlineSales[SalesAmount]). Inside the SUM function, the table name is combined with the column name. If you drop a text column in the values box you will get the COUNTA function instead, which makes sense since you cannot SUM text values. This is the COUNTA example: COUNTA(FactOnlineSales[SalesOrderNumber]). If you try to change this to COUNT you will get this error message: “The function COUNT takes an argument that evaluates to numbers or dates and cannot work with values of type String.”

The beauty of DAX: Distinct Count

Here comes a new taste of DAX, where we need to combine two DAX functions to get the result we want. Distinct count is hard to set up in Analysis Services because of the different technical requirements of that model. In PowerPivot and DAX we have it available by using two functions that are not too complicated. Remember that end users often confuse count and distinct count.

[Screenshot: the distinct count measure in a pivot table]

The syntax looks like this: =COUNTROWS(DISTINCT(FactInternetSales[CustomerKey])).

First comes the COUNTROWS function, with the DISTINCT function inside it. In the DISTINCT function we point to the dimension key that we are interested in doing a distinct count on. In this case it is the CustomerKey in the fact table. ProductKey or any other foreign key will work in the same way.

DAX Time Calculations

It is important that your PowerPivot model has complete years of dates in order to avoid strange results. The Adventure Works star schema model has this limitation.

First is the accumulated value of a measure within a year, frequently called YTD (Year to Date). Here we combine the outer TOTALYTD function with the SUM function like this:

=TOTALYTD(SUM(FactOnlineSales[SalesQuantity]),DimDate[Datekey])

The example is from the Contoso Sales sample database. DimDate is the date dimension table and the DateKey is a date column. The fact table name here is FactOnlineSales and SalesQuantity is the measure in that fact table.

The result looks like this in Excel 2010.

[Screenshot: the TOTALYTD measure in a pivot table]

Ratios in DAX

Ratios introduce a new generic DAX function called CALCULATE and a slightly different way of thinking than in the previous examples. If we want to build a ratio in DAX with the measure SalesAmount, we need the numerator, that is the current aggregated value in the pivot table, and divide it by the denominator, the total of SalesAmount, which is always the same.

In DAX and with the Contoso sales example database this can be built in this way.

=SUM(FactOnlineSales[SalesAmount])/CALCULATE(SUM(FactOnlineSales[SalesAmount]),ALL(FactOnlineSales))

The result can be seen below.

[Screenshot: the ratio measure in a pivot table]

Recommended reading

There is one book that I can strongly recommend if you want to learn more about DAX and PowerPivot models, and that is Marco Russo’s and Alberto Ferrari’s “PowerPivot for Excel 2010: Give Your Data Meaning”. More information can be found at this link.

http://www.powerpivotworkshop.com/

Posted in PowerPivot and DAX | Leave a Comment »

Are Data Warehouses Dinosaurs?

Posted by thomasivarssonmalmo on April 11, 2011

This is an ongoing discussion that needs new arguments with the arrival of in-memory tools like QlikView and PowerPivot. Even without these tools it is interesting to discuss what additional value the investment in a data warehouse can give over other approaches.

One less successful data warehouse project, which I did not participate in but followed through colleagues who did work on it, was about integrating more than 20 different source systems in as many countries. Reporting was done with Excel without any integration of the sources. I guess that end users printed out data in long sheets and then typed that information into Excel. This data warehouse project never involved end users until the project was almost finished. Only then were the end users introduced to the data and asked what reports they would like to have. End users continued to use Excel and the labour-intensive way of making reports, because they did not understand the data that was supplied to them. Most of the definitions and cleaning of data were decided by techies and not by the business. This is a true story.

Another issue with data warehouses is that after they have been created they can be hard to change. End users have to write up business requirements. These requirements have to be prioritized and maybe funded if you rely on an external supplier. After that come requests for tenders, and only after the decision can development start. This process can take between two and six months. One way to increase speed is to have a dedicated group of BI professionals that implements change requests, but the situation can be really horrible if this function is outsourced.

But these bad experiences of data warehouse development should not hide the great benefits of building one. Remember one thing, though: a data warehouse is not the same as business intelligence, but it can be an important part of it.

A successful data warehouse development process mostly has a strong business-requirement focus on building a consolidated, centralized view of the data in different source systems. Most larger organizations have their important business data in more than one system: financial data, invoice data, delivery/logistics and CRM data mostly reside in different systems, but you will need all these sources to build a coherent view of the business. The data that you extract from these source systems frequently has a lot of quality issues or needs some assistance to reduce ambiguity. Data volumes can also be large and the table schemas overwhelming in complexity. The consolidated view of information is important, since it is very easy to fall into the spreadmart trap and develop stovepipes or information silos, each with their own structures. That only repeats the mistake of using Excel spreadsheets alone for reporting, by building the same isolated structures in a data warehouse.

My conclusion is that a well defined data warehouse that follows the overall consolidated business view is a good start for a PowerPivot model. PowerPivot in itself can extract data from a not too complicated registration system, but if you have more than one source things start to get complicated even for a self-service tool. The people responsible for the source systems will never allow the PowerPivot model to retrieve all data all over again each time the model needs a refresh, so in this case a data warehouse is good support for self-service tools.

Posted in BI Architecture | Tagged: | 2 Comments »