Thomas.I Microsoft BI & Analytics

A Site Dedicated to General BI Issues and MS BI Issues: Architecture and Technology

Archive for the ‘BI Architecture’ Category

My BI Year 2013

Posted by thomasivarssonmalmo on January 1, 2014

I had too little time to blog as much as I wanted. The post about SSAS evaluation nodes took a lot of time to write, and it should have had a second part with more practical material, but there was simply no time for that in 2013.

So what happened then? I can mention the word certification. I took both 70-463, Implementing a Data Warehouse with SQL Server 2012, and 70-461, Querying Microsoft SQL Server 2012, and I am currently preparing for 70-462, Administering Microsoft SQL Server 2012 Databases. The last two, from my point of view as a BI consultant, are a waste of time. I am very happy to have passed 70-463, because it is very useful in my daily work. Parts of the T-SQL certification are useful, maybe 30-40 percent, but I would like to see a more BI-focused database engine exam with the admin and T-SQL parts combined. Right now both exams are too developer and DBA focused. The goal is to get the MCSA title, since it is mandatory in the company I work for. It will not make me better as a BI consultant, since I never do cluster installations, AlwaysOn setups or replication of the database engine. Most of my customers have either outsourced maintenance of hardware and software or would never let me touch their IT infrastructure. I have colleagues who have these skills and do this work daily.

The parts of BI that I have spent most of my time on over the last ten years, like Reporting Services and BISM MD/Tabular, come in the second certification step, the “MS Certified Solutions Expert”.

Now to the other parts. The year was mostly spent on the ETL or ELT part of BI. I have been involved in writing a naming-standard document for BI, cleaning up cases where the same term refers to different objects or different terms refer to the same object. I was also involved in discussions with customers about ETL and ELT strategies.

I attended a five-day course on the new PDW V2, and that hardware and software solution really can compete in the high-data-volume scenarios of BI. Do not think about it only as part of the Big Data story, because it integrates well with SQL Server environments even without the Big Data part. My personal opinion is that it is questionable to invest in systems with more than four CPU sockets, because of the declining returns of adding more CPU power. This also calls the Fast Track part of the MS BI offerings into question, since PDW V2 gives much better value for money.

I also worked on a POC for a customer that included SQL Server 2012, BISM Tabular, SharePoint 2013 and Excel 2013. The fun part was loading a 340 million record fact table into a dedicated Tabular server and seeing that it worked very well, and that compression shrank the data to 15 percent of its original size.

I met a very smart customer that maintains its own business catalog with product and customer definitions outside of its ERP system. When they upgrade the ERP system they use this business catalog with cleaned product and customer keys. That is what I call true master data management.

Slowly the interest in more self-service BI is rising here in Sweden. Since Power Pivot requires quite capable hardware, like 64-bit laptops with 8 GB RAM and an SSD, plus Office 2013, it will be an investment that needs approval in most organizations, since most laptops are still 32-bit with 2-4 GB RAM. On top of that a SharePoint infrastructure is also needed. Still, the idea that IT provides datasets to users, who then use Excel 2013 with Power Pivot and perhaps Power Query, is attractive, since it will reduce the pressure on IT to help with data mash-ups.


Posted in BI Architecture, BISM TAB & MD, Excel 2013 | Tagged: | 1 Comment »

Are Data Warehouses Dinosaurs?

Posted by thomasivarssonmalmo on April 11, 2011

This is a continuing discussion that needs new arguments with the arrival of in-memory tools like QlikView and PowerPivot. Even without these tools it is interesting to discuss what additional value the investment in a data warehouse gives over other approaches.

One less successful data warehouse project, which I did not participate in but followed through colleagues who did work on it, was about integrating more than 20 different source systems in as many countries. Reporting was done in Excel without any integration of the sources. I guess that end users printed data out in long sheets and then typed that information into Excel. This data warehouse project never involved the end users until it was almost finished. Only then were the end users introduced to the data and asked what reports they would like to have. They continued to use Excel and the labour-intensive way of making reports, because they did not understand the data that was supplied to them. Most of the definitions and data cleaning had been decided by techies and not by the business. This is a true story.

Another issue with data warehouses is that after they have been created they can be hard to change. End users have to write up business requirements, and these requirements have to be prioritized and perhaps funded if you rely on an external supplier. After that come requests for tenders, and only after a decision can development start. This process can take between two and six months. One way to increase speed is to have a dedicated group of BI professionals that implements change requests, but the situation can be really horrible if this function is outsourced.

But these bad experiences of data warehouse development should not hide the great benefits of building one. Remember one thing, though: a data warehouse is not the same as business intelligence, but it can be an important part of it.

A successful data warehouse development process usually has a strong business-requirements focus on building a consolidated, centralized view of the data in different source systems. Most larger organizations have their important business data in more than one system: financial data, invoice data, delivery/logistics data and CRM data mostly reside in different systems, and you need all of these sources to build a coherent view of the business. The data you extract from these source systems frequently has quality issues or needs some work to reduce ambiguity. Data volumes can also be large and the table schemas overwhelming in complexity. The consolidated view of information is important because it is very easy to fall into the spreadmart trap and develop stovepipes or information silos, each with its own structures. That only repeats the mistake of using isolated Excel spreadsheets for reporting, by building the same isolated structures in a data warehouse.

My conclusion is that a well defined data warehouse that follows the overall consolidated business view is a good start for a PowerPivot model. PowerPivot in itself can extract data from a not too complicated registration system, but if you have more than one source things start to get complicated even for a self-service tool. The people responsible for the source systems will never allow a PowerPivot model to retrieve all data all over again each time the model needs a refresh, so here too a data warehouse is good support for self-service tools.

Posted in BI Architecture | Tagged: | 2 Comments »

Post PASS thoughts on BISM

Posted by thomasivarssonmalmo on November 17, 2010

I think that most of you are aware of the announcements at PASS last week, when Microsoft presented their new software offerings in the BI area. The reactions were mixed among great BI bloggers like Chris Webb, Marco Russo and Teo Lachev, and more will come.

BISM (Business Intelligence Semantic Model) is the next server-based, stand-alone PowerPivot, and it will not require a SharePoint 2010 server. At this moment it is not totally clear what will happen, but let us say that it is the column-based relational engine in the Analysis Services sphere, it will only talk server-based DAX with the clients, and it is the engine you know from PowerPivot. BISM will not have a full understanding of MDX, so today's scripts in SSAS will not translate into DAX. It is more of a side-by-side relation with some limited mutual understanding.

During the next years we will have the 'old' Analysis Services multidimensional engine, SSAS, side by side with the BISM/VertiPaq relational, column-based, in-memory engine. SSAS will have features that will take BISM several releases to catch up with, mostly complex calculations in MDX combined with dimensions (calculated members in dimensions, script assignments and so on). SSAS will still be here for many years and hopefully it will see improvements over the releases.

From an IT-technical perspective this can be bad news, but I will focus more on end user requirements. In the best of worlds the multidimensional storage in SSAS could be translated directly into the relational in-memory model of BISM, and MDX calculations could be translated into DAX, but this is not the case. A decision must be made about which kinds of data marts will benefit most from a multidimensional model and which from a column-based, relational, in-memory model. This is how I see it.

SSAS has both attributes and dimensions in the so-called Unified Dimensional Model, or UDM. This model offers great navigational help for end users, with natural hierarchy navigation from top levels down to leaf levels in parts of the hierarchy. On the other hand, if you want to relate different dimensions, this is done by nesting the hierarchies on rows or columns in a grid. Combining single attributes, rather than hierarchies, from different dimensions is not the strongest feature of SSAS/UDM. It is slow.

Some end users like the top-down analysis approach, with a bird's-eye perspective on data, but other end users prefer a more flexible approach, combining single attributes from different dimensions and starting their analysis at the leaf levels of dimension data.

This last group is the one that I recognize daily as being less satisfied with the current versions of SSAS; much of their work starts with flattening SSAS cubes down to nested dimension levels at leaf level. I assume that this group of users can be helped both by the current version of PowerPivot and by the release of BISM in the next version of SQL Server, Denali.

BISM and PowerPivot are mostly marketed as being fast with massive amounts of data, but I think that the flexible attribute-based model is just as important.

Posted in BI Architecture | Leave a Comment »

The single version of the truth in PowerPivot

Posted by thomasivarssonmalmo on August 29, 2010

With the arrival of PowerPivot, the concept of self-service BI might bring fear among experienced BI developers of the well-known spreadsheet anarchy of data consolidation and integration within a business. This fear is founded on the way classic Excel data repositories (that is, any spreadsheet) keep the data part, the business rules part (formulas) and the presentation part in the same application. End users can collect data in different ways, apply different filters and create their own business logic in the Excel spreadsheet, and at a meeting you end up with endless discussions about which Excel version of the truth is the most correct.
 
How can an organization avoid the situation where its Excel reporting is beyond control, a black box where you put in some data and receive a result that you cannot check for correctness? And will PowerPivot make this even worse?
 
Well, to be honest, any application, even centrally managed SSAS, can get out of hand. I have seen SSAS cubes with 4,000-5,000 lines of MDX in a single cube. Add to that business logic in data source views in the form of named queries and named calculations.
 
Also add an ETL system for the data warehouse where we have no standards for documenting how data is changed on its way from the source system into the data warehouse. These systems mostly have technical documentation that will not help end users and report creators.
 
All of the issues I have described appear because of the lack of a blueprint for how information relates within a business and how it should be reported, so that a shared view can be established across the whole business.
 
Back to PowerPivot. Today's PowerPivot data collections can become tomorrow's centralized view, but only if there is a framework for structuring data that is known among end users. I think this means that end users should think about their report requirements in terms of centralized dimensions and centralized facts. When I have started SSAS projects with customers that do not have SSAS cubes (or cubes from other suppliers), we have begun with workshops where we write down the dimensions and measures they use in their business. After the dimensions have been identified I continue with the hierarchies within each dimension.
 
If you have dimensions that work across spreadsheets, Pivot Tables, PowerPivot models and SSAS cubes, you will cause less doubt about what you report. You will also have a better opportunity to consolidate this information across different business processes.
 
Let me give a simple example of how this blueprint thinking can work.
 
Date or Time is a frequent dimension in all reporting. A calendar hierarchy in a date dimension consists of levels like Date->Month->Quarter->Year. If we add weeks on top of dates we have a problem that the entire organization will have to reach consensus on. The levels in a week hierarchy can be Date->Week->Year, but the rules for how weeks relate to years differ across countries. You have the ISO week, where a year has 52 or 53 weeks depending on which weekday New Year's Day falls on, and you have the US system. Make a blueprint in the organization of which standard to use and use that standard in all your reports, including PowerPivot.
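To make the difference concrete, here is a minimal T-SQL sketch (SQL Server 2008 or later; the sample dates are only illustrations) showing how the two standards disagree about the very same date:

```sql
-- Minimal sketch: the same date gets different week numbers depending on the standard.
SELECT d.[Date],
       DATEPART(ISO_WEEK, d.[Date]) AS IsoWeek, -- ISO 8601: week 1 is the week containing the first Thursday
       DATEPART(WEEK, d.[Date])     AS UsWeek   -- default numbering, controlled by SET DATEFIRST
FROM (VALUES (CAST('2010-01-01' AS date)),
             (CAST('2010-12-31' AS date))) AS d([Date]);
-- 2010-01-01 falls in ISO week 53 (the last ISO week of 2009) but in week 1 of the default numbering.
```

Whichever standard the organization picks, it should be implemented once, in the date dimension, and reused everywhere.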
 
You should think in the same way about your organization structure and your products or services. Make a blueprint of these structures and use them in your reports.
 
Actually, I have not seen end users like business controllers have problems understanding how to build a data framework, with dimensions and fact tables, for reporting, since I use their business terms. These discussions do not fit into the classic IT-versus-end-users conflict, because I do not use strange relational diagrams with foreign keys.
 
One of the most interesting features in PowerPivot is that you can import a dimensional model, and if your developers have done their job you do not have to care about BI stuff like star schemas. The data in the dimension tables will be attached to the fact table if the definitions are correct in the backroom database. In PowerPivot I can choose to import from a relational database such as SQL Server, point at the fact table and mark “select related tables”.
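As a hedged illustration of what “done their job” means on the database side (the table names below are hypothetical), it is the declared foreign keys on the fact table that the “select related tables” option follows to pick up the dimensions:

```sql
-- Hypothetical star schema: declared foreign keys let PowerPivot's
-- "select related tables" find the dimensions when you point at the fact table.
CREATE TABLE dbo.DimDate     (DateKey     int NOT NULL PRIMARY KEY, [Date]       date          NOT NULL);
CREATE TABLE dbo.DimProduct  (ProductKey  int NOT NULL PRIMARY KEY, ProductName  nvarchar(100) NOT NULL);
CREATE TABLE dbo.DimCustomer (CustomerKey int NOT NULL PRIMARY KEY, CustomerName nvarchar(100) NOT NULL);

CREATE TABLE dbo.FactSales
(
    DateKey     int   NOT NULL FOREIGN KEY REFERENCES dbo.DimDate(DateKey),
    ProductKey  int   NOT NULL FOREIGN KEY REFERENCES dbo.DimProduct(ProductKey),
    CustomerKey int   NOT NULL FOREIGN KEY REFERENCES dbo.DimCustomer(CustomerKey),
    SalesAmount money NOT NULL
);
```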
 
 
 
 
The spreadsheet anarchy can find its way into PowerPivot, like any other uncoordinated effort to build a report, if end users choose the linked table option in PowerPivot. In that case end users can apply different filters and the result can be an information pizza, or chaos. Remember that this is not the fault of the tool but a misuse of the tool.
 
The ideas here are not new but come from many years of reading Ralph Kimball's books on business intelligence.
 

Posted in BI Architecture | Leave a Comment »

PowerPivot and BI architecture

Posted by thomasivarssonmalmo on January 31, 2010

PowerPivot is the new Excel 2010 add-in and a server application in SharePoint 2010. What will PowerPivot change in the existing BI architecture, or in the everyday life of existing BI solutions built on the MS BI platform?
 
Amir Netz suggested, in a discussion on LinkedIn, an analogy for the role of PowerPivot in a BI solution: the freedom of blogs compared to the newspapers' and TV channels' views of the world. I guess he means that end users will be able to build their own analytical subspaces of their organization's information stores. He also agreed with my assumption that PowerPivot will not replace the centralized data warehouse vision. My personal experience from working in a fairly large company with a centralized data warehouse is also that changes take far too long to implement. The time horizon from a requirement to its implementation can be 6-8 months. At the same time the reporting requirements change much faster than that and put a lot of pressure on the end users.
 
Before I tested the first CTP of PowerPivot I had only seen a few demos on YouTube and some texts describing its main features. One vision I had at that time was that the end user should be able to import data from different data sources and then move the data into appropriate PowerPivot tables. If you have a large text file with dimension information and measures, you should be able to split that import table into different dimensions and a fact table, but this will probably not be possible in the first release. You still have some of that functionality in the first release if you import the information into Excel spreadsheets and then build linked tables in PowerPivot. The problem is that you will have to accept the one million record limit of an Excel spreadsheet.
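For comparison, this is roughly what that split looks like when it is done upstream in the database instead, sketched in T-SQL with hypothetical staging and target tables:

```sql
-- Minimal sketch: split a flat import table into a dimension and a fact table.
CREATE TABLE dbo.DimProduct
(
    ProductKey  int IDENTITY(1,1) PRIMARY KEY,
    ProductCode nvarchar(20)  NOT NULL,
    ProductName nvarchar(100) NOT NULL
);

INSERT INTO dbo.DimProduct (ProductCode, ProductName)
SELECT DISTINCT ProductCode, ProductName   -- one row per product
FROM dbo.StagingSales;                     -- the flat text-file import

CREATE TABLE dbo.FactSales
(
    ProductKey  int   NOT NULL REFERENCES dbo.DimProduct(ProductKey),
    OrderDate   date  NOT NULL,
    SalesAmount money NOT NULL
);

INSERT INTO dbo.FactSales (ProductKey, OrderDate, SalesAmount)
SELECT d.ProductKey, s.OrderDate, s.SalesAmount
FROM dbo.StagingSales AS s
JOIN dbo.DimProduct   AS d ON d.ProductCode = s.ProductCode;
```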
 
If you have several data sources, like your transaction data in SAP, AS400/DB2 or Axapta, your financial data in separate systems, your budgets and forecasts in Cognos Planning, CRM data in CRM systems and your organizational structures in a separate tailor-made system, can PowerPivot be of any use? I do not think so. If you have a small business with one main data source that can export text files, PowerPivot will work. PowerPivot can also connect directly to the transactional data source, but then it assumes some knowledge from the end user about which tables are important. That is seldom the case.
 
With many different data sources you will never be able to do anything useful with PowerPivot as the only tool. With one data source with a lot of tables, you will need great knowledge of that data model to find the relevant tables.
 
My conclusion is that PowerPivot will be most useful as a client on top of a data warehouse or a data mart with cleaned data and centrally defined measures and dimensions. With PowerPivot, end users with knowledge of the star schema data model can bypass the development and change requests of SSAS cubes. But this scenario assumes a skilled end user with some knowledge of data models. It will also be possible to combine information from several data marts, but that assumes an even more advanced end user.
 
Summary: I do not think that PowerPivot will be more than a new, powerful client from a BI perspective.
 
Finally, what will be the main benefit of PowerPivot on top of one or several data marts?
 
First: combining the leaf levels of several large dimensions will be really fast even with large fact tables. This is the weak spot of the current version of SSAS 2008. Import data from the data mart that feeds an SSAS cube rather than directly from the cube. PowerPivot works with both SSAS 2008 and SSAS 2008 R2 cubes.
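A hedged sketch of the kind of leaf-level combination I mean, written in T-SQL against a hypothetical data mart; this grain, which is painful to crossjoin in MDX, is exactly what the PowerPivot column store handles well:

```sql
-- Hypothetical data mart query at leaf grain: customer x product x date.
SELECT c.CustomerName,
       p.ProductName,
       d.[Date],
       SUM(f.SalesAmount) AS SalesAmount
FROM dbo.FactSales   AS f
JOIN dbo.DimCustomer AS c ON c.CustomerKey = f.CustomerKey
JOIN dbo.DimProduct  AS p ON p.ProductKey  = f.ProductKey
JOIN dbo.DimDate     AS d ON d.DateKey     = f.DateKey
GROUP BY c.CustomerName, p.ProductName, d.[Date];
```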
 
Second: another great benefit is that end users get an offline data store within Excel 2010 that is both fast and compresses the data from the data source.
 
Third: I have not tested this yet, but I hope that the 65 thousand record limit of the current SSRS 2008 export to Excel 2007 will be gone with SQL Server 2008 R2 and Excel 2010 PowerPivot.
 
What is the main weakness of the first version? The sharing of calculations and business logic that you have in Analysis Services. You will have to add the same calculations over and over, since they are confined to the PowerPivot store in each Excel workbook.
 
I will return with my analysis of the integration between the PowerPivot client and the SharePoint 2010 server application part of PowerPivot.
 
For more information about PowerPivot, have a look at Donald Farmer, Chris Webb, Vidas or Marco Russo!

Posted in BI Architecture | Leave a Comment »

A new decade for BI

Posted by thomasivarssonmalmo on December 31, 2009

In early 2000 I was consulting on OLAP Services (SQL Server 7) with ProClarity as the front end. In the third quarter of 2000, SQL Server 2000 and Analysis Services 2000 were released. It was this edition that really took off and brought BI to the masses here in Scandinavia. ProClarity released their Analytic Server, with a central web-based thin client report repository, and together with AS 2000 this combination had a high market share. Panorama also had a good client solution but was hurt when they sold their innovative client, Nova View, to Cognos, which bought the product to keep it off the market.
 
In 2003 I got to see the new Reporting Services add-in for Visual Studio, but I think the broad adoption of SSRS came with the release of SQL Server 2005, SSAS 2005 and SSRS 2005. SSAS 2005 meant a total redesign of existing AS 2000 cubes, and I have never regretted advising my customers in that direction.
 
After the release of SQL Server 2005 I think most of the innovation slowed down within Microsoft. Excel 2007 was the first version that was a serious SSAS 2005 client, but it still had a lot of issues that will not be solved until the release of Excel 2010. ProClarity was bought by MS in 2006 and all development of that nice product was stopped, with a promise that the functionality would be part of the first release of PerformancePoint Server in 2007. Nothing of that happened. I saw that the decomposition tree was part of the CTPs of PerformancePoint, demonstrated at the first MS BI conference in 2007, but it was removed in the final release.
 
Good data visualization tools should be a priority for the MS BI platform, but development in this area is too slow. In PerformancePoint Services for SharePoint 2010 we will finally see the decomposition tree, but what will happen to the Performance Map and Perspective views in ProClarity, and what happened to the 90 degrees software?
 
The most innovative part of the MS BI client offerings after 2005 is the data mining add-in for Excel 2007. Using statistics for data analysis is still a complicated area to learn, and more time and money should be invested here. Why query a million records if patterns can be detected for you from much smaller samples?
 
The small data mining team at MS has done a great job, but I am not expecting any new functionality in SQL Server 2008 R2.
 
The big news in 2010 is the release of Excel 2010 and PowerPivot. Excel 2010 will have a lot of built-in MDX logic that will remove a large part of server-side MDX development. Other great news is the slicers, which will make Excel 2010 Pivot Tables much more interactive. PowerPivot is also good news. I have loaded around 2.1 million records of real data into PowerPivot, and that still takes some time, but once the data is in PowerPivot, opening a workbook with that amount of data takes a few seconds. Compression from an 800 MB text file to 120 MB in PowerPivot, and really quick response times in the Pivot Tables, looks very promising.
 
SharePoint 2010 also looks promising, but I have my doubts regarding the adoption time of a platform that large and complex. This is very hard to explain to MS people. I have never had any problems installing ProClarity Analytic Server or SharePoint Services, but SharePoint Server is much more complicated. Customers will also have to think more than once before letting in an application that might require a lot of extra consulting hours on their already restricted budgets.
 
Upgrades of Office and SharePoint always take 2-3 years after a release.
 
In the next decade I expect more integration of database technology and indexing techniques. If you have ever tried the English Query application that was part of SQL Server 2000, you will know what I mean. EQ was far from what I am thinking about here, but the vision of writing a semantic query and getting a result from SSAS or SQL Server is not.
 
Finally I would like to thank the SSAS development team, which has always been interested in hearing our opinions outside of MS. A special thanks to Mosha, who left that group this year, for his great blogging.
 
Happy new year everyone!

Posted in BI Architecture | Tagged: | Leave a Comment »

Visualize the ETL flow

Posted by thomasivarssonmalmo on November 8, 2009

There are several approaches to documenting the flow of data from the source systems into the data warehouse and data marts.
 
This blog post is about finding a good enough starting point for documenting the data flow from the data warehouse to the data marts. How much detail should you show? Should you start by tracking each attribute on its way from the data warehouse to the SSAS cube? What tool should you use? Should you document the code and translate it into business descriptions that end users can understand?
 
Well, it is easy to drown in an approach that is too big and takes too much time before delivering anything useful. What I describe here is not the final approach, but it is good enough as a start and can be improved over time into more detail.
 
Let us start with the tool. I will use MS Visio 2010, which will shortly be available as a public CTP, but the templates are the same as in Visio 2007, so you can do the same in Visio 2007.
 
First, this is the template that I chose not to use.
 
 
If you look at the objects available in the left pane, you have a great toolbox. The problem is that it requires a lot of time to learn about these options and to teach the viewers of the documentation what is presented. There are simply too many objects.
 
 The other approach is to use the Gane-Sarson template in Visio.
 
 
What I am trying to achieve here is to document which data stores and which code (SSIS packages and stored procedures) are involved in the data flow from the data warehouse to the data marts. I am actually looking for redundant data transformations and a more efficient way of building the marts, since there is a stovepipe problem behind this: too many data marts have been built to solve isolated report requirements.
 
Procedures and SSIS packages are classified as processes, since they do something with data. Data stores are tables, views and cubes. I am aware that you can discuss whether a view or an SSAS cube is an interface. I think that a view that changes data is a process and not an interface. For the moment I have classified cubes as data stores. I mainly use data store and process as the objects and try to fit cubes, views, stored procedures and SSIS packages into that framework.
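When the data flow lives in a SQL Server database, a small catalog query can seed that inventory. This is a minimal sketch under that assumption; the classification of views is a judgment call, and cubes and SSIS packages still have to be added to the documentation by hand:

```sql
-- Minimal sketch: list candidate data stores and processes from the catalog views.
SELECT s.name + N'.' + o.name AS ObjectName,
       CASE o.type
            WHEN 'U' THEN N'Data store (table)'
            WHEN 'V' THEN N'Data store or process (view)'  -- a view that reshapes data is really a process
            WHEN 'P' THEN N'Process (stored procedure)'
       END AS Classification
FROM sys.objects AS o
JOIN sys.schemas AS s ON s.schema_id = o.schema_id
WHERE o.type IN ('U', 'V', 'P')
ORDER BY Classification, ObjectName;
```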
 
I have tried other approaches, like documenting each measure and dimension attribute from the cube down to the data warehouse. The problem is that this does not bring results quickly enough for understanding the system. You end up with large Excel lists with hundreds of columns and a collection of code that is too hard to understand even for BI professionals.
 
The next step is to add business-term descriptions from the code to each process. I will discuss that in another blog post. When you have this general classification available you can start working with the details in each process.

Posted in BI Architecture | Leave a Comment »

Thoughts about the BI-conference without being there

Posted by thomasivarssonmalmo on October 9, 2008

I was supposed to go to the BI conference, but budget cuts cancelled that. Still, I have followed the parts that have been published on the web and read several blogs about what has happened.
 
One of my primary concerns about the UDM concept, which has been with us since SSAS 2005, is that you get very bad response times with MDX queries that combine leaf-level members from different dimensions. Combine leaf-level customers and leaf-level products on rows and the time dimension at the date level on columns, and take a long coffee break.
 
But this is a common business requirement, to see dimensions combined at leaf levels. From this point of view I can see what empty business requirement space MS can fill with project Gemini. It is pivot tables on steroids.
 
If you have missed this, it is about being able to build cubes from 20-100 million records with an Excel 2007 add-in that uses a server in the background. Mosha, Chris Webb and many other important bloggers have more information about this.
 
The fact that nothing new was announced about Performance Point v2 or the next Office release, which would have been one of the main reasons for me to attend, made me less disappointed about not attending this year.
 
My guess is that Office and Performance Point are too tightly connected to SharePoint Server releases in the MS release cycles.
 
Anyway, I would have liked to see Mosha's MDX Studio presentation and to meet all of you other bloggers at the conference. I miss that.
 
Thanks for reporting, Dan English, Mosha and Marco Russo.

Posted in BI Architecture | 2 Comments »

You should pay more attention to BI methods

Posted by thomasivarssonmalmo on October 3, 2008

Marco and Alberto have written two chapters about BI methods that you should download and read.

It is here and it is important for any serious BI consultant to know about best practices, design patterns and project methods.

Too much can go wrong if you think that general application design principles will work in the BI area.

Marco and Alberto. I have downloaded the first chapter and I promise to add comments soon.

Another thing: I am very happy about becoming an MVP for SQL Server. I got the email from MS on October 1st. This means one thing: more hard work.

Most important, this means that people found some useful information on this blog and in the SSAS newsgroup. 

Posted in BI Architecture | 6 Comments »

One master warned us ten years ago

Posted by thomasivarssonmalmo on August 3, 2008

There is one article (Kimball: Bringing Up Super Marts, January 1998) that I frequently read and whose link I mail to colleagues and friends in the BI business.
If they change the link, have a look at www.kimballgroup.com under Articles and Papers.
 
It was written ten years ago, when I entered the BI business, and I still see the same mistakes being repeated today.
 
Do not try to build a complete solution for all your BI requirements in one large project.
 
Next, the stovepipe approach, or building information silos, is still too frequent, together with the monolithic approach. Stovepipes only copy the Excel anarchy by centralizing the same problem: people go to meetings with different data about the same business. The monolithic approach will be too slow at delivering anything useful.
 
If a project is built according to one of these approaches it will spill over into your cubes and force you to write MDX code to support structures that were not good enough from the start.
 
The third point of that article is to distinguish between transaction cubes and snapshot cubes. I have been part of projects where we tried to build a snapshot truth on top of transactions and ended up with a cube with thousands of lines of MDX on top of a transaction fact table. This is a frequent issue when you try to get budgets, forecasts and transaction views of data to fit into the same fact table.
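To make the distinction concrete, here is a hedged T-SQL sketch with hypothetical names: the transaction fact table records individual events as they happen, while the periodic snapshot fact table records a state per period, and the two grains do not belong in the same table.

```sql
-- Transaction grain: one row per invoice line, on the date it happened.
CREATE TABLE dbo.FactInvoiceLine
(
    InvoiceDateKey int   NOT NULL,
    CustomerKey    int   NOT NULL,
    ProductKey     int   NOT NULL,
    Quantity       int   NOT NULL,
    Amount         money NOT NULL
);

-- Periodic snapshot grain: one row per account and month, e.g. budget versus actual balance.
CREATE TABLE dbo.FactAccountMonthSnapshot
(
    MonthKey      int   NOT NULL,
    AccountKey    int   NOT NULL,
    BudgetAmount  money NOT NULL,
    ActualBalance money NOT NULL
);
```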
Do not chew the elephant in one bite!
 

Posted in BI Architecture | 1 Comment »