Thomas.I Microsoft BI & Analytics

A Site Dedicated to General BI Issues and MS BI Issues: Architecture and Technology

Archive for the ‘PowerPivot and DAX’ Category

Update Adventure Works with views

Posted by thomasivarssonmalmo on March 16, 2015

Here are two scripts to create two views in the Adventure Works DW database so that you can play around with current dates instead of having to go back several years.

First the date view:

Create View dbo.DatesPlus5Y

AS

(

Select

DATEADD(YEAR,5,FullDateAlternateKeY) as FulldateAlternateKey5Y,

DATEDIFF(MONTH,DATEADD(YEAR,5,FullDateAlternateKeY),GetDate()) as MonthSequence,

DateName(m,DATEADD(YEAR,5,FullDateAlternateKeY)) As MontName,

DatePart(m,DATEADD(YEAR,5,FullDateAlternateKeY)) As MontNbr,

Year(DATEADD(YEAR,5,FullDateAlternateKeY)) as CalendarYear,

Year(DATEADD(YEAR,5,FullDateAlternateKeY))*100 + Month(DATEADD(YEAR,5,FullDateAlternateKeY)) As YearMonth,

Year(DATEADD(YEAR,5,FullDateAlternateKeY))*100 + Datepart(qq,DATEADD(YEAR,5,FullDateAlternateKeY)) As YearQuarter,

Case When DATEADD(YEAR,5,FullDateAlternateKeY) = Convert(varchar(8),GetDate(),112) Then ‘Y’ Else ‘N’ End As CurrentDate,

Case When DATEDIFF(MONTH,DATEADD(YEAR,5,FullDateAlternateKeY),GetDate()) = 0 Then ‘Y’ Else ‘N’ End as CurrentMonth,

Case When DATEDIFF(MONTH,DATEADD(YEAR,5,FullDateAlternateKeY) ,GetDate())between 0 and 2 Then ‘Y’ Else ‘N’ End as CurrentThreeMonths,

Case When DATEDIFF(MONTH,DATEADD(YEAR,5,FullDateAlternateKeY) ,GetDate())between 0 and 5 Then ‘Y’ Else ‘N’ End as CurrentSixMonths,

Case When DATEDIFF(MONTH,DATEADD(YEAR,5,FullDateAlternateKeY) ,GetDate())between 0 and 11 Then ‘Y’ Else ‘N’ End as Current12Months,

Case When DATEDIFF(MONTH,DATEADD(YEAR,5,FullDateAlternateKeY) ,GetDate())between 0 and 23 Then ‘Y’ Else ‘N’ End as Current24Months,

Case When DATEDIFF(MONTH,DATEADD(YEAR,5,FullDateAlternateKeY) ,GetDate())between 11 and 23 Then ‘Y’ Else ‘N’ End as Current12MonthsLY

From [dbo].[DimDate])

 

And then a fact table view. Feel free to add the measures you need. I have added flags for current date, month , year and so on, to replace named sets that do not work in n Tabular and Power View DAXMD.

Create View dbo.FactInternetSalesUpdate

AS

(

Select

CustomerKey,

ProductKey,

DATEADD(YEAR,7,OrderDate) As OrderDate,

DATEADD(YEAR,7,ShipDate) As ShipDate,

DATEADD(YEAR,7,DueDate) As DueDate,

OrderQuantity

From [dbo].[FactInternetSales])

The MD cube or the Tabular model need a Daily process so that the current flags will be accurate. Thanks to MVP Peter Larsson.

Advertisements

Posted in BISM TAB & MD, Power View, PowerPivot and DAX | Tagged: | Leave a Comment »

Ancient Monument Spotting with Excel 2013 Data Explorer and Power View

Posted by thomasivarssonmalmo on April 14, 2013

While I am not working with BI or learning new technical stuff I read about archeology in Scandinavia and especially about what can be found in the region of Sweden where I live. One idea of mine is to be able to compare differnt types of acheological findings from different time periods and see if they appear on the same geographical spot or not. Can Excel 2013 help me with this? Yes it is possible with the help of Excel Data Explorer, currently in beta and PowerPivot and Power View.

First I need to show you the geographical spot so that you can relate to the area. I have highlighted one type of ancient monumnets in the Bing map that also shows the final result after this exercise. It is from Trelleborg municipality in south of Sweden since I have found a really good dataset from that part with a list of ancient monuments(stone age, bronze age and iron age). Most of these monuments are graves and burial mounds.

 

So where are we

The dataset can be found here on Wikipedia: http://sv.wikipedia.org/wiki/Lista_%C3%B6ver_fasta_fornminnen_i_Trelleborgs_kommun

Below you can see how this Wiki-page looks like. Name is the unique registered number for each ancient monument. “Lämningstyp” is the classification like rune stones, burial mounds. “Historisk Indelning” is the actual geographic spot of the monument. “Läge” is the latitude and longitude of the monument and the key to map this information to the Bing Map in Power View.

The Data source

I started with launching Excel 2013 with the Data Explorer preview and selected from web as the data source. I simply added the url above and Data Explorer started the import. As you can see I got several datasets for different sub areas in Trelleborg. Here is an issue I have not solved yet and that is if I am able to import all data sets with one query by setting a parameter. With my approach I am able to show other steps in the cleaning and consolidation process with Data Explorer.

 

First Query

The cleaning process is straight forward. I have right clicked the columns I do not need and selected hide. I have also changed the data type of the Id-nr, a not important column, to text with the same righ click and quick menu. Finally I have added an Index column not for any specific reason that being able to run a count aggregation on that column. The final query result looks like this with the steps shown to the right.

cleaning steps

I have created one query for each different sub area within Trelleborg. With all queries defined I have used the Append button to add the sub areas to the same result set.

 

The appending process

This Append process is not the best way to run a union over all queries. It is possible to do this much quicker by changing the append query slightly. You simply add the query names to the function like you can see below.

 

Extend the query

 

With this work done it is time to load the data into PowerPivot. In Excel 2013 this is done transparently and since I have only one single table there is no question about joins.

 

LoadToDataModel

After that I started Power View in Excel 2013(Insert)  the data model will appear. I select maps as the graphical tool in Power View. The selections for my map are the following. I need a measure in the size to see anything. I simply run a count on Lämningstyp(=classification of monument), and in Location I put the Latitude and Longitude data field. I added lämningstyp a second time for the color of the circles in Power View.

 

Selections for the map

The result look like this with two “Lämningstyp” selected in a filter. Gravfält is a burial field and Hög is a burial mound, usually from the bronze age. All sub areas are not loaded into the model so there are more monumnets than you see in the map.

Resulting Map

Below you also see the filter section.

The filter section

 

So what is the point of doing this? The is an available database together with maps of all ancient monuments in Sweden but without the flexibility of the Excel 2013 solution I have shown here. The fact that I can add colors to different groups or classification of monuments in Power View is very useful. It is possible to see different clusters of classifications on the same geographical spot or near by. I can also quickly select and unselect categories of monuments.

Posted in Excel 2013, Power View, PowerPivot and DAX | Tagged: , , | Leave a Comment »

Useful DAX Starter Functions and Expressions

Posted by thomasivarssonmalmo on April 25, 2011

I have been a little slow with the expression language of PowerPivot DAX mainly because I felt it a little hard to understand. Maybe this problem comes from my mind being occupied by MDX, TSQL and the expression language in Reporting Services.

Right now I have som starter functions that can help with getting to know the DAX syntax better and perhaps answer some common business questions that are not to complicated. All my examples comes from the Adventure Works star schema model that you need to import into PowerPivot first.

Start with doing your calculations on top of a Pivot Table with PowerPivot tables as the source. Calculations in the PowerPivot tables will only make your more confused. You find this editor under the PowerPivot tab in the Excel ribbon under new measure. Make sure that it is the fact table that is selected as the table name at the top.

New DAX Measure

DAX Aggregation formulas

Here I am thinking about the aggregation formulas that you get when you drop a measure in the values box in the Pivot Table design tool.  You do not need the editor above to create these standard aggregation formulas.

DAX Aggregation

The formula can be seen above the OK button in the picture above. If you drop a measure you get: SUM(FactOnlineSales[SalesAmount]). Wrapped in the SUM function is the table name combined with the column name. If you drop a text column in the values box you will get the COUNTA function instead which makes sense since you cannot SUM text values. This is the COUNTA example: COUNTA(FactOnlineSales[SalesOrderNumber]). If you try to change this to COUNT you will get this error message: “The function COUNT takes an argument that evaluates to numbers or dates and cannot work with values of type String.”

The beauty of DAX: Distinct Count

Here comes a new taste of DAX where we need to combine to DAX functions to get the result we want. Distinct Count is hard to set up in Analysis Services beacause of dfferent technical requirement of that model. In PowerPivot and DAX we have this available by using two functions that are not too complicated.  Remember that end users often confuse Count and Distinct Count.

DistinctCount

The syntax is as this: =COUNTROWS(DISTINCT(FactInternetSales[CustomerKey])).

First it is the CountRows function and with the Distinct function inside of it. In the Distinct function we point to the dimension key that we are interested of doing a distinct count on. In this case it is the CustomerKey in the fact table. ProductKey or any other foreign key will work in the same way.

DAX Time Calculations

It is important that your PowerPivot model has complete years regarding dates in order to avoid strange results. The Adventure Works star schema model has this limitation.

First is the accumulated value of a measure within a year frequently called YTD(Year to Date). Here we combine the outer TotalYTD function with the SUM function like this:

=TOTALYTD(SUM(FactOnlineSales[SalesQuantity]),DimDate[Datekey])

The example is from the Contoso Sales sample database. DimDate is the date dimension table and the DateKey is a date column. The fact table name here is FactOnlineSales and SalesQuantity is the measure in that fact table.

The result looks like this in Excel 2010.

TotalYTD

Ratios in DAX

Ratios introduce a new generic DAX function called Calculate and a little different thinking than in the previous examples. If we want to build a ratio in DAX with the measure SalesAmount we need the nominator, that is the current aggregated value in the Pivot Table and divide that with the denominator or total of SalesAmount that is always the same.

In DAX and with the Contoso sales example database this can be built in this way.

=SUM(FactOnlineSales[SalesAmount])/CALCULATE(SUM(FactOnlineSales[SalesAmount]),ALL(FactOnlineSales))

The result can be seen below.

Ratio DAX

Recommended reading

There is one book that I can strongly recommend if you want to learn more about DAX and PowerPivot models and that is Marco Russo’s and Alberto Ferrari´s “PowerPivot for Excel 2010 Give Your Data Meaning. More information can be found on this link.

http://www.powerpivotworkshop.com/

Posted in PowerPivot and DAX | Leave a Comment »

PowerPivot Schema Flexibility

Posted by thomasivarssonmalmo on January 15, 2011

In SSAS cubes we mostly use the starschema model with a central fact table and dimension tables as navigation paths down to the measures. That model will work in many scenarios and return valuable insights into business processes. In this model we do not connect fact tables to other fact tables directly but the relation appears with related dimensions.

With the arrival of PowerPivot much talk have been focused around DAX, the expression language, or formula language, for PowerPivot. I believe that this is only part of the story.

PowerPivot have a more flexible model where you can do this but also relate fact tables with different granularities directly to each other. This will give you added flexibility in a Pivot Table.

This is a recreation of a report scenario that I have tried on real production data but I will use one of the other Adventure Works demo databases that is more transaction oriented. I have created a product table(product attributes) and a date table(single dates, Year and YearMonth(200801) attributes. Fact tables are both SalesOrderHeader and SalesOrderDetail.

Result different granularity

You will recognize the dimension tables as the slicers but I have used some TSQL code to build this in this more detailed version of the Adventure Works processes. The two fact tables are the OrderHeader table and the Order detail table. Both tables include the OrderNumber column that I have dragged to the values area and changed the formula from SUM to COUNT.

The count function

The relations tab in PowerPivot is where you create the more flexible model.

Relations between the fact tables

In this model Product is related to the SalesOrderDetail(ProductId) column and  Date to the SalesOrderHeader table(OrderDate). SalesOrderDetail is related the SalesOrderHeader table with the SalesOrderId column.

Finally, this is the behaviour in the Pivot Table with this setup.

Click on Categories

Product categories and year level of the order date column are used as slicers above. If you click on a product category member in the slicer only the count measure of the SalesOrderDetail table will change(CountSalesOrderDetailsRecords).

Click On Year

If you click on single years both the CountSalesOrderHeader measure and the CountSalesOrderDetailsRecords will change, since the top fact table filters the lower fact table.

Edit: I would like to add that PowerPivot models are optimized as star schemas without a relation between the fact tables.  My approach above will face performance challanges with too much data.

Posted in PowerPivot and DAX | 4 Comments »

PowerPivot, Apples and Bananas

Posted by thomasivarssonmalmo on December 16, 2010

Edit: This post was updated after the format chaos that appeared at first.

More than 15 years ago I started my professional career in IT as an Excel teacher. One excercise that I frequently used was a simple math problem that you start with in your first year of school as a kid. Make sums of fruits in order to demonstrate how cell references works. I would like to use the same simple example in order to demonstrate how a simple data model in PowerPivot can be built for non database professional and other interested in an introduction to PowerPivot. The model that I will describe is a data model called a starschema.

First: Build a simple table in Excel. I use Excel 2010 which is a requirement for the PowerPivot add in for Excel.

alt

With the word snapshot I describe the table as a state at the end of the day describing what two people(Carl and Anna) have bought from the grocery store. I assume that both have not consumed the fruit yet. The transactions behind this snapshot are the following(see the second picture below). A transactional way of looking at the data is transformed into a summary about the state at the end of  a day regarding how many fruits that Carl and Anna have bought.

alt

The table above is formatted by using the Format as Table button in the Excel ribbon. On top of the transaction table above you can create a Pivot Table by using place the cursor in the transaction table and select Insert and Pivot Table in the Ribbon. The result is seen in the next picture below. This is how most end users use their table shaped data and Pivot Tables in their daily work.

alt

The example above is not how PowerPivot will work in the best way but a current praxis by many Excel users. PowerPivot will require you to organize the data differently than the data dumps in a spreadsheet. Remember that for this simple example PowerPivot is not necessary but I like to use simple examples to demonstrate a more complex scenario of how you can organize data in the best way. That is also the structure of the data that will be presented to Excel end users.

alt

I have constructed four small tables in the same Excel sheet above.  Three of the tables are dimension tables(Customer, Product and Date) and in the real world they will have more columns than in this simple model. The table with most columns, below the dimension tables, is the fact table that contains one measure(Qty) and the other columns are connectors to each dimension table above. Be aware of that I have called the customer identifier in the dimension table CustomerId but Customer in the fact table. When I add these surrogate keys, like they are called technically, the customer can change name. The surrogate key will be the same.  I have the same setup with the product dimension and the fact table.

Why would you do something complex like this when it is possible to do this much more simple, like the examples that I showed earlier? If you think a little bigger and imagine that you will have dimension tables with thousands of products and customers and maybe 1-2 million rows in the fact table. Also you will probably get these tables from the source system that are structured like the mini tables that I show in this spreadsheet. You can simply enter the same data and then format each mini table as a table by the format as table option in Excel.

In the next step will will add each table above to a PowerPivot model by using Linked Tables in PowerPivot.

Place the cursor in each mini table in the Excel spreadsheet and select the “Create Linked Table” under PowerPivot in the Excel ribbon.

After you have imported these tables you can see them in the PowerPivot window as separate tabs. I have renamed the tables with more descriptive names than table1.

alt

Relations are defined from the fact table to the dimension tables in PowerPivot.

I have defined a relation between the date column in the sales fact column and the Date column in the Date table. In the last step we will have to define the relations between the tables that we have added to the PowerPivot model.  Under the design tab in the PowerPivot Table Tool you can see the Manage Relationships button to get the dialoge window below.

alt

After all relations between the dimension tables and the fact table have been defined you will see the following relations.

alt

Below is the final report build in Pivot Tables with the PowerPivot tables as the data source. I also show the start micro tables in the spreadsheet but the data in them has been copied in to the PowerPivot data store.

alt

Posted in PowerPivot and DAX | Leave a Comment »

Soccer statistics in Gemini and Excel 2010

Posted by thomasivarssonmalmo on August 23, 2009

Soccer is the US name for football and even little Sweden have a national league with a top division called “Allsvenskan”. There is statististics published during the season regarding how many penalties, free kicks, red and yellow cards that the players in each team have collected. My thought was to see how I can collect and analyze that information in Gemini and Excel 2010.
 
Before that fun begins let me sum my thoughts about Gemini. Parts of it also come from different discussion forums and MS presentation material.  Gemini let you build large tables, from databases, web page information, Excel and text files, without having to use a ETL tool like Integration Services, connect that information and analyze that data in Excel Pivot Tables.  You can do this locally in Excel 2010 or publish the information to the Gemini Server that is part of the next version of Sharepoint. Gemini is not the same as the next version of Analysis Services that will be part of SQL Server 2008 R2 and is planned to be released next year.
 
Gemini is a new Sharepoint application that relates to Analysis Services but it is a new and separate track. The Sharepoint part will only bee needed for publishing Excel/Gemini data to a server. The client part, that I am using here, will only require Excel 2010.
 
Previous versions of Excel have only the notion of rows, columns and cells that make it hard to use this data source as a relational data store even if it is supplied with plenty of functions that tries to mimic the notion of relational data.
 
Back to Gemini and football. I have collected the statistics below from an official web page by simply marking and copying the 14 pages that includes all the Swedish licensed players in the highest division. The first page includes statistics about red and yellow cards for each player. I would like to have all statistics, inclusing also caused free kicks and received free kicks, on one page but that is not the case here. It is also the challange for Gemini to connect disparate information in an easy way.
 
You can see my LagNamn column (Team Name) as the last column in the Excel tab that will secure the unique identfier between the two tabs in Gemini.
 
 
 
 The second excel sheet contains the caused free kick and received free kick data but with the same player names and team codes as the first sheet. I have imported both into the same spreadsheet but have them on separate tabs. Both tables are formatted as tables with the Excel format as table option.
 
A first issue appeared here in my first attempt to copy and paste the information into Gemni that you simply do by marking all the records in a tab and select copy in Excel and then go to the Gemini add in tab in Excel and select load and prepare data. In the Gemini environment you will have the “to new table option” available in the ribbon in the second group of buttons from the left. The problem is that a few of the 417 licensed players have the same name and that gave me an error when I tried to create a relation, in Gemini, between the to tabs of data by only using play names( first and lastname). Excel 2010 has a button under the data tab in the ribbon where you can remove duplicates only to check if the tab contains duplicates. Excel 2010 will tell you that it has found duplicates before you need to remove them.
 
First i tried to build a concatenated string of the team and the player in Gemini with the DAX statement: = [Team] & [PlayerName], but the problem is that these statements will not show up when you try to relate the tabs in Gemini. If I do the same string concatenation in a column in Excel 2010 and then import that into Gemini, I can use it when I create relations between the tabs. You will see the final relation in Gemini below. Varningar is my first tab with red and yellow card information for each player and team. Frisparkar(free kicks) is the second tab and both use the Excel concatenation of team and player name as the key.
 
 
 
 Finally, when I have imported the data ifrom the two tabs into Gemini and have created the relation between them I can use the information in Excel Pivot tables and do the analysis.
 

 
 Now I can see each teams caused free kicks, received free kicks, red and yellow cards and penalties. I can drill down on each team to see the statistics for each player. “Diff frispark(Swedish) is the difference between the number of received free kicks and caused free kicks. “Diff straff” is the difference between thenumber of received and caused penalties. The two last columns is the sum of the number of yellow and red cards. Measuring the difference between the last two columns does not make any sence.
 
Gemini is not only about analyzing large amount of data but also about getting relational storage capacities naitivly in Excel 2010. It is that functionality that let you build quick structures for analyzing data that can be  a complement to the classical data warehouse approach.
 
The source for the statistics is here. It is only in Swedish but you can clearly see what I have started with.

Posted in Excel 2010, PowerPivot and DAX | 5 Comments »