Thomas.I Microsoft BI & Analytics

A Site Dedicated to General BI Issues and MS BI Issues: Architecture and Technology

Soccer statistics in Gemini and Excel 2010

Posted by thomasivarssonmalmo on August 23, 2009

Soccer is the US name for football and even little Sweden have a national league with a top division called “Allsvenskan”. There is statististics published during the season regarding how many penalties, free kicks, red and yellow cards that the players in each team have collected. My thought was to see how I can collect and analyze that information in Gemini and Excel 2010.
 
Before that fun begins let me sum my thoughts about Gemini. Parts of it also come from different discussion forums and MS presentation material.  Gemini let you build large tables, from databases, web page information, Excel and text files, without having to use a ETL tool like Integration Services, connect that information and analyze that data in Excel Pivot Tables.  You can do this locally in Excel 2010 or publish the information to the Gemini Server that is part of the next version of Sharepoint. Gemini is not the same as the next version of Analysis Services that will be part of SQL Server 2008 R2 and is planned to be released next year.
 
Gemini is a new Sharepoint application that relates to Analysis Services but it is a new and separate track. The Sharepoint part will only bee needed for publishing Excel/Gemini data to a server. The client part, that I am using here, will only require Excel 2010.
 
Previous versions of Excel have only the notion of rows, columns and cells that make it hard to use this data source as a relational data store even if it is supplied with plenty of functions that tries to mimic the notion of relational data.
 
Back to Gemini and football. I have collected the statistics below from an official web page by simply marking and copying the 14 pages that includes all the Swedish licensed players in the highest division. The first page includes statistics about red and yellow cards for each player. I would like to have all statistics, inclusing also caused free kicks and received free kicks, on one page but that is not the case here. It is also the challange for Gemini to connect disparate information in an easy way.
 
You can see my LagNamn column (Team Name) as the last column in the Excel tab that will secure the unique identfier between the two tabs in Gemini.
 
 
 
 The second excel sheet contains the caused free kick and received free kick data but with the same player names and team codes as the first sheet. I have imported both into the same spreadsheet but have them on separate tabs. Both tables are formatted as tables with the Excel format as table option.
 
A first issue appeared here in my first attempt to copy and paste the information into Gemni that you simply do by marking all the records in a tab and select copy in Excel and then go to the Gemini add in tab in Excel and select load and prepare data. In the Gemini environment you will have the “to new table option” available in the ribbon in the second group of buttons from the left. The problem is that a few of the 417 licensed players have the same name and that gave me an error when I tried to create a relation, in Gemini, between the to tabs of data by only using play names( first and lastname). Excel 2010 has a button under the data tab in the ribbon where you can remove duplicates only to check if the tab contains duplicates. Excel 2010 will tell you that it has found duplicates before you need to remove them.
 
First i tried to build a concatenated string of the team and the player in Gemini with the DAX statement: = [Team] & [PlayerName], but the problem is that these statements will not show up when you try to relate the tabs in Gemini. If I do the same string concatenation in a column in Excel 2010 and then import that into Gemini, I can use it when I create relations between the tabs. You will see the final relation in Gemini below. Varningar is my first tab with red and yellow card information for each player and team. Frisparkar(free kicks) is the second tab and both use the Excel concatenation of team and player name as the key.
 
 
 
 Finally, when I have imported the data ifrom the two tabs into Gemini and have created the relation between them I can use the information in Excel Pivot tables and do the analysis.
 

 
 Now I can see each teams caused free kicks, received free kicks, red and yellow cards and penalties. I can drill down on each team to see the statistics for each player. “Diff frispark(Swedish) is the difference between the number of received free kicks and caused free kicks. “Diff straff” is the difference between thenumber of received and caused penalties. The two last columns is the sum of the number of yellow and red cards. Measuring the difference between the last two columns does not make any sence.
 
Gemini is not only about analyzing large amount of data but also about getting relational storage capacities naitivly in Excel 2010. It is that functionality that let you build quick structures for analyzing data that can be  a complement to the classical data warehouse approach.
 
The source for the statistics is here. It is only in Swedish but you can clearly see what I have started with.
Advertisements

5 Responses to “Soccer statistics in Gemini and Excel 2010”

  1. Colin said

    Hi Thomas, nice article. I have a few comments:"Gemini let you build large tables, from databases, web page information, Excel and text files, without having to use a ETL tool like Integration Services"You don’t have to use an ETL tool with Analysis Services either, and unlike Gemini, the UDM doesn’t impose restrictions on the types of table relationships that you can use. I don’t see how you can import text files in Gemini, unless you do it through Excel."Gemini is a new Sharepoint application that relates to Analysis Services but it is a new and separate track."It’s has both client components and server components. You can happily install the add-in in Excel and never publish the file to SharePoint. You can keep a copy of the file in a shared folder in Windows Server, although this won’t be the most efficient way to share the file."Previous versions of Excel have only the notion of rows, columns and cells that make it hard to use this data source as a relational data store even if it is supplied with plenty of functions that tries to mimic the notion of relational data"This architecture has not changed in Excel 2010 and probably will not change in any future version either."First i tried to build a concatenated string of the team and the player in Gemini with the DAX statement: = [Team] & [PlayerName], but the problem is that these statements will not show up when you try to relate the tabs in Gemini."Doesn’t show up where? If I create a calculated column named FullName with concatenated string like =[FirstName]&" "&[LastName], FullName shows up in the Source Column (Foreign Key) list in the Create Relationship dialog box.

  2. Thomas said

    I do not know who commented here but thank you for being clear about Gemini. It is true that Gemini both has the client part and the server part. I should have mentioned that I was using the client part. What I tried to explain was also that Gemni makes it possible to build relational structures "within" Excel 2010, but within the Gemini add-in. For an end user it will look like an Excel application anyway. The concatenated string, in the Gemini table, did not always show up(should have mentioned that) but when it did I only got the all level values from the second table. When I did the concatenation in Excel first it all worked fine. I will try to reproduce this behavior and report it, later today.

  3. Colin said

    Hi Thomas, my name is Colin Banfield. It was I who commented previously. When I signed in to Windows Live I saw my name show up on the top right of the screen, and so expected to see the name as the comment author. I don’t see anywhere to input a name otherwise.I’m interested to know what you see as the biggest limitations in Gemini. You may have seen some of my not so positive posts in Microsoft Connect, but I’ll reserve final judgement until after I get to play with CTP3. One frustration I have is that there is no obvious way to perform any kind of grouping in Gemini PivotTables. The three grouping types available in Excel PivotTables are: group by date, group by number (discretize) and group by field items (you highlight the items you want to group). In BIDS, for instance, you have the Dimension Wizard to help you create time dimensions. Within the cube builder it’s easy to set discretization for an attribute. Gemini has no similar concepts. You can of course create calculated fields for date grouping and discretization, but these actions won’t be obvious to the most Excel users that bulid cubes in Gemini. Better would be for Gemini to provide wizards that automatically create the calculated columns based on the user input.I also think that the Table Wizard could have been better done to create the queries graphically. Heck, even the stone-age MS Query in Excel does a better job in most cases.

  4. Thomas said

    Hi Colin. I do not agree on your view about the table wizard. It seems straight forward for me. The biggest limitation is if you cannot import a large text file and split it up into dimension like tables and fact data. I am not sure about what will happen in the next CTP release but text files are a frequent data source. Wiithout that functionality you can only load lareg amounts of data from relational sources.Grouping do work in my example. Each team player appeared under the correct team. I can hide the player level under the team level.Anyway you idea about adding wizards is important becuase without it the tool can be to complex for end users.

  5. Colin said

    "Hi Colin. I do not agree on your view about the table wizard. It seems straight forward for me."Yeah, it’s straightforward until you want to *visually*:1) Extract distinct records2) Create aggregations prior to importing into Gemini3) Create a derived table by selecting the tables, fields and relationships you want (in case you don’t want all the individual tables).All of the above can be handled by the visual tools in the decade old MS Query application, and indeed can be done in the most basic query tools that I’ve ever used. In Gemini, you have to write a SQL script to execute these operations. Not by any means difficult, but may be beyond the capability of Gemini’s target users (Excel users)."Grouping do work in my example. Each team player appeared under the correct team. I can hide the player level under the team level."This assumes that you do the grouping up front i.e during database query. However, it’s often the case that the end user wants the flexibility to create special groups in the PivotTable. This often occurs when there’s no obvious way to perform the categorization in a database field. In an Excel PivotTable, you can create any abitrary group by selecting the field items you want to group, then right-clicking and selecting Group. I see this kind of ad-hoc grouping quite often.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: