Soccer statistics in Gemini and Excel 2010
Soccer is the US name for football and even little Sweden have a national league with a top division called “Allsvenskan”. There is statististics published during the season regarding how many penalties, free kicks, red and yellow cards that the players in each team have collected. My thought was to see how I can collect and analyze that information in Gemini and Excel 2010.
Before that fun begins let me sum my thoughts about Gemini. Parts of it also come from different discussion forums and MS presentation material. Gemini let you build large tables, from databases, web page information, Excel and text files, without having to use a ETL tool like Integration Services, connect that information and analyze that data in Excel Pivot Tables. You can do this locally in Excel 2010 or publish the information to the Gemini Server that is part of the next version of Sharepoint. Gemini is not the same as the next version of Analysis Services that will be part of SQL Server 2008 R2 and is planned to be released next year.
Gemini is a new Sharepoint application that relates to Analysis Services but it is a new and separate track. The Sharepoint part will only bee needed for publishing Excel/Gemini data to a server. The client part, that I am using here, will only require Excel 2010.
Previous versions of Excel have only the notion of rows, columns and cells that make it hard to use this data source as a relational data store even if it is supplied with plenty of functions that tries to mimic the notion of relational data.
Back to Gemini and football. I have collected the statistics below from an official web page by simply marking and copying the 14 pages that includes all the Swedish licensed players in the highest division. The first page includes statistics about red and yellow cards for each player. I would like to have all statistics, inclusing also caused free kicks and received free kicks, on one page but that is not the case here. It is also the challange for Gemini to connect disparate information in an easy way.
You can see my LagNamn column (Team Name) as the last column in the Excel tab that will secure the unique identfier between the two tabs in Gemini.
The second excel sheet contains the caused free kick and received free kick data but with the same player names and team codes as the first sheet. I have imported both into the same spreadsheet but have them on separate tabs. Both tables are formatted as tables with the Excel format as table option.
A first issue appeared here in my first attempt to copy and paste the information into Gemni that you simply do by marking all the records in a tab and select copy in Excel and then go to the Gemini add in tab in Excel and select load and prepare data. In the Gemini environment you will have the “to new table option” available in the ribbon in the second group of buttons from the left. The problem is that a few of the 417 licensed players have the same name and that gave me an error when I tried to create a relation, in Gemini, between the to tabs of data by only using play names( first and lastname). Excel 2010 has a button under the data tab in the ribbon where you can remove duplicates only to check if the tab contains duplicates. Excel 2010 will tell you that it has found duplicates before you need to remove them.
First i tried to build a concatenated string of the team and the player in Gemini with the DAX statement: = [Team] & [PlayerName], but the problem is that these statements will not show up when you try to relate the tabs in Gemini. If I do the same string concatenation in a column in Excel 2010 and then import that into Gemini, I can use it when I create relations between the tabs. You will see the final relation in Gemini below. Varningar is my first tab with red and yellow card information for each player and team. Frisparkar(free kicks) is the second tab and both use the Excel concatenation of team and player name as the key.
Finally, when I have imported the data ifrom the two tabs into Gemini and have created the relation between them I can use the information in Excel Pivot tables and do the analysis.
Now I can see each teams caused free kicks, received free kicks, red and yellow cards and penalties. I can drill down on each team to see the statistics for each player. “Diff frispark(Swedish) is the difference between the number of received free kicks and caused free kicks. “Diff straff” is the difference between thenumber of received and caused penalties. The two last columns is the sum of the number of yellow and red cards. Measuring the difference between the last two columns does not make any sence.
Gemini is not only about analyzing large amount of data but also about getting relational storage capacities naitivly in Excel 2010. It is that functionality that let you build quick structures for analyzing data that can be a complement to the classical data warehouse approach.
The source for the statistics is here. It is only in Swedish but you can clearly see what I have started with.