Are Data Warehouses Dinosaurs?
Posted by thomasivarssonmalmo on April 11, 2011
This is a continous discussion that need to develop new arguments as with the presence of in memory tools like QlickView and PowerPivot. Even without these tools it is also interesting to discuss what additional values the investment in a Data Warehouse can give over other approaches.
One less succesful data warehouse development project, that I did not participate in but followed through other colleagues, that did work in this project, was about integrating more than 20 different source systems in as many countries. Reporting was done with Excel without any integration of the sources. I guess that end users printed out data in long sheets and then typed that information into Excel. This data warehouse project never involved end users until the project was almost finished. Then the end users were introduced to the data and asked about what reports they would like to have. End users continued to use Excel and the labour intensive way of making reports because they did not understand the data that was supplied to them. Most of the definitions and cleaning of data was decided by techies and not by the business. This is a true story.
Another issue with data warehouses is that after they have been created they can be hard to change. End users will have to write demands for business requirements. These demands will have to be prioritized and maybe funded if you rely on an external supplier. After that it is requests for tenders and after the decision development can start. This process can take between 2-6 months. One way to increase speed here is to have a dedicated BI professionals group that implement change requests but the situation can really be horrible if this function is outsourced.
But these are the bad experiences of data warehouse development that should not hide the great benefits of building this. Remember one thing, though, data warehouse is not the same as business intelligence but can be one important part of that.
A succeful data warehouse development process mostly have strong business requirement forcus of building a consolidated top or centralized view of data in different source systems. Most larger orgaizations have there important business data in more than one system: Financial data, invoice data, delivery /logistics and CRM data mostly reside in different systems but you will need all these sources to build a coherent view of the business. The data that you extract from these source systems frequently have a lot of quality issues or need some assistance to reduce ambiguity. Data volumes can also be large and the schemas of tables overwhelming in complexity. The consolidated view of information is important since it very easy to fall into the spreadmart trap and develop stovepipes or information siloes each with their own structures. That is only a repetition of the mistakes to only use Excel spreadsheets for reporting, building the same isolated structures in a data warehouse.
My conclusion is that a well defined data warehouse that follows the overall consolidated business view is a good start for a PowerPivot model. PowerPivot in itself can extract data from a not to complicated registration system but if you have more than one source things start to get complicated even for a self service tool. Source systems responsibles will never allow the PowerPivot model of retrieving all data all over again each time the model needs a data refresh so in this case a data warehouse is a good support for self service tools.