How expensive is free software?
Posted at 6/25/2008 09:10:00 AM
I was just looking at a white paper from Greenplum describing how fast, easy and inexpensive a data warehouse can be when it runs on commodity hardware. I have seen countless comments about how inexpensive the Microsoft data warehouse infrastructure is, and even more comments about the low cost of using open source software. It would be easy to believe that for the price of a few computers and a little software you could have a data warehouse running next week.
At a recent TDWI conference in Germany, Larissa Moss delivered a presentation defining the development steps and the team members required for a data warehouse implementation project. The presentation did a beautiful job of defining what has to be done and who has to do it. It calls for a minimum team of five or six full-time people plus a number of part-time specialists. The entire project is divided into sixteen steps, each with multiple activities.
While many organizations will not follow every step and activity of her plan, it is an excellent definition of what really should be done to ensure quality and minimize total cost of ownership. So, using her project definition as a guideline, it is easy to see that the cost of the database software and the server hardware represents only a small portion of the total cost of a data warehouse.
Using commodity hardware and open source software can reduce the total cost of the project by a couple of percentage points at best. Reducing the human effort required can cut the cost of the project in half. Some of the tasks identified in her presentation won’t change with any technology, but some large improvements are possible, and they can produce cost reductions that substantially exceed the total cost of the hardware and software.
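To make those proportions concrete, here is a minimal Python sketch of a project budget split into two lines, labor and hardware/software. The function name `percent_saved` and every dollar figure are hypothetical placeholders invented for illustration, not numbers from the presentation; the only point is how the percentages fall out when people dominate the budget.

```python
def percent_saved(labor: float, hw_sw: float, labor_cut: float, hw_sw_cut: float) -> float:
    """Return the percentage reduction in total project cost.

    labor_cut and hw_sw_cut are fractions: 0.5 means that cost line is halved,
    1.0 means it disappears entirely.
    """
    total = labor + hw_sw
    saved = labor * labor_cut + hw_sw * hw_sw_cut
    return 100.0 * saved / total


# Hypothetical budget, for illustration only: people dominate the cost.
LABOR = 3_000_000.0  # full-time team plus part-time specialists
HW_SW = 60_000.0     # servers plus database licenses

# Best case for "free": the hardware and software cost drops to zero.
print(f"hardware/software free: {percent_saved(LABOR, HW_SW, 0.0, 1.0):.1f}% saved")
# Alternative: the same project delivered with half the human effort.
print(f"half the labor:         {percent_saved(LABOR, HW_SW, 0.5, 0.0):.1f}% saved")
```

With these made-up figures, even making the hardware and software completely free saves about 2.0% of the total, while halving the labor saves about 49.0%.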
For example, the Data Analysis step includes six activities. Each of these activities involves a number of people and takes a significant amount of time. Even with free (or nearly free) relational database software and/or inexpensive commodity hardware, all six activities are still required for the project to have a reasonable prospect of success. If another type of database can reduce the time and expense of these activities by just ten percent, that saving alone can offset the entire cost of the software and hardware.
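The ten-percent argument is a break-even comparison: the labor saved in one step has to be at least as large as the whole hardware and software bill. A minimal sketch, assuming a hypothetical helper `break_even_reduction` and invented dollar figures that are not taken from the presentation:

```python
def break_even_reduction(step_labor_cost: float, hw_sw_cost: float) -> float:
    """Fraction by which one step's labor must shrink to pay for all hardware and software."""
    return hw_sw_cost / step_labor_cost


# Hypothetical figures: labor cost of the Data Analysis step versus
# the entire hardware and software bill for the project.
STEP_LABOR = 750_000.0
HW_SW = 60_000.0

needed = break_even_reduction(STEP_LABOR, HW_SW)
print(f"break-even reduction: {needed:.1%}")        # break-even reduction: 8.0%
print("10% reduction covers it:", 0.10 >= needed)   # 10% reduction covers it: True
```

Whether ten percent is enough in a real project depends entirely on the ratio of that step's labor cost to the hardware and software cost; the sketch only shows how to check it.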
Using the CDBMS structure, two of the activities, “Refine logical data model” and “Expand enterprise logical data model”, are no longer needed. The other activities, such as “Analyze source data quality”, remain essentially the same regardless of database structure. Removing two of the six activities, assuming they are of roughly similar size, cuts the time and cost of this one step by about one third.
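The "about one third" figure is 2/6, which holds only if the activities take similar effort. This small Python sketch weights each activity by effort so the assumption is explicit. Only three activity names come from the post; "activity 4" through "activity 6" are placeholders, and all effort weights are hypothetical.

```python
def fraction_removed(effort: dict[str, float], removed: set[str]) -> float:
    """Share of a step's total effort accounted for by the removed activities."""
    total = sum(effort.values())
    return sum(effort[name] for name in removed) / total


# Assumption: every activity takes the same effort (weight 1.0).
equal = {
    "Refine logical data model": 1.0,
    "Expand enterprise logical data model": 1.0,
    "Analyze source data quality": 1.0,
    "activity 4": 1.0,  # placeholder name
    "activity 5": 1.0,  # placeholder name
    "activity 6": 1.0,  # placeholder name
}
removed = {"Refine logical data model", "Expand enterprise logical data model"}
print(f"{fraction_removed(equal, removed):.3f}")     # 0.333

# Hypothetical variation: source data quality analysis takes three times as long.
weighted = {**equal, "Analyze source data quality": 3.0}
print(f"{fraction_removed(weighted, removed):.3f}")  # 0.250
```

If the removed modeling activities are heavier than average the saving exceeds one third; if they are lighter, it falls below.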
As another example, consider one of the activities in the first step, “Cost Justification”. If a project costs a few hundred dollars, no justification is needed beyond someone saying it will help them with their job. If a project costs a few tens of millions, it will require extremely detailed and robust justification and approval at the top executive level. In between, the justification effort should scale roughly linearly with the expected cost. So if the Data Analysis step is reduced by one third and other steps are reduced substantially, the cost of the entire project falls, and the time and effort spent on cost justification fall with it.
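The linear relationship can be written as a tiny effort model. In this sketch the function `justification_hours`, the $1,000 threshold below which no formal justification is needed, and the rate of one hour per $10,000 are all hypothetical assumptions chosen for illustration.

```python
def justification_hours(expected_cost: float,
                        dollars_per_hour: float = 10_000.0,
                        threshold: float = 1_000.0) -> float:
    """Hypothetical effort model: no formal justification below a small
    threshold, then effort proportional to the expected project cost."""
    if expected_cost < threshold:
        return 0.0
    return expected_cost / dollars_per_hour


print(justification_hours(500.0))        # 0.0   (a few hundred dollars: just ask)
print(justification_hours(3_060_000.0))  # 306.0
print(justification_hours(1_530_000.0))  # 153.0 (half the cost, half the effort)
```

Because the model is linear above the threshold, any percentage cut in the expected project cost produces the same percentage cut in justification effort.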
Going through all of the steps, it is easy to see how the total effort required for a successful data warehouse project could be reduced by half with no compromise on the quality of the result.