Which tool do I use?
Posted at 2/28/2008 11:58:00 AM
I monitor a few BI web sites and I talk with many IT folks working in DW and BI. In spite of how many times I see and hear it, I am still astonished when people discuss which is the best BI tool. Many years ago I heard a smart person say the whole world looks like a nail when the only tool you have is a hammer. It seems that rule is as constant as the sunrise.
Comparing SQL Server, Cognos, SAS and QlikView is like comparing hammers, saws, drills and wrenches. These are all very different products with different strengths and weaknesses. Each of them may be the best tool for something but none is best for everything in BI any more than one of the builders’ tools is the best for building a house.
My interest is in databases and I am frequently told something like “I can do that with (SSAS, Business Objects, Pentaho,…) when I describe a feature of our database.
Yes, perhaps, but I guess you could use a hammer to smash the end off a board if you didn’t have a saw. Does that make sense? Of course not, but many people do not see the obvious parallel.
For example, the iLuminate database indexes everything, always. This makes it an excellent foundation for analytics and information discovery. Because of the indexing, any column of any table can be either a dimension or a fact depending on the nature of the business question. Any column or columns of any table or tables can be used as selection criteria and there will never be a full table scan.
Can you duplicate these features with SQL Server? Theoretically yes, but in the real world, it can never be done. This would require building as many star schemas and indices as there are columns in the database. You would need more disk than can be attached to a computer and most of the stars would never be used. Unfortunately, you can’t predict which of them would never be used. With SQL Server you can resolve some of the business issues but never all of them.
Can you duplicate these features with an OLAP product? Even theoretically this is not possible. How about with SSAS or Business Objects or??? No. But can it be done with the right suite of products? Again, no.
The only product in a complete suite that could attempt to provide the indexing power is the database and the relational databases can not do it. Front end systems and mathematical algorithms can not even address the issue. There is no product or combination of products that can duplicate the benefits of the indexing function.
The real statement that people are making is that they know how to work well with a given tool and they would rather focus on the things they can do with that tool than solve business problems. As a result, the query from hell seems to have a long and healthy future.
Comparing SQL Server, Cognos, SAS and QlikView is like comparing hammers, saws, drills and wrenches. These are all very different products with different strengths and weaknesses. Each of them may be the best tool for something but none is best for everything in BI any more than one of the builders’ tools is the best for building a house.
My interest is in databases and I am frequently told something like “I can do that with (SSAS, Business Objects, Pentaho,…) when I describe a feature of our database.
Yes, perhaps, but I guess you could use a hammer to smash the end off a board if you didn’t have a saw. Does that make sense? Of course not, but many people do not see the obvious parallel.
For example, the iLuminate database indexes everything, always. This makes it an excellent foundation for analytics and information discovery. Because of the indexing, any column of any table can be either a dimension or a fact depending on the nature of the business question. Any column or columns of any table or tables can be used as selection criteria and there will never be a full table scan.
Can you duplicate these features with SQL Server? Theoretically yes, but in the real world, it can never be done. This would require building as many star schemas and indices as there are columns in the database. You would need more disk than can be attached to a computer and most of the stars would never be used. Unfortunately, you can’t predict which of them would never be used. With SQL Server you can resolve some of the business issues but never all of them.
Can you duplicate these features with an OLAP product? Even theoretically this is not possible. How about with SSAS or Business Objects or??? No. But can it be done with the right suite of products? Again, no.
The only product in a complete suite that could attempt to provide the indexing power is the database and the relational databases can not do it. Front end systems and mathematical algorithms can not even address the issue. There is no product or combination of products that can duplicate the benefits of the indexing function.
The real statement that people are making is that they know how to work well with a given tool and they would rather focus on the things they can do with that tool than solve business problems. As a result, the query from hell seems to have a long and healthy future.
How many DBAs does it take to...
Posted at 2/27/2008 02:01:00 PM
How many DBA’s does it take to change a light bulb? Come on! They have more important things to do.
How many DBA’s does it take to optimize performance of a data warehouse? The answer should be the same.
A DBA should be focused on making the information owned by the enterprise as accessible and useful to the enterprise as possible. Unfortunately, when the data warehouse has been built on a relational database, performance tuning issues will consume most of their time. Precious little time is left for tasks like identifying user needs, coaching designers and developers, developing and managing database access policy and the myriad of other tasks that directly benefit the business.
The value based structure inherently provides all of the performance tuning that can be done. In this system, each value is stored once regardless of where or how frequently it may be used. The core of the database is an index and linking system that links all of those values into their proper place in records.
Because of this structure, every column of every table is always indexed. There is no way to access anything in this database except through the index. Inserts, updates and deletes never result in overflow areas or empty spaces in the database – no reorganizing ever. The size of the database is dramatically less than it would be with the same data in an RDBMS based data warehouse, nearly eliminating physical design issues.
When the database system does all of that for the DBAs, they can spend a lot more time with users and developers to maximize the benefit that the organization gets from its information. The organizations information is an asset and it is the job of the DBA to ensure that the asset is used as effectively as possible. Spending time on ensuring that the database is designed to take advantage of every possible processor cycle does very little to help the business.
How many DBA’s does it take to optimize performance of a data warehouse? The answer should be the same.
A DBA should be focused on making the information owned by the enterprise as accessible and useful to the enterprise as possible. Unfortunately, when the data warehouse has been built on a relational database, performance tuning issues will consume most of their time. Precious little time is left for tasks like identifying user needs, coaching designers and developers, developing and managing database access policy and the myriad of other tasks that directly benefit the business.
The value based structure inherently provides all of the performance tuning that can be done. In this system, each value is stored once regardless of where or how frequently it may be used. The core of the database is an index and linking system that links all of those values into their proper place in records.
Because of this structure, every column of every table is always indexed. There is no way to access anything in this database except through the index. Inserts, updates and deletes never result in overflow areas or empty spaces in the database – no reorganizing ever. The size of the database is dramatically less than it would be with the same data in an RDBMS based data warehouse, nearly eliminating physical design issues.
When the database system does all of that for the DBAs, they can spend a lot more time with users and developers to maximize the benefit that the organization gets from its information. The organizations information is an asset and it is the job of the DBA to ensure that the asset is used as effectively as possible. Spending time on ensuring that the database is designed to take advantage of every possible processor cycle does very little to help the business.
Flexibility is Cash
Posted at 2/26/2008 07:30:00 AM
Continuing the last discussion, how can the information in an RDBMS be converted from a drill press into a pile of cash? Add flexibility. A drill press does one thing very well while cash can be used to accomplish many objectives. The structure of data in an RDBMS defines the boundaries of its use just as the motors, gears and levers of a drill press define its use.
A data structure that allows complete flexibility of use frees the corporate information asset to be used more like cash. As long as data is stored in records and indexed for access, it will do one thing very well and it will require armies of smart people to make it deliver minimal results for all other information functions.
There are some relatively new options for supporting information discovery and analytics in the market today that were not available when most data warehouses were being planned and implemented. Column structure was introduced by Sybase and other companies have joined in with their new products. Value structure is available only from Illuminate today.
Column structured database systems store data in columns rather than records. Adding a new record in this structure requires splitting the new record into individual values and then updating the column sets individually. Retrieving information involves finding the desired value in a column and the retrieving all of the values from the same relative position in all other columns. While this may sound complicated and difficult, it actually takes advantage of the fact that processor cycles are cheap and plentiful while disk bandwidth is limited and expensive. This structure reduces the I/O load and disk usage when compared to record structures.
The value structure stores unique data values once in a data pool and maintains an indexing system that reconstructs the records as needed. Like the column structure, this structure uses processor cycles to reduce I/O. However, the value based structure reduces I/O even more than the column structure and makes possible new analytical and discovery functions that are unique to this structure. This is as close as information gets to cash today.
A data structure that allows complete flexibility of use frees the corporate information asset to be used more like cash. As long as data is stored in records and indexed for access, it will do one thing very well and it will require armies of smart people to make it deliver minimal results for all other information functions.
There are some relatively new options for supporting information discovery and analytics in the market today that were not available when most data warehouses were being planned and implemented. Column structure was introduced by Sybase and other companies have joined in with their new products. Value structure is available only from Illuminate today.
Column structured database systems store data in columns rather than records. Adding a new record in this structure requires splitting the new record into individual values and then updating the column sets individually. Retrieving information involves finding the desired value in a column and the retrieving all of the values from the same relative position in all other columns. While this may sound complicated and difficult, it actually takes advantage of the fact that processor cycles are cheap and plentiful while disk bandwidth is limited and expensive. This structure reduces the I/O load and disk usage when compared to record structures.
The value structure stores unique data values once in a data pool and maintains an indexing system that reconstructs the records as needed. Like the column structure, this structure uses processor cycles to reduce I/O. However, the value based structure reduces I/O even more than the column structure and makes possible new analytical and discovery functions that are unique to this structure. This is as close as information gets to cash today.
There Are No Queries From Hell
Posted at 2/25/2008 10:18:00 AM
"Queries from hell" really represent "preparation from hell." If you have a really nasty query that is consuming the system, it is already too late. But don't give up and get a job packing groceries yet.
In most cases, the biggest flaw in the preparation is the selection of the wrong infrastructure components. A database that is ideal for supporting an OLTP system is dead wrong for supporting information discovery and analytical processes. And, of course, those are the processes that are usually responsible for those famous queries. Trying to support a data warehouse with a record-based storage system is like trying to fly a plane with a diesel engine; while it may be possible to do it, it makes no sense.
Looking at some alternative storage technologies opens new business opportunities for the data warehouse, develops the real asset value of corporate information and banishes those famous queries.
“Information is a corporate asset” is a mantra that many IT people have been chanting for years. Recognition of information as a real corporate asset would give the IT industry a new status they have longingly sought for decades. However, what kind of asset should it be?
A drill press is an asset and so is a pile of cash. A drill press can be used to efficiently perform one step of a manufacturing process and a pile of cash can be used to accomplish almost any business objective in any line of business. If information is rigidly structured, it is like the drill press. If it can be accessed in any way at any time for any purpose, then it is more like cash. The good old RDBMS makes a wonderful drill press.
In most cases, the biggest flaw in the preparation is the selection of the wrong infrastructure components. A database that is ideal for supporting an OLTP system is dead wrong for supporting information discovery and analytical processes. And, of course, those are the processes that are usually responsible for those famous queries. Trying to support a data warehouse with a record-based storage system is like trying to fly a plane with a diesel engine; while it may be possible to do it, it makes no sense.
Looking at some alternative storage technologies opens new business opportunities for the data warehouse, develops the real asset value of corporate information and banishes those famous queries.
“Information is a corporate asset” is a mantra that many IT people have been chanting for years. Recognition of information as a real corporate asset would give the IT industry a new status they have longingly sought for decades. However, what kind of asset should it be?
A drill press is an asset and so is a pile of cash. A drill press can be used to efficiently perform one step of a manufacturing process and a pile of cash can be used to accomplish almost any business objective in any line of business. If information is rigidly structured, it is like the drill press. If it can be accessed in any way at any time for any purpose, then it is more like cash. The good old RDBMS makes a wonderful drill press.
Labels: intro
What are "Queries from Hell"?
Posted at 2/20/2008 04:24:00 PM
In the world of data warehousing, administrators need to be wary of "queries from hell." These are variously defined, by IT Jungle as "gigantic queries that may have been accidentally set off by users and that can consume enormous amounts of systems resources," and by the IS Management Handbook as database queries that "can destroy perceived levels of performance if they are not identified and managed carefully."
They are defined here simply as queries that are really really hard to execute using standard data warehouse technology because they require database restructuring, days of SQL coding, inordinate processing resources, or just don't seem to adhere to the rules that 90+% of "standard" queries follow.
Telling business users that they can't have an answer to a question because even though the answer lies within a database, it is simply too difficult to extract, is not a feasible option. Neither, often, is explaining that the answer can be obtained, but it will take weeks or months of programming.
The answer is to segregate these queries from the 90+% of standard BI and business analysis queries, and treat them differently. Use the right tools for the job. That's the kind of thing you'll be seeing in this blog over the coming months and years.
They are defined here simply as queries that are really really hard to execute using standard data warehouse technology because they require database restructuring, days of SQL coding, inordinate processing resources, or just don't seem to adhere to the rules that 90+% of "standard" queries follow.
Telling business users that they can't have an answer to a question because even though the answer lies within a database, it is simply too difficult to extract, is not a feasible option. Neither, often, is explaining that the answer can be obtained, but it will take weeks or months of programming.
The answer is to segregate these queries from the 90+% of standard BI and business analysis queries, and treat them differently. Use the right tools for the job. That's the kind of thing you'll be seeing in this blog over the coming months and years.