Queries From Hell
A blog about data warehousing, correlation databases, associative and incremental queries, value-based storage, metadata, on-the-fly indexing, automatic data-driven schemas, BI tools, data mining, visual mapping, pattern recognition, and the limitations of standard SQL in answering "queries from Hell."
Or, how to discover what you don't know you don't know.

Denormalizing your world

Posted at 5/29/2008 04:11:00 AM
How many tables are in your data warehouse? 500? 1000? 5000? More? One data warehouse that I know of has approximately 500,000 tables!

Is it any wonder that business people can not use a data warehouse? Even most data marts are more complex than the business activity they represent. The paradox here is that if you try to list all of the real world entities that constitute a business information environment it is very hard to find more than fifty or sixty and a good case could be made for as few as twenty.

Where then do all of these tables come from and what do they mean? Most of them are the result of optimizing data structures to improve the performance of a computer. Good computer performance is gained at the expense of good business performance—that seems like a poor tradeoff.

Normalizing data structures and metadata rather than normalizing data is the first step in making information accessible to business users. A data structure that never repeated a metadata value, including repeats in the form of contrived names to represent the same thing in different contexts and levels of aggregation, would be the most understandable structure for business information users. This means one table for all customer data regardless of source or usage, providing a single customer view. The same goes for product, service, order, invoice, etc.

In an environment where one table represents one entity the average business user could easily navigate through the tables and BI for the masses would become a more realistic concept.

Add to Technorati Favorites      | Blogger Comments (0) |    


B-Eye Podcast from TDWI

Posted at 5/20/2008 05:36:00 AM
During the whirlwind that was last week's Data Warehouse Institute (TDWI) conference in Chicago, I had the opportunity to sit down with David Loshin of the Business Intelligence Network for a rapid-fire discussion on data warehouse design, implementation and use.

In this podcast, David and I talk about how organizations can quickly deploy enterprise data warehouses, eliminate the business requirements definition phase and use all of their data for functions such as finance and performance management. We also discuss how a correlation data warehouse lowers the barrier to entry by reducing both up-front and ongoing maintenance costs, and enables business users to perform ad hoc analysis, discovery and exploration.


Add to Technorati Favorites      | Blogger Comments (0) |    


What's in a Name?

Posted at 5/15/2008 06:04:00 AM
Naming of data objects has always been an interesting issue. Long ago, programmers named files and fields arbitrarily, at their own whim. Later, most IT departments standardized naming rules. Now there are industry standard naming conventions like International Standard ISO/IEC 11179-5 and complete naming systems like the Dublin Core Metadata Initiative. Yet most business users today could not even begin to interpret the meaning of the names used for the hundreds or even thousands of tables and the multiple thousands of columns in an enterprise information management infrastructure. These names identify the information they use every day but the names are so unintelligible that no one could interpret them without a written standard.

Why should objects that are used every day by the business user have names that are incomprehensible?

The primary reason is that data objects are structured to optimize computer performance rather than to reflect the business structures. A common thing like a customer record quickly becomes split into five or ten or more tables in third normal form and business information users can no longer find anything.

The best solution to date has been to build supplementary data structures especially for user access and name all objects in those structures according to a centrally managed naming control system. The business users then only see a data mart with a small subset of all data objects and are usually able to learn the meaning of that subset of names.

A better solution for naming is to keep just one data structure for each real world object and give the attributes of that structure names that align with the business names. In this environment, every business entity is represented in a single table and each attribute occurs once and has a meaningful name. Each customer has one customer record and each product has one product record. All information access systems use these tables and access the same data using the same names. The data marts, star schemas and OLAP cubes of conventional systems are just ordinary queries in this environment.

This concept, as simple as it may seem, has a couple of issues that keep it from being implemented. The first issue is computer performance. The massive redundancy created by this structure would be crippling in an RDBMS environment. The solution for this issue is simply to use the Value Based Structure in the CDBMS. In this structure redundant data storage does not cause any problem and normalizing provides no gain.

The second problem is application design. Applications are still designed to interact with data stores the same way they did thirty years ago. New data storage capabilities provide functions that didn’t exist thirty years ago and applications should be upgraded to use the best that is available. However, that is a discussion for another day.

Add to Technorati Favorites      | Blogger Comments (0) |