Queries From Hell
A blog about data warehousing, correlation databases, associative and incremental queries, value-based storage, metadata, on-the-fly indexing, automatic data-driven schemas, BI tools, data mining, visual mapping, pattern recognition, and the limitations of standard SQL in answering "queries from Hell."
Or, how to discover what you don't know you don't know.

What's in a Name?

Posted at 5/15/2008 06:04:00 AM
Naming of data objects has always been an interesting issue. Long ago, programmers named files and fields arbitrarily, at their own whim. Later, most IT departments standardized naming rules. Now there are industry standard naming conventions like International Standard ISO/IEC 11179-5 and complete naming systems like the Dublin Core Metadata Initiative. Yet most business users today could not even begin to interpret the meaning of the names used for the hundreds or even thousands of tables and the multiple thousands of columns in an enterprise information management infrastructure. These names identify the information they use every day but the names are so unintelligible that no one could interpret them without a written standard.

Why should objects that are used every day by the business user have names that are incomprehensible?

The primary reason is that data objects are structured to optimize computer performance rather than to reflect the business structures. A common thing like a customer record quickly becomes split into five or ten or more tables in third normal form and business information users can no longer find anything.

The best solution to date has been to build supplementary data structures especially for user access and name all objects in those structures according to a centrally managed naming control system. The business users then only see a data mart with a small subset of all data objects and are usually able to learn the meaning of that subset of names.

A better solution for naming is to keep just one data structure for each real world object and give the attributes of that structure names that align with the business names. In this environment, every business entity is represented in a single table and each attribute occurs once and has a meaningful name. Each customer has one customer record and each product has one product record. All information access systems use these tables and access the same data using the same names. The data marts, star schemas and OLAP cubes of conventional systems are just ordinary queries in this environment.

This concept, as simple as it may seem, has a couple of issues that keep it from being implemented. The first issue is computer performance. The massive redundancy created by this structure would be crippling in an RDBMS environment. The solution for this issue is simply to use the Value Based Structure in the CDBMS. In this structure redundant data storage does not cause any problem and normalizing provides no gain.

The second problem is application design. Applications are still designed to interact with data stores the same way they did thirty years ago. New data storage capabilities provide functions that didn’t exist thirty years ago and applications should be upgraded to use the best that is available. However, that is a discussion for another day.

Add to Technorati Favorites      | Blogger Comments (0) |