As I mentioned in my article “OLAP for OLTP practitioners”, I am working on a project that needs to create an analytical database for on-line analytical processing (OLAP). I have mostly worked with on-line transaction processing (OLTP) with some limited reporting features. OLAP is a new area for me. In OLAP, the main focus of the database itself is simply to store data for analysis; there is limited maintenance of data. In the previous article, I focused on the differences in requirements between the OLTP world and the OLAP world and, in particular, the OLTP focus on normalization and the OLAP focus on de-normalization to avoid joins of tables.
Obviously, I cannot attempt to explain all of the possible design patterns which are used in OLAP, but I wanted to describe one of them so that if you, like me, need to work with transactional and analytical information, you would have some guidance.
When working with analytical databases, a common design pattern is to use fact and dimension tables in a star schema (for more information see my article “OLAP for OLTP practitioners” and numerous online resources). But there are other design patterns that can be interesting to consider.
The challenge with models based on OLTP is that they are not compatible with OLAP. Similarly models for OLAP do not work well for OLTP. Finally, it would be nice to work with a Big Data solution, but these do have a single standard format and the way in which data is stored is not compatible with OLTP or OLAP.
In our project, we are investigating the usage of the spider schema, which we found on the Internet, as we would like to have as many of the advantages of these OLTP and OLAP techniques, with an opening for Big Data usage in the future. This article looks at aspects of the spider schema as it claims to support OLTP, OLAP and Big Data analysis. Below, I describe our understanding of the spider schema.
Designing for Future Requirements
Our goal is to store data in such a way that it can be efficiently retrieved and used for further processing or for analysis. When we store the data, we must also store the relationships between the data. As I understand it, the spider schema turns typical relationship design around in that the relationships between the data are stored with the data rather than being pre-defined as database relationships as we would expect in a relational database model for OLTP. In this way, the data set contains the information (relationships) which describe the data. The relationships between the data are stored within a separate table: a Relationship Table.
The Relationship Table is then the center of the data set and, like a spider, relates to each of the tables in which data elements are stored, making the relationship between each entity.
I have adapted and simplified for this article the spider schema example that can be found at www.spider-schema.info, which is a simplified model of a portion of an Enterprise Resource Planning (ERP) system: storing information about products, orders (transactions), customers (buyers), suppliers (sellers) and employees.
As an example, when linking an order (Transaction) to the data associated with it (products in the order, buyer of the order, shipping address, billing address), the transaction reaches associated data via the relationship references stored in the Relationship table rather than being directly linked to the other tables by foreign keys from table to table.
The spider schema appears to be different from a typical OLTP design as the foreign keys to the transactional data are stored in the Relationship table rather than in the transaction tables themselves. This eases the storage of historical data as a new entry can be created in the appropriate table while the relationship is updated in the central Relationship table. For example, if we enter a new Billing Address when a customer’s Billing Address changes the relationship between Buyer and Billing Address is changed in the central Relationship table with the effective dates modelled out into the Date table. Therefore, I always have the billing address of the buyer which was valid on a particular date in the past. Also, the Relationship Table is related to all data sets.
Data does not need to be modeled into specific data sets for specific uses for OLAP with fact-dimension tables; the spider schema would allow analytic data sets to be built from the underlying data set. For example, to determine “What was the total value of all orders shipped to Germany during each month of 2014?”, we select the orders from 2014 and the corresponding shipping addresses with Country = Germany.
When the data is normalized and modeled into the spider schema, the data already satisfies the Transactional needs and the Analytical needs. No other systems or processes are needed to use the existing data in OLAP.
Working on a project to create an application that is focused on analytics, reporting, and especially forecasting, has forced me to look at alternatives to my standard database designs for OLTP and to
What do you think about spider schema?
Have you had any experience with it?
I would be interested to hear your experiences about working with the spider schema and other alternatives to bridge between OLTP and OLAP.