A good data modeling exercise for beginners is to create a data model of an online store. Every time I give this exercise to my students, I’m surprised at how difficult it is for them.
Find the Concepts...
Let’s see how it can be done. We know we have to create a table for every concept in the domain. Think about the nouns and noun phrases you would use to describe the domain. Roughly, every noun is either a concept, an attribute of a concept, or an example. What are the basic concepts in an online store? Two words immediately come to mind:
- customers – people who buy things in our store, and
- products – items people buy in our store.
Every customer has a basic set of data which describe them: id (you usually need an id attribute in your table), name, email, and password. Similarly, a product has an id and a name. We could add more attributes for customers and products but for the sake of this example, these will do. We add the two tables into our model.
... As Well as the Abstract Concepts
This is a store, so obviously, we want to know what was ordered and by whom. “Order” is a keyword in most databases, so we shouldn’t use it for a table name. Instead, we’ll use the name
purchase for the third table in our model. The table has to be somehow connected to the
customer and to the
product. For a start, let’s just draw a reference between
customer, and between
customer-purchase reference is OK. Every purchase is done by one customer; every customer can make several purchases. This reference is here to stay.
However, there is something wrong with the
purchase-product reference. Several products can be bought in one purchase; several purchases can include the same product. But our reference only allows one product to be bought in a single purchase. Let’s delete the reference and think about a different way to model it.
One Big Text Field for All Purchased Products?
How about we add a big text field that can store the names or ids of purchased products? Now we can buy several products in one purchase. However, there are some problems here:
- First, it’s difficult to check that the product in the
purchased_itemsfield is really in the database.
- Second, if you want to change the name of the product (because you misspelled it) then you have to update all
purchased_itemsfield instances in the
- Finally, it’s difficult to analyse data in the database. For example, if you want to find out which product is bought most often, you need to use a text substring operation. And that’s never very efficient.
Several Product Columns in the Purchase Table?
What are some other options? We want a purchase to be connected to several products so maybe we should add several
purchase_item columns in a purchase table? Well, that’s tiresome (I only added 5 columns and got tired) and creates an artificial and stupid limit on the number of products bought.
Use an Intermediate Table!
The silly solution hints at the right solution. We want to have an unlimited number of products connected to the purchase. The only way is to have an intermediate connecting table. Let’s call it
purchase_item table is connected with
product. Now a purchase can include as many products as we want. As a bonus we can add additional data in the table: number of times bought, total price for this item, and so on.
- Tables in the model can represent not only physical objects like customer or product. Tables can represent more abstract concepts like a purchase. Other examples might be a booking in a hotel reservation system, a book_loan in a model for library, an appointment in a system for medical doctors, etc.
- When you model a transaction (i.e., buying or selling of many things), you usually need three tables: one for the transaction (purchase, or booking in a hotel reservation system), one for things bought/sold in a transaction (product, hotel room), and one for transaction items (purchase_item, booking_item). You can add additional information in the intermediate table if you need to.