A real-life example:
Let’s assume that we have a system which stores data for distributors who sell products manufactured by some company. Each distributor receives points for selling products and may redeem a specified number of points to obtain a discount or extra bonus. Points are calculated by an external system and updated very frequently – after processing of each order placed by the distributor himself, a member of the distributor group, or a customer assigned to the distributor. The distributor is identified using a distributor number. We are using a DB2 database.
We need to design a database which stores basic data about the distributor and points earned and redeemed. The first attempt is simple:
We have all data in one table and we made sure that it meets third normal form.
But when the system grew, we noticed that each record in the distributor table was updated very frequently. Data from this table was read every time a user logs into the system, and those reads were locked by updates (differences between database engines is another story). Because of these locks, request processing was slow. The problem was caused by mixing the following two types of data in one table:
- data updated very frequently but read rarely (points)
- data updated rarely but read frequently (personal data)
In fact, by mixing different data, we made a hidden dependency (through a data layer) between parts of the system. Frequent points updates impacted performance in other parts of the system not related to points.
To fix the problem we made changes in the data model:
Now, data with different update frequencies are separated. But still, when points updates are provided by the external system, we need to execute two SQL queries: one to get the distributor ID using the distributor number, and one to update points. Please note that the database still meets third normal form, there is no de-normalization!
Finally, after many changes, we ended up with following structure. Since points are now updated by an external system using direct database access, there is no connection between the two tables. The external system can provide information about distributor’s points before a related record will be added in the distributor table, so there is no foreign key. Such “orphaned records” in the distributor_points table do not harm the system, they are just never displayed. Please note that this data structure still meets third normal form.
- When your system is simple and there is low traffic, third simple form is enough.
- When your system is complicated and heavily loaded, then you should analyze how the system reads from and writes to your database and separate data based on the frequency of writes and reads (in general: data that have different natures).
- The fun starts when you have about 1000 transactions per second :).