In the previous two articles, we considered the two most common data warehouse models: the star schema and the snowflake schema. Today, we’ll examine the differences between these two schemas and we’ll explain when it’s better to use one or the other.The star schema and the snowflake schema are ways to organize data marts or entire data warehouses using relational databases. Both of them usedimension tablesto describe data aggregated in a
Databases are designed in different ways. Most of the time we can use “school examples”: normalize the database and everything will work just fine. But there are situations that will require another approach. We can remove references to gain more flexibility. But what if we have to improve performance when everything was done by the book? In that case, denormalization is a technique that we should consider. In this article, we’ll discuss the benefits and disadvantages of denormalization and what situations may warrant it.
A Unified View on Database Normal Forms: From the Boyce-Codd Normal Form to the Second Normal Form (2NF, 3NF, BCNF)
Normal forms for relations is a required topic of a database curriculum. Besides avoiding anomalies, which is already a big issue ¹ , knowing them certainly helps to understand what is going on in your or someone else’s database design. Even if at some point you decide to abandon a normal form, you should know what you are doing and how to pay a price for that.Here, I want to discuss normal forms up to Boyce-Codd Normal Form (that is 2NF, 3NF and BCNF). The reason I want to pack together so many normal forms is that they can be understood better collectively: when you have them packed you can easily see the differences. In some aspects I will follow the approach from a recent book by C. J. Date,
The First Normal Form (1NF)is exceptional. The other normal forms (2NF, 3NF, BCNF) talk about functional dependencies and 1NF has nothing to do with functional dependencies. Moreover, we have precise definitions for other normal forms and there is no generally accepted definition of 1NF.Does 1NF Equate to “Atomic Attributes”?When you look at various descriptions of 1NF the word that you see most often isatomic. It is common to say that a relation is in 1NF if all its attributes are atomic. A good, theory oriented book by C.J. Date (
When you read about normalization you usually get the set of conditions that a database in the nᵗʰ normal form should satisfy and the set of rules, a sort of a cook-book, for obtaining that normal form. But why these rules are safe to apply to your denormalized relations is a skip material. Here, I would like to present some elementary concepts on how we decompose relations and what can go wrong.The concepts I want to present may seem a bit complicated but, on the other hand, they are so basic that you rarely see it written. Still, they are essential for understanding how the normalization process works. One may prepare a perfect meal just by following a recipe but no one can master the art of cooking without understanding what’s going on behind-the-scenes. The same holds true with databases.
Why do you need all of this normalization stuff? The main goal is toavoid redundancyin your data. Redundancy can lead to various anomalies when you modify your data .Every fact should be stored only onceand you should know where to look for each fact. The normalization process brings order to your filing cabinet. You decide to conform to certain rules where each fact is stored.Nowadays the go-to normal forms are either the
Today we continue our series of posts on data normalization. In the previous post on data normalization I explained what functional dependency is. Today we will talk aboutcandidate keysin a table.A candidate key is a set of columns such thatall other columns in the table are dependent on it, and the set isminimal, that is if you remove a column, then the resulting set is not a candidate key.Example: Table Person
Do you remember the post about update anomalies ? I promised you we’d explain how to design tables which have no update anomalies. So here we go!Today we begin a series of posts on data normalization. We will talk aboutfunctional dependencies, a concept that needs to be explained before we dive deeply into database schema normalization.The subject is rather abstract and theoretical but I will try to restrain myself from going too deep into mathematics. I will try to keep things simple and down-to-earth. (The operative word being try ;) )
Let’s take a look at the following table:What’s wrong with this table? It’s difficult to modify data in it. Upon modification, several anomalies can occur:Insert anomaliesIt’s impossible to insert a product into the table if the product hasn’t been bought by a customer yet. Similarly, it’s impossible to insert a customer who hasn’t made a purchase yet.Update anomaliesIt’s difficult to update data in the table. If you want to change the name of the product, you have to update all rows where the product is bought. You cannot change the price of the product for all future purchases.
A real-life example:Let’s assume that we have a system which stores data fordistributorswho sell products manufactured by some company. Each distributor receives points for selling products and may redeem a specified number of points to obtain a discount or extra bonus. Points are calculated byan external systemand updatedvery frequently– after processing of each order placed by the distributor himself, a member of the distributor group, or a customer assigned to the distributor. The distributor is identified using a distributor number. We are using a DB2 database.