In the previous two parts, we’ve presented the live database model for a subscription-based business and a data warehouse (DWH) we could use for reporting. While it’s obvious that they should work together, there was no connection between these two models. Today, we’ll take that next step and write the code to transfer data from the live database into our DWH.The Data ModelsBefore we dive into the code, let’s remind ourselves of the two models we’ll work with. First is the transactional data model we’ll use to store our real-time data. Taking into account that we run a subscription-based business, we’ll need to store customer and subscription details, customers’ orders, and the order statuses.
Can you design an OLAP database model from an OLTP model? In this article, we’ll show you how!This is the second article of our data warehouse (DWH) series. You can find the first one here . The idea behind the series is to start with the OLTP (Online Transaction Processing) database model, present a possible solution for the reporting/OLAP (Online Analytical Processing) data model, and then finally consider the code we’ll use to perform the ETL process.
Welcome to a new series that shows you the practical side of the data warehouse (DWH)! In the first article, we’ll tackle a data model for a subscription business.In previous data warehouse articles ( The Star Schema , The Snowflake Schema , Star Schema vs. Snowflake Schema ) we focused more on the theory. In this series, we’ll show you how you could create a data warehouse for a real-life application, such as a database model. Today, we’ll take a look at the data model behind a subscription-based business. In upcoming articles, we’ll build a DWH and the code that makes the magic work.
When we start a data warehousing project, the first thing we do is define the dimensional tables. Dimensional tables are the interesting bits, the framework around which we build our measurements. They come in many shapes and sizes. In this article, we are going to take a closer look at each type of dimensional table.Dimensional tables provide context to the business processes we wish to measure. They tell us all we need to know about an event. Since they give substance to the measurements (i.e. fact tables) of the data warehouse (DWH) system, we spend more time on their definition and identification than on any other aspect of the project. Fact tables
The process of defining your data warehousing system (DWH) has started. You’ve outlined the relevant dimension tables, which tie to the business requirements. These tables definewhatwe weigh, observe and scale. Now we need to definehowwe measure.Fact tables are where we store these measurements. They hold business data that can be aggregated across dimension combinations. But the fact is that fact tables are not so easily described – they have flavors of their own. In this article, we’ll answer some basic questions about fact tables, and examine the pros and cons of each type.
Financial institutions, especially banks, usually have really large datasets. To use that data, it must be stored in such a way that it is easily available for generating reports. The trend now is to use a data warehouse to store all your relevant data, and to use smaller data marts (subsets of the warehouse) to keep specific data sets in a convenient place.But where to start? In this article, we’ll look at one possible solution, similar to a project I worked on in the past. While we implemented a different approach, the underlying idea was very similar.
In the previous two articles, we considered the two most common data warehouse models: the star schema and the snowflake schema. Today, we’ll examine the differences between these two schemas and we’ll explain when it’s better to use one or the other.The star schema and the snowflake schema are ways to organize data marts or entire data warehouses using relational databases. Both of them usedimension tablesto describe data aggregated in a
In a previous article we discussed the star schema model. The snowflake schema is next to the star schema in terms of its importance in data warehouse modeling. It was developed out of the star schema, and it offers some advantages over its predecessor. But these advantages come at a cost. In this article, we’ll discuss when and how to use the snowflake schema.The Snowflake SchemaThe snowflake schema’s name comes from the fact that dimension tables branch out and look something like a snowflake. When we look at the model above, we’ll notice it’s a fact table surrounded by a few dimension tables, some of which do the aforementioned branching. Unlike the star schema, dimension tables in the snowflake schema can have their own categories.
Today, reports and analytics are almost as important as core business. Reports can be built out of your live data; often this approach will do the trick for small- and medium-sized companies without lots of data. But when things get bigger – or the amount of data starts increasing dramatically – it’s time to think about separating your operational and reporting systems.Before we tackle basic data modeling, we need some background on the systems involved. We can roughly divide systems in two categories: operational and reporting systems. Operational systems are often called Online Transaction Processing (OLTP). Reporting and analytical systems are referred to as Online Analytical Processing (OLAP). OLTP systems support business processes. They work with “live” operational data, are highly normalized, and react very quickly to user actions. On the other hand, the primary purpose of the OLAP systems is analytics. These systems use summarized data, which is usually placed in a denormalized data warehousing structure like the star schema. (What is denormalization? Simply put, it’s having redundant data records for the sake of better performance.
There’s a lot to keep in mind when you’re designing a database, and very few of us can remember every valuable tip and trick we’ve learned. So, let’s take a look at some online resources that feature database design tips and best practices. As we go, I’ll share my own opinions on the ideas presented, based on my experience in database design.Obviously, this article is not an exhaustive list, but I’ve tried to review and comment on a cross section of sources. Hopefully, you’ll find the information that best suits your needs and goals.
In my last post , we looked at the need for an Agile Data Engineering solution, issues with some of the current data warehouse modeling approaches, the history of data modeling in general, and Data Vault specifically. This time we get into the technical details of what the Data Vault Model looks like and how you build one.For my examples I will be using a simplyHuman Resources (HR)type model that most people should relate to (even if you have never worked with an HR model). In this post I will walk through how you get from the
The world is changing.No –the world as we knew it in IThaschanged.Big Data & Agile are hot topics.But companies still need tocollect,report, andanalyzetheir data. Usually this requires some form of data warehousing or business intelligence system. So how do we do that in the modern IT landscape in a way that allows us to beagileand either deal directly or indirectly with unstructured and semi structured data?First off, we need to change our evil ways – we can no longer afford to take years to deliver data to the business. We cannot spend months doing detailed analysis to develop use cases and detailed specification documents. Then spend months building enterprise-scale data models only to deploy them and find out the source systems changed and the models have no place to hold the now-relevant data critical to business success.