The concept of materialized views (MVs) is almost 15 years old; Oracle first introduced these views in the 8i version of its DBMS. However, some well known DB vendors (like MySQL) still don’t support MVs or have added this functionality only quite recently (it’s available in PostgreSQL since version 9.3, which was released just a year ago). In this article I’ll try to give you some tips about when you should use MVs in OLTP systems.
What are materialized views?
You may think of materialized view as a snapshot of query results.
Views (not materialized) are “named queries” which make it easier and more elegant to run complex queries. They do not affect the execution time of queries – the original query (from view definition) is run each time you fetch some data from a view.
The materialized view contains a copy of the query results from a single point in time. This copy may be refreshed – the new results replace the old ones. In different DBMSs the refresh process may be automated or must be invoked manually. What’s more, you can even create indexes on a materialized view’s columns.
The ability to query from a single source that contains the data from a much more complicated structure, allows you to eliminate the cost of joins, multiple table accesses, etc. But, on the other hand, the results may be out-of-date. This is the trade-off between performance and accuracy. That’s why materialized views are not all-purpose mechanisms, but rather specialized tools that must be used consciously, with all the side-effects in mind.
When should you use a materialized view?
Rather obvious domains where materialized views are used are data warehouses and OLAP environments. In fact, these are the origins of materialized views. These implementations don’t require any further comment.
But materialized views are also quite commonly used in OLTP systems where the response time is a very important factor. Users won’t wait more than a few seconds for the output, even though a request may require a very complicated query to the DB. In such a case you may do some programming and design tricks, but the most common are:
- redesign the system and eliminate those “tough” queries;
- cache the results of such queries;
- use materialized views.
The first solution is almost never achievable. The business requirements are untouchable and often include the mandate that “it must work like that but faster.”
In the second scenario, you must identify all processes in the system that modify any resulted data and make sure they invalidate the cache. This also has one important drawback – the first query after cache invalidation will have to populate the cache, so the user will have to wait for this time-consuming query to finish. If the cache is invalidated too often, you’ll get almost no performance gain in introducing it.
Finally, you may use materialized views. You may do this only if it is acceptable that the data shown to the user is not the most recent (it may be one-hour-old, for example). You don’t need to change anything in your application code – just change a view to a materialized view and handle its refresh process (which in Oracle may be done at MV creation step). That’s all.
A real-life example
Imagine that you work on a project for a financial institution. One part of the web application is a module where you need to show the summary of the historical cash-flow on the user’s private accounts. There are a few kinds of operations that change the balance and the summary must include all of them. These are historical data, so there are millions of records in different tables.
In the first step you create a view in order to fetch all required data from single source at once. The result is pretty scary: 170 lines of pure SQL, 27 columns, 3 “UNION ALLs,” about 30 joined tables... It cannot work smoothly.
If you wanted to optimize the view with cache but always ensure that the view contains the most recent data, using cache wouldn’t be a good idea. There are a lot of transactions taking place in real-time so the cache would instantly get invalidated.
Since we need to present reports of historical data, the lag of one or two hours doesn’t make a big difference to the user. Its purpose is to show the trend and aggregated results, not the current transactions (there is another module showing those things). Well... this is a perfect use case for materialized view! This is a combination of an OLTP system with an analytical part.
Pros and cons?
+ data updates take place in the external process so it doesn’t affect users’ requests;
+ all data is in one place;
+ data is indexed, which make the queries very fast;
+ (in case of Oracle) data updates are automatic;
– data is outdated;
– (in case of PostgreSQL) you need to automate the process of data updates by yourself.
A bunch of useful links
If you want to give materialized views a try, you should take a look at the syntax details. Here you can find links to the documentation of the most common DBMSs supporting MV (or mechanisms similar to that):