When you’re using a data warehouse, some actions get repeated over and over. We will take a look at four common algorithms used to deal with these situations.Most DWH (data warehouse) systems today are on a RDBMS (relational database management system) platform. These databases (Oracle, DB2, or Microsoft SQL Server) are widely used, easy to work with, and mature – a very important thing to bear in mind when choosing a platform. Although they include special features for DWH configuration (like bitmap indexes, partitioning, and materialized views) we still have to take care of data transformation ourselves.
Mastering SQL analytical functions is a necessity for an aspiring BI/DWH professional. In this article, we’ll explore the history of SQL in a BI environment. We’ll also introduce some of the more common SQL analytical functions, including RANK, LEAD, LAG, SUM, and others.SQL can be used for business intelligence (BI) applications that combine data retrieval, analysis, and computation. These computations are more complex than what we see in OLTP systems. If you’re interested in getting into data warehousing (DWH) or business intelligence, you need to go beyond plain SQL and start digging into the analytical functions (also known as window functions, windowing functions, or OVER clauses).
Database schema migration is never an easy job. In fact, it can really be a headache, even when you’re working with a familiar system. For example, at times Oracle 10g may not drop the associated index for a primary key or unique constraint that has been dropped. In this article, I am going to explain when and why this happens.The Story:I’ve been working on the development of an e-commerce platform. Before every new platform version release, a migration script is prepared to move the database scheme and data from the old version to the new one. This is done by the database architect in his local environment. Then the migration script is tried out against a test environment that restores the production database and simulates real conditions. (Restoring the production database in the test environment is done using the Oracle 11g Data Pump tool.)
Having reference tables in your database is no big deal, right? You just need to tie a code or ID with a description for each reference type. But what if you literally have dozens and dozens of reference tables? Is there an alternative to the one-table-per-type approach? Read on to discover ageneric and extensibledatabase design for handling all your reference data.This unusual-looking diagram is a bird’s-eye view of a logical data model (LDM) containing all the reference types for an enterprise system. It’s from an educational institution, but it could apply to the data model of any kind of organization. The bigger the model, the more reference types you’re likely to uncover.
We’ve had tremendously positive feedback on my recent article that talked about “Why SQL is neither legacy, nor low-level, nor difficult, nor the wrong place for (business) data logic, but simply awesome” both within the blog’s comment section as well as on reddit .However, one of the sections triggered very controversial feedback. Clearly, not everyone agreed to:Fallacy #5: The database is the wrong place for business logicNow this is really the biggest of all fallacies. Who said that? Some guy from the 90s who had written a book on Object-Oriented Design Patterns? Of course, if you blindly follow random architecture rules, this may apply to you. But then, beware that you’re following 10-20 year-old “legacy” principles by not using SQL (see also
In the 3rdpost in this series, we looked at how we prepare data for use with a concept called the Business Data Vault. Now, in this final part, I will show you the basics of how we project the Business Vault and Raw DV tables into star schemas which form the basis for our Information Marts.Raw Data Mart vs. Information MartsAs of Data Vault 2.0, the terminology changed a bit to be more precise. With DV 2.0 we now speak of “
In my last post , we looked at the need for an Agile Data Engineering solution, issues with some of the current data warehouse modeling approaches, the history of data modeling in general, and Data Vault specifically. This time we get into the technical details of what the Data Vault Model looks like and how you build one.For my examples I will be using a simplyHuman Resources (HR)type model that most people should relate to (even if you have never worked with an HR model). In this post I will walk through how you get from the
Why Talk About Errors? Model Setup 1 – Using Invalid Names 2 – Insufficient Column Width 3 – Not Indexing Properly 4 – Not Considering Possible Volume or Traffic 5 – Ignoring Time Zones 6 – Missing Audit Trail 7 – Ignoring Collation Why Talk About Errors?The art of designing a good database is like swimming. It is relatively easy to start and difficult to master. If you want to learn to design databases, you should for sure have some theoretic background, like knowledge about database normal forms and transaction isolation levels. But you should also practice as much as possible, because the sad truth is that we learn most… by making errors.
Generating unique integers is a very common task in database systems. Many applications require each row in a given table to hold a unique value. One way to tackle this problem is to use sequences.What are Sequences?A sequence is a database object which allows users to generate unique integer values. The sequence is incremented every time a sequence number is generated. The incrementation occurs even if the transaction rolls back, which may result in gaps between numbers. Similarly, gaps may arise when two users increment the same sequence concurrently.
The concept of views and function-based indexes has been known for many years. One of the brand new solutions is a virtual column – a feature introduced in Oracle 11g. Apart from database giant, some well known DB vendors, like MariaDB and SQL Server, support the idea of computed columns. So let’s give virtual columns a try and examine their basic usage.Generally, there are two kinds of virtual columns:Virtualvirtual columnPersistentvirtual column
Anyone who had to schedule an intercontinental phone call knows that there is no such thing as a simpletimecallednow. What you should rather think about is a time comprised ofhere and now.The Earth rotates around its own axis. When it’ssolar noon(the sun is at its highest position) in one place, it’s already past noon in places to the east and it’s still before noon in places to the west.To make communication easier, at the end of the 19th century, the Earth was divided into 24 hour-wide
Oracle bases its language support on the values of parameters that begin with NLS. These parameters specify, for example, how to display currency or how the name of a day is spelled.The table below presents some of the NLS parameters. By using one of them, NLS_SORT, we can specify the sort method (binary or linguistic) for both SQL WHERE clause operations and NLSSORT function operations.To check the current NLS settings, type:
Many features have arisen throughout the evolution of the Oracle database. In this article, let’s focus on temporary tables. They were introduced fairly late (Oracle 8i) and are now often considered to be indispensable for DBAs and developers.DescriptionIt’s worth mentioning up front that the table itself is not temporary, but rather the data within it. The data in such a table is stored only as long as the session or transaction lasts and is private for each session, however the definition is visible to all sessions. Of course, after commit or disconnection, the data is lost but the definition of the table remains (it’s not necessary to perform many ddl operations – especially create table – which is a good practice). Likewise, other structures related to the temporary table like synonyms or views won’t disappear after the end of the transaction or session. Indexes created on a temporary table behave similarly.
Oracle’s Deviations From the RulesThe most famous offering from Oracle is its relational database, which also brings Oracle the most profits. As the second-largest software giant (after Microsoft), Oracle has some of its own conventions which may or may not be consistent with other generally accepted conventions. Maybe that’s the reason why so many don’t understand Oracle.First of all, Oracle doesn’t follow the SQL standard. In many computer science courses, students practice and learn the fundamentals of databases using an Oracle database. It just so happens that even I myself have seen a lecturer shrugging his shoulders when speaking about Oracle odd behaviours.
Date and time functions and formats are quite different in various databases. In this article, let’s review the most common functions that manipulates dates in an Oracle database.The functionSYSDATE()returns a 7 byte binary data element whose bytes represents:century,year,month,day,hour,minute,secondIt’s important to know thatselect sysdate from dualin SQL*Plus gives the same result asselect to_char(sysdate) from dualbecause SQL*Plus binds everything into character strings so it can print it. For more detailed explanation, look at
Synonyms are a very powerful feature of Oracle. They are auxiliary names that relate to other database objects: tables, procedures, views, etc. They work like Unix hard links; a pointer to point to an object that exists somewhere else.Synonyms can be created asPRIVATE(by default) orPUBLIC.public synonyms are available to all users in the database.private synonyms exist only in specific user schema (they are available only to a user and to grantees for the underlying object)
SQL has been around for decades and supports a many billion dollar market. However, many people still struggle with just how to pronounce the term SQL. Is it “S.Q.L” [ˈɛs kjuː ˈɛl] or is it “sequel” [ˈsiːkwəl]?SQL… Where it all startedLet’s start at the beginning.Relational databases came into existence with E.F. Codd’s 1970 publication “A Relational Model of Data for Large Shared Data Banks.” While Codd’s ideas were remarkable for the time, in San Jose, California, two colleagues named
The Vertabelo journey continues … We now have almost10,000users and the number of Vertabelo advocates keeps growing strong.Vertabelo users come from over100countries and speak various languages. What unites them?The relational database.Let’s see what relational databases they use:We wanted to determine the most popular database engine among Vertabelo users based on one of three widely-used operating systems:Windows,Linux,Mac OS.Guess what we find out?Among all supported popular databases (PostgreSQL, MySQL, Microsoft SQL Server, SQlite, Oracle, IBM DB2, HSQLDB),
The concept ofmaterialized views (MVs)is almost 15 years old; Oracle first introduced these views in the 8i version of its DBMS. However, some well known DB vendors (like MySQL) still don’t support MVs or have added this functionality only quite recently (it’s available in PostgreSQL since version 9.3, which was released just a year ago). In this article I’ll try to give you some tips about when you should use MVs in OLTP systems.
When looking at different kinds of ERD notations, it is hard not to come across Barker’s ERD notation, which is commonly used to describe data for Oracle. Richard Barker and his coworkers developed this ERD notation while working at the British consulting firm CACI around 1981, and when Barker joined Oracle, his notation was adopted.Let’s take a closer look at Barker’s syntax.The most important components in the ERD diagram are:entities, which can be thought as physical objects or elements that can be uniquely identified, and
Some time ago, the Vertabelo Team participated in the PostgreSQL Conference Europe 2013 . Some of the talks were really nice. One of them stuck in my head for quite a long time. It was Markus Winand’s lecture titled“Indexes: The neglected performance all-rounder.”Although I had had a solid background in databases, this 50 minutes long talk showed me that not everything concerning indexes was as clear to me as I had thought. This was the kind of lecture I like the most – when you sit and say to yourself “damn, I didn’t know that!”
Generally, we don’t limit query results. However, when we only care about the first few rows or to implement table pagination, limiting query results is just what we need. Database vendors provide us with such functionality; most of them in their own distinct way.ExampleLet’s take a look at the 2014 Sochi Olympics Men’s Normal Hill Individual ski jumping results in theskijump_resultstable. There is no index on theskijump_resultstable. The following queries are examples of a Top-N and pagination query, which I will use in the following parts of the article.
If you were to implement a Top-N or pagination query in an Oracle database, you wouldn’t find any dedicated clause to limit the query result like TOP, LIMIT or FETCH FIRST. For each row returned by the query, Oracle provides a ROWNUM pseudocolumn that returns a number indicating the order in which the database selects the row from a table or set of joined views.ExampleLet’s take a look at the
When you create a foreign key in your database, you can specify what happens upon delete of the parent row. There are usually four possibilities:ON DELETE SET NULLON DELETE CASCADEON DELETE NO ACTIONON DELETE RESTRICTToday we’ll investigate the subtle difference between the last two options.In Some Databases There Is No Difference at AllIn Oracle, there is no RESTRICT keyword . The only option is NO ACTION. In MySQL, there is
group_concat in MySQLMySQL has a very handy function which concatenates strings from a group into one string. For example, let’s take a look at thechildrentable with data about parents’ and children’s names.To get the names of children of each person as a comma-separated string, you use thegroup_concatfunctions as follows:The result:To make sure the names of the children in each string arealphabetically orderedand to use semicolon „;” as aseparator, use this query:
I like Oracle database. It is efficient, easy to use for beginners and professionals – its tools for query analysis and optimization are masterpieces. But it has a very annoying “feature” – empty text (containing 0 characters) is stored in the database as null.Where did such a feature come from?Probably from the ancient ages, just after the dawn of time, i.e., the late 70’s  . In that age, memory (RAM and disks) was very limited and system designers did their best to use as little memory as possible. In old versions of Oracle DB,
Recently a fellow database architect claimed that in Oracle the typeVARCHAR2(255)means a string of 255bytes, not characters. There is not much difference between the two in the English-speaking world. It matters though if you want to handle people with names like Kołłątaj . (Not that Hugo Kołłątaj – a famous Polish 18th century politician – would ever use any of our systems, but he became our byword for all non-pure-ASCII names). I was very surprised by what the architect said and I decided to further investigate the matter.
The question: Why doesn’t my script create tables?The other day I was testing Oracle SQL scripts generated by Vertabelo. Roughly, this is the code that was generated:I used sqlplus to execute my script and see if it’s correct.The script run without errors but the tables where NOT created.I copied and pasted part of the script into a new file:I executed it in sqlplus and this time the table was created.OK, I scratched my head. What is the difference between the two? The only real difference is the blank line at the end of the command. Could it REALLY be the reason?