The following fallacies are things that I hear all the time.
– Timeless. E.g. by someone who thinks that NoSQL databases are “modern”.
– Timeless. Someone who thinks that SQL is low-level.
In this article, let me clear up some common SQL prejudices and other fallacies. SQL is neither legacy, nor low-level. Here’s why:
Fallacy #1: SQL is legacy
The reason why many people think that SQL is legacy is because it’s an ISO/IEC standard, which is more than 30 years old now. Furthermore, it is based on relational algebra, a theory that is even older than SQL itself. When these technologies were implemented, most software ran on huge (for the time) mainframe machines, written in eerie languages like COBOL and FORTRAN. And indeed, the SQL syntax was inspired by the languages of the time. Everything written in UPPER CASE, tons of keywords, etc.
But let’s be frank, is your love for curly braces so intense that you let yourself believe that syntax is so important?
Modern SQL can do a lot of awesome things that few developers of general-purpose languages like Java, C#, PHP are unaware of. I’ll show you an example further down.
Fallacy #2 SQL is low-level
Agreed. Assembler is low level. With assembler, you’re instructing the machine to do exactly the sort of memory and register access that you tell it to. You access peripheral devices (e.g., the disk) via interrupts and other low level message systems. How come people compare assembler to SQL?
SQL is the highest-level language that has ever made it into mainstream production. SQL is the only really popular fourth generation programming language (being at the same time Turing-complete since SQL:1999 and the introduction of common table expressions).
SQL is the only really popular declarative programming language, which allows you to abstract over algorithms even better than any language that implements the functional programming paradigm.
With SQL, you tell the machine only what properties the desired data has, and how it relates to other data. How to retrieve this data is entirely left to the query planner, which generates hard-to-beat execution plans in the most sophisticated databases. Try hand-tuning a query in Oracle. You’ll always lose against Oracle’s Cost-Based Optimiser.
Fallacy #3: NoSQL is “modern”
Another fallacy is the idea that NoSQL is “modern”. More “modern” than SQL. This is how NoSQL database vendors like to present themselves, obviously, but it couldn’t be further from the truth. If you think that a (hierarchical) JSON data store is really modern, just look back a little bit in time and remember the following books (which no one reads these days anymore):
and the Semantic Web
by Bhavani Thuraisingham
Succeeding with Object Databases:
A Practical Look at Today’s Implementations
with Java and XML
by Akmal B. Chaudhri & Roberto Zicari
I claim that objects = XML = JSON. All of these data structures are hierarchical data structures, with an optional node identity feature.
People have tried to topple the relational model time and again. Perhaps the relational model isn’t perfectly well-suited for all data representations. But relational databases have always proved that they can adapt. Popular vendors have integrated objects (as ORDBMS), XML, and now, JSON. They profit from the fact that RDBMS are not strictly tied to relational algebra, but to SQL, which is still the very best query and bulk data processing language out there, even for semi-structured data.
In fact, SQL (and the relational model and algebra) were designed precisely because of the mess that existed before SQL. Read this interesting article to learn more: Codd’s Relational Vision – Has NoSQL come full circle?
Today, most big data database vendors are again betting on SQL as a query language, regardless of the underlying storage structure. Mark Madsen has displayed this very nicely at Strata Conf:
Let’s see an example
Enough of the fallacies, let’s look into why SQL is so awesome!
Let’s assume you work in a bank and your database structure looks like this:
You want to calculate a running total over your bank account transactions. You may have the following data set, where
BALANCE is a column that you want to calculate for each and every transaction:
You could, of course, write the algorithm in Java (or C#, PHP, etc.). At the end of this example, I’ll show you why a SQL-based solution is definitely superior to an imperative one.
But let’s focus on the SQL example first. How do you even calculate a running total? It’s simple, from a conceptual point of view:
The balance of each row can be calculated by adding the amount of the same row to the balance of the row below. This will calculate balances from top to bottom.
Inversely, the balance of each row can be calculated by subtracting the amount of the row above from the balance of the row above.
That’s it. A running total.
This can be achieved very easily in SQL as follows:
SUM(t.amount) OVER ( PARTITION BY t.account_id ORDER BY t.value_date DESC, t.id DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING )
The above is called a “window function” (arguably SQL’s most awesome feature). This means that for every row that we select in the SELECT clause, we calculate some function over all the data within a window, which is sliding relative to that very row.
So, when calculating the balance at the second position, we can sum up all the amounts over the window of rows that is:
- … in the same PARTITION as the current row, i.e. they have the same
- … ORDERED BY date (and within the same date by id), descendingly
- … limited to all the ROWS up until the PRECEDING row.
This is the amount that we can subtract from the current balance, i.e. the balance at the top.
The full query might look something like this:
SELECT t.*, t.current_balance - NVL( SUM(t.amount) OVER ( PARTITION by t.account_id ORDER BY t.value_date DESC, t.id DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING ), 0) AS balance FROM v_transactions t WHERE t.account_id = 1 ORDER BY t.value_date DESC, t.id DESC
And the result can be explained, again, in this picture:
The fourth balance is equal to the current balance (on top) minus the sum of all the amounts above the fourth balance.
If the syntax is a bit overwhelming at first, let me reassure you that window functions are really very easy to understand. The above is all there is to it. Granted, there are a lot of keywords, which may scare you a little at first, if you’ve never seen them. But think of keywords as Java methods (just like we do in jOOQ). Here’s the example written in jOOQ, which allows you to write type-checked SQL directly in Java:
DSL.using(connection) .select(t.VALUE_DATE, t.AMOUNT, t.CURRENT_BALANCE.sub( sum(t.AMOUNT).over( partitionBy(t.ACCOUNT_ID) .orderBy (t.VALUE_DATE.desc(), t.ID .desc()) .rowsBetweenUnboundedPreceding() .andPreceding(1) ) ).nvl(0).as("balance")) .from (V_TRANSACTIONS.as("t")) .where (t.ACCOUNT_ID.eq(1)) .orderBy(t.VALUE_DATE.desc(), t.ID .desc())
Window functions are just a regular “API call” to easy-to-understand functionality.
And there’s more! Let’s get back to more fallacies:
Fallacy #4: OK, you’re a SQL person. But for me, it’s easier to write this in Java
Don’t be scared! And no, it’s not easier to write this in Java. The above few lines of SQL are all there is to calculating a running total. What if you have to add a running total of all the credit items only? And a total of all the debit items? And an average monthly turnover? Just a few lines of SQL, just a few additional window functions. If you have to write this in Java, you’ll have to store intermediary sums, run complex, nested loops, cache data from the database, perhaps, etc.
And there’s worse! Let me tell you a true story. Business calling:
Think about getting the salary booked only in the evening, but spending all the money already in the morning.
With SQL? Easy as pie. Just add a single additional line to your window function’s ORDER BY clause, and you’re set:
SUM(amount) OVER ( PARTITION BY account_id ORDER BY value_date DESC, SIGN(amount) ASC, id DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING )
Intra-day, we will now order transactions by the SIGN(amount). (Don’t forget to also display the transactions in that order).
With Java? OMG. OK, we have stored and cached all the balances in the presentation server, this has to change now, but in the admin server for bank personnel, we have actually implemented this faster using webservices, and oh crap. What happens if the user applies a text search on his transactions? We’ll have to be sure that the balance is still correct, and blah blah. Good luck with that refactoring (probably not integration tested).
Fallacy #5: The database is the wrong place for business logic
Now this is really the biggest of all fallacies. Who said that? Some guy from the 90s who had written a book on Object-Oriented Design Patterns? Of course, if you blindly follow random architecture rules, this may apply to you. But then, beware that you’re following 10-20 year-old “legacy” principles by not using SQL (see also Fallacy #1).
The truth is, your database is actually the best place to do such calculations. It has many many bits of information that help it calculate the above extremely fast, with no additional cost – all in memory. It has constraints, indexes, and all sorts of other meta data to optimise this query in a way that you probably wouldn’t have achieved in pure Java code. Certainly not with only 5 lines.
And another huge advantage of this approach is that the new business rule is implemented only at a single place, if you’re doing it right: In a database view that you can join to all the other tables that need access to transactions / balances. This view can now be added transparently to all applications, including the E-Banking system, the E-Document output system, the Perl scripts that need to send audit information to the tax bureau, etc.
I dare you to find a better, more reusable solution in Java (or C#, PHP, etc.)
I hope this article has proven to you that:
- SQL can do much much more than you probably knew
- SQL is much more advanced than many other technologies
- SQL is the perfect place to implement data logic
And the final fallacy:
Fallacy #6: Yeah, but we don’t want vendor-lockin
You don’t have to! Window functions (and many other awesome SQL features) are standardised since SQL:2003! You can re-use the same SQL code in DB2, Firebird, Oracle, PostgreSQL, SQL Server, Sybase SQL Anywhere, and many more!
Many of the above fallacies are the reason why SQL is underestimated. It’s time we move on and realize that with SQL, we can be incredibly productive and write awesome data logic in only a few lines of code.
Now, you have a great opportunity to learn SQL – just enroll in the free SQL Queries course at the newly released LearnSQL!
Happy SQL learning & coding!