We’ve had tremendously positive feedback on my recent article that talked about “Why SQL is neither legacy, nor low-level, nor difficult, nor the wrong place for (business) data logic, but simply awesome” both within the blog’s comment section as well as on reddit.
However, one of the sections triggered very controversial feedback. Clearly, not everyone agreed to:
Fallacy #5: The database is the wrong place for business logic
Now this is really the biggest of all fallacies. Who said that? Some guy from the 90s who had written a book on Object-Oriented Design Patterns? Of course, if you blindly follow random architecture rules, this may apply to you. But then, beware that you’re following 10-20 year-old “legacy” principles by not using SQL (see also Fallacy #1).
The truth is, your database is actually the best place to do such calculations. It has many many bits of information that help it calculate the above extremely fast, with no additional cost – all in memory. It has constraints, indexes, and all sorts of other meta data to optimise this query in a way that you probably wouldn’t have achieved in pure Java code. Certainly not with only 5 lines.
And another huge advantage of this approach is that the new business rule is implemented only at a single place, if you’re doing it right: In a database view that you can join to all the other tables that need access to transactions / balances. This view can now be added transparently to all applications, including the E-Banking system, the E-Document output system, the Perl scripts that need to send audit information to the tax bureau, etc.
I dare you to find a better, more reusable solution in Java (or C#, PHP, etc.)
Here is some criticism from reddit:
I don’t know how many times upgrading an app has been sidetracked by business logic hidden in esoteric procs. – by /u/federal_employee
No sorry. I’m not going to read your article, because SQL is absolutely not the place for business logic. It’s not even an opinion, it’s a fact. – by /u/kamiikoneko
The ideas are good in theory. Unfortunately modern SQL databases are not capable to process business logic. – by /u/geggo98
How the hell do you unit test your business logic when it’s in the database? – by /u/Rockytriton
Why is this such a controversial topic? In what situations is the database the right place, and in what situations isn’t it the right place for such logic?
What *IS* Business Logic, in the First Place?
Before cargo culting one or the other philosophy, we should first have a clear picture about what business logic really is.
Wikipedia defines business logic as follows:
In computer software, business logic or domain logic is the part of the program that encodes the real-world business rules that determine how data can be created, displayed, stored, and changed. It is contrasted with the remainder of the software that might be concerned with lower-level details of managing a database or displaying the user interface, system infrastructure, or generally connecting various parts of the program.
In other words, business logic is all the “interesting” logic that is not boring glue code or infrastructure code.
Business Logic in a Classic, Three Tier Architecture
Most applications are built in a classic three tier architecture, where we “cleanly” separate:
- The user interface
- The service layer
- The database layer
This three tier architecture has proven useful mostly because it makes it very easy, for instance, to replace or complement a rich client user interface (e.g., JavaFX or RCP if you’re using Java) with a web-based one using AngularJS, as an example. The idea is that the service layer will expose a clean API to the user interface(s), which have no knowledge about the service layer’s implementation.
The database layer then takes care of persisting any data that is needed by the service layer. This layered architecture is something that is also cargo-culted way too often, but that is a topic for another discussion.
So, in what layer does business logic go? The answer is: In all of them!
According to the Wikipedia definition, it can go everywhere, because each layer is suitable for “real-world business rules that determine how data can be created, displayed, stored, and changed.”
Let this sink in. There is no reason why any of the above layers is more generally suitable for business logic than the others.
So, why would suggesting putting some business logic in the database cause so much turmoil with our readership? Let’s look at pros and cons of business logic in the database:
Pro Database: Performance
There are two database use-cases where performance really matters in a way that moving logic inside of the database is crucial:
- Batching (e.g., ETL, Migrations, etc.)
When you run complex reports across a complex database schema, or against a huge data set, e.g., in a column store, then you don’t want to load all the data into a client application for processing.
As processors become increasingly parallel and memory becomes increasingly cheap (see http://yourdatafitsinram.com), modern databases profit enormously from the fact that all the data is located on a single node. Consider the Stack Overflow / Stack Exchange architecture, which is hosted on a single live SQL Server instance: https://data.stackexchange.com
Pro Database: Data Integrity and Validation
Some data integrity rules are easy to express with ordinary constraints. We have trivial things like:
- NOT NULL constraints
- Unique constraints
- Foreign keys
- Check constraints
All of these things model your business rules in very simple ways.
In more complex situations, triggers can become very useful in enforcing data integrity across several rows within the table, or even across several tables.
While validation can obviously take place in other application layers as well (in the UI for immediate user feedback, in the service layer, in case other systems need to be checked, too), keeping data integrity close to the actual data makes total sense – when it doesn’t hurt performance too much.
For example, in a banking application, where you want to ensure that a
CUSTOMER will not perform any
PAYMENTS beyond their overall
CREDIT_LIMIT, nor beyond each
Using a database trigger, you could check on each INSERT of a
PAYMENT, or on each UPDATE of a
AMOUNT whether the following query yields a non (zero, zero) result for CUSTOMER_CREDIT_LIMIT_EXCEEDED or ACCOUNT_CREDIT_LIMIT_EXCEEDED:
WITH future_payments (acco_id, amount) AS ( SELECT acco_id, SUM(amount) FROM payments WHERE execute_at > sysdate GROUP BY acco_id ) SELECT CASE WHEN cust.credit_limit > SUM(acco.balance - paym.amount) THEN 1 ELSE 0 END AS customer_credit_limit_exceeded, SUM(CASE WHEN acco.credit_limit > acco.balance - paym.amount THEN 1 ELSE 0 END) AS account_credit_limit_exceeded FROM customers cust JOIN accounts acco USING (cust_id) JOIN future_payments paym USING (acco_id) WHERE cust_id = SYS_CONTEXT('banking', 'cust_id') GROUP BY cust_id, cust.credit_limit
In the above example, we’re using Oracle’s SYS_CONTEXT function, which can be used to access information from a previously set context namespace. There are other means to achieve the same, e.g., by using TEMPORARY TABLES.
The nice thing about this approach is that you have to implement it only once, at the perfect spot. No developer will ever need to think of this business rule again, no matter how
PAYMENTS are entered into the database (e.g., via the web E-Banking system, a bank operator, an ATM machine, etc.)
Pro Database: Row Level Security
Some databases (e.g., Oracle) ship with built-in row level security, but your security rules may be much more complex than being able to access individual rows. You may implement Access Control Lists (ACLs) directly in your database. The following model shows an example of what is possible.
ACCOUNTS table contains the objects of interest, which should be access-protected.
ACCOUNTS belong to
CUSTOMERS (i.e., a legal entity), and
CUSTOMERS may have several
USERS (real people, typically employees). Not every
USER (e.g., employee) within a
CUSTOMER (e.g., employer) should be able to access every
ACCOUNT. This is why a special
ACCESS_RIGHTS table is added, to govern these relationships.
What’s more, the
ACCESS_RIGHTS table has an optional (NULLABLE)
ACCO_ID column, which means that if this value is NULL, then the
USER_ID should get access to all the
ACCOUNTS for that given
CUST_ID, like a master user.
These are the business rules. Now, where do you want to implement this logic? The answer is: In the database, of course. Ideally, you’ll write a view along the lines of
CREATE OR REPLACE VIEW v_access_rights AS SELECT acri.acco_id, acri.user_id, acri.cust_id FROM access_rights acri WHERE acri.acco_id IS NOT NULL AND acri.user_id = SYS_CONTEXT('banking', 'user_id') UNION SELECT acco.acco_id, acri.user_id, cust_id FROM access_rights acri JOIN accounts AS acco USING (cust_id) WHERE acri.acco_id IS NULL AND acri.user_id = SYS_CONTEXT('banking', 'user_id')
The above view can now be joined to every query that you’re running against the
ACCOUNTS table, to restrict access to ACCOUNTS, e.g.
SELECT acco.* FROM accounts acco JOIN v_access_rights acri USING (acco_id)
And why not take this one step further by adding a new V_ACCOUNTS view, which shields access to the real
CREATE OR REPLACE VIEW v_accounts AS SELECT * FROM accounts WHERE acco_id IN ( SELECT acco_id FROM v_access_rights )
By specifying the right GRANTS and ROLES in your database, you can from now on prevent access to the
ACCOUNTS table very easily, directly in the database. There’s hardly a better way to do that in a client application (i.e., service layer). And you most certainly mustn’t do this in the user interface layer.
Pro Database: Writing an API Using Stored Procedures
A lot of people who argue in favour of a clean API on the service layer will immediately understand the benefits of such an API:
- Abstraction – The service layer is a very simple, abstract façade of the system that’s behind it.
- Encapsulation – The system behind the service layer cannot be accessed. Its internals aren’t exposed.
- Versioning – Several API versions can be maintained in parallel, which allows for replacing the implementation without clients (e.g., the user interface) being impacted.
- Documentation – The API is the best place to put documentation. Because of the above features, an API consumer doesn’t have to know anything else about the backing system, apart from the API. This makes documentation very easy.
- Testing – The API can be mocked by the client with common assumptions that can be made about the behaviour of each API “method”. Because each API element is documented (see above), the backend can be replaced very easily with a stub. This allows for decoupling the development lifecycles of API consumer and producer.
All of the above are features of any API. Whether the API is implemented in REST, HTTP, SOAP, a set of Java interfaces, or a set of stored procedures doesn’t matter. It’s always an API.
There is actually no reason at all why in the database layer, we shouldn’t also expose an API to the “service layer”, i.e., to put a service layer in the database.
Apart from quality, one of the most important reasons to proceed with a stored procedure API is security. Imagine the previous database, where we had bank
ACCOUNTS. Let’s assume that
ACCOUNTS are low-security objects, whereas
CREDIT_CARDS are high-security ones.
In the above example, we’ve defined a “secret” area, where no GRANTS are given to any database users with UI access. We want our business logic to be completely shielded off from any layer above, because we don’t necessarily trust service layer programmers to get everything right. Besides, some undocumented fraud detection tables are also in this “secret” area, of which we don’t even want anyone to have knowledge.
So, we’ll write stored procedures, to provide an API to the service layer for everything related to
CREDIT_CARDS. This is an excellent use case, but there are many more for a clean API in the database, which doesn’t expose the underlying relational model.
Pro Database: Reusability of Logic Across a System With Common Database Access
It is still a common architecture pattern (see Stack Exchange) to have only a single, huge database that grants access to most surrounding systems, including
- End users
- Admin users
- Technical users (e.g., ETL, reporting, batch jobs)
An example that I’ve given in my previous post was the example of the running total. Let’s assume the following schema:
The model lacks the balance on each bank account transaction. We only have the individual amounts, and the current balance of the
ACCOUNTS table. It is rather easy to calculate the balance also for each individual
TRANSACTION using window functions:
CREATE OR REPLACE VIEW v_transactions AS SELECT t.*, t.current_balance - NVL( SUM(t.amount) OVER ( PARTITION by t.account_id ORDER BY t.value_date DESC, t.id DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING ), 0) AS balance FROM v_transactions t WHERE t.account_id = 1 ORDER BY t.value_date DESC, t.id DESC
The key point of this approach is the fact that if the above query is stored in a database view, we can reuse the view in all places where we need the balance of a TRANSACTION. There isn’t any much simpler way of providing reusability than a SQL view
Pro Database: Fear the Object Relational Mapping (ORM)
Often, in more complex schemas, we need to traverse the same join paths all the time. Not only is this tedious to write, but if you’re doing that in your ORM, you’ll risk running tons of unnecessary queries due to a wrong configuration of lazy/eager fetching, resulting in N+1 problems. Many people refer to this problem as the object-relational impedance mismatch.
In this example, we’re going to be using the Open Source Sakila database, which is excellent for demo applications. The Sakila database is a simple DVD rental store with films, actors, stores, staff, customers, and all the rest that’s needed. What’s nice about this database is the fact that it has all the beautiful stuff in ordinary relational modelling, like one-to-one, one-to-many, and many-to-many-to-many relationships:
Let’s imagine we want to know the actor, film, and category that generated the most revenue. These are three very similar queries and an “eager” ORM aficionado might jump to the conclusion that it is wise to fetch all the data in memory in order to reuse some of the logic when traversing the entities via the ORM (e.g., Hibernate). The opposite is true. It is much better to push this logic into the database and perform the aggregation there, e.g., in the following three queries:
Actor with the most revenue
SELECT actor_id, a.first_name, a.last_name, sum(p.amount) FROM actor a JOIN film_actor fa USING (actor_id) JOIN inventory i USING (film_id) JOIN rental r USING (inventory_id) JOIN payment p USING (rental_id) GROUP BY actor_id, a.first_name, a.last_name ORDER BY 4 DESC
Film with the most revenue
SELECT film_id, f.title, sum(p.amount) FROM film f JOIN inventory i USING (film_id) JOIN rental r USING (inventory_id) JOIN payment p USING (rental_id) GROUP BY film_id, f.title ORDER BY 3 DESC
Category with the most revenue
SELECT category_id, c.name, sum(p.amount) FROM category c JOIN film_category fc USING (category_id) JOIN inventory i USING (film_id) JOIN rental r USING (inventory_id) JOIN payment p USING (rental_id) GROUP BY category_id, c.name ORDER BY 3 DESC
Yes, SQL is verbose. You’ll have to repeat the same (or similar) JOINs all the time for these queries. But every client-based solution is much worse than the approach taken above. And if you have a close look, what’s happening here is nothing but business logic in the database. We express the relationship between categories and revenue in a 4GL language called SQL.
Some may argue that this is just the old “ORM vs. SQL” argument. I claim: No, it’s the equally old “Run business logic in the database: yes or no” argument. And I say yes. That calculation should be done in the database.
Contra Databases: Vendor Independence
Obviously, the more logic you move into the database, the more your logic will be vendor dependent in the sense that it starts to depend more and more on a specific database.
This is a valid argument against putting logic in the database for systems where vendor independence is important. For instance, when you have a small product / web application that you want to sell to your customers and run on their existing infrastructure.
Large corporations often have enterprise licenses with Oracle, SQL Server, DB2, etc. Adding another application to access their database servers doesn’t cost as much as adding a new database server for which they might not have any operations personnel and/or experience.
If you’re in that software vendor segment, then being database agnostic can be a selling point. You can run on Oracle Enterprise Edition, just as much as on a free SQLite database.
This is not a desirable situation from many other perspectives, e.g., from a performance perspective, or when stuff really goes wrong and you don’t have the right experience to trouble shoot a specific database.
Contra Databases: Your System Doesn’t Grow Around a Central RDBMS
Even if this classic architecture is still very popular, often systems are really huge and instead of centralizing data in a single store, you’ll distribute it on multiple nodes, and possibly with multiple SQL and/or NoSQL storage technologies.
In this situation, moving logic into a single RDBMS is not necessarily wise as only a few other systems can profit from the logic. Commonly used middleware that is connected to other middleware using messaging systems will be a more useful place to put logic.
Contra Databases: You and Your Team Don’t Like the Relational Model and Way of Thinking
This is one of the most important de facto reasons for opposing putting logic into the database, and it has – to some extent – been “cargo culted” especially in “the Enterprise” and everywhere where object orientation is practiced as a main model.
I’m specifically mentioning object orientation, because the OO model has a couple of features that are very hard to model with the much simpler relational model. These features are:
- Inheritance – Apart from a few ORDBMS (Oracle, Informix, PostgreSQL), inheritance is very foreign to RDBMS, and even in those ORDBMS, inheritance is rather unpopular.
- Identity – A fundamental aspect of object orientation is object identity. When using an object, you are storing its reference at the client side. Many clients can store the same reference and refer to the same object. This isn’t really what relations and tuples are about: They’re values without identity. Even if most RDBMS use row identities / ROWIDs under the hood, to uniquely identify a record, this feature isn’t really widely available in the relational model.
Both of the above features aren’t crucially necessary to model a given domain. Everything can be modelled in relational terms, which is why the relational model has been so successful.
But object orientation is particularly good at modelling graphical user interfaces. And when those came about, the OO model had its biggest success.
Today, we’re moving back to more data-driven, value-typed models involving identity and inheritance-less tuples, records, and functional programming, which is much closer to SQL’s declarative programming. I’m claiming that this OO-based criticism will fade away as traditionally OO-centric developers will (finally) start seeing the advantages of mixing the models, instead of shoehorning everything into objects. Uncle Bob has put it very nicely:
The bottom, bottom line here is simply this. OO programming is good, when you know what it is. Functional programming is good when you know what it is. And functional OO programming is also good once you know what it is.
When OO fades into its pragmatic, limited-scope niche, the argument of avoiding business logic in the database based on the object-relational impedance mismatch will go away.
Contra Databases: Tooling for PL/SQL and T-SQL is Much Worse
This is, unfortunately, a very valid criticism and I don’t see that changing much in the near future.
When you’re working with Java, you can use NetBeans, Eclipse, IntelliJ, which are all excellent IDEs that make you an extremely productive programmer.
When you’re working with C#, VB.NET, etc., you can use Visual Studio, which is an even better IDE than the above.
This is not at all true for PL/SQL or T-SQL. First off, the languages themselves are rather simple and at the same time rather clumsy. While they nicely embed the SQL language in a procedural language, they don’t go far beyond a couple of very basic imperative language features.
Secondly, the runtime is very rigid. PL/SQL, for instance, doesn’t have anything like the JVM’s ClassLoaders, which allow for dynamic linking. Replacing a PL/SQL package can be a huge pain as it will invalidate all possible call sites, block processes that are concurrently calling the package (keep dreaming about generic or subtype polymorphism, dynamic dispatch, or higher order functions / lambda expressions).
Furthermore, there are hardly any libraries. In fact, outside of SQL tables and perhaps PL/SQL associative arrays, there aren’t even any decent data structures. If you’re used to the rich collection APIs in Java or .NET, working with associative arrays that can use only numeric or string keys will hardly get you excited.
Once you go to production, these things do not really matter. PL/SQL programs are extremely fast and can technically do everything a more advanced language can do. But when reading Paul Graham’s “Beating the Averages” where he introduces the fictional Blub language, it is easy to see that procedural database languages are just not modern enough to help companies deliver value in time.
So, if you don’t have a compelling reason to do so (mostly performance, reuse in database-centric applications, security), I cannot blame you for avoiding stored procedures.
Neutral: Unit Testing
This is another cargo cult that I’m personally rather neutral about, but there have been a lot of people who claim that unit testing code that is stored in the database is hard or impossible.
Technically, this isn’t true. In object oriented languages, polymorphism and dynamic linking is used to mock API with test-specific implementations that inject the expected behaviour for a given test. Similarly, you can replace a PL/SQL package body with an alternative implementation in the database, when you’re executing a given test. Remember, after all, we are testing APIs, and for that, we must simply replace the API implementation.
Clearly, dynamic linking again makes replacing implementations on the fly much easier, but this doesn’t mean that unit testing in the database isn’t possible.
There’s never a black / white truth about how to optimally do things in software.
On the topic of where to best put business logic (UI layer, service layer, data layer), there are certain signs that indicate whether the data layer is more or less appropriate.
Signs pro data layer are:
- Data integrity
- Data security
- Data logic reuse (in database-centric systems)
Signs contra data layer are:
- Business logic complexity
- Developer efficiency
- Lack of reuse (in non-database-centric systems)
A lot of discussions around such topics are usually religious cargo culting that can be settled objectively by putting all facts on the table. Business logic in the database makes a lot of sense, and is often even unavoidable. But no sane person really wants to overdo it, let alone move all the logic into the database. Just like no sane person really wants to treat the RDBMS like a dumb data store that doesn’t really do anything.