Testing is difficult. Good database design may make it more efficient and more easily applicable. It may help detect regression – a situation when a working code breaks due to some apparently unrelated change to the code.
You cannot always afford comprehensive regression testing of your system. But what you usually can afford to do is to implement a repeatable process for build and automated testing of your system. It may cost a little at the start but it will pay off greatly later.
What do you test in this automated process? The least you can do is to cover your code with traditional unit testing. This will make sure you know what your objects are doing, but this will not guarantee that your data layer is compatible with your code. In fact, it won’t even guarantee that you have any persistence layer at all.
To solve that you may use database unit testing. Basically, to do that, you need:
- a test database available during the build process,
- a good database schema with reasonable constraints,
- good database seeding (aka test data).
The test database should resemble the production database as much as possible. It should have the same version and configuration parameters. Every difference between the production and test database is a chance you will have something working in the test environment and failing in the production one.
The database schema should enforce as much consistency checking as is possible and reasonable – but no more than that. The available tools for consistency checking include:
- Foreign keys – you should always use them in entity relationships; omit only because of performance and integration issues as shown in article “When Normal Forms are Not Enough”.
- Unique keys – put them in all places where duplicate data is not allowed; it is far better to see an exception from the database then to create two users with the same login.
- NOT NULL – every field should be NOT NULL unless business rules permit it to be NULL,
- Checks – use them if:
- a field has a simple, easy to express rule, like being nonnegative or start with “A”, etc.,
- or a field depends on another field to be set, like “if authentication type is ‘password’; then password field must be set,”
- Length limits – if business rules say that a field must not exceed 10 characters, set that limit on the database apart from setting them in the validators on the UI side; but take into account multibyte characters as shown in the post “Could Hugo Kołłątaj use your system?” – is not the only database that has such problems.
- Variable ID sequence starts – various entities may have identifiers starting from different values, like user from 1000, company from 2000, etc.; if you use sequences to generate identifiers you may set different initial value for each sequence. This may help developers detect bugs during testing - when they see an identifier they may guess which entity it refers to.
I do not recommend using triggers and stored procedures as they may introduce performance penalties and are significantly more complex.
The database seeding (also known as test data) is data used in tests and it is perhaps the most important part of the database unit testing. It should meet the following requirements:
- It must be possible to refresh the data easily, from a script, launched manually by a developer or during the build process.
- The data should cover all important border conditions – if some important business field may be null, there should be an entity that has it set to null; if there is a field for storing last user login time - think about users who have never logged in, etc.
- The data should be organized into independent sets serving various testing purposes, so that:
- tests do not interfere with each other,
- there is no need to rebuild the test database between tests – takes time.
- The data should be small so that each test takes little time to run.
Let’s imagine we have an internet store. Customers pick available items, add them to the cart and – when they are ready – they submit an order.
The database structure might look as shown in the Vertabelo diagram below.
What could you do to help developers test the software in such a situation?
In this diagram all values are NOT NULL except item.cart_id and item.order_id. And they are nullable for a reason – I want every item to be either in cart or in an order, but not both. So when a customer submits an order, I create a new order and move items from the cart to the newly created order.
To make sure the invariant “in cart or in order, but not in both” I use a check on the table item that says:
((cart_id is not null and order_id is null) or (cart_id is null and order_id is not null))
Therefore I am sure that when someone forgets to remove ordered items from their cart the system will throw an exception rather than allow corrupt data to be created.
I am sure that every order number should be unique. Therefore I add a unique key with a single column: order.order_number. Primary keys guarantee uniqueness so you don’t have to worry about id columns.
Do not bother generating identifiers using SQL, like max(id)+1 or something like that. Use a sequence. It is much safer as it is guaranteed to be thread safe and is not rolled back on transaction rollback.
I suggest adapting a naming standard for identifier sequences, like
CREATE SEQUENCE item_seq START WITH 10000 NO CYCLE;
And that’s it! A good database structure and good test data may be quite a firm foundation for good code quality. It will throw exceptions eagerly - even on the production system. But each one exception thrown is one illegal data modification attempt prevented.