Spreadsheets – Excel, Google Sheets, or a sheet by any other name – are really cool and powerful tools. But then, so are databases. When should you stick with a spreadsheet? When should you move up to a database?
This is the continuation of my previous article “Spreadsheets vs. Databases: Is It Time to Switch?” where we’ve discussed the most common disadvantages of using spreadsheets to organize lots of data. In this article, we will find out how a database solves those problems.
Using a Database to Organize Data
My motto is “use the appropriate technology for your needs”. If you can run your business via sheets, great! If you need a simple database, MS Access isn’t a bad option. But if these products aren’t working for you, you’ll probably need a customized database and a web application. The database will store your data; the web app will be a user-friendly way to interact with the database and communicate with the data layer.
Our fictitious service business was not very complicated, so we could power it using a fairly simple data model. If you look at the image below, you’ll see that everything we need is stored in just five tables:
A key rule of database design is to keep related real-world data in one place. In this case, we’ll keep all our
client data in the client table. This way, we’ll avoid storing the same data in multiple locations (the bad kind of redundancy mentioned earlier). If we change anything related to a client, we’ll do it only once, in this table. This will greatly improve data quality and be good for performance.
The next table that contains real-world data is the
service table. Again, we can store all the details related to our services here and we can make changes to the data quite efficiently.
client table and the
service table are real-world entities that could exist without the other. However, creating a database with unrelated entities doesn’t make too much sense – it’s like having customers without products or services without buyers. So we’ll relate these two tables using the
has_service table. To store information about which clients have which service, we’ll use foreign keys that act as references to that client and service. These foreign keys point back to records in the service and client tables. We can also keep any additional information related to each client-service relationship in this table.
client_type table is used like a dictionary that stores every possible type of client. It’s best to keep different segmentations in separate dictionary tables (e.g. if we had customer types and employee role types, we’d store them in different tables). However, we only need one table because this is a simple model.
The last table in our model is the
replacement table. We’ll use it to relate two services: a service that we want to replace and the replacement service. This gives us the flexibility to offer clients replacements for existing services (much like changing from one mobile calling plan to another).
Databases are more complicated to set up than spreadsheets, but this actually gives them some significant advantages in terms of data integrity and security:
Keys and constraints
Databases have built-in rules and controls that will, if used properly, prevent most data quality and performance issues. Primary keys (columns that uniquely identify each record in a table) and foreign keys (columns that refer to a record in another table) are critical to data safety, but defining alternate or UNIQUE keys (that contain data unique to each record in a table) is also very helpful.
In relational databases, keys relate data from different tables. The table’s primary key is always UNIQUE, while a foreign key references the primary key from some other table. That reference relates data from these two tables (e.g. the foreign keys in the
has_service table relate customer data with the services they have). It will also warn us if we’re about to delete a primary key referenced in some other table. This will prevent us from deleting records that are still needed (as references) in another table.
Constraints define the kind of data that can be entered into a field. We can specify that the data must have value (NOT NULL), define a format for phone numbers, contain only letters, and so on. This means we can avoid data problems from people entering the wrong kind of data in a field.
Security and Permissions
Another very important database feature is controlling access to your data. This gives you the ability to set not only who can access your database but also to control what they can see or modify. This is a huge part of data security. For example, you could define a user role that would allow an employee to change customer details but not service details. You could also set rules on which employees could change or delete data. It’s a good standard practice to make sure that people only have access to the data they need to do their jobs.
Of course, we could try to recreate these functionalities in sheets (at least in some way), but that would definitely be “reinventing the wheel”.
Couldn’t We Just Use a Spreadsheet?
Of course we could. We could create sheets that follow the same pattern used in the data model. That would solve many data issues, but…
Replicating the data model in sheets is definitely not an ideal option. We’d lose all the advantages the database system provides to us, all the rules and constraints that keep data “healthy”, all the things that prevent accidental deletions and other errors. We’d lose out on optimization and, if the dataset was big enough, performance would take a hit.
Even if we solved that, what about sharing data, e.g. having multiple users using the same sheet at the same time? What data integrity and performance issues would this cause? This would be the opposite of keeping things simple.
So if you think sheets can’t handle your business needs, you’re probably already headed toward a database. If you find yourself stuck with data stored in sheets and you want to move to a database, you should:
- Create a database model that stores your data optimally.
- Build an application with the database in the background.
- Clear your data, transform it (if needed), and import it into the database.
- Continue working with the database only.
Which Should You Choose – Spreadsheet or Database?
In today’s article, we’ve learned how a database solves problems with using sheets to organize lots of data. My advice is always go with the simplest solution to your problem. If spreadsheets will do the job properly, use them. But if you’re a data-driven company, you should start using a database ASAP. The longer you wait to clean and migrate your data, the more painful the process will be.