Spreadsheets – Excel, Google Sheets, or a sheet by any other name – are really cool and powerful tools. But then, so are databases. When should you stick with a spreadsheet? When should you move up to a database?
You can use spreadsheets and databases for similar purposes. Given that both organize data and facilitate reporting, at times it can be hard to determine which is the best one to use. So let’s talk about the pros and cons of each option.
In the beginning…
If you’re just getting started in business, a spreadsheet (or a “sheet”) is almost always your first choice. Startups rarely have the budget to support a custom-made database. And besides, your business is new; you’ll have no idea if it will stay small, balloon into a huge corporation, or be somewhere in the middle.
Another factor is that your business’ structure and organization will likely change as it grows. So really, building a database at the start is not a common option. This is where sheets usually jump in.
The most important reason for using sheets is that they are available. You can start using Microsoft Excel, Google Sheets, or any other spreadsheet program with just a few clicks. You don’t have to plan out a complicated structure; you can simply enter your data, make calculations and reports, and share the info with coworkers. Sheets offer many cool built-in capabilities, and they can see a small business through for quite a while.
So let’s say you have all your data on sheets. Why should you consider building a database? In other words, why complicate your life if everything is working?
At this point, I’d suggest you ask yourself just how well everything is working. Remember, everything works okay until it stops working. In the case of sheets, the more data you have, the more problems you can run into. How do databases help you avoid these problems? And when should you consider switching?
Using Spreadsheets to Organize Data
Let’s assume we’ve started a company that provides telecommunication and Internet services to customers. We need to track which customer currently subscribes to which service. Customers could have more than one active service at a time, and the service could expire at the end of a set period or renew automatically.
Let’s take a look at a solution that uses sheets.
We have simply made a list of all the data we have, i.e. there is a mix of data in one place. We have customer data (columns A to E), service types (column F), and service details (columns G, H, and J).
At first glance, everything looks pretty good. We can view all the data without performing any complex actions. We can filter the data we need and create pivot tables or graphs for reporting purposes. So far, so good.
But if we continue to use sheets when we get more customers, we can hit a point when everything becomes too big for the sheets to manage. And this brings a new set of problems.
Potential Problems with Spreadsheets
Compared to spreadsheets, databases are complicated. But these “complications” serve a useful purpose; they prevent or at least minimize the following problems:
Data quality and consistency is a huge problem for bigger sheets. Although we intend to store data correctly, data quality problems are very common. People make mistakes, or we have unexpected information to enter. Just think of how the for scenarios below could present a problem:
- We want to add a new customer without specifying their service type. Should we add the customer details and leave out the service details? If we can insert only customers who have service details, that is an insert anomaly.
- What if we add service data when it becomes available, after creating the customer record?
- What if a customer subscribes to multiple services? Should we create a new record for each service, since we can only have one service type per record?
- What if we have multiple records for one customer and we need to update that customer’s information? Unless we change the information in all the relevant rows, our data will be inconsistent. We could have two different addresses for the same account; in that circumstance, how could we know which data is correct?
- What happens when we delete data? If we delete the entire row, we lose all that customer’s data. This is not a good idea; it’s better to remove only their service data and keep their customer data. But how can we do that if it’s all stored together in one row?
- What if only one customer subscribes to a service and we delete that record? If we delete that customer’s record, are we also deleting all record of that service? (This is called a delete anomaly.) Does that mean we don’t offer that service anymore? If we are still offering it, we’ve lost all parameters related to that service.
Obviously, there will be complications in storing the data for any business. We’ve all been on the receiving end of data quality problems – e.g. gotten bills for services we didn’t order, been charged twice for the same thing, or had a package sent to the wrong address. These things happen, and on a small dataset, it’s relatively easy to fix them. But what happens when we have thousands or even millions of rows? We’d soon be dedicating almost all our time to correcting these issues.
Performance problems happen when datasets get too large for a sheet to handle efficiently. You’ll experience data quality problems much sooner than performance problems, but that doesn’t mean that performance problems are unimportant. Au contraire; performance problems can be even more dangerous than data quality issues.
It’s common to search for specific rows, insert new rows, update or delete cell values in existing rows, and delete entire rows. All these actions require a lot of filtering, which is no problem on a small dataset. But when your sheets become really large, even a simple operation could take minutes. Spending half of your working day waiting for the filter to do its job is hardly a wise choice.
There’s also the related problem of redundancy – storing the same data multiple times on the disk (e.g. customer data is stored over and over again in multiple rows). This is also going to have an impact on performance.
On decent hardware, sheets with thousands of rows will be okay. But when you get into tens of thousands of rows, performance problems can rear their ugly heads. Needless to say, sheets with hundreds of thousands or even millions of rows will have extremely poor performance.
On the other hand, databases are here to solve performance issues. When everything is set up properly, working with millions of rows won’t pose any challenges.
Managing Historical Data and Reports
One more important problem with sheets is tracking data changes over time. If you simply delete data from sheets, you lose it. If you decide to store a daily sheet (to capture all the changes and preserve historical data), you’ll soon find yourself buried under tons of sheets. Creating reports from such a structure is really time-consuming, and the quality of any reports generated from it would be very questionable.
Do You Experience Such Problems with Your Data?
In today’s article, we’ve discussed some disadvantages of using sheets to organize lots of data. Have you ever experienced any of those problems? Are you ready to take your business to the next level? If the answer is “yes”, you’re at the right spot! Next week, we’ll learn how a database solves problems with storing data in sheets.