Sooner or later there comes a moment when database performance is no longer satisfactory. One of the very first things you should turn to when that happens is database indexing. This article will give you a general overview on what indexes are without digging into too much detail. We’ll discuss additional database index topics in future articles.
In general, a database index is a data structure used to improve queries execution time. To explain what an index is, we need to say a few words on how the data stored in tables is organized.
Tables may be organized in two ways:
- heap tables
- index-organized tables (IOTs), or clustered indexes (different vendors use different terminology, but the concept is common among them)
For now, let’s talk about heap tables only. We’ll get back to index-organized tables after we explain what an index is.
Understanding Heap Tables
In heap tables, data is stored in no particular order. New data is inserted without sorting or reorganizing previously inserted data in any way. This makes an insert operation execute very quickly but is inefficient when retrieving data.
For example, let’s say we have a table defined as above which contains one million rows. The table has no constraints and no indexes. The insert operation will be immediate, taking at most a few milliseconds of time:
test=> insert into people values ('111-22-3334', 'Mike', 'Fake'); INSERT 0 1 Time: 2,991 ms
But a query retrieving a person by SSN will take much more time:
test=> select * from people where ssn = '111-22-3334'; ssn | first_name | last_name -------------+------------+----------- 111-22-3334 | Mike | Fake (1 row) Time: 132,499 ms
This is because the data is not structured in any specific way. In order to fetch the required rows, the DB needs to scan the whole table. The query execution time will increase linearly with the number of rows in the table.
Creating an Index
In this simple scenario, we can speed up the query execution time by creating an index on the column by which we filter the results:
ssn. Here’s how to do it:
test=> create index ssn_idx on people (ssn); CREATE INDEX Time: 6960,246 ms
Now, let’s take a look at how fast the same query will execute:
test=> select * from people where ssn = '111-22-3334'; ssn | first_name | last_name -------------+------------+----------- 111-22-3334 | Mike | Fake (1 row) Time: 0,375 ms
Well, 0.375 ms opposed to 132.499 ms gives us... more than 350 times faster querying! The difference is huge.
This time the DB used the index. The index holds the
ssn column values and pointers to the corresponding table rows in a well-organized data structure. When executing the query, the index structure is traversed to collect all pointers to rows that match the filter predicate. After that, those rows are retrieved from the table and returned. This huge performance gain is due to the fact that not entire index structure but only part of it is visited. In this particular case, a very small part.
What’s important is that the previously created index will be used only for queries filtering by
ssn column. Take a look:
test=> select * from people where ssn = '111-22-3334' or last_name = 'Johnson'; Time: 238,426 ms
The execution time got significantly worse. This is because the DB needs to scan the whole table to find all people called “Johnson” and thus using the index is insufficient (and even pointless). To make such a query run faster we’d need to create a second index: on column
test=> create index ln_idx on people (last_name); CREATE INDEX Time: 13234,828 ms
Now, the query executes much faster:
test=> select * from people where ssn = '111-22-3334' or last_name = 'Johnson'; Time: 7,207 ms
This time, the DB traversed both indexes, joined pointers into one structure and then retrieved the actual results from the table.
Now, let’s take a look at index-organized tables (IOTs). IOTs are simply indexes which contain data from all table columns, not just a few of them. This way, the default heap structure is no longer needed. This has both advantages and disadvantages, but we won’t discuss them in this article.
Using indexes is not limited to heap tables only. You can create indexes both for heap tables and index-organized tables. You can have as many indexes as you need to, but there shouldn’t be too many of them per table. This is because of their overhead during insert, update and delete operations. For each of these operations, every index structure needs to be updated; this costs time.
As you can see, a database index is a very powerful mechanism to optimize query execution time. However, proper database indexing is a much more complex issue than described here. It requires more detailed knowledge about index structure and some rules of thumb you should follow when creating indexes. We’ll describe the finer details about database indexes in future articles so stay tuned!