Using database indexes is one of the easiest ways to improve the overall performance of a database, more specifically query performance, if you select the right type. Knowing the types of indexes in SQL is very important since different indexes provide different benefits. We review commonly used indexes from the most popular RDBMS and explain when to use them.
What Are Database Indexes?
A database index is an additional data structure created on top of the data in a table. You specify an index over a table and a column or set of columns. This creates an additional data search structure associated with that table and the set of columns.
In this article, we detail what an index is, how to create one, what types of indexes exist, and when to use them. If you’re looking for just the basics of indexes, we have a great article that takes you through the database index essentials.
What Is the Purpose of an Index?
The purpose of a database index is to improve query performance by speeding up data retrieval. This is done at the cost of additional storage space to hold the b-tree data structure and the pointers to the actual data.
Indexes are used to help the database find data quickly without having to go over every row in a table every time it is queried. Overall, indexes provide a very efficient way of accessing ordered records.
If you want more information about the internal workings of a database index, how the data is structured, and a real-life example of how an index is used, we have an in-depth article about database indexes.
How to Create an Index
Different relational database management systems have slightly different syntax for creating an index. Also, different database engines offer slightly different options when creating an index, as we see later in this article. However, there is a general syntax of creating a basic index that applies to all database engines. The syntax below creates a basic index on a table.
CREATE INDEX index_name ON table_name (column_name_1, column_name2, ..)
Now, let’s apply this to a real table. Suppose we have a
Customer table like the one in the screenshot below. We want to create an index to speed up the search by customer name.
CREATE INDEX IX_CustomerName ON Customer (FirstName, LastName);
When executed, this syntax creates an index called
IX_CustomerName on the
Customer table. This allows us to search much faster for data in the
The type of index created behind the scenes is sometimes called a nonclustered index or a binary search index. This type of index helps us run optimized queries for scenarios when we run queries similar to this:
SELECT FirstName, LastName, Email FROM Customer WHERE FirstName = ‘Mark’ and LastName = ‘Thompson’;
As a rule of thumb, we look at the columns used for filtering and see if we have an index on those columns whenever we want to optimize a query. If the columns in the
SELECT clause are very similar to the columns in the filtering clauses, we get an optimized plan and a fast execution.
But this is not a silver bullet, and there’s much more to indexing than just these rules. If you’re using SQL Server, we have an article outlining the strategies you can apply using different types of indexes depending on your scenario. Although it is specific to SQL Server, take a look through it anyway, since most of the advice can be adjusted to fit other database engines.
What Are the Types of Indexes in SQL?
Now that we have seen how an index is created, let’s review the main types of relational database indexes you can create to improve your queries. Some of these are available only with certain database engines, but we’ll point out where they are available.
All indexes store pointers to the data rows in a data structure called a search tree. The search tree is optimized for searches and is the backbone of the index. It allows for something similar to searching in a binary search tree except a bit more complex.
There are different types of indexes, each of which has different internal data structures and is useful for specific scenarios. We go into more detail later in the article. Here, we just briefly mention the types of indexes available:
- From the point of view of the characteristics of the index attribute:
- Primary Index
- Clustered Index
- Secondary Index
- From the point of view of the number of index references to a data file:
- Dense Index
- Sparse Index
- Specialized indexes for highly specific scenarios:
- Bitmap Index
- Reverse Index
- Hash Index
- Filtered Index
- Function-based Index
- Spatial Index
Let’s use the
Customer table from the above as an example. To see what sample data looks like, we write a simple
SELECT query and return everything from the table.
One of the most common indexes available in all modern and full-fledged relational database systems is the clustered or clustering index. A clustered index defines the order in which data is physically stored on data pages and implicitly in the table.
Let’s look at an example. Say the first two rows fit on Page 1, the third and fourth on Page 2, and the last row on Page 3, as shown below.
The purpose of a clustered index is to physically store the rows in ascending or descending order based on the column selected. The reason for creating such an index is to have the data always sorted, which helps very much in searching either for one or multiple values in a range. However, a clustered index shines best when we’re searching in a range.
Suppose our reporting dashboards always display customers sorted alphabetically. So, we want to store the data sorted alphabetically by first and last names in our database. To create a clustered index to do so, we write the query below:
CREATE CLUSTERED INDEX CI_FirstName_LastName ON Customer (FirstName ASC, LastName ASC);
This has a small impact on the previous query that returns all of the data. By creating a clustered index with an ascending sort on the first and last name columns, we have physically reordered the data in the pages. If we look into the data pages, our data shows up like this:
The data is now sorted by the first name and then the last name as we see. If we query to sort the rows alphabetically, this gives us the best performance because the rows are already stored sorted. This helps us avoid a sort in the query.
If we want to show the first 10 customers in alphabetical order, we avoid searching the entire table to find and select the 10 customers. The database just returns the data pages associated with the first 10 customers already sorted.
The bitmap index is another type of index, available only on Oracle at the time of this writing. It is useful in a very specific scenario: it is ideal when you have to query and filter a table on a column with a small number of distinct values compared to the total number of rows in the table.
Let’s look at a scenario with our sample data where we may benefit from a bitmap index. Imagine our
Customer table has not just 5 rows but over 10 million rows. Suppose we have to filter our query for female customers whose last name is Watson.
We may write a query like the following:
SELECT FirstName, LastName, Email FROM Customer WHERE Gender = 2 AND LastName = “Watson”;
The bitmap index is ideal here because we have only a few distinct values for gender compared to the 10 million rows of the entire table. To speed up this query with a bitmap index, we create it using the following syntax:
CREATE BITMAP INDEX BMP_Gender ON Customer (Gender)
And now, we select “Kate Watson” and her email from the screenshot below along with all other matching rows from the 10 million rows in the table.
The bitmap index is even more powerful when created in a
JOIN clause – for example, if we join the
Customer table with the
Sales table and filter by Gender. A bitmap index in this scenario looks something like this:
CREATE BITMAP INDEX BMP_Gender_Sales ON Customer (Gender) FROM Customer, Sales WHERE Customer.ID = Sales.Customer_ID;
Whenever we have a query that joins these two tables and filters by gender, we get close to peak performance.
The reverse index is similar to the regular index. However, instead of creating a binary search tree for a faster search in ascending order, the reverse index is optimized to search for data in descending order. The syntax for building a reverse index is very similar to that of the usual nonclustered index, except we specify the reverse or descending order.
Suppose we want to optimize for finding out which customers have placed the 3 most recent orders. The syntax for the index is like this:
CREATE INDEX IX_LastOrder_Customer ON Customer (LastOrderDate DESC);
The essential keyword in our syntax is
DESC. It tells the database engine to create the index in reverse order. So, we get the best performance whenever we query the
Customer table for the 3 most recent orders.
What Data Structure Does an Index Use?
As mentioned before, indexes are implemented on top of other data structures to optimize search operations. But what are these data structures?
The most common indexes use a balanced tree behind the scenes to speed up queries. Most database engines use either a balanced tree or a variation of a balanced tree like a b+ tree. The structure of a general balanced tree is shown below.
The top node is the root, and those below it are either child nodes or leaf nodes. We always start searching for our row from the root node and compare if the value we’re searching for is less than or greater than the value in the node at hand. The result of the comparison tells us which way to go, left or right, depending on the result of our comparison. In the example above, all values lower than 8 take us to the left, while values greater than 8 take us to the right, and so on.
A hash is used by hash indexes. It is a data structure that provides some of the fastest search speeds. Hashes allow the index to do very fast lookups on the data stored in a table.
The idea behind hashes is that the search key is taken through a hashing function instead of being searched through an index or the entire table. The input search key is converted to a hash value that determines the associated bucket. In the example below, the search key “Mike” is taken through the hash function and is associated with a bucket.
Each bucket in the array of buckets contains the same number of records. No matter how many distinct values exist in a column, every row is mapped to one single bucket. The matching row is selected and returned from that bucket.
Index Implementations by Relational Database Engine
As you can see, there are multiple types of indexes in a relational database. Each database engine has its proprietary index implementations. Let’s go over the most popular database engines, list the available indexes for each, and discuss when to use them.
PostgreSQL provides a long list of indexes that are useful in different scenarios:
- The B-tree index is the most common type of index. It is used to search for equality and range comparisons in columns that can be sorted.
- A hash index stores a 32-bit hash code derived from the value of the indexed columns. It is used when simple equality comparisons are needed.
- GiST is not a single index but rather a logical structure within which multiple different indexing strategies can be implemented. It is often used in finding the “nearest neighbor” in geometric data types.
- SP-GiST, like GiST, is a type of index that implements multiple indexing strategies. It is based on different data structures such as quadtrees, k-d trees, and radix trees and is used in similar scenarios as GiST.
- GIN is also called an “inverted index.” This type of index is used in scenarios in which the data is formed by an array. The inverted index contains a separate entry for each component value of the array.
- BRIN stands for “Block Range Index.” It’s used to store summaries of values in consecutive physical data pages within a table. They are best suited when the values in their rows are correlated with the physical order of the data pages.
Oracle has slightly fewer types of indexes. However, they are more robust in terms of applicability.
- The B-tree is the standard index type, also used by other database engines. It is great for primary keys and columns with a very large number of distinct values compared to the total number of rows.
- Bitmap indexes are used for the opposite scenarios of a b-tree. Specifically, you want to use them when the number of distinct values in a column is very small compared to the total number of rows.
- The function-based index is a type of index in which the value stored in the search tree is defined by a function. It provides excellent performance when we have functions in
SQL Server Indexes
SQL Server has a small number of indexes, but they are very powerful in what they provide.
- A clustered index is not just a way for the database engine to search in queries. It physically reorganizes rows in the data pages so that they are sorted (ascending or descending).
- The nonclustered index is the equivalent of the standard b-tree index in other database engines. It is generally great for searching through data with a very large number of distinct values.
- Filtered indexes are created on specific subsets of data. They can be used for optimizing searches for data with given criteria when the data is skewed. For example, the value 55 of a numeric column may be searched for frequently but is found only in a few rows compared to the total number of rows in the table. You can create a filtered index in a way similar to a nonclustered index but by specifying a
WHERE column = 55condition in the index definition.
MySQL also has some indexes with which you can speed up queries.
- The primary key creates a unique index that allows for very efficient and fast query performance when searching for unique values. It also benefits from the
NOT NULLoptimization because it can’t have NULL It is always used in defining the primary key and is created automatically when specifying the
- The unique index is similar to the primary key index. However, it is more lenient in the sense that it does allow
NULLvalues to be stored multiple times. It can be used to enforce additional uniqueness if a primary key has already been created.
Keep Database Indexes in Your Toolbox
If you have made it this far, that means you like reading about database indexes! I hope you have found the information in this article useful and learned a few new things. Knowing what indexes are available in a specific database engine helps you improve performance when queries start to bog down.
Sometimes the general B-tree index is not enough and may not fit the schema and/or the data. Knowing all other types of indexes in a relational database is like having a Swiss army knife in your toolbox. To read more about database indexes, have a look at our list of articles on this topic.