Back to articles list
- 4 minutes read

ColumnStore Indexes in MS SQL Server

Introduced in SQL 2012, ColumnStore indexes differ greatly from standard row-based indexes. Intended for OLAP systems, these indexes store data in a highly compressed, segmented fashion with the column as the basis (rather than typical row-based indexes). This type of column-based index allows for great performance gains in data warehouses where table scans, rather than seeks, are performed.

ColumnStore indexes have evolved significantly over the last few SQL Server versions:

  • SQL Server 2012 – Only supports nonclustered, read-only columnstore indexes.
  • SQL Server 2014 – Adds updateable, clustered columnstore index support. This is a noteworthy change, as it significantly simplifies the updating of tables with columnstore indexes.
  • SQL Server 2016 – Additional keyword support and tighter integration with the SQL engine is planned, including the ability to define columnstore indexes during table creation.

Syntax

The syntax for the creation of columstore indexes is virtually the same as standard indexes:

CREATE CLUSTERED COLUMNSTORE INDEX myColumnStoreIndex ON myOLAPTable
WITH (DROP_EXISTING = ON);

Usage

The demand for business intelligence (BI) implementations has never been higher; traditional data warehousing is still a popular method for storing information in a non-relational manner. Columnstore indexes were created as a response to the massive growth seen in these types of implementations.

While standard OLTP databases tend to rely on selecting ranges of data (scans, seeks), OLAP queries initiated by tools such as Microsoft’s Power BI will generally request entire columns for aggregations. This makes regular indexes largely pointless, as the range being scanned is the entire structure.

Columnstore indexes are not intended for data retrieved by OLTP applications. Their usage is limited to data warehousing tables – typically facts and dimensions modeled in a star or snowflake fashion. By compressing the large amount of column data being returned to the BI client, I/O pressure is significantly lessened.

Note that any table can have only one columnstore index. Best practices indicate this index should cover all columns that will be accessed by OLAP queries (Clustered columnstore indexes do this by default). Executing the following 3 statements:

CREATE NONCLUSTERED COLUMNSTORE INDEX IX_COLTest on ColumnStoreTest (XID)
GO
CREATE CLUSTERED COLUMNSTORE INDEX IX_COLTest_2 on ColumnStoreTest
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_COLTest_2 on ColumnStoreTest
GO

Will return error messages for the second two:

Msg 35304, Level 16, State 1, Line 2
CREATE INDEX statement failed because a clustered columnstore index cannot be
created on a table that has a nonclustered index. Consider dropping all
nonclustered indexes and trying again.
Msg 35339, Level 16, State 1, Line 2
Multiple nonclustered columnstore indexes are not supported.

Each column is stored separately in the columnstore index, internally. A query that retrieves multiple columns will have heavier I/O impact, but the compression used will still garner large performance benefits.

Here’s a basic example of how ColumnStore indexes should impact OLAP queries. This example will use only a single column for clarity, but most OLAP queries will select multiple columns.

First, we’ll create our test table, ColumnStoreTest and add some data to the TestInt column:

CREATE TABLE ColumnStoreTest (XID INT IDENTITY, TestInt INT)
GO

DECLARE @MyVal INT, @Loop INT
Set @Loop = 1
WHILE (@Loop < 2585293)
BEGIN
	Set @MyVal = (SELECT CAST(RAND() * 1000000 AS INT) AS [RandomNumber])
	INSERT ColumnStoreTest (TestInt) VALUES (@MyVal)
	Set @Loop+=1
END
GO

We’ll also create a clustered index on TestInt so both examples have indexed data.

CREATE CLUSTERED INDEX IX_COLTest on ColumnStoreTest (TestINT)

Now let’s retrieve our data after clearing buffers:

DBCC DROPCLEANBUFFERS
GO
Select TestInt From ColumnStoreTest
GO

Now we have our baseline costs, and can create a columnstore index to see if any performance gains are seen. Note that no columns are specified for a clustered columnstore index; by default, all columns are compressed and stored.

DROP INDEX ColumnStoreTest.IX_ColTest
GO
CREATE CLUSTERED COLUMNSTORE INDEX IX_COLTest on ColumnStoreTest
GO

Retrieving data:

DBCC DROPCLEANBUFFERS
GO
Select TestInt From ColumnStoreTest
GO

As expected, a huge gain is seen in I/O; the ColumnStore retrieval was almost 10X faster.

Let’s see how a nonclustered ColumnStore index performs. We have to specify the columns for Nonclustered ColumnStore Indexes:

DROP INDEX ColumnStoreTest.IX_ColTest
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_COLTest on ColumnStoreTest (TestInt)
GO
DBCC DROPCLEANBUFFERS
GO
Select TestInt From ColumnStoreTest
GO

We get the same costs. So why use one over the other?

Most importantly, clustered ColumnStore indexes are updateable, but cannot be combined with other indexes. They are also only available in Enterprise, developer, and evaluation SQL Server editions.

Nonclustered ColumnStore indexes can co-exist with row-based indexes.

For a complete list of differences, see the MSDN link that corresponds to the appropriate SQL Server version.

Key limitations of both type of columnstore index include:

  • The definition of columnstore indexes can’t be modified using ALTER INDEX. The index must be dropped and recreated. They can, however, be disabled and enabled using ALTER INDEX.
  • Sparse columns cannot be included in the definition of columnstore indexes.
  • INCLUDE functionality is not available.
  • The definition cannot include sorting (ASC or DESC); the compression algorithms handle this internally.
  • Additional limitations are mentioned in the MSDN syntax article.

Columnstore indexes are ideal for OLAP-style data models, and should be used in any SQL Server-based data warehousing implementations.

go to top