Two new features in SQL Server 2016 make expanding your database’s historical data storage much easier. Here’s how to implement them.
Storage. It’s a problem that anyone with a constantly growing database has to face. We maintain great volumes of data that business users rarely query. Most of it is historical or versioned data that someone checks once a year to see how “product X sales looked in 2005” or how “record Z has changed over time”.
To help get this situation under control, SQL Server 2016 introduced the Stretch Database feature. It allows us to send data to Azure Storage and save local hard drive space. For the end user, the process is seamless: data can be pulled from the cloud on demand. (You also have the option to read data from your local datacenter only.)
There are lots of limitations placed on tables selected for Stretch, although many of them won’t have much impact (e.g. XML or spatial indices, sparse columns). MSDN maintains a comprehensive list of limitations that includes default constraints, computed columns, and change tracking/CDC.
Temporal tables – also introduced in SQL Server 2016 – are designed to automatically keep historical versions of changed records. In other words, a temporal table holds every version of each record that ever existed over time (hence temporal). Note: this versioning happens automatically, without application-level logic such as triggers or OUTPUT clauses.
Temporal tables are great repositories for huge amounts of rarely queried data. They are also excellent examples of data that we could store outside of our probably quite expensive local storage. Temporal tables introduce SYSTEM_TIME and define PERIOD – the time when the data was valid in that SYSTEM_TIME. A thorough explanation of temporal tables is beyond this article, but you can find more information online. In particular, I encourage you to read Uwe Ricken’s great introduction.
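To give a flavor of how such versioned data is queried, here is a short sketch using the FOR SYSTEM_TIME clause. It assumes the dbo.Product temporal table we create later in this article; the dates are placeholders, so substitute your own:

```sql
-- Current rows only (the history table is not touched):
SELECT ID, Name, Price FROM dbo.Product;

-- The data as it looked at a given point in time:
SELECT ID, Name, Price, ValidFrom, ValidTo
FROM dbo.Product
FOR SYSTEM_TIME AS OF '2016-06-29T17:00:00';

-- Every row version that was valid at some point in a period:
SELECT ID, Name, Price, ValidFrom, ValidTo
FROM dbo.Product
FOR SYSTEM_TIME BETWEEN '2016-06-29T17:00:00' AND '2016-06-29T18:00:00';
```

SQL Server transparently combines the current and history tables for these queries, which is exactly why the history table is such a good candidate for stretching.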
Let’s see how to stretch a temporal table to the cloud. At first we’ll use a shortcut – the SQL Server Management Studio (SSMS) wizard – then we’ll look at doing the entire process with T-SQL.
What You Need to Stretch a Database
According to the comparison datasheet, Stretch Databases and temporal tables work with any edition of SQL Server 2016 – so just use whatever edition suits you. You will need SQL Server Management Studio (SSMS) for the demos, as we will run both T-SQL scripts and the SSMS Stretch Database wizard.
Last but not least, you’ll also need a Microsoft Azure subscription. If you don’t have one (or if you have no access to SQL Server), an account with Visual Studio Dev Essentials will get you free access to SQL Server 2016 Developer Edition and a $25 USD (25 EUR) Azure credit for 12 months.
First things first. Let’s create a database that will hold sample data. We’ll call it TemporalStretch, and for simplicity’s sake it will have a SIMPLE recovery model. Some functionalities – like Log Shipping, Mirroring, and Replication – require the FULL recovery model, but it’s nice to see that Stretch Database doesn’t need it. Initially, the database will have two tables:
- dbo.Product – has products and prices that change over time
- hist.Product – stores historical versions of products’ prices
We will create an additional schema, hist, for readability, but it will have little impact later on.
```sql
USE [master]
GO
CREATE DATABASE [TemporalStretch]
ON PRIMARY (
    NAME = N'TemporalStretch',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TemporalStretch.mdf',
    SIZE = 51200KB, MAXSIZE = UNLIMITED, FILEGROWTH = 65536KB
)
LOG ON (
    NAME = N'TemporalStretch_log',
    FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TemporalStretch_log.ldf',
    SIZE = 51200KB, MAXSIZE = 2048GB, FILEGROWTH = 65536KB
)
GO
ALTER DATABASE [TemporalStretch] SET COMPATIBILITY_LEVEL = 130
GO
ALTER DATABASE [TemporalStretch] SET RECOVERY SIMPLE
GO
```
Now let’s load some initial product data. Because we need historical data, we’ll change the prices right away: first they will go up, then they will come down.
To generate a time gap between our historical versions, we will use the WAITFOR clause. It will help us create conditions for stretching the data later in the project.
Because we want to version our data, we use the temporal table feature. To create a temporal table, we add some extra T-SQL syntax to the CREATE TABLE statement – the PERIOD FOR SYSTEM_TIME (StartingDatetimeColumnName, EndingDatetimeColumnName) clause. This marks two table columns as the PERIOD, which stores information about when each row version was valid.
We also need the columns that define the start and end of the PERIOD; in this example, they are named ValidFrom and ValidTo. They are of type datetime2, with any precision between 0 and 7. We also have to state which column tracks the start or end of the PERIOD; to do this, we use the GENERATED ALWAYS AS ROW START/END clauses.
Products 21 through 35 are inserted in a single transaction, with a five-second delay between the two INSERT statements. All the records inserted in one transaction get the same ValidFrom date. Here’s the code:
```sql
USE TemporalStretch;
GO
IF SCHEMA_ID('hist') IS NULL
    EXEC sp_executesql N'CREATE SCHEMA hist AUTHORIZATION dbo';
GO
-- create products table with versioning
CREATE TABLE dbo.Product (
    ID INT IDENTITY CONSTRAINT PK_Product PRIMARY KEY,
    Name VARCHAR(15) NOT NULL,
    Price SMALLMONEY,
    ValidFrom DATETIME2(0) GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo DATETIME2(0) GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH ( SYSTEM_VERSIONING = ON (HISTORY_TABLE = hist.Product) );
GO
INSERT INTO dbo.Product (Name, Price)
VALUES ('Product 11', 110), ('Product 12', 120), ('Product 13', 130),
       ('Product 14', 140), ('Product 15', 150);

BEGIN TRANSACTION
    INSERT INTO dbo.Product (Name, Price)
    VALUES ('Product 21', 210), ('Product 22', 220), ('Product 23', 230),
           ('Product 24', 240), ('Product 25', 250);

    WAITFOR DELAY '00:00:05' -- 5 seconds

    INSERT INTO dbo.Product (Name, Price)
    VALUES ('Product 31', 310), ('Product 32', 320), ('Product 33', 330),
           ('Product 34', 340), ('Product 35', 350);
COMMIT TRANSACTION

WAITFOR DELAY '00:02:00' -- 2 minutes

UPDATE dbo.Product SET Price = Price * 2. WHERE Name < 'Product 20';
UPDATE dbo.Product SET Price = Price * 2. WHERE Name BETWEEN 'Product 21' AND 'Product 25';

WAITFOR DELAY '00:01:00' -- 1 minute

UPDATE dbo.Product SET Price = Price - 8 WHERE Name < 'Product 20';
UPDATE dbo.Product SET Price = Price - 14 WHERE Name BETWEEN 'Product 21' AND 'Product 25';
```
Database Magic With the Wizard
Step 1: Enable Stretch for the Table
We will get data from the hist.Product table. To do this, right-click on the table name and select Stretch > Enable.
Note: We’ve skipped the intro panel.
The window that pops up is a bit incomplete. We see the name of the table we want to stretch, but no schema information.
We can also right-click the database and select Tasks > Stretch > Enable. This gives us a similar window and a similar way to get a bit confused:
The wizard shows us that there is a limitation if we would like to migrate data from our main table. (This table has a primary key – a mandatory requirement for temporal tables – so we know it’s the main table).
Step 2: Filter the Data (Optional)
Back to the migration itself. We can push the entire table or only a subset of rows. Click Entire Table to filter data. The Select rows to stretch window will appear.
The enigmatic Name box is where you’ll enter a name for the filtering function. Let’s call it fnStretchProducts and make the date fields the parameters. Since we’ll send to the cloud for the last 10 changed records, let’s find the last change date:
```sql
SELECT DISTINCT ValidFrom, ValidTo
FROM hist.Product
ORDER BY ValidTo DESC;
```
You should see two records. Take the datetime value from the ValidFrom column of the first row and put it in the filtering condition window. (For me, this value was 2016-06-29 17:19:00.)
Notice the letter T inside the value field – without it the wizard won’t be able to convert the full date and time value to the datetime2 type. We can’t edit the generated SQL; the filtering function just checks if there is data to migrate. If there are no rows matching the filter criteria, we see an error like ‘0 rows returned. Cannot create predicate with these values’.
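Before committing to the wizard’s predicate, it can be reassuring to count the rows it will match. This is just a sanity-check sketch; the date literal is my value from above, so substitute your own:

```sql
-- How many history rows would the filter send to Azure?
SELECT COUNT(*) AS rows_to_stretch
FROM hist.Product
WHERE ValidTo > '2016-06-29T17:19:00';
```

If this returns 0, you will hit the same ‘Cannot create predicate with these values’ error in the wizard.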
Step 3: Set up the Database Server in Azure
Next, we sign in to Microsoft Azure and set up the server. We can use an existing server or create a new one. I opted for the latter and got an auto-generated server name. Since it’s a new server, we have to set up the admin username and password.
Step 4: Set up the Database Master Key
Next, we generate a database master key (DMK) to protect these automatically-configured database credentials.
Step 5: Set up the Firewall
This is the last thing to do – set up a firewall access rule: either source SQL Server Public IP or subnet IP range. I chose subnet.
The summary page shows all of our selected options and gives an estimated price for the Stretch Database service.
The last page displays successful operations (or errors):
And that’s it. We have our database connected to Azure.
When we go back to SSMS, we can see the database marked with a cloud icon. The stretched table has no special markings, unfortunately.
A quick look at the automatically created filtering function – a table-valued function called hist.fnStretchProducts – shows that it uses the same schema as the stretched table. The important thing is that it has to use the SCHEMABINDING clause, and it returns the is_eligible column with a value of 1 for the generated condition.
```sql
CREATE FUNCTION hist.fnStretchProducts(@ValidTo DATETIME2)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT is_eligible = 1
       WHERE @ValidTo > CONVERT(datetime2, N'2016-06-29T17:19:00', 101)
GO
```
Up Next: Checking Migration
We have configured the database and table to be stretched to the cloud. How can we check if migration is alive and kicking? Right-click on the database and select Tasks > Stretch > Monitor. You’ll see something like this:
We can also check the dynamic management view sys.dm_db_rda_migration_status. (RDA stands for Remote Data Archive; the abbreviation appears in the remote database name as well.)
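A minimal query against that DMV might look like the sketch below; the column list is what I would expect from the 2016 documentation, so verify it against your build:

```sql
-- Recent migration batches, newest first
SELECT table_id, migrated_rows, start_time_utc, end_time_utc, error_number
FROM sys.dm_db_rda_migration_status
ORDER BY end_time_utc DESC;
```

A non-NULL error_number is the quickest hint that a batch failed and needs attention.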
Check the Data
To check the data in Azure, connect to your newly created database server using SSMS. Use the server name, select SQL Server Authentication, and provide the administrator’s login and password you chose in Step 3: Set up the Database Server in Azure.
If you cannot connect, look up the firewall rules in Azure Portal, select your new server and navigate to Firewall properties. The client IP Address will be shown – just click Add client IP and hit Save.
When the connection is active, expand the Tables node in your newly created database.
The target table has a different name than its source. That’s because the schema is combined with the table name; it also has an identifier and a GUID. If we take a look at sys.dm_db_rda_migration_status, we will see the same identifier in the table_id column.
To see the migrated data, just select the data from the table.
There may be situations where you won’t see the table name under the Tables node; with SSMS versions prior to the July 2016 release, table names didn’t show for me. However, you can look up the remote table name by querying sys.remote_data_archive_tables on the on-premises TemporalStretch database (check the remote_table_name column), and then query that table in Azure.
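The lookup described above can be sketched as follows; run it on the on-premises TemporalStretch database:

```sql
-- Map each stretched local table to its remote counterpart
SELECT OBJECT_NAME(object_id) AS local_table,
       remote_table_name
FROM sys.remote_data_archive_tables;
```

Armed with the remote_table_name value, you can run a plain SELECT against that table in the Azure database.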
Do It All in T-SQL
We don’t have to use the SSMS wizard to stretch our databases. Let’s go step by step and manually stretch a database. This way, we’ll better understand the entire process.
Manual Setup Starts in Azure
We have to set up the cloud side first. The wizard did everything for us previously (Step 3: Set up the Database Server in Azure). Now, though, we’ll handle it ourselves.
Before we start, make sure you have SQL Server available in Azure. If you don’t, you can create one by going to the Azure Portal and clicking New. Type SQL server in the search box, select SQL server (logical server), and configure it. We can also use the server (stretchserver...) that the wizard created earlier.
To interact with the server, we need to log in. We’ll connect to the Azure SQL server using the administrator account in SSMS and create a login with the dbmanager privilege. (This is required if you want to create databases. Also, make sure you are connected to the master database on the Azure SQL server.)
```sql
CREATE LOGIN ForStretch WITH PASSWORD = '<password>';
GO
CREATE USER ForStretch FOR LOGIN ForStretch WITH DEFAULT_SCHEMA = dbo;
GO
ALTER ROLE dbmanager ADD MEMBER ForStretch;
GO
```
We should now define the firewall rules; again, this is something the wizard did for us before. If you have to set up another firewall, the rules follow the same steps as in Check the Data.
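If you prefer to stay in T-SQL, Azure SQL Database also exposes the sp_set_firewall_rule procedure in the master database. A hedged sketch – the rule name and IP addresses below are placeholders for your own values:

```sql
-- Run in the master database of the Azure SQL server
EXECUTE sp_set_firewall_rule
    @name = N'StretchClientRule',
    @start_ip_address = '203.0.113.10',
    @end_ip_address = '203.0.113.10';
```

Using the same address for start and end allows a single client IP; widen the range for a subnet.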
Note: If you don’t set up the dbmanager role at the start, you may get a continuous CREATE DATABASE permission error later on.
Next: The On-Premises Part
First we’ll create the database and tables, using the same scripts as in the Setup part. We need to provide a different database name here; we won’t worry about originality, so we’ll name it TemporalStretch2.
One of the first things the wizard made for us was the filter function (Step 2: Filter the Data (Optional)). Use the same code again, changing only the date and time:
```sql
CREATE FUNCTION dbo.fnStretchProducts(@ValidTo DATETIME2)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT is_eligible = 1
       WHERE @ValidTo > CONVERT(datetime2, N'2016-06-29T18:10:00', 101)
GO
```
Step 1: Configure Instance
To prep a database for stretch, we have to configure the instance. The wizard did this for us transparently, but to have the complete picture, execute sp_configure and enable the remote data archive option:
```sql
EXEC sp_configure 'remote data archive', 1;
RECONFIGURE;
```
Step 2: Create a Database Master Key and Credential
Now let’s take care of connectivity. Readying a database for stretch requires two parameters: server name and credential. The former was created in the manual Azure setup, but the latter has to be created now.
A credential is saved authentication information that one can use to connect to external resources. To configure a credential for only one database, we can use the database scoped credential. First, we have to generate the database’s master key (look at Step 4: Set up the Database Master Key). We start with USE TemporalStretch2 to switch context for master key and credential generation.
```sql
-- create database master key
USE TemporalStretch2;
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<password>';
GO
-- create database credential
CREATE DATABASE SCOPED CREDENTIAL crForStretchDatabase
WITH IDENTITY = 'ForStretch', SECRET = '<password>';
GO
```
SSMS doesn’t show a master key or credential for the database, but we can see them in the metadata:
```sql
SELECT * FROM sys.symmetric_keys;
SELECT * FROM sys.database_scoped_credentials;
```
Step 3: Stretch the Database
Now we can stretch our database to the cloud. It’s the same step as you see in the wizard’s Results pane that we saw earlier (Configure stretch on database …). Here we’ll use this code:
```sql
ALTER DATABASE [TemporalStretch2]
SET REMOTE_DATA_ARCHIVE = ON (
    SERVER = N'<server name>.database.windows.net',
    CREDENTIAL = [crForStretchDatabase]
)
GO
```
We now have a local database connected to the cloud-based database that will hold the stretched data. Our final step will enable table stretching and filter the data in one process.
Step 4: Stretch and Filter the Table
To stretch the table, we alter it and enable migration (MIGRATION_STATE = OUTBOUND):
```sql
ALTER TABLE hist.Product
    SET ( REMOTE_DATA_ARCHIVE = ON (
        FILTER_PREDICATE = dbo.fnStretchProducts(ValidTo),
        MIGRATION_STATE = OUTBOUND
    ) );
GO
```
Without FILTER_PREDICATE, we would transfer all the contents of the hist.Product table.
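Once some rows have migrated, you can control where a query reads from with the REMOTE_DATA_ARCHIVE_OVERRIDE table hint. This is a sketch based on the documented Stretch query hints; verify the hint is available in your build:

```sql
-- All rows, local and migrated – the default behavior:
SELECT * FROM hist.Product;

-- Only rows still stored locally:
SELECT * FROM hist.Product
WITH ( REMOTE_DATA_ARCHIVE_OVERRIDE = LOCAL_ONLY );

-- Only rows already migrated to Azure:
SELECT * FROM hist.Product
WITH ( REMOTE_DATA_ARCHIVE_OVERRIDE = REMOTE_ONLY );
```

This is the mechanism behind the “read data from your local datacenter only” option mentioned at the start of the article.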
Digging Into the Metadata
A few metadata system views will show us the information related to our stretched database. Here’s the complete list:
- sys.symmetric_keys – shows the database master key and other symmetric keys
- sys.database_scoped_credentials – information about database scoped credentials
- sys.dm_db_rda_migration_status – detailed migration status
- sys.dm_db_rda_schema_update_status – shows schema update status
- sys.remote_data_archive_databases – the Azure database that holds stretched data
- sys.remote_data_archive_tables – tables in the Azure database that hold stretched data
- sys.external_data_sources – external data sources (RDA is one of them.)
Leaving the Cloud
We have the option to stop stretching our data into the cloud. We can bring it all back locally or just leave it where it is. To do this, right-click the appropriate table name in SSMS and select Stretch > Disable, then choose one of two options: Bring data back from Azure or Leave data in Azure. The T-SQL equivalent of bringing the data back is to set the migration state to INBOUND:
```sql
ALTER TABLE hist.Product
    SET ( REMOTE_DATA_ARCHIVE ( MIGRATION_STATE = INBOUND ) );
GO
```
Remember that even if we bring data back from the cloud, it has not been removed from the remote database. The remote table remains in the remote database.
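For completeness, the other wizard option – leaving the migrated data in Azure – also has a T-SQL form. The sketch below follows the documented OFF_WITHOUT_DATA_RECOVERY syntax; double-check it against the ALTER TABLE reference for your build before relying on it:

```sql
-- Disable Stretch for the table and abandon the already-migrated rows in Azure
ALTER TABLE hist.Product
    SET ( REMOTE_DATA_ARCHIVE = OFF_WITHOUT_DATA_RECOVERY
          ( MIGRATION_STATE = PAUSED ) );
GO
```

Either way, remember to drop the remote database yourself when you no longer need it, or Azure will keep billing you for it.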
When Should You Use Stretch Database and Temporal Tables?
Stretching a database may be an option if you have problems getting more local storage for a growing database. The whole process is not that complicated, but it has a few potential problems – like proper remote login setup or leaving data in Azure after migrating back.
Temporal tables will generate lots of rarely queried data, so naturally we would like to get it out of local storage (while still keeping it close for auditing purposes). In this case, stretching this data to the cloud is a relatively easy solution.