Jurgen's Old Blog

Tuesday, 23 April 2013

Computed columns and DateTime queries

Computed columns have been around for a long time, but I feel it's worth refreshing one's memory once in a while, so consider that the reason for this post.

When using a DateTime value in our applications, we store the complete date and time with a precision of, let's say, milliseconds. Most of the time when querying, we use only the date part of the DateTime and, in some cases, the time part (up to the minute, for example).

When you create an index on the DateTime column, the index keys carry the full millisecond precision, which makes the index very granular. To avoid this granularity, consider creating one or more computed columns based on the DateTime column, according to your query requirements.

A computed Column?

A computed column in a SQL Server Database table is a column which is computed from an expression using other columns in the same table.

An expression can be a:

Function
Constant
Non-computed column name
Combination of the above, connected by operators

Computed columns are virtual columns that are not physically stored in the database (unless otherwise specified). Their values are recalculated each time a query references them. When the PERSISTED keyword is used, the columns are physically stored in the database, and their values are updated whenever a column used in the calculation changes.

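Considerations for computed columns:

The expression cannot be a subquery
Indexes can be created on a computed column when its expression is deterministic; if the expression is deterministic but imprecise (for example, it involves float values), the column must also be marked as PERSISTED
Computed columns containing CLR functions must be deterministic and PERSISTED before they can be indexed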
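
To make the difference between a virtual and a persisted computed column a bit more tangible, here is a minimal sketch. The table dbo.ComputedColumnDemo and its column names are made up for illustration and are not part of the example further down. The query afterwards uses sys.computed_columns and COLUMNPROPERTY to show whether each column is persisted, deterministic, precise and indexable.

```sql
-- Hypothetical demo table (the name and columns are assumptions for this sketch)
CREATE TABLE dbo.ComputedColumnDemo
(
    Id            INT IDENTITY(1,1) PRIMARY KEY,
    DateTimeEntry DATETIME NOT NULL,
    -- Virtual: recalculated whenever a query references it
    EntryYear     AS (YEAR(DateTimeEntry)),
    -- Persisted: stored on disk and updated when DateTimeEntry changes
    EntryMonth    AS (MONTH(DateTimeEntry)) PERSISTED
);
GO

-- Inspect the computed columns and the properties that matter for indexing
SELECT c.name,
       c.definition,
       c.is_persisted,
       COLUMNPROPERTY(c.object_id, c.name, 'IsDeterministic') AS IsDeterministic,
       COLUMNPROPERTY(c.object_id, c.name, 'IsPrecise')       AS IsPrecise,
       COLUMNPROPERTY(c.object_id, c.name, 'IsIndexable')     AS IsIndexable
FROM sys.computed_columns c
WHERE c.object_id = OBJECT_ID(N'dbo.ComputedColumnDemo');
```

Checking IsIndexable before creating an index saves you from finding out the hard way that an expression does not qualify.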

Example
In the example, a computed column will be created for the DateTime column. The column will be persisted, and an index will be created on it.

I have a table [dbo].[ComputedColumnsTest] with 200,000 records.
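
If you want to follow along, something like the sketch below could be used to build a comparable table. Note that the actual table definition is not shown in this post: the column types, the ID column and the generated test data are assumptions; only the table name and the columns Name, Text and DateTimeEntry come from the queries that follow.

```sql
-- Assumed table layout; only the table name and the columns
-- Name, [Text] and DateTimeEntry appear in the post itself.
CREATE TABLE [dbo].[ComputedColumnsTest]
(
    [ID]            INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [Name]          NVARCHAR(100) NOT NULL,
    [Text]          NVARCHAR(400) NOT NULL,
    [DateTimeEntry] DATETIME NOT NULL
);
GO

-- Generate 200,000 rows of made-up data, one entry every 7 minutes going back in time
;WITH Numbers AS
(
    SELECT TOP (200000)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) AS n
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
)
INSERT INTO [dbo].[ComputedColumnsTest] ([Name], [Text], [DateTimeEntry])
SELECT N'Name ' + CAST(n AS NVARCHAR(10)),
       N'Some text for row ' + CAST(n AS NVARCHAR(10)),
       DATEADD(MINUTE, -7 * n, '20130423')
FROM Numbers;
```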

We execute the following query:

SELECT Name, [Text]
FROM [dbo].[ComputedColumnsTest]
WHERE MONTH([DateTimeEntry]) = 2

The execution plan looks as follows:

We first create an index on the DateTimeEntry column:

CREATE NONCLUSTERED INDEX [IX_DatatimeEntry]
ON [dbo].[ComputedColumnsTest]([DateTimeEntry] ASC)
INCLUDE ([Name],[Text])
GO

Our execution plan now looks like this:

Finally, we create our computed column specification:

1. Via the SSMS

2. Via T-SQL

ALTER TABLE [dbo].[ComputedColumnsTest]
ADD [ComputedMonth] AS (MONTH([DateTimeEntry])) PERSISTED

We add an index on the computed column:

CREATE NONCLUSTERED INDEX [IX_ComputedMonth]
ON [dbo].[ComputedColumnsTest] ([ComputedMonth])
INCLUDE ([Name],[Text])
GO

And we change the query to:

SELECT Name, [Text]
FROM [dbo].[ComputedColumnsTest]
WHERE ComputedMonth = 2

The query plan looks as follows:

When we run both queries side by side, we can see that they both use the index on the computed column.

When we compare both queries again, but we force the first one to use the original index, we have the following query plan.

We can now see that the query using the index on the computed column has a much lower cost percentage relative to the complete batch.
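
How the first query was forced onto the original index is not shown here. One way to do it, as a sketch, is an INDEX table hint. Run both statements in one batch with the actual execution plan enabled to compare their relative cost.

```sql
-- Query 1: forced onto the index on the raw DateTimeEntry column
-- (the table hint is an assumption about how the forcing was done)
SELECT Name, [Text]
FROM [dbo].[ComputedColumnsTest] WITH (INDEX([IX_DatatimeEntry]))
WHERE MONTH([DateTimeEntry]) = 2;

-- Query 2: free to use the index on the persisted computed column
SELECT Name, [Text]
FROM [dbo].[ComputedColumnsTest]
WHERE ComputedMonth = 2;
```

Table hints are fine for experiments like this one, but I would be careful about leaving them in production code, since they take the choice away from the optimizer.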

Conclusion

Creating computed columns and indexes on them does make a difference in performance, even if you do not reference the computed column in your query. The SQL Server engine is intelligent enough to build its query plans around the best-performing index.

Monday, 19 March 2012

What about my indexes?

Needless to say, indexes on tables are important, as they directly affect the performance of your queries and data applications.

It is therefore also needless to say that index maintenance is as important as a good indexing strategy. Thousands and thousands of pages have already been written about indexing strategies, which is why I want to spend a few lines on index fragmentation and usage instead. So, what about my indexes: how fragmented are they, and are they used at all?

First, the script below returns a result set showing the indexes and their usage statistics:

SELECT TableName = t.name,
       [RowCount] = si.rows,
       IndexName = ISNULL(i.name, i.type_desc),
       IndexType = ISNULL(i.type_desc, ''),
       IsFilteredIndex = CASE WHEN i.has_filter != 0 THEN 'True' ELSE 'False' END,
       Filter = ISNULL(i.filter_definition, ''),
       [FillFactor] = i.fill_factor,
       [IndexDepth] = s.index_depth,
       [FragmentationPercentage] = ISNULL(CAST(s.avg_fragmentation_in_percent AS DECIMAL(10,2)),0),
       [FragmentCount] = ISNULL(s.fragment_count,0),
       [FragmentationSizeInPages] = ISNULL(CAST(s.avg_fragment_size_in_pages AS DECIMAL(10,2)),0),
       [PageCount] = ISNULL(s.page_count,0),
       UserSeeks = ISNULL(us.user_seeks,0),
       UserScans = ISNULL(us.user_scans,0),
       UserLookups = ISNULL(us.user_lookups,0),
       UserUpdates = ISNULL(us.user_updates,0),
       LastUserSeek = us.last_user_seek,
       LastUserScan = us.last_user_scan,
       LastUserLookup = us.last_user_lookup,
       LastUserUpdate = us.last_user_update
FROM sys.tables t
INNER JOIN sysindexes si ON si.id = t.object_id
INNER JOIN sys.indexes i ON i.object_id = t.object_id
LEFT JOIN sys.dm_db_index_physical_stats(DB_ID(),0,-1,0, null) s
    ON s.object_id = i.object_id
    AND s.index_id = i.index_id
LEFT JOIN sys.dm_db_index_usage_stats us
    ON us.index_id = i.index_id
    AND us.object_id = i.object_id
    AND us.database_id = DB_ID()
WHERE t.name NOT LIKE '%_tracking'
    AND t.name NOT IN ('anchor', 'scope_config', 'scope_info', 'schema_info')
    AND si.name LIKE 'PK_%'
    AND i.type_desc NOT LIKE 'HEAP'
ORDER BY t.name ASC, i.name ASC
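
To answer the "are they used at all?" question more directly, a smaller query along the following lines could list the indexes that have not been read since the usage counters were last reset. This is only a sketch: excluding primary keys and unique constraints is my own assumption about what you would want to keep, and the counters in sys.dm_db_index_usage_stats start from zero again after a restart of the SQL Server service, so judge the results against the uptime of the instance.

```sql
-- Indexes with no seeks, scans or lookups since the counters were last reset
SELECT SchemaName  = s.name,
       TableName   = t.name,
       IndexName   = i.name,
       UserUpdates = ISNULL(us.user_updates, 0)   -- write cost paid for an index nobody reads
FROM sys.indexes i
INNER JOIN sys.tables t ON t.object_id = i.object_id
INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
LEFT JOIN sys.dm_db_index_usage_stats us
    ON us.object_id = i.object_id
    AND us.index_id = i.index_id
    AND us.database_id = DB_ID()
WHERE i.type > 0                      -- skip heaps
    AND i.is_primary_key = 0          -- keep constraint-backing indexes out of the list
    AND i.is_unique_constraint = 0
    AND ISNULL(us.user_seeks, 0) + ISNULL(us.user_scans, 0) + ISNULL(us.user_lookups, 0) = 0
ORDER BY UserUpdates DESC, s.name, t.name, i.name;
```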

Secondly, the following script will generate a result set, showing you the index fragmentation:

SELECT DISTINCT
       SchemaName = s.name,
       TableName = t.name,
       IndexName = i.Name,
       Fragmentation = ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(),0,-1,0, null) ps
INNER JOIN sys.tables t ON t.object_id = ps.object_id
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
INNER JOIN sys.indexes i ON i.index_id = ps.index_id
    AND i.object_id = t.object_id
WHERE ps.avg_fragmentation_in_percent > 5
    AND t.type = 'U' AND i.type <> 0
ORDER BY t.name;

Thirdly, the script below generates T-SQL code that either rebuilds or reorganizes your indexes, based on a 30% fragmentation threshold: every index that is more than 30% fragmented should be rebuilt; all others should be reorganized.

DECLARE @SchemaName SYSNAME,
        @TableName SYSNAME,
        @IndexAction NVARCHAR(20);

DECLARE IndexingCursor CURSOR FOR
    -- One row per table: ALTER INDEX ALL applies a single action to every index
    -- on the table, so the most fragmented index decides between REBUILD and REORGANIZE.
    SELECT SchemaName = s.name,
           TableName = t.name,
           IndexAction = CASE WHEN MAX(ps.avg_fragmentation_in_percent) > 30 THEN 'REBUILD'
                              ELSE 'REORGANIZE'
                         END
    FROM sys.dm_db_index_physical_stats(DB_ID(),0,-1,0, null) ps
    INNER JOIN sys.tables t ON t.object_id = ps.object_id
    INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
    INNER JOIN sys.indexes i ON i.index_id = ps.index_id
        AND i.object_id = t.object_id
    WHERE ps.avg_fragmentation_in_percent > 5
        AND t.type = 'U' AND i.type <> 0
    GROUP BY s.name, t.name
    ORDER BY t.name;

OPEN IndexingCursor;

FETCH NEXT FROM IndexingCursor
INTO @SchemaName, @TableName, @IndexAction;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- QUOTENAME brackets each name and escapes any ']' it contains
    PRINT '/* Index maintenance for ' + QUOTENAME(DB_NAME()) + '.' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName) + ' */';
    PRINT 'ALTER INDEX ALL ON ' + QUOTENAME(DB_NAME()) + '.' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName) + ' ' + @IndexAction + ';';
    PRINT 'UPDATE STATISTICS ' + QUOTENAME(DB_NAME()) + '.' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName) + ';';

    FETCH NEXT FROM IndexingCursor
    INTO @SchemaName, @TableName, @IndexAction;
END

CLOSE IndexingCursor;
DEALLOCATE IndexingCursor;
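
If you prefer to decide per index rather than per table, the same 5% and 30% thresholds can be applied in a single set-based query, without a cursor. This is a sketch of an alternative, not a replacement for the script above. The filter on IN_ROW_DATA is my own addition to avoid duplicate rows for LOB and row-overflow allocation units, and a partitioned index will still appear once per partition.

```sql
-- Generate one ALTER INDEX statement per fragmented index
SELECT MaintenanceStatement =
           N'ALTER INDEX ' + QUOTENAME(i.name)
         + N' ON ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
         + CASE WHEN ps.avg_fragmentation_in_percent > 30
                THEN N' REBUILD;'
                ELSE N' REORGANIZE;'
           END,
       Fragmentation = ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
INNER JOIN sys.tables t ON t.object_id = ps.object_id
INNER JOIN sys.schemas s ON s.schema_id = t.schema_id
INNER JOIN sys.indexes i ON i.object_id = ps.object_id
    AND i.index_id = ps.index_id
WHERE ps.avg_fragmentation_in_percent > 5
    AND ps.index_id > 0                            -- skip heaps
    AND ps.alloc_unit_type_desc = 'IN_ROW_DATA'    -- one row per index partition
ORDER BY s.name, t.name, i.name;
```

Copy the generated statements into a new query window and review them before you run them.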

Have fun!

Jurgen

Friday, 4 November 2011

A view on storing file data in SQL Server

Storing file data (BLOBs, e.g. PDF, Word, Excel ...) in SQL Server has always been some kind of dark magic to a lot of people.

Before SQL Server 2008, we could store BLOB data in VARBINARY(MAX) columns, for which you also had to convert the file data into binary format yourself ...

With SQL Server 2008, the FILESTREAM attribute for VARBINARY(MAX) columns was introduced. FILESTREAM enables us to integrate SQL Server with the NTFS file system ... BLOBs are stored as real files on the file system.

More on FILESTREAM: http://msdn.microsoft.com/en-us/library/bb933993.aspx

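--Enable FILESTREAM (level 2 = T-SQL and Win32 streaming access)
--FILESTREAM must also be enabled for the instance in SQL Server Configuration Manager
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;

--CREATE THE TABLE
CREATE TABLE TestFS
(
ID INT IDENTITY(1,1),
--A table with FILESTREAM columns requires a unique, non-null ROWGUIDCOL column
RowGuid UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
FileName NVARCHAR(200),
Content VARBINARY(MAX) FILESTREAM
)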
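
Once a table like this exists, working with it from T-SQL looks like working with any other VARBINARY(MAX) column. The sketch below assumes that the database already has a FILESTREAM filegroup and that the table's ROWGUIDCOL column gets its value from a default; the file name and content are made up.

```sql
-- Store a small "file" in the FILESTREAM column
INSERT INTO TestFS (FileName, Content)
VALUES (N'hello.txt', CAST('Hello FILESTREAM' AS VARBINARY(MAX)));

-- Read it back; PathName() returns the logical path used for Win32 streaming access
SELECT ID,
       FileName,
       DATALENGTH(Content) AS SizeInBytes,
       Content.PathName()  AS FileStreamPath
FROM TestFS;
```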

As of the new version of SQL Server, SQL Server 2012 (aka 'Denali'), a new feature has been introduced ... FILETABLE, based on the existing FILESTREAM feature in SQL Server. In fact, the FILETABLE feature is nothing more than storing files in special file tables in SQL Server, while still being able to access them from the file system. It is said that FILETABLE will be the next generation of FILESTREAM ...

More on FILETABLE: http://msdn.microsoft.com/en-us/library/ff929144(v=sql.110).aspx

--Create the table
CREATE TABLE DocumentStore AS FILETABLE
WITH (
FileTable_Directory = 'DocumentTable',
FileTable_Collate_Filename = database_default
);
GO
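
A FileTable has a fixed schema, so you can also add and list files from T-SQL. The following is a minimal sketch, assuming FILESTREAM is enabled and the database has a FILESTREAM directory name and non-transacted access configured; the file name and content are made up.

```sql
-- Add a file through T-SQL; it also shows up in the FileTable directory on the file system
INSERT INTO DocumentStore (name, file_stream)
VALUES (N'hello.txt', CAST('Hello FileTable' AS VARBINARY(MAX)));

-- List the stored files together with their path
SELECT name,
       file_type,
       cached_file_size,
       is_directory,
       FileTableRootPath() + file_stream.GetFileNamespacePath() AS FullPath
FROM DocumentStore;
```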
Have fun
Jurgen

Thursday, 15 September 2011

SQL Server Denali - LAG and LEAD

This is a short post on two new T-SQL functions called LAG and LEAD. They are kind of like Bonnie & Clyde: always mentioned in the same breath...

LAG and LEAD access the data from the previous and the next row of a result set, respectively, without using self joins.

Syntax:

LAG (scalar_expression [ ,offset ] [ ,default ] )
OVER ( [ partition_by_clause ] order_by_clause )

LEAD (scalar_expression [ ,offset ] [ ,default ] )
OVER ( [ partition_by_clause ] order_by_clause )

USE AdventureWorks2008R2
GO

SELECT TerritoryName, BusinessEntityID, SalesYTD,
       LAG (SalesYTD, 1, 0) OVER (PARTITION BY TerritoryName ORDER BY SalesYTD DESC) AS PrevRepSales
FROM Sales.vSalesPerson
WHERE TerritoryName IN (N'Northwest', N'Canada')
ORDER BY TerritoryName;

SELECT BusinessEntityID,
       YEAR(QuotaDate) AS SalesYear,
       MONTH(QuotaDate) AS SalesMonth,
       SalesQuota AS CurrentQuota,
       LEAD(SalesQuota, 1, 0) OVER (ORDER BY YEAR(QuotaDate), MONTH(QuotaDate)) AS NextQuota
FROM Sales.SalesPersonQuotaHistory
WHERE BusinessEntityID = 275 AND YEAR(QuotaDate) IN ('2005','2006');
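
If you do not have AdventureWorks at hand, the small self-contained sketch below shows both functions next to each other. The table variable and its values are made up for illustration.

```sql
-- Made-up sample data
DECLARE @MonthlySales TABLE (SalesMonth INT, Amount MONEY);

INSERT INTO @MonthlySales (SalesMonth, Amount)
VALUES (1, 100), (2, 150), (3, 120);

-- The third argument (0) is returned when there is no previous or next row
SELECT SalesMonth,
       Amount,
       LAG (Amount, 1, 0) OVER (ORDER BY SalesMonth) AS PrevAmount,
       LEAD(Amount, 1, 0) OVER (ORDER BY SalesMonth) AS NextAmount
FROM @MonthlySales
ORDER BY SalesMonth;
```

For month 1 this returns 0 as PrevAmount and 150 as NextAmount, for month 2 it returns 100 and 120, and for month 3 it returns 150 and 0.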

Have Fun!