Tuesday 23 April 2013

Computed colums and DateTime queries

Computed columns are there for already a long time, but still, I feel it's worth refreshing one's memory once in a while, so consider this is as a reason for this post.

When using a DateTime value in our applications, we store the complete date and time with a precision of let’s say milliseconds. Most of the time when querying, we will use the date part of the DateTime and in some cases, the time part (up to the minute for example)

When you create an index on the DateTime column, you will create indexes on the precision of milliseconds, which causes the index to be very granular. To avoid this granularity, you should consider creating a computed column or more computed columns of the DateTime column, based upon your query requirements.

A computed Column?

A computed column in a SQL Server Database table is a column which is computed from an expression using other columns in the same table.

Expressions can be a:
  • Function
  • Constant,
  • Non-computed column name
  • Combination of the above connected by operators.
Computed columns are virtual columns that are not physically stored in the database (unless otherwise specified). Their values are recalculated on every query reference. When the PERSISTED keyword is used, the columns are physically stored in the database. The values are updated upon the change of a column that is used in the calculation.

Considerations for computed columns:
  • The expression cannot be a sub query
  • Indexes can be created when the computed column is marked as PERSISTED
  • Computed columns containing CLR functions must be determistic and PERSISTED.

Example
In the example, a computed column will be created for the DateTime column, the column will be persisted and an index will be created on the computed column.

I have a table [dbo].[ComputedColumnsTest] with 200000 records

 


 
We execute the following query:
 


SELECT Name, [Text]
FROM [dbo].[ComputedColumnsTest]

WHERE MONTH([DateTimeEntry]) = 2
 
The execution plan looks as follows:


 
We first Create an index on the DateTimeEntry Column


CREATE NONCLUSTERED INDEX [IX_DatatimeEntry]
ON [dbo].[ComputedColumnsTest]([DateTimeEntry] ASC)
INCLUDE ([Name],[Text])
GO

 
Our execution plan now looks like this:

 

Finnally, we create our computed column specification:

1. Via the SSMS


2. Via T-SQL

ALTER TABLE [dbo].[ComputedColumnsTest]
ADD [ComputedMonth] AS (MONTH([DateTimeEntry])) PERSISTED

We add an Index on the computed column

CREATE NONCLUSTERED INDEX [IX_ComputedMonth]

ON [dbo].[ComputedColumnsTest] ([ComputedMonth])

INCLUDE ([Name],[Text])
GO

And we change the query to

SELECT Name , [Text]
FROM [dbo].[ComputedColumnsTest]
WHERE ComputedMonth = 2

The query plan looks as follows:



When we run both queries side by side, we can see that they both use the index on the computed column.

When we compare both queries again, but we force the first one to use the original index, we have the following query plan.



We can now see that the query using the index on the computed columns has a much lower cost percentage regarding the complete batch.

Conclusion


Creating computed columns and indexes on computed columns do make a difference in performance, even if you are not using the computed column in your query. The SQL Server engine is intelligent enough to create it’s query plans based upon the best performing index.

Monday 19 March 2012

What about my indexes?

No need to say that indexes on tables are important, as they directly affect the performance of your queries and data applications.
Therefor it is also needless to say that maintenance of indexes is as important as a good indexing strategy. About indexing strategies, there are already written thousands and thousands of pages, and it is for this reason that I want t spend a few lines on index fragmentation and usage. So,what about my indexes, how much are they fragmented, are they used at all?
First, a script is shown below, which will give you a result set, showing the indexes and their usage statistics:  
SELECT  TableName = t.name,
                 [RowCount] = si.rows ,
                 IndexName  = ISNULL(i.name, i.type_desc),
                 IndexType  = ISNULL(i.type_desc, ''),
                 IsFilteredIndex = CASE WHEN i.has_filter != 0 THEN 'True' ELSE 'False' END,
                 Filter  = ISNULL(i.filter_definition , ''),
                 [FillFactor] = i.fill_factor,
                 [IndexDepth] = s.index_depth,
                 [FragmentationPercentage] =  ISNULL(CAST(s.avg_fragmentation_in_percent AS DECIMAL(10,2)),0),
                 [FragmentCount] = ISNULL(s.fragment_count,0),
                 [FragmentationSizeInPages] = ISNULL(CAST(s.avg_fragment_size_in_pages AS DECIMAL(10,2)),0),
                 [PageCount] = ISNULL(s.page_count,0),
                 UserSeeks = ISNULL(us.user_seeks,0),
                 UserScans = ISNULL(us.user_scans,0),
                 UserLookups = ISNULL(us.user_lookups,0),
                 UserUpdates = ISNULL(us.user_updates,0),
                 LastUserSeek = us.last_user_seek,
                 LastUserScan = us.last_user_scan,
                 LastUserLookup = us.last_user_lookup,
                 LastUserUpdate = us.last_user_update      
FROM    sys.tables t
INNER JOIN sysindexes si  ON si.id = t.object_id
INNER JOIN sys.indexes i  ON i.object_id=  t.object_id
LEFT JOIN sys.dm_db_index_physical_stats(DB_ID(),0,-1,0, null) s
                                                            ON s.object_id = i.object_id
                                                            AND s.index_id = i.index_id
LEFT JOIN sys.dm_db_index_usage_stats us
                                                            ON us.index_id = i.index_id
                                                            AND us.object_id = i.object_id
                                                            AND us.database_id = DB_ID()
WHERE    t.name NOT LIKE '%_tracking'
         AND t.name NOT IN  ('anchor' ,'scope_config' , 'scope_info', 'schema_info')
         AND      si.name LIKE 'PK_%'
         AND i.type_desc NOT LIKE 'HEAP'
ORDER BY t.name ASC, i.name ASC
Secondly, the following script will generate a result set, showing you the index fragmentation:
SELECT  DISTINCT
               SchemaName = s.name,
               TableName = t.name ,
               IndexName = i.Name,
               Fragmentation = ps.avg_fragmentation_in_percent
FROM    sys.dm_db_index_physical_stats(DB_ID(),0,-1,0, null) ps
INNER JOIN sys.tables t ON t.object_id = ps.object_id
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
INNER JOIN sys.indexes  i ON i.index_id = ps.index_id
                                               AND i.object_id = t.object_id
WHERE   ps.avg_fragmentation_in_percent > 5
        AND     t.type = 'U' AND i.type <> 0
ORDER BY t.name;
Thirdly, a script will generate T-SQL code, which will allow you to either rebuild or reorganize, based on a 30% fragmentation ration to rebuild. This means that every index which is fragmented more than 30% should be rebuild, all others should be reorganized.
DECLARE @SQL NVARCHAR(2000);
DECLARE  @SchemaName NVARCHAR(10),
                 @TableName NVARCHAR(255),
                 @IndexAction NVARCHAR(20)

DECLARE IndexingCursor CURSOR FOR
SELECT   DISTINCT
                 SchemaName = s.name,
                 TableName = t.name ,
                 IndexAction = CASE        WHEN ps.avg_fragmentation_in_percent > 30 THEN 'REBUILD'
                                                            WHEN ps.avg_fragmentation_in_percent <= 30 THEN 'REORGANIZE'
                                             END
FROM    sys.dm_db_index_physical_stats(DB_ID(),0,-1,0, null) ps
INNER JOIN sys.tables t ON t.object_id = ps.object_id
INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
INNER JOIN sys.indexes  i ON i.index_id = ps.index_id
                                                     AND i.object_id = t.object_id
WHERE    ps.avg_fragmentation_in_percent > 5
         AND      t.type = 'U' AND i.type <> 0
ORDER BY t.name;

OPEN IndexingCursor;

FETCH NEXT FROM IndexingCursor
INTO @SchemaName, @TableName, @IndexAction;

WHILE @@FETCH_STATUS = 0
BEGIN
         PRINT  '/* Index maintenance for [' + DB_NAME() +'].[' + @SchemaName + '].[' + @TableName + '] */'
         PRINT 'ALTER INDEX ALL ON [' + DB_NAME() +'].[' + @SchemaName + '].[' + @TableName + '] ' + @IndexAction + ';'
         PRINT 'UPDATE STATISTICS [' + DB_NAME() +'].[' + @SchemaName + '].[' + @TableName + '];'
        
         FETCH NEXT FROM IndexingCursor
         INTO @SchemaName, @TableName, @IndexAction;

END


CLOSE IndexingCursor
DEALLOCATE IndexingCursor

Have fun!
Jurgen

Friday 4 November 2011

A view on storing file data in SQL Server

Storing file data (BLOBs) (eg. pdf, word, excel ...) in SQL server always has been some kind of dark magic to a lot of people.

Before SQL Server 2008 we could store the BLOB data in VARBINARY(MAX) datatypes, in which you also had to convert the BLOB data into the binary format yourself ...


With SQL 2008, the FILESTREAM datatype was introduced. FILESTREAM enables us to integrate SQL Server with the NTFS file system ... BLOBs are stored as real files on the file system.

More on FILESTREAM: http://msdn.microsoft.com/en-us/library/bb933993.aspx




--Enable FILESTREAM
EXEC sp_filestream_configure @enable_level = 3;

--CREATE THE TABLE
CREATE TABLE TestFS
      (
            ID INT IDENTITY(1,1),
            FileName NVARCHAR(200),
            Content VARBINARY(MAX) FILESTREAM
      )


As from the new version of SQL Server, SQL 2012 (aka 'Denali'), a new feature has been introduced ... FILETABLE, based on the existing FILESTREAM feature in SQL Server. In fact, the FILETABLE feature is nothing more than storing files in special filetables in SQL server, but you will be able to access them from the file system. It is said that FILETABLE will be the next generation of FILESTREAM ...

More on FILETABLE: http://msdn.microsoft.com/en-us/library/ff929144(v=sql.110).aspx



--Create the table
CREATE TABLE DocumentStore AS FILETABLE
    WITH (
          FileTable_Directory = 'DocumentTable'
          FileTable_Collate_Filename = database_default
         );
GO
Have fun
Jurgen













Thursday 15 September 2011

SQL Server Denali - LAG and LEAD




This is a short post on 2 new T-SQL features called LAG and LEAD, they are kind like Bonny & Clyde, they are mentioned in the same phrase...

LAG and LEAD accesses the data from respectively the previous and the next row in a table without using self joins.

Syntax:


LAG (scalar_expression [,offset] [,default])
OVER ( [ partition_by_clause ] order_by_clause )


LEAD (scalar_expression [ ,offset ] , [ default ] )
OVER ( [ partition_by_clause ] )

USE
AdventureWorks2008R2

GO

SELECT TerritoryName, BusinessEntityID, SalesYTD,
       LAG (SalesYTD, 1, 0) OVER (PARTITION BY TerritoryName ORDER BY SalesYTD DESC) AS PrevRepSales
FROM Sales.vSalesPerson
WHERE TerritoryName IN (N'Northwest', N'Canada')
ORDER BY TerritoryName;




SELECT BusinessEntityID, YEAR(QuotaDate) AS SalesYear, MONTH(QuotaDate) AS SalesMonth, SalesQuota AS CurrentQuota, LEAD(SalesQuota, 1,0) OVER (ORDER BY YEAR(QuotaDate), MONTH(QuotaDate)) AS NextQuota
FROM Sales.SalesPersonQuotaHistory
WHERE BusinessEntityID = 275 and YEAR(QuotaDate) IN ('2005','2006');


Have Fun!