January 2008 - Posts

Example of how SET options impact the Query Optimizer index choice

I put together a scenario that demonstrates how a query session’s SET options can impact which index is used by the Query Optimizer. In this example, I’ll start off by creating a schema-bound view in the AdventureWorks database:

CREATE VIEW dbo.vw_SalesOrderDetail
WITH SCHEMABINDING
AS
SELECT SalesOrderID, SalesOrderDetailID,
CarrierTrackingNumber, ProductID, LineTotal
FROM Sales.SalesOrderDetail
WHERE LineTotal BETWEEN 35.000 AND 50.000
GO

Next, I’ll create two indexes on the view I just created. The first index is a unique clustered index on the composite key of the underlying base table, and the second index is nonclustered on the CarrierTrackingNumber column:

CREATE UNIQUE CLUSTERED INDEX
idx_vw_SalesOrderDetail_Unique ON
dbo.vw_SalesOrderDetail (SalesOrderID,
SalesOrderDetailID)
GO

CREATE INDEX
idx_vw_SalesOrderDetail_CarrierTrackingNumber ON
dbo.vw_SalesOrderDetail (CarrierTrackingNumber) GO

Now I’ll show the estimated query execution plan for a query that selects against the base table (on top of which I created the view and associated indexes):

SET SHOWPLAN_ALL ON
GO

SELECT SalesOrderID, SalesOrderDetailID,
CarrierTrackingNumber, ProductID, LineTotal
FROM Sales.SalesOrderDetail
WHERE CarrierTrackingNumber = ‘8E3A-4564-99′ AND
LineTotal BETWEEN 35.000 AND 50.000

The argument column of the 4th row of the SHOWPLAN_ALL recordset reveals an Index Seek against the following object:

OBJECT:([AdventureWorks].[dbo].
[vw_SalesOrderDetail].
[idx_vw_SalesOrderDetail_CarrierTrackingNumber]), SEEK:([AdventureWorks].[dbo].
[vw_SalesOrderDetail].[CarrierTrackingNumber]=N’8E3A-4564-99′) ORDERED FORWARD

Even though I was just querying the base table - the query optimizer was able to figure out that the index on the CarrierTrackingNumber column for the indexed view would be a good choice for the query.

Now I'll show the estimated query again - this time designating a different SET option from my session default:

SET SHOWPLAN_ALL ON
GO

SET QUOTED_IDENTIFIER OFF
GO

SELECT SalesOrderID, SalesOrderDetailID,
CarrierTrackingNumber, ProductID, LineTotal FROM Sales.SalesOrderDetail
WHERE CarrierTrackingNumber = ‘8E3A-4564-99′ AND
LineTotal BETWEEN 35.000 AND 50.000

Looking in the SHOWPLAN_ALL result set - the Index Seek we saw before is missing - replaced by a Clustered Index Scan against the base table:

OBJECT:([AdventureWorks].[Sales].
[SalesOrderDetail].
[PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID])

What happenend? Index views require that the QUOTED_IDENTIFIER be set ON when the view is created (as well as some other SET options documented in BOL). In order for the Query Optimizer to use the index on the view, these SET options need to match. If they don’t - the Query Optimizer doesn’t consider the index for the query execution plan.

Cross-posted from www.JoeSack.com.

Identify implicit data type conversions using XQuery

It has been a long time since I have posted, but in 2008 I will try to be better about it. I've been regularly blogging at www.joesack.com - however I'll be cross-posting to SQLJunkies.com again too (now that I found my login and password).

On to the topic at hand...

Implicit data type conversion is a non-obvious operation that can eat away at your query performance. Let’s say you have a column called “EmployeeID” in your database. This is recognized as the employer identifier, and it is understood by the application team that an EmployeeID in Table “A” is the same as an EmployeeID in Table “B”. However one database developer uses a bigint for Table “A” and maybe a varchar for Table “B”. You have just built in a hidden cost to all queries that join Table “B” and Table “A” on EmployeeID. Add millions of rows to those tables, and you have just added a potentially significant cost.

How do you identify data type mismatches for the same column then? One method is to look at INFORMATION_SCHEMA.columns and look for all matching names that have varying data types. We can also check INFORMATION_SCHEMA.parameters and see any parameter names that have varying data types. The flaw here though, is that we depend on matching names. If the database designer named a column “EmpID” and another “EmployeeID” - we will have to be very familiar with the database and application design in order to know for sure that these two columns really represent the same thing. We are also limited to checking persisted objects - tables and procedures for example - and not ad hoc queries that contain implicit data type conversions in temporary tables or operations passing values from one query to another…

One solution introduced in SQL Server 2005, is the ability to use XQuery to probe the contents of an execution plan. If an implicit data type conversion occurs for a query, we can identify it in the XML query plan. I created the following query that uses the “exists” method to check for any query plan that has the implicit attribute set to “1″. I order it by worker time, looking for those queries with significant CPU usage. I also look at execution count - for those queries that are executed frequently:

SELECT qs.total_worker_time,
SUBSTRING(st.text, qs.statement_start_offset/2,
(CASE
WHEN qs.statement_end_offset = -1
THEN LEN(CONVERT(nvarchar(max), st.text)) * 2
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2)
as SQL_Text,
qp.query_plan,
qs.execution_count,
st.dbid,
st.objectid
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_query_plan
(qs.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text (qs.plan_handle) st
WHERE query_plan.exist
(’declare default element namespace “http://schemas.microsoft.com/sqlserver/2004/07/showplan“;
//Convert/@Implicit[. = “1″]’) = 1
ORDER BY qs.total_worker_time DESC


Once you have identified the higher cost plans, you can validate the predicates and see where the implicit data type conversions are occurring and then address accordingly.