posted on Tuesday, January 01, 2008 12:45 PM by joe.sack

Identify implicit data type conversions using XQuery

It has been a long time since I have posted, but in 2008 I will try to be better about it. I've been regularly blogging at www.joesack.com - however I'll be cross-posting to SQLJunkies.com again too (now that I found my login and password).

On to the topic at hand...

Implicit data type conversion is a non-obvious operation that can eat away at your query performance. Let’s say you have a column called “EmployeeID” in your database. This is recognized as the employer identifier, and it is understood by the application team that an EmployeeID in Table “A” is the same as an EmployeeID in Table “B”. However one database developer uses a bigint for Table “A” and maybe a varchar for Table “B”. You have just built in a hidden cost to all queries that join Table “B” and Table “A” on EmployeeID. Add millions of rows to those tables, and you have just added a potentially significant cost.

How do you identify data type mismatches for the same column then? One method is to look at INFORMATION_SCHEMA.columns and look for all matching names that have varying data types. We can also check INFORMATION_SCHEMA.parameters and see any parameter names that have varying data types. The flaw here though, is that we depend on matching names. If the database designer named a column “EmpID” and another “EmployeeID” - we will have to be very familiar with the database and application design in order to know for sure that these two columns really represent the same thing. We are also limited to checking persisted objects - tables and procedures for example - and not ad hoc queries that contain implicit data type conversions in temporary tables or operations passing values from one query to another…

One solution introduced in SQL Server 2005, is the ability to use XQuery to probe the contents of an execution plan. If an implicit data type conversion occurs for a query, we can identify it in the XML query plan. I created the following query that uses the “exists” method to check for any query plan that has the implicit attribute set to “1″. I order it by worker time, looking for those queries with significant CPU usage. I also look at execution count - for those queries that are executed frequently:

SELECT qs.total_worker_time,
SUBSTRING(st.text, qs.statement_start_offset/2,
(CASE
WHEN qs.statement_end_offset = -1
THEN LEN(CONVERT(nvarchar(max), st.text)) * 2
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2)
as SQL_Text,
qp.query_plan,
qs.execution_count,
st.dbid,
st.objectid
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_query_plan
(qs.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text (qs.plan_handle) st
WHERE query_plan.exist
(’declare default element namespace “http://schemas.microsoft.com/sqlserver/2004/07/showplan“;
//Convert/@Implicit[. = “1″]’) = 1
ORDER BY qs.total_worker_time DESC


Once you have identified the higher cost plans, you can validate the predicates and see where the implicit data type conversions are occurring and then address accordingly.

Comments