October 2006 - Posts

Announcing SQLQueryStress: A simple query load tool

On October 20, 2004 -- two years ago -- I announced that I was considering writing my own query load testing tool.

Clearly, my follow through on this promise has been incredibly delayed.  But all is not lost, and I'm happy to report that I am true to my word.  As of today, beta-1 of my new SQLQueryStress tool is available for download.

SQLQueryStress is a free tool that provides a simple means of load testing individual queries.  It provides some support for query parameterization and options for collection of I/O and CPU metrics.  It is not intended to replace tools such as Visual Studio Team System's load tests, but rather to be a simple and easy-to-use tool in the DBA or database developer's kit. I think it lives up to that goal, and initial feedback I've gotten from a few people I've shown it to has been positive.  I've been working on this project on and off for around the last eight months, so I'm quite excited to finally release it to the public!

Please check it out and provide me with any feedback you have: What works?  What doesn't work?  What other features would you like to see?  I'm planning to do quite a bit more with this tool, so I'm definitely interested in what users think would be useful.  And of course, please let me know if you find any bugs (but given the rigor with which I've tested this project, there couldn't possibly be any!)

Thanks, and enjoy!




Cross-posted from SQLBlog! - http://www.sqlblog.com


Replacing xp_execresultset in SQL Server 2005

SQL Server 2000 included a very useful extended stored procedure called xp_execresultset. This XP had two parameters: @cmd and @dbname. @cmd was expected to be a SELECT statement that would produce a single column of output, each row of which would produce a valid query. @dbname was used to specify the database that both the initial query and the resultant queries would be executed in.

For example:

EXEC xp_execresultset 
    @cmd=N'SELECT ''SELECT 1''',
    @dbname=N'tempdb'


In this case, tempdb would be used to execute the statement SELECT ''SELECT 1''. This statement produces a single row with the value SELECT 1, which is then executed in tempdb. The final output is a single result set containing the value 1.

Multi-row statements are also allowed:

EXEC xp_execresultset 
    @cmd=N'SELECT ''SELECT 1'' UNION ALL SELECT ''SELECT 2''',
    @dbname=N'tempdb'


In this case, the initial SELECT produces two rows, each with its own statement. Then each row is processed as an individual query, thereby producing two single-row result sets, one with the value 1, and one with the value 2.

We can verify that the first statement is run in the specified database by using the DB_ID() function in the outer query:

USE master

EXEC xp_execresultset
    @cmd=N'SELECT ''SELECT ''+CONVERT(NVARCHAR, DB_ID())',
    @dbname=N'tempdb'


In this case the end result is 2, indicating that the outer query was run in tempdb rather than master.

So now that I've covered xp_execresultset's usage in SQL Server 2000, the bad news: this XP, although quite useful in many cases, has been removed in SQL Server 2005. This was pointed out  to me by a post today in the MSDN forums by Marko B. Simic (thanks, Marko!)

The solution, luckily, is pretty simple: We can re-create this XP as a stored procedure in SQL Server 2005, using a few tricks to make the job easier. Following is the replacement stored procedure I've come up with:

CREATE PROC ExecResultSet
    @cmd NVARCHAR(MAX),
    @dbname NVARCHAR(255)
AS
BEGIN
    SET NOCOUNT ON

    DECLARE @x TABLE
    (
        sql NVARCHAR(MAX),
        num INT IDENTITY(1,1)
    )

    DECLARE @input_sql NVARCHAR(355)
    SET @input_sql = N'EXEC ' + @dbname + '..sp_executesql @stmt=@cmd'

    INSERT @x (sql)
    EXEC sp_executesql
        @input_sql,
        N'@cmd NVARCHAR(MAX)',
        @cmd

    DECLARE @sql NVARCHAR(MAX)
    SELECT @sql =
    (
        SELECT sql + ';' AS [data()]
        FROM @x
        ORDER BY num
        FOR XML PATH('')
    )

    EXEC sp_executesql
        @input_sql,
        N'@cmd NVARCHAR(MAX)',
        @sql
END
GO


Note that I've tried to faithfully reproduce the original functionality of xp_execresultset. This means that the resultant rows are concatenated and executed as a single batch. This can be good in some cases -- for instance, your first row can contain some variable declarations that will be used by later rows. However, this can also cause problems if you need to declare a variable on a per-row basis.

This stored procedure is fairly simple: It executes the input @cmd, concatenates the resultant rows, and finally executes everything as a single batch. Its syntax is identical to the original XP. Modifying this stored procedure to make it a bit more flexible and execute each row as its own batch is a simple matter of using a cursor to take each row individually, rather than employing the FOR XML PATH concatenation trick. However, I'm going to leave it as-is for now so that it can be used directly in places where you would have used xp_executesql previously.




Cross-posted from SQLBlog! - http://www.sqlblog.com


Exploring the secrets of intermediate materialization

When working with SQL Server 2000, I used to have this little trick I'd pull out after exhausting all other ideas for tuning a query.  And I thought that my little trick was dead in SQL Server 2005, but thanks to fellow SQL Server MVP Rob Farley, I am officially reviving my trick from the dead here and now, in this blog post.

... But first, let's start with an example query.  Here's the scenario: You work for AdventureWorks, and management has asked you to create a report to find out how many peers each employee in the company has. You see, AdventureWorks management seems to believe that if two employees are managed by the same person, they must have exactly the same job function, and they can do each others' jobs equally well.  So what they want to do is find out which employees have too many peers (might as well downsize some of that extraneous fluff), and at the same time find out which employees, should they be hit by a bus tomorrow, could be immediately substituted for by a colleague. Now, whether or not management's belief is utterly moronic or not is beyond the scope of this post, so dash any such thoughts from your head until you've read to the end, and then resume pondering along those lines, which I'm sure will end up putting a smile on your face. 

But smiles are for later.  At this point I've managed to go off on a horribly involved tangent, so let's get back to the query at hand:

SELECT 
    x.EmployeeId,
    COUNT(*) AS TheCount
FROM HumanResources.Employee x
JOIN HumanResources.Employee y on y.ManagerId = x.ManagerId
GROUP BY x.EmployeeId


What we're doing here is finding all employees managed by the same manager, and then taking a count of those employees.  Yes, it would have made more sense to simply find out how many employees are managed by each manager, but that's not what management asked for, and management clearly thinks better than you do. So go run the report!

But what does any of this have to do with query tuning tricks, you ask (while tidying up your resume a bit)?  To answer that question, let's take a quick peek at the I/Os our query is using, in addition to the query plan.  First, a baseline for the I/Os:

SELECT *
FROM HumanResources.Employee


Result, obtained via Profiler: 20 logical reads.  OK, and now the test query in question: 827 logical reads. Quite a jump for a query that only uses the one table -- we're clearly wasting a lot of resources.  And looking at the query plan, it's obvious we can do better.  An outer table scan, looped to find the count for each employee -- that's a lot of index operations.

A common way to start tuning this kind of query is to move the aggregation into a derived table. After peering at this query for a while, one might come to the conclusion that there's no reason to aggregate on ManagerId more than once per manager. Why do it once per employee?

SELECT
    x.EmployeeId,
    y.theCount
FROM HumanResources.Employee x
JOIN
(
    SELECT
        p.ManagerId,
        COUNT(*)
    FROM HumanResources.Employee p
    GROUP BY p.ManagerId
) y (ManagerId, theCount) ON y.ManagerId = x.ManagerId


Seems better, but upon running it, we see that it produces 819 logical reads and almost exactly the same query plan. Not much of an improvement.  And alas, there's not much more we can do here.  There just aren't too many ways to skin this query, and each of them requires some kind of loop to get the count, either implied or otherwise... Right?

And now we're almost to "dirty trick" territory.  But let's first try a not-so-dirty trick. A temp table might eliminate some of the overhead, right?  Then we'll only have to query the base tables once...

SELECT
    p.ManagerId,
    COUNT(*) AS theCount
INTO #y
FROM HumanResources.Employee p
GROUP BY p.ManagerId

SELECT
    x.EmployeeId,
    y.theCount
FROM HumanResources.Employee x
JOIN #y y ON y.ManagerId = x.ManagerId


207 logical reads.  Quite an improvement!  But the temp table is still using a nested loop, and a merge would be so much nicer, wouldn't it?  A MERGE JOIN hint drops the number of reads to 115, but I still feel that we can do even better.

Now in SQL Server 2000 at about this point in my query tuning excercise, I might try forcing intermediate materialization of the derived table, sans the temp table, by using TOP 100 PERCENT in conjunction with ORDER BY. Unfortunately, the SQL Server query optimizer team decided that this wasn't a good idea, and the optimizer now ignores such attempts.

And until earlier today, I thought the game was over. Until I was reminded by Rob that TOP takes a number of rows in addition to a percent. The trick, then?  Use a bigger number of rows than you'll ever actually get back... Say, the maximum value for SQL Server's INTEGER type (2147483647)?

By applying TOP and ORDER BY within the derived table, we can force SQL Server to perform intermediate materialization of the results.  And by playing with the ORDER BY properly, we can even prompt the optimizer to choose a merge...

SELECT
    x.EmployeeId,
    y.theCount
FROM HumanResources.Employee x
JOIN
(
    SELECT TOP (2147483647)
        p.ManagerId,
        COUNT(*)
    FROM HumanResources.Employee p
    GROUP BY p.ManagerId
    ORDER BY p.ManagerId
) y (ManagerId, theCount) on y.ManagerId = x.ManagerId


And the result of all of this hard labor?  10 logical reads (1000% improvement over the next best method), a merge operation, and if I do say so myself, a very good topic for a blog post.

The usual caveats apply.  Do not try this at home.  Do not rely on this undocumented behavior.  Do not pass Go.  Do not fail to hire me to tune your databases if this trick doesn't fix all of your problems.  And, lest I forget, do not waste time reading this blog when management needs that report yesterday!

Anyway, until next time, enjoy!




Cross-posted from SQLBlog! - http://www.sqlblog.com