Enjoy Every Sandwich

Thoughts on SQL, XML, .NET and sometimes beer.

<October 2008>
SuMoTuWeThFrSa
2829301234
567891011
12131415161718
19202122232425
2627282930311
2345678


Navigation

Tools

List O'Links

Kent's Other Stuff

Subscriptions

News

Please read these
Notices and Disclamiers

Post Categories

Article Categories



Sunday, August 06, 2006 - Posts

Paging resultsets in SQL Server 2005

Yesterday I was relaxing a bit at the Red Carpet Club (D7 gate) as the Dulles Airport when an Instant Message popped up. It was from one of my fellow DevelopMentor instructors. He was wondering why this query was complaining that RN is an invalid column:

SELECT ROW_NUMBER() OVER (ORDER BY IMAGEID) AS RN,IMAGEID,IMG FROM DBO.IMAGES WHERE RN BETWEEN 10 AND 19

Unless you know how these ranking functions like ROW_NUMBER work, you might find yourself asking the same question. The trick here is that the value of the any of the ranking functions is computed after the underlying query against the table is completed. Therefore, the column RN isn't available in the WHERE predicate. So the next question is how do we work around this for making pages of data? That, in turn produced what seemed like a good from my cohort. Here's two variations on a solution:

SELECT RN,IMAGEID,IMG FROM (SELECT ROW_NUMBER() OVER(ORDER BY IMAGEID) AS RN,* FROM DBO.IMAGES) AS T WHERE RN BETWEEN 10 AND 19

This solution uses a sub-query to feed the computed result -- the query within the parens -- up to another query -- where the computed value for the row number is available. Another solution for the problem is:

WITH C(RN,IMAGEID,IMG) AS (SELECT ROW_NUMBER() OVER (ORDER BY IMAGEID),IMAGEID,IMG FROM DBO.IMAGES) SELECT IMAGEID,IMG FROM C WHERE RN BETWEEN 10 AND 19

This uses a Common Table Experssion (the with... as() part) to do the same thing as the previous query. So the question becomes, is this a more efficient query than doing something like this:

SELECT TOP(10) IMAGEID,IMG FROM DBO.IMAGES WHERE IMAGEID >= 10 ORDER BY IMAGEID

The reason for the question was that the first two queries have two select within them, and that might lead one to believe that two queries over the data are required to produce the desired result. However this is not the case. Both the sub-query and the CTE generate operationally identical query plans, namely:

select <- filter <- top <- sequence project (compute scalar) <- compute scalar <- segment <- clustered index scan

meaning that first SQL Server finds a batch of row values from the primary key index on the table, reduces that to a set rows and then select that out to the ten records of interest. So there's while there's no second recordset created, there is an intermediate working set created. The last query, the SELECT TOP(10)..., avoid this the intermediate result set. This generates a more efficient query plan (8.5% lower cost) of:

select <- top <- clustered index scan

For a query over a four-thousand row example table.

The most important thing to note about the query efficiency here that unless these queries are definetly benefiting from having a cluster primary index key on the table. Repeating the cost analysis without these indexes. In the CTE example, the estimated query cost is roughly 25-times more expensive than the query over the indexed table. The once cheaper SELECT TOP() query becomes just as expense without the index.

posted Sunday, August 06, 2006 12:49 PM by ktegels

Best email of the year: Microsoft SQL Server 2005 XML has shipped.

Congrats, Michael!

So if you've been wondering why I never wrote a book about SQL Server 2005 or XML, here's why -- there's one straight from the source.

For those of you with a passion of XML and SQL Server 2005, this is the book for you. My copy has shipped, when are you going to get yours?

posted Sunday, August 06, 2006 7:34 AM by ktegels




Powered by Dot Net Junkies, by Telligent Systems