Enjoy Every Sandwich

Thoughts on SQL, XML, .NET and sometimes beer.

<July 2008>
SuMoTuWeThFrSa
293012345
6789101112
13141516171819
20212223242526
272829303112
3456789


Navigation

Tools

List O'Links

Kent's Other Stuff

Subscriptions

News

Please read these
Notices and Disclamiers

Post Categories

Article Categories



Joe Celko's Analytics and OLAP in SQL

There's a Zen proverb that accurately describes this book: "If you understand, things are just as they are; if you do not understand, things are just as they are."

What do I mean by this? If you already understand writing OLAP queries, this book doesn't really introduce anything that likely to be new to you. If don't, Joe does a decent job of explaining the basics and demonstrating useful techniques in standard SQL and even, albeit briefly, Microsoft's MDX language. He presents things just as they are. There's a few things I like about this book and some I don't. Let's start with the negative.

  • Its Joe being Joe. Okay, that's not bad or wrong per se, but here again he goes on his traditional rants about vendor-specific extensions, the differences between records and rows and the inappropriateness of identity based primary keys. If you are used to reading or listening to Joe, you expect this. If not, then you'll probably find these tangents interrupt the reading flow significantly.
  • Its not T-SQL: If you're buying this book specifically to become a better T-SQL Developer, it can certainly help you do that. However, understand that Joe writes standard SQL, not Microsoft's T-SQL, so not all of his examples are directly usable. This is especially true for one of his ROW_NUMBER() examples. On page 73, he has a query that uses this function to compute the median of vector and references the computed values in the where clause. SQL Server 2005 doesn't support this. The example at the end of this post shows how to do write a similar query in SQL Server 2005.
  • There might be a bug or two: Joe states on pages 51 and 52 that the MERGE INTO statement exists in SQL Server 2005. So far, I've been unable to prove this, at least as far the RTM version goes. I'm working with Niels Berglund to see if it was a feature in early versions (e.g., Alphas, Betas and CTPs).

Those negatives granted, what was good about this book -- and why do I suggest buy it?

  • Its Joe being Joe. Personally, I appreciate his get to the point and show me code style. This is a book you can pick up and read in a few hours and come out with a better understanding of how to use SQL to do analytic queries. Its also a good -- but limited -- primer on OLAP concepts and terminology. If you're looking for a good place to take your first steps into this world, this book is a good choice.
  • Chapter 11 on computing and using Correlations is outstanding: A while ago, there was another book on the market -- Data Mining and Statistical Analysis Using SQL by Lovett and Trueblood -- that was the book on doing statistical computing with SQL Server. I'm not sure that APRESS is still printing it, but I see that its available on EBay and Amazon Marketplace. Joe does a nice job covering this particular topic in five pages.
  • There's lots of other chapters worth noting: Chapters 12 and 14 on distributions and regressions respectively are also very good. Chapter 18 on MDX is a too-brief but solid introduction to that topic. Joe doesn't try to teach you everything you should know on this complex topic, but again, its a great place to get your first foothold.

I think this book is a solid choice to make if you are interested in learning more about OLAP and Analytics from the SQL language point of view. It helps you understand these concepts "just as they are."


Median-by-RowNumber() Example for SQL Server 2005

-- Prepare table and fill with data
drop table dbo.foobar
go
create table dbo.foobar(X smallint not null)
go
set nocount on
declare @i int
set @i = 0
while(@i < 1001) begin
  insert into dbo.foobar values (cast(rand() * 32767 as smallint))
  set @i = @i + 1
end
go
-- Get the computed average
select avg(X) as Average from dbo.foobar
go
/* Replica of Joe's query
select avg(X),
    row_number() over (order by X asc) as hi,
    row_number() over (order by X desc) as lo 
from dbo.foobar 
where hi in (lo,lo+1,lo-1)
*/
-- Find the median values
with f as (
select X,
    row_number() over (order by X asc) as hi,
    row_number() over (order by X desc) as lo 
from dbo.foobar)
select X as median from f
where hi in (lo,lo+1,lo-1)
go

posted on Saturday, September 09, 2006 7:13 AM by ktegels





Powered by Dot Net Junkies, by Telligent Systems