SQL Server - Programming
Where all the fun is
Two of the cool new features that we talk about in Essential SQL Server 2005 and Guerrilla SQL Server 2005 are:
- Data Definition Language (DDL) Triggers, which allow us to
catalog code that gets fired synchronously with event that trigger it. This is
useful in cases where you want to hook into the processing sequence for
commands like CREATE TABLE, ALTER VIEW or DROP ASSEMBLY, potentially logging
who did what or issuing a ROLLBACK to cancel that work.
- EVENT NOTIFICATIONS, which route information about events
asynchronously to Service Broker queue after the event has been processed.
EVENT NOTIFICATIONS are available for 193 specific events, including the events
we can use for DDL triggers.
This processing behavior difference is critical. EVENT NOTIFICATIONS
cannot easily be used to cancel work before it gets done, which a DDL trigger
can do. On the other hand, DDL triggers can’t be written for all 193 possible
events. So there are times we find ourselves wanting to use a synchronous
trigger for non-DDL events. For example, consider the routine situation where --
at the end of the month -- a given SQL Server needs to be dedicated to running
long-running and demanding batches and procedures to close out the financial
records. Normally, a "black-out" period is set by management for some interval (like
8:00 PM on the last day of the month) until work resumes the next day. The
question becomes: how can we easily automate the enforcement of this policy?
Well, if we could inject code into the logon process, it
would be easy. However, SQL Server hasn’t really made it easy to do that. At
least before Service Pack Two for SQL Server 2005. In this service pack a new
type of trigger is introduced: the LOGON TRIGGER. Let’s see how they work by
creating one.
What I’d like to do for auditing and troubleshooting
purposes is first create a database with just one table in it that allows me to
record logon activity. There’s nothing new about coding that up:
use master
go
if not db_id('audit') is null
drop database audit
go
create database audit
go
use audit
go
create schema admin
go
create table admin.LogonAudit(eventData xml,denied bit default 0)
go
use master
go
Let’s also create a user to test with:
if not suser_sid('userLogin') is null
drop login userLogin
go
create login userLogin with password = 'someEasyPassword'
go
Our trigger is actually fairly simple to write:
-- Drop our trigger just in case an older version exists...
if exists(select * from master.sys.server_triggers where name like 'noUserLogins%')
drop trigger noUserLoginsDuringMonthEnd on all server
go
-- Name it
create trigger noUserLoginsDuringMonthEnd
-- Applied at the instance level
on all server
-- Execute this code as the person who cataloged it
with execute as self
-- New, fires when a logon is attempted
for logon
as begin
-- Get the range of the backout period
-- starts on the last day of the month at 8PM
-- and ends at 9 hours later
declare @bob datetime,@boe datetime
select @bob = dateadd(m,1,getdate())
select @bob = convert( datetime,
cast(datepart(yyyy,@bob) as char(4))+'-'+
cast(datepart(MM,@bob) as char(2))+'-'+
'01 00:00:00'+' ', 120)
set @bob = dateadd(hh,-4,@bob)
set @boe = dateadd(hh,9,@bob)
-- If its currently in the back period and
-- a disallowed login is tried...
if (getdate() between @bob and @boe) and
(original_login() = 'userLogin')
begin
-- Stop the login
rollback
-- Record the attempt
insert into audit.admin.logonAudit values (eventData(),1)
-- See what happens to messages...
raisError('Attempted user login during month-end blackout',16,1) with log
end else
-- Record the attempt
insert into audit.admin.logonAudit values (eventData(),0)
end
go
So what happens if somebody tries to logon using the
userLogin we created during the black-out period? They get a message like this:

Personally, I have mixed feelings about this. It is great
because it works and does a nice job NOT displaying my potentially
security-sensitive error message (from RaisError) to the user. However, the
error message shown gives away two pieces of information it doesn’t need to:
the fact that a trigger nixed the log-on ("due to trigger execution") and that
there might very well be some database on the system called "master." I’m not
really sure why the SQL Server team decided to have this dialogue show those,
so I filled a bug about it. If you agree that this isn’t good, you can show
your support by voting for my bug by voting in it at https://connect.microsoft.com/SQLServer/feedback/Vote.aspx?FeedbackID=237008
That said there’s no doubt that this is a very nice addition
to SQL Server 2005 and your bag of tricks.
Just in case you've not already read it on Gert's or Cameron's blogs, CTP6 for "Visual Studio Team Edition For The Database Professional" -- AKA DataDude -- just got released. If things hold true to form, this will hopefully be the last CTP before we start seeing RCs. Looks to me like they've addressed a number of the things that kept me from using "Dude" as my primary SQL Development environment. Gert listed the improvements as:
- Full support for SQL Server 2000 & 2005 objects, the parser work has been completed (so I can't wait to test it again with SQLCLR Assemblies and Service Broker code).
- Extended Properties support, we know import and deploy all your extended properties (so my database dictionary programs will resume working again).
- Inline constraint support, if you do not want to separate them out, we allow them inline as well (cool for check constraints)
- Pre- and post deployment scripts population during Import Script
- Full support for command line build & deploy and Team Build
- A new Import Schema Wizard which is also integrated with the New Project Wizard to make project creation and import a single stop shop (Thank goodness!)
- Synchronize your database project from Schema Compare, compare your project with a database and pull the differences in to the database project
- Schema refactoring is now allowed even if you have files in a warning state
- Resolve 3 and 4 part name usage when the referenced database is locally present, same for linked servers
- The product no longer installs SQL Express; you can pick your own SQL Server 2005 Developer Edition or SQL Server 2005 Enterprise Edition instance on the local box. When you first start the product for the first time we will ask you to choose an local instance to use (YEAH!)
- Display detailed Schema Object properties in the VS Property Window for selected objects in the Schema View (Can't wait to kick the tires on this)
- Separation of user target database settings through user project files, this allows users to work against a different target instance without changing the main project file.
- We made great progress on the overall stability and performance of the product across the board, project creation, importing your schema, reloading project and making changes to your schema
- And last but not least we fixed many reported customer problems!
You can read the full text of Gert's announcement here or get this bits directly from here.
I think of one the most mystifying error messages that you can get in SQL Server 2005 is when try to catalog the same assembly twice into the same database:
Msg 6285, Level 16, State 1, Line 1
CREATE ASSEMBLY failed because the source assembly is, according to MVID, identical to an assembly that is already registered under the name "...".
This raises a couple of interesting questions.
a.) What the heck is an MVID and why do we care?
b.) How can we check these MVIDs for ourselves?
a.) MVID stands for Module Version ID and is documented in section 21.27 of ECMA-335 as "simply a Guid used to distinguish between two versions of the same module." Although an Assembly can contain many modules, there is a principal MVID assigned to it in the manifest -- the Manifest MVID. One of the checks that SQL Server performs when cataloging an new assembly is to get it's Manifest MVID and compare that to all of the other Manifest MVIDs for assemblies already cataloged in the current database. When a duplicate is found, SQL Server 2005 rejects the inbound assembly.
b.) Checking any non-database assembly for its Manifest MVID is fairly simple. Here's a short snip of code that returns it:
Assembly asm = Assembly.Load(BUF);
Module m = asm.ManifestModule;
return m.ModuleVersionId;
Where BUF is assumed by an array of bytes holding the assembly you want the MVID for. You could also use Assembly.LoadFromFile to an exsiting Assembly-as-DLL file into the snip, however, that method is not allowed within SQLCLR. An easier way to get the MVID for an Assembly already cataloged into SQL Server 2005 is to use the AssemblyProperty statement in T-SQL, e.g.:
select assemblyproperty(assembly name,'mvid')
A few weeks back, the ADO.NET team released the first in a series of ADO.NET v.Next releases. This was our first look at several exciting technologies like the Entity Data Framework and Mapping Views. Well today, the team announced the release a designer for working with ADO.NET v.Next Entity data.
Go have a look-see.
Time to fire up your favorite RSS feed reader and get to subscribing:
http://blogs.msdn.com/sqlqueryprocessing/rss.xml
No SQL Developer should leave home without it.
A commonly asked question is "how can I load an existing XSD file into an XML Schema Collection?" It's pretty easy:
declare @don xmlselect @don=bulkcolumn from openrowset(bulk 'c:\users\ktegels\some.xsd',single_blob) as screate xml schema collection dbo.collectionName as @dongoSubsituting the location for your file following the BULK provider in the OpenRowSet source.
Are you Datamart Builder with few bones to pick? Are your cubes too slow?
Ok, so I can't do anything to help you with that. But you can help Microsoft help you. You see, they are in the planning phase for the next version of SQL Server. Your input counts! Visit the SQLTIPs blog for a link to the online surveys.
One of the three best PSS SQL Server Engineers -- Bob Ward -- has written up a great set of white papers on SQL Server I/O. More about them on the SQL Blog.
The 18th. So, okay, that's not a very interesting question in and of itself. A better question might be "what's the Nth occurrence of the Mth day in each month?" Like, what are all of the third Thursdays in 2017?
The literal -- and rather boring -- answer is January 19, February 16, March 16, April 20, May 18, June 15, July 20, August 17, September 21, October 19, November 16, December 21.
So what's the interesting question here? How can one compute these using a T-SQL query. If you are using SQL Server 2005, that's not too hard as this Table Valued Function demonstrates:
create function dbo.NthGivenWeekDaysForYear(@year smallint,@gap tinyint,@dayOfWeek tinyint)
returns @d table(date datetime) as begin
-- fill up a table variable for the days in the requested year
declare @days table(date datetime,dayOfWeek tinyint,Month tinyint);
declare @ds datetime, @de datetime;
set @ds = cast(cast(@year as varchar(4)) + '-01-01' as datetime)
set @de = dateadd(yyyy,1,@ds);
while(@ds LESS-THAN @de)
begin
insert into @days values (@ds,datepart(dw,@ds),datepart(MM,@ds));
set @ds = dateadd(dd,1,@ds);
end;
-- This CTE computes the occurrence number for each weekday within the given
-- month (e.g., first Monday gets a seq of 1, second Monday is 2...) Resets
-- with the month changes. Filters down to just the desired date.
with c as (select date,row_number() over (partition by month order by date) as seq from @days where dayOfWeek=@dayOfWeek)
-- We can now insert the target dates. We know that seq has the
-- nth occurrence of that weekday in the month.
insert into @d select date from c where seq = @gap;
return;
end;
The magic here is in the row_number() function and in its over clause. Here the row_number() is computed as a sequentially increasing value for each occurrence of a given weekday within a month. We sort by date here so the first day of interest gets the value of one in the sequence. The partition by reset the sequence to one when the month changes.
So why would we want to write this as a TVF rather than say a stored procedure? The immediate answer might not be so obvious, but here's a use case: suppose your company has started a bonus program for newly hired employees. Employees are eligible to participate in the program so long as they were hired before the third Friday in the month when payroll data get generated. All it takes to refractor this function is a couple of tweaks. First, we need to have the function accept a hiredate as a parameter. Second, we need to modify the final insert statement to this:
insert into @d select date from c where seq = @gap and datepart(m,date) = datepart(m,@hireDate) and datepart(d,@hireDate) LESS-THAN-OR-EQUAL-TO datepart(d,date)
We can now leverage the TVF using the new APPLY operator, in this case, the OUTER APPLY variant thusly:
select EmployeeID,hireDate,date from adventureworks.HumanResources.Employee e outer apply dbo.NthGivenWeekDaysForYear(2002,3,6,e.hiredate) where datepart(yyyy,e.hiredate) = 2002
Because Employee 284 missed the cutoff by one day, the TVF returns a null value for that row, and we know that this employee didn't qualify for any bonus for May of 2002.
Those of you who've met me probably have figured out that I'm not the most agile person ever. One time in Gym class, when they were trying to get me to play basketball, the coach said "you're the only kid I've ever known who can't pivot." I never thought this would come back to haunt me (its not like I try to play basketball), but it did recently when working with a matrix. Its not hard to imagine treating a table in SQL Server as a matrix. What is hard -- or at least is a lot of code to write -- is doing a join on matrix to some other table. Consider for example the case of an Italian resturant. They've decided to use SQL Server to manage a number of things and a couple of them are lists of dished they prepare and a table that organizes these dishes into four-course meals. The tables might be structured something like this:
create table dbo.menuItems (itemID tinyint identity(1,1) primary key,name varchar(30) not null unique,meatless bit not null)
and
create table dbo.menu(mealID tinyint identity(1,1) primary key,antipasti tinyint,primi tinyint,secondi tinyint,dolci tinyint)
After adding some data, the contents of dbo.menu look something like:
mealID antipasti primi secondi dolci
------ --------- ----- ------- -----
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15
4 4 8 12 16
And that's nice, but its not very usable by a human being. What I'd really like to get is:
mealID antipasti primi secondi dolci
------ ------------------------------ ------------------------------ ------------------------------ ------------------------------
1 Crostini Ravioli Nudi di Pesce Pastello di Pesce Granita
2 Prosciutto e meloni Risotto alla Milanese Ossobuco Timballo di Pere
3 Carpaccio Stracciatella Saltimbocca Castagnaccio
4 Polenta Fritta Trippa alla Fiorentina Bollito Misto Zabaione
But how do you do that? In SQL Server 2005, a couple of new operators make this pretty easy: UNPIVOT and PIVOT. But how and why? Let's consider our matrix could also be represented as a list of paired values:
course itemID
---------- ------
antipasti 1
primi 5
secondi 9
dolci 13
antipasti 2
primi 6
secondi 10
dolci 14
antipasti 3
primi 7
secondi 11
dolci 15
antipasti 4
primi 8
secondi 12
dolci 16
The unpivot operator makes that's easy to do:
select u.menuID,itemID from dbo.menu m unpivot (itemID for menuID in (antipasti,primi,secondi,dolci)) as u
And it makes joining the dishes in that list back to the table of dishes (dbo.MenuItems) rather easy: select u.menuID,i.name from dbo.menu m unpivot (itemID for menuID in (antipasti,primi,secondi,dolci)) as u join dbo.menuItems i on u.itemID = i.itemID So while that's cool, we're still dealing with a pair-list, not a matrix. Now, I might be a white guy that can't jump, but I certainly can PIVOT back into the matrix.
;with menus as (select mealID,u.menuID,i.name from dbo.menu m unpivot (itemID for menuID in (antipasti,primi,secondi,dolci)) as u join dbo.menuItems i on u.itemID = i.itemID) select * from menus pivot(max(name) for menuID in ([antipasti],[primi],[secondi],[dolci])) as p
Take that, Coach!
There's a Zen proverb that accurately describes this book: "If you understand, things are just as they are; if you do not understand, things are just as they are."
What do I mean by this? If you already understand writing OLAP queries, this book doesn't really introduce anything that likely to be new to you. If don't, Joe does a decent job of explaining the basics and demonstrating useful techniques in standard SQL and even, albeit briefly, Microsoft's MDX language. He presents things just as they are. There's a few things I like about this book and some I don't. Let's start with the negative.
- Its Joe being Joe. Okay, that's not bad or wrong per se, but here again he goes on his traditional rants about vendor-specific extensions, the differences between records and rows and the inappropriateness of identity based primary keys. If you are used to reading or listening to Joe, you expect this. If not, then you'll probably find these tangents interrupt the reading flow significantly.
- Its not T-SQL: If you're buying this book specifically to become a better T-SQL Developer, it can certainly help you do that. However, understand that Joe writes standard SQL, not Microsoft's T-SQL, so not all of his examples are directly usable. This is especially true for one of his ROW_NUMBER() examples. On page 73, he has a query that uses this function to compute the median of vector and references the computed values in the where clause. SQL Server 2005 doesn't support this. The example at the end of this post shows how to do write a similar query in SQL Server 2005.
- There might be a bug or two: Joe states on pages 51 and 52 that the MERGE INTO statement exists in SQL Server 2005. So far, I've been unable to prove this, at least as far the RTM version goes. I'm working with Niels Berglund to see if it was a feature in early versions (e.g., Alphas, Betas and CTPs).
Those negatives granted, what was good about this book -- and why do I suggest buy it?
- Its Joe being Joe. Personally, I appreciate his get to the point and show me code style. This is a book you can pick up and read in a few hours and come out with a better understanding of how to use SQL to do analytic queries. Its also a good -- but limited -- primer on OLAP concepts and terminology. If you're looking for a good place to take your first steps into this world, this book is a good choice.
- Chapter 11 on computing and using Correlations is outstanding: A while ago, there was another book on the market -- Data Mining and Statistical Analysis Using SQL by Lovett and Trueblood -- that was the book on doing statistical computing with SQL Server. I'm not sure that APRESS is still printing it, but I see that its available on EBay and Amazon Marketplace. Joe does a nice job covering this particular topic in five pages.
- There's lots of other chapters worth noting: Chapters 12 and 14 on distributions and regressions respectively are also very good. Chapter 18 on MDX is a too-brief but solid introduction to that topic. Joe doesn't try to teach you everything you should know on this complex topic, but again, its a great place to get your first foothold.
I think this book is a solid choice to make if you are interested in learning more about OLAP and Analytics from the SQL language point of view. It helps you understand these concepts "just as they are."
Median-by-RowNumber() Example for SQL Server 2005
-- Prepare table and fill with data
drop table dbo.foobar
go
create table dbo.foobar(X smallint not null)
go
set nocount on
declare @i int
set @i = 0
while(@i < 1001) begin
insert into dbo.foobar values (cast(rand() * 32767 as smallint))
set @i = @i + 1
end
go
-- Get the computed average
select avg(X) as Average from dbo.foobar
go
/* Replica of Joe's query
select avg(X),
row_number() over (order by X asc) as hi,
row_number() over (order by X desc) as lo
from dbo.foobar
where hi in (lo,lo+1,lo-1)
*/
-- Find the median values
with f as (
select X,
row_number() over (order by X asc) as hi,
row_number() over (order by X desc) as lo
from dbo.foobar)
select X as median from f
where hi in (lo,lo+1,lo-1)
go
I was happily coding up an example where I have a ADO.NET 1.x DataTable that's has an auto-incrementing field. Being the conservative type I am, I had coded the two fields to be of type Byte. When trying to create the ForeignKeyConstraint related to that column, I got an exception:
Property not accessible because 'Parent Columns and Child Columns don't have type-matching columns.'.
Sure enough, if you go check the DataType of the auto-incrementing column, its been "upgraded" to System.Int32. Silently. Nice. Now that you know the problem, the fix is easy. Make the columns Int32's. Ick.
Leason learned after 20 minutes of chasing that one down. Its going to be a grrrreeeaaat week.
A few months ago, I published a step-by-step procedure for using certificate to sign an assembly so an external access or unsafe assembly could be run without having to set the hosting database to trustworthy. The major problem with that process was its weight: a lot of steps and some still requirements in terms of certificate management. Turns out that yes, there is an easier way to make it work.
Part one -- preparing the master database
- As needed, start a series of queries against the MASTER database.
- Create the target database (meaning execute a CREATE DATABASE query).
- Code and compile the assembly you want to deploy. Make sure you sign the assembly with a strong name key file. This can be done with a PFX file generated by Visual Studio.
- Create an asymmetric key from from the compiled assembly using a statement like this: create asymmetric key <key_name> from executable file = '<path_to_dll_file>'
- Create a login based on that asymmetric key using a statement like this: create login <login_name> from asymmetric key <key_name>
- Grant that login the right to create either or both an unsafe or external access assembly (as needed) using: grant unsafe assembly to <login_name>
Part two -- preparing the hosting database
- As needed, start a series of queries against the desired user database.
- Create a user in that database mapped to the login created in part, step 5. (e.g. create user <db_user> from login <login_name>)
- Give that user the right to catalog an assembly, e.g.: grant create assembly to <db_user>
- Catalog the desired assembly using the now trusted asymmetric key with a statement like: create assembly <assembly_name> authorization <db_user> from '<path_to_dll_file>' with permission_set = unsafe
From there on, its just a matter of mapping your methods and classes to the desired T-SQL objects.
Note: the following is just to help Google find this post.
is not authorized for PERMISSION_SET = UNSAFE. The assembly is authorized when either of the following is true: the database owner (DBO) has UNSAFE ASSEMBLY permission and the database has the TRUSTWORTHY database property on; or the assembly is signed with a certificate or an asymmetric key that has a corresponding login with UNSAFE ASSEMBLY permission. RegistryReader
Yesterday I was relaxing a bit at the Red Carpet Club (D7 gate) as the Dulles Airport when an Instant Message popped up. It was from one of my fellow DevelopMentor instructors. He was wondering why this query was complaining that RN is an invalid column:
SELECT ROW_NUMBER() OVER (ORDER BY IMAGEID) AS RN,IMAGEID,IMG FROM DBO.IMAGES WHERE RN BETWEEN 10 AND 19
Unless you know how these ranking functions like ROW_NUMBER work, you might find yourself asking the same question. The trick here is that the value of the any of the ranking functions is computed after the underlying query against the table is completed. Therefore, the column RN isn't available in the WHERE predicate. So the next question is how do we work around this for making pages of data? That, in turn produced what seemed like a good from my cohort. Here's two variations on a solution:
SELECT RN,IMAGEID,IMG FROM (SELECT ROW_NUMBER() OVER(ORDER BY IMAGEID) AS RN,* FROM DBO.IMAGES) AS T WHERE RN BETWEEN 10 AND 19
This solution uses a sub-query to feed the computed result -- the query within the parens -- up to another query -- where the computed value for the row number is available. Another solution for the problem is:
WITH C(RN,IMAGEID,IMG) AS (SELECT ROW_NUMBER() OVER (ORDER BY IMAGEID),IMAGEID,IMG FROM DBO.IMAGES) SELECT IMAGEID,IMG FROM C WHERE RN BETWEEN 10 AND 19
This uses a Common Table Experssion (the with... as() part) to do the same thing as the previous query. So the question becomes, is this a more efficient query than doing something like this:
SELECT TOP(10) IMAGEID,IMG FROM DBO.IMAGES WHERE IMAGEID >= 10 ORDER BY IMAGEID
The reason for the question was that the first two queries have two select within them, and that might lead one to believe that two queries over the data are required to produce the desired result. However this is not the case. Both the sub-query and the CTE generate operationally identical query plans, namely:
select <- filter <- top <- sequence project (compute scalar) <- compute scalar <- segment <- clustered index scan
meaning that first SQL Server finds a batch of row values from the primary key index on the table, reduces that to a set rows and then select that out to the ten records of interest. So there's while there's no second recordset created, there is an intermediate working set created. The last query, the SELECT TOP(10)..., avoid this the intermediate result set. This generates a more efficient query plan (8.5% lower cost) of:
select <- top <- clustered index scan
For a query over a four-thousand row example table.
The most important thing to note about the query efficiency here that unless t