When I was first introduced to Peanut Butter Chicken, I had many
questions: What does it taste like? Why would one want to mix Peanut Butter
and Chicken? How do you prepare it? Who thought this up? I was hooked for life
with the first bite.
When I first heard that the next version of Microsoft SQL Server,
"Yukon," would have the .NET Common Language Runtime (CLR) hosted
within it, similar questions arose: Why would you want to do that? How do I
take advantage of it? Within a few minutes, I found myself looking at a number
of examples that, frankly, did not convince me that there was a good reason to
use this feature. What I really needed to see was a meaty business problem where
using the CLR was practical. Thankfully, a good business problem was at hand.
The Problem
I work at a leading Engineering and Architecture firm headquartered
in Omaha, Nebraska: HDR, Inc. The company has more than just an Omaha office.
In fact, HDR has over 100 offices and project sites. This means that it is common
for our staff members to travel from office to office as they work on projects.
That travel is frequently air travel and, sometimes, people are not sure which
airports are close to the offices. So how could
I help?
Being a database-thinker, my first thought was to have a simple database:
- A table containing location information about sites, named Sites.
- Another table containing location information about various airports,
named Airports.
- A third table that associated each site with one or more airports,
named — rather non-creatively — SitesAirports.
This design has a couple of problems. First, the person who maintains
the data would have to research which airports were close to each site and they'd
have to do that again whenever we added a new office. When an airport opened
or closed, there would be similar work to do. Finally, there was the question
of "What do you mean by close?"
Geometry Is Easier Than Calculus
One nice side benefit of working with HDR is that people who love
using technology to solve problems surround me. HDR has a number of professionals
versed in Geographical Information Systems (GIS). The basic value of GIS is
that it allows you to relate things in a physical space with information-driven
models, and that is exactly what I needed to do here. In an information-driven
system, I needed to relate to entities: Sites and Airports. Why not borrow a
fundamental GIS concept: latitudes and longitudes?
As you probably know, latitudes and longitudes represent points
on Earth. Latitudes represent the distance from the Poles (North and South),
while longitudes represent the distance East or West of the Prime Meridian.
Both latitudes and longitudes are expressed in degrees of a circle. For example,
some of the best Peanut Butter Chicken I ever had was served up by a little
restaurant in Lincoln, Nebraska. That restaurant is, more or less, located at
40°45"14' North, 96°38"44' West in terms of latitude and longitude.
Decimal degrees are more commonly used; in this case, 40.7539, -96.6428.
Since we knew the address of each of our offices, it simply became
a matter of figuring out their latitudes and longitudes. We used MapPoint
to find them, but you could also use MSN Maps as well as other mapping packages.
Finding the latitudes and longitudes of airports was a bit different though.
I did a few searches on Google and came up with a good list of airports and
their locations. That left me with one task: How would I associate a site with
an airport?
Turns out that if you have the latitudes and longitudes of two places, you can quickly determine the distance between them with this simple formula (expressed in pseudo-code):
dlon = lon2 - lon1
dlat = lat2 - lat1
a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2
c = 2 * atan2(sqrt(a), sqrt(1-a))
d = 3956 * c
The 3,956 value is the median radius of the Earth from pole to pole, in miles.
Since we can compute the distance between a given site and airport,
we can relate them as being closer or more distant. (See the footnote below for more information.)
While our formula seems simple, it is actually more computationally
complex than, say, finding the amount of sales tax to charge.
This is a good thing when it comes to using the CLR functions in Yukon.
Since we need to use this calculation in queries, one
of our Database Administrators developed a user-defined function as follows:
CREATE FUNCTION dbo.udfComputeDistance
(
@lat1 float,
@lon1 float,
@lat2 float,
@lon2 float
)
RETURNS float
AS
begin
-- dLong represents the differences in longitudes
-- while dLat is the difference in latitudes
declare @dLong float
declare @dLat float
-- To keep the calculation easier to understand,
-- we have simplified it by computing it by parts.
-- This value temporarily holds the value of the
-- first calculation.
declare @temp float
-- Convert the decimal degrees to radians
set @lat2 = radians(@lat2)
set @lon1 = radians(@lon1)
set @lat1 = radians(@lat1)
set @lon2 = radians(@lon2)
-- Compute the degree differences
set @dLong = @lon2 - @lon1
set @dLat = @lat1 - @lat2
-- Compute the first part of the equation
set @temp = (square(sin(@dLat/2.0))) + cos(@lat2) * cos(@lat1) * (square(sin(@dLong/2.0)))
-- Return the approximate distance in miles
-- Note that 3956 is the approximate median radius of the Earth.
return (2.0 * atn2(sqrt(@temp), sqrt(1.0-@temp)))*3956.0
end
We used this function to find all of airports within
100 miles of each of sites with a simple query like this:
select s.name
,s.city
,s.state
,a.name
,a.iatacode, dbo.udfComputeDistance(s.latitude,s.longitude
,a.latitude,a.longitude) as distance
from sites s, airports a
where dbo.udfComputeDistance(s.latitude,s.longitude
,a.latitude,a.longitude) <= 100.0
order by 1,4
Query Analyzer reported that it took about seven seconds to run this query, and the ordering of results was less than 1% of the execution time. Since this information is used infrequently, we coded the query up into a stored procedure and left it at that. While it is a simple and working solution, those seven seconds bothered me a bit. After all, we are not running our SQL Servers on computers powered by first-generation Pentiums!
Epiphany
It should not come as any surprise that the time culprit is how
SQL Server Yukon writes computations as T-SQL. The query engine is not designed
to give the CPU a set of instructions optimized for doing this type of computation,
however, the CLR is. Having the CLR hosted in SQL Server Yukon is better
because you can give the CPU optimized instructions for numerical computations and just as important, you don't have to reinvent functionality already found in the
.NET Framework.
The first byte, if you will, that addicted me to using the CLR in
SQL Server Yukon was cutting the execution time for this query down from seven seconds
to one second by using a CLR-based user-defined function. By hosting the CLR within SQL Server Yukon, we get the best of both worlds:
a great set-oriented tool for relating data (T-SQL) and an equally great function-oriented
tool for working with data in your .NET language of choice.
Using the CLR
Now, if there is a trick to programming that I have learned in last
20 or so years, it is simply to start with what you know. In this case, I decided
to get most of the bugs out of the function I wanted to make available in SQL
Server Yukon by debugging code using the Visual Studio .NET (VS .NET) "Whidbey" IDE.
This would be much easier than trying to debug the code once imported into the Server.
I first created a C# console project and coded the following as Class1.cs:
using System;
using System.Data;
using System.Data.SqlTypes;
namespace test
{
class Class1
{
private const double PI_OVER_180 = 0.0174532925;
private static double radians(double DecimalDegrees)
{
return DecimalDegrees * PI_OVER_180;
}
public static SqlDouble ComputeDistance(SqlDouble FromLat,
SqlDouble FromLong, SqlDouble ToLat, SqlDouble ToLong)
{
double lat1, lat2, lon1, lon2,
dLong = 0.0, dLat = 0.0, subCalc = 0.0;
lat1 = radians((double)(FromLat));
lon1 = radians((double)(FromLong));
lat2 = radians((double)(ToLat));
lon2 = radians((double)(ToLong));
dLong = (double)(lon2 - lon1);
dLat = (double)(lat2 - lat1);
subCalc = (Math.Pow(Math.Sin(dLat / 2.0), 2.0))
+ Math.Cos(lat2) * Math.Cos(lat1)
* (Math.Pow(Math.Sin(dLong / 2.0), 2));
return ((2.0 * Math.Atan2(Math.Sqrt(subCalc),
Math.Sqrt(1.0 - subCalc))) * 3956.0);
}
[STAThread]
public static void Main(string[] args)
{
Console.WriteLine(
ComputeDistance(40.7539,-96.6428, 41.28692,-96.07023));
Console.ReadLine();
}
}
}
I know from some previous exploration with MapPoint that the restaurant
that makes my favorite Peanut Butter Chicken was about 50 miles from where I
live. Plugging those latitudes and longitudes into my test program, it determined
that I am less than 48 miles from it, as shown in FIGURE 1. Close enough for
me to want to go get some!

FIGURE 1: Testing the latitudes and longitudes
Feeling confident that my function worked correctly enough, I started
a new SQL Server Project named asmDistanceLibrary. As shown in FIGURE 2, this project
type will show up in the New Project dialog box when you install SQL Server Yukon's developer
bits on a computer and then install VS .NET Whidbey.

FIGURE 2: The New Project dialog box
In talking to some other folks, it is much easier to use the Whidbey-provided
templates for these projects. The reason for doing this is that you have to
bind in three other Dynamic Link Libraries (DLLs) into your project's output
DLL so that SQL Server Yukon can load and register your function:
- ClrCppModule.dll
- Sqlaccess.dll
- Microsoft.VisualStudio.DataTools.SqlAttributes.dll
After you accept the project name and location, you will be prompted
to define a database connection. You can select from one of the connections
already known to the Server Explorer, or you can create a new one.
I picked the name asmDistanceLibrary for one reason:
I usually start the name of an object with some contraction of the object type
(except for tables). After removing the inane Class1.cs file, I added the new
file named DistanceLibrary and coded it like this:
using System;
using System.Data.Sql;
using System.Data.SqlTypes;
using Math = System.Math;
public class CDistanceLibrary
{
private const double PI_OVER_180 = 0.0174532925;
private static double radians(double DecimalDegrees)
{
return DecimalDegrees * PI_OVER_180;
}
public static SqlDouble ComputeDistance(SqlDouble FromLat,
SqlDouble FromLong, SqlDouble ToLat, SqlDouble ToLong)
{
double lat1, lat2, lon1, lon2,
dLong = 0.0, dLat = 0.0, subCalc = 0.0;
lat1 = radians((double)(FromLat));
lon1 = radians((double)(FromLong));
lat2 = radians((double)(ToLat));
lon2 = radians((double)(ToLong));
dLong = (double)(lon2 - lon1);
dLat = (double)(lat2 - lat1);
subCalc = (Math.Pow(Math.Sin(dLat / 2.0), 2.0))
+ Math.Cos(lat2) * Math.Cos(lat1)
* (Math.Pow(Math.Sin(dLong / 2.0), 2));
return ((2.0 * Math.Atan2(Math.Sqrt(subCalc),
Math.Sqrt(1.0 - subCalc))) * 3956.0);
}
};
If this code looks too familiar, good, it should. I have already
tested the code as a console application, and can be sure it is going to work.
The only notable differences are that I do not have a main()
function and the class name has changed to CDistanceLibrary
at the IDE's suggestion. Just compile, deploy it, and I should be ready to go,
right? After all, when I deploy, it should push the DLL over to SQL Server Yukon and make it ready to go, right?
Well, actually...no. At least, not yet. To understand why, though,
we have to take a bit of a detour into what is really going on between SQL Server Yukon
and the CLR.
In normal operation, SQL Server processes T-SQL queries much like
a script interpreter would. Each statement in the query causes the query engine
to take some action, like fetching data from the disk, ordering rows, and so
on. Normally, all of the programmatic logic involved is — more or less
— contained within SQL Server libraries. Two known exceptions exist:
extended stored procedures and CLR assemblies. When the query engine encounters
a request to use either of these services, it attempts to resolve an entry point
into them. When it finds that entry point, it passes whatever data is indicated
to those functions and then passes execution control to that logic. When the
procedure or assembly returns control of execution back to SQL Server, the query
engine picks up the results and errors and proceeds as normal.
Therein lies the answer to why deploying a VS .NET Whidbey project
does not appear to have any effect on the SQL Server. At least with the PDC
bits, it appears that deploying a project actually registers the assembly with
SQL Server Yukon, but it has no idea where the entry point to it is. We will have
to do that by hand.
SQL Server Workbench Project
The process of registering an assembly with SQL Server Yukon is actually
darned simple. Start by running SQL Server Workbench and start a new SQL Server Query.
The statement you want to issue is:
USE [Your Database Name Goes Here]
DROP FUNCTION clrComputeDistance
GO
DROP ASSEMBLY asmDistanceLibrary
GO
CREATE ASSEMBLY asmDistanceLibrary
FROM '[The path to your DLL]\asmDistanceLibrary.dll'
WITH PERMISSION_SET = SAFE
GO
Note that you will need to change this query to go into the database
to which you want to have the assembly known. You will also need to update the
path to where your DLL was compiled.
Now that SQL Server Yukon knows where your assembly is, we also need to
make it usable as one of the programmatic objects Query Analyzer can use. I
really like user-defined functions for this kind of work, but you could
use stored procedures just as well. The T-SQL needed is:
CREATE FUNCTION dbo.clrComputeDistance
(
@lat1 as float,
@lon1 as float,
@lat2 as float,
@lon2 as float
)
RETURNS float
AS EXTERNAL NAME asmDistanceLibrary:CDistanceLibrary::ComputeDistance
GO
If you have not worked with extended stored procedures, the only
new thing here might be the AS EXTERNAL NAME clause. What this example does
is ask SQL Server Yukon to register an entry point into ComputeDistance
method of the asmDistanceLibrary assembly, CDistanceLibrary
class.
We can test the function rather easily now with this query (see FIGURE 3):
SELECT dbo.clrComputeDistance(40.7539,-96.6428, 41.28692,-96.07023)

FIGURE 3: Testing the query in SQL Server Workbench
Finally, we can get back to the business problem, and figure out
what airports are closest to our sites. It is just a simple query away now (see FIGURE 4):
SELECT site.name,site.city,site.state
,airport.name,airport.iataCode
,ROUND(DBO.UDFCOMPUTEDISTANCE(site.LATITUDE,site.LONGITUDE,airport.LATITUDE,airport.LONGITUDE),2) AS distance
FROM sites site,airports airport
WHERE DBO.CLRCOMPUTEDISTANCE(site.LATITUDE,site.LONGITUDE,airport.LATITUDE,airport.LONGITUDE) <= 100
ORDER BY site.name,distance

FIGURE 4: The finished product
Conclusion
We have seen why and how you can use SQL Server Yukon with
the .NET assemblies generated by VS .NET Whidbey. To me, there are four instances
where it makes sense to use this technology. The first is when you have some
complex mathematics involved in an underlying calculation in a repeated fashion
as I demonstrated in this tutorial. The benefit here is dramatically faster
calculations. Another appropriate use is when you have to accomplish something
for which T-SQL is not well suited, like parsing comma-delimited strings into
a list of values. A third case arises when you want to accomplish something
impossible to do with T-SQL alone. Suppose, for example, you have developed
a stored procedure that returns the results of a query as an XML document fragment,
but what you really want is the result of applying an XSL style sheet that renders
the document as HTML. This is painful to do with T-SQL, yet is quite easy with
the .NET CLR. The last case is to avoid creating or maintaining extended stored
procedures in C++. Not that C++ is bad, but as we have seen here, writing assemblies
is much easier.
Now if somebody could make getting some great Peanut Butter Chicken
that easy....
There are a few of technical problems worth
mentioning:
- The formula literally computes the distance between the two points
as a straight line, and ignores any changes in the elevation of the two points.
You really cannot use this formula to say that one point is a so-many mile
drive from one point to another. It might be accurate for crows, but it is
very unlikely to be spot on for driving distance.
- No airport or building that I am aware of is infinitely small
like a point. In fact, airports can be massive in terms of the area they fill.
Given this, it is best to think of latitudes and longitudes as being within
a given facility, rather than representing the exact location of a facility.
- Finally, due to the squished-ball shape of the Earth, the digital
nature of computers, and the inaccuracy in our mapping tools, there are bound
to be some rounding errors involved.