Using SQL Server Yukon's .NET CLR Features Practically
By Kent Tegels MCDBA, MCSE+I, MCP+SB
Published: 1/26/2004
Reader Level: Intermediate
Rated: 4.67 by 6 member(s).
Tell a Friend
Rate this Article
Printable Version
Discuss in the Forums

When I was first introduced to Peanut Butter Chicken, I had many questions: What does it taste like? Why would one want to mix Peanut Butter and Chicken? How do you prepare it? Who thought this up? I was hooked for life with the first bite.

When I first heard that the next version of Microsoft SQL Server, "Yukon," would have the .NET Common Language Runtime (CLR) hosted within it, similar questions arose: Why would you want to do that? How do I take advantage of it? Within a few minutes, I found myself looking at a number of examples that, frankly, did not convince me that there was a good reason to use this feature. What I really needed to see was a meaty business problem where using the CLR was practical. Thankfully, a good business problem was at hand.

The Problem

I work at a leading Engineering and Architecture firm headquartered in Omaha, Nebraska: HDR, Inc. The company has more than just an Omaha office. In fact, HDR has over 100 offices and project sites. This means that it is common for our staff members to travel from office to office as they work on projects. That travel is frequently air travel and, sometimes, people are not sure which airports are close to the offices. So how could I help?

Being a database-thinker, my first thought was to have a simple database:

  • A table containing location information about sites, named Sites.
  • Another table containing location information about various airports, named Airports.
  • A third table that associated each site with one or more airports, named — rather non-creatively — SitesAirports.

This design has a couple of problems. First, the person who maintains the data would have to research which airports were close to each site and they'd have to do that again whenever we added a new office. When an airport opened or closed, there would be similar work to do. Finally, there was the question of "What do you mean by close?"

Geometry Is Easier Than Calculus

One nice side benefit of working with HDR is that people who love using technology to solve problems surround me. HDR has a number of professionals versed in Geographical Information Systems (GIS). The basic value of GIS is that it allows you to relate things in a physical space with information-driven models, and that is exactly what I needed to do here. In an information-driven system, I needed to relate to entities: Sites and Airports. Why not borrow a fundamental GIS concept: latitudes and longitudes?

As you probably know, latitudes and longitudes represent points on Earth. Latitudes represent the distance from the Poles (North and South), while longitudes represent the distance East or West of the Prime Meridian. Both latitudes and longitudes are expressed in degrees of a circle. For example, some of the best Peanut Butter Chicken I ever had was served up by a little restaurant in Lincoln, Nebraska. That restaurant is, more or less, located at 40°45"14' North, 96°38"44' West in terms of latitude and longitude. Decimal degrees are more commonly used; in this case, 40.7539, -96.6428.

Since we knew the address of each of our offices, it simply became a matter of figuring out their latitudes and longitudes. We used MapPoint to find them, but you could also use MSN Maps as well as other mapping packages. Finding the latitudes and longitudes of airports was a bit different though. I did a few searches on Google and came up with a good list of airports and their locations. That left me with one task: How would I associate a site with an airport?

Turns out that if you have the latitudes and longitudes of two places, you can quickly determine the distance between them with this simple formula (expressed in pseudo-code):

dlon = lon2 - lon1
dlat = lat2 - lat1
a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2
c = 2 * atan2(sqrt(a), sqrt(1-a))
d = 3956 * c

The 3,956 value is the median radius of the Earth from pole to pole, in miles.

Since we can compute the distance between a given site and airport, we can relate them as being closer or more distant. (See the footnote below for more information.)

While our formula seems simple, it is actually more computationally complex than, say, finding the amount of sales tax to charge. This is a good thing when it comes to using the CLR functions in Yukon.

Since we need to use this calculation in queries, one of our Database Administrators developed a user-defined function as follows:

CREATE FUNCTION dbo.udfComputeDistance
(
  @lat1 float,
  @lon1 float,
  @lat2 float,
  @lon2 float
)
RETURNS float
AS
begin
  -- dLong represents the differences in longitudes
  -- while dLat is the difference in latitudes
  declare @dLong float
  declare @dLat float
  -- To keep the calculation easier to understand,
  -- we have simplified it by computing it by parts.
  -- This value temporarily holds the value of the
  -- first calculation.
  declare @temp float
  -- Convert the decimal degrees to radians
  set @lat2 = radians(@lat2)
  set @lon1 = radians(@lon1)
  set @lat1 = radians(@lat1)
  set @lon2 = radians(@lon2)
  -- Compute the degree differences
  set @dLong = @lon2 - @lon1
  set @dLat = @lat1 - @lat2
  -- Compute the first part of the equation
  set @temp = (square(sin(@dLat/2.0))) + cos(@lat2) * cos(@lat1) * (square(sin(@dLong/2.0)))
  -- Return the approximate distance in miles
  -- Note that 3956 is the approximate median radius of the Earth.
  return (2.0 * atn2(sqrt(@temp), sqrt(1.0-@temp)))*3956.0
end

We used this function to find all of airports within 100 miles of each of sites with a simple query like this:

select s.name
,s.city
,s.state
,a.name
,a.iatacode, dbo.udfComputeDistance(s.latitude,s.longitude
,a.latitude,a.longitude) as distance
from sites s, airports a
where dbo.udfComputeDistance(s.latitude,s.longitude
,a.latitude,a.longitude) <= 100.0
order by 1,4

Query Analyzer reported that it took about seven seconds to run this query, and the ordering of results was less than 1% of the execution time. Since this information is used infrequently, we coded the query up into a stored procedure and left it at that. While it is a simple and working solution, those seven seconds bothered me a bit. After all, we are not running our SQL Servers on computers powered by first-generation Pentiums!

Epiphany

It should not come as any surprise that the time culprit is how SQL Server Yukon writes computations as T-SQL. The query engine is not designed to give the CPU a set of instructions optimized for doing this type of computation, however, the CLR is. Having the CLR hosted in SQL Server Yukon is better because you can give the CPU optimized instructions for numerical computations and just as important, you don't have to reinvent functionality already found in the .NET Framework.

The first byte, if you will, that addicted me to using the CLR in SQL Server Yukon was cutting the execution time for this query down from seven seconds to one second by using a CLR-based user-defined function. By hosting the CLR within SQL Server Yukon, we get the best of both worlds: a great set-oriented tool for relating data (T-SQL) and an equally great function-oriented tool for working with data in your .NET language of choice.

Using the CLR

Now, if there is a trick to programming that I have learned in last 20 or so years, it is simply to start with what you know. In this case, I decided to get most of the bugs out of the function I wanted to make available in SQL Server Yukon by debugging code using the Visual Studio .NET (VS .NET) "Whidbey" IDE. This would be much easier than trying to debug the code once imported into the Server. I first created a C# console project and coded the following as Class1.cs:

using System;
using System.Data;
using System.Data.SqlTypes;
namespace test
{
  class Class1
  {
    private const double PI_OVER_180 = 0.0174532925;
    private static double radians(double DecimalDegrees)
    {
      return DecimalDegrees * PI_OVER_180;
    }
    public static SqlDouble ComputeDistance(SqlDouble FromLat,
      SqlDouble FromLong, SqlDouble ToLat, SqlDouble ToLong)
    {
      double lat1, lat2, lon1, lon2,
      dLong = 0.0, dLat = 0.0, subCalc = 0.0;
      lat1 = radians((double)(FromLat));
      lon1 = radians((double)(FromLong));
      lat2 = radians((double)(ToLat));
      lon2 = radians((double)(ToLong));
      dLong = (double)(lon2 - lon1);
      dLat = (double)(lat2 - lat1);
      subCalc = (Math.Pow(Math.Sin(dLat / 2.0), 2.0))
        + Math.Cos(lat2) * Math.Cos(lat1)
        * (Math.Pow(Math.Sin(dLong / 2.0), 2));
      return ((2.0 * Math.Atan2(Math.Sqrt(subCalc),
        Math.Sqrt(1.0 - subCalc))) * 3956.0);
    }
    [STAThread]
    public static void Main(string[] args)
    {
      Console.WriteLine(
        ComputeDistance(40.7539,-96.6428, 41.28692,-96.07023));
      Console.ReadLine();
    }
  }
}

I know from some previous exploration with MapPoint that the restaurant that makes my favorite Peanut Butter Chicken was about 50 miles from where I live. Plugging those latitudes and longitudes into my test program, it determined that I am less than 48 miles from it, as shown in FIGURE 1. Close enough for me to want to go get some!

screenshot

FIGURE 1: Testing the latitudes and longitudes

Feeling confident that my function worked correctly enough, I started a new SQL Server Project named asmDistanceLibrary. As shown in FIGURE 2, this project type will show up in the New Project dialog box when you install SQL Server Yukon's developer bits on a computer and then install VS .NET Whidbey.

screenshot

FIGURE 2: The New Project dialog box

In talking to some other folks, it is much easier to use the Whidbey-provided templates for these projects. The reason for doing this is that you have to bind in three other Dynamic Link Libraries (DLLs) into your project's output DLL so that SQL Server Yukon can load and register your function:

  • ClrCppModule.dll
  • Sqlaccess.dll
  • Microsoft.VisualStudio.DataTools.SqlAttributes.dll

After you accept the project name and location, you will be prompted to define a database connection. You can select from one of the connections already known to the Server Explorer, or you can create a new one.

I picked the name asmDistanceLibrary for one reason: I usually start the name of an object with some contraction of the object type (except for tables). After removing the inane Class1.cs file, I added the new file named DistanceLibrary and coded it like this:

using System;
using System.Data.Sql;
using System.Data.SqlTypes;
using Math = System.Math;
public class CDistanceLibrary
{
  private const double PI_OVER_180 = 0.0174532925;
  private static double radians(double DecimalDegrees)
  {
    return DecimalDegrees * PI_OVER_180;
  }
  public static SqlDouble ComputeDistance(SqlDouble FromLat,
    SqlDouble FromLong, SqlDouble ToLat, SqlDouble ToLong)
  {
    double lat1, lat2, lon1, lon2,
    dLong = 0.0, dLat = 0.0, subCalc = 0.0;
    lat1 = radians((double)(FromLat));
    lon1 = radians((double)(FromLong));
    lat2 = radians((double)(ToLat));
    lon2 = radians((double)(ToLong));
    dLong = (double)(lon2 - lon1);
    dLat = (double)(lat2 - lat1);
    subCalc = (Math.Pow(Math.Sin(dLat / 2.0), 2.0))
      + Math.Cos(lat2) * Math.Cos(lat1)
      * (Math.Pow(Math.Sin(dLong / 2.0), 2));
    return ((2.0 * Math.Atan2(Math.Sqrt(subCalc),
      Math.Sqrt(1.0 - subCalc))) * 3956.0);
  }
};

If this code looks too familiar, good, it should. I have already tested the code as a console application, and can be sure it is going to work. The only notable differences are that I do not have a main() function and the class name has changed to CDistanceLibrary at the IDE's suggestion. Just compile, deploy it, and I should be ready to go, right? After all, when I deploy, it should push the DLL over to SQL Server Yukon and make it ready to go, right?

Well, actually...no. At least, not yet. To understand why, though, we have to take a bit of a detour into what is really going on between SQL Server Yukon and the CLR.

In normal operation, SQL Server processes T-SQL queries much like a script interpreter would. Each statement in the query causes the query engine to take some action, like fetching data from the disk, ordering rows, and so on. Normally, all of the programmatic logic involved is — more or less — contained within SQL Server libraries. Two known exceptions exist: extended stored procedures and CLR assemblies. When the query engine encounters a request to use either of these services, it attempts to resolve an entry point into them. When it finds that entry point, it passes whatever data is indicated to those functions and then passes execution control to that logic. When the procedure or assembly returns control of execution back to SQL Server, the query engine picks up the results and errors and proceeds as normal.

Therein lies the answer to why deploying a VS .NET Whidbey project does not appear to have any effect on the SQL Server. At least with the PDC bits, it appears that deploying a project actually registers the assembly with SQL Server Yukon, but it has no idea where the entry point to it is. We will have to do that by hand.

SQL Server Workbench Project

The process of registering an assembly with SQL Server Yukon is actually darned simple. Start by running SQL Server Workbench and start a new SQL Server Query. The statement you want to issue is:

USE [Your Database Name Goes Here]
DROP FUNCTION clrComputeDistance
GO
DROP ASSEMBLY asmDistanceLibrary
GO
CREATE ASSEMBLY asmDistanceLibrary
FROM '[The path to your DLL]\asmDistanceLibrary.dll'
WITH PERMISSION_SET = SAFE
GO

Note that you will need to change this query to go into the database to which you want to have the assembly known. You will also need to update the path to where your DLL was compiled.

Now that SQL Server Yukon knows where your assembly is, we also need to make it usable as one of the programmatic objects Query Analyzer can use. I really like user-defined functions for this kind of work, but you could use stored procedures just as well. The T-SQL needed is:

CREATE FUNCTION dbo.clrComputeDistance
(
  @lat1 as float,
  @lon1 as float,
  @lat2 as float,
  @lon2 as float
)
RETURNS float
AS EXTERNAL NAME asmDistanceLibrary:CDistanceLibrary::ComputeDistance
GO

If you have not worked with extended stored procedures, the only new thing here might be the AS EXTERNAL NAME clause. What this example does is ask SQL Server Yukon to register an entry point into ComputeDistance method of the asmDistanceLibrary assembly, CDistanceLibrary class.

We can test the function rather easily now with this query (see FIGURE 3):

SELECT dbo.clrComputeDistance(40.7539,-96.6428, 41.28692,-96.07023)

screenshot

FIGURE 3: Testing the query in SQL Server Workbench

Finally, we can get back to the business problem, and figure out what airports are closest to our sites. It is just a simple query away now (see FIGURE 4):

SELECT site.name,site.city,site.state
,airport.name,airport.iataCode
,ROUND(DBO.UDFCOMPUTEDISTANCE(site.LATITUDE,site.LONGITUDE,airport.LATITUDE,airport.LONGITUDE),2) AS distance
FROM sites site,airports airport
WHERE DBO.CLRCOMPUTEDISTANCE(site.LATITUDE,site.LONGITUDE,airport.LATITUDE,airport.LONGITUDE) <= 100
ORDER BY site.name,distance

screenshot

FIGURE 4: The finished product

Conclusion

We have seen why and how you can use SQL Server Yukon with the .NET assemblies generated by VS .NET Whidbey. To me, there are four instances where it makes sense to use this technology. The first is when you have some complex mathematics involved in an underlying calculation in a repeated fashion as I demonstrated in this tutorial. The benefit here is dramatically faster calculations. Another appropriate use is when you have to accomplish something for which T-SQL is not well suited, like parsing comma-delimited strings into a list of values. A third case arises when you want to accomplish something impossible to do with T-SQL alone. Suppose, for example, you have developed a stored procedure that returns the results of a query as an XML document fragment, but what you really want is the result of applying an XSL style sheet that renders the document as HTML. This is painful to do with T-SQL, yet is quite easy with the .NET CLR. The last case is to avoid creating or maintaining extended stored procedures in C++. Not that C++ is bad, but as we have seen here, writing assemblies is much easier.

Now if somebody could make getting some great Peanut Butter Chicken that easy....

Footnote

There are a few of technical problems worth mentioning:

  • The formula literally computes the distance between the two points as a straight line, and ignores any changes in the elevation of the two points. You really cannot use this formula to say that one point is a so-many mile drive from one point to another. It might be accurate for crows, but it is very unlikely to be spot on for driving distance.
  • No airport or building that I am aware of is infinitely small like a point. In fact, airports can be massive in terms of the area they fill. Given this, it is best to think of latitudes and longitudes as being within a given facility, rather than representing the exact location of a facility.
  • Finally, due to the squished-ball shape of the Earth, the digital nature of computers, and the inaccuracy in our mapping tools, there are bound to be some rounding errors involved.


Marketplace
(Sponsored Links)
What are the green links?
   



 
Copyright 2007 CMP Tech LLC | Hosted By SecureWebs.com
Privacy Policy (4/10/06) | Your California Privacy Rights (4/10/06) | Terms of Service | Advertising Info | About Us | Help