March 2005 - Posts

Errors, warnings and information events - The non-feature feature

Have you noticed how errors, warnings and informations events don't get much discussion or ink in the press. We talk a lot about the high profile new features in Integration Services vs. DTS, the new dataflow task and other features, but we rarely mention errors. This non-feature is one of the more important changes in SSIS, but yet we hear and see very little discussion about them. Errors are something that few really appreciate until they need them, and then if they're not right, we rightly get complaints. But if they're good errors, that describe the problem well so that it can be easily diagnosed and fixed, it just feels good. The problem gets fixed and we move along to the next task at hand.

The IS team put countless hours into getting errors right. It's one of the unsung new features along with design time validation that doesn't get much credit or thought. Here's a very loose definition of errors, warnings and information events.

Errors are emitted when a component or package is known to be broken. Take the Send Mail task for example. Drop it on the designer surface and run the package and before the package even runs, you get a message box with these error messages in them:

Error at Bogus Task [Send Mail Task]: SMTP Server not specified
Error at Bogus Task [Send Mail Task]: No recipient specified
Error at Bogus Task: There were errors during task validation.

Instantly you know that the “Bogus Task” which is a task of type Send Mail Task had errors. In fact, you don't even need to run the package to know that the task has errors because it shows a little red and white x icon in the top right corner. This feature lets you know at a glance which tasks have errors. There is also a little ! icon for tasks that have warnings. These improvements are small, but drastically simplify and assist in getting packages up and running quickly.

Warnings are emitted whenever there is a cause for concern. For example, the Send Mail Task emits an error when you attempt to send email without a subject. This class of notifications typically won't cause the package to fail, but they may cause problems that are less severe.

Information events are emitted whenever there is something happening internally (inside the workflow or dataflow engines etc.) that the user should know about. This class of events doesn't imply a problem, but rather conveys information about the package as it runs. Examples of information events are when the runtime starts, commits or rolls back a transaction, when a package starts, completes and is successful or failed, the preparation stages for the dataflow task, when package migration succeeds, and when the FTP task sends and receives files and so forth.  

So, how does all this work?

Components raise error, warning and info events through the events interfaces. The designer picks certain intervals or edge events to call Validate on components and passes an event interface into components. One of the edge events the designer uses is when the designer dialog for a component has been closed. Also, when the designer first opens the package, it lazy validates the package. Lazy validation simply means that the designer does the validation when there isn't other stuff going on. It usually happens instantly, so it isn't that noticeable, but it doesn't always happen the moment you open the package either.

When the designer calls validate on components, they look at their settings or properties. There are certain properties that must be set for them to succeed if their execute method were called. Components know what those properties are and in general what they should look like. For the Send Mail task, certainly an empty recipient (To line) would mean that the task will fail if executed. So, the task raises an error and the designer, listening for errors, captures it and places it in the Errors window. Chances are you've never seen this window because it doesn't come up by default.

If you select “View” in the main menu, then “Other Windows” and then “Error List” it will show you the following window:

Now, if I right click on the error and select Online Help, it will open a browser window with help that is constantly being updated with the latest information. This is still a relatively new feature, so few of the errors show yet. But, give this a try the next time you get an error. As we get closer to shipping you should start to see more and more content about the errors and how to fix them. The beauty of this system (Kudos to our User Education folks) is as we get more information about how to diagnose the error and correct it, we can update the entries per error on the website. We can even provide links to MSDN articles etc. that will explain the error in more detail.

Integration Services has an enormous number of errors and warnings. It's a much more complex product than DTS and therefore, if we were to stick with the same level of error support that DTS had, I think our users would have been in trouble. Still, we see posts and internal questions on support aliases where people post the errors and the package, and then ask what went wrong. A large percent of the time, we're able to determine what the user has done or why the package is broken simply based on the errors they include in the post without ever opening up the package. I think that folks aren't used to getting informative errors and error stacks from their ETL tool. :) There are at least 2100 errors and warnings in IS and that number is growing everyday.

When you get some time, take a look at the errors. For the most part, they are very helpful (If not, please let us know and we'll try to make them better) and from my experience, folks don't fully understand how much information these errors and warning communicate and would benefit from reading them closer. Take some time to read what errors, warnings and information events are telling you. They usually provide a pretty good start into diagnosing problem packages.

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Next CTP available soon...

We're really close to sending out the next CTP. There are a few notable changes in Integration Services.

  • Package Reader/Writer roles enabled in Management Studio. You probably haven't heard much about the security features in Integration Services because not a lot of people know about them. I'll blog on this soon, but this is part of that set of security features that have recently come online.
  • Better access to DTS 2000 packages node
  • You can launch the migration wizard directly from the DTS 2000 packages node in Management Studio.
  • Important bug fixes
  • There are also some known issues.

These are all noted in the CTPNotes (readme) but my guess from the questions being asked on newsgroups etc. is that few people actually read that document because:

  1. People don't know about it. Now you do.
  2. It's very long. It is very long, but it's also separated into product interest areas so that if you're only interested in replication or reporting services etc., you can read just the portions that pertain to those products.
  3. Didn't know it had this kind of information. The product specific information is toward the end of the document. Scroll down until you see the product name in which you're interested to get the latest changes.

It's worth a quick runthrough even if you don't read everything. There are probably one or two people in the world who read the entire document. But if you take a few minutes and breeze through it while you're waiting for it to install etc. I think it will be worth your time.

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Table driven foreach loops

In an earlier blog I mentioned that we'd have a fix for the ForEachADOEnumerator. Before the fix there were some pretty involved steps required to build an ADO.Net recordset and enumerate over it in the foreachloop.
This is no longer the case.
Now with a few clicks and a query or two, you can drive whole packages inside a foreach loop with rows from any table. This is a powerful integration between the workflow of IS and your existing data wherever it may exist.

I've built a deceptively simple package you can run that illustrates this. The package has an Execute SQL Task and a ForEach Loop in it. The SQL Task queries the local server's AdventureWorks database to get some names and titles into a record set. It looks like this:

The Execute SQL Task script looks like this:

USE ADVENTUREWORKS
GO

SELECT Con.FirstName, Con.LastName, Emp.Title
FROM HumanResources.Employee Emp, Person.Contact Con
WHERE Con.ContactID = Emp.ContactID
AND Con.FirstName LIKE 'D____'

The resulting data from this query looks something like this:

No big deal so far, just a SQL Task executing a query. The trick is how to get the rows of the result set into the workflow. How to enumerate over those and do meaningful work in the package that is driven by the data in the result. For this simple example, the “meaningful work” is to execute a script task that shows a message box with the data. Applications for this integration however are limitless. Any time you need to execute a section of work flow per row of a result, this method provides a way.

Here are some examples:

  • Sending out email notifications to clients when it's time to renew subscriptions.
  • Executing sub-packages that have been queued up through a management console that stores resulting queue requests in tables.
  • Processing employee edge events as described in human resources databases. Events such as Birthdays, children turning 18, option expirations, etc.
  • CRM stale dates. Reminding sales people to contact customers who haven't been contacted within N weeks.
  • Processing sales logs to find potential fraud, bad checks or stolen visas.

The trick is to get the value of the columns of each row into a form that it can be used inside the package. For Integration Services, the way to do that is to get the columns values into variables. To do this, I need to take the recordset returned from the SQL Task and somehow iterate over it. The ForEachADOEnumerator was created just for that purpose. Here's what the settings look like in the ForEach Loop editor:

The query only selects results that constitute one table. So I'm only interested in the first table. Also, note that the ADO object source variable is is the User::ADORecords variable. That means that the ForEachADOEnumerator will look for the recordset in that variable. Go back to the SQL Task and you'll find it specifies the ADORecords variable as the destination for the result set.

So here's how it works. The SQL Task executes a query. It builds a recordset with the results from the query and places the recordset object into the ADORecords variable. Then the ForEachLoop executes and initializes the ForEachADOEnumerator. The ForEachADOEnumerator has been configured to retrieve the ADORecords variable and enumerate over the recordset stored there. Next, we need to get the values of the columns in the recordset into variables. I do that on the variable mappings page of the ForEach Loop.

This tells the ForEachLoop to map the columns from the recordset to the variables as described. In this case, the first column, column at index 0, should be “mapped” to the variable named Column1 and so forth. To see this in action I've set a breakpoint to fire at the beginning of every loop iteration. Like this:

Then, when running the package and I hit the breakpoint, it looks something like this:

I've also set up my watch to show me what the variable values look like before the first execution of the loop. The watch with the three mapped variables look like this:

See the correlation between the first row of the results from the query above and the first values to be set for the variables? Each iteration of the ForEachLoop will now correspond to one row in the result and the ForEach Loop will iterate 14 times for each row in the resulting recordset.

The code in the script task to show the variable values is trivial, but I'll add it here for completeness. As I've noted in other blog posts, you'll need to remember to place the name of the variables in either the “ReadOnlyVariables“ or “ReadWriteVariables“ property to make this code work.

Dim Message As String
Message = Dts.Variables("Column1").Value + " " + Dts.Variables("Column2").Value + ", " + Dts.Variables("Column3").Value
MsgBox(Message)
Dts.TaskResult = Dts.Results.Success


Now when I execute the package, I get a messagebox like this:

Driving loops with tables this way is extremely powerful. Now, thanks to the recent modifications to the ForEachADOEnumerator, it's also very easy to do.

I've posted the package here so you don't have to recreate it.

Hope this helps you understand enumerators better, especially the class of enumerators we call matrix, multi-valued or enumerator enumerators. That is, enumerators that produce more than one value per iteration. The ForEachADOEnumerator provides an enormous opportunity to build work flow that is driven by existing data in its existing form, in a database.

If you have any comments or input on this, I'd love to hear it. I know some of you have been asking for a simpler way to do this. I hope this satisfies. :)

Universe.Earth.Software.Microsoft.SQLServer.SSIS.KirkHaselden

Integration Services VS. Informatica

Jamie Thomson has been working with SSIS for a few months now and has posted more of his findings. I won't steal his thunder by telling you what they are, but it's worth a look.

http://blogs.conchango.com/jamiethomson/archive/2005/03/22/1172.aspx

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.SSIS.KirkHaselden

This post is not about Paris Hilton, Britney Spears, Beyonce, or Christina Aguilera

Donald Farmer and I were talking about blogs and hits etc. and noticed that posts with certain phrases in their title get a lot of hits. He suggested a little experiment that I post something with Paris or Britney in the title. Well, just for fun, let's see how many hits this one gets. :)

And, please, no complaints from across the pond because I left out the Spice Girls. :)

K

Fleeting issue with debugging in the designer in the February CTP

If you attempt to debug a package and the package will not run in the designer. Also, if you attempt to execute the package by selecting Debug | Start Without Debugging and it correctly executes, you may be seeing an issue with setup that we've seen on some machines.

If you're experiencing this problem, look the registry value under:

HKEY_CLASSES_ROOT\TypeLib\{3ED4D7CC-1ECC-4D11-9AC7-FFA71CE45705}\1.0\0\win32

It should look something like this:

C:\Program Files\Microsoft Visual Studio 8\Common7\IDE\PrivateAssemblies\DtsDebugHost.exe

If it has quotes around it, remove the quotes. That should fix the problem.

Thanks to Michael Entin on the SSIS dev team for figuring this out for one of our customers. Hopefully you haven't seen this, but if you do, this should fix the problem.

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.SSIS.KirkHaselden

Normalizing flat files

A common question asked on newsgroups is how to normalize a flat file with DTS.

With DTS the answer was usually, you load the file into a table and then perform queries on the table to extract the colums into normalized tables. For large flat files this can be a real pain because you must use a different set of tools to perform a simple operation and it may be unnecessarily time consuming because you have to load the data before you can even begin to transform it. Yes, ELT.  Finally, you would need to use one process to extract the data and another to transform it, decentralizing the process.

It would be great if it were possible to normalize the data as it flows through the pipeline for a one time load sequence. With SSIS, you can transform the data as you load it, ETL.

For example, maybe a flat file contains header/detail information in the same row. Likely, since the flat file isn't normalized there's some redundant header data. One way to do this in Integration Services is by splitting the flow from a flat file source adapter. For the simple case, where you only need to split the flat file into two tables, use the multi-cast to split the flow into two. On the one flow, the one to extract header information, use the sort t\ransform to remove duplicates. Then, use a lookup that references the header destination table. If the header data (row) is already there, based on the key, then you'll want to discard that row. Here's a case where the error flow comes in handy. For the lookup, flow the rows not found (errors) into the header destination table. Then, only those rows that aren't already in the table will get inserted.

On the other flow, the details flow, you can just flow this directly to the details table. No need to remove duplicates etc. if the details are unique transaction line items etc.and there shouldn't be any redundency. If there is, you can build a flow similar to the one for header information to eliminate dups.

If there is information that needs to be separated into more than 2 tables similar flows can be created along the same lines.

The point here is that it can all be done in the pipe and within one process keeping it centralized and easier to debug and maintain. In some cases, it may also perform better, especially in cases where the header row count is low or there is a low threshold of deviance from a small set of header data.

Update: Just wanted to add one last thing. ETL scenarios all have one thing in common, they're all different. So the approach to them will be different as well. Ultimately, how you approach your solution is going to depend largely on the requirements of the particular project. This post is not about establishing _the_ correct SSIS methodology, but more to explain the expanded possibilities that are available with SSIS vs. DTS.

Universe.Earth.Software.Microsoft.SQLServer.SSIS.KirkHaselden

Raw adapters to use variable access mode

We received feedback that the raw adapters were difficult if impossible to configure. We've modified the raw adapters so that it's possible to easiliy configure them. They now use a variable access mode. The variable points to the location of the raw file and we all know how easy it is to configure variables, right?

This should make it easier to deploy packages with the raw file adapters in them.

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

 

New Microsoft.com dedicated SSIS web page...

Here
It's a bit sparse at the moment, but more content is coming.

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

DevTeach 2005

For those who may be interested, I'm going to be speaking at DevTeach, June 18-22 in Montreal, Quebec.

http://www.devteach.com/Index.asp

I'll be giving three talks about... You guessed it, SSIS.

“What's New in DTS”, “Configuring and Deploying SSIS Packages” and “Data Flow in SSIS”.

Looks like it will be a great conference. A lot of the SQL Junkies crowd will be there as well.

If you're going, please stop by and say hello. Apparently, I have to learn French... :)

J'attends avec intérêt de vous voir tout là.

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Donald speaks...

Donald's post speaks volumes about why he's such an incredible asset to the SQL Server Integration Services team. This is a great post about the position and strength of SSIS, why we integrate the way we do with other vendors, and why it will only get better.

Well, aside from the fact that we have Donald. :) 

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Contingent execution based on stored procedure results I

Well, I've been a little pre-occupied lately. Between rebuilding machines, my kids getting sick, getting sick from my kids, and my day job, I haven't been blogging for a while. So I've got a few stored up. Look out, here they come. :)

Here's one you might find interesting. Say you have the following requirements:

  • You have a stored procedure that queries for some state on your server.
  • You want to conditionally execute portions of your package based on the results of the stored procedure.

For this particular case, the user wants to determine if a given table exists and if it does, if it is a user table.

IF EXISTS (select * from dbo.sysobjects where id = object_id(N'[dbo].[AWBuildVersion]')) and

OBJECTPROPERTY(id, N'IsUserTable') = 1

 

So, the problem is, how to execute the query and get the results back into SSIS.

You've got a few options here. Return the result in a result set and iterate over it in a ForEach Loop with the ADO enumerator or such.

But, for this case, I'm only interested in the one value.
Here's how I addressed this and it's pretty straightforward.
I created a stored procedure like this:

create procedure [dbo].[uspGetUserTableExists](

@TableName varchar(256)

)

AS

BEGIN