Workflow Tips and Tricks (RSS)

Tips and tricks for how to do things in workflow

Caching in The SSIS Service

In the stream of thought category...

Integration Services is of all things a platform. While some platforms are pretty useless until you actually build something with them, SDKs and APIs like DirectX etc (Yes, definitions of “platform“ differ) others are extremely functional out of the box, Windows, Office etc.

Integration Services as a Platform

Integration Services is a platform in every sense of the word of the latter type. You can use it out of the box to generate functionality. You can build components that plug into it. You can build them in any COM conformant language or .NET language as well. For lack of better nomenclature, or ignorance thereof, I call this “Building Into”.

You can also fill components with code that gets executed (Script Task, SQL Task, Script Transform). “Building up”

You can build applications with SSIS embedded in the typical object model sense. “Building in”

You can just use SSIS to create packages and put together packages for building a solution. “Building on“

There's a cost associated with this. SSIS needs to keep track of all this “stuff“ that's going on. Where are the components. What kind are they, what are their capabilities and how do those components wish to be instantiated or persisted etc. It's surprising how much code in SSIS is devoted to managing, enumerating and tracking all this information.

Start up time for SSIS

We regularly get complaints that the startup time for packages is really long etc. It's true. In some cases, it's avoidable. We're looking at ways to make this better. In other cases, there are some things you can do to make it better as well.

What's one got to do with the other? Well, all those components need to be enumerated at some point. DTS's answer was to cache the component information in registry. Integration Services has many more component types and also supports both COM and managed components which include Logging Providers, Transforms, Source and Destination Adapters, Tasks, Connection Managers, and ForEach Enumerators. Managed components aren't really registry friendly. It is after all a different paradigm.

Integration Services uses the service to enumerate these components. It's running all the time. (It should be. If you're doing work in SSIS, you should have the service running.), and in case of reboot etc., it refreshes the cache automatically,

So, what happens when we enumerate components?

The runtime does a few things. It checks the registry for the following registry entry.

SOFTWARE\\Microsoft\\MSDTS\\Setup\\DtsPath

This tells IS where SSIS is installed. Typically, that value is something like:

C:\Program Files\Microsoft SQL Server\90\DTS

Yes, yes, I know. DTS is all over the place. Trust me, we would have changed it if we could, but that's a whole 'nuther blog entry.

If you go to the folder specified in that registry entry, you should see some subfolders named Connections, ForEachEnumerators, PipelineComponents, Tasks, and LogProviders. There are some others there as well. That's another blog too.

Now, the runtime looks in those folders for the managed components. If it finds the managed component there, it reflects on it. Interrogates it for things like it's component type, properties, Icon etc. This is how all the Infos collections on the Application object get populated. Or at least, this is part of how it works, for managed components. .Net has no notion of  “component categories“. It's a COM way of marking components as being in a certain class of components that supports certain interfaces. So, we've resorted to dropping the managed components into these folders. The different folders are really just to make it easier to organize more than a technical necessity.

Once all the managed components have been enumerated, the runtime looks for native components. This is a little easier because the search is assisted by “Component categories“. All we have to do is enumerate the components that are DTS components.

The service communicates with the runtime to retrieve this information and stores it for the next time you load a package.

Summary and tips

Bottom line is, there's a lot happening behind the scenes. To make startup better, keep the service running. Use the designer to design packages only. If you want to gauge performance or startup time, use DTExec.exe. There is very little overhead when running DTExec with the service running. If you don't need breakpoints while executing a package in the designer, use the execute without debugging feature found in the debug menu. That launches the package without attaching the debugger etc. If you're debugging in the designer, shut down as many of the windows as you can. Having 4 watches, the memory and output window all open at the same time along with the tools and project windows isn't really necessary while debugging. In fact, the VS environment will remember which windows you have open during execution vs. design time. Only open the windows you need during execution, that has a profound effect on perf.

Hope this helps,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Contingent execution based on stored procedure results I

Well, I've been a little pre-occupied lately. Between rebuilding machines, my kids getting sick, getting sick from my kids, and my day job, I haven't been blogging for a while. So I've got a few stored up. Look out, here they come. :)

Here's one you might find interesting. Say you have the following requirements:

  • You have a stored procedure that queries for some state on your server.
  • You want to conditionally execute portions of your package based on the results of the stored procedure.

For this particular case, the user wants to determine if a given table exists and if it does, if it is a user table.

IF EXISTS (select * from dbo.sysobjects where id = object_id(N'[dbo].[AWBuildVersion]')) and

OBJECTPROPERTY(id, N'IsUserTable') = 1

 

So, the problem is, how to execute the query and get the results back into SSIS.

You've got a few options here. Return the result in a result set and iterate over it in a ForEach Loop with the ADO enumerator or such.

But, for this case, I'm only interested in the one value.
Here's how I addressed this and it's pretty straightforward.
I created a stored procedure like this:

create procedure [dbo].[uspGetUserTableExists](

@TableName varchar(256)

)

AS

BEGIN

DECLARE @FlExists bit

IF exists (select * from dbo.sysobjects where id = object_id(N'[dbo].['+@TableName+']')

and OBJECTPROPERTY(id, N'IsUserTable') = 1)

BEGIN

SET @FlExists = 1

END

ELSE

BEGIN

SET @FlExists = 0

END

SELECT @FlExists AS 'Exists'

END

The stored procedure just checks for the existence of the table and then if the table is a user table and sets a flag based on that. 

The select statement forces a return set and forces the column name 'Exists' which will make it a little clearer later on what the return value is when we use it in the package.

Now that I have a stored procedure to call, I can call it in my package with an Execute SQL Task. I've created a simple package that calls the stored procedure and based on the return value, executes a different script task.

The package looks like this:

Package with a SQL Task and two script tasks

 

The Execute SQL Task calls the stored procedure passing in the name of the table to find by binding the TableName variable to the parameter of the stored procedure. That looks something like this:

 

Package with a SQL Task and two script tasks

See the question mark following the SQLStatement? That is the parameter. This is how you call a stored procedure with a parameter from the SQL Task. Next, we need to map the parameter to the input variable. You do that on the Parameter Mapping tab, shown here:

Package with a SQL Task and two script tasks

The TableName variable is a string variable that holds the name of a... table. Once you've gotten a value into a variable in SSIS, it's a simple matter to use it everywhere in the package. Variables have always been important in DTS, but they're crucial in IS. So, for example, this “package part” could be used as sublogic to determine if certain tables should be backed up based on a heuristic of some sort that keys off whether the table is a user table or not. The names of the tables could come from the SMO enumerator in a ForEach loop etc. The graphic above shows how to get the name of the table into the input parameter for the stored procedure. I can do this directly with the SMO enumerator and the script task, but like most scenarios in SSIS, there are many ways to accomplish this. What is the best way depends on your requirements.

OK, this post is getting pretty long. That's enough for now I think.

In the next post, I'll show you how you can get the return value from the SQL statement back into a variable, and how to use the return value in expressions on precedence constraints to conditionally execute different parts of the package. I'll also post the package so you can play around with it yourself.

Hope this is interesting and helpful. Please let me know if it's otherwise. :)

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

Expressions: Part III

Jamie's blog got me thinking about expressions and wanted to give some background and a little more information about how they work.

Background

Property expressions came relatively late in the development cycle. There was always an idea floating around on our team about putting expressions on properties. Even though it seems obvious now, it wasn't obvious to us early on how to do it. In fact, we visited the idea on several occasions, never coming to a decent approach or design. Very early on, we didn't even have the notion of an expression evaluator in the runtime at all. Even though we were writing one for the derived column and conditional split, we didn't make the conceptual leap as soon as we might have that we could use the expression evaluator for the runtime as well. Eventually, Sergei Ivanov and I were sitting in my office trying to solve the long time problem, how to replace property mappings with a more workable and manageable solution. We knew we wanted to use expressions somehow. By this time, we'd incorporated them into the variables, on the forloop and into precedence constraints. The problem was, how to do it in a way that would successfully apply the result at the right time. Sergei said something about applying them at execution time and then we realized we were on to something.

How They Work

The trick is “applying“ the result of the expressions at the right time. For tasks, that meant right before initialization, validation, and execution. For other objects, the key edge events are different. Here's when expressions are applied.

  • Tasks
    • Before Saving
    • After Loading
    • Before initialization
    • Before Validation
    • Before Execution
  • Connection Managers
    • Before Saving
    • After Loading
    • Before initialization
    • Before returning from AcquireConnection calls
  • Log Providers
    • Before Saving
    • After Loading
    • Before Initialization
  • ForEach Enumerators
    • Before Saving
    • After Loading
    • Before Initialization
    • Before returning from GetEnumerator calls

These object types are “hosted“. That means there is a host that is, for the most part, transparent in the designer that creates the component and wraps it. The hosts do a lot of stuff for the components so the components don't have to do it themselves. This factors out functionality to a centralized place and eliminates the need for custom component writers to re-implement all that code. Property expressions are one of those features implemented on the hosts. The host examines the components when it loads it the first time and discovers things like what properties the component exposes etc. It also tracks the property expressions that you create on the component and “applies“ the expression to the property or in other words, evaluates the expression and sets the corresponding property for the expression to the resulting value.

Hopefully, this helps you better understand the behavior of property expressions. As shown in previous blogs, property expressions are an extremely powerful way to dynamically modify package behavior based on various package external and internal parameters.

Hope this helps,

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden

 

 

Chatty script components...

For what it's worth...

Integration services has a script transform that may be used as a source or destination adapter or a transform and a script task for work flow tasks.
Both are feature rich and at times a bit too chatty. For example, if I have some invalid code like this:
Invalid code

Running the package I'll get a dialog box that will look something like this:
Invalid code dialog

This dialog will pop up for each error in the script. In most cases this is OK, but at times you don't necessarily want dialogs popping up, say for example if you're working on a demo or in a different part of the package that's unrelated to the script task.

There are a few options for you to eliminate these dialogs.

  • If you click on the "Stop Compiler" button, no more message dialogs will display. The "Continue Reporting" button will, of course, continue to report errors as the compiler finds them.
  • You can disable the task altogether by right clicking on the task you wish to disable and selecting “Disable“ from the context menu. This is very useful for package development and debugging.
  • You can disable the dialogs using a little known setting in the Package Property Pages dialog. Right click on the Project in the Solution Explorer. Select properties. In the “Debugging“ tab, there is a setting called InteractiveMode. Set that to false, and no script component dialogs will show up.
So long as the task is enabled and executes, you will still get warning and error feedback in the output window. You can open that window by selecting View from the main menu and selecting Other Windows and then Output. The output window will look something like this:
Output window 

These are a few ways to better control the debugging environment. Give them a try and let me know what you think. As usual, your (non-spam) feedback is much appreciated. :)

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden



One configuration, multiple packages...

This question comes up a lot.

“Can you use one configuration for multiple packages?“ The answer is yes. In fact, from the start, this was the scenario that we were planning for and ultimately is the way to abstract your packages away from the differences in the machines they are running on. Here's a really simple example that shows how to configure a property on a task, in this case, the subject line of a send mail task in two packages with the same configuration file:

Here's how you do it:

  1. Create two new packages in your solution.
  2. Drop a send mail task into each one
  3. Rename them both, SendMail. I know, not much imagination. Bear with me, it's late.
  4. Change the subject lines for both Send Mail tasks to something random like “On these three breezy trees the cheezy flees flew“
  5. Right click on one of the packages and select configurations.
  6. Create a new XML configuration file and specify the subject property of the SendMail task to export to the configuration.
  7. Save the configuration somewhere.
  8. Go to the other package, right click, select package configurations, browse for the one you've just created and select it.
  9. The wizard will ask you if you want to reuse the existing one. Click on yes.
  10. Now you've got two packages that use the same configuration. So long as both tasks are named the exact same, and they remain in the same location hierarchically (so their package paths match), both will get configured from the same configuration you've created.

This will yield a configuration that should look something like this:

Configuration ConfiguredType="Property" Path="\Package\SendMail.Subject" ValueType="String"

If you try this, let me know how it works for you.

Thanks,

Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden