James Snape asked a familiar question. Since we've been seeing it a lot, I thought I'd blog about it.
I was wondering if you guys had tried to do the following and if so what the best approach would be.
Our data warehouse product will use Yukon as a storage mechanism with SQLIS ETL. The facts and dimensions are not built it, they are added by one of our tools at deploy time. What I would like to do is write a single IS package that can be run multiple times with different parameters to import different tables. The "pattern" for importing dimensions is the same for all tables.
Let me give you an example. For any dimension table I want to stream data from source into a staging database table, then mapping source columns to configured destination columns I want to run the slow changing dimension processor to insert/update/expire rows in the destination table. The only thing different for each table is the command used to extract the data and how the columns are mapped (plus the column count/types can vary).
I would like to be able to do some sort of a FOR or FOREACH to iterate over my dimension tables, configure a data flow with the relevant source select and column mappings and execute it.
I guess I have two main questions - is this sort of thing possible? and what pointers can you give to do this?
===================== ===================== ===================== =====================
We've had a few people ask about this type of dimension load. We don't recommend doing it dynamically in a loop as described for the following reasons:
- It's more complicated to do, hence more difficult to maintain the package
- Errors and warnings will not flow as well and it will be more difficult to diagnose problems if/when they occur
- Integration Services is all about being explicit. Building packages for a specific purpose so that, when you open the package it's clear what you're doing. Putting this process into such a loop would actually hide details. Generally, hiding details is what causes problems for people when trying to understand a package.
- Using a loop in this way doesn't really buy you anything but a smaller package and a lot of headaches as described above.
The problem isn’t really the fact that you’d be using a loop, you can still use the loop to make your master package smaller. What I suggest is to factor out your load using sub-packages. Use one package per dimension. Then, you can use the loop to call out to the subpackages. With the new ForEachItem Enumerator, you can specify the list of dimension load packages to run. That way you are explicitly creating packages to the requirements of your load but making it very simple to manage how the load is being done. Or, optionally, if you want something a little more automated, you can place all the packages in one folder and run them with the ForEachFile Enumerator. Then, simply adding or removing a given package from that folder will add or remove it from the load.
I think the biggest problem with a totally automatic load as described is that it relies on scripting of some sort. We’ve gone to great lengths to eliminate the need to script as much as possible. It's much better if you can build a codeless solution because, typically scripted solutions are opaque, buggy, difficult to maintain, and understand. Also, if you have a problem with the load, you must somehow rebuild the package that was having the problem. Since it gets built dynamically, there's not way to know for sure when you build it (by simulating the way it gets built in the dynamic package) whether or not it actually reflects the problem package accurately. The way it gets built during the load could be adversely affected by some side effect of the environment during the load that doesn't exist when simulating the build. If you use statically defined packages, it's much easier to debug. Just go load the package that's having the problem and debug it without concern for getting the package right or eliminating side effects of the load time environment.
If you build individual packages per dimension load and use the for loop to iterate over the packages as described, then you have a totally explicit and modular set of packages (possibly without a line of code) that will be much easier to discover, maintain, understand and manage.
Now, to your questions. Yes, it is possible to do. But we only recommend it for strict one-off type data movement scenarios. And for that, we recommend you use the SMO Transfer object.
I'm not just making up these problems by the way. We actually wrote a system like this. It's the foundation of the SMO Transfer object. For every table the SMO Transfer object moves, it creates a temporary package, configures a SQL Task to create the new table and configures a pipeline to move it. Then discards the package in favor of the next one. This was _very_ difficult to get right and is only for moving data straight across without any transformations or schema changes. I believe it's technically possible to build such a package, but don't recommend it as an in house solution.
Hope this helps,
Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden