July 2006 - Posts

Wiki on BI with SQL Server

I just opened the wiki on SQLBI.EU: it contains link to useful resources for BI solution development with SQL Server, so it covers SSIS, SSAS, SSRS and of course SQL Server itself.

The wiki works well when there is large contribution, so please share your links with the BI community! Comments and suggestions are welcome!

Request for a better lookup component (SSIS dev team: call to action!)

I just answered to a request on SQLBI.EU forum of someone that is looking for a better lookup component, specifically designed for typical Data Warehouse scenarios.

While you can read the whole thread on the forum, I try to resume most wanted requirements, hoping that Katmai (next SQL Server version) developers are already working on that!

  • Default value for lookup failure: simple "dummy value" assignment for lookup that fails, so that we can easily implement fact table processing in non-incremental scenarios
  • Synchronous action for lookup failure: it is absolutely necessary to have the chance to execute a script (or another component, but this seems harder) in a synchronous way against the processed row. The need for synchronization is that if I need to add a dimension member, I need to do that *before* to process the next fact table row, that can reference the same member and I don't want (to try) to create a second surrogate key for the same application key
  • Dynamic update at preloaded lookup table: the previous feature brings the need for a in-memory update of the lookup table initially loaded to get fast performance. This thing has to be done synchronously to the row process and it is necessary only to add elements (no delete/update required).

Having lookup source, it should be not so hard to obtain these features: 80% of functionality are already implemented. Critical point is the need to make operations synchronous to the row process, before to process another row.

Actually I'm doing that with a double lookup: the first try to lookup in memory for existing (at start time) dimension members, the second do the lookup calling a stored procedure that first try to lookup with a SELECT sql statement, and if it doesn't get the value, it INSERTs a new dimension member. That the better way I found until now to implement this pattern in a fast and readable way, but I really dislike to have to add a stored procedure to support my SSIS package (I know that my position could be controversial and it could be an argument for a future post!).

Improve DTS to SSIS package migration with DtsToSsis-Prepare

Two weeks ago I wrote a tool to ease and improve the migration of a DTS package to a SSIS package. I named the tool DtsToSsis-Prepare, it is available with full source code at www.sqlbi.eu and now I wrote a larger article that describes the needs that DtsToSsis-Prepare try to solve.

For people who are too lazy to click a link, I extracted a part of the article in this post.

This is a DTS package before treatement: UDL connections are used to easily change connection parameters.

If we try to migrate this DTS package into an SSIS package, we obtain this result.

It is not very useful to get one SSIS package containing two DTS packages.

Using the DtsToSsis-Prepare tool with all options enabled we get this “prepared” DTS package.

As you can see, UDL connections have been converted to standard connections and have been consolidated into only two physical connections. At this point this DTS package would be less parallelized, but we will never execute this package, we just want it to migrate in a fair way to SSIS. The next picture shows the result of this migration.

We have two Data Flow Tasks corresponding to our original Transform Data Tasks.

I would be glad to receive feedback and comments about this tool: probably there are other features that would be useful to add. There is also a dedicated support forum to submit comments/bug/ideas.

The new and improved SCD handling with SSIS

Thanks to Alberto Ferrari there is a new way to handle Slowly Changing Dimensions (SCD) with Integration Services: he just released Table Difference 1.1 and FlowSync 1.0. Both are downloadable at www.sqlbi.eu.

The problem that those components solve is lack of performance handling large dimensions. Standard SCD component is very slow.

The FlowSync component is also interesting to solve nasty memory problems occurring when you have many asynchronous transformation into a the same Data Flow Task and a lot of data to process.

I'd like that SSIS dev team take a look at those components for the next version of SQL Server.