Alberto Ferrari

SQL Server 2005, BI, SSIS and SSAS.

<September 2008>
SuMoTuWeThFrSa
31123456
78910111213
14151617181920
21222324252627
2829301234
567891011


Navigation

Images

Subscriptions

Post Categories



Thursday, June 08, 2006 - Posts

TableDifference: a solution to SCD handling

In SSIS the SCD component does a lot of work for you but it works so slowly that it is quite always convenient to create an “ad hoc” solution to handle the slowly changing dimensions.

As the main problem is that of determining the difference between the last snapshot of the SCD and the current data we decided to write an SSIS component that has two inputs, one for the “old” and one for the “new” flow of data. It compares all the rows and sends each one to a different output for new, deleted and user configurable updated data. You can then decide how to handle the variations executing the correct UPDATE sequences to the dimension table or whatever you need to handle variations.

The following picture illustrates the component functionality better than thousand words.

Using TableDifference two tables with 5.5 millions of records and 25 columns each were compared, using an AMD dual core processor with 4Gb of RAM and standard SATA disks running both SSIS and SQL Server in 7 minutes. The complexity of the algorithm is linear, so you can expect to take more or less 15 minutes for a 10 millions record tables with the same hardware.

You can find the full article that describes TableDifference here. Full source code is available at www.sqlbi.eu.

posted Thursday, June 08, 2006 8:14 AM by AlbertoFerrari with 3 Comments




Powered by Dot Net Junkies, by Telligent Systems