<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>Alberto Ferrari </title><link>http://www.sqljunkies.com/WebLog/aferrari/default.aspx</link><description>SQL Server 2005, BI, SSIS and SSAS.</description><dc:language>en-US</dc:language><generator>CommunityServer 1.0 (Build: 1.0.1.50214)</generator><item><title>Bulk Insert of sorted data</title><link>http://www.sqljunkies.com/WebLog/aferrari/archive/2007/07/02/41931.aspx</link><pubDate>Mon, 02 Jul 2007 19:53:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:41931</guid><dc:creator>AlbertoFerrari</dc:creator><slash:comments>1</slash:comments><comments>http://www.sqljunkies.com/WebLog/aferrari/comments/41931.aspx</comments><wfw:commentRss>http://www.sqljunkies.com/WebLog/aferrari/commentrss.aspx?PostID=41931</wfw:commentRss><description>&lt;P&gt;
In a &lt;a href="http://www.sqljunkies.com/WebLog/aferrari/archive/2007/04/08/30380.aspx"&gt;previous post&lt;/A&gt; I showed how to make bulk insert faster using SSIS. During  &lt;A HREF="http://www.sqlconference.it/"&gt;Sql Server Conference 2007&lt;/A&gt; I had the great opportunity to double check it with Stefano Stefani that analyzed the problem in greater detail and finally came up with both an explanation of the problem and a simple workaround when facing the same problem with T-SQL (my solution is still good with SSIS but Stefano’s one is very effective with T-SQL).
&lt;/P&gt;
The problem is related to bad performances when inserting rows into a table that has a clustered index (typically a primary key) based on an INT IDENTITY field and is easily shown in this example:
&lt;PRE&gt;
Create Table Destination (
	ID_Test Int Identity (1,1) Primary Key,
	Cod_Test VarChar (20),
	Test VarChar (100))
Go

Create Table RowsToInsert (
	Cod_Test VarChar (20),
	Test VarChar (100))
Go
&lt;/PRE&gt;
Now, suppose you have 1.000.000 of rows in RowsToInsert, if you analyze the execution plan of this simple insert:
&lt;PRE&gt;
INSERT 
    Destination
SELECT 
    Cod_Test, 
    Test
FROM
    RowsToInsert
OPTION (RECOMPILE)
&lt;/PRE&gt;
&lt;P&gt;
You will get this:
&lt;/P&gt;
&lt;img src="/WebLog/photos/aferrari/images/41930/original.aspx" width="100%"&gt;
&lt;P&gt;
As you can see, the data is sorted before being inserted into the table even if the IDENTITY specification of the primary key guarantees to SQL Server that the data is already sorted. Of course, sorting one million rows takes a long time.
&lt;/P&gt;
&lt;P&gt;
The problem is that SQL Server can insert rows in parallel mode into the table, thus there is no guarantee that the IDENTITY fields will be generated sequentially. This leads to very poor performances when inserting a huge amount of data, as is normally the case in DWH tasks.
&lt;/P&gt;
&lt;P&gt;
A very effective workaround, when facing the problem in a T-SQL task, is that of sorting the data based on a dummy column, as in this example:
&lt;/P&gt;
&lt;PRE&gt;
INSERT 
    Destination
SELECT 
    Cod_Test, 
    Test
FROM
    (SELECT 1 A, * FROM RowsToInsert) Q
ORDER BY A
OPTION (RECOMPILE)
&lt;/PRE&gt;
&lt;P&gt;
The execution plan is now:
&lt;/P&gt;
&lt;P&gt;
&lt;img src="/WebLog/photos/aferrari/images/41929/original.aspx" width="100%"&gt;
&lt;/P&gt;
&lt;P&gt;
The sort task disappeared, as now SQL Server knows that the IDENTITY fields will be generated sequentially because the input is sorted.
&lt;/P&gt;
&lt;P&gt;
The final consideration is that you can have a dramatic speed gain in huge INSERT operations with SQL Server into clustered indexed tables if you provide SORTED SELECTS to the INSERT operation. When the clustered index is based on an INT IDENTITY field you can get the same speed by sorting on a dummy field, instructing SQL Server that the IDENTITY columns will be computed sequentially and there is no need to sort the data.
&lt;/P&gt;&lt;img src="http://www.sqljunkies.com/WebLog/aggbug.aspx?PostID=41931" width="1" height="1"&gt;</description></item><item><title>Generate huge test tables using CTE</title><link>http://www.sqljunkies.com/WebLog/aferrari/archive/2007/07/02/41908.aspx</link><pubDate>Mon, 02 Jul 2007 19:07:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:41908</guid><dc:creator>AlbertoFerrari</dc:creator><slash:comments>0</slash:comments><comments>http://www.sqljunkies.com/WebLog/aferrari/comments/41908.aspx</comments><wfw:commentRss>http://www.sqljunkies.com/WebLog/aferrari/commentrss.aspx?PostID=41908</wfw:commentRss><description>&lt;P&gt;
Sometimes I need to have a huge table to make performance tests and I want to create the table easily. I found a simple way of producing these test data using CTE in a creative way and decided to post it so I can have it at hand when needed and maybe somebody will find it useful.
&lt;/P&gt;
&lt;P&gt;
The trick is to have a CTE Select that generates numbers from 0 to 9 and then cross join it with itself several times composing the numbers you want. The sample generates a one million row but can be easily changed to fit your needs
&lt;/P&gt;
&lt;pre&gt;
With 
    ZeroToNine (Digit) As (
        Select 0 As Digit
        Union All
        Select Digit + 1 From ZeroToNine Where Digit &amp;lt 9),
    OneMillionRows (Number) As (
        Select 
          Number = SixthDigit.Digit  * 100000 
                 + FifthDigit.Digit  *  10000 
                 + FourthDigit.Digit *   1000 
                 + ThirdDigit.Digit  *    100 
                 + SecondDigit.Digit *     10 
                 + FirstDigit.Digit  *      1 
        From
            ZeroToNine As FirstDigit  Cross Join
            ZeroToNine As SecondDigit Cross Join
            ZeroToNine As ThirdDigit  Cross Join
            ZeroToNine As FourthDigit Cross Join
            ZeroToNine As FifthDigit  Cross Join
            ZeroToNine As SixthDigit)
Select 
    COD_Test = Number,
    Test     = Replicate ('A', 100) 
From OneMillionRows 
Order By Number
&lt;/pre&gt;
&lt;P&gt;
Removing the ORDER BY clause to the final SELECT you will have unordered rows, this can be useful if you need – for example – to test a SORT component.
&lt;/P&gt;&lt;img src="http://www.sqljunkies.com/WebLog/aggbug.aspx?PostID=41908" width="1" height="1"&gt;</description></item><item><title>Mantaining order in SSIS flow, problems with Merge</title><link>http://www.sqljunkies.com/WebLog/aferrari/archive/2007/04/10/30459.aspx</link><pubDate>Tue, 10 Apr 2007 08:29:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:30459</guid><dc:creator>AlbertoFerrari</dc:creator><slash:comments>1</slash:comments><comments>http://www.sqljunkies.com/WebLog/aferrari/comments/30459.aspx</comments><wfw:commentRss>http://www.sqljunkies.com/WebLog/aferrari/commentrss.aspx?PostID=30459</wfw:commentRss><description>
&lt;P&gt;In a &lt;a href="http://www.sqljunkies.com/WebLog/aferrari/archive/2007/04/08/30380.aspx"&gt;previous post&lt;/A&gt; I spoke about the advantages of having sorted flows in SSIS to greatly speed up data insertion using fastload. The need to have a sorted flow brings some severe problems to the SSIS programmer that he need to be aware of and that IMHO Microsoft should address with a future implementation of SQL Server. &lt;/P&gt;
&lt;P&gt;Let’s have a look at the problem.&lt;/P&gt;
&lt;P&gt;In the image you can see a very typical SSIS data flow where you need to manage the error flow of a lookup component and go on with the processing. Even if you can’t see it from the picture, think that Sample Source will produce a sorted output of several millions rows and we want to insert into TestTable with the same sorting. &lt;/P&gt;
&lt;P&gt;
&lt;img src="/WebLog/photos/aferrari/images/30458/original.aspx"&gt;
&lt;/P&gt;
&lt;P&gt;This task works fine but it has a big problem: the Union All component will lose the sorting of the flow as it will handle data from both its input in an unordered way. This is not a bug, Union All has its behaviour by design. Still we have a problem and we know that sorting several millions rows after the Union All component is not an option for memory consumption.&lt;/P&gt;
&lt;P&gt;In SSIS you have another component, Merge, that will kindly maintain the order of its inputs, so you can change your package this way:&lt;/P&gt;
&lt;P&gt;
&lt;img src="/WebLog/photos/aferrari/images/30457/original.aspx"&gt;
&lt;/P&gt;
&lt;P&gt;Everything will work fine until you have a package that should handle several millions rows where only a very few of them (say 1.000) will enter the lookup error output flow.&lt;/P&gt;
&lt;P&gt;In such a situation Merge will start caching ALL the rows from its first input (the row that correctly matched lookup) until something will come from its second input (rows coming from the Derived Column task). The problem is that SSIS will NOT call the ProcessInput method of the Derived Column task until its buffer reaches a certain amount of rows (normally 10.000 or a number like it) and this will not happen because only 1.000 rows will enter the Derived Column path. In such a situation Merge will start consuming memory and will fill up all the available memory really fast leading to crashes and/or very poor performances.&lt;/P&gt;
&lt;P&gt;Even this behavior of both Merge and SSIS is “by design”, so we cannot complain it. But it could be easily solved setting the max number of rows of the buffer in the Derived column to 1 in order to call ProcessInput immediately (if I know what kind of data I will read I can easily set up values that make my SSIS package run faster). Unfortunately there is no such kind of option in SSIS so, at present, the problem cannot be solved.&lt;/P&gt;
&lt;P&gt;Needless to say, even if this is a problem, you can try to make your sorted package run without any merge component and you will be able to maintain the sort of the flow but this is a really big limitation in expressivity of your ETL algorithm so awareness of the problem is mandatory to successfully complete your ETL process.&lt;/P&gt;
&lt;img src="http://www.sqljunkies.com/WebLog/aggbug.aspx?PostID=30459" width="1" height="1"&gt;</description></item><item><title>Making Fast Load really fast on clustered indexed tables with SSIS</title><link>http://www.sqljunkies.com/WebLog/aferrari/archive/2007/04/08/30380.aspx</link><pubDate>Sun, 08 Apr 2007 16:12:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:30380</guid><dc:creator>AlbertoFerrari</dc:creator><slash:comments>3</slash:comments><comments>http://www.sqljunkies.com/WebLog/aferrari/comments/30380.aspx</comments><wfw:commentRss>http://www.sqljunkies.com/WebLog/aferrari/commentrss.aspx?PostID=30380</wfw:commentRss><description>&lt;P&gt;I have been so used to use int identity fields as primary key for any table and to believe it’s the fastest way to define a clustered primary key that I never thought this can cause serious performance problems to SSIS. Until I tested it and discovered that &lt;B&gt;int identity primary keys are among the slowest way to insert huge amount of data with SSIS&lt;/B&gt;. In the post I’ll describe the technique that – from my tests – is the fastest way to insert data into tables with clustered index using SSIS.&lt;/P&gt;
&lt;P&gt;As a test case I used a very simple table with only two fields:&lt;/P&gt;
&lt;P&gt;CREATE TABLE [dbo].[FastLoadTests](&lt;/P&gt;
&lt;P&gt;      [Id] [int] Identity NOT NULL,&lt;/P&gt;
&lt;P&gt;      [TestString] [varchar](2048) NOT NULL&lt;/P&gt;
&lt;P&gt;) ON [PRIMARY]&lt;/P&gt;
&lt;P&gt;I filled it with a very simple task that generates one million of monotonically increasing Id and TestStrings of exactly 512 bytes each, throwing them into an OleDb destination adapter that uses FastLoad to fill the table.&lt;/P&gt;
&lt;P&gt;If the table has no index at all the package runs in 18.5 seconds, pretty nice. Now, I created an index on the table like this:&lt;/P&gt;
&lt;P&gt;CREATE UNIQUE CLUSTERED INDEX [FastLoadTests_ClusteredIndex] ON [dbo].[FastLoadTests] ([Id] ASC) ON [PRIMARY]&lt;/P&gt;
&lt;P&gt;And run the package once more. The execution time is now 1.08.5 (one minute and eight seconds). It is &lt;B&gt;more or less 4 times slower&lt;/B&gt;. Clearly, as SQL has to sort one million rows, I was expecting poor performances. The first trial has been that of reducing the number of rows to sort.&lt;/P&gt;
&lt;P&gt;Looking at the various options in the OleDb destination adapter it is easy to find that it has a parameter (Maximun Insert Commit Size) that defaults to 0. &lt;/P&gt;
&lt;P&gt; &lt;/P&gt;
&lt;P&gt;&lt;img src="/WebLog/photos/aferrari/images/30381/original.aspx"&gt;&lt;/P&gt;
&lt;P&gt; &lt;/P&gt;
&lt;P&gt;Setting it to 10.000 reduced execution time to 44.5 seconds, a better result but still more than two times slower than the table with no clustered index. Is this the best result? Not at all! Let’s try something different.&lt;/P&gt;
&lt;P&gt;Reading (carefully) the MS documentation about the OleDB destination adapter (&lt;A href="http://msdn2.microsoft.com/en-us/library/ms141237.aspx"&gt;http://msdn2.microsoft.com/en-us/library/ms141237.aspx&lt;/A&gt;) you’ll discover that you can gain performance if the input data is sorted accordingly to the clustered index on the table, specifying the ORDER option with the advanced editor.&lt;/P&gt;
&lt;P&gt;After some trials I managed to set it with the advanced editor:&lt;/P&gt;
&lt;P&gt; &lt;/P&gt;
&lt;P&gt;&lt;img src="/WebLog/photos/aferrari/images/30382/original.aspx"&gt;&lt;/P&gt;
&lt;P&gt; &lt;/P&gt;
&lt;P&gt;It has been surprising that, running the package, I got this error:&lt;/P&gt;
&lt;P&gt;[Test Table [44]] Error: SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available.  Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "Could not bulk load. The sorted column 'Id' is not valid. The ORDER hint is ignored.".&lt;/P&gt;
&lt;P&gt;So, it seems that setting this option on an INT IDENTITTY field causes FastLoad to fail. Really surprising because it seems that SQL Server does not know that the keys generated by itself will be sorted! Anyway, removing the Identity setting from the table and using the package generated ID resulted in execution time of 46.23 seconds. &lt;B&gt;No performance gain at all. &lt;/B&gt;Anyway as it seemed an interesting way to search into, I went more on trials.&lt;/P&gt;
&lt;P&gt;The last and resolving trial test has been that of REMOVING the Maximum Insert Commit Size parameter from the destination adapter. Running the package again I got a result of 19.07 seconds execution time, more or less the same time as inserting the data with no clustered index at all. Got it! Now I have a really fast way to insert data.&lt;/P&gt;
&lt;P&gt;The results are summarized in this table&lt;/P&gt;
&lt;P&gt; &lt;/P&gt;
&lt;TABLE&gt;

&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;I&gt; &lt;/I&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;I&gt;No index&lt;/I&gt;&lt;/P&gt;
&lt;P&gt;&lt;I&gt;0 MICS&lt;/I&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;I&gt;Index, &lt;BR&gt;10.000 MICS&lt;/I&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;I&gt;Index, &lt;BR&gt;0 MICS&lt;/I&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;Int Identity key&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;18.85&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;41.15&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;1.08.57&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;SSIS generated key&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;18.59&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;44.54&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;1.10.03&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;SSIS key and ORDER option&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;18.85&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;46.23&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;B&gt;19.07&lt;/B&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;

&lt;/TABLE&gt;
&lt;P&gt;&lt;I&gt;*MICS = Maximum Insert Commit Size&lt;/I&gt;&lt;/P&gt;
&lt;P&gt;You can easily see that using the ORDER option gives a dramatic speed improvement when you have a clustered index but ONLY if you are able to generate the sorting key in the SSIS package and DO NOT USE the MICS parameter.&lt;/P&gt;
&lt;P&gt;I think that the documentation for this behavior should have been made far more accessible and that ORDER parameter is so important that should have been shown in the first page of the standard editor for OleDb destination, hope it will be done in the next release of Sql Server.&lt;/P&gt;
&lt;P&gt;Moreover, as using MICS parameter leads to a dramatic performance degradation, I think that &lt;B&gt;the OleDb destination adapter should issue a warning if both ORDER and MICS parameter are set &lt;/B&gt;so to make the programmer think twice before using them together.&lt;/P&gt;
&lt;img src="http://www.sqljunkies.com/WebLog/aggbug.aspx?PostID=30380" width="1" height="1"&gt;</description></item><item><title>Distinct values in SSIS</title><link>http://www.sqljunkies.com/WebLog/aferrari/archive/2007/02/08/27954.aspx</link><pubDate>Thu, 08 Feb 2007 09:37:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:27954</guid><dc:creator>AlbertoFerrari</dc:creator><slash:comments>1</slash:comments><comments>http://www.sqljunkies.com/WebLog/aferrari/comments/27954.aspx</comments><wfw:commentRss>http://www.sqljunkies.com/WebLog/aferrari/commentrss.aspx?PostID=27954</wfw:commentRss><description>&lt;P&gt;
&lt;P&gt;Distinct is a partially blocking component that remove duplicates from one flow. Its main advantages against the sort component provided in SSIS are:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Memory usage: Distinct does not cache the whole flow (as Sort does) but retains in memory only the distincts, consuming less memory then Sort 
&lt;LI&gt;Distinct is partially blocking where Sort is fully blocking 
&lt;LI&gt;Distinct is freeware, you can easily download sources and adapt it to your needs &lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;A full description of the component and full sources can be downloaded at &lt;A href=""&gt;www.sqlbi.eu&lt;/A&gt;.&lt;/P&gt;&lt;img src="http://www.sqljunkies.com/WebLog/aggbug.aspx?PostID=27954" width="1" height="1"&gt;</description></item><item><title>Visio does not handle Schema? Let's force it to do!</title><link>http://www.sqljunkies.com/WebLog/aferrari/archive/2006/09/06/23290.aspx</link><pubDate>Wed, 06 Sep 2006 12:12:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:23290</guid><dc:creator>AlbertoFerrari</dc:creator><slash:comments>4</slash:comments><comments>http://www.sqljunkies.com/WebLog/aferrari/comments/23290.aspx</comments><wfw:commentRss>http://www.sqljunkies.com/WebLog/aferrari/commentrss.aspx?PostID=23290</wfw:commentRss><description>&lt;P&gt;I love VISIO to make database modeling, it gives very nice and colorful diagrams that I use for the documentation. &lt;/P&gt;
&lt;P&gt;I hate VISIO as it is really far from perfect, among all&amp;nbsp;the others one thing it lacks is the ability to generate a database using SCHEMA information. &lt;/P&gt;
&lt;P&gt;As I like schemas I decided to find a way to bypass VISIO limitations, here it is:&lt;/P&gt;
&lt;P&gt;I create table names in VISIO adding the schema to&amp;nbsp;their name (so "Customers"&amp;nbsp;is created as&amp;nbsp;"Config.Customers"). Then I&amp;nbsp;use all the colorful and funny tools to model the database and, after I have generated and run the script for creating the DB (which creates ALL the tables in the schema "dbo") I run this script that changes all the table names removing the schema information, creates the necessary schemas&amp;nbsp;and transfer the tables to their correct one. Et voilà, I can still love VISIO, even with this (totally absurd) limitation.&lt;/P&gt;&lt;PRE&gt;--&lt;BR&gt;--&amp;nbsp; This code analyzes all the tables in a database, removes the schema information&lt;BR&gt;--&amp;nbsp; from the name and transfer each table to its proper schema&lt;BR&gt;--&lt;BR&gt;Declare @TableSchema VarChar (100);&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; -- Will hold the table Schema&lt;BR&gt;Declare @TableFullName VarChar(100);&amp;nbsp;&amp;nbsp;&amp;nbsp; -- Will hold the table full name&lt;BR&gt;Declare @TableName VarChar (100);&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; -- Will hold the table name without the schema&lt;BR&gt;Declare @SqlCommand NVarChar(100);&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; -- Temporary for issuing SQL commands&lt;/PRE&gt;&lt;PRE&gt;--&lt;BR&gt;--&amp;nbsp;&amp;nbsp; Traverse the tables searching for those containing a dot in the name&lt;BR&gt;--&lt;BR&gt;Select &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; TableFullName = '[' + Name + ']',&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; TableSchema = SubString (Name, 0, charindex ('.', Name)),&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; TableName = SubString (Name, charindex ('.', Name) + 1, 100)&lt;BR&gt;Into #TempTables&lt;BR&gt;From Sys.Tables&lt;BR&gt;Where charindex ('.', Name) &amp;gt; 0&lt;/PRE&gt;&lt;PRE&gt;--&lt;BR&gt;--&amp;nbsp; A cursor used to traverse the table names&lt;BR&gt;--&lt;BR&gt;Declare Tables_Cursor CURSOR FOR &lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Select * From #TempTables;&lt;/PRE&gt;&lt;PRE&gt;Open Tables_Cursor;&lt;BR&gt;Fetch Next From Tables_Cursor Into @TableFullName, @TableSchema, @TableName;&lt;BR&gt;While @@Fetch_Status = 0 Begin&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; --&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; --&amp;nbsp; If the schema does not exists then create it&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; --&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; If Not Exists (Select * from Sys.Schemas where name = @TableSchema) Begin&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Select @SqlCommand = 'Create Schema ' + @TableSchema;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Exec sp_executesql @SqlCommand;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; End;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; --&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; --&amp;nbsp; Rename the table, removing its schema definition&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; --&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Exec sp_rename @TableFullName, @TableName&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; --&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; --&amp;nbsp; Transfer the table to its desider schema&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; --&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Select @SqlCommand = 'alter schema ' + @TableSchema + ' Transfer ' + @TableName&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Exec sp_executesql @SqlCommand;&lt;/PRE&gt;&lt;PRE&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Fetch Next From Tables_Cursor Into @TableFullName, @TableSchema, @TableName;&lt;BR&gt;End;&lt;/PRE&gt;&lt;PRE&gt;Deallocate Tables_Cursor;&lt;BR&gt;Drop Table #TempTables&lt;BR&gt;&lt;/PRE&gt;&lt;img src="http://www.sqljunkies.com/WebLog/aggbug.aspx?PostID=23290" width="1" height="1"&gt;</description></item><item><title>Data Flow Components, constructors and SSIS</title><link>http://www.sqljunkies.com/WebLog/aferrari/archive/2006/07/31/22321.aspx</link><pubDate>Mon, 31 Jul 2006 05:35:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:22321</guid><dc:creator>AlbertoFerrari</dc:creator><slash:comments>3</slash:comments><comments>http://www.sqljunkies.com/WebLog/aferrari/comments/22321.aspx</comments><wfw:commentRss>http://www.sqljunkies.com/WebLog/aferrari/commentrss.aspx?PostID=22321</wfw:commentRss><description>&lt;P&gt;Yesterday I had to fix a simple still subtle bug in &lt;A href="http://www.sqlbi.eu/Home/tabid/36/ctl/Details/mid/374/ItemID/0/Default.aspx"&gt;TableDifference&lt;/A&gt;, I think sharing the experience will be useful for everybody involved in custom component creation.&lt;/P&gt;
&lt;P&gt;The component has some fields that are initialized in the component constructor, at the end of the work the component does not clear them as I would expect the object to be destroyed by the SSIS engine. Everything works fine until&amp;nbsp;you run the component in a data flow task contained in a foreach loop then... bang! the component crash.&lt;/P&gt;
&lt;P&gt;The problem is that the component was not fresh-built but contained properties with the same data that were there&amp;nbsp;at the end of&amp;nbsp;its first execution. It seems to me that SSIS does not destroy components after a data flow terminates but reuses them during the subsequent runs of the same data flow task recalling their pre-execute method. I solved the problem clearing variable values in the pre-execute method, the problem is solved but I don't like the way the SSIS engine works, I would really like an object to be destroyed and then recreated when the container data flow task finishes execution.&lt;/P&gt;
&lt;P&gt;If confirmed (&lt;EM&gt;is anybody from&amp;nbsp;Microsoft listening?&lt;/EM&gt;) this behaviour is - in my opinion - very interesting but &lt;STRONG&gt;wrong&lt;/STRONG&gt;. You can decide to use this behaviour to implement some sort of state management in a component but in the same time you are prone to very nasty bugs if you do not clear all your properties in the pre-execute method and then initialize them to a meaningful value.&lt;/P&gt;
&lt;P&gt;I did not found useful documentation about this behaviour, has anybody done any kind of investigation about it?&lt;/P&gt;&lt;img src="http://www.sqljunkies.com/WebLog/aggbug.aspx?PostID=22321" width="1" height="1"&gt;</description></item><item><title>SqlBulkTool</title><link>http://www.sqljunkies.com/WebLog/aferrari/archive/2006/07/14/22236.aspx</link><pubDate>Fri, 14 Jul 2006 08:32:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:22236</guid><dc:creator>AlbertoFerrari</dc:creator><slash:comments>2</slash:comments><comments>http://www.sqljunkies.com/WebLog/aferrari/comments/22236.aspx</comments><wfw:commentRss>http://www.sqljunkies.com/WebLog/aferrari/commentrss.aspx?PostID=22236</wfw:commentRss><description>&lt;SPAN&gt;
&lt;P&gt;&lt;A href="http://www.sqlbi.eu/Home/tabid/36/ctl/Details/mid/374/ItemID/5/Default.aspx"&gt;SqlBulkTool &lt;/A&gt;is a command line utility that is used to quickly create a mirror of a database. It reads its configuration from an XML file containing source and destination command strings and a list of all the tables to mirror and then handles the work of copying the database in an automated and highly parallelized way.&lt;/P&gt;
&lt;P&gt;The parallelism can use the partition capabilities of SQL Server 2005: to handle a huge table it is enough to partition it to make the tool load it by running each single partition in a separate thread, dramatically increasing table load time. In the case where no partitioning is defined the parallelism is handled at the table level.&lt;/P&gt;
&lt;P&gt;You can find the full article &lt;A href="http://www.sqlbi.eu/Home/tabid/36/ctl/Details/mid/374/ItemID/5/Default.aspx"&gt;here&lt;/A&gt;. Full source code is provided, the project is freeware and available at &lt;A href="http://www.sqlbi.eu"&gt;www.sqlbi.eu&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/SPAN&gt;&lt;img src="http://www.sqljunkies.com/WebLog/aggbug.aspx?PostID=22236" width="1" height="1"&gt;</description></item><item><title>Table Difference 1.1</title><link>http://www.sqljunkies.com/WebLog/aferrari/archive/2006/07/06/22116.aspx</link><pubDate>Thu, 06 Jul 2006 14:34:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:22116</guid><dc:creator>AlbertoFerrari</dc:creator><slash:comments>0</slash:comments><comments>http://www.sqljunkies.com/WebLog/aferrari/comments/22116.aspx</comments><wfw:commentRss>http://www.sqljunkies.com/WebLog/aferrari/commentrss.aspx?PostID=22116</wfw:commentRss><description>&lt;SPAN&gt;
&lt;P&gt;&lt;A href="http://www.sqlbi.eu/Home/tabid/36/ctl/Details/mid/374/ItemID/0/Default.aspx"&gt;TableDifference &lt;/A&gt;has been updated to&amp;nbsp;&lt;A href="http://www.sqlbi.eu/Home/tabid/36/ctl/Details/mid/374/ItemID/3/Default.aspx"&gt;version 1.1.&lt;/A&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;SPAN&gt;The major improvement is the fact that now TableDifference can handle a field as 
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Key&lt;/STRONG&gt;: a field that represent a key in the table, you must supply KeyOrder 
&lt;LI&gt;&lt;STRONG&gt;Compare&lt;/STRONG&gt;: a field that is to be compared between OLD and NEW, you must supply UpdateID 
&lt;LI&gt;&lt;STRONG&gt;PreferNEW&lt;/STRONG&gt;: the field appear in both flows but you want the value from the NEW flow 
&lt;LI&gt;&lt;STRONG&gt;PreferOLD&lt;/STRONG&gt;: the field appear in both flows but you want the value from the OLD flow.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;Mostly valuable implementation is PreferOLD, in this way you can use the ID of the OLD table in the subsequent flows to make any kind of update you want in an easier way.&lt;/P&gt;
&lt;P&gt;You can find the whole article and source code on &lt;A href="http://www.sqlbi.eu"&gt;www.sqlbi.eu&lt;/A&gt;.&lt;/P&gt;&lt;/SPAN&gt;&lt;img src="http://www.sqljunkies.com/WebLog/aggbug.aspx?PostID=22116" width="1" height="1"&gt;</description></item><item><title>Threads and custom components: FlowSync 1.0</title><link>http://www.sqljunkies.com/WebLog/aferrari/archive/2006/07/06/22115.aspx</link><pubDate>Thu, 06 Jul 2006 09:51:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:22115</guid><dc:creator>AlbertoFerrari</dc:creator><slash:comments>0</slash:comments><comments>http://www.sqljunkies.com/WebLog/aferrari/comments/22115.aspx</comments><wfw:commentRss>http://www.sqljunkies.com/WebLog/aferrari/commentrss.aspx?PostID=22115</wfw:commentRss><description>&lt;P&gt;Several people downloaded &lt;A href="http://www.sqlbi.eu/Home/tabid/36/ctl/Details/mid/374/ItemID/0/Default.aspx"&gt;TableDifference&lt;/A&gt;&amp;nbsp;to handle SCD faster, some of them, especially using it on huge table (more than 10 millions rows) noticed memory problems. The problem is that of a flow running too fast and making TableDifference cache data, we know of it and now we decided to solve it creating a new component called "FlowSync". You can find all the details and source code&amp;nbsp;&lt;A href="http://www.sqlbi.eu/Home/tabid/36/ctl/Details/mid/374/ItemID/1/Default.aspx"&gt;here&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;In the article there is a brief discussion about how SSIS handles the ProcessInput method of a component with more than one input, here is an extract:&lt;/P&gt;
&lt;P&gt;As you may already know ProcessInput is called once for every buffer and, in the case of a component with two or more inputs like TableDifference or Union All, this method is called once for each buffer of each input, so the inputs are mixed together and handled by the same method.&amp;nbsp;A solution to the problem of syncronizing input,&amp;nbsp;before deciding to develop FlowSync, has been that of using semaphores to stop the faster input inside the ProcessInput method. It would have been a nicer solution BUT ProcessInput is called in only ONE thread, even if it has two input flows. So, if ProcessInput is stopped then all the inputs of the components are stopped and the system will be in a deadlock state. &lt;/P&gt;
&lt;P&gt;This is very strange because each flow runs in a separate thread but it seems that the two thread synchronize on a single one when they need to pass data to the component. So the solution has been that of inserting the sync technique where we still have separate threads, hence directly on the flows with a transformation component: FlowSync.&lt;/P&gt;
&lt;P&gt;I would really like to see in the next version of SSIS the ability to decide if – when developing a component – we want ProcessInput to be called in a multithreaded environment or not, my personal opinion is that – using threads – programs become easier to write and maintain, TableDifference may be a good candidate to demonstrate this statement.&lt;/P&gt;&lt;img src="http://www.sqljunkies.com/WebLog/aggbug.aspx?PostID=22115" width="1" height="1"&gt;</description></item><item><title>TableDifference: a solution to SCD handling</title><link>http://www.sqljunkies.com/WebLog/aferrari/archive/2006/06/08/TableDifference.aspx</link><pubDate>Thu, 08 Jun 2006 12:14:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:21785</guid><dc:creator>AlbertoFerrari</dc:creator><slash:comments>3</slash:comments><comments>http://www.sqljunkies.com/WebLog/aferrari/comments/21785.aspx</comments><wfw:commentRss>http://www.sqljunkies.com/WebLog/aferrari/commentrss.aspx?PostID=21785</wfw:commentRss><description>&lt;P&gt;In&amp;nbsp;SSIS the SCD component&amp;nbsp;does a lot of work for you but it works so slowly that it is quite always convenient to create an “ad hoc” solution to handle the slowly changing dimensions.&lt;/P&gt;
&lt;P&gt;As the main problem is that of determining the difference between the last snapshot of the SCD and the current data we decided to write an SSIS component that has two inputs, one for the “old” and one for the “new” flow of data. It compares all the rows and sends each one to a different output for new, deleted and user configurable&amp;nbsp;updated data. You can then decide how to handle the variations executing the correct UPDATE sequences to the dimension table or whatever you need to handle variations.&lt;/P&gt;
&lt;P&gt;The following picture illustrates the component functionality better than thousand words.&lt;/P&gt;
&lt;P&gt;&lt;IMG src="http://www.sqlbi.eu/Portals/0/Articles/Table%20Difference%20Images/DataFlowSimple.png"&gt;&lt;/P&gt;
&lt;P&gt;Using TableDifference&amp;nbsp;two tables with 5.5 millions of records and 25 columns each were compared, using an AMD dual core processor with 4Gb of RAM and standard SATA disks running both SSIS and SQL Server in 7 minutes. The complexity of the algorithm is linear, so you can expect to take more or less 15 minutes for a 10 millions record tables with the same hardware.&lt;/P&gt;
&lt;P&gt;You can find the full article&amp;nbsp;that describes TableDifference &lt;A href="http://www.sqlbi.eu/Home/tabid/36/ctl/Details/mid/374/ItemID/0/Default.aspx"&gt;here&lt;/A&gt;. Full source code is available at &lt;A href="http://www.sqlbi.eu"&gt;www.sqlbi.eu&lt;/A&gt;.&lt;/P&gt;&lt;img src="http://www.sqljunkies.com/WebLog/aggbug.aspx?PostID=21785" width="1" height="1"&gt;</description></item></channel></rss>