<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>Donald Farmer on SQL Server Integration Services</title><link>http://sqljunkies.com/WebLog/donald_farmer/default.aspx</link><description>pamboli's naive perambulations in the world of data integration, ETL, metadata and data warehousing</description><dc:language>en-US</dc:language><generator>CommunityServer 1.0 (Build: 1.0.1.50214)</generator><item><title>New SQL Server Integration Services whitepaper</title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/05/23/15272.aspx</link><pubDate>Mon, 23 May 2005 22:26:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:15272</guid><dc:creator>donald farmer</dc:creator><slash:comments>0</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/15272.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=15272</wfw:commentRss><description>&lt;P&gt;&lt;A href="http://www.microsoft.com/technet/prodtechnol/sql/2005/intro2is.mspx"&gt;http://www.microsoft.com/technet/prodtechnol/sql/2005/intro2is.mspx&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kamal, our fearless leader, has an excellent whitepaper on the new TechNet SQL Server site.&lt;/P&gt;
&lt;P&gt;The paper covers the business isues that SSIS can address. It &amp;nbsp;includes real world scenarios and the challenges which IT departments face implementing data integration. &lt;/P&gt;
&lt;P&gt;Well worth a read.&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=15272" width="1" height="1"&gt;</description></item><item><title>My first SQL Server Integration Services book is on its way.</title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/05/21/15221.aspx</link><pubDate>Sat, 21 May 2005 18:53:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:15221</guid><dc:creator>donald farmer</dc:creator><slash:comments>1</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/15221.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=15221</wfw:commentRss><description>&lt;P&gt;I have not been blogging about SQL Server Integration Services&amp;nbsp;quite so much recently. This is partly because&amp;nbsp;I have been finishing my first SSIS book, &lt;EM&gt;The Rational Guide to Scripting with SQL Server 2005 Integration Services&lt;/EM&gt;. This should be out, along with Roger's book on Service Broker, at Tech Ed in June.&lt;/P&gt;
&lt;P&gt;I really like the Rational Guides. I liked them even before I wrote one - honestly! They are just the right size for me at around 150 pages. I have a very low boredom threshold and really can not be bothered slogging through some huge tome&amp;nbsp;-&amp;nbsp;anything that needs an index is too much for me to read from cover to cover. I tend to flick and flit from section to section. I have a ton (probably literally) of big IT books, but I have probably only read about 10% of each one. &lt;/P&gt;
&lt;P&gt;So, on the basis that I should only write a book that I would like to read, I narrowed down the scope to just the Script Task and the Script Component. However these are such great features that its difficult to cover even their full potential. Nevertheless I think I have a bit of everything in there. There are sections on using scripts in the data flow to calculate hashes for change data capture, to generate surrogate keys for dimension loading, to aggregate text columns, and to use regular expressions, in addition to scripting sources and an XML destination. The script task gets some coverage with some scripts to do scheduling, to handle file attributes, and to use with AMO for Analysis Services.&lt;/P&gt;
&lt;P&gt;The book has been fun -&amp;nbsp;it did not take long to write at all. I guess Tony and Jeff at Mann Publishing will&amp;nbsp;laugh at&amp;nbsp;that comment, because it was terribly late. But then I do have a product to ship and that has first priority. I would say it has first priority after family, but at this stage of the ship cycle I suspect my wife would laugh at that too.&lt;/P&gt;
&lt;P&gt;SSIS is such an easy application to write about!&amp;nbsp;By the time&amp;nbsp;one describes a business scenario and how to acheive it, the hard work is done.&amp;nbsp;Of course, I couldn't resist slipping in the occasional joke. I think the key&amp;nbsp;has been&amp;nbsp;to keep the scenarios interesting. But only you can judge if I have been successful!&lt;/P&gt;
&lt;P&gt;I do hope the book will be&amp;nbsp;practical. It is not a general guide to SSIS -&amp;nbsp;although there is one overview chapter on the application architecture.&amp;nbsp;One thing I have learned from teaching courses on SSIS is that spending too much time on architectural explanations is counter-productive, at least when I do it.&amp;nbsp;Have you &lt;EM&gt;tried &lt;/EM&gt;explaining the&amp;nbsp;difference between Control Flow and Data Flow? Jeesh, it's easier just to&amp;nbsp;get people building packages and let them pick up the difference themselves. It's like teaching someone to drive a manual transmission from a book - an overview of the principles can help, but too much is confusing, and in the long run there's nothing like practice.&lt;/P&gt;
&lt;P&gt;Other books are on the way from other authors. In fact,&amp;nbsp;there are some &lt;EM&gt;great &lt;/EM&gt;ones in the pipeline. I'm not sure which ones are officially announced, so I'll not name names yet, except for Brian Knight&amp;nbsp;whose book is pre-announced on Amazon. &lt;A href="http://www.amazon.com/exec/obidos/tg/detail/-/0764584359/"&gt;http://www.amazon.com/exec/obidos/tg/detail/-/0764584359/&lt;/A&gt;.&amp;nbsp;I have immediate plans for another book, and a glint in my eye about another two beyond that. Let's see.&lt;/P&gt;
&lt;P&gt;Meanwhile, my hope is that the Rational Guide will be a &lt;EM&gt;vade mecum&lt;/EM&gt;&amp;nbsp;in the SSIS developer's&amp;nbsp;laptop bag.&amp;nbsp;I guess while I am plugging the&amp;nbsp;goods, I&amp;nbsp;should give you the link if you would like to pre-order it:&amp;nbsp;&amp;nbsp;&lt;A href="http://www.mannpublishing.com/Catalog/BookDetail.aspx?BookID=40"&gt;http://www.mannpublishing.com/Catalog/BookDetail.aspx?BookID=40&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When you get your hands on it, please do let me know what you think. I am anxious to give folks what they need to do their job better.&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=15221" width="1" height="1"&gt;</description></item><item><title>Integration services training, the towering inferno, some television,  and more ...</title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/05/10/14113.aspx</link><pubDate>Tue, 10 May 2005 10:58:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:14113</guid><dc:creator>donald farmer</dc:creator><slash:comments>0</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/14113.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=14113</wfw:commentRss><description>&lt;P&gt;Last week I was in Tampa, training some Ascend partners for three days. The structure of these courses is very free form, and generally I get as much out of them as the trainees! For the customers I guess they get the advantage of very hands-on work with a product team member. The loose structure means that we can drill into their individual issues quite easily. Over three days we covered getting started with SSIS, data quality, deployment, auditing and compliance (there were a number of financial customers), warehouse loading, performance and&amp;nbsp;scripting. That's a lot to cover! &lt;/P&gt;
&lt;P&gt;It continually surprises me that we &lt;STRONG&gt;&lt;EM&gt;can &lt;/EM&gt;&lt;/STRONG&gt;cover so much in three days. It says a huge amount about the usability and effectiveness of the application that this is even possible.&lt;/P&gt;
&lt;P&gt;One of our great advantages going forward will be the broad user base, and the widely available skills. Think how many SSIS users there will be in a years time. In two years, think how many SSIS users will have quite deep experience of the application! At that point, we really become very established at the top end of the integration market, simply because of the body of skills which will be there to support the user community. Of course, that leaves us in the product team with a huge responsibility to support and&amp;nbsp;encourage these users. And of course, to ensure that the product meets their needs and expectations - so far, so good on that front!&lt;/P&gt;
&lt;P&gt;Well, it was not quite the towering inferno, but we did have a small fire my hotel in Tampa, at about 1am on my first night there. Not what I needed to catch up on sleep. At least it was in Florida and not chilly Seattle. Of course, when the alarm goes off you think it is only a drill. So I dressed properly, made sure I had my wallet and cellphone and headed for the door. When I looked out, the corridor was smoky! &amp;#8220;Oh bugger!&amp;#8221; does not quite capture my exact feelings,&amp;nbsp; but it's pretty close. We spent about 4 hours on the grass, while Tampa's finest did their bit. Hilariously, the fire investigators turned up in a Hummer at 2 o'clock in the morning, and still had their sunglasses on! Eventually we got back, but the elevators were out, so I had climb up fourteen floors to get back to my room. Still I got a full two hours sleep before the training course.&lt;/P&gt;
&lt;P&gt;While I was away in Tampa, MSDN released my first MSDN TV broadcast. It covers how to build and debug your first SSIS package. Simple stuff, but the kind of thing that new users seem to need. You can see it here. &lt;A href="http://msdn.microsoft.com/msdntv/episode.aspx?xml=episodes/en/20050505SQLServerDF/manifest.xml"&gt;http://msdn.microsoft.com/msdntv/episode.aspx?xml=episodes/en/20050505SQLServerDF/manifest.xml&lt;/A&gt;&amp;nbsp;. Let me know what you think. There's another one coming on data cleansing which should be fun. And Kamal has done some too, which should be out soon.&lt;/P&gt;
&lt;P&gt;A couple of people have asked me about the picture in the background of the TV introductions. It is a view of Victoria Terrace and Victoria Street in Edinburgh, from just outside the little garden&amp;nbsp;of the church of St Columba by the Castle. The frame is an old wooden window, complete with cobwebs and glass, that we found in an alley in Duvall, WA. When I first moved to MS, I did not have that most prized of locations - a window office. So this was my window for 3 and a half years - and very nice it was too.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Finally, on a personal note, I noticed Steve Ballmer sent out an excellent email about and public statement about Microsoft's approach to public policy initiatives, especially around the recently controversial equal rights bill in Washington State. There has been a lot about this in the news. One of the great strengths of Microsoft, both in the business and technical spheres, has been the ability to recognize mistakes or wrong directions, and to change course with confidence, integrity&amp;nbsp;and considerable skill.&amp;nbsp;These qualities have served us well as a company. This decision by Steve and the leadership is in that spirit, I think.&amp;nbsp;I felt quite proud to see them make this change. Let's hope that in a new legislative season MS can play an active role in ensuring equality in Washington.&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=14113" width="1" height="1"&gt;</description></item><item><title>SQL Server Integration Services 64 bit - featured in Bill Gates' keynote at WinHEC</title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/04/25/12734.aspx</link><pubDate>Mon, 25 Apr 2005 16:59:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:12734</guid><dc:creator>donald farmer</dc:creator><slash:comments>0</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/12734.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=12734</wfw:commentRss><description>&lt;P&gt;&lt;FONT face=Arial&gt;&lt;FONT size=2&gt;&lt;FONT color=#000000&gt;Fantastic to see a 64-bit ETL load live on stage in Bill Gate's keynote at WinHEC. SSIS really shines in these situations: the sheer performance and scale on 64-bit is stunning, but the visual debugging and ease-of-use really make for a compelling demo, even before people start comparing us dollar-for-dollar with the competition. &lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"&gt;BillG says &lt;EM&gt;This is one of the clearest and neatest SQL Demos I have ever seen. &lt;/EM&gt;Who am I to disagree?&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;&lt;FONT size=2&gt;&lt;FONT color=#000000&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"&gt;You really should watch this (our section starts 36:00 minutes in) ... &lt;/SPAN&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;A href="http://www.microsoft.com/events/executives/billgates.mspx"&gt;http://www.microsoft.com/events/executives/billgates.mspx&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial size=2&gt;The demo was put together by Ashvini and I, but of course Ash did most of the real work to make the demo as awesome as it is. The same scenario is also running live throughout WinHEC at the AMD booth - they gave us one of the very first multicore machines available to work with - so if you get a chance to stop by you should stop to see it..&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial size=2&gt;The combination of multicore procs, 64-bit processing and SSIS is going to astonish our market - I truly think that the way we lower the cost of entry to high-performance ETL will change the way people think about data warehousing in their organizations.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial size=2&gt;Enjoy the video - and thanks to Ash, Francois (who demos on stage) and the great guys from AMD.&lt;/FONT&gt;&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=12734" width="1" height="1"&gt;</description></item><item><title>Integration Services Editions</title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/04/21/12431.aspx</link><pubDate>Thu, 21 Apr 2005 21:51:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:12431</guid><dc:creator>donald farmer</dc:creator><slash:comments>2</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/12431.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=12431</wfw:commentRss><description>&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Earlier in the week anxious crowds gathered outside our imposing building, straining for that portentous plume of white smoke. Sure enough, a weary German (Matthias, for it is he), emerged from the genteel conclave that is SQL Server Shiproom (see &lt;A href="http://www.eweek.com/article2/0,1759,1787462,00.asp"&gt;http://www.eweek.com/article2/0,1759,1787462,00.asp&lt;/A&gt; ) to announce to the city and the world&lt;EM&gt;,&lt;/EM&gt;&amp;nbsp;&lt;SPAN style="FONT-STYLE: italic"&gt;Habemus CTP&lt;/SPAN&gt;. The new CTP (or &lt;SPAN style="FONT-STYLE: italic"&gt;IDW the XIVth &lt;/SPAN&gt;as it is known to the faithful) has it's work cut out. On the one hand it has to be conservative and not break the applications of our traditional supporters, while including enough new fixes to address pressing issues. I think it succeeds admirably and I really do recommend that MSDN subscribers and Beta users upgrade to this build.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;The April CTP got a bunch of publicity including a nice press release. I was a little miffed last week with our friends in marketing because I felt the excellent Forrester note on SQL Server Integration Services was not getting enough exposure from them. Of course, I understand that they have their work cut out on the big box server, especially the CTP announcement, but there is so much goodness in what we are achieving that I can't help wanting more. Other companies do a lot more of this. Oracle Warehouse Builder publicizes the fact that screenshots of the product are included in Ralph Kimball's ETL course (along with SSIS naturally, although Oracle inexplicably fail to mention that.) Informatica peppers my news tickers with press releases. I fully expect to open my mailbox one morning to read the headline &lt;SPAN style="FONT-STYLE: italic"&gt;PowerCenter commits transaction&lt;/SPAN&gt;. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;However, I have no doubt that one day the penny will drop and the significance of what we have achieved in SSIS will be clear. So this week it is great to see the Forrester report highlighted on the front page of our SQL Server site. Thanks guys! &lt;A href="http://www.microsoft.com/sql/default.mspx"&gt;http://www.microsoft.com/sql/default.mspx&lt;/A&gt; &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Now that we are moving nearer to RTM I get a lot of questions about SQL Server editions, and how SQL Server Integration Services is factored in different editions. Actually, most of the questions are about &lt;SPAN style="FONT-STYLE: italic"&gt;why&lt;/SPAN&gt; we are factored as we are. As a Program Manager I guess I can't duck these questions, as I always say in presentations: If you want to know &lt;SPAN style="FONT-STYLE: italic"&gt;how&lt;/SPAN&gt;, ask a developer. If you want to know &lt;SPAN style="FONT-STYLE: italic"&gt;why&lt;/SPAN&gt;, ask a PM. And if you want to know &lt;SPAN style="FONT-STYLE: italic"&gt;when&lt;/SPAN&gt;, ask marketing.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;So what are our editions of SQL Server Integration Services, and why?&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;SPAN style="FONT-WEIGHT: bold"&gt;SQL Server Express&lt;/SPAN&gt; is the entry-level edition of SQL Server, and SSIS does not surface in that edition at all. If you look at the listing of the various editions you will see that it does include Import / Export, but this is a simple utility, not a version of SSIS. Really, SSIS is simply too powerful to give away. The Import Export utility will meet most needs of those who currently use Express. Even those who used DTS in MSDE mostly only used it for Import and Export - it was less common to see it in use as a transformation engine. In the new .NET world these users may even find leveraging the .NET framework and ADO.NET a more efficient developer story for embedding lightweight data transfer and manipulation than working with an application API. However, for those who really need to do more, there is always the next step up ...&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;SPAN style="FONT-WEIGHT: bold"&gt;SQL Server Workgroup Edition &lt;/SPAN&gt;is a somewhat beefier edition, pitched between Express and Standard. It includes the Import / Export wizard which leverages SSIS to build simple source-destination packages, without any transformation logic. The packages can be saved and scheduled, so it is a useful tool. You can get some insight into our current thinking if you consider that the wizard is actually named the SQL Server Import Export wizard, not the SSIS wizard. The idea here is to start to distinguish between the use of SSIS in utilities and the use of SSIS as an enterprise-class data integration application. The sheer usability of SSIS enables us to build these simple wizards - try doing that in Data Stage! - while retaining&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;the power of a fully featured integration platform. In the future, I suspect we may see SSIS itself diverge into more differentiated architectures, like an SSIS Express. But don't quote me on it, I'm just woolgathering. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;What can I say about &lt;SPAN style="FONT-WEIGHT: bold"&gt;SQL Server Standard Edition&lt;/SPAN&gt;? It's only 2005 and already this may the bargain of the century. The Business Intelligence applications here are really fantastic value for money. You get the full SSIS in here, with the exception of a few high-end components. But there is no throttling of performance at all, and you do get some excellent features such as aggregation in the data flow and the Slowly Changing Dimension Wizard. Originally the SCD wizard was destined for Enterprise Edition only, but we heard from consultants and customers that this wizard was such a good learning tool for building dimension loads, that we should really enable those new to data warehousing to get their hands on it. And so here it is. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;At one time we had thought about restricting this edition to only one source and one destination (no Unions, Merges, or Multicasts) but that was soon shot down. Parallelism and the ability to handle conditional flows were such great features for SSIS that we could not bear to keep them from standard users. We also pulled the Data Mining and OLAP tasks back into this sku, following a flurry of anxious mails, after another posting suggesting they might only be in Enterprise edition. However, the transforms which enable data mining in the data flow, one of our&amp;nbsp;most exciting&amp;nbsp;features if you ask me, are still only available in ...&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;SPAN style="FONT-WEIGHT: bold"&gt;Enterprise Edition.&lt;/SPAN&gt; The big Daddy of them all. You get the lot. Data mining integration, fuzzy data cleansing, text mining ... Sometimes I have to pinch myself when I consider the feature set we are bringing to market. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Whatever edition you choose, I know you're going to have fun with this. And when you have fun, you'll spread the word, as our community is so enthusiastic and engaged. And &lt;SPAN style="FONT-STYLE: italic"&gt;that &lt;/SPAN&gt;is marketing that IBM and Larry will really struggle to match.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;PS: I do believe PowerCenter has just committed another transaction.&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=12431" width="1" height="1"&gt;</description></item><item><title>Enterprise ETL with SQL Server Integration Services</title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/04/18/12011.aspx</link><pubDate>Mon, 18 Apr 2005 11:03:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:12011</guid><dc:creator>donald farmer</dc:creator><slash:comments>2</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/12011.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=12011</wfw:commentRss><description>&lt;P&gt;Analysts are really starting to confirm the value of our enterprise data integration capabilities, compared with other ETL vendors.&amp;nbsp;This report from Forrester calls out our collaborative development environment, our data quality components, our scalability and reliability features, and of course our &amp;#8220;seductive&amp;#8220; price-performance ratio.&lt;/P&gt;
&lt;P&gt;&lt;A href="http://www.microsoft.com/sql/bi/integrate/productinfo/foresterreport.asp"&gt;http://www.microsoft.com/sql/bi/integrate/productinfo/foresterreport.asp&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As the report says: &lt;EM&gt;Get ready to take SSIS where you didn't dare take DTS. &lt;/EM&gt;&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=12011" width="1" height="1"&gt;</description></item><item><title>Business logic and data integration with SQL Server Integration Services</title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/04/03/10148.aspx</link><pubDate>Sun, 03 Apr 2005 16:52:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:10148</guid><dc:creator>donald farmer</dc:creator><slash:comments>2</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/10148.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=10148</wfw:commentRss><description>&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;Loading fact tables and handling surrogate keys has been the subject of a great discussion between &lt;/FONT&gt;&lt;FONT size=1&gt;Marco Russo, Michael Barrett Johnson and Jamie Thomson. Their threads center around the handling of key values which cannot be found in the dimension when loading a fact table. This is an interesting and important discussion, and you'll see that they have got right into the details. You can read the posts and comments here:&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL style="MARGIN-TOP: 0in; MARGIN-BOTTOM: 0in; MARGIN-LEFT: 0.5in; DIRECTION: ltr; unicode-bidi: embed" type=disc&gt;
&lt;LI style="MARGIN-TOP: 0px; FONT-SIZE: 10pt; MARGIN-BOTTOM: 0px; VERTICAL-ALIGN: middle; mso-outline-level: 1"&gt;&lt;A href="http://blogs.conchango.com/jamiethomson/archive/0001/01/01/1232.aspx"&gt;&lt;SPAN style="FONT-FAMILY: Verdana"&gt;&lt;FONT size=1&gt;http://blogs.conchango.com/jamiethomson/archive/0001/01/01/1232.aspx&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/A&gt; 
&lt;LI style="MARGIN-TOP: 0px; FONT-SIZE: 10pt; MARGIN-BOTTOM: 0px; VERTICAL-ALIGN: middle; mso-outline-level: 1"&gt;&lt;A href="http://www.sqlservercentral.com/columnists/jthomson/thenewetlparadigm.asp"&gt;&lt;SPAN style="FONT-FAMILY: Verdana"&gt;&lt;FONT size=1&gt;http://www.sqlservercentral.com/columnists/jthomson/thenewetlparadigm.asp&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/A&gt; 
&lt;LI style="MARGIN-TOP: 0px; FONT-SIZE: 10pt; MARGIN-BOTTOM: 0px; VERTICAL-ALIGN: middle; mso-outline-level: 1"&gt;&lt;A href="http://sqljunkies.com/WebLog/sqlbi/archive/2005/04/01/9945.aspx"&gt;&lt;SPAN style="FONT-FAMILY: Verdana"&gt;&lt;FONT size=1&gt;http://sqljunkies.com/WebLog/sqlbi/archive/2005/04/01/9945.aspx&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN style="FONT-FAMILY: Verdana"&gt;&lt;FONT size=1&gt;. &lt;/FONT&gt;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;Marco previously always used a star join to find dimension keys for incoming facts, but followed Jamie's advice and tried a series of pipelined lookups. There are considerable performance advantages to this approach. In fact, an SSIS developer can just about perfectly implement Ralph Kimball's &lt;SPAN style="FONT-STYLE: italic"&gt;Surrogate Key Pipeline&lt;/SPAN&gt; subsystem which he has been recommending since 1998! See &lt;/FONT&gt;&lt;A href="http://www.dbmsmag.com/9806d05.html"&gt;&lt;FONT size=1&gt;http://www.dbmsmag.com/9806d05.html&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=1&gt; (the original article has also been reused almost verbatim in his book &lt;SPAN style="FONT-STYLE: italic"&gt;The Data Warehouse ETL Toolkit&lt;/SPAN&gt;.)&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;This pipelined system works extremely well for performance, but what to do about keys which are not found during lookup? Marco found handling the error stream cumbersome with many dimensions in your schema. Michael suggested using default values in the fact table definition, and Jamie preferred, as a matter of style, using a derived column to add an indicator value to the row to be inserted.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;In the course of the conversation, two very important points were raised which I want to drill into in more detail, as I believe they are critical to this issue in particular, and to the design of integration processes in general. Michael calls out the important point that there may be unknown keys in the fact table &lt;SPAN style="FONT-STYLE: italic"&gt;due to poor data integrity in the source system&lt;/SPAN&gt; and Jamie mentions in passing that he likes &lt;SPAN style="FONT-STYLE: italic"&gt;to manage all eventualities &lt;/SPAN&gt;within his ETL process.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;Michael's point is important because it should set off a little alarm bell for the integration developer: this is not just a technical matter but a business issue. Why is the integrity of the source system poor? Is it poor for technical reasons - perhaps there are transformation errors in the extract subsystem? Or is the quality poor for business reasons - perhaps information on new products is always delivered late to the IT department, or customer details are prone to be inaccurately recorded?&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;So in consideration of these problems, I am going to set out what I believe are two fundamental principles of integration process design:&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL style="MARGIN-TOP: 0in; MARGIN-BOTTOM: 0in; MARGIN-LEFT: 0.5in; DIRECTION: ltr; unicode-bidi: embed" type=disc&gt;
&lt;LI style="MARGIN-TOP: 0px; FONT-SIZE: 10pt; MARGIN-BOTTOM: 0px; VERTICAL-ALIGN: middle; mso-outline-level: 1"&gt;&lt;SPAN style="FONT-FAMILY: Verdana"&gt;&lt;FONT size=1&gt;Don't confuse business logic with technical logic;&lt;/FONT&gt;&lt;/SPAN&gt; 
&lt;LI style="MARGIN-TOP: 0px; FONT-SIZE: 10pt; MARGIN-BOTTOM: 0px; VERTICAL-ALIGN: middle; mso-outline-level: 1"&gt;&lt;SPAN style="FONT-FAMILY: Verdana"&gt;&lt;FONT size=1&gt;If your integration process must compensate for business processes, do so in a manner that leaves an audit trail back to the business process;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;This could end up being a whitepaper, so I'm not going to go into deep detail, but let's address them in relation to the surrogate key lookup problem that Marco, Jamie and Michael were discussing.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-WEIGHT: bold; FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;Don't confuse technical logic with business logic&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;Or the other way round for that matter. Your dimensional model is merely a model. Processes built on that model are only accurate to the extent that the model accurately represents reality. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;A Product ID which is not found in the dimension table is just that - a key not found in the reference table. &lt;/FONT&gt;&lt;FONT size=1&gt;Do not forget that this is first and foremost a technical issue which &lt;SPAN style="FONT-STYLE: italic"&gt;models &lt;/SPAN&gt;a business issue. There is plenty to go wrong before you can be sure that an individual technical case accurately reflects a individual business case. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;Perhaps the incoming data quality is poor and there is an error in the key. Perhaps, if you are using a surrogate key table as Kimball suggests, the process of populating that surrogate key table from the dimension table has gone wrong. Or, if you have special logic such as a WHERE condition in the query to load the key cache, perhaps that has failed to cache the necessary key. In these, and a host of other cases, inserting a new dimension member would be a mistake - but a mistake which would bubble up into the presentation layer for business users making decisions with the warehouse.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;A Product ID not found in a dimension table is almost certainly not an &lt;SPAN style="FONT-STYLE: italic"&gt;unknown&lt;/SPAN&gt; product in business terms, and that can be an important distinction to make because there may be cases where dimension members are genuinely unknown. If you are a food company and a customer phones in to say that some frozen-meal, long since thrown in the trash compactor, has left them chained to the lavatory for 24 hours, you may really have a business problem with an unknown product. And if you are a bank you want to be very sure that you are not making capitalization decisions based on certificates from unknown customers.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;You could perhaps address any potential confusion here with something as simple as choosing a good accurate term for the missing key. &lt;SPAN style="FONT-STYLE: italic"&gt;Unmatched&lt;/SPAN&gt;, perhaps, rather than &lt;SPAN style="FONT-STYLE: italic"&gt;unknown&lt;/SPAN&gt;. Even this has potential for problems. I well remember a case from my life before Microsoft where a customer complained to me that a subcontractor had not delivered a component despite being prompted for it. It turned out that the customer had indicated to the support team in an email that the component was &lt;SPAN style="FONT-STYLE: italic"&gt;outstanding &lt;/SPAN&gt;- which they took to mean that he was delighted with it!&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;Then again, you could just require that consumers of data warehouse reports are aware of exactly what &lt;SPAN style="FONT-STYLE: italic"&gt;unknown &lt;/SPAN&gt;means when it appears. But, ensuring that awareness, is itself a business process which you do not control and which may fail. The report could be forwarded to an accountant, or worse, an auditor, who has not had the training. And don't expect to shrug off the blame if a senior manager in finance makes a bad decision based on an ambiguity in your reports. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;So it is essential that you distinguish between the technical issue of not matching a key, with the business process that may lead to that member arriving late and with other business issues which result in similar symptoms.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;For unknown dimension members, SSIS enables special handling in the dimension load process for a boolean flag column which indicates whether the member is &lt;SPAN style="FONT-STYLE: italic"&gt;inferred &lt;/SPAN&gt;or not. I like that term, of course, but I also like the use of a boolean flag as it easily enables table scans and filters for those members for subsequent processes.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-WEIGHT: bold; FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;If your integration process must compensate for business processes, do so in a manner that leaves an audit trail back to the business process;&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;In these days of increased corporate scrutiny and regulatory requirements, you want to be sure your that your integration architecture does not introduce hidden decisions that may affect your compliance. For example, your personnel database may be replicated and integrated on a schedule which is substantially slower than your sales system: perhaps monthly and daily respectively. If a sale has been made by a new salesperson before their personnel record arrives, then inserting an inferred dimension member may be technically sound, but you also need to implement a process which ensures those records are properly updated at the next refresh of the employee dimension - inferred members in SSIS can help you do that. But just as importantly you will likely need a process to ensure that if any inferred members still remain after that refresh, that this gets flagged as an issue for the business stakeholders. You do not want income recorded in your reporting system that cannot be fully accounted for. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=1&gt;In such cases I would suggest multicasting off the records which are to be inserted as inferred&lt;SPAN style="FONT-STYLE: italic"&gt; &lt;/SPAN&gt;records and generating a report which ensures you have full and detailed history of the process which can be shared with the business stakeholders of the personnel system. In this way, they can know, in advance of the next refresh, whether these records are simply new employees who have not yet been updated, or the result of problems with the keys of existing employees. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;I hope Jamie, Michael and Marco do not mind me jumping on the topic. I believe that, as SSIS becomes more and more established at the heart of enterprise data integration, that the business issues need to be considered in just as much detail as the technical tips and tricks.&lt;EM&gt; &lt;/EM&gt;I guess I'll blogging more about this kind of thing in the future.&lt;/FONT&gt;&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=10148" width="1" height="1"&gt;</description></item><item><title>SQL Summit in Minneapolis</title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/04/03/10121.aspx</link><pubDate>Sun, 03 Apr 2005 13:00:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:10121</guid><dc:creator>donald farmer</dc:creator><slash:comments>2</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/10121.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=10121</wfw:commentRss><description>&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;Kent has already posted about the SQL Server Summit in Minneapolis: it was a great event, so I hope you don't mind me giving my own account.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;My first trip to Minneapolis proper (previously I had only ever seen the airport) looked like it was going to get off to a good start. My flight arrived somewhat late, and it was already late evening, but the SuperShuttle staff were friendly and appeared, (deceptively) to be efficient.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;The driver was listening to some Ethiopian religious music, so we got into a conversation about the wonderful Aster Aweke, one of my favourite musicians. However, from there on things went downhill rapidly. A few more people got on the shuttle and I don't know if it was the driver or the dispatcher who made the call, but we delivered them to the suburbs before dropping me off downtown. It's somewhat disconcerting to be on a &lt;SPAN style="FONT-STYLE: italic"&gt;downtown hotel shuttle&lt;/SPAN&gt; driving through what looked like countryside in the middle of the night. I thought I was being kidnapped, but at least I now know where Minnetonka is.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;Finally, after over an hour-and-a-half on the shuttle I got to the hotel, at 11.45. I was hungry, wanted a shower, and still had to run through my demo for the keynote in the morning. But room service in the Hyatt stops at midnight, and, bizarrely, all the hot water in the hotel was being shut off at midnight too for some repairs. So I had a lukewarm bath, worked on my demos until about 2am, and set my alarm for 6. I didn't even get to meet up with Kent for a pint.&amp;nbsp;I shouldn't complain - in the morning I discovered that Paul Flessner had had an even worse trip - a cancelled flight meant he had to drive from Bloomington to Chicago in a massive hailstorm, which reduced Ilya Bukshteyn from marketing to a nervous wreck.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;But the summit was well worth it. The local team are clearly doing an impressive job of building an enthusiastic community of SQL Users. The keynote room held about 1300 people and was packed. My sessions (on SSIS, and Analysis Services) were in rooms for 300 and were standing room only. The sponsors and exhibitors were delighted and&amp;nbsp;all the speakers got good audiences who were knowledegable and appreciative. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;Paul's keynote was a fine start to the day. There's a real candour and honesty throughout the SQL Server team about our product and our processes, and it really works to build a community of users who feel they can commit to us. Paul was able to pitch his keynote in such a way that everyone in the audience, from the VP of Unisys sitting beside me, to the overburdened DBA just behind be, recognized their own issues in what he was saying. You find this a lot in the SQL Server world - Dave Campbell and Bill Baker are experts at it, too - that the business vision is driven intricately with a detailed technical grasp of what is needed to fulfill that vision on the ground. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;I think some SQL Server enthusiasts do not realize how rare this is. But try attending an IBM or Oracle conference, or Siebel or SAP, and you'll find that there are really two distinctive approaches there: a glossy, high level, business-driven marketing message, and a technical practical message. In the SQL Server world, it is very different: you come away with the feeling that Paul or Bill or Dave could come down from the podium and help you tune that query, or optimize that dimension hierarchy, just as readily as they could map out your enterprise's business relationship with Microsoft. And it's not just about technical knowledge - which could be merely academic - it's also about having a sense of what is needed to drive technical decisions to achieve really practical goals. This mix of vision and practicality is what attracted me to the SQL Server team in the first place, and it's good to see we still have it in bucketloads.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;My own demo in the keynote went pretty well - the idea was to start from scratch and in only five minutes build an SSIS package, a mining model and a report to analyze some customer information from Excel into a &lt;SPAN style="FONT-STYLE: italic"&gt;hot leads&lt;/SPAN&gt; report for a sales team.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;Despite no rehearsal at all, Paul and I got the stagecraft right, I think, and I was able to complete the demo in only 4 1/2 minutes, which surprised even me. And the audience got it too, I think, seeing how BI can actually add real value to raw data easily and efficiently. Hopefully it also underscored the huge value to developers of having all these tools together in a single environment.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;So from an inauspicious start, my visit to Minneapolis ended up being really worthwhile. I met a ton of good customers (I have about 30 detailed issues to follow up on), had good partnership conversations with the likes of Unisys, Sogeti and Panorama which will all lead to good things, met up with old friends (hi Mike from Appsmart!), had awesome audiences at my sessions, and was able to meet up with the local team who did such a great job of organizing the conference. The team also took me out to a pub in Minneapolis where I was able to have a nostalgic pint of McEwans, and my shuttle driver back to the airport turned out to be a real football fan (soccer for the yanks) so we chatted away merrily about Scotland's World Cup chances. (How long before we change our national motto from &lt;SPAN style="FONT-STYLE: italic"&gt;Nemo me impune lacessit &lt;/SPAN&gt;to &lt;SPAN style="FONT-STYLE: italic"&gt;There is still a mathematical possibility we could qualify&lt;/SPAN&gt;?)&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;FONT size=1&gt;If every one-day summit was like this, I could do 365 a year.&lt;/FONT&gt;&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=10121" width="1" height="1"&gt;</description></item><item><title>Staging Areas and Transfer Areas for your Data Warehouse</title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/03/27/9716.aspx</link><pubDate>Sun, 27 Mar 2005 12:54:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:9716</guid><dc:creator>donald farmer</dc:creator><slash:comments>4</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/9716.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=9716</wfw:commentRss><description>&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Staging Areas are a common requirement in a Data Warehouse architecture, but they raise some interesting issues for both the warehouse administrator and the data integration designer, whether working with SQL Server Integration Services or other integration and ETL tools. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Recently Kirk and Ashvini and I had a little mail thread about this with Allan Mitchell, one of our tireless MVPs (see &lt;A href="http://www.sqlis.com/"&gt;www.sqlis.com&lt;/A&gt; for more of Allan's amazing support for SSIS.) As ever, Kirk and Ash were on the money technically, and I just waffled on about the history and philosophy of staging, but readers of SQLJunkies blogs are used to that by now. (As an aside, when are we going to rename this site IntegrationServicesJunkies? A much more accurate reflection of our influence.)&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;But to return to our sheep. I thought I would at least share some thoughts on staging areas and transfer areas, and why I increasingly emphasize the difference between them ...&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;What is a staging area? The best, and most amusing, description of this architectural feature that I know is naturally from Ralph Kimball, which you can and should read about here: &lt;A href="http://www.intelligententerprise.com/040101/701warehouse1_1.jhtml"&gt;http://www.intelligententerprise.com/040101/701warehouse1_1.jhtml&lt;/A&gt; Ralph uses the metaphor of a kitchen, and very persuasive it is too.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-STYLE: italic; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;The data warehouse's staging area is very similar to the restaurant's kitchen. The staging area is where source data is magically transformed into meaningful, presentable information. The staging area must be laid out and architected long before any data is extracted from the source. Like the kitchen, the staging area is designed to ensure throughput. It must transform raw source data into the target model efficiently, minimizing unnecessary movement if possible.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;I wonder about metaphorical determinism, because I typically use a different image to describe staging areas, and come to somewhat different, although, I think, complementary conclusions. I like to use an image from the history of railways for this problem and I make the distinction between &lt;SPAN style="FONT-STYLE: italic"&gt;transfer&lt;/SPAN&gt; and &lt;SPAN style="FONT-STYLE: italic"&gt;staging&lt;/SPAN&gt; areas. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;In the early days of railways there were two significant gauges or widths of track: Stephenson's Standard used by most British rail companies, and Brunel's Broad gauge used by the Great Western. There were variations of this discontinuity in most countries: Oregon was broad, but Montana was standard, the southern states largely broad before the Civil War; Gujarat railways were standard, the rest of India unusually broad. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;In order to move goods across a country you needed transfer stations where goods could be moved from one system to another That's why Reading in the UK became an important rail centre: it was the meeting place for different standards where goods could be transferred between systems. Or one required transfer &lt;SPAN style="FONT-STYLE: italic"&gt;technologies&lt;/SPAN&gt;, such as the Ramsey Transfer at Pocatello, Oregon, which was able to lift a carriage up from one track, replace its rolling gear underneath, and replace the carriage on another track of a different gauge.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Data integration and data warehousing face similar problems. Different technologies such as flat files, or web services, LOB systems or different vendor databases have differing storage requirements, which is a pretty obvious need for a transfer technology. But there are other issues in play too. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;For example, in a data warehouse you may well have a specialized tool for, say, high performance sorting. This may be able to read and write data to and from text files or to and from streams. In addition you may have a specialized tool for data quality, such as postal address standardization, which may also read and write text files and streams. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Streaming between these tools would be highly efficient, but may not be possible. Perhaps the expected formats in the stream are incompatible, or there may be no way to schedule the writing from one application with the reading from another or to address the streams in a compatible manner. But they &lt;SPAN style="FONT-STYLE: italic"&gt;can &lt;/SPAN&gt;both read and write flat files: so we end up with a &lt;SPAN style="FONT-STYLE: italic"&gt;transfer &lt;/SPAN&gt;area where data is landed to and read from disk in flat file format just to enable compatibility between two technologies.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;So, if we can enable a technology which integrates these applications seamlessly (let's call it SQL Server Integration Services, just for now) we would not need staging areas, yes? Well not quite, and this is where the difference between transfer and staging becomes important ... &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Even after railways standardized gauges, there was still a great need for &lt;SPAN style="FONT-STYLE: italic"&gt;staging warehouses&lt;/SPAN&gt; for goods. Why? Because staging is not just about the need to physically transfer goods - or data - but because at times there is a procedural need to have it stay put for a while. In the industrial world it may be more economic to move timber in larger quantities than the sawmill can use at any one time, because it is economic to fell it all at once at a particular time of year. Or it may be procedural, like having to wait for customs to examine the goods. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;In data warehousing, it may be necessary to extract your data from the mainframe between 5pm-8pm, but the data warehouse is not ready to receive it until 10pm. Or it may be necessary to do a thorough data quality audit and have that signed off by a DQ analyst as a human workflow step.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;If you &lt;SPAN style="FONT-STYLE: italic"&gt;must &lt;/SPAN&gt;have transfer areas for technological reasons, and you &lt;SPAN style="FONT-STYLE: italic"&gt;prefer &lt;/SPAN&gt;to have staging areas for procedural reasons then it does make sense to conform them: to make your transfer area a staging area. But just having a transfer area for its own sake, where it adds no value and introduces additional complexities to the management of the integration process - this is a waste of time.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;What additional complexities does a transfer area add? Well consider the logic needed to move data between the sorting component and the address cleansing component. It's not just a matter of landing the data to disk and reading it again. What if the writing operation fails? What if the reading operation fails to start? Does the transfer area need backed up? What RAID level is appropriate for it? Where should it be hosted? Who should have access to it? How do you log and audit the sorting operation and the data quality operation - do they have different log files? And so on.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;What SSIS provides - almost uniquely - is the ability to easily do away with transfer areas, by integrating many different technologies into a single, auditable and manageable process with integration of all aspects including scheduling, logging, error handling and recovery. Even if we do not have the stock components to handle a technology, other technologies can still easily be added to the flow by customers or vendors, which is why we are getting excited partners queuing up to integrate with us. The high performance sorting and data quality technologies I mentioned earlier are examples of this. Once a technology lives in the SSIS control flow or data flow, it can participate as fully in our rich management and integration story as any of our own components.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;So is there still a need for staging areas? Yes, but now it can be &lt;SPAN style="FONT-STYLE: italic"&gt;solely &lt;/SPAN&gt;a procedural choice. To flip back to Ralph Kimball's metaphor, the staging area is now &lt;SPAN style="FONT-STYLE: italic"&gt;just &lt;/SPAN&gt;a kitchen where the master chef can prepare the data for consumption - the work surfaces are clean and uncluttered, and the food storage is are efficiently to hand, but hygienically separated.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Interestingly, we can now turn the question around. If you have a staging area for procedural reasons, is there an advantage to using it also as a transfer area? Perhaps in some cases there is, such as taking advantage of that procedural stage to partition the flow into multiple raw files which can the be used in parallel for better performance in the next step. In the same way, a sawmill may grade choose to have its timber delivered ungraded in bulk then grade it themselves in storage before processing. Use staging areas to handle schedule incompatibilities between processes, to provide an auditing step, and to add value to data before delivering it to the warehouse.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;I believe that only SSIS really provides this &lt;SPAN style="FONT-STYLE: italic"&gt;complete &lt;/SPAN&gt;flexibility in the data integration world. Some of you will have seen presentations in which I suggest that SSIS removes some of the need for staging areas. I'll be sharpening up my language in future and keeping this distinction between staging and transfer areas much clearer.&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=9716" width="1" height="1"&gt;</description></item><item><title>IBM finally buys Ascential (what kept them?)</title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/03/15/8913.aspx</link><pubDate>Tue, 15 Mar 2005 11:01:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:8913</guid><dc:creator>donald farmer</dc:creator><slash:comments>4</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/8913.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=8913</wfw:commentRss><description>&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;You've got to hand to it to Peter Gyenes, he knows how to work a deal. He was the CEO of Ardent when Informix bought them in March of 2000. By July of that year he had replaced Jean-Yves Dexmier as CEO of Informix. By September he had split Informix into two operations, Informix and Ascential, and then in 2001 sold the aging Informix database to IBM for $1B, getting to keep the cool data integration product for himself. Now, a mere 4 years later he lands another $1.1B deal selling to IBM the company they originally left on the table. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;It's exactly what the great (and shamefully under-praised) George Mikes used to say about his countrymen (Gyenes is a good Magyar name): &lt;SPAN style="FONT-STYLE: italic"&gt;A Hungarian can get into a revolving door behind you and come out in front of you. &lt;/SPAN&gt;Nice work, Peter.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;So what does it mean for the data integration market, and for SQL Server Integration Services? Not a lot really, as IBM have been cross-selling Ascential for all this time anyway. It will be interesting to see if they commoditize data integration as a central feature of the database offering, in the way that we and our friends in Redwood Shores have done. To do so, they would need to drop the price somewhat: from an-arm-and-a-leg to at least a leg.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;More likely they will stay at the high end, which is where they have skills and presence. Their press release hints at this, I think, when they set out what they see as the WebSphere and DataStage scenarios: &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 8.25pt; MARGIN: 0in 0in 0in 0.5in; COLOR: black; FONT-FAMILY: verdana; mso-outline-level: 2"&gt;&lt;EM&gt;&lt;FONT size=2&gt;For example, a company trying to consolidate data from multiple ERP systems into a single system could leverage WebSphere Information Integrator to access various mainframe or distributed sources for profiling and assessment, and then use Ascential Software's data migration and transformation capabilities to integrate the data. &lt;/FONT&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 8.25pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;&lt;FONT size=2&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Multiple ERP systems are certainly a high end scenario - and one which SSIS handles very nicely along with BizTalk.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;I think the most telling aspect of this purchase is that is has been so long coming. Microsoft has been steadily investing in Business Intelligence, building a tightly integrated feature set that, by it's nature, spans the market from entry-level to high-end enterprises and which has organically developed within a unified vision. IBM, as one customer said to me yesterday "seems to splurge in fits." A little unfair perhaps, but I can see his concern.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;As for Informatica &amp;#8230; &lt;SPAN style="FONT-STYLE: italic"&gt;and now there was one&lt;/SPAN&gt;. The big three database vendors all have data integration offerings under their wing. IBM now has a big but difficult platform, which suits a company with their global services focus. Oracle has Warehouse Builder, which is ok-ish if you happen to be an Oracle DBA stuck in your datacenter Muqata, but isn't likely to win many friends outside the firewall. We have SSIS, which is nimble, versatile, easy to use, and, according to Forrester, &lt;EM&gt;downright seductive&lt;/EM&gt;. (Seductive is good, by the way - not a phrase used often enough around me, so expect me to quote it often.) &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;So, whither Informatica as the biggest pure-play data integration vendor? If I was a betting man (I'm not. I put three dollars in a slot in Vegas and the guilty thrill kept me up all night) I would look at their mailing address and make no guesses at all.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;As usual, all the proceeding is personal ramblings, largely influenced by a red-wine hangover. (It's sad to think I sat up all night talking about Analysis Services and Data Mining&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;for clickstreams over a bottle of claret. I sure didn&amp;#8217;t do that when I was 18.) Our marketing guys, as marketing guys do, have a more balanced take on all this. Maybe Tom Rizzo will give the official line in his blog.&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=8913" width="1" height="1"&gt;</description></item><item><title>Oracle and SQL Server Integration Services </title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/03/13/8819.aspx</link><pubDate>Sun, 13 Mar 2005 13:30:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:8819</guid><dc:creator>donald farmer</dc:creator><slash:comments>12</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/8819.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=8819</wfw:commentRss><description>&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;I have had some interesting conversations with SQL Server customers, partners and our field about&amp;nbsp;SSIS and Oracle in the last couple of weeks. Some were of a tone which prudence and good taste suggest is better left unreported, but the others, if less juicy, were certainly worth discussing here. Typically they revolve around some combination of three key technical issues and three key scenarios: migration, integration surrounding Oracle and loading to Oracle. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;SPAN style="FONT-WEIGHT: bold"&gt;Migration. &lt;/SPAN&gt;I'm glad to say that Oracle to SQL Server migration is the most common scenario. Let's not be churlish here. Oracle is a damn fine database, but we come across a lot of customers who cannot wait to get off it and onto the SQL Server platform - which significant numbers find to be faster, more cost efficient, and better supported. Sure, it happens the other way round sometimes, but in general, Oracle to SQL Server migration is a common requirement and one which folks often come to us for help with - and naturally, we are delighted to help. Their key technical requirement is to get data out of Oracle and into SQL Server quickly and accurately. There are other requirements, such as accurately porting PL/SQL or Java code, and Microsoft has some great experience in that area, but strictly speaking these do not involve SSIS except on the periphery.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;SPAN style="FONT-WEIGHT: bold"&gt;SQL Server surrounding Oracle.&lt;/SPAN&gt; Another common scenario is where a customer already has a well-established Oracle system for their transactional or line of business system, or even an entrenched data warehouse. In such cases they may be wary of disruption and either permanently or temporarily keep the Oracle system running. However, requirements are rarely static in the world of IT and these companies find they want things that we are good at - perhaps to use Analysis Services for reporting and data mining, or to use SQL Server to build their data warehouse entirely, because of our great support for Business Intelligence and our ease of management. There are a ton of companies out there who are entirely Oracle shops, except that they have Analysis Services, or Reporting Services, too. It's always fun (and it happens quite often) in our Executive Briefings with customers to see even CIOs confidently say that all their databases are Oracle - only to have someone from IT, or their MS account rep say "Well, actually &amp;#8230; we have SQL Server Analysis Services departmentally, and all the marts are running on SQL Server too." In this scenario, again, the issue is getting data out of Oracle quickly and efficiently and into the data mart or warehouse.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;SPAN style="FONT-WEIGHT: bold"&gt;SQL Server to Oracle.&lt;/SPAN&gt; What can I say? S*!&amp;amp; happens. Seriously, however, we do come across this scenario in positive ways. For example, a customer may have an established Oracle data warehouse, but they are sensibly moving their transactional or line of business systems to SQL Server. Perhaps they migrate the transactional system first, leaving the Oracle warehouse intact. Or perhaps they have an Oracle warehouse and implement a new line of business system in SQL Server and integrate the two. In this scenario they would like to be able to load data into Oracle as quickly and efficiently as possible. Does SSIS support them? I'll reveal the secret two-letter answer later ...&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;We come across three technical issues with these scenarios: bulk extracting from Oracle, querying Oracle and loading into Oracle. Let's now consider these.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&lt;SPAN style="FONT-WEIGHT: bold"&gt;Bulk extracting from Oracle. &lt;/SPAN&gt;This is pretty straightforward. In most cases you can use the .NET provider or the Microsoft OLEDB provider for Oracle. Oracle do have their own providers, but we find that most users are happier with the performance of the Microsoft versions. We hear from customers that performance with these providers is generally pretty good: we have heard from "acceptable" to "incredible." I guess it depends on specific needs and expectations. For migration scenarios it is also possible to bulk export from Oracle and then load from the generated flat files into SQL Server. That is efficient, even though it incurs an additional cost of writing to and reading from disk in between. But most users find the .NET or OLEDB providers meet their needs. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Another huge strength of SQL Server here is the use of Replication. The ability to synchronize Oracle and SQL Server really rocks and the replication team in SQL Server has done amazing things to make this easy to do. If replicating to a staging table, SSIS can take up the story from there and add additional value.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; mso-outline-level: 1"&gt;&lt;SPAN style="FONT-WEIGHT: bold; FONT-FAMILY: Verdana"&gt;Querying Oracle&lt;/SPAN&gt;&lt;SPAN style="FONT-FAMILY: Verdana"&gt;. Compared to bulk extracting, querying imposes some additional demands such as the need to parameterize queries and often the need to handle Unicode and type conversion problems. Here SSIS really appears to have advantages for users over other applications: at least our users report that we do. Parameterizing at first is a little tricky, because the OLEDB providers do not support parameterized queries. You can set parameters and parse the query, because the OLEDB UI is fairly generic, but the source fails when run against the server. This also happens with DB2. Fear not, gentle reader, for help is at hand in the dashingly handsome form of Property Expressions. The SQL statement you use to query the Oracle source can be set by a variable, and that variable can be composed using a property expression such as: &lt;/SPAN&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'"&gt;"select * from Orders where OrderID &amp;gt; " + @LastOrderID. &lt;/SPAN&gt;&lt;SPAN style="FONT-FAMILY: Verdana"&gt;It's a parameterized query - and a very flexible one - in all but name.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Other requirements, such as Unicode handling, may be exposed through the provider. Alternatively, such issues can be handled explicitly in the SSIS data flow using derived columns or data conversions to conform Unicode / ANSI, code pages and data types.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Talking about things exposed through the provider, have a look at the .NET Oracle provider in the SSIS Import Export wizard. Due to some limitations in the way that .NET provider user interfaces are factored, we have to rely on a generic property bag to configure these in the wizard - because it has to live outside Visual Studio and the UIs are provided by Visual Studio, if you must know. So we just expose the raw properties. One of them is a boolean with the delightful name: &lt;SPAN style="FONT-STYLE: italic"&gt;Workaround Oracle bug 914652. &lt;/SPAN&gt;There's nothing like washing dirty laundry in public! In this case it's an Oracle skidmark, not ours.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;And so to our third technical issue. &lt;SPAN style="FONT-WEIGHT: bold"&gt;Loading into Oracle.&lt;/SPAN&gt; Do we support fast loading into Oracle? The obvious answer is no, and the obvious tone with which express this negative is one of surprise, as if someone had requested a Gideon Bible in a North Korean hotel room. But there's more to it than meets the eye and the question raises some important issues about the positioning of SSIS in the market.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;The first point to consider is why the request is made in the first place. Truth is, most customers report that Oracle's OLEDB and .NET providers really suck at loading data. Oracle just have not designed high performance interfaces - they prefer their own proprietary interfaces, for good proprietary reasons no doubt. So customers in my third scenario above often need to find a way to get data into the Oracle database. Yet here in SSIS, we have no passion to provide fast loading which would sustain Oracle as a database platform -the Microsoft Oracle providers are certainly not optimized for that. The best strategy from our point of view is for users to move away from Oracle completely and, indeed, we know of several cases where customers have ramped up their Oracle to SQL Server migration to avoid these cohabitation problems. If we provided an Oracle fast load solution out of the box, perhaps this results in a net loss for us in migration scenarios, because we extend the lifetime of the Oracle database, which frankly is a disservice to our customers. Besides, it really is Oracle's job to provide better interfaces. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;As I said, Oracle &lt;SPAN style="FONT-STYLE: italic"&gt;do &lt;/SPAN&gt;provide fast loading through proprietary means and it is possible to build SSIS packages which write to text files and then invoke the Oracle fast loader with an ExecuteProcess task. Some customers don't like the workaround, but some love it. In practice, it turns out to be not so bad as might be thought, because landing to disk before bulk loading provides a check point of the processed data which enables restartability of the fast load process. This can be very practical when enrolling the entire load in a single transaction for performance.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Some users take the view that SSIS cannot be a real ETL tool if we do not enable fast loading into other databases. They see ETL and data integration as being by their nature generic processes which should enable even-handed integration throughout the enterprise. I think there are two principal responses to this. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;First of all, SSIS is fundamentally a data integration &lt;SPAN style="FONT-STYLE: italic"&gt;platform&lt;/SPAN&gt;, not just an application. That means that we have the technical hooks and the strategic drive to enable a wide range of third parties to extend our platform, from individual developers to well-established ISVs. We are actively encouraging an integration ecosystem which provides end users with a range of integration technologies and features which no one application or company, not even Microsoft, could provide. This will be a cornerstone of our success. How does this relate to Oracle integration? Well already I know of a number of individuals and ISVs who are working on this very problem - writing wrappers around Oracle's bulk loaders to make them easy to use within SSIS, or writing deeper integration of Oracle's bulk interfaces with the SSIS data flow. I do not expect that Microsoft will write an Oracle fast loader - currently it comes in around number 999 in my list of 1000 features for next version, just slightly ahead of recompiling for Linux. But on the other hand I do fully expect that there &lt;SPAN style="FONT-STYLE: italic"&gt;will &lt;/SPAN&gt;be excellent Oracle loading capabilities from third parties, who may well live long and prosper in that market.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Secondofly (thank you, Arrested Development, for this neologism) I think it is important to stress the advantages of SQL Server as the complete data platform. One of the key issues in data integration is the difficulty of pulling together all these disparate sources and applications that grow like Topsy in enterprises. SQL Server provides a solution for this, not by making Integration Services a kind of duct tape to hold it all together, but more fundamentally by enabling inherent integration through SQL Server's data platform capabilities. It's no surprise to me that Informatica's metadata product is called Superglue. My bottle of Superglue in my&amp;nbsp;toolbox says it is for repairs throughout the household and shop. I believe we do better by our customers by enabling them to build enterprise architectures which are not broken in the first place. Good integrated engineering is better than glue. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Microsoft and Oracle both take this approach. (It's only in their most recent version that Oracle's Warehouse Builder enabled loading non-Oracle sources at all, while DTS has enabled generic integration for years.) The pure play Data Integration vendors will no doubt continue to address platforms in a more even-handed way. In many ways that will be their strength going forward. But such even-handedness comes at a price, quite literally, as witnessed by the expense of licenses and services for pure play vendors' integration tools.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;I'm looking forward to many happy days with SSIS, moving customers from Oracle to SQL Server. For those who wish to load Oracle quickly and efficiently, I will be glad to direct them to partners who can provide those capabilities within our architecture. I will also be comfortable in the knowledge that not fast loading to Oracle is a relatively minor weakness, compared to the enormous strength we gain from having such partners, and the ease with which they can extend our capabilities and enhance SQL Server as the complete data platform for our customers.&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=8819" width="1" height="1"&gt;</description></item><item><title>More on Slowly Changing Dimensions in SSIS </title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/03/10/8737.aspx</link><pubDate>Fri, 11 Mar 2005 00:48:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:8737</guid><dc:creator>donald farmer</dc:creator><slash:comments>5</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/8737.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=8737</wfw:commentRss><description>&lt;P&gt;Ash has a great post below on the Slowly Changing Dimension component. &lt;/P&gt;
&lt;P&gt;It's important to note that this component was designed to be robust and easy to use, rather than a pure high performance component. Most experienced DW practitioners have a sackload of favoured and well tried techniques for handling SCDs. The SSIS SCD component and wizard help the user master SCD loading techniques reliably - and with some interesting additional functions over the standard SCD processes.&lt;/P&gt;
&lt;P&gt;One of the things I am really glad we did, was to enable the user to break apart the data flow built by the wizard and make their own changes to it, tweaking the lookup, the update statements and all the various components in between. &lt;/P&gt;
&lt;P&gt;This enables some neat scenarios. For example ... In the wizard, you choose between using a boolean flag or a start/end date combination to show current and expired records. But if you would like to use both, you can still do this by tweaking the downstream components.&lt;/P&gt;
&lt;P&gt;For Ash's scenario, where the updating rows may interfere with subsequent lookups, there is another interesting twist possible. Who says the lookup to the dimension table needs to be to the same table (or even the same database connection on the same server) as the destination for inserted and updated rows? It would be quite possible to have the dimension lookup table be a different (cached?) table from the destination.&lt;/P&gt;
&lt;P&gt;Imagine a scenario where you have a huge dimension - say products for a supermarket. The only changes likely would be to current products - not to products which have already been expired. But the current products may be only a fairly subset of the dimension table, especially if it spans a long time and the dimension members change regularly: think of how often supermarket products change their packaging, weight, etc.&lt;/P&gt;
&lt;P&gt;So the dimension lookup could be to a subset of the dimension table holding only the current records. Add an index and this smaller table would be even more efficient.&lt;/P&gt;
&lt;P&gt;Meanwhile, downstream, the updates and inserts would be to the real dimension table.&lt;/P&gt;
&lt;P&gt;So, I would recommend anyone starting to build warehouse dimensions to try the Slowly Changing Dimension wizard, and for anyone with a bit more experience, try it for sure and investigate how to tweak and enhance the data flow it generates.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=8737" width="1" height="1"&gt;</description></item><item><title>SSIS plays Vegas</title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/03/06/8627.aspx</link><pubDate>Mon, 07 Mar 2005 00:02:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:8627</guid><dc:creator>donald farmer</dc:creator><slash:comments>5</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/8627.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=8627</wfw:commentRss><description>&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Last week Kamal Hathi and I were in Las Vegas for the TDWI (Data Warehousing Institute) conference. This is really the first general trade show at which SQL Server Integration Services has been on show. Microsoft had a booth at which I did a 30 minute demo every hour. The demo was simple enough but did show a lot of our features. Given the audience, I specifically concentrated on data quality features, which also demo very well. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;The reaction was great. We only had about 8 seats for attendees, but in fact every demo was packed and we attracted a large number of people who stood around watching. A lot of people came back again and again, which was fun. Even better, a lot of those coming back were actually from other vendors attending the conference. There was a good number of data integration products represented: Ab Initio, Informatica, SAS, Ascential, Information Builders and Oracle were all there. I think it's fair to say we got more attention than any of them. In fact, I believe we got more attention than any other product at the show. SAS had a conjuror doing card tricks - he was impressive. Naturally, I could not be upstaged, so worked up my own card trick. And it worked. Quite nicely too. The look on Bill Baker's face was priceless when the name of a card chosen by an attendee popped up in the SSIS debugger!&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;The reaction to the real features was great too - except for the worried looks from the other vendors! Actually, that was better than great! :-)&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;I had a good conversation with the staff on the Oracle booth. They are in a similar situation to us: previously their data integration tool, Warehouse Builder, was relatively underpowered compared to those from the pure-play vendors. Now they have a really competitive application which will probably become the default choice for Oracle shops. However, I do hear from their customers that even the new version is very difficult to use compared to SSIS. We hear of people re-engineering their Oracle Warehouse Builder apps into SQL Server Integration Services in a fraction of the time it took to build in OWB. So I am sure that SSIS will make inroads to Oracle shops too, much in the way that DTS did.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Nevertheless, the Oracle folks and I did find a lot in common. Not least the fact that we now&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;own much of the data integration space by default. Why would anyone consider a pure-play vendor before at least putting their database vendor's toolset through a thorough Proof of Concept? Of course, we will not win all of these - the pure play guys are still ahead in some areas - but we will win enough to make a huge difference. And of course as we move forward&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;with the new versions, features and capabilities, we will increasingly meet &lt;SPAN style="FONT-STYLE: italic"&gt;every &lt;/SPAN&gt;need of our customers.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;I also met with the guys on the Ab Initio booth. Ab Initio really costs a great deal of money, and services on top of that are frequently also very expensive. The result is that I hear of installations costing well over $500k a year to run. That's a huge sum, and customers are naturally anxious to move away from that cost base. Yet Ab Initio really do some very nice work at the high end of the market. It would be interesting to see if we can find some synergy there to enable our customers and theirs to get the best of both worlds. But I wonder if it would even be possible? They are quite a secretive company and did not exactly open up. In fact, they would not even run through a full demo with me. But they did drop by our booth (several times) and saw our demos.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;I am glad I concentrated my own demos on data quality as it was a big thing at this conference. There were a lot of vendors, and a lot of questions about quality issues. SSIS has some great data quality features built in, which gives us interesting possibilities with data quality vendors. Our data quality features enable a great toolkit approach which can be used to profile, score, cleanse and audit data for variations, duplications and exceptions against generic or specific business rules. The vendors have more specialized features - cleansing against databases of known addresses for example. In fact, they fulfill a lot of interesting needs: they can match addresses to postal routes for more efficient mail drops, or check databases for deceased customers, or capture errors such as a Street address part entered in the City box.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Putting these specialized features together with our generic toolkit has a lot of benefits for both sides. From our point of view, SQL Server Integration Services customers can benefit because they have access to a very full range of quality features. For the quality vendors, we lower the entry point for doing serious data quality work, with the result that more data warehouses and integration projects will start to concentrate on data quality - and a significant number of those customers will begin to look for the specialized features that the data quality vendors can provide.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;One interesting possibility is that other vendors will start to lower their prices, or provide entry-level versions to party with newly-empowered customers in our space. This will likely include data quality vendors of course, but also some of the pure play data integration guys. One way or another, SQL Server Integration Services is going to make a huge and, for users, very welcome difference to the data warehousing and data integration market.&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=8627" width="1" height="1"&gt;</description></item><item><title>Counting Rows in SQL Server Integration Services</title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/02/24/8106.aspx</link><pubDate>Fri, 25 Feb 2005 03:06:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:8106</guid><dc:creator>donald farmer</dc:creator><slash:comments>4</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/8106.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=8106</wfw:commentRss><description>&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;I am often asked if there is a way to capture row counts in SSIS. For example, a user may want to know how many rows passed along certain outputs of a conditional split in order to compare the ratio of say high- and low-value line items in a day's sales.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Integration Services provides a great way to do this, using the Row Count component. To use this component, first create a variable of integer type (the default, so that's easy) at a scope where you can see it from your Data Flow task. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Now add your Row Count component to the Data Flow at the point in the process where you would like to count rows. Edit the Row Count component and set its VariableName property to the name of the variable you created.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;When you execute the Data Flow, the number of rows which pass through the Row Count component are written to the named variable. However, it's important to note that the variable value does not change until the Data Flow has completed. This is the same for all SSIS package variables referenced in the Data Flow, even when using the Script component, the values are locked when execution of the Data Flow starts and they are only updated at the end. (VB.NET Variables within the Script Component &lt;SPAN style="FONT-STYLE: italic"&gt;can &lt;/SPAN&gt;be changed during the flow, but they cannot be used outside the script.)&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;So what can you do with this row count?&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;One use I like, is to populate a table which captures detailed package execution statistics. I create a number of variables, such as VarErrorRows, VarGoodRows. I use an ExecuteSQL task to write to a table with a statement such as: &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;INSERT INTO [My_Audit_Table]&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;( [Package name]&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;,[Machine name]&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;,[Username]&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;,[ErrorRows]&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;,[GoodRows]&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;,[Execution start time]&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in 0in 0in 0.5in; FONT-FAMILY: Verdana; mso-outline-level: 2"&gt;,[AcceptancePercent])&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;VALUES (?,?,?,?,?,?,?)&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;And I map the variables appropriately. Note that I can use system variables to map Package name, Machine name, User name and Execution start time.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Another interesting use is to audit a sample of the data in a process before loading the entire data set. For example, let's say I have an extract which is pretty expensive - an involved query, or pulling a flat file over a slow network connection from a remote log server. The data may be ok, but on the other hand it may have various quality issues that would prevent me from loading it to my warehouse. But it's expensive data to get to, so I want to audit its quality and load it if possible in a single operation.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;I work this scenario using two Data Flows - the first to extract and audit, and the second to load.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;In this case, I add a multicast immediately after the source adapter. One leg of the multicast goes straight to a raw file destination - so I capture the source to my integration server. Another leg of the multicast goes through a Row or Percent sample component to sample say 10% of the load. I immediately count the sampled rows into a variable, SampleSize. Next I apply whatever auditing logic I like - conditionally splitting out null keys, columns which have missing values, etc. At the end of my auditing I can add another Row Count component to capture GoodRows which passed the audit.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;The second Data Flow is my real business logic to load the warehouse, but this flow sources from the Raw File rather than the original source - no need to stress that slow connection again, or to run that query twice. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;The important thing is to add an expression to the precedence constraint between the two Data Flows. The second Data Flow only runs if the first succeeds and the expression GoodRows / SampleSize &amp;gt;= x evaluates True. X can be whatever value you like, or could even be another variable, configured from an XML file if you like. &lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;And of course you can capture all these metrics into an audit table just as before.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;However, the Row Count is not the answer to all row counting scenarios. One common question is how many rows were inserted to a destination? You may be tempted to count these with a RowCount component just before the destination component. This may work for a flat file, but not reliably for a database - because it would not take into account rows which failed to be inserted. So there are two patterns you could use for OLEDB destinations:&lt;/P&gt;
&lt;UL style="MARGIN-TOP: 0in; MARGIN-BOTTOM: 0in; MARGIN-LEFT: 1in; DIRECTION: ltr; unicode-bidi: embed" type=circle&gt;
&lt;LI style="MARGIN-TOP: 0px; FONT-SIZE: 10pt; MARGIN-BOTTOM: 0px; VERTICAL-ALIGN: middle; mso-outline-level: 2"&gt;&lt;SPAN style="FONT-FAMILY: Verdana"&gt;Use an ExecuteSQL task to count rows before and after the Data Flow has executed, and compare;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="MARGIN-TOP: 0px; FONT-SIZE: 10pt; MARGIN-BOTTOM: 0px; VERTICAL-ALIGN: middle; mso-outline-level: 2"&gt;&lt;SPAN style="FONT-FAMILY: Verdana"&gt;Add a RowCount component before the destination, and one on the error output of the destination, and compare the values of the two variables after the Data Flow has completed.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;But let's not end on a negative. I love the Row Count component, so here's another neat use. The Row Count component can be used without an output. In other words, it can, in effect, be a destination. This is as cool as a very cool thing indeed, because it means you can run a Data Flow and debug it without the data going anywhere. Think of it - no more temp tables, or dummy text files, just to get your process working while you debug. Develop first using a Row Count destination and then, when you're happy with the process, hook up a real destination and you're ready to go - after some final testing, of course.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;I hope this gives some insight into this elegantly simple, often overlooked, but very valuable little component.&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=8106" width="1" height="1"&gt;</description></item><item><title>How fast is SSIS compared to DTS?</title><link>http://sqljunkies.com/WebLog/donald_farmer/archive/2005/02/20/7989.aspx</link><pubDate>Sun, 20 Feb 2005 15:51:00 GMT</pubDate><guid isPermaLink="false">d2584c15-f6ef-46f7-a2d4-24fc0e143e76:7989</guid><dc:creator>donald farmer</dc:creator><slash:comments>4</slash:comments><comments>http://sqljunkies.com/WebLog/donald_farmer/comments/7989.aspx</comments><wfw:commentRss>http://sqljunkies.com/WebLog/donald_farmer/commentrss.aspx?PostID=7989</wfw:commentRss><description>&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Long time no blog. No excuses, except that my workload has been particularly heavy recently. Anyway, enough of my problems. Let's get stuck in straight away with one of the questions I am asked most frequently, whether by DTS customers thinking of trying SSIS, or by Microsoft field staff spreading the news of SSIS, or by my VP in our performance shiproom meetings checking in on our progress. And the question is &amp;#8230; &lt;SPAN style="FONT-STYLE: italic"&gt;How fast is SSIS compared to DTS?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;The short, but very incomplete, answer is &lt;SPAN style="FONT-STYLE: italic"&gt;much faster&lt;/SPAN&gt;. The longer version would warm the cockles of Prof. Joad's heart (see below) &amp;#8230;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;DTS was actually pretty fast at data loading, especially into SQL Server - in fact the performance was largely determined by factors at the SQL Server end of the operation. But it could only transform data using ActiveX script which was relatively slow. Consequently the speed of DTS depended on how many ActiveX transformations you were performing, and how complex they were.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;SQL Server Integration Services typically transforms data using high performance native components which greatly outperform DTS transformations. For example, doing a simple character conversion like UPPERCASE can add about 5% to the execution time of a DTS package. With Integration Services the difference between a data flow with or without an character conversion is negligible: even with 5 conversions I have measured only 0.1% performance impact.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;So, the performance difference of DTS and Integration Services really does depend on what you are doing. For simple data loading the difference is small, being largely dependent on the efficiency of the data load into SQL Server. SSIS does have a special SQL Server destination that improves perf about 8% over bulk load on a 3Gb file, so even here we have an advantage. But who wants to promote a product with an 8% perf gain? That's a real &lt;EM&gt;we suck less than we used to &lt;/EM&gt;message.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;Throw some transforms into the pot and we get some more meaty figures. At the Unisys lab in Bellevue we ran a little experiment last year with their tireless Sanjay Sony and Henk van der Valk. Watching Henk in action is like watching a conductor in concert - he orchestrates a whole battery of performance tools like musical instruments, pulling up graphs and counters in the most dazzling manner. During this experiment we put together a simple but typical transformation job: some calculations on columns, and some data conversions, 7 transformations in total. Loading a 7.5M row, 1Gb text file, Integration Services (on beta 2) was 7.2x faster than DTS 2000.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;This figure has the advantage of having being tested under formal conditions, so I am very happy to quote it whenever I am asked. Customers have reported even better figures - in some cases seeing performance improve by 12x to 17x.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;So how much faster &lt;SPAN style="FONT-STYLE: italic"&gt;is&lt;/SPAN&gt; SQL Server Integration Services compared to DTS? Well, it all depends what you mean. But typically I say &lt;SPAN style="FONT-STYLE: italic"&gt;up to 7 times faster in typical cases&lt;/SPAN&gt;, and stress that &lt;SPAN style="FONT-STYLE: italic"&gt;your mileage may vary&lt;/SPAN&gt;. And you can quote me on that.&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 10pt; MARGIN: 0in; FONT-FAMILY: Verdana; mso-outline-level: 1"&gt;For those who don't know, Professor C.E.M. Joad was a philosopher and a regular panellist on the popular BBC Home Service radio programme, &lt;SPAN style="FONT-STYLE: italic"&gt;The Brains Trust&lt;/SPAN&gt;, in which various members of the great and good would answer pressing questions on issues of the day, or moral and philosophical topics, posed by the public. He was famous for starting almost every answer with &lt;SPAN style="FONT-STYLE: italic"&gt;Well, it all depends what you mean by &amp;#8230;&lt;/SPAN&gt; As a moral philosopher in more innocent times, his career was ruined by a successfully prosecution for failing to buy a train ticket for his journey. His reputation also suffered from attacks from Bertrand Russell and Winston Churchill, the latter attacking Joad's pre-war pacifism. As a student, I knew a little of Joad's work, being a myself a life-long pacifist, but I was in for a surprise. In a second hand book shop in Ashton Lane in Glasgow (now, need I say, a coffee-shop) I found a copy of his 1939 Penguin special &lt;SPAN style="FONT-STYLE: italic"&gt;Why War?&lt;/SPAN&gt; On sitting down to read it at home, a yellowing newspaper clipping fell out of the pages. It was an article from 1940 by Professor Joad entitled &lt;SPAN style="FONT-STYLE: italic"&gt;Why I changed my mind.&lt;/SPAN&gt; I have not changed mine, but I did once ride a train without buying a ticket. (Not in the USA, I quickly add, in case the INS are reading my blog.)&lt;/P&gt;&lt;img src="http://sqljunkies.com/WebLog/aggbug.aspx?PostID=7989" width="1" height="1"&gt;</description></item></channel></rss>