Donald Farmer on SQL Server Integration Services

pamboli's naive perambulations in the world of data integration, ETL, metadata and data warehousing

<January 2009>
SuMoTuWeThFrSa
28293031123
45678910
11121314151617
18192021222324
25262728293031
1234567


Navigation

SSIS Links

Subscriptions

Post Categories



General (RSS)

General
Integration services training, the towering inferno, some television, and more ...

Last week I was in Tampa, training some Ascend partners for three days. The structure of these courses is very free form, and generally I get as much out of them as the trainees! For the customers I guess they get the advantage of very hands-on work with a product team member. The loose structure means that we can drill into their individual issues quite easily. Over three days we covered getting started with SSIS, data quality, deployment, auditing and compliance (there were a number of financial customers), warehouse loading, performance and scripting. That's a lot to cover!

It continually surprises me that we can cover so much in three days. It says a huge amount about the usability and effectiveness of the application that this is even possible.

One of our great advantages going forward will be the broad user base, and the widely available skills. Think how many SSIS users there will be in a years time. In two years, think how many SSIS users will have quite deep experience of the application! At that point, we really become very established at the top end of the integration market, simply because of the body of skills which will be there to support the user community. Of course, that leaves us in the product team with a huge responsibility to support and encourage these users. And of course, to ensure that the product meets their needs and expectations - so far, so good on that front!

Well, it was not quite the towering inferno, but we did have a small fire my hotel in Tampa, at about 1am on my first night there. Not what I needed to catch up on sleep. At least it was in Florida and not chilly Seattle. Of course, when the alarm goes off you think it is only a drill. So I dressed properly, made sure I had my wallet and cellphone and headed for the door. When I looked out, the corridor was smoky! “Oh bugger!” does not quite capture my exact feelings,  but it's pretty close. We spent about 4 hours on the grass, while Tampa's finest did their bit. Hilariously, the fire investigators turned up in a Hummer at 2 o'clock in the morning, and still had their sunglasses on! Eventually we got back, but the elevators were out, so I had climb up fourteen floors to get back to my room. Still I got a full two hours sleep before the training course.

While I was away in Tampa, MSDN released my first MSDN TV broadcast. It covers how to build and debug your first SSIS package. Simple stuff, but the kind of thing that new users seem to need. You can see it here. http://msdn.microsoft.com/msdntv/episode.aspx?xml=episodes/en/20050505SQLServerDF/manifest.xml . Let me know what you think. There's another one coming on data cleansing which should be fun. And Kamal has done some too, which should be out soon.

A couple of people have asked me about the picture in the background of the TV introductions. It is a view of Victoria Terrace and Victoria Street in Edinburgh, from just outside the little garden of the church of St Columba by the Castle. The frame is an old wooden window, complete with cobwebs and glass, that we found in an alley in Duvall, WA. When I first moved to MS, I did not have that most prized of locations - a window office. So this was my window for 3 and a half years - and very nice it was too. 

Finally, on a personal note, I noticed Steve Ballmer sent out an excellent email about and public statement about Microsoft's approach to public policy initiatives, especially around the recently controversial equal rights bill in Washington State. There has been a lot about this in the news. One of the great strengths of Microsoft, both in the business and technical spheres, has been the ability to recognize mistakes or wrong directions, and to change course with confidence, integrity and considerable skill. These qualities have served us well as a company. This decision by Steve and the leadership is in that spirit, I think. I felt quite proud to see them make this change. Let's hope that in a new legislative season MS can play an active role in ensuring equality in Washington.

posted Tuesday, May 10, 2005 6:58 AM by donald farmer with 0 Comments

SQL Summit in Minneapolis

Kent has already posted about the SQL Server Summit in Minneapolis: it was a great event, so I hope you don't mind me giving my own account.

 

My first trip to Minneapolis proper (previously I had only ever seen the airport) looked like it was going to get off to a good start. My flight arrived somewhat late, and it was already late evening, but the SuperShuttle staff were friendly and appeared, (deceptively) to be efficient.   The driver was listening to some Ethiopian religious music, so we got into a conversation about the wonderful Aster Aweke, one of my favourite musicians. However, from there on things went downhill rapidly. A few more people got on the shuttle and I don't know if it was the driver or the dispatcher who made the call, but we delivered them to the suburbs before dropping me off downtown. It's somewhat disconcerting to be on a downtown hotel shuttle driving through what looked like countryside in the middle of the night. I thought I was being kidnapped, but at least I now know where Minnetonka is.

 

Finally, after over an hour-and-a-half on the shuttle I got to the hotel, at 11.45. I was hungry, wanted a shower, and still had to run through my demo for the keynote in the morning. But room service in the Hyatt stops at midnight, and, bizarrely, all the hot water in the hotel was being shut off at midnight too for some repairs. So I had a lukewarm bath, worked on my demos until about 2am, and set my alarm for 6. I didn't even get to meet up with Kent for a pint. I shouldn't complain - in the morning I discovered that Paul Flessner had had an even worse trip - a cancelled flight meant he had to drive from Bloomington to Chicago in a massive hailstorm, which reduced Ilya Bukshteyn from marketing to a nervous wreck.

 

But the summit was well worth it. The local team are clearly doing an impressive job of building an enthusiastic community of SQL Users. The keynote room held about 1300 people and was packed. My sessions (on SSIS, and Analysis Services) were in rooms for 300 and were standing room only. The sponsors and exhibitors were delighted and all the speakers got good audiences who were knowledegable and appreciative.

 

Paul's keynote was a fine start to the day. There's a real candour and honesty throughout the SQL Server team about our product and our processes, and it really works to build a community of users who feel they can commit to us. Paul was able to pitch his keynote in such a way that everyone in the audience, from the VP of Unisys sitting beside me, to the overburdened DBA just behind be, recognized their own issues in what he was saying. You find this a lot in the SQL Server world - Dave Campbell and Bill Baker are experts at it, too - that the business vision is driven intricately with a detailed technical grasp of what is needed to fulfill that vision on the ground.

 

I think some SQL Server enthusiasts do not realize how rare this is. But try attending an IBM or Oracle conference, or Siebel or SAP, and you'll find that there are really two distinctive approaches there: a glossy, high level, business-driven marketing message, and a technical practical message. In the SQL Server world, it is very different: you come away with the feeling that Paul or Bill or Dave could come down from the podium and help you tune that query, or optimize that dimension hierarchy, just as readily as they could map out your enterprise's business relationship with Microsoft. And it's not just about technical knowledge - which could be merely academic - it's also about having a sense of what is needed to drive technical decisions to achieve really practical goals. This mix of vision and practicality is what attracted me to the SQL Server team in the first place, and it's good to see we still have it in bucketloads.

 

My own demo in the keynote went pretty well - the idea was to start from scratch and in only five minutes build an SSIS package, a mining model and a report to analyze some customer information from Excel into a hot leads report for a sales team.  Despite no rehearsal at all, Paul and I got the stagecraft right, I think, and I was able to complete the demo in only 4 1/2 minutes, which surprised even me. And the audience got it too, I think, seeing how BI can actually add real value to raw data easily and efficiently. Hopefully it also underscored the huge value to developers of having all these tools together in a single environment.

 

So from an inauspicious start, my visit to Minneapolis ended up being really worthwhile. I met a ton of good customers (I have about 30 detailed issues to follow up on), had good partnership conversations with the likes of Unisys, Sogeti and Panorama which will all lead to good things, met up with old friends (hi Mike from Appsmart!), had awesome audiences at my sessions, and was able to meet up with the local team who did such a great job of organizing the conference. The team also took me out to a pub in Minneapolis where I was able to have a nostalgic pint of McEwans, and my shuttle driver back to the airport turned out to be a real football fan (soccer for the yanks) so we chatted away merrily about Scotland's World Cup chances. (How long before we change our national motto from Nemo me impune lacessit to There is still a mathematical possibility we could qualify?)

 

If every one-day summit was like this, I could do 365 a year.

posted Sunday, April 03, 2005 9:00 AM by donald farmer with 2 Comments

Staging Areas and Transfer Areas for your Data Warehouse

Staging Areas are a common requirement in a Data Warehouse architecture, but they raise some interesting issues for both the warehouse administrator and the data integration designer, whether working with SQL Server Integration Services or other integration and ETL tools.

 

Recently Kirk and Ashvini and I had a little mail thread about this with Allan Mitchell, one of our tireless MVPs (see www.sqlis.com for more of Allan's amazing support for SSIS.) As ever, Kirk and Ash were on the money technically, and I just waffled on about the history and philosophy of staging, but readers of SQLJunkies blogs are used to that by now. (As an aside, when are we going to rename this site IntegrationServicesJunkies? A much more accurate reflection of our influence.)

 

But to return to our sheep. I thought I would at least share some thoughts on staging areas and transfer areas, and why I increasingly emphasize the difference between them ...

 

What is a staging area? The best, and most amusing, description of this architectural feature that I know is naturally from Ralph Kimball, which you can and should read about here: http://www.intelligententerprise.com/040101/701warehouse1_1.jhtml Ralph uses the metaphor of a kitchen, and very persuasive it is too.

 

The data warehouse's staging area is very similar to the restaurant's kitchen. The staging area is where source data is magically transformed into meaningful, presentable information. The staging area must be laid out and architected long before any data is extracted from the source. Like the kitchen, the staging area is designed to ensure throughput. It must transform raw source data into the target model efficiently, minimizing unnecessary movement if possible.

 

I wonder about metaphorical determinism, because I typically use a different image to describe staging areas, and come to somewhat different, although, I think, complementary conclusions. I like to use an image from the history of railways for this problem and I make the distinction between transfer and staging areas.

 

In the early days of railways there were two significant gauges or widths of track: Stephenson's Standard used by most British rail companies, and Brunel's Broad gauge used by the Great Western. There were variations of this discontinuity in most countries: Oregon was broad, but Montana was standard, the southern states largely broad before the Civil War; Gujarat railways were standard, the rest of India unusually broad.

 

In order to move goods across a country you needed transfer stations where goods could be moved from one system to another That's why Reading in the UK became an important rail centre: it was the meeting place for different standards where goods could be transferred between systems. Or one required transfer technologies, such as the Ramsey Transfer at Pocatello, Oregon, which was able to lift a carriage up from one track, replace its rolling gear underneath, and replace the carriage on another track of a different gauge.

 

Data integration and data warehousing face similar problems. Different technologies such as flat files, or web services, LOB systems or different vendor databases have differing storage requirements, which is a pretty obvious need for a transfer technology. But there are other issues in play too.

 

For example, in a data warehouse you may well have a specialized tool for, say, high performance sorting. This may be able to read and write data to and from text files or to and from streams. In addition you may have a specialized tool for data quality, such as postal address standardization, which may also read and write text files and streams.

 

Streaming between these tools would be highly efficient, but may not be possible. Perhaps the expected formats in the stream are incompatible, or there may be no way to schedule the writing from one application with the reading from another or to address the streams in a compatible manner. But they can both read and write flat files: so we end up with a transfer area where data is landed to and read from disk in flat file format just to enable compatibility between two technologies.

 

So, if we can enable a technology which integrates these applications seamlessly (let's call it SQL Server Integration Services, just for now) we would not need staging areas, yes? Well not quite, and this is where the difference between transfer and staging becomes important ...

 

Even after railways standardized gauges, there was still a great need for staging warehouses for goods. Why? Because staging is not just about the need to physically transfer goods - or data - but because at times there is a procedural need to have it stay put for a while. In the industrial world it may be more economic to move timber in larger quantities than the sawmill can use at any one time, because it is economic to fell it all at once at a particular time of year. Or it may be procedural, like having to wait for customs to examine the goods.

 

In data warehousing, it may be necessary to extract your data from the mainframe between 5pm-8pm, but the data warehouse is not ready to receive it until 10pm. Or it may be necessary to do a thorough data quality audit and have that signed off by a DQ analyst as a human workflow step.

 

If you must have transfer areas for technological reasons, and you prefer to have staging areas for procedural reasons then it does make sense to conform them: to make your transfer area a staging area. But just having a transfer area for its own sake, where it adds no value and introduces additional complexities to the management of the integration process - this is a waste of time.

 

What additional complexities does a transfer area add? Well consider the logic needed to move data between the sorting component and the address cleansing component. It's not just a matter of landing the data to disk and reading it again. What if the writing operation fails? What if the reading operation fails to start? Does the transfer area need backed up? What RAID level is appropriate for it? Where should it be hosted? Who should have access to it? How do you log and audit the sorting operation and the data quality operation - do they have different log files? And so on.

 

What SSIS provides - almost uniquely - is the ability to easily do away with transfer areas, by integrating many different technologies into a single, auditable and manageable process with integration of all aspects including scheduling, logging, error handling and recovery. Even if we do not have the stock components to handle a technology, other technologies can still easily be added to the flow by customers or vendors, which is why we are getting excited partners queuing up to integrate with us. The high performance sorting and data quality technologies I mentioned earlier are examples of this. Once a technology lives in the SSIS control flow or data flow, it can participate as fully in our rich management and integration story as any of our own components.

 

So is there still a need for staging areas? Yes, but now it can be solely a procedural choice. To flip back to Ralph Kimball's metaphor, the staging area is now just a kitchen where the master chef can prepare the data for consumption - the work surfaces are clean and uncluttered, and the food storage is are efficiently to hand, but hygienically separated.

 

Interestingly, we can now turn the question around. If you have a staging area for procedural reasons, is there an advantage to using it also as a transfer area? Perhaps in some cases there is, such as taking advantage of that procedural stage to partition the flow into multiple raw files which can the be used in parallel for better performance in the next step. In the same way, a sawmill may grade choose to have its timber delivered ungraded in bulk then grade it themselves in storage before processing. Use staging areas to handle schedule incompatibilities between processes, to provide an auditing step, and to add value to data before delivering it to the warehouse.

 

I believe that only SSIS really provides this complete flexibility in the data integration world. Some of you will have seen presentations in which I suggest that SSIS removes some of the need for staging areas. I'll be sharpening up my language in future and keeping this distinction between staging and transfer areas much clearer.

posted Sunday, March 27, 2005 8:54 AM by donald farmer with