Oracle and SQL Server Integration Services
I have had some interesting conversations with SQL Server customers, partners and our field about SSIS and Oracle in the last couple of weeks. Some were of a tone which prudence and good taste suggest is better left unreported, but the others, if less juicy, were certainly worth discussing here. Typically they revolve around some combination of three key technical issues and three key scenarios: migration, integration surrounding Oracle and loading to Oracle.
Migration. I'm glad to say that Oracle to SQL Server migration is the most common scenario. Let's not be churlish here. Oracle is a damn fine database, but we come across a lot of customers who cannot wait to get off it and onto the SQL Server platform - which significant numbers find to be faster, more cost efficient, and better supported. Sure, it happens the other way round sometimes, but in general, Oracle to SQL Server migration is a common requirement and one which folks often come to us for help with - and naturally, we are delighted to help. Their key technical requirement is to get data out of Oracle and into SQL Server quickly and accurately. There are other requirements, such as accurately porting PL/SQL or Java code, and Microsoft has some great experience in that area, but strictly speaking these do not involve SSIS except on the periphery.
SQL Server surrounding Oracle. Another common scenario is where a customer already has a well-established Oracle system for their transactional or line of business system, or even an entrenched data warehouse. In such cases they may be wary of disruption and either permanently or temporarily keep the Oracle system running. However, requirements are rarely static in the world of IT and these companies find they want things that we are good at - perhaps to use Analysis Services for reporting and data mining, or to use SQL Server to build their data warehouse entirely, because of our great support for Business Intelligence and our ease of management. There are a ton of companies out there who are entirely Oracle shops, except that they have Analysis Services, or Reporting Services, too. It's always fun (and it happens quite often) in our Executive Briefings with customers to see even CIOs confidently say that all their databases are Oracle - only to have someone from IT, or their MS account rep say "Well, actually … we have SQL Server Analysis Services departmentally, and all the marts are running on SQL Server too." In this scenario, again, the issue is getting data out of Oracle quickly and efficiently and into the data mart or warehouse.
SQL Server to Oracle. What can I say? S*!& happens. Seriously, however, we do come across this scenario in positive ways. For example, a customer may have an established Oracle data warehouse, but they are sensibly moving their transactional or line of business systems to SQL Server. Perhaps they migrate the transactional system first, leaving the Oracle warehouse intact. Or perhaps they have an Oracle warehouse and implement a new line of business system in SQL Server and integrate the two. In this scenario they would like to be able to load data into Oracle as quickly and efficiently as possible. Does SSIS support them? I'll reveal the secret two-letter answer later ...
We come across three technical issues with these scenarios: bulk extracting from Oracle, querying Oracle and loading into Oracle. Let's now consider these.
Bulk extracting from Oracle. This is pretty straightforward. In most cases you can use the .NET provider or the Microsoft OLEDB provider for Oracle. Oracle do have their own providers, but we find that most users are happier with the performance of the Microsoft versions. We hear from customers that performance with these providers is generally pretty good: we have heard from "acceptable" to "incredible." I guess it depends on specific needs and expectations. For migration scenarios it is also possible to bulk export from Oracle and then load from the generated flat files into SQL Server. That is efficient, even though it incurs an additional cost of writing to and reading from disk in between. But most users find the .NET or OLEDB providers meet their needs.
Another huge strength of SQL Server here is the use of Replication. The ability to synchronize Oracle and SQL Server really rocks and the replication team in SQL Server has done amazing things to make this easy to do. If replicating to a staging table, SSIS can take up the story from there and add additional value.
Querying Oracle. Compared to bulk extracting, querying imposes some additional demands such as the need to parameterize queries and often the need to handle Unicode and type conversion problems. Here SSIS really appears to have advantages for users over other applications: at least our users report that we do. Parameterizing at first is a little tricky, because the OLEDB providers do not support parameterized queries. You can set parameters and parse the query, because the OLEDB UI is fairly generic, but the source fails when run against the server. This also happens with DB2. Fear not, gentle reader, for help is at hand in the dashingly handsome form of Property Expressions. The SQL statement you use to query the Oracle source can be set by a variable, and that variable can be composed using a property expression such as: "select * from Orders where OrderID > " + @LastOrderID. It's a parameterized query - and a very flexible one - in all but name.
Other requirements, such as Unicode handling, may be exposed through the provider. Alternatively, such issues can be handled explicitly in the SSIS data flow using derived columns or data conversions to conform Unicode / ANSI, code pages and data types.
Talking about things exposed through the provider, have a look at the .NET Oracle provider in the SSIS Import Export wizard. Due to some limitations in the way that .NET provider user interfaces are factored, we have to rely on a generic property bag to configure these in the wizard - because it has to live outside Visual Studio and the UIs are provided by Visual Studio, if you must know. So we just expose the raw properties. One of them is a boolean with the delightful name: Workaround Oracle bug 914652. There's nothing like washing dirty laundry in public! In this case it's an Oracle skidmark, not ours.
And so to our third technical issue. Loading into Oracle. Do we support fast loading into Oracle? The obvious answer is no, and the obvious tone with which express this negative is one of surprise, as if someone had requested a Gideon Bible in a North Korean hotel room. But there's more to it than meets the eye and the question raises some important issues about the positioning of SSIS in the market.
The first point to consider is why the request is made in the first place. Truth is, most customers report that Oracle's OLEDB and .NET providers really suck at loading data. Oracle just have not designed high performance interfaces - they prefer their own proprietary interfaces, for good proprietary reasons no doubt. So customers in my third scenario above often need to find a way to get data into the Oracle database. Yet here in SSIS, we have no passion to provide fast loading which would sustain Oracle as a database platform -the Microsoft Oracle providers are certainly not optimized for that. The best strategy from our point of view is for users to move away from Oracle completely and, indeed, we know of several cases where customers have ramped up their Oracle to SQL Server migration to avoid these cohabitation problems. If we provided an Oracle fast load solution out of the box, perhaps this results in a net loss for us in migration scenarios, because we extend the lifetime of the Oracle database, which frankly is a disservice to our customers. Besides, it really is Oracle's job to provide better interfaces.
As I said, Oracle do provide fast loading through proprietary means and it is possible to build SSIS packages which write to text files and then invoke the Oracle fast loader with an ExecuteProcess task. Some customers don't like the workaround, but some love it. In practice, it turns out to be not so bad as might be thought, because landing to disk before bulk loading provides a check point of the processed data which enables restartability of the fast load process. This can be very practical when enrolling the entire load in a single transaction for performance.
Some users take the view that SSIS cannot be a real ETL tool if we do not enable fast loading into other databases. They see ETL and data integration as being by their nature generic processes which should enable even-handed integration throughout the enterprise. I think there are two principal responses to this.
First of all, SSIS is fundamentally a data integration platform, not just an application. That means that we have the technical hooks and the strategic drive to enable a wide range of third parties to extend our platform, from individual developers to well-established ISVs. We are actively encouraging an integration ecosystem which provides end users with a range of integration technologies and features which no one application or company, not even Microsoft, could provide. This will be a cornerstone of our success. How does this relate to Oracle integration? Well already I know of a number of individuals and ISVs who are working on this very problem - writing wrappers around Oracle's bulk loaders to make them easy to use within SSIS, or writing deeper integration of Oracle's bulk interfaces with the SSIS data flow. I do not expect that Microsoft will write an Oracle fast loader - currently it comes in around number 999 in my list of 1000 features for next version, just slightly ahead of recompiling for Linux. But on the other hand I do fully expect that there will be excellent Oracle loading capabilities from third parties, who may well live long and prosper in that market.
Secondofly (thank you, Arrested Development, for this neologism) I think it is important to stress the advantages of SQL Server as the complete data platform. One of the key issues in data integration is the difficulty of pulling together all these disparate sources and applications that grow like Topsy in enterprises. SQL Server provides a solution for this, not by making Integration Services a kind of duct tape to hold it all together, but more fundamentally by enabling inherent integration through SQL Server's data platform capabilities. It's no surprise to me that Informatica's metadata product is called Superglue. My bottle of Superglue in my toolbox says it is for repairs throughout the household and shop. I believe we do better by our customers by enabling them to build enterprise architectures which are not broken in the first place. Good integrated engineering is better than glue.
Microsoft and Oracle both take this approach. (It's only in their most recent version that Oracle's Warehouse Builder enabled loading non-Oracle sources at all, while DTS has enabled generic integration for years.) The pure play Data Integration vendors will no doubt continue to address platforms in a more even-handed way. In many ways that will be their strength going forward. But such even-handedness comes at a price, quite literally, as witnessed by the expense of licenses and services for pure play vendors' integration tools.
I'm looking forward to many happy days with SSIS, moving customers from Oracle to SQL Server. For those who wish to load Oracle quickly and efficiently, I will be glad to direct them to partners who can provide those capabilities within our architecture. I will also be comfortable in the knowledge that not fast loading to Oracle is a relatively minor weakness, compared to the enormous strength we gain from having such partners, and the ease with which they can extend our capabilities and enhance SQL Server as the complete data platform for our customers.