Enjoy Every Sandwich

Thoughts on SQL, XML, .NET and sometimes beer.



Thursday, June 02, 2005 - Posts

XML. It's (going to be) the new RTF. And PPT.

As a database geek, I've faced one problem over and over and over again: how do I store Office suite documents in a database in a usable way? By usable, I mean in such a way that I can easily query their contents and reshape them into smaller chunks of knowledge. One of my favorite examples is importing Word documents that capture some piece of knowledge and needing to extract their abstracts. The idea is that the abstract should give users enough information about the document to decide whether reading the whole thing is likely to help them solve the problem at hand. Of course, there are a number of ways of doing this, but none of them strike me as very user friendly. For example, suppose you use an ASP.NET website to let users upload documents and search them. You could require the user to enter (or more likely, copy and paste) the abstract into a textbox and save that to a column in some database table. Then you could use either matching or full-text searches on that column (or even on the whole text of the document, if you like) and render a list of results.

As the user of such a system, though, I'd prefer to just submit the document and let some logic find the abstract and do what it needs to with it. In other words, make me do as little as possible!

With the combination of Office 2003 and SQL Server 2005, I've cobbled together a workable solution. In the Word document template, I have a style called "abstract" that users can apply to a section of the document. Users can then save documents based on that template as XML, which they in turn upload. The upload is (of course) to a SQL Server 2005 instance. The document itself is saved to an XML-typed column. That allows me to use XQuery on those documents to pull out the abstract and use it as I like. I demonstrated some of this at Code Camp II in Boston last year.
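To make the idea concrete, here is a rough sketch of the extraction step. It is written in Python rather than the XQuery I actually run inside SQL Server, so treat it as an illustration of the logic only. It assumes the document was saved in the Word 2003 XML format, and that the style shows up in the markup with the style ID "abstract" (the ID Word writes can differ from the name shown in the UI). The function name and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 XML (WordprocessingML) documents.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W_NS}


def extract_abstract(wordml_text, style_id="abstract"):
    """Return the text of every paragraph tagged with the given style ID.

    Hypothetical helper: "abstract" is assumed to be the style ID that
    the template writes into <w:pStyle w:val="...">.
    """
    root = ET.fromstring(wordml_text)
    paragraphs = []
    # Walk every <w:p>, wherever it sits in the tree.
    for p in root.iter(f"{{{W_NS}}}p"):
        style = p.find("w:pPr/w:pStyle", NS)
        if style is not None and style.get(f"{{{W_NS}}}val") == style_id:
            # A paragraph's text is split across runs; join the <w:t> pieces.
            text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
            paragraphs.append(text)
    return "\n".join(paragraphs)


# A tiny hand-written sample, not a real Word export.
SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="abstract"/></w:pPr>
      <w:r><w:t>How to rebuild </w:t></w:r>
      <w:r><w:t>the search index.</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t>Body text that is not part of the abstract.</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>"""

if __name__ == "__main__":
    print(extract_abstract(SAMPLE))
```

The same filter (paragraphs whose style is "abstract", then their text runs) is what the XQuery against the XML column has to express.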

Pretty cool, right? Kind of. There's a problem, though.

More precisely, the problem is that Word 2003 wants to save, by default, to "Word Format" (a newer flavor of RTF, in a sense) instead of XML, and sometimes even the best users forget that they need to save as XML instead of Word's native format or the older standard RTF format. Naturally, my upload page barks that the uploaded file isn't XML and asks them to go re-save. Fine, except that it irritates the user, duplicates work and so on. But I don't have a good way around that today.
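For what it's worth, the "is this really XML?" check on upload can be quite small. The sketch below is in Python, not the ASP.NET code behind my actual page, and the function name is hypothetical. It assumes a Word 2003 XML file has a root element named wordDocument in the WordprocessingML namespace, and it rejects anything that fails to parse or has a different root.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 XML (WordprocessingML) documents.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def looks_like_wordml(data):
    """Return True if the uploaded bytes parse as XML with a Word 2003 root.

    Hypothetical helper: it only checks well-formedness and the root
    element, not that the document validates against any schema.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Binary .doc files and RTF files end up here.
        return False
    return root.tag == f"{{{W_NS}}}wordDocument"


if __name__ == "__main__":
    good = (b'<?xml version="1.0"?>'
            b'<w:wordDocument xmlns:w="' + W_NS.encode("ascii") + b'">'
            b'<w:body/></w:wordDocument>')
    print(looks_like_wordml(good))
    print(looks_like_wordml(b"{\\rtf1\\ansi Hello}"))
```

A check like this tells the user what went wrong, but it doesn't spare them the re-save, which is the part that irritates.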

But with Office.Next, it appears I will. Why? John Durant points us at the announcement of XML as the new native format for Word, Excel and FINALLY PowerPoint.

John said: "It's a watershed moment for Office programmability." True, but this goes way beyond programmability, I think. It's potentially just as much of a watershed for Knowledge Management Systems. Sure, we've had this ability with OpenOffice for a while, but I, for one, am glad to see Microsoft -- which really owns the IW productivity suite space -- make this move.

Looks like I'll be shuffling my TechEd 2005 schedule around a bit...

posted Thursday, June 02, 2005 6:54 AM by ktegels



